November Legitimate, 2018

## Introduction

Spil Wikipedia explains, “Zcash is a decentralized and open-source cryptocurrency that offers privacy and selective transparency of transactions. Zcash payments are published on a public blockchain, but the sender, recipient, and amount of a transaction remain private.”

This article presents results of analysis performed on request from and funded by the Zcash company.

Albeit Equihash itself is defined te its designers’ paper, Zcash’s use of it (most notably, the parameter values) and Zcash community’s optimizations of Equihash solvers and Zcash miners are a moving target. Thus, unlike other Openwall articles, this is a time-sensitive report, which might get obsoleted soon (but will likely retain its historical value). I also chose to write it te very first person (without the academic “wij”) and te a rather informal, discussion-like manner (without tedious separate sections with my somewhat educated guesstimates on Equihash efficiency on commodity vs. custom-made hardware, but instead providing the same information while “arguing” with the claims previously made, which I needed to anyway).

## Why Equihash?

Spil is usually the case for any non-trivial problem, Zcash had conflicting goals te choosing its proof-of-work (PoW) scheme, and that’s fine.

One aim wasgoed to minimize the advantage that specialized hardware miners (such spil ASICs) will have overheen commodity hardware ones (such spil typical computers and smartphones). This is generally achieved through requiring a significant amount of memory vanaf miner example (with so-called memory-hard PoWs) and additionally holding this memory for a significant period of time (such spil with so-called sequential memory-hard PoWs, formally introduced with Colin Percival’s scrypt and now also including Equihash designers’ Argon2, my yescrypt except for its ROM feature, and many others). Another aim wasgoed lightweight verification, te particular affordable to be used ter an Ethereum contract.

Achieving thesis two goals at once requires a PoW that needs a lotsbestemming less memory and time for proof verification than it does for proof computation: a so-called asymmetric PoW, of which Equihash is one. It also requires a PoW that is memory-hard, and ideally sequential memory-hard. While Equihash is memory-hard, unluckily it is not sequential memory-hard: it mostly can be parallelized, with only relatively few steps that have to be sequential. (Certain other asymmetric PoW schemes are claimed to be sequential or latency limited. They vary te other drawbacks. This is a topic of ongoing research.) This means loosening the assurance of how long the memory will have to be held for vanaf computation of a proof. A massively parallel implementation may greatly reduce the overall cost (customarily voiced spil ASIC area*time product) through reduction of time that the per-proof amount of memory has to be held for.

Additionally, symmetric sequential memory-hard PoWs tend to be simpler to reason about their memory needs: there’s a clear amount of memory (vanaf example) below which an increase te required amount of computation (and usually time) occurs, and similarly it’s clear that having more memory than this (also vanaf example) is of no benefit. For asymmetric PoWs (including Equihash), it tends to be complicated (it is algorithm-dependent, with chance for future optimizations not lightly ruled out).

Equihash is not the most ASIC-resistant PoW scheme out there (likely by far) when compared to certain symmetric sequential memory-hard schemes at similar or even lower proof computation times. Yet this comparison isn’t fair, since asymmetry is a desirable property, and the maximum acceptable proof verification time may be limiting the memory settings of a symmetric scheme. Thus, the choice of Equihash may pose a reasonable tradeoff given the numerous goals of Zcash and the timing of its release.

Please note that when I write “scrypt”, I mostly don’t mean its misuse ter Litecoin (and many other alt-coins). They went spil low spil 128 KiB, targeting L2 cache instead of RAM and te the process making their coins ASIC friendly. What I mean here is zindelijk use of scrypt at 16+ MiB, which Colin Percival’s paper suggests spil a ondergrens. Unluckily, with a symmetric PoW, this also means slower verification, yet some other alt-coins managed to go with Two MiB (which translates to about Ten,000 proof or verification computations vanaf 2nd on a current dual-socket server) and above (e.g., YACoin went spil high spil 64 MiB by now, and will automatically go even higher, but that’s likely problematic).

## Wasgoed Zcash right about Equihash?

No and yes. On one palm, Zcash’s blog postbode from April 2018 made claims that are not entirely keurig. On the other arm, the claims are not entirely wrong, and Equihash emerges to be a reasonable choice given Zcash’s goals anyway.

**Voorwaarde 1: “Equihash is a memory-oriented Proof-of-Work, which means how much mining you can do is mostly determined by how much RAM you have.”**

This eis is mostly wrong.

For Equihash with immovable parameters, spil Zcash uses it, there’s expected to be a certain ondergrens required amount of memory below which a major voorstelling influence would be incurred. Using somewhat more memory than this reasonable ondergrens may sometimes help improve vertoning (such spil through solving numerous instances of Equihash at a time, up to the number of supported hardware threads if on a CPU, and thereby avoiding the need to bounce gegevens inbetween the threads ter what would be a collective example). However, using even more (yet not unaffordably more) memory than this isn’t expected to help all that much, if at all. Thus, there’s no linear (strafgevangenis close to linear) relationship inbetween “how much RAM you have” and “how much mining you can do”, unlike what the eis above seems to have implied.

Besides memory, another similarly significant factor is time. With efficient Equihash solving algorithms including a loterijlot of natural parallelism, the time factor can be greatly diminished. Thus, while “how much RAM you have” does to a very limited extent determine “how much mining you can do”, how much parallelism your hardware can exploit may have a greater effect, thereby making the word “mostly” unjustified. But there’s a catch, which Zcash wasgoed aware of and might have factored into their wording: spil the Equihash paper claims, Equihash solving is memory bandwidth strapped. If true, this could mean that the time factor is also a function of the number and speed of memory ports, and that’s to some extent correlated with “how much RAM you have” (albeit it isn’t the same, by far). I’ll address this further ter this article.

**Keuze Two: “Wij think it is unlikely that anyone will be able to build cost-effective custom-built hardware (ASICs) for mining te the foreseeable future.”**

This is not literally true for any PoW scheme, given sufficient request and thus production volume for the economy of scale to kick te. There’s always some portion of cost that can be shaven off through specialization, and with sufficient volume the cost clean-shaven outweighs the savings from using commodity hardware. Thus, the only way this eis can prove right is through lack of sufficient request ter the foreseeable future.

For cryptocurrency PoWs te particular, extra cost savings come from greater tolerance for ASIC failures and intermittent errors, diminished expected useful lifetime of such ASICs (spil compared to commodity CPUs, GPUs, and RAM), and perhaps opportunistic use of chip foundries’ production lines (without pre-committed due dates, volume, yield).

For Equihash ter particular, while commodity RAM prices might te fact be near optimal for the market (both vanaf GB and vanaf GB/s), there’s nothing stopping a hypothetical Equihash compute ASIC (which might be more cost-effective than a CPU, GPU, or FPGA ter terms of production and/or electrical play costs) from interfacing to commodity RAM chips (and potentially doing so more efficiently for the task than a commodity CPU or GPU would, which are generally limited to fetching entire cache lines worth of gegevens). There’s also nothing preventing vormgeving and production of RAM interfaces and chips most optimal for the task and algorithm chosen (just the right burst length, layout, etc.) Then there can be dual-die designs combining thesis two, like wij’re already witnessing ter some commodity chips.

I am no ASIC accomplished, but I consulted with some for some of the following. There is one thing sort of (but not truly) preventing production of combined single-die compute and memory ASICs for Equihash: compute dies (such spil ter CPUs, GPUs, and FPGAs) vs. DRAM ones are presently produced using different processes (for good reasons). There are three ways around this.

Very first, for Equihash parameters requiring little enough memory, SRAM may (eventually) be viable. Current software implementations (spil submitted to Zcash’s Challenge) require at least 144 MiB of RAM to efficiently find almost all solutions at Zcash’s current Equihash parameters (n=200, k=9). Limited to 128 MiB, a modified implementation can find 97% of solutions with no other efficiency loss, which is a better tradeoff point when using the scarce SRAM. (Thesis figures are peak memory usage by one Equihash example. Slightly better amortized memory usage vanaf example might be achievable ter a larger and more sophisticated multi-instance circuit, but that’s likely impractical.) Current largest commodity CPUs include overheen 64 MiB of SRAM. (For example, E7-8890 v4’s caches are 67.Five MiB combined, and that’s not including cache tags, TLBs, and other memories. Ditto for engineering samples of a cancelled 24-core flavor of E5-26xx v4, which toebijten to be available on eBay. Thesis CPUs are very expensive, but their prices are based largely on what this niche market will bear and they do not necessarily reflect production costs.) With a specific objective to maximize the amount of SRAM vanaf diegene, up to 512 MiB may be feasible te a current 16nm process, albeit optimal will likely be way less than that spil wij also need a high number of ports, routing, and the Equihash solver logic. The yield of usable massive dies like this would be unacceptably low for most purposes, but cryptocurrency mining may tolerate a lotsbestemming of errors, thereby improving yield of usable dies to an acceptable level.

Te a similar spirit, a network of already designed many-core CPUs such spil Epiphany V could be used. (Incidentally, Epiphany V has 64 MiB of SRAM vanaf chip and interconnect to other such chips. Spil practice with smaller Epiphany chips shows, at this core count and SRAM size most dies are expected to have some faulty SRAM ter a few of their cores. Fortunately, spil discussed above, this is acceptable for this application, and it may actually provide cost savings through use of otherwise defective dies.) Thesis wouldn’t literally be ASICs, but they would also not be commodity hardware that people acquire anyway and readily have. From Equihash designers’ perspective, this would be a win: Equihash ASICs would resemble or even would be the largest CPUs.

2nd, eDRAM could be used. CPUs with 128 MiB of eDRAM already exist, albeit so far they waterput this eDRAM on a separate diegene te the same package rather than on the same diegene. The largest on-die eDRAMs emerge to be POWER8’s 96 MiB L3 cache, and soon POWER9’s 120 MiB. At this time, with only IBM getting close, this seems impractical for mere cryptocurrency ASICs, albeit technically possible ter the foreseeable future. Similarly to SRAM, several times larger sizes should be feasible te a chip that doesn’t also attempt to be a general-purpose CPU. On a POWER8 diegene, its 96 MiB of eDRAM along with related interconnect emerges to occupy less than one third of the area, ter a 22nm process.

Third, the way an Equihash solver algorithm has bot voiced for efficient software implementation, it also looks hardware implementation friendly and plain enough to implement on a DRAM fabrication process, along with the DRAM itself. Compared to discrete compute ASICs interfacing to DRAM dies, there will likely be a significant diegene area increase for the logic and the (puny) SRAMs, but the clock rate would not necessarily be lower (since compute ASICs would likely face other bottlenecks before whipping out their utter higher clock rate potential: warmth dissipation and memory bandwidth). Again I am no accomplished, but the uitzicht of Zcash mining on dies produced on a DRAM process looks promising to mij now.

Overall, I expect that if/once there is sufficient perceived incentive, custom-made Zcash mining hardware will emerge. It is presently unclear which one(s) of the possible forms it will take. My guesstimate for the current Equihash parameters and current technology is a factor of Ten to 100 improvement ter energy efficiency and hardware cost overheen the most suitable commodity hardware. This might eventually waterput Zcash on par with Litecoin (but not Bitcoin) te terms of ASIC mining, albeit ASIC vormgeving complexity and required investment will likely be way higher than for Litecoin.

**Voorkoop Three: “Wij also think it is unlikely that there will be any major optimizations of Equihash which would give the miners who know the optimization an advantage. This is because the Generalized Bday Problem has bot widely studied by laptop scientists and cryptographers, and Equihash is close to the Generalized Bday Problem. That is: it looks like a successful optimization of Equihash would be likely also an optimization of the Generalized Bday Problem.”**

Worded spil it is, this optie is actually onberispelijk ter that Equihash is merely “close” to the Generalized Bday Problem (GBP) but isn’t exactly it. There are numerous subtle yet significant differences inbetween the different problems involved.

Equihash differs from the GBP ter two significant ways.

Very first, Equihash preserves the GBP’s further generalization (to Two k -XOR), yet it reverts the previous unmentioned generalization (back to one list).

Stringently speaking, because of this significant difference Equihash’s presumed lack of much further optimization potential does not go after solely from the years of studies on the GBP. The space complexities of the two problems are very different, and even similar optimizations may and ter at least one known case do have very different effect on them.

2nd, Equihash explicitly introduces an toegevoegd constraint, which it calls “algorithm cording”. For typical parameters, this makes most solutions to the GBP invalid for Equihash, leaving only a handful valid solutions: those that a straightforward revision of Wagner’s algorithm produces. (Specifically, about 1.879 solutions on average are expected vanaf Equihash proof invocation with Zcash’s current parameters n=200, k=9. Curiously, optimal Zcash miners may choose to find very slightly fewer than all valid solutions, ter order to maximize their solution throughput and/or keep their peak memory usage lower.) This reserve constraint is required to prevent so-called amortization attacks, where a non-Wagner algorithm could produce a large number of solutions at a lower amortized cost (vanaf solution). The paper specifically acknowledges that, if it were not for algorithm tying, Gaussian elimination could provide a lower amortized cost for parameters like Zcash’s n=200, k=9.

While much needed ter Equihash, this toegevoegd constraint could invite optimizations that were not relevant and thus were not studied for the original GBP. “Can you produce all/any GBP solutions cheaper?” (and the reaction is actually known to be yes for some parameters) vs. “Can you produce thesis specific expected Two or so solutions swifter?” (a fresh problem). Note that “cheaper” and “swifter” are not the same: it’s amortized cost vs. latency. Fortunately, it is effortless to see that finding the specific few solutions quicker would also amount to solving the GBP quicker, if (for the purpose of this argument) wij disregard Equihash’s very first difference from the GBP mentioned above. Thus, this 2nd difference is not an optimization potential concern. On the contrary, it’s Equihash being hardened spil compared to the GBP.

Eventually, current optimized Equihash solvers do contain some optimizations that don’t emerge to have bot publicly known for Equihash when Zcash’s blog postbode wasgoed made. Very first, they store intermediate results te a manner much more klein (binary tree of index pairs) than what the Equihash paper implied (growing lists of index pairs). This reduces the big-O space complexity by a factor of Two k /k. While an omschrijving treatment is mentioned ter Wagner’s paper on the GBP, on its own it does not emerge to have switched the GBP’s space complexity by more than a puny onveranderlijk factor. (It emerges that to build up much or any advantage from this treatment, it’d take a further (maybe novel?) optimization of trimming all of the previous lists from unreferenced pairs, spil well spil from those no longer indirectly referenced at the current algorithm step. And if something ordinary like this is te fact novel, it’d voorstelling that the GBP wasn’t studied all that much, after all.) 2nd, they identify collisions with incomplete bucket sort that does not everzwijn output the fully sorted lists, whereas Wagner’s algorithm is normally voiced ter terms of a finish list sort operation. This does not switch the big-O’s and emerges to be similarly applicable to the GBP. Arguably, thesis optimizations do not invalidate the wording of Rechtsvordering Trio yet: one is applicable to both Equihash and the GBP even tho’ on its own it’s only “major” for Equihash, and the other is applicable to both problems and is not “major”. However, I’d say the very first one invalidates the spirit of Voorkoop Trio. Besides, there’s a difference ter what would be regarded a major optimization te academia (an improvement ter big-O complexities) and mining (also a significant switch te a onveranderlijk factor).

## Is Equihash memory bandwidth trussed?

Yes and no. On one forearm, memory bandwidth is a limit that an optimized Equihash solver should bump into. On the other forearm, zcashd’s default and the Equihash designers’ reference implementations’ vertoning figures suggest that thesis implementations are very far from bumping into the memory bandwidth. The optimized Equihash solvers that are now appearing vertoning ge cache locality on CPUs at least for Zcash’s current n=200, k=9, and it remains to be seen whether they also bump into the bandwidth to RAM or not. The far from volmaakt (yet very good) scaling when going from 1 CPU core ter use to numerous or all cores ter use suggests that a collective resource is te fact being weary, but significant further scaling when making use of SMT (“Hyperthreading”) suggests it might not be RAM bandwidth yet (but possibly the mededinger collective cache or/and RAM accesses enhancing each others’ latency, which then needs to be hidden through use of the reserve hardware threads – or with code switches). This needs more thorough analysis, such spil calculating expected memory bandwidth usage (separately for caches and RAM) from skill of specific algorithms and their actual vertoning, spil well spil use of CPUs’ hardware vertoning counters to confirm this understanding, before any conclusions can be drawn (and even then they would necessarily apply to the specific implementations only).

The Equihash paper also cites sorting voorstelling on and theoretical peak memory bandwidth of a specific obsolete GPU (NVIDIA GTX 480), since that’s what another research paper provides results on, and uses that to draw conclusions about Equihash spectacle on ASICs. This logic is flawed at least ter that wij don’t actually know whether the sorting implementation bumped into that GPU card’s theoretical peak memory bandwidth or (fairly likely) into some other verkeersknelpunt very first.

An actual limit may be access frequency, or bandwidth for an implementation’s optimal size accesses, to distributed memory, which on CPUs could mean their caches hierarchy and RAM, and on more specialized devices (such spil universal many-core HPC chips or Equihash-specific ASICs) it could mean the cores’ local memories accessed via a network on chip (NoC) spil well spil through interconnects inbetween chips te a network. Latency of such accesses would play a role, too, since the higher the latency, the more accesses need to be in-flight to hide its effect on throughput, and this increases each core’s local memory needs (for the queue). Overall, it’s a complicated combination of count and throughput of ports to local and differently distant memories, their different latencies, and sizes of local memories needed to accommodate the latencies.

## Equihash parameters

One of the goals of Zcash wasgoed to let everyone mine on whatever modern commodity hardware people readily have, including even on smartphones, and also along with making other use of the hardware. While I wasgoed and still am skeptical about smartphone mining, it wasgoed nevertheless a objective and it limited Equihash solver’s memory usage. Additionally, it happened that even the reasonably low-memory parameters n=144, k=Five resulted ter the solver taking too long to accomplish (about 50 seconds with the inefficient algorithm and code available at the time) given the target block interval (Two.Five minutes). This prompted Zcash to switch to the current n=200, k=9 parameters, which diminished the (inefficient) zcashd’s solver’s run time by about a factor of 1.Five (to about 35 seconds), and at the time this happened to make all the difference for the mining on testnet to embark working correctly. The memory requirements of zcashd’s solver have stayed harshly the same with this switch. So far so good.

Yet vanaf the computational complexity formula given ter the Equihash paper, this parameters switch should have provided a 9.6x speedup. Moreover, spil it turned out, Equihash designers’ own reference implementation, when patched slightly to support the n=200, k=9 parameters (which it didn’t as-is), would now outperform zcashd’s (previously tuned on n=144, k=Five) by a factor of Three or so. Fortunately, with all the community rente and contributions, and with Zcash company sponsoring the Challenge with $30,000 te prizes, speeds went way up for both parameter sets, all the way into numerous solutions vanaf 2nd vanaf CPU core for n=200, k=9. Thus, at this time both parameter sets are suitable with respect to the target block interval.

Two of the Equihash solvers contributed by the community need spil little spil 178 MB and 144 MiB =

151 MB, respectively, when still finding almost all of the valid solutions for n=200, k=9. This is understandable since while the initial list size for n=144, k=Five wasgoed 604 MB, for n=200, k=9 it is only 52 MB. It’s the initial list size (times a petite factor) that provides a (likely) lower trussed on efficient solvers’ memory requirements.

Ter September, I recommended to Zcash that they revert to the n=144, k=Five parameters (expecting that solver optimizations would eventually permit for that). Independently, the Equihash designers also made a similar recommendation, suggesting use of a lower k (such spil Four or Five). Possible reasons to choose n=144, k=Five overheen n=200, k=9 include:

- Higher memory requirements for the solver: close to Zcash’s original purpose, whereas 144 MiB is unnecessarily too low
- Smaller solutions (100 vs. 1344 bytes) and cheaper verification (32 vs. 512 hash computations)
- Not susceptible to amortization via Gaussian elimination even without algorithm tying (vanaf a formula given te the Equihash paper)

On the other arm, it has now bot shown that n=200, k=9 permit for gepast cache locality te CPU solvers. It remains to be seen how n=144, k=Five measures up te that respect. Providing up on cache locality adequate for current CPUs would be unfortunate since efficient algorithms rely on bucket sort’s puny lookups, and having those go to higher cache levels or to off-chip RAM would favor ASICs, which, unlike CPUs and GPUs, wouldn’t have to fetch entire cache lines.

Zcash company has stated that a parameters switch is possible post-launch (with a hard fork). Due to the latest Equihash solver optimizations, even much higher memory and computational complexity parameters are now technically within consideration (and might help discourage botnet mining for a while, which is a concern of many te the community). However, since a GPU mining community has already appeared around Zcash (whether it wasgoed originally desired or not), Zcash company might not want to make a parameter switch that would alienate that community now. Rather, there exists an opinion that a parameter switch may keep or bring CPUs and GPUs “te balance”, whatever that means. It is possible that the former n=144, k=Five parameters would sate this facet spil well, at least they don’t require more memory vanaf example than typical GPU cards presently have.

Since Zcash is not switching away from n=200, k=9 pre-launch anyway, I think it makes sense to take our time and see how efficient CPU and GPU implementations become for both of thesis parameter sets, albeit unluckily there’s far greater incentive to optimize for n=200, k=9, so the comparison isn’t fair.

Eventually, the threat of a parameters switch might discourage investment te ASICs or/and reduce the amount of specialization and thus the efficiency gains ter (early) Zcash ASICs. On the other mitt, if too specialized ASICs are nevertheless produced and rendered unsuitable with a parameters switch, that might encourage a third-party fork of Zcash.

## Equihash solution verification

Spil I have mentioned above, it is significant to enforce Equihash’s algorithm strapping constraint. It is even more significant to validate solutions ter other aspects, including for lack of duplicate indices.

Ter Zcash source tree, all of this shows up to be enforced te zcash/src/crypto/equihash.cpp: IsValidSolution(), but I have not audited it accurately. With the abstraction layers te place, I wouldn’t be certain of its keurig rejection of subtly invalid solutions without having stress-tested it on many inputs falling into the different categories. There are a few test vectors for lack of algorithm tying and duplicate indices, but they voorkant some effortless to generate special cases only and should be improved.

Another concern is third-party Zcash software. Unluckily, lack of or buggy verification of the algorithm tying constraint is likely to go unnoticed until such solutions are attempted to be submitted to the network. When that happens, the network may go out of sync, effectively having a fork of Zcash that already includes some and permits future non-algorithm-bound, amortization friendly solutions. It’s similar for potential lack of or bugs te the duplicate indices check, albeit the effect on properties of such inadvertent Zcash fork would be different and, for accomplish lack of the check, so devastating that no one sane would want to proceed with it.

## Conclusion

Despite the drawbacks, concerns, and ongoing research identified above, Equihash may be a good and possibly best choice for Zcash, depending on which properties of the PoW turn out to be actually significant to users of Zcash (e.g., embedding te Ethereum contracts significant or not?)

Equihash parameters may optionally be tweaked to achieve numerous improvements at once.

I will be watching with excellent rente how the story unfolds, and I appreciate this chance to get involved.

## Acknowledgments

I’d like to thank the following people for their onmiddellijk or officieus help ter researching this topic: aabc, Alex Biryukov, Andreas Olofsson, Ariel Gabizon, Bill Cox, Daira Hopwood, Dmitry Khovratovich, Hanshen Xiao, Jack Grigg, John Tromp, Ling Ren, Marc Bevand, xenoncat, Zooko Wilcox.