An introduction to SIGMA the new ‘CPU friendly’ algorithm for Gulden
We will soon be entering public testing of our new ‘CPU friendly’ PoW algorithm. This article will give a high level overview of how it works; however before doing so first a discussion about how we arrived here and why we think a ‘CPU friendly’ algorithm is the way forward for Gulden is necessary.
Generally when the topic of ‘ASIC resistance’ or ‘CPU friendly’ PoW algorithms comes up people tend to immediately be separated into two camps.
- Those whose eyes begin to roll back in their heads as they explain that any algorithm that can execute on a CPU can also have an ASIC constructed for it. That even if this were not the case ASIC resistance is undesirable because ASICs are the only true way to gain resistance to attack via botnet and so on…
- Those who swear that ASICs are the devil, that they lead to centralisation, push the common man out of crypto and should be fought at all costs. Quite often the people in the second group are oblivious too or choose to ignore the many failings or previous ‘ASIC resistant’ algorithms that have since had ASICs constructed for them.
In the past I would have fallen into group 1 as a less extreme member, probably a lot less sarcastic as many other people in this group, but in it nonetheless — after all, the facts on the surface do seem to be on this side of the argument. And indeed in the past I have advised against such algorithms for Gulden.
Therefore it is reasonable that I should offer an explanation for why we now pursue an ‘ASIC resistant’ algorithm.
The reality of course is that, as usual, both extremes are always wrong to some degree and reality is a lot more complex and usually lies somewhere in the middle. The arguments around this are complex and I will not address absolutely everything, for the sake of brevity, but will try to lay out my thoughts on some parts of this argument. Understand that just because I don’t address some specific point here does not mean I am not aware of it nor that I don’t have opinions on it…
There are several things that I shall address, however I will get the largest of them out of the way first as it impacts most the others.
Witnessing — With the introduction of our witnessing system the rules for Gulden have forever changed in various ways, things that previously were ‘gospel’, ‘fact’ or ‘common sense’ for regular PoW are not the case with witnessing in place and therefore absolutely everything should be reevaluated under the new rule set throwing old assumptions aside.
I in fact remain as convinced as ever that a ‘CPU friendly’ algorithm for a PoW only coin would be a mistake, and so it is not really my view that has changed but rather the question itself that has changed from “Does a CPU friendly algorithm make sense for a PoW coin?” to “Does a CPU friendly algorithm make sense for a PoW² coin?”. It turns out that the answer to these questions is different.
CPU friendly mining helps attract new users — It is undeniable that several of the larger alternative cryptocurrencies are ones that have started with “ASIC resistant” mining; while these algorithms may have eventually failed it seems likely that the period during which they worked has allowed for more people to become interested in the currency.
Over the years at Gulden we have had to watch as streams of new people have appeared to ask about how they can mine, only to leave in disappointment when they find out that specialist hardware is required. If only a fraction of these people can instead be attracted to us than this is certainly appealing.
ASIC mining has over the years attracted very few users to our community, while creating a lot of work for the developers. A lot of it undesirable work for instance helping track down and attempt to support various mining pools to upgrade at every major update, mining pools that have at times been toxic toward both the coin and the developers.
This is of course not a technical reason — so should not override technical concerns but the importance of this should not be understated regardless.
An ASIC can be made for any algorithm — As a basic surface statement this is indeed true, however the devil is in the details. The real and closely related questions are “Can an ASIC be made that is massively more energy efficient?” and “Can an ASIC be made that is profitable enough that the extra expense to produce one is worthwhile?”
The answer to these is a lot less clear cut, and over the last few years a lot more research has gone into this. Blockchains are not the only area of computer science where this question is important — its also very important for password hashing, where the PHC was held to try make better algorithms for this.
Based on these algorithms and continued development it appears reasonable to believe that though ASICs may always have an advantage, the advantages can be drastically diminished, potentially to the point that the production costs of small batches of ASICs will simply not be competitive cost wise against CPUs that continue to be made in far larger batches.
Particularly for a low ‘market cap’ coin like Gulden it seems a reasonable assumption that with some clever choices we can hold off ASICs for the foreseeable future. Should we reach a point where ASICs become worthwhile to manufacture we can always at that point adapt further new developments or switch to an ASIC friendly algorithm.
ASIC resistance is undesirable because ASICs provide security that CPUs cannot
This is a complex situation, the main claims are usually:
- Having much more efficient ASICs renders a blockchain immune to botnets/government/well funded corporations; while a CPU only coin would be vulnerable to them.
- The up front costs of purchasing ASICs act as an additional deterrent to attackers.
There is some truth behind these sort of claims, a CPU only PoW coin is indeed quite vulnerable to botnets, a huge amount of expensive ASICs does indeed represent an additional expense for a would be attacker. It is not inconceivable that an advanced botnet could hijack ASICs as well but it is certainly a lot less likely/realistic than a CPU based botnet.
In the case of Gulden the importance of PoW is diminished with witnesses providing a huge part of the network security, this greatly diminishes the threat a botnet could pose. An attacker with a botnet would also require a vast amount of coins locked in witnessing to stage an attack. It is true that a botnet could still come to dominate mining for a time but while not ideal this would not diminish security and may even enhance it.
Further certain types of ASIC resistance (large memory usage for instance) can also act as a form of botnet resistance, as most devices forming botnets tend to not have enough memory capability to participate in such networks.
For a very high value blockchain there is a strong/reasonable argument that instead of ‘ASIC resistance’ it is better instead to have a very simple algorithm for which ASICs are as cheap to manufacture as possible. It is not worth going too much further in depth into this line of argument as Gulden is not currently anywhere near a ‘market cap’ where this is a reality — for lower value coins things are quite clearly flipped around.
Actual security comes from energy expenditure and not the type of hardware
Ultimately relying on the up front expense of hardware, which the ‘ASICs provide security’ argument relies on; to secure a blockchain can be seen as a type of ‘security by obscurity’ which is not really a good idea.
The real security measure of a PoW blockchain is how many watts of energy are protecting it; and this will ultimately end up being roughly the same regardless of the hardware type that is expending the energy. A balance should/will always roughly be obtained where watts expended matches energy prices such that mining is still marginally profitable…
There is a reasonable argument and one I strongly believe in that a CPU only coin can, in certain theoretical conditions, if the right balance is obtained actually attract more watts to protect it than an ASIC one could, the reason for this is ‘edge cases’.
With the rise of renewable energy there are people who have solar PV on their rooftop which at times produce excess energy that is not utilised, and computers in their house that are at times idle. I myself am in this exact situation. For these people the costs of purchasing an ASIC to expend this energy is not worthwhile, but if they can have their existing computer mine with that energy when it would otherwise be idle, then it is essentially free.
There are also people with remote servers hosted elsewhere that they pay the same price for monthly whether the servers are mostly idle or at full usage .
For others there is an attraction to mining even if they are doing it at a loss in present value, as they see value in the future or just find it fun to mine.
For all of these groups of people as well as various other groups the costs of an ASIC are prohibitive but mining with a CPU entirely feasible, and so it is entirely possible that a CPU only coin is capable of achieving more energy protection than would otherwise be possible, provided the right balance can be found such that it taps into these groups.
ASIC resistance is as much about intent as it is about the algorithm
The argument here is that if the developers of a blockchain announce the intent to keep a coin ‘ASIC resistant’ than this alone acts as a deterrent of sorts to manufacturers. The reason for this is that ASIC development is an expensive investment and takes a long time to make that money back; if there is the possibility that after spending vast resources developing ASICs that the developers of a coin may just change the algorithm, then this is obviously a huge risk factor for ASIC developers to consider.
The problem with this for most blockchains is that to change the algorithm requires a fork, and a fork is controlled by miners, and if the majority of miners are ASIC developers then … well you get the picture there are likely to be problems. This was definitely a large factor keeping Gulden on scrypt in the past.
Witnessing again changes things here. Any change in consensus requires a majority of both witnesses and miners to agree in order to become the dominant chain however the witnesses are the far more important part of this equation. Gulden ASIC miners do not hold the power to block updates as they do for pure PoW based blockchains and so as the Gulden developers we have the ability to not only express our future intentions here but also the ability to follow through on them without hassle as long as the changes are agreeable with the community.
And this is of course exactly what we intend to do — SIGMA is much more than just an algorithm change but an expression of intent that we can and do intend to make whatever changes necessary to keep CPU miners competitive for the foreseeable future.
Development of a new algorithm
In the search of a new algorithm I have read through numerous literature. I have read through the entire archives of the the PHC (Password Hashing Competition), I have read through forum/github/mailing lists of various other coins that have developed or are developing interesting new techniques and read the problems they have encountered, the pros/cons of their approaches and so forth.
One of the key aspects that emerges is that any successful algorithm must have a large memory footprint as well as large memory bandwidth utilisation.
Algorithms like Argon2 are great at this, unfortunately the huge problem is that to have Argon2 use e.g. 4gb of memory for miners means that all verifying nodes would also need to use 4gb of memory and this is not really feasible. So the naive approach of simply using Argon2 with large memory requirements is not feasible.
Ideally an asymmetrical algorithm is required instead (lots of memory to mine but small amount of memory to verify) — Merkle Tree Proofs (also based on Argon2) is an interesting example of this however we view the large proof size that must accompany each header under this scheme to be unacceptable.
Another common thread that emerged is that while the problem we face for blockchain PoW shares some similarities to password hashing (PHC) there are also differences. The PHC was very concerned with side channel attacks for instance, which is a non-issue for our specific use case. Using specific instructions like AES-NI is problematic for password hashing as they don’t want to limit hardware compatibility, while with Gulden we have a bit more control over that. So while we should learn from PHC we should not take the results of it as the best that we can achieve.
I have discussed various ideas surrounding ASIC resistance with other programmers, and finally I have also discussed (briefly) with Bill Cox (a very active member of the PHC) an idea that he offered to me unexpectedly when I was asking him a question about a different algorithm of his own.
The final algorithm that I have come up is entirely mine in terms of design and implementation; but it would be wrong to claim it is entirely mine in concept. It lends heavily from the idea presented by Bill and also from various other ideas picked up based on other peoples work.
This is probably a good thing — I’m far more confident in an idea that others who have also put a lot of thought into have come up with and with which I agree after applying a lot of thought of my own. This is far preferable to an idea of my own with no independent thought from elsewhere.
A golden rule of programming is ‘Never roll your own cryptography’ and this extends as well to hash functions. Cryptography and hash functions should be developed by experts in this field.
While the algorithm itself is unique I have taken great care to properly select existing algorithms that fit the above description for all aspects of it, and merely combine them in a unique way. With great consideration for using the algorithms exactly as they are specified.
In doing so I feel I have stuck to this golden rule and stuck within my strengths (applying existing cryptographic algorithms) and avoided an area out of my expertise (creating new cryptographic algorithms) — as such I am confident that this algorithm works as intended.
SIGMA (Semi Iterated Global Memory Argon)
The high level functioning of SIGMA is as follows from a mining perspective:
- A miner assembles a valid block header.
- The full 32 bit nonce of the header is set to zero.
- Using repeated expensive hashing of the header, a large quantity of memory is filled up in smaller chunks based on the largest memory size we are willing to tolerate for an individual hash.
- The nonce is now treated as two separate halves that together total 32 bits, the ‘pre-nonce’ and the ’post-nonce’.
- The miner alters the pre-nonce, and performs an expensive hash on the modified header.
- The expensive hash from 5 is taken as the seed for a PRNG (Pseudo Random Number Generator) from which the miner sequentially generates 2^post-nonce-size sets of N random numbers. (65536 if post-nonce is 16 bits)
- As each set of N numbers is generated the miner sets the post-nonce to the current index and uses the set values as an index to a chunk of the global memory (which we treat as the same number of chunks as sets for this purpose).
- For each N in the set a chunk of global memory is hashed (along with the expensive hash and the modified header) using one of a selection of fast hash algorithms (also determined by the PRNG) and compared against the difficulty target.
- If a set of N values is found for which all the resulting fast hashes are lower than the difficulty target then a valid block has been found.
- The miner repeats 5–8 until a valid block is found
- If the miner exhausts the pre-nonce without success he updates the block data and starts again from step 1.
The upfront costs to generate the global memory array are expensive, the costs of each pre-nonce hash is expensive. The multiple post-nonce hashes that can be generated per pre-nonce hash are very cheap and fast, but to be able to generate them the miner must have all of the memory from the first step.
For N=2 and a post-nonce that is 16 bits a miner who attempts to ‘cheat’ and mine with half the memory, would be able to perform only 16384 out of 65536 cheap hashes for every expensive hash; he would have to perform 4x the amount of expensive hashing to mine at the same speed. For N=4 only 4096 out of 65536.
A miner who attempted to skip the global generation entirely would have to constantly perform expensive hashes for every fast hash attempt which would drastically slow mining down.
It is therefore impractical to design a mining ASIC that uses less memory or to mine using less memory.
The iterative PRNG step with random selection of fast hash algorithm complicates attempts to make the fast hashing parallel and introduces depth and branching that will make GPU implementations less practical and ASIC implementations more expensive.
From a verification perspective the client needs to do as follows:
- The client receives a new header with nonce values already set.
- The client generates a single expensive hash for the header with pre-nonce.
- The client generates the PRNG set of N indices for the header with the post-nonce.
- The client performs another single expensive hash to generate the single required chunk of global memory.
- The client performs a single fast hash for the global chunk of memory.
- The client validates that the fast hash meets the difficulty target, if not the header can be discarded as invalid without any further work.
- Otherwise the client then re-uses the memory from (4) and performs another expensive hash and another fast hash to validate the second fast hash, and again checks it for validity.
- This continues for all N until the entire set is validated
- The memory requirement of validation is constant, if the expensive hash is 16mb then validation can be done with roughly 16mb.
- Validation is reasonably fast and allows to prevent DoS attacks, with invalid headers detectable with only 2 slow and 1 fast hashing operation.
- Light clients on low power hardware can further reduce resource usage by opting to selectively/randomly validate only some of the hashes in the set and still be reasonably sure that they are not being sent down a path of false headers, effectively using less resources to validate than a malicious attacker would have to use to generate the string of false headers.
Gulden specific implementation details:
- Initially we aim to require a 4gb global memory buffer — it is likely we will increase this to 8gb in the forseeable future.
- For the expensive hashing we will use a modified Argon2d (argon2d_echo) with 16mb for the memory parameter; the reason for the deviation from plain argon is to prevent the re-use of any general purpose Argon ASICs that are manufactured.
- The modification is that Echo is used in place of Blake for the initial and final phases in argon (Argon2 is designed to accommodate changes like this) while the bulk of the algorithm remains unchanged. The second change is to use it as a memory filler (to fill large amounts of memory) instead of only as a hash function (small memory output) — again this is completely within the design of Argon2 and not an invasive change.
Echo is chosen for its heavy usage of AES (which most common CPUs have hardware acceleration for).
- The PRNG is again AES based and designed to not allow for jumps. This is important to force post-nonce calculations to be (partially) sequential instead of paralleled complicating things for GPU implementations.
- We have selected N=2 for now for the fast hashes; that is two fast hashes to be performed for every attempt. It is entirely possible that we will increase this in future.
- We have also selected two fast hash algorithms, from which the random hash selection for the fast hashes can pick. Echo and Shavite both chosen for their similar performance and heavy usage of AES (which most common CPUs have hardware acceleration for).
It is quite possible if not likely that we may add additional algorithms here in future.
The choice of Argon2d instead of Argon2i or Argon2id is deliberate; in this use case side channel resistance is not at all important or desirable and therefore Argon2d is the best choice.
Argon2d is carefully designed to reduce the advantages GPUs and ASICs may have over CPUs; as the winner of the PHC it is well suited and evaluated for this task.
The heavy reliance on AES is deliberate as modern CPUs have very well developed hardware acceleration for this, that have had a lot of development put into them by manufacturers. There has been extensive work and multiple studies comparing AES-NI instruction performance vs GPUs and ASICs for pure AES computation; attempting to speed AES up on GPUs etc. and the results have been very favourable for CPUs — with the AES instructions mixed with other branching and instructions we expect an extra advantage for CPUs as a result.
Older CPUs; low end CPUs from before 2012 or high end CPUs from before 2009 that do not have AES specific instructions can still benefit from other hardware accelerated instructions, and are still expected to perform reasonably, albeit not as well as CPUs who do have the instructions. Meeting the memory requirements remains the dominant factor in terms of performance.
The lower (4gb memory size) is chosen for now to be as inclusive as possible for initial launch, so that more miners can be available — as the quantity of miners on the network grows we will increase this memory size requirement over time to increase GPU/ASIC resistance.
There are various other aspects that can be tweaked in various ways to change GPU/ASIC behaviour (and completely break any ASICs that rely on specific behaviours) and over time we may consider these in more detail once we see the algorithm perform in the wild.
For more detailed specifics refer to the source code which will soon be made public.