Unfinished work #266
Comments
The implementation of a generic scanhash is complicated with n-way parallel hashing with ... This results in up to 9 different generic scanhash functions to handle each situation for each ... SSE2: 4 way 32 bit words (4 cases) ... Algorithms that perform a midstate prehash are not considered at this time. Support would |
x17, xevan and sonoa algorithms are currently up to date with all mods, including generic |
Allium & Lyra2Z AVX512 & AVX2 are up to date with 2 stage blake256 prehash optimization using linear SIMD for the first |
Many chained algorithms have redundant endian byte swaps that can be eliminated. Blake is often the first hash function in a ... In the case of blake256 it's fully redundant and both can be eliminated. In the case of blake512 it results in a simple swapping ... An "LE" version of the blake transform functions is added to implement this optimization as well as associated changes to |
The blake family of core hash functions can be optimized with linear vectoring (one way). Blake256 & blake2s can use SSE2 while blake512 & blake2b can use SSE2 or AVX2. For practical reasons only blake256 and blake2b have been so optimized at this time. Edit: blake2s is included in v3.21.3 EDIT: No, blake2s won't be included. Testing has shown a negative impact from prehashing blake2s using serial SIMD over parallel hashing. Other algos have not had this problem. blake2s was also slower with centralized prehash, serial and parallel, so that won't be implemented for blake2s either |
Another midstate optimization. Centralize midstate prehash by doing it in stratum thread or when a miner thread returns from getwork and sharing the result with all miner threads. Previously each miner thread would do the prehash for itself. |
Some old algos have been found not to have proper stats reporting when using an old CPU (#392). Some will be fixed in v3.21.3 but there may be more remaining. They will be fixed as discovered if they can be tested. Testing these algos is difficult, pun intended. |
There's a good candidate to add (pufferfish2bmb) https://github.com/De-Crypted/dcrptd-miner/tree/master/Algorithms if there are any plans on adding new algos. I see some new (not really) sha algos in the latest release. |
The use of Nway notation in hash functions is being changed to Nx64 or Nx32 where appropriate. This notation is already used for interleave functions. |
This issue is opened to document architectural changes that require changes to the scanhash
function of each algo. These changes may not have been propagated to all algos for various
reasons.
The reason for most of the changes is to streamline the code by reducing instructions.
Stale share reduction is the goal of one change, and the generic scanhash will reduce the
work of propagating other changes.
individual custom scanhash functions for specific cases:
32 bit interleaved, or de-interleaved.
The remaining scanhash changes are automatically implemented for algos that can use a
generic scanhash function.
vectored byte swap and interleaving of input data, for various N ways, for 32 and 64 bit data.
byte-swap the nonce only when necessary, when a valid share is found, instead of
byte-swapping every nonce tested.
implement new hash for test including pre-test before de-interleaving N way hash.
submit shares in scanhash loop then continue hashing instead of returning to the main thread
loop to submit shares.
thread id argument added to hash call to enable restart flag checking.
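The interleaving and nonce handling items above can be sketched in portable C. This is only an illustration under stated assumptions, not the project's code: the names `bswap_intrlv_4x32`, `set_lane_nonces` and `share_nonce` are hypothetical, the layout assumes 4 lanes of 32 bit words and an 80 byte header with the nonce in word 19, and plain loops stand in for the SIMD instructions a real implementation would use.

```c
#include <assert.h>
#include <stdint.h>

#define LANES        4   /* 4 way, 32 bit words (assumed for this sketch) */
#define HEADER_WORDS 20  /* 80 byte block header = 20 x 32 bit words */
#define NONCE_WORD   19  /* assumed position of the nonce in the header */

/* Portable 32 bit byte swap. */
static inline uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Byte-swap the header once per work unit and interleave it:
 * word i of lane l is stored at vdata[i * LANES + l]. */
static void bswap_intrlv_4x32(uint32_t *vdata, const uint32_t *pdata)
{
    for (int i = 0; i < HEADER_WORDS; i++) {
        uint32_t w = bswap32(pdata[i]);
        for (int l = 0; l < LANES; l++)
            vdata[i * LANES + l] = w;
    }
}

/* Per iteration: write consecutive nonces straight into the lanes,
 * with no byte swap in the hot loop. */
static void set_lane_nonces(uint32_t *vdata, uint32_t n)
{
    for (int l = 0; l < LANES; l++)
        vdata[NONCE_WORD * LANES + l] = n + (uint32_t)l;
}

/* Only when a lane yields a valid share: byte-swap that one nonce
 * back to the submission byte order. */
static uint32_t share_nonce(const uint32_t *vdata, int lane)
{
    assert(lane >= 0 && lane < LANES);
    return bswap32(vdata[NONCE_WORD * LANES + lane]);
}
```

In a scanhash loop built this way, `bswap_intrlv_4x32` would run once per work unit, `set_lane_nonces` once per iteration, and `share_nonce` only on the rare iteration that finds a share.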
There are also changes to the hash functions of each algo:
use union overlay instead of struct for the context holder for algos that use a lot of contexts,
implement midstate prehash when the first function uses a block size of 64 bytes or less,
use full versions of chained hash functions instead of the 3 step init, update & close,
write final hash directly to output buffer instead of using an intermediate buffer and memcpy,
implement intermediate stale work detection for low hash rate algos to reduce stale shares.
use rintrlv instead of 2 step dintrlv, intrlv when interleaved data needs to be interleaved in a
different format.
ensure hash function returns a default 1 if thread restart checking is not used.
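A few of the hash function changes listed above (union overlay for contexts, midstate prehash, writing the final hash directly to the output buffer, default return of 1) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `toy32` and `toy64` are FNV-1a style stand-ins rather than real chained hash functions, and `chain_prehash`, `chain_hash` and `chain_hash_ref` are hypothetical names. The sketch assumes an 80 byte header whose first 64 bytes do not change between nonces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy streaming hashes (FNV-1a), standing in for real hash functions. */
typedef struct { uint32_t h; } toy32_ctx;
typedef struct { uint64_t h; } toy64_ctx;

static void toy32_init(toy32_ctx *c) { c->h = 2166136261u; }
static void toy32_update(toy32_ctx *c, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) { c->h ^= p[i]; c->h *= 16777619u; }
}
static void toy32_close(const toy32_ctx *c, void *out)
{
    memcpy(out, &c->h, sizeof c->h);
}

static void toy64_init(toy64_ctx *c) { c->h = 14695981039346656037ull; }
static void toy64_update(toy64_ctx *c, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) { c->h ^= p[i]; c->h *= 1099511628211ull; }
}
static void toy64_close(const toy64_ctx *c, void *out)
{
    memcpy(out, &c->h, sizeof c->h);
}

/* Only one context is live at a time in the chain, so the contexts
 * can share storage in a union instead of sitting side by side. */
typedef union { toy32_ctx a; toy64_ctx b; } chain_ctx;

/* Saved state after the constant part of the header
 * (a real miner would keep one per thread or share it between threads). */
static toy32_ctx midstate;

/* Once per work unit: hash the first 64 bytes of the 80 byte header. */
static void chain_prehash(const uint8_t *header)
{
    toy32_init(&midstate);
    toy32_update(&midstate, header, 64);
}

/* Per nonce: resume from the midstate, hash only the 16 byte tail,
 * then chain into the second function and write straight to output.
 * Returns 1 by default because no restart check is done here. */
static int chain_hash(void *output, const uint8_t *header)
{
    chain_ctx ctx;
    uint8_t stage1[sizeof(uint32_t)];

    assert(output != NULL && header != NULL);
    ctx.a = midstate;
    toy32_update(&ctx.a, header + 64, 16);
    toy32_close(&ctx.a, stage1);

    toy64_init(&ctx.b);              /* reuses the same storage */
    toy64_update(&ctx.b, stage1, sizeof stage1);
    toy64_close(&ctx.b, output);     /* no intermediate buffer, no memcpy */
    return 1;
}

/* Reference without prehash, hashing all 80 bytes, for comparison. */
static void chain_hash_ref(void *output, const uint8_t *header)
{
    toy32_ctx a;
    toy64_ctx b;
    uint8_t stage1[sizeof(uint32_t)];

    toy32_init(&a);
    toy32_update(&a, header, 80);
    toy32_close(&a, stage1);
    toy64_init(&b);
    toy64_update(&b, stage1, sizeof stage1);
    toy64_close(&b, output);
}
```

Calling `chain_prehash` once and then `chain_hash` for each nonce should give the same 8 byte result as `chain_hash_ref`, while hashing 16 bytes per nonce in the first stage instead of 80.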