Single bench covering classic and NNUE eval #2902

vondele · 2020-08-04T18:20:35Z

Simply switch between evals.

Needs the default net to be available, including for CI; this needs some work for AppVeyor (PowerShell).


Alayan-stk-2 · 2020-08-04T19:00:55Z

For fishtest, bench is used to evaluate the speed at which the processor can run SF. A mixed bench won't be very effective at this, as the speed ratio between the regular eval and the NNUE eval can be quite different depending on the machine. When running a test using regular eval, we care about the regular eval speed, and when running an NNUE test, about the NNUE eval speed.

The way fishtest computes the TC in general might need more adjustments, as using the same nps target without checking whether NNUE mode is used would significantly increase the effective TC.
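To make the TC concern concrete, here is a minimal sketch of nps-based time scaling. It assumes the time control is multiplied by `reference_nps / measured_nps`; the function name, the nps target, and all numbers are hypothetical and are not taken from fishtest's actual code.

```python
def scale_time_control(base_s, inc_s, measured_nps, reference_nps):
    """Scale a time control so that a slower machine gets proportionally more time.

    Hypothetical helper: assumes scaling by reference_nps / measured_nps.
    """
    if measured_nps <= 0:
        raise ValueError("measured_nps must be positive")
    factor = reference_nps / measured_nps
    return base_s * factor, inc_s * factor


# Hypothetical numbers, for illustration only (not measurements).
REFERENCE_NPS = 1_000_000  # assumed nps target
classical_nps = 1_000_000  # assumed bench speed with the classical eval
nnue_nps = 500_000         # assumed bench speed with NNUE on the same machine

tc_classical = scale_time_control(10.0, 0.1, classical_nps, REFERENCE_NPS)
tc_nnue = scale_time_control(10.0, 0.1, nnue_nps, REFERENCE_NPS)

print(tc_classical)  # (10.0, 0.1)
print(tc_nnue)       # (20.0, 0.2)
```

Under these assumptions, the same machine is granted twice the time when its speed is measured with the slower eval against an unchanged target, which is the effect described above.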

To fully identify a version (using the bench as a signature), having both an HC part and an NNUE part makes sense.

This implementation isn't ideal from another perspective.

Stockfish is a taxing CPU workload with some popularity, so many websites reviewing hardware include a Stockfish chess benchmark. This usually means running Stockfish's own bench with custom threads/depth/hash settings so that the test runs longer and uses all the available threads. The raw nps is typically used as the result, though it is not a direct measure of strength due to multi-threading losses.

The version of Stockfish used in such benchmarks doesn't change too often (the Phoronix 3990X review had an SF9 and an asmFish test, even though SF11 had recently come out); however, it's inevitable that future Stockfish versions will end up being used as well. The introduction of SF-NNUE, which is very taxing on AVX2, could be a further motivation to update the version used in such benchmarks.

As this is data that can end up being used by people (in particular chess enthusiasts) to make purchase decisions, it would be nice to make bench performance representative of actual CPU performance. One possible idea is an NNUE subtest including AVX and a traditional HC eval subtest; another is a command line parameter (just like depth, threads...) for the eval to use in bench. Arguably some changes to the position set could be useful too.

More ambitiously, the effective strength of N threads vs 1 thread having N times the time could be measured up to, say, 32 threads, a curve could be fit, and when running the bench with multiple threads, an additional info output line would give an "equivalent 1-thread nps" result through some formula. The raw nps would be representative of performance when running many 1th searches in parallel, while the "equivalent 1th nps" would be representative of peak performance. But this last part is unrelated to the NNUE/HC eval usage in bench.
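As a rough illustration of the curve-fit idea, the sketch below assumes a power-law model in which N threads are worth as much as one thread given N**beta times the time, and that raw nps splits evenly across threads. The model, the function names, and the input data are all assumptions; the data is synthetic, generated from the model itself, and is not a measurement.

```python
import math


def fit_scaling_exponent(measurements):
    """Fit beta in speedup = threads**beta by least squares in log-log space.

    measurements: iterable of (threads, effective_speedup) pairs, where
    effective_speedup is the time multiplier one thread needs to match
    the strength of `threads` threads (assumed model, not from the source).
    """
    num = 0.0
    den = 0.0
    for threads, speedup in measurements:
        x = math.log(threads)
        y = math.log(speedup)
        num += x * y
        den += x * x
    if den == 0.0:
        raise ValueError("need at least one measurement with threads > 1")
    return num / den


def equivalent_single_thread_nps(raw_nps, threads, beta):
    """Convert raw multi-threaded nps into an 'equivalent 1-thread nps'."""
    per_thread_nps = raw_nps / threads  # assumes nps is split evenly
    return per_thread_nps * threads ** beta


# Synthetic points generated from the model with beta = 0.9 (NOT measured).
synthetic = [(n, n ** 0.9) for n in (1, 2, 4, 8, 16, 32)]
beta = fit_scaling_exponent(synthetic)
print(round(beta, 6))
print(equivalent_single_thread_nps(8_000_000, 8, beta))
```

With beta equal to 1 the equivalent figure matches the raw nps, and any value below 1 gives a lower figure that reflects multi-threading losses. Real data would have to come from actual strength measurements.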

```
./stockfish bench 16 1 13 default depth NNUE
./stockfish bench 16 1 13 default depth mixed
./stockfish bench 16 1 13 default depth classical
```
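For scripting these invocations, a small helper can assemble the positional arguments. This sketch reads the six positions as hash size, threads, limit, FEN file, limit type, and eval type, which is an interpretation of the example commands; the helper name and its defaults are hypothetical, and the eval argument only exists with this patch applied.

```python
# Eval choices taken from the example commands above.
ALLOWED_EVALS = ("classical", "NNUE", "mixed")


def bench_args(hash_mb=16, threads=1, limit=13, fen_file="default",
               limit_type="depth", eval_type="mixed"):
    """Build the argument list for a bench run (hypothetical helper).

    The positional order mirrors the example commands; the meaning given
    to each position is an assumption.
    """
    if eval_type not in ALLOWED_EVALS:
        raise ValueError(f"eval_type must be one of {ALLOWED_EVALS}")
    return ["bench", str(hash_mb), str(threads), str(limit),
            fen_file, limit_type, eval_type]


for eval_type in ALLOWED_EVALS:
    # Prints one command line per eval choice.
    print(" ".join(["./stockfish"] + bench_args(eval_type=eval_type)))
```

The resulting list can be passed to `subprocess.run` together with the path to the binary.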

vondele · 2020-08-04T19:30:33Z

@Alayan-stk-2 the attached patch would leave some choice. I think we will need to think a bit about what is best for our purpose. For example, if we start using NNUE as the default, we would benefit from having the benchmarks based on that.

vondele · 2020-08-05T08:06:30Z

Note: the PR is not complete for non-default FEN files.

vondele · 2020-08-08T05:32:10Z

I'm closing this one in favor of #2931

Single bench covering classic and NNUE eval

5ca6dc3


vondele added NNUE labels Aug 4, 2020

Leave bench eval choice

d156c06


vondele mentioned this pull request Aug 5, 2020

[NNUE] PGO & CI integration #2907

Closed

mstembera mentioned this pull request Aug 7, 2020

Enable NNUE PGO build #2918

Closed

vondele closed this Aug 8, 2020
