Single bench covering classic and NNUE eval #2902

Closed


@vondele vondele commented Aug 4, 2020

Simply switch between evals.

This needs the default net to be available, including for CI; the AppVeyor (PowerShell) setup still needs some work.


Alayan-stk-2 commented Aug 4, 2020

For fishtest, bench is used to evaluate the speed at which the processor can run SF. A mixed bench won't be very effective at this, as the speed ratio between the regular eval and the NNUE eval can differ considerably from machine to machine. When running a test with the regular eval, we care about the regular eval speed; when running an NNUE test, we care about the NNUE eval speed.

The way fishtest computes the TC in general might need more adjustments, as using the same nps target without checking whether NNUE mode is used would significantly increase the effective TC.
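To make the concern concrete, here is a minimal sketch of nps-based time control scaling. It assumes the TC is scaled by the ratio of a fixed target nps to the nps measured by bench; the function name and all numbers are hypothetical placeholders, not fishtest's actual code or real measurements.

```python
def scaled_time_control(base_seconds, base_increment, target_nps, measured_nps):
    """Scale a time control so a slower machine gets proportionally more time.

    Assumption: the scale factor is simply target_nps / measured_nps.
    """
    if measured_nps <= 0:
        raise ValueError("measured_nps must be positive")
    factor = target_nps / measured_nps
    return base_seconds * factor, base_increment * factor


# Hypothetical numbers for one machine (not measurements).
TARGET_NPS = 1_600_000      # assumed reference speed, tuned for the classical eval
classical_nps = 1_600_000   # bench result with the classical eval
nnue_nps = 800_000          # bench result with the NNUE eval (assumed 2x slower)

tc_classical = scaled_time_control(10.0, 0.1, TARGET_NPS, classical_nps)
tc_nnue = scaled_time_control(10.0, 0.1, TARGET_NPS, nnue_nps)

print(tc_classical)  # (10.0, 0.1)
print(tc_nnue)       # (20.0, 0.2)
```

With the same target, a bench measured in NNUE mode doubles the wall-clock TC in this example, which is why the target (or the bench mode) would need to depend on the eval being tested.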

To fully identify a version (using the bench as a signature), having both an HC part and an NNUE part makes sense.

This implementation isn't ideal from another perspective.

Stockfish is a popular and taxing CPU workload, so many hardware review websites include a Stockfish chess benchmark. This usually means running Stockfish's own bench with custom threads/depth/hash settings to make the test run longer and use all the available threads. The raw nps is typically used as the result, though it's not a direct fit for strength due to multi-threading losses.

The version of Stockfish used in such benchmarks doesn't change too often (the Phoronix 3990X review had an SF9 and an asmFish test, even though SF11 had recently come out); however, it's inevitable that future Stockfish versions will end up being used as well. The introduction of SF-NNUE, which is very taxing on AVX2, could be a further motivation to update the version used in such benchmarks.

As this is data that people (in particular chess enthusiasts) can end up using to make purchase decisions, it would be nice to make bench performance representative of actual CPU performance. One possible idea is an NNUE subtest including AVX plus a traditional HC eval subtest; another is a command line parameter (just like depth, threads...) for the eval to use in bench. Arguably some changes to the position set could be useful too.

More ambitiously, the effective strength of N threads vs 1 thread having N times the time could be measured up to, say, 32 threads, a curve could be fit, and when running the bench with multiple threads, an additional info output line would give an "equivalent 1-thread nps" result through some formula. The raw nps would be representative of performance when running many 1th searches in parallel, while the "equivalent 1th nps" would be representative of peak performance. But this last part is unrelated to the NNUE/HC eval usage in bench.
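As a rough illustration of the "equivalent 1-thread nps" idea, the sketch below assumes the measured curve follows a power law T(N) = N^alpha, where T(N) is the time multiplier a single thread needs to match N threads. The power-law form, the function names, and the data are all assumptions; the data points are synthetic (generated from alpha = 0.9), not measurements.

```python
import math


def fit_power_law_exponent(threads, time_multipliers):
    """Least-squares fit of alpha in T(N) = N**alpha (line through the origin in log-log space)."""
    xs = [math.log(n) for n in threads]
    ys = [math.log(t) for t in time_multipliers]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def equivalent_single_thread_nps(raw_nps, n_threads, alpha):
    """Scale raw N-thread nps by the assumed efficiency T(N) / N = N**(alpha - 1)."""
    return raw_nps * n_threads ** (alpha - 1)


# Synthetic placeholder data: in practice each multiplier would come from
# matches of N threads against 1 thread with more time.
threads = [2, 4, 8, 16, 32]
time_multipliers = [n ** 0.9 for n in threads]

alpha = fit_power_law_exponent(threads, time_multipliers)
equivalent_nps = equivalent_single_thread_nps(32_000_000, 32, alpha)

print(f"alpha = {alpha:.3f}")
print(f"equivalent 1-thread nps = {equivalent_nps:.0f}")
```

A real fit would likely need a different functional form and error bars on each point; the power law is used here only because it keeps the formula to one parameter.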

```
./stockfish  bench 16 1 13 default depth NNUE
./stockfish  bench 16 1 13 default depth mixed
./stockfish  bench 16 1 13 default depth classical
```
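The argument layout in the commands above can be sketched as a small parser. This is an illustration in Python rather than the engine's C++: the field names, the default for the eval argument, and the alternating schedule used for `mixed` are all assumptions made for the example, not taken from the patch.

```python
from dataclasses import dataclass

EVAL_TYPES = ("classical", "nnue", "mixed")
FIELDS = ("tt_size_mb", "threads", "limit", "fen_file", "limit_type", "eval_type")


@dataclass
class BenchArgs:
    tt_size_mb: int = 16
    threads: int = 1
    limit: int = 13
    fen_file: str = "default"
    limit_type: str = "depth"
    eval_type: str = "mixed"  # assumed default, for illustration only


def parse_bench_args(line):
    """Parse 'bench [hash threads limit fenfile limittype evaltype]' positionally."""
    tokens = line.split()
    if "bench" not in tokens:
        raise ValueError("not a bench command")
    values = tokens[tokens.index("bench") + 1:]
    if len(values) > len(FIELDS):
        raise ValueError("too many bench arguments")
    args = BenchArgs()
    for name, value in zip(FIELDS, values):
        default = getattr(args, name)
        setattr(args, name, int(value) if isinstance(default, int) else value)
    args.eval_type = args.eval_type.lower()
    if args.eval_type not in EVAL_TYPES:
        raise ValueError(f"unknown eval type: {args.eval_type}")
    return args


def nnue_schedule(eval_type, n_positions):
    """Return, per position, whether NNUE would be used (assumed: 'mixed' alternates)."""
    if eval_type == "classical":
        return [False] * n_positions
    if eval_type == "nnue":
        return [True] * n_positions
    return [i % 2 == 1 for i in range(n_positions)]


args = parse_bench_args("./stockfish  bench 16 1 13 default depth NNUE")
print(args.eval_type)            # nnue
print(nnue_schedule("mixed", 4)) # [False, True, False, True]
```

Keeping the eval selector as a trailing positional argument means existing invocations that pass only hash, threads and depth would keep working unchanged.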

vondele commented Aug 4, 2020

@Alayan-stk-2 the attached patch would leave some choice. I think we will need to think a bit about what is best for our purpose. For example, if we start using NNUE as the default, we would benefit from having the benchmarks based on that.


vondele commented Aug 5, 2020

Note: this PR is not yet complete for non-default FEN files.

@mstembera mstembera mentioned this pull request Aug 7, 2020

vondele commented Aug 8, 2020

I'm closing this one in favor of #2931

@vondele vondele closed this Aug 8, 2020