Enable New Pass Manager for Clang. #3346

Closed
wants to merge 1 commit into from

gcp
Contributor

@gcp gcp commented Feb 9, 2021

Clang has a new optimization pass manager activated by
-fexperimental-new-pass-manager. It's been in development
for several years, and by now it generally performs better
than the default. It is planned to be the new default in
Clang 13, but even versions before that perform better with it,
and it's stable enough that major software like Firefox has
been using it already:
https://bugzilla.mozilla.org/show_bug.cgi?id=1619461

It's about a 1% speedup for Stockfish.

Result of 100 runs
==================
base (...fish_clang12) =    1946851  +/- 3717
test (./stockfish    ) =    1967276  +/- 3408
diff                   =     +20425  +/- 2438

speedup        = +0.0105
P(speedup > 0) =  1.0000

CPU: 4 x Intel(R) Xeon(R) CPU E3-1240 v3 @ 3.40GHz
Hyperthreading: on

Thanks to David Major for making me aware of this part
of LLVM development.
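
The `speedup` line in the result above appears to be the mean difference divided by the base mean. A minimal sketch of that arithmetic, assuming this definition (the function name `relative_speedup` is hypothetical and not part of the benchmark tool):

```python
def relative_speedup(base_nps: float, test_nps: float) -> float:
    """Relative gain of test over base, both in nodes per second."""
    return (test_nps - base_nps) / base_nps

# Mean nodes/second values from the 100-run result in this PR description.
base = 1946851  # clang 12 without the flag
test = 1967276  # clang 12 with -fexperimental-new-pass-manager

print(f"diff    = {test - base:+d}")                        # +20425
print(f"speedup = {relative_speedup(base, test):+.4f}")     # +0.0105
```

Under that assumption the numbers agree with the reported `diff` and `speedup` values.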
@gcp
Contributor Author

gcp commented Feb 9, 2021

With this and the ThinLTO fix, clang is close to 2% faster than GCC 10 on my hardware:

Result of 100 runs
==================
base (...ckfish_gcc10) =    1908060  +/- 28912
test (...h_clang12npm) =    1941171  +/- 27715
diff                   =     +33111  +/- 4024

speedup        = +0.0174
P(speedup > 0) =  1.0000

CPU: 4 x Intel(R) Xeon(R) CPU E3-1240 v3 @ 3.40GHz
Hyperthreading: on 

@MichaelB7
Contributor

MichaelB7 commented Feb 10, 2021

Two long concurrent runs, both PGO builds, both with "flto": a 0.7% speed pickup for clang over gcc, with a total runtime of over 3 minutes. FWIW, I also ran a number of shorter concurrent runs, and clang was faster every time with the pass manager flag noted in this PR.

OS Sys Windows 10 Pro
gcc.exe (Rev6, Built by MSYS2 project) 10.2.0
clang version 11.0.0 (https://github.com/msys2/MINGW-packages 500eeb8c8d8dc104557c9109cc9e32c203feb4e2)
Target: x86_64-w64-windows-gnu

command: 
$ SF-clang bench 256 1 26 >/dev/null && echo "clang" & stockfish bench 256 1 26 >/dev/null && echo "gcc"


Position: 1/45 (rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1)

Position: 1/45 (rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1)

Position: 2/45 (r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 10)

Position: 2/45 (r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 10)

Position: 3/45 (8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 11)

Position: 3/45 (8/2p5/3p4/KP5r/1R3p1k/8/4P1P1/8 w - - 0 11)

Position: 4/45 (4rrk1/pp1n3p/3q2pQ/2p1pb2/2PP4/2P3N1/P2B2PP/4RRK1 b - - 7 19)

Position: 4/45 (4rrk1/pp1n3p/3q2pQ/2p1pb2/2PP4/2P3N1/P2B2PP/4RRK1 b - - 7 19)

Position: 5/45 (rq3rk1/ppp2ppp/1bnpN3/3N2B1/4P3/7P/PPPQ1PP1/2KR3R b - - 0 14)

Position: 5/45 (rq3rk1/ppp2ppp/1bnpN3/3N2B1/4P3/7P/PPPQ1PP1/2KR3R b - - 0 14)

Position: 6/45 (r1bq1r1k/1pp1n1pp/1p1p4/4p2Q/4PpP1/1BNP4/PPP2P1P/3R1RK1 b - g3 0 14)

Position: 6/45 (r1bq1r1k/1pp1n1pp/1p1p4/4p2Q/4PpP1/1BNP4/PPP2P1P/3R1RK1 b - g3 0 14)

Position: 7/45 (r3r1k1/2p2ppp/p1p1bn2/8/1q2P3/2NPQN2/PPP3PP/R4RK1 b - - 2 15)

Position: 7/45 (r3r1k1/2p2ppp/p1p1bn2/8/1q2P3/2NPQN2/PPP3PP/R4RK1 b - - 2 15)

Position: 8/45 (r1bbk1nr/pp3p1p/2n5/1N4p1/2Np1B2/8/PPP2PPP/2KR1B1R w kq - 0 13)

Position: 8/45 (r1bbk1nr/pp3p1p/2n5/1N4p1/2Np1B2/8/PPP2PPP/2KR1B1R w kq - 0 13)

Position: 9/45 (r1bq1rk1/ppp1nppp/4n3/3p3Q/3P4/1BP1B3/PP1N2PP/R4RK1 w - - 1 16)

Position: 9/45 (r1bq1rk1/ppp1nppp/4n3/3p3Q/3P4/1BP1B3/PP1N2PP/R4RK1 w - - 1 16)

Position: 10/45 (4r1k1/r1q2ppp/ppp2n2/4P3/5Rb1/1N1BQ3/PPP3PP/R5K1 w - - 1 17)

Position: 10/45 (4r1k1/r1q2ppp/ppp2n2/4P3/5Rb1/1N1BQ3/PPP3PP/R5K1 w - - 1 17)

Position: 11/45 (2rqkb1r/ppp2p2/2npb1p1/1N1Nn2p/2P1PP2/8/PP2B1PP/R1BQK2R b KQ - 0 11)

Position: 11/45 (2rqkb1r/ppp2p2/2npb1p1/1N1Nn2p/2P1PP2/8/PP2B1PP/R1BQK2R b KQ - 0 11)

Position: 12/45 (r1bq1r1k/b1p1npp1/p2p3p/1p6/3PP3/1B2NN2/PP3PPP/R2Q1RK1 w - - 1 16)

Position: 12/45 (r1bq1r1k/b1p1npp1/p2p3p/1p6/3PP3/1B2NN2/PP3PPP/R2Q1RK1 w - - 1 16)

Position: 13/45 (3r1rk1/p5pp/bpp1pp2/8/q1PP1P2/b3P3/P2NQRPP/1R2B1K1 b - - 6 22)

Position: 13/45 (3r1rk1/p5pp/bpp1pp2/8/q1PP1P2/b3P3/P2NQRPP/1R2B1K1 b - - 6 22)

Position: 14/45 (r1q2rk1/2p1bppp/2Pp4/p6b/Q1PNp3/4B3/PP1R1PPP/2K4R w - - 2 18)

Position: 14/45 (r1q2rk1/2p1bppp/2Pp4/p6b/Q1PNp3/4B3/PP1R1PPP/2K4R w - - 2 18)

Position: 15/45 (4k2r/1pb2ppp/1p2p3/1R1p4/3P4/2r1PN2/P4PPP/1R4K1 b - - 3 22)

Position: 15/45 (4k2r/1pb2ppp/1p2p3/1R1p4/3P4/2r1PN2/P4PPP/1R4K1 b - - 3 22)

Position: 16/45 (3q2k1/pb3p1p/4pbp1/2r5/PpN2N2/1P2P2P/5PP1/Q2R2K1 b - - 4 26)

Position: 16/45 (3q2k1/pb3p1p/4pbp1/2r5/PpN2N2/1P2P2P/5PP1/Q2R2K1 b - - 4 26)

Position: 17/45 (6k1/6p1/6Pp/ppp5/3pn2P/1P3K2/1PP2P2/3N4 b - - 0 1)

Position: 17/45 (6k1/6p1/6Pp/ppp5/3pn2P/1P3K2/1PP2P2/3N4 b - - 0 1)

Position: 18/45 (3b4/5kp1/1p1p1p1p/pP1PpP1P/P1P1P3/3KN3/8/8 w - - 0 1)

Position: 19/45 (2K5/p7/7P/5pR1/8/5k2/r7/8 w - - 4 3)

Position: 20/45 (8/6pk/1p6/8/PP3p1p/5P2/4KP1q/3Q4 w - - 0 1)

Position: 18/45 (3b4/5kp1/1p1p1p1p/pP1PpP1P/P1P1P3/3KN3/8/8 w - - 0 1)

Position: 19/45 (2K5/p7/7P/5pR1/8/5k2/r7/8 w - - 4 3)

Position: 20/45 (8/6pk/1p6/8/PP3p1p/5P2/4KP1q/3Q4 w - - 0 1)

Position: 21/45 (7k/3p2pp/4q3/8/4Q3/5Kp1/P6b/8 w - - 0 1)

Position: 21/45 (7k/3p2pp/4q3/8/4Q3/5Kp1/P6b/8 w - - 0 1)

Position: 22/45 (8/2p5/8/2kPKp1p/2p4P/2P5/3P4/8 w - - 0 1)

Position: 23/45 (8/1p3pp1/7p/5P1P/2k3P1/8/2K2P2/8 w - - 0 1)

Position: 22/45 (8/2p5/8/2kPKp1p/2p4P/2P5/3P4/8 w - - 0 1)

Position: 23/45 (8/1p3pp1/7p/5P1P/2k3P1/8/2K2P2/8 w - - 0 1)

Position: 24/45 (8/pp2r1k1/2p1p3/3pP2p/1P1P1P1P/P5KR/8/8 w - - 0 1)

Position: 24/45 (8/pp2r1k1/2p1p3/3pP2p/1P1P1P1P/P5KR/8/8 w - - 0 1)

Position: 25/45 (8/3p4/p1bk3p/Pp6/1Kp1PpPp/2P2P1P/2P5/5B2 b - - 0 1)

Position: 25/45 (8/3p4/p1bk3p/Pp6/1Kp1PpPp/2P2P1P/2P5/5B2 b - - 0 1)

Position: 26/45 (5k2/7R/4P2p/5K2/p1r2P1p/8/8/8 b - - 0 1)

Position: 27/45 (6k1/6p1/P6p/r1N5/5p2/7P/1b3PP1/4R1K1 w - - 0 1)

Position: 26/45 (5k2/7R/4P2p/5K2/p1r2P1p/8/8/8 b - - 0 1)

Position: 27/45 (6k1/6p1/P6p/r1N5/5p2/7P/1b3PP1/4R1K1 w - - 0 1)

Position: 28/45 (1r3k2/4q3/2Pp3b/3Bp3/2Q2p2/1p1P2P1/1P2KP2/3N4 w - - 0 1)

Position: 28/45 (1r3k2/4q3/2Pp3b/3Bp3/2Q2p2/1p1P2P1/1P2KP2/3N4 w - - 0 1)

Position: 29/45 (6k1/4pp1p/3p2p1/P1pPb3/R7/1r2P1PP/3B1P2/6K1 w - - 0 1)

Position: 29/45 (6k1/4pp1p/3p2p1/P1pPb3/R7/1r2P1PP/3B1P2/6K1 w - - 0 1)

Position: 30/45 (8/3p3B/5p2/5P2/p7/PP5b/k7/6K1 w - - 0 1)

Position: 31/45 (5rk1/q6p/2p3bR/1pPp1rP1/1P1Pp3/P3B1Q1/1K3P2/R7 w - - 93 90)

Position: 30/45 (8/3p3B/5p2/5P2/p7/PP5b/k7/6K1 w - - 0 1)

Position: 31/45 (5rk1/q6p/2p3bR/1pPp1rP1/1P1Pp3/P3B1Q1/1K3P2/R7 w - - 93 90)

Position: 32/45 (4rrk1/1p1nq3/p7/2p1P1pp/3P2bp/3Q1Bn1/PPPB4/1K2R1NR w - - 40 21)

Position: 32/45 (4rrk1/1p1nq3/p7/2p1P1pp/3P2bp/3Q1Bn1/PPPB4/1K2R1NR w - - 40 21)

Position: 33/45 (r3k2r/3nnpbp/q2pp1p1/p7/Pp1PPPP1/4BNN1/1P5P/R2Q1RK1 w kq - 0 16)

Position: 33/45 (r3k2r/3nnpbp/q2pp1p1/p7/Pp1PPPP1/4BNN1/1P5P/R2Q1RK1 w kq - 0 16)

Position: 34/45 (3Qb1k1/1r2ppb1/pN1n2q1/Pp1Pp1Pr/4P2p/4BP2/4B1R1/1R5K b - - 11 40)

Position: 34/45 (3Qb1k1/1r2ppb1/pN1n2q1/Pp1Pp1Pr/4P2p/4BP2/4B1R1/1R5K b - - 11 40)

Position: 35/45 (4k3/3q1r2/1N2r1b1/3ppN2/2nPP3/1B1R2n1/2R1Q3/3K4 w - - 5 1)

Position: 35/45 (4k3/3q1r2/1N2r1b1/3ppN2/2nPP3/1B1R2n1/2R1Q3/3K4 w - - 5 1)

Position: 36/45 (8/8/8/8/5kp1/P7/8/1K1N4 w - - 0 1)

Position: 37/45 (8/8/8/5N2/8/p7/8/2NK3k w - - 0 1)

Position: 38/45 (8/3k4/8/8/8/4B3/4KB2/2B5 w - - 0 1)

Position: 36/45 (8/8/8/8/5kp1/P7/8/1K1N4 w - - 0 1)

Position: 37/45 (8/8/8/5N2/8/p7/8/2NK3k w - - 0 1)

Position: 38/45 (8/3k4/8/8/8/4B3/4KB2/2B5 w - - 0 1)

Position: 39/45 (8/8/1P6/5pr1/8/4R3/7k/2K5 w - - 0 1)

Position: 39/45 (8/8/1P6/5pr1/8/4R3/7k/2K5 w - - 0 1)

Position: 40/45 (8/2p4P/8/kr6/6R1/8/8/1K6 w - - 0 1)

Position: 41/45 (8/8/3P3k/8/1p6/8/1P6/1K3n2 b - - 0 1)

Position: 40/45 (8/2p4P/8/kr6/6R1/8/8/1K6 w - - 0 1)

Position: 42/45 (8/R7/2q5/8/6k1/8/1P5p/K6R w - - 0 124)

Position: 43/45 (6k1/3b3r/1p1p4/p1n2p2/1PPNpP1q/P3Q1p1/1R1RB1P1/5K2 b - - 0 1)

Position: 44/45 (r2r1n2/pp2bk2/2p1p2p/3q4/3PN1QP/2P3R1/P4PP1/5RK1 w - - 0 1)

Position: 45/45 (bb1n1rkr/ppp1Q1pp/3n1p2/3p4/3P4/6Pq/PPP1PP1P/BB1NNRKR w HFhf - 0 5)

Position: 41/45 (8/8/3P3k/8/1p6/8/1P6/1K3n2 b - - 0 1)

Position: 42/45 (8/R7/2q5/8/6k1/8/1P5p/K6R w - - 0 124)

Position: 43/45 (6k1/3b3r/1p1p4/p1n2p2/1PPNpP1q/P3Q1p1/1R1RB1P1/5K2 b - - 0 1)

Position: 44/45 (r2r1n2/pp2bk2/2p1p2p/3q4/3PN1QP/2P3R1/P4PP1/5RK1 w - - 0 1)

Position: 45/45 (bb1n1rkr/ppp1Q1pp/3n1p2/3p4/3P4/6Pq/PPP1PP1P/BB1NNRKR w HFhf - 0 5)

===========================
Total time (ms) : 205880
Nodes searched  : 496116737
Nodes/second    : 2409737
clang

===========================
Total time (ms) : 207232
Nodes searched  : 496116737
Nodes/second    : 2394016
gcc
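
The 0.7% figure can be checked from the two summaries above. A small sketch, assuming nodes/second is total nodes divided by elapsed time and truncated to an integer (the helper name `nodes_per_second` is hypothetical):

```python
def nodes_per_second(nodes: int, elapsed_ms: int) -> int:
    """Nodes per second from a node count and a runtime in milliseconds."""
    return nodes * 1000 // elapsed_ms

NODES = 496116737  # identical node count in both runs

clang_nps = nodes_per_second(NODES, 205880)
gcc_nps = nodes_per_second(NODES, 207232)
gain = clang_nps / gcc_nps - 1

print(clang_nps)        # 2409737
print(gcc_nps)          # 2394016
print(f"{gain:.2%}")    # 0.66%
```

Because both builds search the same number of nodes, the gain comes entirely from the shorter runtime of the clang build.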

@vondele
Member

vondele commented Feb 10, 2021

I did a comparison, just with master. In my case, clang is about 3.6% slower (AMD Ryzen 9 3950X):

$ ~/chess/pyshbench/pyshbench.py ./stockfish.master.g++-10 ./stockfish.master.g++-7 10
run       base       test     diff
  1    2302027    2286279   -15748
  2    2308423    2303486    -4937
  3    2302391    2308057    +5666
  4    2310808    2298751   -12057
  5    2300934    2314487   +13553
  6    2291685    2300934    +9249
  7    2299660    2315777   +16117
  8    2284662    2280183    -4479
  9    2368790    2348889   -19901
 10    2287538    2282331    -5207

Result of  10 runs
==================
base (...aster.g++-10) =    2305692  +/- 14684
test (...master.g++-7) =    2303917  +/- 12495
diff                   =      -1774  +/- 7693

speedup        = -0.0008
P(speedup > 0) =  0.3258


$ ~/chess/pyshbench/pyshbench.py ./stockfish.master.g++-10 ./stockfish.master.g++-8 10
run       base       test     diff
  1    2275544    2302027   +26483
  2    2291866    2310257   +18391
  3    2278397    2297298   +18901
  4    2306775    2319287   +12512
  5    2291143    2294398    +3255
  6    2359755    2376535   +16780
  7    2380036    2382181    +2145
  8    2310074    2320768   +10694
  9    2290241    2280005   -10236
 10    2284304    2298387   +14083

Result of  10 runs
==================
base (...aster.g++-10) =    2306814  +/- 21843
test (...master.g++-8) =    2318114  +/- 21324
diff                   =     +11301  +/- 6493

speedup        = +0.0049
P(speedup > 0) =  0.9997

$ ~/chess/pyshbench/pyshbench.py ./stockfish.master.g++-10 ./stockfish.master.g++-9 10
run       base       test     diff
  1    2289159    2258926   -30233
  2    2282869    2263851   -19018
  3    2288439    2255772   -32667
  4    2282869    2277861    -5008
  5    2273587    2274654    +1067
  6    2275722    2261914   -13808
  7    2283227    2260858   -22369
  8    2323735    2323364     -371
  9    2290421    2268620   -21801
 10    2286639    2277326    -9313

Result of  10 runs
==================
base (...aster.g++-10) =    2287667  +/- 8554
test (...master.g++-9) =    2272315  +/- 12109
diff                   =     -15352  +/- 7339

speedup        = -0.0067
P(speedup > 0) =  0.0000


$ ~/chess/pyshbench/pyshbench.py ./stockfish.master.g++-10 ./stockfish.master.clang++-10 10
run       base       test     diff
  1    2286999    2202128   -84871
  2    2298933    2225727   -73206
  3    2278754    2201795   -76959
  4    2289159    2205301   -83858
  5    2314487    2229311   -85176
  6    2353071    2283048   -70023
  7    2294398    2180168  -114230
  8    2292227    2205636   -86591
  9    2286819    2204465   -82354
 10    2364745    2286279   -78466

Result of  10 runs
==================
base (...aster.g++-10) =    2305959  +/- 18299
test (...r.clang++-10) =    2222386  +/- 21961
diff                   =     -83573  +/- 7483

speedup        = -0.0362
P(speedup > 0) =  0.0000

I would say both compilers are pretty equivalent in speed nevertheless.
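
For readers who want to reproduce the summary lines from the per-run columns, here is a rough sketch using the g++-10 vs clang++-10 table above. It assumes a paired comparison with a normal approximation for `P(speedup > 0)`; this is an illustration and not necessarily how pyshbench computes its figures.

```python
import math
import statistics

# Nodes/second per run, copied from the g++-10 vs clang++-10 table above.
base = [2286999, 2298933, 2278754, 2289159, 2314487,
        2353071, 2294398, 2292227, 2286819, 2364745]
test = [2202128, 2225727, 2201795, 2205301, 2229311,
        2283048, 2180168, 2205636, 2204465, 2286279]

def summarize(base, test):
    """Paired comparison of two equal-length lists of nodes/second values."""
    diffs = [t - b for b, t in zip(base, test)]
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (assumes independent runs).
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    speedup = mean_diff / statistics.mean(base)
    # Normal approximation: an assumption, not the tool's confirmed method.
    p_positive = statistics.NormalDist().cdf(mean_diff / std_err)
    return mean_diff, std_err, speedup, p_positive

mean_diff, std_err, speedup, p_positive = summarize(base, test)
print(f"diff           = {mean_diff:+.0f}")
print(f"speedup        = {speedup:+.4f}")
print(f"P(speedup > 0) = {p_positive:.4f}")
```

The mean difference and speedup computed this way match the table's `diff` and `speedup` lines; the error bars may differ slightly depending on the exact interval the tool uses.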

@gcp
Contributor Author

gcp commented Feb 10, 2021

Hi Joost, there's some benchmarking in issue #3341. It does look like they're close (at least with clang-12), but when GCC is faster it tends to be a lot faster, so I withdrew the issue proposing clang as the default.

The clang settings change here should still be beneficial for people who choose to use clang.

@vondele
Member

vondele commented Feb 10, 2021

Do you know which older versions of clang++ support the -fexperimental-new-pass-manager flag, and how future versions of clang will deal with it? That is, should we be concerned about users of earlier or newer versions of clang, where this might have side effects (bugs)?

@gcp
Contributor Author

gcp commented Feb 10, 2021

Newer versions will Just Work - they default it on internally, and they add a flag to activate the old behavior (legacy pass manager). At some point they may remove the flag, by which point we could remove it here as it will be the default anyway.

The flag has existed since 2016 I think (https://reviews.llvm.org/D28077), so every clang that's newer than 4.0 (or thereabouts) should handle it. Given that we don't default clang, I think that's more than enough margin.
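
If there is doubt about a particular compiler version, one way to check is to probe the compiler directly instead of relying on version numbers. A minimal sketch, assuming the compiler exits with a non-zero status for options it does not recognize; the helper `accepts_flag` is hypothetical and not part of the Stockfish build:

```python
import shutil
import subprocess

NPM_FLAG = "-fexperimental-new-pass-manager"  # flag named in this PR

def accepts_flag(compiler: str, flag: str) -> bool:
    """Return True if `compiler` exists and compiles a trivial program with `flag`."""
    path = shutil.which(compiler)
    if path is None:
        return False  # compiler not installed or not on PATH
    result = subprocess.run(
        # Read a C++ translation unit from stdin and only check syntax.
        [path, flag, "-fsyntax-only", "-x", "c++", "-"],
        input="int main() { return 0; }\n",
        capture_output=True,
        text=True,
    )
    # Warnings still give exit status 0, so this only detects hard rejections.
    return result.returncode == 0

if __name__ == "__main__":
    print(accepts_flag("clang++", NPM_FLAG))
```

A build script could call this once and add the flag only when the probe succeeds, which sidesteps the question of which versions support it.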

@vondele
Member

vondele commented Feb 10, 2021

OK, both PRs combined give a good speedup here:

Result of  10 runs
==================
base (...kfish.master) =    2208553  +/- 19324
test (...ckfish.patch) =    2269327  +/- 23681
diff                   =     +60774  +/- 7046

speedup        = +0.0275
P(speedup > 0) =  1.0000

@vondele vondele added the to be merged Will be merged shortly label Feb 10, 2021
@vondele vondele closed this in 550fed3 Feb 10, 2021
BM123499 pushed a commit to BM123499/Stockfish that referenced this pull request Feb 22, 2021
It's about 1% speedup for Stockfish.

Result of 100 runs
==================
base (...fish_clang12) =    1946851  +/- 3717
test (./stockfish    ) =    1967276  +/- 3408
diff                   =     +20425  +/- 2438

speedup        = +0.0105
P(speedup > 0) =  1.0000

Thanks to David Major for making me aware of this part
of LLVM development.

closes official-stockfish#3346

No functional change
joergoster pushed a commit to joergoster/Stockfish-old that referenced this pull request Feb 27, 2021
Fanael pushed a commit to Fanael/Stockfish that referenced this pull request Mar 7, 2021
dav1312 pushed a commit to dav1312/Stockfish that referenced this pull request Nov 25, 2022
Passed cutechess STC:
Score of RF 14 vs RF 3: 1124 - 536 - 1130  [0.605] 2790
...      RF 14 playing White: 620 - 246 - 530  [0.634] 1396
...      RF 14 playing Black: 504 - 290 - 600  [0.577] 1394
...      White vs Black: 910 - 750 - 1130  [0.529] 2790
Elo difference: 74.3 +/- 10.0, LOS: 100.0 %, DrawRatio: 40.5 %
SPRT: llr 2.95 (100.2%), lbound -2.94, ubound 2.94 - H1 was accepted

Passed cutechess LTC:
Score of RF 14 vs RF 3: 875 - 355 - 1062  [0.613] 2292
...      RF 14 playing White: 519 - 153 - 475  [0.660] 1147
...      RF 14 playing Black: 356 - 202 - 587  [0.567] 1145
...      White vs Black: 721 - 509 - 1062  [0.546] 2292
Elo difference: 80.2 +/- 10.4, LOS: 100.0 %, DrawRatio: 46.3 %
SPRT: llr 2.95 (100.0%), lbound -2.94, ubound 2.94 - H1 was accepted

Bench: 4918790 (+24 squashed commit)

Squashed commit:

[5118c15] Use Bitboard over Square in movegen

It uses pos.checkers() on target when movegen is the type of EVASION.
It simplifies the code. A slight speedup is also expected,
because Bitboard is more direct for bitwise operations.

Passed STC:
LLR: 2.93 (-2.94,2.94) {-1.25,0.25}
Total: 28176 W: 2506 L: 2437 D: 23233
Ptnml(0-2): 80, 1904, 10063, 1949, 92
https://tests.stockfishchess.org/tests/view/60421d18ddcba5f0627bb6a9

Passed LTC:
LLR: 2.93 (-2.94,2.94) {-0.75,0.25}
Total: 9704 W: 402 L: 341 D: 8961
Ptnml(0-2): 3, 279, 4230, 334, 6
https://tests.stockfishchess.org/tests/view/60422823ddcba5f0627bb6ae

closes official-stockfish#3383

No functional change

[42b44ee] Deal with commented lines in UCI input

commands starting with '#' as the first character will be ignored

closes official-stockfish#3378

No functional change

[550d3d8] Add Stockfish namespace.

fixes official-stockfish#3350 and is a small cleanup that might make it easier to use SF
in separate projects, like a NNUE trainer or similar.

closes official-stockfish#3370

No functional change.

[6ccec01] Clean functions returning by const values

The codebase contains multiple functions returning by const-value.
This patch is a small cleanup making those functions return
by value instead, removing the const specifier.

closes official-stockfish#3328

No functional change

[5801707] Import author list and copyright years

Easier to do it this way than track all the cherry picks one by one.

[fd5fc27] Use correct chess terms + fix spelling.

  - "discovered check" (instead of "discovery check")
  - "en passant" (instead of "en-passant")
  - "pseudo-legal" before a noun (instead of "pseudo legal")
  - "3-fold" (instead of "3fold")

closes official-stockfish#3294

No functional change.

[3d8a301] Better code for hash table generation

This patch removes some magic numbers in TT bit management and introduces proper
constants in the code, to improve documentation and ease further modifications.

No functional change

[f314344] Allow TT entries with key16==0 to be fetched

Fix the issue where a TT entry with key16==0 would always be reported
as a miss. Instead, we'll use depth8 to detect whether the TT entry is
occupied. In order to do that, we'll change DEPTH_OFFSET to -7
(depth8==0) to distinguish between an unoccupied entry and the
otherwise lowest possible depth, i.e., DEPTH_NONE (depth8==1).

To prevent a performance regression, we'll reorder the TT entry fields
by the access order of TranspositionTable::probe(). Memory in general
works fastest when accessed in sequential order. We'll also match the
store order in TTEntry::save() with the entry field order, and
re-order the 'if-or' expressions in TTEntry::save() from the cheapest
to the most expensive.

Finally, as we now have a proper TT entry occupancy test, we'll fix a
minor corner case with hashfull reporting. To reproduce:
- Use a big hash
- Either:
  a. Start 31 very quick searches (this wraps generation around to 0); or
  b. Force generation of the first search to 0.
- go depth infinite

Before the fix, hashfull would incorrectly report nearly full hash
immediately after the search start, since
TranspositionTable::hashfull() used to consider only the entry
generation and not whether the entry was actually occupied.

STC:
LLR: 2.95 (-2.94,2.94) {-0.25,1.25}
Total: 36848 W: 4091 L: 3898 D: 28859
Ptnml(0-2): 158, 2996, 11972, 3091, 207
https://tests.stockfishchess.org/tests/view/5f3f98d5dc02a01a0c2881f7

LTC:
LLR: 2.95 (-2.94,2.94) {0.25,1.25}
Total: 32280 W: 1828 L: 1653 D: 28799
Ptnml(0-2): 34, 1428, 13051, 1583, 44
https://tests.stockfishchess.org/tests/view/5f3fe77a87a5c3c63d8f5332

closes official-stockfish#3048

Bench: 3742162

[5b421ab] Enable New Pass Manager for Clang.

closes official-stockfish#3346

No functional change

[f7f7e38] Disable ThinLTO when using Clang.

Benchmarking with current Clang 12 shows that
ThinLTO is a pessimization, see issue official-stockfish#3341.

closes official-stockfish#3345

No functional change.

[6373f89] Simplify Chess 960 castling

a little cleanup, and small speedup (about 0.3%) for Chess 960.

Verified with perft on a large set of chess960 positions.

Closes official-stockfish#3317

No functional change

[3f4d84c] Speed Up Perft Search

It speeds up generate<LEGAL>, and thus perft, roughly by 2-3%.

closes official-stockfish#3312

No functional change

[88974f5] Clean Up Castling in gives_check

There is no need to add rto or kto on the Bitboard which represents the pieces.

STC:
LLR: 2.93 (-2.94,2.94) {-1.25,0.25}
Total: 57064 W: 5096 L: 5067 D: 46901
Ptnml(0-2): 202, 3862, 20355, 3931, 182
https://tests.stockfishchess.org/tests/view/6005ea2c6019e097de3efa55

LTC:
LLR: 2.92 (-2.94,2.94) {-0.75,0.25}
Total: 30088 W: 1094 L: 1052 D: 27942
Ptnml(0-2): 10, 882, 13217, 926, 9
https://tests.stockfishchess.org/tests/view/6006115a6019e097de3efa6e

closes official-stockfish#3311

No functional change.

[2129ec9] Avoid more expensive legality check

speedup of the code, enough to pass STC, failed LTC.

Passed STC:
LLR: 2.93 (-2.94,2.94) {-0.25,1.25}
Total: 68928 W: 6334 L: 6122 D: 56472
Ptnml(0-2): 233, 4701, 24369, 4943, 218
https://tests.stockfishchess.org/tests/view/6002747f6019e097de3ef8dc

Failed LTC:
LLR: -2.96 (-2.94,2.94) {0.25,1.25}
Total: 44560 W: 1702 L: 1675 D: 41183
Ptnml(0-2): 25, 1383, 19438, 1408, 26
https://tests.stockfishchess.org/tests/view/6002a4836019e097de3ef8e3

About 1% speedup:

Result of  50 runs
==================
base (...kfish.master) =    2237500  +/- 7428
test (...ckfish.patch) =    2267003  +/- 7017
diff                   =     +29503  +/- 4774

speedup        = +0.0132
P(speedup > 0) =  1.0000

closes official-stockfish#3304

No functional change.

[289fbcc] Use stable sort to make sure bench with TB yields same results everywhere.

std::sort() is not stable so different implementations can produce different results:
use the stable version instead.

Observed for '8/6k1/5r2/8/8/8/1K6/Q7 w - - 0 1' yielding different bench results for gcc and MSVC
and 3-4-5 syzygy TB prior to this patch.

closes official-stockfish#3083

No functional change.

[5a344f1] Fix parallel LTO issues on Windows

This adds -save-temps to the linker flags when parallel LTO is used on
MinGW/MSYS.

fixes official-stockfish#2977

closes official-stockfish#2978

No functional change.

[5da0f55] Parallelize Link Time Optimization for GCC, CLANG and MINGW

This patch tries to run multiple LTO threads in parallel, speeding up
the build process of optimized builds if the -j make parameter is used.
This mitigates the longer linking times of optimized builds since the
integration of the NNUE code. Roughly 2x build speedup.

I've tried a similar patch some two years ago but it ran into trouble
with old compiler versions then. Since we're on the C++17 standard now
these old compilers should be obsolete.

closes official-stockfish#2943

No functional change.

[630995a] Remove unnecessary legality check

Possible after the recent reordering of the pos.legal(move) check

official-stockfish#2941

No functional change.

[6f512d9] Do move legality check before pruning.

This allows simplifying the code because the move counter no longer has to be
decremented later if a move isn't legal. As a side effect, illegal
pruned moves are no longer included in the move counter, so slightly less
pruning and reductions are done.

STC:
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 111016 W: 21106 L: 21077 D: 68833
Ptnml(0-2): 1830, 13083, 25736, 12946, 1913
https://tests.stockfishchess.org/tests/view/5f28816fa5abc164f05e4c26

LTC:
LLR: 2.94 (-2.94,2.94) {-1.50,0.50}
Total: 39264 W: 4909 L: 4843 D: 29512
Ptnml(0-2): 263, 3601, 11854, 3635, 279
https://tests.stockfishchess.org/tests/view/5f297902a5abc164f05e4c8e

closes official-stockfish#2906

Bench: 3795876

[7f6f1b8] Remove pawn tables as well, they're unused

Bench: 3865928

[c7fa058] We don't need specialized endgame eval where we're going

[2e3cd4d] Remove piece lists

This patch removes the incrementally updated piece lists from the Position object.

This has been tried before but always failed. My reasons for trying again are:

* 32-bit systems (including phones) are now much less important than they were some years ago (and are absent from fishtest);
* NNUE may have made SF less finely tuned to the order in which moves were generated.

STC:
LLR: 2.94 (-2.94,2.94) {-1.25,0.25}
Total: 55272 W: 5260 L: 5216 D: 44796
Ptnml(0-2): 208, 4147, 18898, 4159, 224
https://tests.stockfishchess.org/tests/view/5fc2986a42a050a89f02c926

LTC:
LLR: 2.96 (-2.94,2.94) {-0.75,0.25}
Total: 16600 W: 673 L: 608 D: 15319
Ptnml(0-2): 14, 533, 7138, 604, 11
https://tests.stockfishchess.org/tests/view/5fc2f98342a050a89f02c95c

closes official-stockfish#3247

Bench: 3940967

[693c459] We don't need PSQTs where we're going

[c6c710d] Initial import of simple eval