Contempt 7 #1361
Bench 5494441
My 2 cents: it should not only help against weaker engines but also lower the draw rate in fishtest, so that every test uses slightly fewer resources to reach the same SPRT result. |
@Vizvezdenec I am afraid I do not understand your comment. IMHO the only reasonable way to evaluate if contempt is beneficial for fishtest (if one really wants to) is to take two SF versions sufficiently different in strength (small elo differences take too many resources to measure) and match them twice against each other. Once both with contempt and once both without contempt. Then compare the BayesElo difference (not logistic elo as fishtest uses BayesElo for SPRT). |
@vdbergh it's pretty simple - it reduces draw rate and makes SPRT finish faster because of that while not losing strength. |
I think sf draws too much, even against weaker opposition, and this resulted in sf not qualifying for the superfinal in TCEC Season 10 last year. So the devs came up with this working contempt option. The contempt level can achieve different things: at lower levels it appears to gain around 2 Elo in self-play; it reduces the draw rate at all levels, more so as the contempt value increases; and it gains Elo against weaker opposition, particularly at higher contempt levels. Choosing Contempt=7 for the default seems to aim at the 2 Elo gain in self-play, when sf is already a very strong engine. The problem we should be addressing in this request is the high draw rate and the below-par results against weaker engines. This is an opportunity to gain several Elo against weaker engines (and therefore in rating lists) and to make a big reduction in sf's "drawfish" tendencies, not just the small reduction in draw rate that Contempt=7 might give. I suggest we use a higher value than 7, say at least 15. I have requested an LTC test to estimate the Elo loss/gain with Contempt=15. An STC test suggested it gives around +2 Elo in self-play (but note: +/-3 Elo). STC with Contempt=15: http://tests.stockfishchess.org/tests/view/5a50c0f90ebc590ccbb8c6fa Stefan Pohl's test results: http://www.sp-cc.de/experiments.htm
|
The problem is that above contempt 7 the engine's loss ratio starts to deteriorate and its behavior changes too much. As a conservative approach, contempt 7 is big enough to give some elo while retaining the engine's core strength even against equal opposition. For a state-of-the-art engine like SF, the best approach IMO is to maintain its stability in any case, so the more conservative option is desirable. I would choose (based on tests) contempt 7, or even lower, like 4, to keep the engine as stable as possible while gaining some elo even against "equal" opposition. |
For some comparison to the old contempt implementation, some tests from 2 months ago: old devmaster vs SF7, stc; old devmaster vs SF7, ltc; old devmaster old contempt-code, contempt 40 vs SF7 http://tests.stockfishchess.org/tests/view/5a1c011f0ebc590ccbb8b379 ; old devmaster with new contempt-code, contempt 40 vs SF7. So the new contempt implementation looks to do better elo-wise vs weaker engines with some amount of contempt while having minimal elo difference vs master. 7 or 10 is fine I think, but IMHO it also shouldn't go higher than 10. End-users rarely change default settings and thus I don't think it should be too far from the 0 setting. |
I'm not against default contempt, but I think before selecting a value we should first define what our goal for default contempt is. If, for example, it's to score high on rating lists, then 7 is way too conservative, because most of the opponents are much weaker. On the other hand, if we want to give the best objective analysis and play the strongest objective chess, it should be 0. So what is our goal? |
"Play the strongest objective chess" is hard to define, and it is not clear from the tests that it is exactly for contempt=0. On the other hand, the feedback we had during TCEC is that current Stockfish is boring, and that it is desirable to keep tension in the positions -- that could be another definition of "better chess program", bending to the side of chess as a fun game. It would be good for the Stockfish project (attracting more developers) if Stockfish gained a reputation of entertaining style. So I would favor a bigger default contempt value, maybe contempt 15 or more. Stéphane |
I agree higher values increase the number of losses, but I think that is a good thing, since we are trying to reduce the number of draws. The aim is to get more wins and more losses, with the net effect being definitely positive against weaker opposition and roughly zero against strong opposition. My view is that the key motivation here is to reduce sf's draw rate. |
Hi all, I wanted to add a link to related test work from Stefan that apparently you have not seen. He has taken a lot of test work out of our hands. Stefan Pohl already gave a good estimate of an optimum. Post subject: SPCC: Testrun of Stockfish 171206 with Contempt=+40 finished. Big thanks to Stefan Pohl. Please read. (Edit: I'm sorry, I see that xoto already gave a synopsis of the results above, 9 hours ago. I had overlooked that. But I would just set it to 40. If you don't want to maximize Elo, I'd set it to zero, not something in between. My preference is just to leave it to the user to increase contempt, but then you would not see improved test results from, for instance, CEGT and CCRL.) Just five cents added, not really important: I'm not a big fan of contempt, but as long as it stays a UCI parameter (you never know, Marco has something against them), it is easy to set it to 0 again, and rating groups would use the positive contempt as default. Everybody happy. If a positive contempt improves analysis, I would say there is something wrong. But I do not exclude the possibility at all. |
@snicolet agreeing with you: in fact, objectively, (proved by tests considering elo gain alone) a little contempt makes the program play better chess even against itself. |
It's important to understand how Contempt works: positive contempt evaluates (in the search tree) our moves with a somewhat higher evaluation (by 2*Contempt) than the same moves when it's the opponent's turn to move (eval + Contempt vs. eval - Contempt). From this it is clear (and confirmed by tests) that a big Contempt makes no sense, and that it is actually a regression in self-play. Set Contempt=1000 and you will see a regression against weaker engines too. But it is not clear that a small contempt is a bad thing, and I wanted to confirm this. |
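The "eval + Contempt vs. eval - Contempt" description above can be illustrated with a minimal sketch. The function name `eval_with_contempt` and its parameters are hypothetical, chosen only for illustration; this is not Stockfish's actual code, just the arithmetic the comment describes.

```python
def eval_with_contempt(static_eval: int, engine_to_move: bool, contempt: int) -> int:
    """Shift an evaluation (in centipawns) by the contempt value.

    Hypothetical helper: the engine's own nodes get +contempt,
    the opponent's nodes get -contempt, as described in the comment.
    """
    if engine_to_move:
        return static_eval + contempt
    return static_eval - contempt


if __name__ == "__main__":
    # For the same underlying position value, the gap between "our" nodes
    # and "their" nodes is 2 * contempt.
    for contempt in (0, 7, 20):
        ours = eval_with_contempt(0, True, contempt)
        theirs = eval_with_contempt(0, False, contempt)
        print(f"Contempt={contempt}: ours={ours:+d} theirs={theirs:+d} gap={ours - theirs}")
```

With a very large value such as Contempt=1000, the shift would swamp any realistic positional difference, which is consistent with the remark that a big contempt makes no sense.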
I think default 7 is good. We can always adjust that. For TCEC, though, I am happy with 7 there too. I would like it if it were possible to have contempt in analysis: not by default, but as an option in analysis mode rather than being automatically disabled. Also, I would point out that strategy in tournament play differs from strategy in match play. In tournament play you want to steer toward likely decisive games, so higher contempt makes sense there. But in match play losing a game can be very bad. If you never lose a game, your chances of winning the match get exceedingly good, especially if there are a lot of games. But if you play safe in tournaments you may not lose any games, yet you are also much less likely to get the top spot. I am not saying contempt has to be zero in a match, but it should be very low. |
You have tested that contempt 7 is not regressive, good. But what if contempt 7 makes no difference against weaker opponents? There is no evidence that contempt 7 helps against weaker opponents, only some speculation. Maybe such a contempt is so small that it has no practical effect, in which case this patch would be misleading. |
Well, one thing it's doing for sure: it's lowering the draw rate in self-play. The draw rate at LTC with C=7 is 72.5%, while the usual draw rate at LTC is near 74.5% (I took the 3 latest LTC tests and got 74.9%, 74.3% and 74.6% from them). |
In my lists, which contain all kinds of engines, contempt=20 is still best. Contempt=10 gives better results against stronger engines than contempt=20, which is logical. |
@Ipmanchess can you please confirm that with contempt = 7 you can see improvements in tests against weaker engines? |
It will give something, but not much. I had to use 20 to see a real difference. Then I tried 40 (not so good) and tried 10 again for comparison, and 20 ended higher than 10 when playing against all engines in the list. |
My worry is committing a placebo patch. I have no problem committing a patch, even one with higher contempt, as long as it is not regressive. I have some doubts about committing a placebo, because it would only give the illusion of improvement without improving anything in practice. |
Well, the only way to prove that it's not a placebo is to measure it with the framework against, for example, sf7/sf6. |
@Vizvezdenec yes, this is a sound way to proceed. |
Yes, a good way to find out. |
What about future patches? Will they be tested against the 'contempt 7' version or against a neutral master? Contempt can help against different and weaker opponents, but "contempt 0" is statistically optimized for self-play, so I fear that whatever contempt we choose as a default, subsequent passed patches and optimizations will tend to revert to the standard 'contempt zero optimized' version. So I'm wondering if the contempt 7 (or whatever) patch should only be applied to releases and not to versions used in fishtest. |
@marrco first thing - c=7 passed [-3;1] SPRT vs c=0 so it brings no measurable regression. Second thing - it lowers drawrate in selfplay which is actually a good thing. |
A default contempt value is really bad for analysis, in particular if the user analyses a variation so that the contempt value switches sign depending on whether it is white or black to move. This seems unacceptable to me (unless contempt is somehow switched off for analysis). (As was discussed some time ago, contempt can have uses in analysis, but the user should then be able to easily control whether contempt is from white's point of view or from black's point of view.) It seems better to me to leave default contempt at 0 and perhaps have an "official" recommendation on what are good contempt values under what circumstances. For TCEC, it is as easy as asking the TCEC people to set contempt to a particular value. |
Thanks for the links! Looking at the original pull request for the new contempt, @snicolet said:
So that's an even higher gain against sf7 than your figure? (And a huge number!) |
Komodo has a UCI option called "UCI_AnalyseMode" with default value "false" and a UCI option "Contempt" with a standard value of "10". As I understand it, if a GUI is in analysis mode, "UCI_AnalyseMode" is set to true and Contempt therefore becomes inactive. That solves the problem of users wanting an objective, neutral opinion from the engine, without contempt, when analyzing. |
Different question. Let's say we run a large tuning, so that we have the best values, then we apply the 'contempt-7' patch and re-run the same large optimization test. My fear is that using a contempt-7 patch in fishtest as well will steer SF development into a different chess style that can't simply be reverted by resetting contempt to zero. Or even worse, it could lead to the need for a complete retuning of all values to compensate for the imbalance created by having contempt-7 in fishtest too. So my idea is that for regular development and fishtest it is better to use no contempt and just have 7 (or whatever) as a default for major releases or as an official recommendation. At least until all doubts are cleared. |
I recommend tests against Stockfish 6, as it will represent an average weaker engine well. Can somebody do that? I'm not sure how to get Stockfish 6 on GitHub. @syzygy1 : I do not understand the problems with analysis mode. For me, analysis mode works just like one move during the game. Could you please explain? |
Thanks for the test. Everything looks good. |
Results of the polls are pretty clear: people mainly want Contempt 20 in the Premier Division but only Contempt 7 in the Superfinal. So one of these should be the default, and the other set manually when the occasion arises. |
Shall we open another pull request for default contempt value 20 ? |
Or open another poll ? 😊 |
Set the default contempt value of Stockfish to 20 centipawns.

The contempt feature of Stockfish tries to prevent the engine from simplifying the position too quickly when it feels that it is very slightly behind, instead keeping the tension a little bit longer.

Various tests in November 2017 have proved that our current implementation works well against SF7 (which is about 130 Elo weaker than current master) and that the Elo gain is an increasing function of contempt, going (against SF7) from +0 Elo when contempt is set at zero centipawns to +30 Elo when contempt is 40 centipawns. See pull request 1325 for details:
official-stockfish/Stockfish#1325

This November discussion left open the decision of which "default" value for contempt we should use for Stockfish, taking into account the various uses of Stockfish (opening preparation for humans, computer online tournaments, analysis tool for web pages, human/computer play, etc).

This pull request proposes to set the default contempt value of SF to twenty centipawns, which turns out to be the highest value which is not a regression against current master, as this seemed to be a good compromise between risk and safety. A couple of SPRT[-3..1] tests were done to bisect this value:

Contempt 10: http://tests.stockfishchess.org/tests/view/5a5d42d20ebc5902977e2901 (PASSED)
Contempt 15: http://tests.stockfishchess.org/tests/view/5a5d41740ebc5902977e28fa (PASSED)
Contempt 20: http://tests.stockfishchess.org/tests/view/5a5d42060ebc5902977e28fc (PASSED)
Contempt 25: http://tests.stockfishchess.org/tests/view/5a5d433f0ebc5902977e2904 (FAILED)

Surprisingly, a test at "very long time control" hinted that using contempt 20 is not only non-regressive against contempt 0, but may actually exhibit some small Elo gain, giving a likelihood of superiority of 88.7% after 8500 games:

VLTC:
ELO: 2.28 +-3.7 (95%) LOS: 88.7%
Total: 8521 W: 1096 L: 1040 D: 6385
http://tests.stockfishchess.org/tests/view/5a60b2820ebc590297b9b7e0

Finally, there were some concerns that a contempt value of 20 would be worse than a value of 7, but a test with 20000 games at STC was neutral:

STC:
ELO: 0.45 +-3.1 (95%) LOS: 61.2%
Total: 20000 W: 4222 L: 4196 D: 11582
http://tests.stockfishchess.org/tests/view/5a64d2fd0ebc590297903868

See the comments in pull request 1361 for the long, nice discussion (180 entries :-)) leading to the decision to propose contempt 20 as the default value:
official-stockfish/Stockfish#1361

Whether Stockfish should strictly adhere to the Komodo and Houdini semantics and add the UCI commands to force the contempt to be White in the so-called "analysis mode" is still under discussion, and may or may not be the object of a future commit.

Bench: 5571216
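The ELO and LOS figures quoted in the commit message can be checked from the W/L/D counts. The sketch below uses the common logistic Elo formula and the usual normal-approximation formula for likelihood of superiority; that these are the exact formulas behind the quoted test pages is an assumption, although they do reproduce the quoted numbers. The function names are hypothetical.

```python
import math


def elo_from_wld(wins: int, losses: int, draws: int) -> float:
    """Logistic Elo difference implied by a match score.

    Assumes the score is strictly between 0 and 1.
    """
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    return 400.0 * math.log10(score / (1.0 - score))


def los_from_wl(wins: int, losses: int) -> float:
    """Likelihood of superiority (draws carry no information here)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))


if __name__ == "__main__":
    # Counts taken from the VLTC and STC results quoted above.
    for label, w, l, d in (("VLTC", 1096, 1040, 6385), ("STC", 4222, 4196, 11582)):
        print(f"{label}: ELO {elo_from_wld(w, l, d):.2f}  LOS {100 * los_from_wl(w, l):.1f}%")
```

The error bars ("+-3.7", "+-3.1") are not reproduced here; they depend on the variance model used by the testing framework.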
Let SF pulverize and disintegrate all other engines. Contempt 20 looks fine for now! But perhaps another ultimate test is needed: Contempt 20 vs 7 at 60+0.6 th 1. |
Yes, at least 100K games at 60+0.6. But @mcostalba clearly stated that he also wants an advantage against weaker engines to commit this. So, even if contempt 7 is slightly stronger at LTC, contempt 20 is the only serious candidate to be committed. |
I'm convinced C20 is best against weaker engines. Not 100% convinced I'm in favor of committing anything, however. It is along the lines of voodoo programming - voodoo in the sense we do really understand why it is better - it might be best today, but will it be best tomorrow or after 100 patches? Considering the resources it took to reach where we are now, I question whether this is the path we should be taking forward. And if it's not the path we are going to take going forward, why even take a step in that direction now? Monkeying around with contempt will probably always get you a few ELO, but it has limited upside potential in the long run. The real ELO gains come from submitting real patches and running real tests. Rather than finding what the ideal contempt value should be, we should be looking for the weaknesses that let a contempt value other than 0 be strongest. I know my view is in the minority, but I think it would be a disgrace if every 6 months we ran all these simulations to find out what the contempt value should be. That would dramatically retard any future ELO gains. So if we commit C7 or C20, then that should be it for at least another 2-3 years before it is revisited. I do find that we are gleaning valuable information from all these tests; my concern is that we do not overdo it going forward. It should only be done once in a while: every 2 to 3 years should be adequate. If that's the case, then I'm fine with making the contempt setting something other than zero, and I would favor 20. |
Left out the word “not” in second sentence , “we do not really ...”. iPhone app does not allow me to edit. Sorry. |
Set the default contempt value of Stockfish to 20 centipawns.

The contempt feature of Stockfish tries to prevent the engine from simplifying the position too quickly when it feels that it is very slightly behind, instead keeping the tension a little bit longer.

Various tests in November 2017 have proved that our current implementation works well against SF7 (which is about 130 Elo weaker than current master) and that the Elo gain is an increasing function of contempt, going (against SF7) from +0 Elo when contempt is set at zero centipawns to +30 Elo when contempt is 40 centipawns. See pull request 1325 for details:
#1325

This November discussion left open the decision of which "default" value for contempt we should use for Stockfish, taking into account the various uses of Stockfish (opening preparation for humans, computer online tournaments, analysis tool for web pages, human/computer play, etc).

This pull request proposes to set the default contempt value of SF to twenty centipawns, which turns out to be the highest value which is not a regression against current master, as this seemed to be a good compromise between risk and safety. A couple of SPRT[-3..1] tests were done to bisect this value:

Contempt 10: http://tests.stockfishchess.org/tests/view/5a5d42d20ebc5902977e2901 (PASSED)
Contempt 15: http://tests.stockfishchess.org/tests/view/5a5d41740ebc5902977e28fa (PASSED)
Contempt 20: http://tests.stockfishchess.org/tests/view/5a5d42060ebc5902977e28fc (PASSED)
Contempt 25: http://tests.stockfishchess.org/tests/view/5a5d433f0ebc5902977e2904 (FAILED)

Surprisingly, a test at "very long time control" hinted that using contempt 20 is not only non-regressive against contempt 0, but may actually exhibit some small Elo gain, giving a likelihood of superiority of 88.7% after 8500 games:

VLTC:
ELO: 2.28 +-3.7 (95%) LOS: 88.7%
Total: 8521 W: 1096 L: 1040 D: 6385
http://tests.stockfishchess.org/tests/view/5a60b2820ebc590297b9b7e0

Finally, there were some concerns that a contempt value of 20 would be worse than a value of 7, but a test with 20000 games at STC was neutral:

STC:
ELO: 0.45 +-3.1 (95%) LOS: 61.2%
Total: 20000 W: 4222 L: 4196 D: 11582
http://tests.stockfishchess.org/tests/view/5a64d2fd0ebc590297903868

See the comments in pull request 1361 for the long, nice discussion (180 entries :-)) leading to the decision to propose contempt 20 as the default value:
#1361

Whether Stockfish should strictly adhere to the Komodo and Houdini semantics and add the UCI commands to force the contempt to be White in the so-called "analysis mode" is still under discussion, and may or may not be the object of a future commit.

Bench: 5783344
I have merged #1366, so closing this one. I'd like to thank all the people involved. Very impressive work and a very good and deep discussion. I think this is one clear example of open source development that works as a real community effort. Congrats, everybody! |
Current master implements a scaling of the raw NNUE output value with a formula equivalent to 'eval = alpha * NNUE_output', where the scale factor alpha varies between 1.8 (for early middle game) and 0.9 (for pure endgames). This feature allows Stockfish to keep material on the board when she thinks she has the advantage, and to seek exchanges and simplifications when she thinks she has to defend.

This patch slightly offsets the turning point between these two strategies, by adding to Stockfish's evaluation a small "optimism" value before actually doing the scaling. The effect is that SF will play a little bit more riskily, trying to keep the tension a little bit longer when she is defending, and keeping even more material on the board when she has an advantage.

We note that this patch is similar in spirit to the old "Contempt" idea we used to have in classical Stockfish, but this implementation differs in two key points:

a) it has been tested as an Elo-gainer against master;

b) the values output by the search are not changed on average by the implementation (in other words, the optimism value changes the tension/exchange strategy, but a displayed value of 1.0 pawn has the same meaning before and after the patch).

See the old comment official-stockfish/Stockfish#1361 (comment) for some images illustrating the ideas.

-------

finished yellow at STC:
LLR: -2.94 (-2.94,2.94) <0.00,2.50>
Total: 165048 W: 41705 L: 41611 D: 81732
Ptnml(0-2): 565, 18959, 43245, 19327, 428
https://tests.stockfishchess.org/tests/view/61942a3dcd645dc8291c876b

passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.00>
Total: 121656 W: 30762 L: 30287 D: 60607
Ptnml(0-2): 87, 12558, 35032, 13095, 56
https://tests.stockfishchess.org/tests/view/61962c58cd645dc8291c8877

-------

How to continue from there?

a) the shape (slope and amplitude) of the sigmoid used to compute the optimism value could be tweaked to try to gain more Elo, so the parameters of the sigmoid function in line 391 of search.cpp could be tuned with SPSA. Manual tweaking is also possible using this Desmos page: https://www.desmos.com/calculator/jhh83sqq92

b) in a similar vein, with two recent patches affecting the scaling of the NNUE evaluation in evaluate.cpp, now could be a good time to try a round of SPSA tuning of the NNUE network;

c) this patch will tend to keep tension in the middlegame a little bit longer, so any patch improving the defensive aspect of play via search extensions in risky, tactical positions would be welcome.

-------

closes ???

Bench: 6184852
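The idea of adding an optimism value before the 'eval = alpha * NNUE_output' scaling can be sketched as follows. Everything specific here is an assumption made for illustration: the function names, the use of `tanh` as the sigmoid-shaped curve, the choice of the previous search score as its input, and the `amplitude` and `slope` defaults. None of these are the actual Stockfish code or tuned parameters, and the sketch does not model the property that search values are unchanged on average.

```python
import math


def optimism(prev_score: float, amplitude: float = 25.0, slope: float = 0.01) -> float:
    """Hypothetical sigmoid-shaped optimism value.

    Zero for a balanced previous score, saturating at +/- amplitude.
    tanh is used here as a stand-in sigmoid; the real curve is not shown in the source.
    """
    return amplitude * math.tanh(slope * prev_score)


def scaled_eval(nnue_output: float, alpha: float, opt: float) -> float:
    """Add the optimism first, then apply the scale factor alpha."""
    return alpha * (nnue_output + opt)


if __name__ == "__main__":
    # alpha = 1.8 (early middle game) and 0.9 (pure endgame) come from the text above.
    for alpha in (1.8, 0.9):
        for prev in (-200.0, 0.0, 200.0):
            opt = optimism(prev)
            print(f"alpha={alpha} prev={prev:+.0f} optimism={opt:+.2f} "
                  f"eval={scaled_eval(50.0, alpha, opt):+.2f}")
```

In this sketch, `amplitude` and `slope` play the role of the "slope and amplitude" parameters that point a) suggests tuning with SPSA.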
Current master implements a scaling of the raw NNUE output value with a formula equivalent to 'eval = alpha * NNUE_output', where the scale factor alpha varies between 1.8 (for early middle game) and 0.9 (for pure endgames). This feature allows Stockfish to keep material on the board when she thinks she has the advantage, and to seek exchanges and simplifications when she thinks she has to defend.

This patch slightly offsets the turning point between these two strategies by adding a small "optimism" value to Stockfish's evaluation before doing the scaling. The effect is that SF will play a little more riskily, trying to keep the tension a little longer when she is defending, and keeping even more material on the board when she has an advantage.

We note that this patch is similar in spirit to the old "Contempt" idea we used to have in classical Stockfish, but this implementation differs in two key points:

a) it has been tested as an Elo-gainer against master;

b) the values output by the search are not changed on average by the implementation (in other words, the optimism value changes the tension/exchange strategy, but a displayed value of 1.0 pawn has the same meaning before and after the patch).

See the old comment official-stockfish/Stockfish#1361 (comment) for some images illustrating the ideas.

-------

finished yellow at STC:
LLR: -2.94 (-2.94,2.94) <0.00,2.50>
Total: 165048 W: 41705 L: 41611 D: 81732
Ptnml(0-2): 565, 18959, 43245, 19327, 428
https://tests.stockfishchess.org/tests/view/61942a3dcd645dc8291c876b

passed LTC:
LLR: 2.95 (-2.94,2.94) <0.50,3.00>
Total: 121656 W: 30762 L: 30287 D: 60607
Ptnml(0-2): 87, 12558, 35032, 13095, 56
https://tests.stockfishchess.org/tests/view/61962c58cd645dc8291c8877

-------

How to continue from there?

a) the shape (slope and amplitude) of the sigmoid used to compute the optimism value could be tweaked to try to gain more Elo, so the parameters of the sigmoid function in line 391 of search.cpp could be tuned with SPSA. Manual tweaking is also possible using this Desmos page: https://www.desmos.com/calculator/jhh83sqq92

b) in a similar vein, with two recent patches affecting the scaling of the NNUE evaluation in evaluate.cpp, now could be a good time to try a round of SPSA tuning of the NNUE network;

c) this patch will tend to keep tension in the middlegame a little longer, so any patch improving the defensive aspect of play via search extensions in risky, tactical positions would be welcome.

-------

closes official-stockfish/Stockfish#3797

Bench: 6184852
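The "optimism before scaling" idea described in this commit message can be illustrated with a small sketch. This is not the code from search.cpp or evaluate.cpp: the function names, the `phase` parameter, the linear interpolation of alpha, and the tanh-shaped sigmoid with its amplitude and slope are all assumptions chosen for illustration. Only the formula 'eval = alpha * NNUE_output', the 0.9 to 1.8 range of alpha, and the fact that optimism is added before scaling come from the text.

```python
import math


def optimism_from_score(prev_score_cp, amplitude_cp=50.0, slope_cp=150.0):
    """Hypothetical sigmoid mapping a previous search score (centipawns)
    to a bounded optimism value; amplitude and slope are made-up numbers."""
    return amplitude_cp * math.tanh(prev_score_cp / slope_cp)


def scale_factor(phase):
    """Scale factor alpha: 1.8 for phase=1.0 (early middlegame, lots of
    material) down to 0.9 for phase=0.0 (pure endgame). The linear
    interpolation is an assumption of this sketch."""
    return 0.9 + 0.9 * phase


def scaled_eval(nnue_output_cp, phase, optimism_cp=0.0):
    """Add optimism to the raw NNUE output first, then apply the scaling."""
    return scale_factor(phase) * (nnue_output_cp + optimism_cp)


if __name__ == "__main__":
    raw = -10.0  # slightly worse position according to the raw network output
    for opt in (0.0, 20.0):
        keep_material = scaled_eval(raw, phase=1.0, optimism_cp=opt)
        simplify = scaled_eval(raw, phase=0.0, optimism_cp=opt)
        choice = "keep material" if keep_material > simplify else "simplify"
        print(f"optimism={opt:5.1f}: middlegame={keep_material:6.1f} "
              f"endgame={simplify:6.1f} -> prefers to {choice}")
```

In this toy model, a raw evaluation of -10 favours simplification when optimism is zero (the endgame scaling hurts less), while an optimism of 20 moves the turning point so that keeping material on the board scores higher. That is the shift in the tension/exchange strategy the message describes.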
STC:
LLR: 2.95 (-2.94,2.94) [-3.00,1.00]
Total: 50665 W: 9813 L: 9745 D: 31107
LTC:
LLR: 2.96 (-2.94,2.94) [-3.00,1.00]
Total: 13045 W: 1834 L: 1703 D: 9508
Contempt 4 tests were also good:
http://tests.stockfishchess.org/tests/view/5a512eea0ebc590ccbb8c723
http://tests.stockfishchess.org/tests/view/5a5205000ebc590ccbb8c762
Contempt 10 tests were also good:
http://tests.stockfishchess.org/tests/view/5a5227410ebc590ccbb8c76f
http://tests.stockfishchess.org/tests/view/5a550fac0ebc590296938a24
For safety reasons, it seems best to use the medium value (7),
for which the tests showed the greatest gain anyway.
There is no obvious Elo gain here, but it should be helpful against weaker engines.
Bench 5494441
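To put the STC and LTC totals quoted in this description into perspective, here is a minimal sketch that turns win/loss/draw counts into a score fraction, a draw rate, and a logistic Elo point estimate. The helper names are hypothetical, and the logistic formula is the standard textbook one, not necessarily the model the test framework uses internally. The last helper shows that Wald SPRT bounds with error rates of 0.05 (an assumption here) reproduce the (-2.94, 2.94) interval printed next to the LLR.

```python
import math


def score_fraction(wins, losses, draws):
    """Fraction of points scored, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def draw_rate(wins, losses, draws):
    """Share of games that ended in a draw."""
    return draws / (wins + losses + draws)


def logistic_elo(score):
    """Standard logistic Elo difference implied by a score fraction."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald SPRT stopping bounds on the log-likelihood ratio."""
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    return lower, upper


if __name__ == "__main__":
    results = {
        "STC": dict(wins=9813, losses=9745, draws=31107),
        "LTC": dict(wins=1834, losses=1703, draws=9508),
    }
    for name, r in results.items():
        s = score_fraction(**r)
        print(f"{name}: score={s:.4f} draw_rate={draw_rate(**r):.3f} "
              f"elo={logistic_elo(s):+.2f}")
    print("SPRT bounds: (%.2f, %.2f)" % sprt_bounds())
```

With the counts above, this gives roughly +0.5 Elo at STC and +3.5 Elo at LTC as point estimates. With game counts of this size the uncertainty is several Elo, which fits the remark that the gain is not obvious.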