VLTC tuning #4154
I honestly prefer bigpenor's version, which seems to do much better at VLTC.
Are the parameters you want to change related? The guidelines on the Fishtest wiki allow only a combo of at most 2 patches, both of which must be trivial. In my opinion, changes to a bunch of unrelated parameters should not be accepted in one patch. It's much better to tweak related parameters in smaller sets and, if they pass tests, proceed to the next set.
Generally speaking, everything in search is related.
Then if the problem is switching to another local maximum while changing a bunch of parameters, the Elo gain should be considerable, and passing just the standard SPRT tests is not enough.
@atumanian Tuning many parameters together has always been accepted and is not considered a combo.
Obviously, I thank bigpenor a lot for his efforts, and he is the one who inspired my recent tests.
Again, I'm not against accepting bigpenor's patch. I am OK with either one passing; I am just eager for one of these 2 patches to get accepted fast so I can begin the next generation of tuning on top of the passed patch. Do you think it would be beneficial to run a head-to-head match between my patch and bigpenor's at a specific time control?
Then what is considered a combo?
This is what the Fishtest wiki says:
Parameter tweaks tests
Among parameter tweaks a special sub-case is the so called union patch or combo patch, that is a bundling of patches that failed SPRT but with positive or near positive score. Sometime retesting the union as a whole passes SPRT. Due to the nature of the approach and because of each individual patch failed already, a union has some constraints:
If you are asking me, I'm not against such a match. But, in my opinion, a 2 Elo increase shouldn't be enough to accept such big changes. If it were 4 Elo, then things would be different.
@atumanian A combo is a merger of two patches that were created separately. They need to be simple, so parameter tweaks are often good candidates, but other simple patches qualify as well. If many parameters are tuned together, it is one patch and not a combo.
Isn't it the lines of code changed that affect the engine's behaviour and developers' work? Why should the method of creation of a patch matter in deciding whether to commit it?
Unlike most patches, tuning parameters doesn't increase the complexity of the codebase/engine. There is no reason to be reluctant to update parameters if testing shows it to be beneficial. The only reason to limit combo testing of parameters is to avoid spamming dozens of minor variations testing the same thing again and again, using a lot of resources until maybe a false positive lucks out. This is irrelevant here.
Anyway, I think vondele prefers it if all the commits are squashed.
It's not only about comparing the complexity before and after a change. It's about the complexity of the change itself. When you change a lot of code, other developers may be working on the same parts of the code. When they do some testing and the tested code changes, the results become less relevant. Your patch may even completely invalidate the work they have done so far.
@atumanian I don't know why you refuse to believe what everyone is telling you, but here you go: #1008 (comment), straight from a maintainer.
There were like a dozen commits like this. And this patch is actually good for development because we've been stuck in a local search maximum for a while. Maybe with tuning we can do some stuff better.
Yes, no objections to this kind of patch from my side. However, I'd like to see the PR (with associated test results) by @candirufish before merging one of these tunes.
So you think that a patch changing every search and evaluation parameter can be committed with the same tests as one changing just one parameter?
The patch Marco refers to there is different from the present one. That patch changes a set of related parameters. It's just 3 tables, and at least 2 of them are closely related. In contrast, the present patch changes a set of many unrelated or loosely related parameters.
I'm happy to make those calls as needed; from time to time such exceptional PRs are made. I will request additional tests as I see fit. That's my role as maintainer...
What calls are you referring to?
"Make the call" means "make the decision" in this context...
OK. Some colleagues here think that changes to any set of parameters require only the standard SPRT tests, just as with one parameter. I disagree with this position and think that more complex changes, even in this context, should require stricter testing criteria. I'm interested in your position on this matter.
No, let's not do this wordplay.
Last time: a combo of functional patches is not really allowed because each of them can gain like 0.1 Elo and be simplified away one by one, which will lead to an endless cycle of merging a patch and then removing its parts. This is the main reason why functional combos are not allowed. It's a resource-wasting black hole.
Your statement is not true. As seen from the discussion of #1008, some members, e.g. @stockfishdeveloper, are critical of even a simpler patch. And I don't repeat like a parrot; I try to find words so that my ideas are better understood.
@vondele I did post the test results in my PR. I think you are asking candirufish to also open a PR with his patch, right?
Correct, if we have both PRs maybe we need some additional tests.
Since we have no second PR, I'll proceed with merging this one.
Tuning some parameters that scale very well the longer the time control is:
Failed STC:
https://tests.stockfishchess.org/tests/view/6313424d8202a039920e130a
LLR: -2.94 (-2.94,2.94) <-1.75,0.25>
Total: 42680 W: 11231 L: 11540 D: 19909
Ptnml(0-2): 191, 5008, 11232, 4737, 172
Passed LTC:
https://tests.stockfishchess.org/tests/view/6311e2cd874169ca52ae7933
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 53448 W: 14782 L: 14437 D: 24229
Ptnml(0-2): 101, 5214, 15740, 5577, 92
Passed VLTC:
https://tests.stockfishchess.org/tests/view/6312530cfa99a92e3002c927
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 123336 W: 33465 L: 33007 D: 56864
Ptnml(0-2): 38, 11466, 38204, 11920, 40
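For readers who want to relate the raw counts above to the Elo figures argued about in the thread, here is a minimal Python sketch. It computes a plain logistic Elo point estimate from the W/L/D totals and cross-checks the score against the pentanomial (game pair) counts. It is not fishtest's own code and omits error bars. The function names are made up for illustration. The `alpha = beta = 0.05` defaults in `sprt_bounds` are an assumption, chosen because they reproduce the (-2.94, 2.94) LLR bounds shown in the results.

```python
import math

def score_from_wld(wins, losses, draws):
    """Average score per game (win = 1, draw = 0.5, loss = 0)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games

def score_from_ptnml(ptnml):
    """Average score per game from pentanomial counts.

    ptnml[k] is the number of game pairs that scored k/2 points,
    i.e. 0, 0.5, 1, 1.5 or 2 points out of 2.
    """
    pairs = sum(ptnml)
    points = sum(0.5 * k * count for k, count in enumerate(ptnml))
    return points / (2 * pairs)

def elo_from_score(score):
    """Logistic Elo difference implied by an average score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def sprt_bounds(alpha=0.05, beta=0.05):
    """Wald SPRT stopping bounds on the log-likelihood ratio.

    alpha and beta are assumed error rates, not taken from the source.
    """
    return math.log(beta / (1.0 - alpha)), math.log((1.0 - beta) / alpha)

# (W, L, D) and Ptnml(0-2) copied from the three tests listed above.
RESULTS = {
    "STC": ((11231, 11540, 19909), (191, 5008, 11232, 4737, 172)),
    "LTC": ((14782, 14437, 24229), (101, 5214, 15740, 5577, 92)),
    "VLTC": ((33465, 33007, 56864), (38, 11466, 38204, 11920, 40)),
}

ELO = {}
for name, (wld, ptnml) in RESULTS.items():
    score = score_from_wld(*wld)
    # Both views of the same games must give the same average score.
    assert abs(score - score_from_ptnml(ptnml)) < 1e-12
    ELO[name] = elo_from_score(score)
    print(f"{name}: score = {score:.5f}, Elo estimate = {ELO[name]:+.2f}")

lower, upper = sprt_bounds()
print(f"SPRT LLR bounds: ({lower:.2f}, {upper:.2f})")
```

The point estimates come out negative at STC and positive at LTC and VLTC, matching the failed and passed outcomes. The pentanomial counts matter mostly for the variance of the estimate, which this sketch does not compute.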