VLTC tuning #4154

FauziAkram · 2022-09-04T10:17:51Z

Tuning some parameters that scale very well the longer the time control is:

Failed STC:
https://tests.stockfishchess.org/tests/view/6313424d8202a039920e130a
LLR: -2.94 (-2.94,2.94) <-1.75,0.25>
Total: 42680 W: 11231 L: 11540 D: 19909
Ptnml(0-2): 191, 5008, 11232, 4737, 172

Passed LTC:
https://tests.stockfishchess.org/tests/view/6311e2cd874169ca52ae7933
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 53448 W: 14782 L: 14437 D: 24229
Ptnml(0-2): 101, 5214, 15740, 5577, 92

Passed VLTC:
https://tests.stockfishchess.org/tests/view/6312530cfa99a92e3002c927
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 123336 W: 33465 L: 33007 D: 56864
Ptnml(0-2): 38, 11466, 38204, 11920, 40

bench:5609606

Vizvezdenec · 2022-09-04T10:23:03Z

I honestly prefer bigpenor version which seems to do much better at VLTC.

atumanian · 2022-09-04T19:13:12Z

Are the parameters you want to change related? The guidelines on Fishtest wiki allow only a combo of at most 2 patches, which are trivial. In my opinion, changes in bunch of unrelated parameters should not be accepted in one patch. It's much better to tweak related parameters in smaller sets and, if they pass tests, proceed to the next set.
But nevertheless, thank you the effort you put for the project.

Vizvezdenec · 2022-09-04T19:49:32Z

Generally speaking everything in search is related.
Approach of tuning massive number of search constants that allows us to jump from one local maximum to another local maximum and find some parts of code that scale well simultaneously is proven to be beneficial.
It couldn't be done with 1-2 param changes since you need 60 parameters changed to switch local maximums.
Recent examples:
b0b3155
84b1940
3ec6e1d

atumanian · 2022-09-04T20:01:42Z

Then if the problem is switching to another local maximum while changing a bunch of parameters, the elo gain should be considerable and passing just standard SPRT tests is not enough.
The problem with such patches is that they are more likely to conflict or have adverse effects when combined with what other developers do at the same time.
If all the parameters if the engine can be changed at once with a standard SPRT test, for what are the guidelines that prohibit combining more than 2 trivial patches?

mstembera · 2022-09-04T21:00:23Z

@atumanian Tuning many parameters together has always been accepted and is not considered a combo.

FauziAkram · 2022-09-04T21:12:27Z

m is switching to another local maximum while changing a bunch of parameters, the elo gain should be considerable and passing just standard SPRT tests is not enough.
The problem with such patches is that they are more likely to conflict or have adverse effects when combined with what other developers do at the same time.
If all the parameters if the engine can be changed at once with a standard SPRT test, for what are the guidelines that prohibit combining more than 2 trivial patches?

Obviously, I thank bigpenor a lot for his efforts, and he is the one who inspired my recent tests.
However, the arguments I can do for this patch, in comparison with bigpenor patch.

I think that this patch produces very similar results but only 16 parameter changes, while the other patch has around 120 parameter changes.
After including the simplifications in his patch, it looks like it gains only around 2.5 Elo on VLTC, and my patch should bring around 1.5 Elo on VLTC, and 2 Elo on LTC.

Again, I'm not against accepting bigpenor patch, I am ok with either one passing, I am just excited that one of these 2 patches gets accepted fast so I can begin the next generation of tuning on top of the passed patch.

Do you think it's beneficial if we run a head-to-head match between mine and bigpenor patch at a specific time control?

atumanian · 2022-09-04T21:39:28Z

@atumanian Tuning many parameters together has always been accepted and is not considered a combo.

Then what is considered a combo?
By the way, tuning and accepting a patch are different things. I'm not against tuning many parameters at once. But they should past stricter tests to be accepted.

atumanian · 2022-09-04T21:49:26Z

This is what Fishtest wiki says:

Parameter tweaks tests

Among parameter tweaks a special sub-case is the so called union patch or combo patch, that is a bundling of patches that failed SPRT but with positive or near positive score. Sometime retesting the union as a whole passes SPRT. Due to the nature of the approach and because of each individual patch failed already, a union has some constraints:

Maximum 2 patches per union
Each patch shall be trivial, like a parameter tweak. Patches that add/remove a concept/idea/feature shall pass individually.

atumanian · 2022-09-04T21:54:58Z

Do you think it's beneficial if we run a head-to-head match between mine and bigpenor patch at a specific time control?

If you are asking me, I'm not against such a match. But, in my opinion, 2 elo increase shouldn't be enough to accept such big changes. If it were 4 elo, then things would be different.

mstembera · 2022-09-04T22:03:03Z

@atumanian A combo is a merger of two patches that were created separately. They need to be simple and so parameter tweaks are often good candidates but other simple patches qualify as well. If many parameters are tuned together it is one patch and not a combo.

atumanian · 2022-09-05T12:21:31Z

@atumanian A combo is a merger of two patches that were created separately. They need to be simple and so parameter tweaks are often good candidates but other simple patches qualify as well. If many parameters are tuned together it is one patch and not a combo.

Isn't it the lines of code changed that affect the engine's behaviour and developers' work? Why should the method of creation of a patch matter in deciding whether to commit it?

Alayan-stk-2 · 2022-09-05T12:54:44Z

Unlike most patches, tuning parameters doesn't increase the complexity of the codebase/engine. There is no reason to be reluctant to update parameters if testing show it to be beneficial.

The only reason to limit combo testing of parameters is to avoid spamming dozens of minor variations testing the same thing again and again, using a lot of resources until maybe a false positive lucks out. This is irrelevant here.

Disservin · 2022-09-05T12:57:27Z

Anyway I think vondele prefers if all the commits are squished.

atumanian · 2022-09-05T17:13:05Z

Unlike most patches, tuning parameters doesn't increase the complexity of the codebase/engine. There is no reason to be reluctant to update parameters if testing show it to be beneficial.

It's not only about comparing the complexity before and after a change. It's about the complexity of the change itself. When you change a lot of code, other developers my be working on the same parts of code. When they do some testing, and the tested code changes, the results become less relevant. Your patch may even completely invalidate the work they have done so far.
Consider an extreme example. Suppose I want to completely rewrite the body of the search function. My code is 1 line smaller than the current code, so my patch if even a simplification by your criteria. Suppose my patch adds 1 elo. But what happens with the work others have put into trying to improve the current code of the function?
That's why the more complex are the changes, the stricter tests they must pass. More complex changes create more inconvenience for the developers working on the same parts of the code. Such changes need to provide more benefit as a compensation for the inconvenience.

mstembera · 2022-09-05T22:05:54Z

@atumanian I don't know why you refuse to believe what everyone is telling you but here you go #1008 (comment) straight from a maintainer.

Vizvezdenec · 2022-09-06T01:29:20Z

there were like dozen commits like this. And this patch is actually good for development because we've been stuck in local search maximum for a while. Maybe with tuning we can go some stuff better.

vondele · 2022-09-06T05:03:13Z

yes, no objects to this kind of patch from my side. However, I'd like to see the PR (with associated test results) by @candirufish before merging one of these tunes.

atumanian · 2022-09-06T11:02:35Z

yes, no objects to this kind of patch from my side. However, I'd like to see the PR (with associated test results) by @candirufish before merging one of these tunes.

So you think that changing every search and evaluation parameter in one patch can be commited with the same tests as changing of just one parameter?

atumanian · 2022-09-06T11:06:53Z

@atumanian I don't know why you refuse to believe what everyone is telling you but here you go #1008 (comment) straight from a maintainer.

The patch Marco refers there is different from the present one. That match patches a set of related parameters. It's just 3 tables and at least 2 of them are closely related. In contrast, the present patch changes a set of many unrelated of loosely related parameters.
Anyway, my opinion is not decisive, since I'm not a maintainer.

vondele · 2022-09-06T11:10:22Z

So you think that changing every search and evaluation parameter in one patch can be commited with the same tests as changing of just one parameter?

I'm happy to make those calls as needed, from time to time those exceptional PRs are made. I will request additional tests as I see fit. That's my role as maintainer...

atumanian · 2022-09-06T11:12:45Z

I'm happy to make those calls as needed, from time to time those exceptional PRs are made. I will request additional tests as I see fit. That's my role as maintainer...

What calls are you referring to?

vondele · 2022-09-06T11:15:30Z

make the call means make the decision, in this context...

atumanian · 2022-09-06T11:21:31Z

make the call means make the decision, in this context...

Ok. Some colleagues here think that changes any set of parameters require only the standard SPRT tests, just as with one parameter. I disagree with this position are think that more complex changes even in this context should require stricter testing criteria. I'm interested in your position on this matter.

Vizvezdenec · 2022-09-06T11:26:36Z

No, let's not do this wordplay.
Not "some colleagues".
Every single sf contributor that posted anything in PRs like this has no problem with this apart from you, period.
I'm extremely tired of reading this because you produce nothing new and repeat the same phrase in different variations like a parrot.

Vizvezdenec · 2022-09-06T11:29:18Z

Last time - combo of functional patches is not really allowed because they can all gain like 0,1 elo and be simplified away one by one. Which will lead to endless cycle of merging patch and then removing it parts. This is the main reason why functional combos are not allowed. It's a resource wasting blackhole.
Which is not the case for param tweaks since you can't "simplify away" param tweaks.

atumanian · 2022-09-06T11:39:04Z

No, let's not do this wordplay. Not "some colleagues". Every single sf contributor that posted anything in PRs like this has no problem with this apart from you, period. I'm extremely tired of reading this because you produce nothing new and repeat the same phrase in different variations like a parrot.

Your statement is not true. As seen from the discussion of #1008 some members, e.g. @stockfishdeveloper are critical of even a simpler patch. And I don't repeat like a parrot - I try to find words so that my ideas are better understood.

FauziAkram · 2022-09-06T15:54:03Z

to this kind of patch from my side. However, I'd like to see the PR (with associated test results) by @candirufish before merging one of these tunes.

@vondele I did post the test results in my PR, I think you are asking candirufish also to open a PR with his patch right?

vondele · 2022-09-06T16:13:27Z

@vondele I did post the test results in my PR, I think you are asking candirufish also to open a PR with his patch right?

correct, if we have both PRs maybe we need some additional tests.

vondele · 2022-09-07T05:33:53Z

since we have no second PR, I'll proceed merging this one.

FauziAkram added 5 commits August 31, 2022 14:45

Update search.cpp

45ebbea

Update search.cpp

01fc846

Update search.cpp

de8a095

Update AUTHORS

fb8e958

bench:5609606

3a20ca9

bench:5609606

vondele added the to be merged Will be merged shortly label Sep 7, 2022

vondele closed this in 5eeb96d Sep 7, 2022

Vizvezdenec mentioned this pull request Sep 23, 2022

Equal quotas of CPU power for each user (description of the problem and proposal) official-stockfish/fishtest#1427

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

VLTC tuning #4154

VLTC tuning #4154

FauziAkram commented Sep 4, 2022

Vizvezdenec commented Sep 4, 2022

atumanian commented Sep 4, 2022 •

edited

Loading

Vizvezdenec commented Sep 4, 2022

atumanian commented Sep 4, 2022 •

edited

Loading

mstembera commented Sep 4, 2022

FauziAkram commented Sep 4, 2022 •

edited

Loading

atumanian commented Sep 4, 2022 •

edited

Loading

atumanian commented Sep 4, 2022

atumanian commented Sep 4, 2022

mstembera commented Sep 4, 2022

atumanian commented Sep 5, 2022 •

edited

Loading

Alayan-stk-2 commented Sep 5, 2022

Disservin commented Sep 5, 2022

atumanian commented Sep 5, 2022

mstembera commented Sep 5, 2022

Vizvezdenec commented Sep 6, 2022

vondele commented Sep 6, 2022

atumanian commented Sep 6, 2022

atumanian commented Sep 6, 2022

vondele commented Sep 6, 2022

atumanian commented Sep 6, 2022

vondele commented Sep 6, 2022

atumanian commented Sep 6, 2022 •

edited

Loading

Vizvezdenec commented Sep 6, 2022

Vizvezdenec commented Sep 6, 2022 •

edited

Loading

atumanian commented Sep 6, 2022 •

edited

Loading

FauziAkram commented Sep 6, 2022

vondele commented Sep 6, 2022

vondele commented Sep 7, 2022

VLTC tuning #4154

VLTC tuning #4154

Conversation

FauziAkram commented Sep 4, 2022

Vizvezdenec commented Sep 4, 2022

atumanian commented Sep 4, 2022 • edited Loading

Vizvezdenec commented Sep 4, 2022

atumanian commented Sep 4, 2022 • edited Loading

mstembera commented Sep 4, 2022

FauziAkram commented Sep 4, 2022 • edited Loading

atumanian commented Sep 4, 2022 • edited Loading

atumanian commented Sep 4, 2022

Parameter tweaks tests

atumanian commented Sep 4, 2022

mstembera commented Sep 4, 2022

atumanian commented Sep 5, 2022 • edited Loading

Alayan-stk-2 commented Sep 5, 2022

Disservin commented Sep 5, 2022

atumanian commented Sep 5, 2022

mstembera commented Sep 5, 2022

Vizvezdenec commented Sep 6, 2022

vondele commented Sep 6, 2022

atumanian commented Sep 6, 2022

atumanian commented Sep 6, 2022

vondele commented Sep 6, 2022

atumanian commented Sep 6, 2022

vondele commented Sep 6, 2022

atumanian commented Sep 6, 2022 • edited Loading

Vizvezdenec commented Sep 6, 2022

Vizvezdenec commented Sep 6, 2022 • edited Loading

atumanian commented Sep 6, 2022 • edited Loading

FauziAkram commented Sep 6, 2022

vondele commented Sep 6, 2022

vondele commented Sep 7, 2022

atumanian commented Sep 4, 2022 •

edited

Loading

atumanian commented Sep 4, 2022 •

edited

Loading

FauziAkram commented Sep 4, 2022 •

edited

Loading

atumanian commented Sep 4, 2022 •

edited

Loading

atumanian commented Sep 5, 2022 •

edited

Loading

atumanian commented Sep 6, 2022 •

edited

Loading

Vizvezdenec commented Sep 6, 2022 •

edited

Loading

atumanian commented Sep 6, 2022 •

edited

Loading