Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VLTC tuning #4154

Closed
wants to merge 5 commits into from
Closed

Conversation

FauziAkram
Copy link
Contributor

Tuning some parameters that scale very well the longer the time control is:

Failed STC:
https://tests.stockfishchess.org/tests/view/6313424d8202a039920e130a
LLR: -2.94 (-2.94,2.94) <-1.75,0.25>
Total: 42680 W: 11231 L: 11540 D: 19909
Ptnml(0-2): 191, 5008, 11232, 4737, 172

Passed LTC:
https://tests.stockfishchess.org/tests/view/6311e2cd874169ca52ae7933
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 53448 W: 14782 L: 14437 D: 24229
Ptnml(0-2): 101, 5214, 15740, 5577, 92

Passed VLTC:
https://tests.stockfishchess.org/tests/view/6312530cfa99a92e3002c927
LLR: 2.95 (-2.94,2.94) <0.50,2.50>
Total: 123336 W: 33465 L: 33007 D: 56864
Ptnml(0-2): 38, 11466, 38204, 11920, 40

@Vizvezdenec
Copy link
Contributor

I honestly prefer bigpenor version which seems to do much better at VLTC.

@atumanian
Copy link

atumanian commented Sep 4, 2022

Are the parameters you want to change related? The guidelines on Fishtest wiki allow only a combo of at most 2 patches, which are trivial. In my opinion, changes in bunch of unrelated parameters should not be accepted in one patch. It's much better to tweak related parameters in smaller sets and, if they pass tests, proceed to the next set.
But nevertheless, thank you the effort you put for the project.

@Vizvezdenec
Copy link
Contributor

Generally speaking everything in search is related.
Approach of tuning massive number of search constants that allows us to jump from one local maximum to another local maximum and find some parts of code that scale well simultaneously is proven to be beneficial.
It couldn't be done with 1-2 param changes since you need 60 parameters changed to switch local maximums.
Recent examples:
b0b3155
84b1940
3ec6e1d

@atumanian
Copy link

atumanian commented Sep 4, 2022

Then if the problem is switching to another local maximum while changing a bunch of parameters, the elo gain should be considerable and passing just standard SPRT tests is not enough.
The problem with such patches is that they are more likely to conflict or have adverse effects when combined with what other developers do at the same time.
If all the parameters if the engine can be changed at once with a standard SPRT test, for what are the guidelines that prohibit combining more than 2 trivial patches?

@mstembera
Copy link
Contributor

@atumanian Tuning many parameters together has always been accepted and is not considered a combo.

@FauziAkram
Copy link
Contributor Author

FauziAkram commented Sep 4, 2022

m is switching to another local maximum while changing a bunch of parameters, the elo gain should be considerable and passing just standard SPRT tests is not enough.
The problem with such patches is that they are more likely to conflict or have adverse effects when combined with what other developers do at the same time.
If all the parameters if the engine can be changed at once with a standard SPRT test, for what are the guidelines that prohibit combining more than 2 trivial patches?

Obviously, I thank bigpenor a lot for his efforts, and he is the one who inspired my recent tests.
However, the arguments I can do for this patch, in comparison with bigpenor patch.

  1. I think that this patch produces very similar results but only 16 parameter changes, while the other patch has around 120 parameter changes.
  2. After including the simplifications in his patch, it looks like it gains only around 2.5 Elo on VLTC, and my patch should bring around 1.5 Elo on VLTC, and 2 Elo on LTC.

Again, I'm not against accepting bigpenor patch, I am ok with either one passing, I am just excited that one of these 2 patches gets accepted fast so I can begin the next generation of tuning on top of the passed patch.

Do you think it's beneficial if we run a head-to-head match between mine and bigpenor patch at a specific time control?

@atumanian
Copy link

atumanian commented Sep 4, 2022

@atumanian Tuning many parameters together has always been accepted and is not considered a combo.

Then what is considered a combo?
By the way, tuning and accepting a patch are different things. I'm not against tuning many parameters at once. But they should past stricter tests to be accepted.

@atumanian
Copy link

This is what Fishtest wiki says:

Parameter tweaks tests

Among parameter tweaks a special sub-case is the so called union patch or combo patch, that is a bundling of patches that failed SPRT but with positive or near positive score. Sometime retesting the union as a whole passes SPRT. Due to the nature of the approach and because of each individual patch failed already, a union has some constraints:

  • Maximum 2 patches per union
  • Each patch shall be trivial, like a parameter tweak. Patches that add/remove a concept/idea/feature shall pass individually.

@atumanian
Copy link

Do you think it's beneficial if we run a head-to-head match between mine and bigpenor patch at a specific time control?

If you are asking me, I'm not against such a match. But, in my opinion, 2 elo increase shouldn't be enough to accept such big changes. If it were 4 elo, then things would be different.

@mstembera
Copy link
Contributor

@atumanian A combo is a merger of two patches that were created separately. They need to be simple and so parameter tweaks are often good candidates but other simple patches qualify as well. If many parameters are tuned together it is one patch and not a combo.

@atumanian
Copy link

atumanian commented Sep 5, 2022

@atumanian A combo is a merger of two patches that were created separately. They need to be simple and so parameter tweaks are often good candidates but other simple patches qualify as well. If many parameters are tuned together it is one patch and not a combo.

Isn't it the lines of code changed that affect the engine's behaviour and developers' work? Why should the method of creation of a patch matter in deciding whether to commit it?

@Alayan-stk-2
Copy link

Unlike most patches, tuning parameters doesn't increase the complexity of the codebase/engine. There is no reason to be reluctant to update parameters if testing show it to be beneficial.

The only reason to limit combo testing of parameters is to avoid spamming dozens of minor variations testing the same thing again and again, using a lot of resources until maybe a false positive lucks out. This is irrelevant here.

@Disservin
Copy link
Member

Anyway I think vondele prefers if all the commits are squished.

@atumanian
Copy link

Unlike most patches, tuning parameters doesn't increase the complexity of the codebase/engine. There is no reason to be reluctant to update parameters if testing show it to be beneficial.

It's not only about comparing the complexity before and after a change. It's about the complexity of the change itself. When you change a lot of code, other developers my be working on the same parts of code. When they do some testing, and the tested code changes, the results become less relevant. Your patch may even completely invalidate the work they have done so far.
Consider an extreme example. Suppose I want to completely rewrite the body of the search function. My code is 1 line smaller than the current code, so my patch if even a simplification by your criteria. Suppose my patch adds 1 elo. But what happens with the work others have put into trying to improve the current code of the function?
That's why the more complex are the changes, the stricter tests they must pass. More complex changes create more inconvenience for the developers working on the same parts of the code. Such changes need to provide more benefit as a compensation for the inconvenience.

@mstembera
Copy link
Contributor

@atumanian I don't know why you refuse to believe what everyone is telling you but here you go #1008 (comment) straight from a maintainer.

@Vizvezdenec
Copy link
Contributor

there were like dozen commits like this. And this patch is actually good for development because we've been stuck in local search maximum for a while. Maybe with tuning we can go some stuff better.

@vondele
Copy link
Member

vondele commented Sep 6, 2022

yes, no objects to this kind of patch from my side. However, I'd like to see the PR (with associated test results) by @candirufish before merging one of these tunes.

@atumanian
Copy link

yes, no objects to this kind of patch from my side. However, I'd like to see the PR (with associated test results) by @candirufish before merging one of these tunes.

So you think that changing every search and evaluation parameter in one patch can be commited with the same tests as changing of just one parameter?

@atumanian
Copy link

@atumanian I don't know why you refuse to believe what everyone is telling you but here you go #1008 (comment) straight from a maintainer.

The patch Marco refers there is different from the present one. That match patches a set of related parameters. It's just 3 tables and at least 2 of them are closely related. In contrast, the present patch changes a set of many unrelated of loosely related parameters.
Anyway, my opinion is not decisive, since I'm not a maintainer.

@vondele
Copy link
Member

vondele commented Sep 6, 2022

So you think that changing every search and evaluation parameter in one patch can be commited with the same tests as changing of just one parameter?

I'm happy to make those calls as needed, from time to time those exceptional PRs are made. I will request additional tests as I see fit. That's my role as maintainer...

@atumanian
Copy link

I'm happy to make those calls as needed, from time to time those exceptional PRs are made. I will request additional tests as I see fit. That's my role as maintainer...

What calls are you referring to?

@vondele
Copy link
Member

vondele commented Sep 6, 2022

make the call means make the decision, in this context...

@atumanian
Copy link

atumanian commented Sep 6, 2022

make the call means make the decision, in this context...

Ok. Some colleagues here think that changes any set of parameters require only the standard SPRT tests, just as with one parameter. I disagree with this position are think that more complex changes even in this context should require stricter testing criteria. I'm interested in your position on this matter.

@Vizvezdenec
Copy link
Contributor

No, let's not do this wordplay.
Not "some colleagues".
Every single sf contributor that posted anything in PRs like this has no problem with this apart from you, period.
I'm extremely tired of reading this because you produce nothing new and repeat the same phrase in different variations like a parrot.

@Vizvezdenec
Copy link
Contributor

Vizvezdenec commented Sep 6, 2022

Last time - combo of functional patches is not really allowed because they can all gain like 0,1 elo and be simplified away one by one. Which will lead to endless cycle of merging patch and then removing it parts. This is the main reason why functional combos are not allowed. It's a resource wasting blackhole.
Which is not the case for param tweaks since you can't "simplify away" param tweaks.

@atumanian
Copy link

atumanian commented Sep 6, 2022

No, let's not do this wordplay. Not "some colleagues". Every single sf contributor that posted anything in PRs like this has no problem with this apart from you, period. I'm extremely tired of reading this because you produce nothing new and repeat the same phrase in different variations like a parrot.

Your statement is not true. As seen from the discussion of #1008 some members, e.g. @stockfishdeveloper are critical of even a simpler patch. And I don't repeat like a parrot - I try to find words so that my ideas are better understood.

@FauziAkram
Copy link
Contributor Author

to this kind of patch from my side. However, I'd like to see the PR (with associated test results) by @candirufish before merging one of these tunes.

@vondele I did post the test results in my PR, I think you are asking candirufish also to open a PR with his patch right?

@vondele
Copy link
Member

vondele commented Sep 6, 2022

@vondele I did post the test results in my PR, I think you are asking candirufish also to open a PR with his patch right?

correct, if we have both PRs maybe we need some additional tests.

@vondele
Copy link
Member

vondele commented Sep 7, 2022

since we have no second PR, I'll proceed merging this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
to be merged Will be merged shortly
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants