
Equal quotas of CPU power for each user (description of the problem and proposal) #1427

Closed
atumanian opened this issue Sep 21, 2022 · 36 comments


@atumanian

atumanian commented Sep 21, 2022

I had already started this topic on the Discord channel, but decided to write my thoughts here as well, so that they are more visible.

The problem

Fishtest is currently dominated by a few developers who submit tests at a high rate, running 5-10 of them simultaneously, each at normal throughput. Those who are more parsimonious have a hard time testing their ideas.
Here are some statistics to demonstrate the degree of inequality. The table below shows the distribution of tests and CPU cores among users (at 15:00 yesterday):

User    Tests   Cores   Share of cores
Viz     10      1926    50.4%
SFi     3       1186    31.0%
sg      2       430     11.2%
Fis     1       135     3.5%
xot     1       87      2.3%
atu     1       60      1.6%
Total   18      3824    100%

Moreover, the leader in this table decided that 100% throughput was too much for my only test and lowered it to 50%.

The current practice is unfair to developers who employ a more parsimonious approach to testing and run tests one after another. It is also detrimental to the development of Stockfish: aggressive developers take CPU power away from others who work on different areas of improvement or who simply use the power more efficiently.
To list the negative effects:

  • Efficient testing plans aren't encouraged at all, because you can get more CPU power only through the sheer number of tests you submit.
  • Since aggressive developers work on only a few parts of the code, other areas of the engine remain underdeveloped.

My proposal

The proposal is simple: rewrite the server so that it schedules tasks in such a way that every user gets an equal amount of CPU power across all the tests they are running. That power is then distributed within each user's quota proportionally to the throughput value set for each test. A user can run one test at a time or many of them, as they choose, but this shouldn't change the overall amount of power allocated to them. A minimal sketch of this allocation appears after the list of benefits below.
The benefits of this approach are:

  • Developers will be encouraged to seek more efficient ways to test their ideas, since their resources are limited.
  • It greatly reduces the competition for power among developers, so there will be less blaming of one another for wasting resources.
  • A wider variety of ideas and areas of development will find a place on Fishtest.
  • It encourages cooperation among developers: if someone thinks their ideas need more CPU power, they will try to persuade others to join in testing them.
  • The policy is enforced automatically, which removes the human factor from distributing the CPU power.
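
To make the proposed allocation concrete, here is a minimal sketch (as referenced above). The function name and data layout are hypothetical and purely illustrative; Fishtest's actual scheduler works nothing like this today.

```python
def allocate_cores(total_cores, tests_by_user):
    """Illustrative two-level fair-share split (not Fishtest's real scheduler).

    tests_by_user maps each username to a list of (test_id, throughput)
    pairs, where throughput is the per-test setting (1.0 for 100%,
    0.5 for 50%, and so on).  Every active user gets an equal share of
    the core pool; within a user's share, cores are split among their
    tests proportionally to the throughput values.
    """
    allocation = {}
    if not tests_by_user:
        return allocation
    user_share = total_cores / len(tests_by_user)  # equal quota per user
    for user, tests in tests_by_user.items():
        total_throughput = sum(tp for _, tp in tests)
        for test_id, tp in tests:
            allocation[test_id] = user_share * tp / total_throughput
    return allocation

# A user running ten tests gets the same total power as a user running
# one: ten tests at 191.2 cores each vs. a single test at 1912 cores.
print(allocate_cores(3824, {
    "Viz": [("viz-%d" % i, 1.0) for i in range(10)],
    "atu": [("atu-0", 1.0)],
}))
```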
@vondele
Member

vondele commented Sep 21, 2022

no thanks.

@atumanian

This comment was marked as abuse.

@vondele
Member

vondele commented Sep 21, 2022

that the current scheme works fine, no need to change it.

@atumanian

This comment was marked as abuse.

@peregrineshahin
Contributor

Just submit whatever you want and let the approvers decide whether it is worth testing. I don't know why you took such a negative impression from this one occasion. You are allowed to test, I think. But a master-vs-master test should still be put at 50% throughput, IMO.

@atumanian

This comment was marked as abuse.

@Disservin
Member

This is just nonsense. Tests take time; that's normal. Engine development takes time, and at this level it takes multiple tries to get a change right. You might also be the only one who considers this a bug, and you yourself stated that it only makes analysis a bit worse; Stockfish isn't really developed for pure analysis.

There was a recent patch by me that increases the singular extension check depth when the previous depth was really high. By your logic, was this also a bug? This issue reads as if you are annoyed that, in total, you have fewer cores than people who submit many patches, but no one cares about that: we have no time pressure, and viz has made many successful patches in the past and will make more in the future.

@Sopel97
Member

Sopel97 commented Sep 21, 2022

> > that the current scheme works fine, no need to change it.
>
> I've explained why the current practice is not fine. What are your arguments?

You've shown that there is a disparity in how tests are distributed among users, not that there is an issue. Currently there is no issue, as tests complete and don't pile up into a backlog.

@atumanian

This comment was marked as spam.

@mstembera

There is an important distinction between someone having 6 different ideas and submitting 6 unique patches, versus someone having one idea and simultaneously submitting 6 versions of it that differ only by constants. Yes, tests still complete, but if one user has 50% of Fishtest to themselves, everyone else's tests take twice as long as they otherwise would.

As a result, I quietly stopped contributing CPU time to Fishtest, and while I still participate in development, this is a big reason I do so less than in the past. That this issue keeps recurring shows it's discouraging to others as well. I therefore wonder how good this situation is for SF development in the long run. There were some constructive thoughts by @vondele here official-stockfish/Stockfish#3234 (comment) and a nice proposal to improve things here #869. I wish they would get acted on.

@atumanian

This comment was marked as abuse.

@atumanian

This comment was marked as abuse.

@TheBlackPlague

As mentioned in the relevant Discord #ideas channel post... there are a few issues with the suggestion. I would suggest you make an effort not to dodge these issues and actually give reasonable responses to them:

The main goal of Stockfish is to be the strongest chess engine in the world. Providing world-class analysis is something that comes along with it. To put it bluntly, Magnus Carlsen's analysis of a game would be world-class compared to that of a 1200-rated player on Lichess. Likewise, Stockfish, being the strongest engine in the world, provides world-class analysis by comparison. This by no means makes position solving or analysis Stockfish's main goal; it is just an added benefit that, to be the top engine, Stockfish has to analyze the vast majority of positions accurately. The way I see it, gaining elo is currently the main goal of Stockfish.

Fishtest exists to provide SF contributors with actionable statistical data. Thus, it makes sense to allow some tests that stray from the main goal but help collect such data. However, the main goal should still be prioritized, precisely because it is the main goal. That is why there are guidelines in place that outline how many computational resources each patch/test gets. This makes the most efficient use of the computational resources available to Fishtest and most benefits the path chosen by SF contributors, which is exactly what Fishtest should be doing.

The current way patches are tested for SF is SPRT. What this means is that if a patch is either really good or really bad, the test finishes relatively quickly. Hence, even if one is talking about quality over quantity, it is never an issue to begin with: write good patches and they'll pass quickly; write bad patches and the same applies. Neutral patches are where the issue lies. And if your patch is neutral, it doesn't matter whether you think it was a high-quality or well-thought-out change; statistically, it doesn't work out, and hence it is the same as all the other 10 patches submitted by a single user using up 50% of Fishtest's computational resources. Why should your patch be any different?
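
To illustrate that stopping behaviour, here is a textbook Wald SPRT sketch on simplified win/loss outcomes. It omits draws and the elo parametrisation Fishtest actually uses, and the function is hypothetical, not Fishtest code:

```python
import math
import random

def sprt(results, p0=0.50, p1=0.52, alpha=0.05, beta=0.05):
    """Classic SPRT on win/loss outcomes (draws ignored for brevity).

    p0 and p1 are the win probabilities under H0 (no gain) and H1
    (gain).  Returns "H1", "H0", or None if the data ran out undecided.
    """
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    llr = 0.0                              # running log-likelihood ratio
    for win in results:
        llr += math.log(p1 / p0) if win else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1"
        if llr <= lower:
            return "H0"
    return None

# A clearly good patch (55% wins) crosses a bound quickly; a clearly
# bad one fails just as fast.  Neutral patches linger the longest.
random.seed(1)
print(sprt(random.random() < 0.55 for _ in range(100_000)))  # usually "H1"
print(sprt(random.random() < 0.45 for _ in range(100_000)))  # usually "H0"
```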

In the Discord post and here, you are specifically targeting a Stockfish contributor for using a vast majority of Fishtest resources with a multitude of patches: the same contributor who has helped Stockfish gain a lot of elo in the past, furthering Stockfish on its main path. Why shouldn't more computational resources be invested in his tests?

I think what's missing is an understanding of investment. Just as a company wouldn't invest equally in everything it does, especially not in projects that stray from its main goal, why should Fishtest invest in work that strays from Stockfish's main goal?

@Disservin
Member

So far, I think the people with the most tests have made Stockfish gain the most elo. So…

@atumanian

This comment was marked as spam.

@atumanian

This comment was marked as abuse.

@TheBlackPlague

TheBlackPlague commented Sep 22, 2022

> I don't agree with this. Why is gaining elo the main goal of Stockfish? As I said before, the official page talks about analysis with the engine.

Quite frankly, it doesn't matter whether you agree with it or not. You're always welcome to fork Stockfish and start your own analysis version (as long as you adhere to the license). You don't devise the primary goal of Stockfish; the active contributors of the past years do, and this is the goal they've set. It is also the goal Stockfish started with.

> In analysis we need evaluations to be as accurate as possible. It's not a big concern if the engine is 10 elo weaker. Does it matter how strongly the engine plays if it shows incorrect information?

Again, if analysis is what you seek and you're fine with a weaker engine, feel free to fork Stockfish and do your thing. Official Stockfish refuses to support it; therefore, Fishtest (unless you fork and start your own instance) also refuses to support it with full power. I repeat myself: it doesn't matter whether you agree with this or not.

> Other parts of the code remain underdeveloped...

False statement. There are many techniques implemented in Stockfish's search that complement one another. Techniques like the history heuristic greatly assist late move reductions in being as effective as they are.

> This was the first try on an idea and it passed. Why? I didn't submit multiple versions of the patch to Fishtest - I just ran many speed tests on my local computer and submitted the best version. This is an efficient testing plan.

This is what we call a super small sample size. How many tests like this have you passed on the first try? Are you saying that if I flip a coin a certain way and it comes up heads once, it will always be heads? See how your statements are made with no statistical evidence?

@jnlt3

jnlt3 commented Sep 22, 2022

This is not about the people; each good idea matters. If a developer can come up with 6 valid ideas (and yes, this does include variations of the same idea; they are essential to development), each one of those ideas deserves as much processing power as the others.

@mstembera

@dsekercioglu Yes, and (yes) :) All I'm saying is: don't be selfish and schedule all variations at the same time, taking 50% of Fishtest for yourself. If Fishtest could balance this automatically, we wouldn't have to worry about it.

@TheBlackPlague

> @dsekercioglu Yes, and (yes) :) All I'm saying is: don't be selfish and schedule all variations at the same time, taking 50% of Fishtest for yourself. If Fishtest could balance this automatically, we wouldn't have to worry about it.

Doing so actually allows Stockfish to get a gainer much of the time. This is a feature, not a bug.

@mstembera

@TheBlackPlague How is scheduling many variants simultaneously any more likely to produce an elo gainer than scheduling those same variants, say, 2 or 3 at a time? In fact, if one variant from the first batch passes, the others may not need to be scheduled at all. In the meantime, the other developers can be more productive because their turnarounds are faster.

@TheBlackPlague

> @TheBlackPlague How is scheduling many variants simultaneously any more likely to produce an elo gainer than scheduling those same variants, say, 2 or 3 at a time? In fact, if one variant from the first batch passes, the others may not need to be scheduled at all. In the meantime, the other developers can be more productive because their turnarounds are faster.

Do you know when tests are going to pass or fail? No. Hence, a test may pass or fail while I'm asleep, and the next one could then only be tested tomorrow (making it take longer for the whole idea and its variations to be tested).

Fishtest isn't poor. It isn't in need of this kind of worrying; it has a lot of cores.

@mstembera

Yes, waiting about the same amount of time as everyone else is fair. Those cores are not going idle; they work on everyone's patches. One person over-scheduling comes at the expense of everyone else.

@Vizvezdenec

Vizvezdenec commented Sep 23, 2022

Yeah, I'm all for this being implemented - I will finally have a serious reason to stop trying to contribute to SF.
It would of course be pretty interesting if this happened because of one guy who has been inactive for 4 years and has contributed exactly zero elo gainers, and whose only reason for this mega-proposal is a massive butthurt that I didn't let his incredibly useful "160k games of master vs master" test run at full throughput, because by general intelligence it's absolutely useless bullshit.
But life is showing us nowadays that even weirder stuff can happen.

@Vizvezdenec

And yeah, you seem to be really keen on spinning up long discussions where you repeat the same "reasoning" almost no one buys as if it were some new point. Repeating the same shit 50 times in a row won't make it 50 times more viable.
official-stockfish/Stockfish#4154

@vdbergh
Contributor

vdbergh commented Sep 23, 2022

I do not understand why the OP objects to running a statistics gathering test at 50 percent throughput. It is standard Fishtest practice that research should be carried out at lower throughput.

@TheBlackPlague

> I do not understand why the OP objects to running a statistics gathering test at 50 percent throughput. It is standard Fishtest practice that research should be carried out at lower throughput.

He believes that Stockfish contributors should change the primary goal of Stockfish from being the strongest engine in the world to being a better analysis tool (based on what criteria, I don't know).

This is evident with him saying the following:

> In analysis we need evaluations to be as accurate as possible. It's not a big concern if the engine is 10 elo weaker. Does it matter how strongly the engine plays if it shows incorrect information?

Therefore, with his thinking in mind, one can see why he believes this test should run at full throughput. Anyone rational enough to realize that Stockfish isn't a project that revolves around him knows that this is not the standard and likely never will be, since Stockfish's primary goal was, is, and will remain being the strongest engine in the world.

@atumanian

This comment was marked as abuse.

@TheBlackPlague

> To be clear, I consider gaining elo one of the main goals of Stockfish but not the only one.

No, that's the only one. Everything else is just another use of Stockfish, not a primary goal.

> Mind also that both Chess.com and Lichess use Stockfish for analysis. If they switch to Komodo, will it be good for Stockfish's reputation?

Mind my language, but Stockfish doesn't give two shits what either of them uses. They use Stockfish because it's the strongest engine. That's it, period. If Stockfish continues to be the strongest engine, they'll continue to use it. It's that simple.

> I don't understand how you can develop a component of the engine if no tests are run on it.

Well, that seems like a YOU problem, because most people who understand Stockfish's search see how the history heuristic assists late move reductions.

> It wasn't a coinflip, because I selected the version that had performed best in local tests, not a random one.

That has nothing to do with what I said. @Vizvezdenec literally just said you ramble on pointlessly, and you're doing just that. For example, your argument here was that Stockfish contributors actually care about what Chess.com or Lichess use. No, they do not.

Quite frankly, I understand your urge to have a pointless discussion, but could you stop? The majority of people on this issue have already stated that they're against what you proposed. Hence, it won't be happening. Start your own instance of Fishtest if you really want it.

@atumanian

This comment was marked as spam.

@Vizvezdenec

https://github.com/official-stockfish/Stockfish/graphs/contributors?from=2021-09-23&to=2022-09-23&type=c
https://github.com/official-stockfish/Stockfish/graphs/contributors?from=2020-09-23&to=2022-09-23&type=c
Btw, this is not 2015, when +10 elo patches were hanging all over the place.
Half of this is "submit 5 versions of the same idea, see what works".
And it's not only me; some people have managed to get elo gainers on the 13th-15th try of the same idea in different forms / with different constants.

@Vizvezdenec

Vizvezdenec commented Sep 23, 2022

> > @TheBlackPlague How is scheduling many variants simultaneously any more likely to produce an elo gainer than scheduling those same variants, say, 2 or 3 at a time? In fact, if one variant from the first batch passes, the others may not need to be scheduled at all. In the meantime, the other developers can be more productive because their turnarounds are faster.
>
> Very nice point. Running related tests one after another is always better than running them simultaneously. This is an obvious flaw in the system that encourages wasting resources and discourages efficient approaches.

Can you at least not try to blatantly LIE?
You tried to simplify away dynamic contempt, wasting over 2 million STC games on something that had basically reached 4 sigma of being elo-positive. And back then we didn't have 5k cores.
You dare to open your mouth about efficient approaches?
At least I've never tried to waste 2 million games on simplifying away a part of the code I don't like while it was clearly showing a reasonable gain all that time.
Efficient approacher, my ass, holy shit.

@atumanian

This comment was marked as abuse.

@vondele
Member

vondele commented Sep 23, 2022

can we stop this discussion? It is pointless and a waste of time.

@atumanian

This comment was marked as abuse.

@zungur closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 23, 2022
@atumanian

This comment was marked as spam.

@official-stockfish locked and limited conversation to collaborators on Sep 23, 2022