
Equal quotas of CPU power for each user (description of the problem and proposal) #1427

Closed
atumanian opened this issue Sep 21, 2022 · 36 comments


@atumanian

atumanian commented Sep 21, 2022

I had already started this topic on the Discord channel, but decided to write my thoughts here as well, so that they are more visible.

The problem

Fishtest is currently dominated by a few developers who submit tests at a high rate, running 5-10 of them simultaneously, each at normal throughput. Those who are more parsimonious have a hard time testing their ideas.
Here are some statistics to demonstrate the degree of inequality. The table below shows the distribution of tests and CPU cores among users (at 15:00 yesterday):

User    Tests   Cores   Share of cores
Viz     10      1926    50.4%
SFi     3       1186    31.0%
sg      2       430     11.2%
Fis     1       135     3.5%
xot     1       87      2.3%
atu     1       60      1.6%
Total   18      3824    100%

Moreover, the leader in this table decided that 100% throughput was too much for my only test and lowered it to 50%.

The current practice is unfair to developers who employ a more parsimonious approach to testing and run tests one after another. It is also detrimental to the development of Stockfish: aggressive developers take CPU power away from others who work on different areas of improvement or who simply use the power more efficiently.
To list the negative effects:

  • Efficient testing plans aren't encouraged at all, because you can get more CPU power only through the sheer number of tests you submit.
  • Since aggressive developers work on only a few parts of the code, other areas of the engine remain underdeveloped.

My proposal

The proposal is simple: rewrite the server so that it schedules tasks in such a way that every user gets an equal amount of CPU power across all the tests they are running. That power is then distributed within each user's quota proportionally to the throughput value set for each test. A user can run one test at a time or many of them, as they choose, but this shouldn't change the overall amount of power allocated to them. A minimal sketch of this allocation appears after the list of benefits below.
The benefits of this approach are:

  • Developers will be encouraged to seek more efficient ways to test their ideas, since their resources are limited.
  • It greatly reduces the competition for power among developers, so there will be less blaming of one another for wasting resources.
  • A wider variety of ideas and areas of development will find a place on Fishtest.
  • It encourages cooperation among developers: if someone thinks their ideas need more CPU power, they will try to persuade others to join in testing them.
  • The policy is enforced automatically, which removes the human factor from distributing the CPU power.
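
To make the proposed allocation concrete, here is a minimal sketch (as referenced above). The function name and data layout are hypothetical and purely illustrative; Fishtest's actual scheduler works nothing like this today.

```python
def allocate_cores(total_cores, tests_by_user):
    """Illustrative two-level fair-share split (not Fishtest's real scheduler).

    tests_by_user maps each username to a list of (test_id, throughput)
    pairs, where throughput is the per-test setting (1.0 for 100%,
    0.5 for 50%, and so on).  Every active user gets an equal share of
    the core pool; within a user's share, cores are split among their
    tests proportionally to the throughput values.
    """
    allocation = {}
    if not tests_by_user:
        return allocation
    user_share = total_cores / len(tests_by_user)  # equal quota per user
    for user, tests in tests_by_user.items():
        total_throughput = sum(tp for _, tp in tests)
        for test_id, tp in tests:
            allocation[test_id] = user_share * tp / total_throughput
    return allocation

# A user running ten tests gets the same total power as a user running
# one: ten tests at 191.2 cores each vs. a single test at 1912 cores.
print(allocate_cores(3824, {
    "Viz": [("viz-%d" % i, 1.0) for i in range(10)],
    "atu": [("atu-0", 1.0)],
}))
```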
@vondele
Member

vondele commented Sep 21, 2022

no thanks.

@atumanian

This comment was marked as abuse.

@vondele
Member

vondele commented Sep 21, 2022

that the current scheme works fine, no need to change it.

@atumanian

This comment was marked as abuse.

@peregrineshahin
Contributor

Just submit whatever you want and let the approvers decide whether it is worth testing. I don't know why you took such a negative impression from this one occasion. You are allowed to test, I think. But a master-vs-master test should still be put at 50% throughput, IMO.

@atumanian

This comment was marked as abuse.

@Disservin
Member

This is just nonsense. Tests take time; that's normal. Engine development takes time, and at this level it takes multiple tries to get a change right. You might also be the only one who considers this a bug, and you yourself stated that it only makes analysis a bit worse; Stockfish isn't really developed for pure analysis.

There was a recent patch by me that increases the singular extension check depth when the previous depth was really high. By your logic, was this also a bug? This issue reads as if you are annoyed that, in total, you have fewer cores than people who submit many patches, but no one cares about that: we have no time pressure, and viz has made many successful patches in the past and will make more in the future.

@Sopel97
Member

Sopel97 commented Sep 21, 2022

> > that the current scheme works fine, no need to change it.
>
> I've explained why the current practice is not fine. What are your arguments?

You've shown that there is a disparity in how tests are distributed among users, not that there is an issue. Currently there is no issue, as tests complete and don't pile up into a backlog.

@atumanian

This comment was marked as spam.

@mstembera

There is an important distinction between someone having 6 different ideas and submitting 6 unique patches, versus someone having one idea and simultaneously submitting 6 versions of it that differ only by constants. Yes, tests still complete, but if one user has 50% of Fishtest to themselves, everyone else's tests take twice as long as they otherwise would.

As a result, I quietly stopped contributing CPU time to Fishtest, and while I still participate in development, this is a big reason I do so less than in the past. That this issue keeps recurring shows it's discouraging to others as well. I therefore wonder how good this situation is for SF development in the long run. There were some constructive thoughts by @vondele here official-stockfish/Stockfish#3234 (comment) and a nice proposal to improve things here #869. I wish they would get acted on.

@atumanian

This comment was marked as abuse.

@atumanian

This comment was marked as abuse.

@TheBlackPlague

As mentioned in the relevant Discord #ideas channel post... there are a few issues with the suggestion. I would suggest you make an effort not to dodge these issues and actually give reasonable responses to them:

The main goal of Stockfish is to be the strongest chess engine in the world. Providing world-class analysis is something that comes along with it. To put it bluntly, Magnus Carlsen's analysis of a game would be world-class compared to that of a 1200-rated player on Lichess. Likewise, Stockfish, being the strongest engine in the world, provides world-class analysis by comparison. This by no means makes position solving or analysis Stockfish's main goal; it is just an added benefit that, to be the top engine, Stockfish has to analyze the vast majority of positions accurately. The way I see it, gaining elo is currently the main goal of Stockfish.

Fishtest exists to provide SF contributors with actionable statistical data. Thus, it makes sense to allow some tests that stray from the main goal but help collect such data. However, the main goal should still be prioritized, precisely because it is the main goal. That is why there are guidelines in place that outline how many computational resources each patch/test gets. This makes the most efficient use of the computational resources available to Fishtest and most benefits the path chosen by SF contributors, which is exactly what Fishtest should be doing.

The current way patches are tested for SF is SPRT. What this means is that if a patch is either really good or really bad, the test finishes relatively quickly. Hence, even if one is talking about quality over quantity, it is never an issue to begin with: write good patches and they'll pass quickly; write bad patches and the same applies. Neutral patches are where the issue lies. And if your patch is neutral, it doesn't matter whether you think it was a high-quality or well-thought-out change; statistically, it doesn't work out, and hence it is the same as all the other 10 patches submitted by a single user using up 50% of Fishtest's computational resources. Why should your patch be any different?
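
To illustrate that stopping behaviour, here is a textbook Wald SPRT sketch on simplified win/loss outcomes. It omits draws and the elo parametrisation Fishtest actually uses, and the function is hypothetical, not Fishtest code:

```python
import math
import random

def sprt(results, p0=0.50, p1=0.52, alpha=0.05, beta=0.05):
    """Classic SPRT on win/loss outcomes (draws ignored for brevity).

    p0 and p1 are the win probabilities under H0 (no gain) and H1
    (gain).  Returns "H1", "H0", or None if the data ran out undecided.
    """
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    llr = 0.0                              # running log-likelihood ratio
    for win in results:
        llr += math.log(p1 / p0) if win else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1"
        if llr <= lower:
            return "H0"
    return None

# A clearly good patch (55% wins) crosses a bound quickly; a clearly
# bad one fails just as fast.  Neutral patches linger the longest.
random.seed(1)
print(sprt(random.random() < 0.55 for _ in range(100_000)))  # usually "H1"
print(sprt(random.random() < 0.45 for _ in range(100_000)))  # usually "H0"
```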

In the Discord post and here, you are specifically targeting a Stockfish contributor for using a vast majority of Fishtest resources with a multitude of patches: the same contributor who has helped Stockfish gain a lot of elo in the past, furthering Stockfish on its main path. Why shouldn't more computational resources be invested in his tests?

I think what's missing is an understanding of investment. Just as a company wouldn't invest equally in everything it does, especially not in projects that stray from its main goal, why should Fishtest invest in work that strays from Stockfish's main goal?

@Disservin
Member

So far, I think the people with the most tests have made Stockfish gain the most elo. So…

@atumanian

This comment was marked as spam.

@atumanian

This comment was marked as abuse.

@TheBlackPlague

TheBlackPlague commented Sep 22, 2022

> I don't agree with this. Why is gaining elo the main goal of Stockfish? As I said before, the official page talks about analysis with the engine.

Quite frankly, it doesn't matter whether you agree with it or not. You're always welcome to fork Stockfish and start your own analysis version (as long as you adhere to the license). You don't devise the primary goal of Stockfish; the active contributors of the past years do, and this is the goal they've set. It is also the goal Stockfish started with.

> In analysis we need evaluations to be as accurate as possible. It's not a big concern if the engine is 10 elo weaker. Does it matter how strongly the engine plays if it shows incorrect information?

Again, if analysis is what you seek and you're fine with a weaker engine, feel free to fork Stockfish and do your thing. Official Stockfish refuses to support it; therefore, Fishtest (unless you fork and start your own instance) also refuses to support it with full power. I repeat myself: it doesn't matter whether you agree with this or not.

> Other parts of the code remain underdeveloped...

False statement. There are many techniques implemented in Stockfish's search that complement one another. Techniques like the history heuristic greatly assist late move reductions in being as effective as they are.

> This was the first try on an idea and it passed. Why? I didn't submit multiple versions of the patch to Fishtest - I just ran many speed tests on my local computer and submitted the best version. This is an efficient testing plan.

This is what we call a super small sample size. How many tests like this have you passed on the first try? Are you saying that if I flip a coin a certain way and it comes up heads once, it will always be heads? See how your statements are made with no statistical evidence?

@jnlt3

jnlt3 commented Sep 22, 2022

This is not about the people; each good idea matters. If a developer can come up with 6 valid ideas (and yes, this does include variations of the same idea; they are essential to development), each one of those ideas deserves as much processing power as the others.

@mstembera

@dsekercioglu Yes, and (yes) :) All I'm saying is: don't be selfish and schedule all variations at the same time, taking 50% of Fishtest for yourself. If Fishtest could balance this automatically, we wouldn't have to worry about it.

@TheBlackPlague

> @dsekercioglu Yes, and (yes) :) All I'm saying is: don't be selfish and schedule all variations at the same time, taking 50% of Fishtest for yourself. If Fishtest could balance this automatically, we wouldn't have to worry about it.

Doing so actually allows Stockfish to get a gainer much of the time. This is a feature, not a bug.

@mstembera

@TheBlackPlague How is scheduling many variants simultaneously any more likely to produce an elo gainer than scheduling those same variants, say, 2 or 3 at a time? In fact, if one variant from the first batch passes, the others may not need to be scheduled at all. In the meantime, the other developers can be more productive because their turnarounds are faster.

@TheBlackPlague

> @TheBlackPlague How is scheduling many variants simultaneously any more likely to produce an elo gainer than scheduling those same variants, say, 2 or 3 at a time? In fact, if one variant from the first batch passes, the others may not need to be scheduled at all. In the meantime, the other developers can be more productive because their turnarounds are faster.

Do you know when tests are going to pass or fail? No. Hence, a test may pass or fail while I'm asleep, and the next one could then only be tested tomorrow (making it take longer for the whole idea and its variations to be tested).

Fishtest isn't poor. It isn't in need of this kind of worrying; it has a lot of cores.

@mstembera

Yes, waiting about the same amount of time as everyone else is fair. Those cores are not going idle; they work on everyone's patches. One person over-scheduling comes at the expense of everyone else.

@Vizvezdenec

Vizvezdenec commented Sep 23, 2022

Yeah, I'm all for this being implemented - I will finally have a serious reason to stop trying to contribute to SF.
It would of course be pretty interesting if this happened because of one guy who has been inactive for 4 years and has contributed exactly zero elo gainers, and whose only reason for this mega-proposal is a massive butthurt that I didn't let his incredibly useful "160k games of master vs master" test run at full throughput, because by general intelligence it's absolutely useless bullshit.
But life is showing us nowadays that even weirder stuff can happen.

@Vizvezdenec

And yeah, you seem to be really keen on spinning up long discussions where you repeat the same "reasoning" almost no one buys as if it were some new point. Repeating the same shit 50 times in a row won't make it 50 times more viable.
official-stockfish/Stockfish#4154

@vdbergh
Contributor

vdbergh commented Sep 23, 2022

I do not understand why the OP objects to running a statistics gathering test at 50 percent throughput. It is standard Fishtest practice that research should be carried out at lower throughput.

@TheBlackPlague

> I do not understand why the OP objects to running a statistics gathering test at 50 percent throughput. It is standard Fishtest practice that research should be carried out at lower throughput.

He believes that Stockfish contributors should change the primary goal of Stockfish from being the strongest engine in the world to being a better analysis tool (based on what criteria, I don't know).

This is evident with him saying the following:

> In analysis we need evaluations to be as accurate as possible. It's not a big concern if the engine is 10 elo weaker. Does it matter how strongly the engine plays if it shows incorrect information?

Therefore, with his thinking in mind, one can see why he believes this test should run at full throughput. Anyone rational enough to realize that Stockfish isn't a project that revolves around him knows that this is not the standard and likely never will be, since Stockfish's primary goal was, is, and will remain being the strongest engine in the world.

@atumanian

This comment was marked as abuse.

@TheBlackPlague

> To be clear, I consider gaining elo one of the main goals of Stockfish but not the only one.

No, that's the only one. Everything else is just another use of Stockfish, not a primary goal.

> Mind also that both Chess.com and Lichess use Stockfish for analysis. If they switch to Komodo, will it be good for Stockfish's reputation?

Mind my language, but Stockfish doesn't give two shits what either of them uses. They use Stockfish because it's the strongest engine. That's it, period. If Stockfish continues to be the strongest engine, they'll continue to use it. It's that simple.

> I don't understand how you can develop a component of the engine if no tests are run on it.

Well, that seems like a YOU problem, because most people who understand Stockfish's search see how the history heuristic assists late move reductions.

> It wasn't a coinflip, because I selected the version that had performed best in local tests, not a random one.

That has nothing to do with what I said. @Vizvezdenec literally just said you ramble on pointlessly, and you're doing just that. For example, your argument here was that Stockfish contributors actually care about what Chess.com or Lichess use. No, they do not.

Quite frankly, I understand your urge to have a pointless discussion, but could you stop? The majority of people on this issue have already stated that they're against what you proposed. Hence, it won't be happening. Start your own instance of Fishtest if you really want it.

@atumanian

This comment was marked as spam.

@Vizvezdenec

https://github.com/official-stockfish/Stockfish/graphs/contributors?from=2021-09-23&to=2022-09-23&type=c
https://github.com/official-stockfish/Stockfish/graphs/contributors?from=2020-09-23&to=2022-09-23&type=c
Btw, this is not 2015, when +10 elo patches were hanging all over the place.
Half of this is "submit 5 versions of the same idea, see what works".
And it's not only me; some people have managed to get elo gainers on the 13th-15th try of the same idea in different forms / with different constants.

@Vizvezdenec

Vizvezdenec commented Sep 23, 2022

> > @TheBlackPlague How is scheduling many variants simultaneously any more likely to produce an elo gainer than scheduling those same variants, say, 2 or 3 at a time? In fact, if one variant from the first batch passes, the others may not need to be scheduled at all. In the meantime, the other developers can be more productive because their turnarounds are faster.
>
> Very nice point. Running related tests one after another is always better than running them simultaneously. This is an obvious flaw in the system that encourages wasting resources and discourages efficient approaches.

Can you at least not try to blatantly LIE?
You tried to simplify away dynamic contempt, wasting over 2 million STC games on something that had basically reached 4 sigma of being elo-positive. And back then we didn't have 5k cores.
You dare to open your mouth about efficient approaches?
At least I've never tried to waste 2 million games on simplifying away a part of the code I don't like while it was clearly showing a reasonable gain all that time.
Efficient approacher, my ass, holy shit.

@atumanian

This comment was marked as abuse.

@vondele
Member

vondele commented Sep 23, 2022

can we stop this discussion? It is pointless and a waste of time.

@atumanian

This comment was marked as abuse.

@zungur closed this as not planned (won't fix, can't repro, duplicate, stale) on Sep 23, 2022
@atumanian

This comment was marked as spam.

@official-stockfish locked and limited conversation to collaborators on Sep 23, 2022