Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make bot-share proportional to "Awaiting PR" #2612

Open
h-vetinari opened this issue May 15, 2024 · 6 comments
Open

make bot-share proportional to "Awaiting PR" #2612

h-vetinari opened this issue May 15, 2024 · 6 comments

Comments

@h-vetinari
Copy link
Contributor

Just noticed the following in the most recent bot-bot run:

      MigrationYaml-snappy120: 1 - gets 141.620771 seconds (1.966955 percent)
      MigrationYaml-numpy2: 20 - gets 226.593234 seconds (3.147128 percent)
      [...]
      MigrationYaml-python312: 120 - gets 1359.559402 seconds (18.882769 percent)

I don't know how the bot shares are calculated exactly, but that to me seems really wasteful, and also not very responsive to new migrations. In particular, snappy120 is an old migration where only one feedstock doesn't have a PR yet ("Not solvable" for tensorflow), and yet it receives about 2/3rds of the resources of the numpy2 migration, where we have 300 PRs awaiting action.

Also, old migrations like python312 constantly hogging 20% of our resources despite very little movement seems like a waste to me as well. I'm not sure how easy it is to square all circles here (e.g. maybe need to include migrator age), but if we were using a score per migration à la1:

   |{Awaiting PR}| + 0.2 * (|{not solvable}| + |{bot error}|)

to determine bot-share, then we'd be IMO:

  • more responsive to new migrations
  • waste less cycles on really old migrations (unless nothing else is happening)
  • still have a way to retry failed PRs regularly

Within a migration I wouldn't want to deprioritize the error cases though, IMO the choice within a migrator should be driven by the total number of children - what I want to avoid is that an early failure of a big blocker doesn't get retried until all the other small-fry feedstocks get through the queue.

CC @beckermr

Footnotes

  1. |{...}| is the cardinality of the set, i.e. how many feedstocks per status

@beckermr
Copy link
Contributor

Yeah I've messed with this a lot over the years. I've definitely made things proportionate to waiting prs and that was a problem for other reasons. (The issue is that new migrations take over and block things if you make the share proportional to the number of waiting prs.)

Also, the bot ramps up migrations automatically so if you've just merged or restarted one, you might wait for about six to ten hours to see what happens.

I'm honestly reluctant to adjust things now but open to ideas. Most of the time nobody notices which is basically the point.

@beckermr
Copy link
Contributor

One more comment is that we used to not retry old migrations very much and then we got complaints that once something was fixed, the bot didn't get back to it fast enough. It's kind of a crap shoot no matter what.

@h-vetinari
Copy link
Contributor Author

It's kind of a crap shoot no matter what.

Yeah, I get that optimizing for all cases is very tricky. Which is why I'm so adamant to at least not waste the scarce cycles we have on stuff that will 100% fail (#2613).

@h-vetinari
Copy link
Contributor Author

h-vetinari commented May 15, 2024

One thing that would IMO be quite an easy win would be to open PRs for "not solvable" feedstocks after a given time or number of retries. That way, it gets out of the bot retry queue, and also gets it to the attention of the maintainers (rather than having to go dig into the status page entrails). To me that'd be a double-win.

The same reasoning can probably be applied to "bot error" as well.

@beckermr
Copy link
Contributor

Yes this last thing we should do for sure. There are some edge cases around bot automerge but those are easy to handle.

@h-vetinari
Copy link
Contributor Author

Just realized that if we do that, it automatically also solves #2613 for free 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants