Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow for tracking PRs assignment #1754

Closed

Conversation

apiraino
Copy link
Contributor

@apiraino apiraino commented Dec 13, 2023

General overview at: #1753

This patch implements the first part:

  • a new DB table with just the fields to track how many PRs are assigned to a contributor at any time
  • Update this table everytime a PR is assigned or unassigned

No initial sync is planned at this time. Populating the table will happen over time.

This feature is enabled by adding the following to the triagebot.toml in the Rust git repository:

# Enable this feature to keep track of the PR review assignment
[review-work-queue]

@apiraino apiraino force-pushed the new-pull-request-assignment-1 branch from a9800b7 to 7465641 Compare December 13, 2023 15:50
@apiraino apiraino marked this pull request as draft December 14, 2023 14:39
@apiraino apiraino force-pushed the new-pull-request-assignment-1 branch 6 times, most recently from b394940 to 595eb3c Compare December 29, 2023 17:01
@apiraino apiraino force-pushed the new-pull-request-assignment-1 branch from 595eb3c to 911efd9 Compare January 5, 2024 12:58
General overview at: rust-lang#1753

- Added a new DB table with the fields to track how many PRs are
  assigned to a contributor at any time
- This tDB table will be updated everytime a PR is assigned or
  unassigned

No initial sync is planned at this time. DB table population will happen
over time automatically.
@apiraino apiraino force-pushed the new-pull-request-assignment-1 branch from 911efd9 to 9d7bfcd Compare January 5, 2024 14:05
@apiraino apiraino marked this pull request as ready for review January 5, 2024 14:05
@apiraino
Copy link
Contributor Author

apiraino commented Jan 5, 2024

@jdno I think we can have a look at thist first step of implementation, as per the tracking issue #1753

cc: @rust-lang/infra

thanks!

Comment on lines +368 to +369
[review-work-queue]
enabled-for-teams = ["aaa", "bbb"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to enable this feature only for a subset of teams, rather than tracking all PRs of every contributor from the start.

But that would add a little of complexity to this PR. Maybe worth adding in a subsequent patch, when/if this one is merged?

// 2) create or increase by one the team members work queue
// create team members if they don't exist
for u in event.issue.assignees.iter() {
let user_id = u.id.expect("Github user was expected! Please investigate.");
Copy link
Contributor Author

@apiraino apiraino Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is just to cover the unlikely case where we receive a User with an id=None

Unsure why the User struct is like that but better safe than sorry.

src/db.rs Outdated Show resolved Hide resolved
@apiraino
Copy link
Contributor Author

gentle ping for review @rust-lang/infra

(maybe @ehuss?)

thanks

@Kobzol
Copy link
Contributor

Kobzol commented Jan 19, 2024

CC @jackh726, if you could please take a look.

@@ -307,5 +307,13 @@ CREATE UNIQUE INDEX jobs_name_scheduled_at_unique_index
ON jobs (
name, scheduled_at
);
",
"
CREATE table review_prefs (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention that this will eventually contain review preferences too? (That's what the name suggests)

My first thought was that there should instead be a table of PRs, with assigned reviewers in that list. But, maybe that doesn't make a bunch of sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. The new DB table at the end of the story will also persist the review preferences for each reviewer (thus the name). I'm trying to slip in changes one by one and as slowly as possible, as per the plan detailed in #1753

PRs in a separate joined table def. make sense in general. But in this case I think the use case is so minimal to not warrant that. If in the far future the tool develops such as to make it a compelling choice, we can always migrate.


// Note: When changing assignees for a PR, we don't receive the assignee(s) removed, we receive
// an event `Unassigned` and the remaining assignees

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this function to "diff" the current assignees with the new assigness, and only update based on that diff...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also why it makes sense to me that the table be of PRs with a list of assignees.

But, that complicates the querying. Could think about a table view that aggregates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this function to "diff" the current assignees with the new assigness, and only update based on that diff

Hm, I'm a bit unclear on this comment 🤔 do you suggest a "3-way" comparison between two arrays (list of current assignees from the DB vs. new list of assignees from GH)? Example:

  • assign to new assignees
  • check if current assignees are still such
  • unassign from removed assignees

(or maybe something else?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean, just compare the lists and find what needs to be removed and what needs to be added.

src/handlers/review_work_queue.rs Outdated Show resolved Hide resolved
triagebot.toml Outdated Show resolved Hide resolved
@jackh726
Copy link
Member

I'm thinking:

It's pretty trivial to get a list of all open PRs and their assignees:

Here's the graphql query (you do still have to paginate):

query {
  repository(owner: "rust-lang", name: "rust") {
    pullRequests(first: 100, states:OPEN) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        number
        assignees(first: 100) {
          nodes {
            login
            id
          }
        }
      }
    }
  }
}

It's probably worth just setting up a sync once on start, and again every 30 minutes, along with listening for events. I don't think only just waiting for the table to populate over time will really work.

I'd be curious to know how long it would take to run a whole "query the current state and update the DB". Probably not that long - even if we way over estimate, we should only have on the order of a 1000 PRs to deal with.

@apiraino apiraino force-pushed the new-pull-request-assignment-1 branch from f8ff17d to ed0389b Compare January 23, 2024 18:59
@rustbot
Copy link
Collaborator

rustbot commented Jan 23, 2024

Error: Invalid triagebot.toml at position 7:2:

TOML parse error at line 7, column 2
  |
7 | [review-work-queue]
  |  ^^^^^^^^^^^^^^^^^
unknown field `review-work-queue`, expected one of `relabel`, `assign`, `ping`, `nominate`, `prioritize`, `major-change`, `glacier`, `close`, `autolabel`, `notify-zulip`, `github-releases`, `review-submitted`, `review-requested`, `shortcut`, `note`, `mentions`, `no-merges`, `validate-config`

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@apiraino apiraino force-pushed the new-pull-request-assignment-1 branch from ed0389b to e375877 Compare January 23, 2024 19:20
@rustbot
Copy link
Collaborator

rustbot commented Jan 23, 2024

Error: Invalid triagebot.toml at position 7:2:

TOML parse error at line 7, column 2
  |
7 | [review-work-queue]
  |  ^^^^^^^^^^^^^^^^^
unknown field `review-work-queue`, expected one of `relabel`, `assign`, `ping`, `nominate`, `prioritize`, `major-change`, `glacier`, `close`, `autolabel`, `notify-zulip`, `github-releases`, `review-submitted`, `review-requested`, `shortcut`, `note`, `mentions`, `no-merges`, `validate-config`

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@apiraino
Copy link
Contributor Author

It's probably worth just setting up a sync once on start, and again every 30 minutes, along with listening for events. I don't think only just waiting for the table to populate over time will really work.

Why wouldn't it? Can you elaborate a bit on that?

I'll think about your proposal but my immediate thought I think is that I don't agree with this. We would be losing the concept of a stable "source of truth". I am afraid that querying GH every n-minutes would introduce a point of failure in the consistency, the DB would just become a 'cache' reflecting the GH situation. Then why not just make it a full real cache serializing a json and avoid the DB table altogether?

My reasoning about having the DB table is to have a storage persistence that can be used by the PR review assignment and not only by that. I imagine a future where it could be expanded and used also for statistics, for everything we want to know about PRs and reviews.

@rustbot
Copy link
Collaborator

rustbot commented Jan 24, 2024

Error: Invalid triagebot.toml at position 7:2:

TOML parse error at line 7, column 2
  |
7 | [review-work-queue]
  |  ^^^^^^^^^^^^^^^^^
unknown field `review-work-queue`, expected one of `relabel`, `assign`, `ping`, `nominate`, `prioritize`, `major-change`, `glacier`, `close`, `autolabel`, `notify-zulip`, `github-releases`, `review-submitted`, `review-requested`, `shortcut`, `note`, `mentions`, `no-merges`, `validate-config`

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@jackh726
Copy link
Member

We would be losing the concept of a stable "source of truth". I am afraid that querying GH every n-minutes would introduce a point of failure in the consistency, the DB would just become a 'cache' reflecting the GH situation.

The concern I have is that Github is the source of truth right now. Do you expect otherwise?

Then why not just make it a full real cache serializing a json and avoid the DB table altogether?

I mean, this isn't a bad option, to keep just everything in memory. But seems unnecessary.

I imagine a future where it could be expanded and used also for statistics, for everything we want to know about PRs and reviews.

I think this is useful, but not with the current table schema.

@apiraino
Copy link
Contributor Author

Closing and restarting from scratch in #1773 ...

@apiraino apiraino closed this Feb 20, 2024
@rustbot
Copy link
Collaborator

rustbot commented Feb 20, 2024

Error: Invalid triagebot.toml at position 7:2:

TOML parse error at line 7, column 2
  |
7 | [review-work-queue]
  |  ^^^^^^^^^^^^^^^^^
unknown field `review-work-queue`, expected one of `relabel`, `assign`, `ping`, `nominate`, `prioritize`, `major-change`, `glacier`, `close`, `autolabel`, `notify-zulip`, `github-releases`, `review-submitted`, `review-requested`, `shortcut`, `note`, `mentions`, `no-merges`, `validate-config`

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants