Check collection job equivalence by query, not batch ID. #1878

Merged
merged 1 commit into from
Sep 5, 2023

Conversation

branlwyd
Member

@branlwyd branlwyd commented Sep 1, 2023

This allows current-batch requests to be reissued in fixed-size tasks, simplifying implementation of a Collector & avoiding the potential for loss of a collection job due to an inopportunely-timed task failure.
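To illustrate the idea, here is a minimal sketch of an equivalence check keyed on the query rather than on the batch ID. It is not the code from this PR: `JobStore`, `Query`, `PutOutcome`, and the integer IDs are hypothetical simplifications, and the in-memory map stands in for the real datastore. The point is that a reissued current-batch request carrying the same collection job ID is recognized as the same job, so no second batch is assigned.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a fixed-size query.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Query {
    CurrentBatch,
    ByBatchId(u64),
}

/// Hypothetical collection job record: the query is stored alongside
/// the batch that was eventually assigned to it.
#[derive(Clone, Debug, PartialEq, Eq)]
struct CollectionJob {
    query: Query,
    batch_id: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum PutOutcome {
    Created,
    AlreadyExists,
    Conflict,
}

/// Hypothetical in-memory store, keyed by collection job ID.
#[derive(Default)]
struct JobStore {
    jobs: HashMap<u64, CollectionJob>,
    next_batch_id: u64,
}

impl JobStore {
    fn put_collection_job(&mut self, job_id: u64, query: Query) -> PutOutcome {
        // Equivalence is decided by comparing queries. Comparing batch IDs
        // would not work for current-batch requests, because the requester
        // does not know which batch the first attempt was assigned.
        if let Some(existing) = self.jobs.get(&job_id) {
            return if existing.query == query {
                PutOutcome::AlreadyExists
            } else {
                PutOutcome::Conflict
            };
        }

        // Only a genuinely new job consumes a batch.
        let batch_id = match &query {
            Query::CurrentBatch => {
                let id = self.next_batch_id;
                self.next_batch_id += 1;
                id
            }
            Query::ByBatchId(id) => *id,
        };
        self.jobs.insert(job_id, CollectionJob { query, batch_id });
        PutOutcome::Created
    }
}
```

In this sketch, a Collector that never saw the response to its first request can simply send the same request again with the same collection job ID; it gets `AlreadyExists` back and the original batch assignment is kept.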

While I'm at it, I also update get_collection_job to take a task ID in addition to a collection job ID. This provides isolation between different tasks, which is especially important now that the Collector generates the collection job IDs -- otherwise, it would be trivial for Collectors to generate the same collection job ID for two tasks, which would cause some very strange behavior in Janus.
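The isolation argument can be sketched as a lookup keyed on the (task ID, collection job ID) pair. This is an assumption-laden illustration, not the actual Janus datastore API: `Datastore`, `TaskId`, `CollectionJobId`, the string-valued job state, and both method signatures are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical newtypes; the real identifiers are larger opaque values.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct TaskId(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct CollectionJobId(u32);

/// Hypothetical in-memory datastore. Jobs are keyed by the pair
/// (task ID, collection job ID), never by the job ID alone.
#[derive(Default)]
struct Datastore {
    jobs: HashMap<(TaskId, CollectionJobId), String>,
}

impl Datastore {
    fn put_collection_job(&mut self, task_id: TaskId, job_id: CollectionJobId, state: &str) {
        self.jobs.insert((task_id, job_id), state.to_string());
    }

    /// Looks up a job within a single task. A job ID that exists only
    /// under a different task is not visible here.
    fn get_collection_job(&self, task_id: &TaskId, job_id: &CollectionJobId) -> Option<&str> {
        self.jobs.get(&(*task_id, *job_id)).map(String::as_str)
    }
}
```

With this shape, two Collectors (or one Collector working on two tasks) can pick the same collection job ID without one task's lookup ever returning the other task's job.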

Resolves #1877. Part of #1860.

This allows current-batch requests to be reissued in fixed-size tasks,
simplifying implementation of a Collector.

While I'm at it, I also update `get_collection_job` to take a task ID in
addition to a collection job ID. This provides isolation between
different tasks, which is especially important now that the Collector
generates the collection job IDs -- otherwise, it would be trivial for
Collectors to generate the same collection job ID for two tasks, which
would cause some very strange behavior in Janus.
@branlwyd branlwyd added the allow-changed-migrations Override the ci-migrations check to allow migrations that have changed. label Sep 1, 2023
@branlwyd branlwyd marked this pull request as ready for review September 1, 2023 22:38
@branlwyd branlwyd requested a review from a team as a code owner September 1, 2023 22:38
Contributor

@inahga inahga left a comment

While I'm at it, I also update get_collection_job to take a task ID in addition to a collection job ID. This provides isolation between different tasks, which is especially important now that the Collector generates the collection job IDs -- otherwise, it would be trivial for Collectors to generate the same collection job ID for two tasks, which would cause some very strange behavior in Janus.

I imagine this should be backported to our other active branches?

@branlwyd
Member Author

branlwyd commented Sep 5, 2023

I imagine [the get_collection_job change] should be backported to our other active branches?

Yes, I think this change should be backported.

The batch-equivalence check should probably also be backported; we must backport it if we want to implement #1860 (comment) (since implementing that requires knowing the query that was used to generate a collection job).

@branlwyd branlwyd merged commit 870b119 into main Sep 5, 2023
6 of 7 checks passed
@branlwyd branlwyd deleted the bran/collection-job-dedup-by-query branch September 5, 2023 22:48
Labels
allow-changed-migrations Override the ci-migrations check to allow migrations that have changed.
Development

Successfully merging this pull request may close these issues.

Janus collection checks allow fixed-size batches to be lost
2 participants