Bug: helper-side panic when repeating request for GC-eligible aggregation job #2810
Comments
This is no longer a panic on the latest release, but a 500 error. We attempt to write a new aggregation job, but that operation fails because one with the same ID is already in the database. The bigger problem may be that the second request is accepted in the first place. Why would we attempt to write an aggregation job whose reports are all past the report expiry age anyway?
This bug reveals a couple of possible problems:
However, I will table work on this bug for now in favor of other priorities. AFAIK it has never happened in production, and it only came up while I was searching for a different bug. As mentioned above, it no longer triggers a panic.
Looked into this a bit more with @branlwyd and @divergentdave. The bug still applies. The problem is here: https://github.com/divviup/janus/blob/main/aggregator_core/src/datastore.rs#L2580, where the GC-eligible job is excluded by the WHERE filter. Swapping the comparison sign fixes this case, but breaks the case where an aggregation job ID collides with new report data and the first aggregation job is GC-eligible (i.e. the report interval is more up to date). We may be able to fix this by deleting the report aggregations for an aggregation job after we put the aggregation job, but this may have unforeseen consequences. The edge case that triggers this bug is quite narrow and the fix may be complex, so we are not going to address it unless it happens in production.
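For anyone trying to picture the failure mode, here is a minimal toy model of it in Python with `sqlite3`. It is only a sketch under assumptions: the table layout, the `interval_end` column, and the `get_job` and `handle_request` helpers are all hypothetical and are not the real Janus schema or datastore API. It shows only the general shape described above: the lookup hides GC-eligible rows, so the handler believes the job is new, and the subsequent insert hits the primary-key constraint.

```python
import sqlite3

# Hypothetical, simplified schema; NOT the real Janus tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE aggregation_jobs ("
    " job_id TEXT PRIMARY KEY,"
    " interval_end INTEGER NOT NULL)"  # latest report timestamp covered by the job
)


def get_job(conn, job_id, gc_threshold):
    """Look up a job, hiding rows that are already GC-eligible (assumed filter)."""
    return conn.execute(
        "SELECT job_id FROM aggregation_jobs"
        " WHERE job_id = ? AND interval_end >= ?",
        (job_id, gc_threshold),
    ).fetchone()


def handle_request(conn, job_id, interval_end, now, expiry_age):
    """Toy helper-side handler: replay if the job is visible, else try to insert."""
    gc_threshold = now - expiry_age
    if get_job(conn, job_id, gc_threshold) is not None:
        return "replayed"
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO aggregation_jobs VALUES (?, ?)",
                (job_id, interval_end),
            )
    except sqlite3.IntegrityError:
        # The row exists but was hidden by the WHERE filter above.
        return "500"
    return "created"


# All reports in the job (interval_end=10) are older than now - expiry_age = 50.
first = handle_request(conn, "job-1", interval_end=10, now=100, expiry_age=50)
second = handle_request(conn, "job-1", interval_end=10, now=100, expiry_age=50)
print(first, second)  # created 500
```

In this toy model, flipping the comparison in `get_job` would make the repeated request visible again, but it would then hide non-expired jobs instead, which loosely mirrors why a simple sign swap is not a complete fix.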