Delete access logs after specified time #5084
Closed
Checklist
Description
Creates a background task to delete access logs after a number of days, specified in the Constance configuration as ACCESS_LOG_LIFESPAN.
Notes
Creates a spawning task that collects the ids of all non-submission-group access logs more than ACCESS_LOG_LIFESPAN days old, along with all submission groups whose latest date is more than ACCESS_LOG_LIFESPAN days ago, and enqueues one subtask per batch of 1000. Each subtask then fetches the audit logs in its batch and deletes them.
Batching prevents any single deletion from tying up the database (or the Celery queue) for too long, and means that if something goes wrong with one batch of deletions, the rest can continue.
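For reviewers, here is a minimal sketch of the batching idea described above; it is not the code in this PR. It uses only the standard library so it runs without Celery or Django. The names `chunked`, `spawn_deletion_batches`, and `enqueue` are hypothetical: `enqueue` stands in for whatever call queues the real subtask, and the helper is a hand-rolled stand-in for the library function the PR imports.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

BATCH_SIZE = 1000  # batch size quoted in the notes above


def chunked(ids: Iterable[int], size: int) -> Iterator[list[int]]:
    """Yield consecutive lists of at most `size` ids (hypothetical helper)."""
    iterator = iter(ids)
    # islice pulls up to `size` items; an empty list means we are done.
    while batch := list(islice(iterator, size)):
        yield batch


def spawn_deletion_batches(
    expired_ids: Iterable[int],
    enqueue: Callable[[list[int]], None],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Hand each batch of ids to `enqueue` and return the number of batches.

    `enqueue` is an assumption standing in for the real subtask call.
    Each batch is queued independently, so a failure while deleting one
    batch does not stop the others from being processed.
    """
    count = 0
    for batch in chunked(expired_ids, batch_size):
        enqueue(batch)
        count += 1
    return count
```

Calling `spawn_deletion_batches(range(2500), queued.append)` with an empty list `queued` would queue three batches, the last one shorter than the others.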
The more-itertools library provides an easier way of breaking up the list of logs to be deleted into equal-sized sublists. It is a well-maintained library with a solid reputation: https://github.com/more-itertools/more-itertools. The built-in itertools library actually has this method, but only in Python 3.12. If and when we get there, we can remove the new import.
Blocked by #5080.