Add history purge command to minder server cli. #3976

Merged
merged 3 commits into from
Jul 25, 2024

Conversation

blkt
Contributor

@blkt blkt commented Jul 24, 2024

Summary

This command manages the life cycle of the history log. The business requirement is to delete all records older than 30 days while keeping the most recent record for each entity/rule pair, even if that record is itself older than 30 days.

Meeting this requirement means processing the whole set of records older than 30 days; that set cannot be processed in chunks without creating arbitrary holes in the history. As a first approximation, the proposed implementation loads all records into RAM, sets aside the ones to keep, and issues a series of deletions of up to 1000 records each. A test was added to keep track of the record size. A future improvement would be to spill records to secondary storage and perform sorting and filtering there, but that is overly complex and unjustified at this point in time.
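For readers who want to see the shape of the in-memory approach, here is a minimal sketch, not the code in this PR. The `record` type, its fields, and the helper names `selectDeletable` and `batches` are all hypothetical; only the 30-day threshold, the entity/rule pairing, and the batch size of 1000 come from the description above.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// record is a hypothetical, simplified history row; the real schema may differ.
type record struct {
	ID       int
	EntityID string
	RuleID   string
	EvalTime time.Time
}

// pairKey identifies an entity/rule pair.
type pairKey struct{ entity, rule string }

// selectDeletable returns the sorted IDs of records older than threshold,
// excluding the most recent record of each entity/rule pair.
func selectDeletable(records []record, threshold time.Time) []int {
	// First pass: find the latest record for every entity/rule pair.
	latest := make(map[pairKey]record)
	for _, r := range records {
		k := pairKey{r.EntityID, r.RuleID}
		if cur, ok := latest[k]; !ok || r.EvalTime.After(cur.EvalTime) {
			latest[k] = r
		}
	}
	// Second pass: collect old records that are not the latest of their pair.
	var ids []int
	for _, r := range records {
		k := pairKey{r.EntityID, r.RuleID}
		if r.EvalTime.Before(threshold) && latest[k].ID != r.ID {
			ids = append(ids, r.ID)
		}
	}
	sort.Ints(ids)
	return ids
}

// batches splits ids into consecutive slices of at most size elements.
// It returns nil when size is not positive.
func batches(ids []int, size int) [][]int {
	if size <= 0 {
		return nil
	}
	var out [][]int
	for start := 0; start < len(ids); start += size {
		end := start + size
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, ids[start:end])
	}
	return out
}

func main() {
	now := time.Date(2024, 7, 25, 0, 0, 0, 0, time.UTC)
	threshold := now.AddDate(0, 0, -30)
	records := []record{
		{1, "repo-a", "rule-x", now.AddDate(0, 0, -90)},
		{2, "repo-a", "rule-x", now.AddDate(0, 0, -60)},
		{3, "repo-a", "rule-x", now.AddDate(0, 0, -1)},
		{4, "repo-b", "rule-x", now.AddDate(0, 0, -45)}, // latest of its pair: kept
	}
	for _, b := range batches(selectDeletable(records, threshold), 1000) {
		fmt.Println("would delete", b) // a real command would issue one DELETE per batch
	}
}
```

Record 4 shows why chunked processing is a problem: it is older than 30 days, but deleting it would remove the only history entry for its pair.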

Fixes #3636

Change Type

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

Some unit tests, mostly manual tests.

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@blkt blkt self-assigned this Jul 24, 2024
@coveralls

coveralls commented Jul 24, 2024

Coverage Status

coverage: 54.307% (-0.02%) from 54.324%
when pulling cc162d4 on issue-3636
into 4cf4cee on main.

@blkt blkt force-pushed the issue-3636 branch 3 times, most recently from d3c3752 to 5d6d4ac Compare July 24, 2024 15:54
@blkt blkt marked this pull request as ready for review July 24, 2024 15:54
@blkt blkt requested a review from a team as a code owner July 24, 2024 15:54
return nil
}

// filterRecords sift through the records separating the latest for
Contributor


I wonder if this could be implemented in the SQL query by using the latest_evaluation_statuses table and the EXCEPT operator.

Contributor Author


You're right. I integrated this suggestion, but implemented it with joins rather than the EXCEPT operator.

@blkt blkt force-pushed the issue-3636 branch 2 times, most recently from 6ca77ea to d7a66c7 Compare July 25, 2024 11:25
blkt added 3 commits July 25, 2024 16:44
Table `latest_evaluation_statuses` tracks the latest evaluation id for
any given entity/rule pair. Adding it via a left join lets us
determine which records older than 30 days are not the latest ones,
relying entirely on the database rather than doing the processing in
application code. This also slightly lowers the resources needed to
process deletions.
@blkt blkt merged commit 07a22f1 into main Jul 25, 2024
21 checks passed
@blkt blkt deleted the issue-3636 branch July 25, 2024 15:07
Development

Successfully merging this pull request may close these issues.

Implement cleanup for history evaluation logs
3 participants