-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: sqlite limited delete query bug #2111
Conversation
This PR may contain changes to database schema of one of the drivers. If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues. Please make sure the label |
You can find the image built from this PR at
Built from 54a19b0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
Very well explained the issue.
Btw, this issue would happen if we start applying the upcoming size-based retention policy, right?
We never faced it because we are not applying a loop in the currently existing retention policies, right?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. This shows another reason why we need a unique "index" column in the schema that is derived according to the deterministic message hashing algorithm here: https://rfc.vac.dev/spec/14/#deterministic-message-hashing
This deterministic message hashing must be applied everywhere as the authoritative message ID. The existing id
column can likely be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks.
Can we write a unit test to catch this? Don't want to block so another PR is fine.
Actually, this was caught while extending the retention policy test case, I stress tested (not on GB scale but slightly more than 2x the original data), how about keeping the existing test case that broke it? |
Description
Observed a bug while testing the retention policy where the database, if filled beyond its size limit, reaches a point where the deletion query
proc deleteOldestMessagesNotWithinLimitQuery(table: string, limit: int): SqlQueryStr =
ceases to function. The database size remains unchanged, leading to an endless loop in the deletion process, especially when the deletion is expected to be achieved via a loop.This impacts the retention policy process. Turns out that the sqlite DB query assumes that the
id
column alone can uniquely identify rows in the table. If two different rows have the sameid
but differentstoredAt
values, then using onlyid
in the subquery may not produce the intended results. Given the schema has a composite primary key (storedAt, id, pubsubTopic
), just checking against id might not be precise enough.For the size-based retention policy, there isn't a direct correlation between the database's size and its number of rows. Also, deletion based on page count isn't feasible. To manage database size effectively:
Ascertain the total number of messages/rows.
Delete 20% of the messages and then perform a database Vacuum.
Confirm if the adjusted database size aligns with the intended target size.
If not, iterate the process.
So in this process, if the database size increases beyond a limit then a forever loop is encountered.
Changes
Only the query has been modified.
Issue
#1885