Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sqlite limited delete query bug #2111

Merged
merged 1 commit into from
Oct 9, 2023

Conversation

ABresting
Copy link
Contributor

Description

Observed a bug while testing the retention policy where the database, if filled beyond its size limit, reaches a point where the deletion query proc deleteOldestMessagesNotWithinLimitQuery(table: string, limit: int): SqlQueryStr = ceases to function. The database size remains unchanged, leading to an endless loop in the deletion process, especially when the deletion is expected to be achieved via a loop.

This impacts the retention policy process. Turns out that the sqlite DB query assumes that the id column alone can uniquely identify rows in the table. If two different rows have the same id but different storedAt values, then using only id in the subquery may not produce the intended results. Given the schema has a composite primary key (storedAt, id, pubsubTopic), just checking against id might not be precise enough.

For the size-based retention policy, there isn't a direct correlation between the database's size and its number of rows. Also, deletion based on page count isn't feasible. To manage database size effectively:

Ascertain the total number of messages/rows.
Delete 20% of the messages and then perform a database Vacuum.
Confirm if the adjusted database size aligns with the intended target size.
If not, iterate the process.

So in this process, if the database size increases beyond a limit then a forever loop is encountered.

Changes

Only the query has been modified.

  • Able to delete the outdated database rows correctly

Issue

#1885

@github-actions
Copy link

github-actions bot commented Oct 5, 2023

This PR may contain changes to database schema of one of the drivers.

If you are introducing any changes to the schema, make sure the upgrade from the latest release to this change passes without any errors/issues.

Please make sure the label release-notes is added to make sure upgrade instructions properly highlight this change.

@github-actions
Copy link

github-actions bot commented Oct 5, 2023

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:2111

Built from 54a19b0

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks!
Very well explained the issue.

Btw, this issue would happen if we start applying the upcoming size-based retention policy, right?

We never faced it because we are not applying a loop in the currently existing retention policies, right?

@ABresting
Copy link
Contributor Author

LGTM! Thanks! Very well explained the issue.

Btw, this issue would happen if we start applying the upcoming size-based retention policy, right?
Yes

We never faced it because we are not applying a loop in the currently existing retention policies, right?
Yes that's correct!

Copy link
Contributor

@jm-clius jm-clius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. This shows another reason why we need a unique "index" column in the schema that is derived according to the deterministic message hashing algorithm here: https://rfc.vac.dev/spec/14/#deterministic-message-hashing
This deterministic message hashing must be applied everywhere as the authoritative message ID. The existing id column can likely be dropped.

Copy link
Contributor

@alrevuelta alrevuelta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks.

Can we write a unit test to catch this? Don't want to block so another PR is fine.

@ABresting
Copy link
Contributor Author

lgtm thanks.

Can we write a unit test to catch this? Don't want to block so another PR is fine.

Actually, this was caught while extending the retention policy test case, I stress tested (not on GB scale but slightly more than 2x the original data), how about keeping the existing test case that broke it?

@ABresting ABresting merged commit 06bc433 into master Oct 9, 2023
10 checks passed
@ABresting ABresting deleted the fix-sqlite-delete-limited-query branch October 9, 2023 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants