Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #238: Non converging operations with unordered deletes. Migration to fix existing datastores. #241

Merged
merged 2 commits into from
Nov 11, 2024

Conversation

hsanjuan
Copy link
Collaborator

@hsanjuan hsanjuan commented Nov 8, 2024

This is made of two commits, one fixes the problem with the wrong values being set for elements that are partially tombstoned. The second adds a migration to fix potentially wrong values, but also to ensure that we return the right thing now that we don't check for tombstoning anymore on Get(). Fixes #238 .

@hsanjuan hsanjuan self-assigned this Nov 8, 2024
@hsanjuan hsanjuan force-pushed the fix/238-non-converging branch 2 times, most recently from 84d8f8a to 279fb69 Compare November 8, 2024 14:48
@hsanjuan
Copy link
Collaborator Author

hsanjuan commented Nov 8, 2024

@dozyio I wonder if you are able to have a look

@dozyio
Copy link
Contributor

dozyio commented Nov 10, 2024

Being playing around a bit with this PR and it seems good! Puts and deletes converging as expected.

Tested the migrations a little bit and seems ok.

Thanks for fixing the bug!

set.go Outdated Show resolved Hide resolved
crdt_test.go Outdated Show resolved Hide resolved
crdt_test.go Outdated Show resolved Hide resolved
set.go Outdated Show resolved Hide resolved
The problem: a replica may tombstone a value and then receive a new write for
that value that had happened BEFORE the tombstone from a different replica.
The final value/priority pair should be set to the value/priority that was not
tombstoned. However we did not do that. We knew the key was not tombstoned,
but the value returned corresponded to an update that was tombstoned.

The solution: every time a tombstone arrives, we need to look for the "best
value/priority", that is, we need to make sure that among all the key values
that we have, we set the best one that was not tombstoned according to the
CRDT rules (highest priority or lexicographical sorting when equal).

The consecuences: this makes tombstoning a more expensive operation but it
also allows us to remove value/priority altogether when all the values have
been tombstoned. As such, we don't need to check if a value has been
tombstoned anymore when doing Gets/List, before returning the element. That
saves lookups and that also means we no longer need to bloom filter, which was
supposed to speed up this operation. In general, datastore which mostly add
data will be better afterwards.
The fix for #238 does not mean that everything is fine. We will have databases
which have the wrong value/priority sets written and this would only fix
itself on new writes or deletes to the same key.

So we are unfortunately forced to manually fix it on start. For this we
introduce a data migration.

During a fresh start, we will then find all the keys affected by tombstones
and loop them, finding the best value for them (the correct one) and fixing
them. Once done we record that we are on version=1 and don't run this again.

Future fuckups can be fixed with other migrations.
@hsanjuan
Copy link
Collaborator Author

Thanks very much @dozyio for the review and for finding the bug!

@hsanjuan hsanjuan merged commit b43de73 into master Nov 11, 2024
4 checks passed
@hsanjuan hsanjuan deleted the fix/238-non-converging branch November 11, 2024 13:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Non-Converging Operations
2 participants