[FIXED] Improvements to dealing with old or non-existent index.db #5893

Merged
merged 1 commit into from
Sep 17, 2024

Conversation

derekcollison

An old index.db could fail to properly restore a stream when max msgs per subject was set and certain blocks had been compacted away, which removed the subject info for those sequences. In addition, we fixed recovery after Truncate and PurgeEx by subject when the index.db was corrupt or not available.

This change also moves generating the index.db file to after the blocks are written during a snapshot, and we force the write to make sure the file is written even when complex.

Signed-off-by: Derek Collison [email protected]

@derekcollison derekcollison requested a review from a team as a code owner September 16, 2024 22:12
@neilalexander neilalexander left a comment

LGTM

@derekcollison derekcollison merged commit 83c77b4 into main Sep 17, 2024
5 checks passed
@derekcollison derekcollison deleted the fs-compact-recover branch September 17, 2024 14:46
wallyqs added a commit that referenced this pull request Sep 17, 2024
derekcollison pushed a commit that referenced this pull request Sep 18, 2024
Extension to #5893

If we can't update the index.db upon shutdown, for example during a hard
kill, we'd enter into this condition if `MaxMsgsPer` was set.

https://github.com/nats-io/nats-server/pull/5893/files#diff-384c189826934c9a6fc3554dafc63dab2076245010e3d6fce5c71a93e15e9877R1752

However, all limits-based fields have this issue, not just `MaxMsgsPer`.
In similar tests, where the `nats str info` output before a hard kill
should equal its output after the hard kill, the results were:
- `MaxMsgsPer`: −7,877 diff (fixed with addition of above condition/PR)
- `MaxMsgs`: +2,123 diff
- `MaxAge`: no diff (correct messages, but still `[WRN] Filestore
[stream] loadBlock error: message block data missing`)
- `MaxBytes`: +3,567 diff (had a MaxBytes set of 1016 MiB, but after
restart the state has more messages and Bytes: 1020 MiB)

I think we shouldn't only target `MaxMsgsPer`, since other fields can
also trigger this. Extending the condition to list those other fields
explicitly would come back to bite us if we add more limits-based
fields in the future and forget to add them to this condition.
We need to detect that index.db was not written during shutdown, or that
index.db and our msg blocks differ. If we detect this we can no longer
rely on index.db being correct, so I propose to simplify and defer to
rebuilding upon detection.

Signed-off-by: Maurice van Veen <[email protected]>

---------

Signed-off-by: Maurice van Veen <[email protected]>
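
The proposal in the commit message above (detect a missing or stale index.db and defer to rebuilding, regardless of which limit is configured) can be illustrated with a minimal Go sketch. This is not the nats-server filestore code: `indexSummary`, `blockSummary`, and `needsRebuild` are hypothetical names introduced only for illustration, and the real check may compare different fields.

```go
package main

import "fmt"

// indexSummary is a hypothetical stand-in for the state recorded in index.db.
type indexSummary struct {
	Present bool   // false when index.db is missing or could not be decoded
	LastSeq uint64 // last sequence the index knows about
	Msgs    uint64 // number of messages the index accounts for
}

// blockSummary is a hypothetical summary obtained by scanning the msg blocks.
type blockSummary struct {
	LastSeq uint64
	Msgs    uint64
}

// needsRebuild reports whether the index should be discarded and the stream
// state rebuilt from the msg blocks. It deliberately does not look at any
// specific limit (MaxMsgsPer, MaxMsgs, MaxBytes, MaxAge), so a limit added
// later cannot be forgotten here.
func needsRebuild(idx indexSummary, blks blockSummary) bool {
	if !idx.Present {
		return true
	}
	return idx.LastSeq != blks.LastSeq || idx.Msgs != blks.Msgs
}

func main() {
	// Simulated hard kill: the blocks moved on after the index was last written.
	idx := indexSummary{Present: true, LastSeq: 10000, Msgs: 10000}
	blks := blockSummary{LastSeq: 12123, Msgs: 12123}
	fmt.Println("rebuild:", needsRebuild(idx, blks))
}
```

The design choice here is that any mismatch triggers a full rebuild instead of an attempt to patch the index per limit type, trading some recovery time for correctness.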
neilalexander pushed a commit that referenced this pull request Sep 20, 2024