[FIXED] Also recover on old index.db when not using MaxMsgsPer #5901
Signed-off-by: Maurice van Veen <[email protected]>
For MaxMsgs, MaxBytes, and MaxAge (for now), we can try a different approach: when we detect that we did not match on the last block, we whip through the blocks recovered via index.db and os.Stat() each file.
If a file is not present, remove it from the top-level accounting; if it exists, recover that one from disk and break, since that will be all that is needed.
I could try to take your updated test (awesome) and see if that approach would work.
I could play with that tomorrow for sure. Last week I already tried the os.Stat approach when initially adding the block higher up, but that would break lost data accounting. Doing it lower down here might work better.
Actually, thinking some more about this, we would need to redo the PSIM layer since we do not know what we lost, and hence we would need to load those blocks anyway.
LGTM
Thanks @MauriceVanVeen!
Extension to #5893

If we can't update the index.db upon shutdown, for example during a hard kill, we'd enter into this condition if `MaxMsgsPer` was set: https://github.com/nats-io/nats-server/pull/5893/files#diff-384c189826934c9a6fc3554dafc63dab2076245010e3d6fce5c71a93e15e9877R1752

However, all limits-based fields have this issue, not just `MaxMsgsPer`. Running similar tests where `nats str info` before hard kill should equal its output after hard kill:

- `MaxMsgsPer`: −7,877 diff (fixed with addition of above condition/PR)
- `MaxMsgs`: +2,123 diff
- `MaxAge`: no diff (correct messages, but still `[WRN] Filestore [stream] loadBlock error: message block data missing`)
- `MaxBytes`: +3,567 diff (had a MaxBytes set of 1016 MiB, but after restart the state has more messages and Bytes: 1020 MiB)

I think we shouldn't only target `MaxMsgsPer`, since other fields can also trigger this, and making it specific to also include these other fields would come back to bite if we add other limits-based fields in the future and forget to add it in this condition.

We need to detect index.db was not written during shutdown or there is a difference between index.db and our msg blocks. If we detect this we can't rely on it being correct still, so I'd propose to simplify and upon detecting defer to rebuilding.
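To illustrate the proposal, here is a minimal sketch of a check that does not depend on which limit is configured. It is not the actual change in this PR: `tail`, `errStaleIndex`, `checkIndex`, and `recoverState` are hypothetical names, and comparing only the last block and last sequence is an assumed simplification of what "a difference between index.db and our msg blocks" would mean in practice.

```go
package main

import (
	"errors"
	"fmt"
)

// tail is a hypothetical summary of where a stream ends: the index of the
// last block and the last sequence stored in it.
type tail struct {
	LastBlock uint32
	LastSeq   uint64
}

// errStaleIndex signals that index.db cannot be trusted.
var errStaleIndex = errors.New("index.db missing or out of sync with message blocks")

// checkIndex compares what index.db recorded with what is on disk.
// A nil idx models an index.db that was never written during shutdown.
func checkIndex(idx *tail, disk tail) error {
	if idx == nil || *idx != disk {
		return errStaleIndex
	}
	return nil
}

// recoverState trusts index.db only when it matches the blocks on disk.
// Otherwise it defers to a full rebuild, whichever limit is configured.
// It reports whether the index was used.
func recoverState(idx *tail, disk tail, rebuild func() error) (bool, error) {
	if err := checkIndex(idx, disk); err != nil {
		return false, rebuild()
	}
	return true, nil
}

func main() {
	disk := tail{LastBlock: 7, LastSeq: 1000}
	stale := &tail{LastBlock: 6, LastSeq: 900} // index.db from before the hard kill

	used, err := recoverState(stale, disk, func() error {
		fmt.Println("rebuilding state from message blocks")
		return nil
	})
	fmt.Println(used, err)
}
```

The point of the design is that no limits-based field appears in the check, so adding another limit later cannot silently skip it.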
Signed-off-by: Maurice van Veen [email protected]