Resolve race conditions in shard ownership #3
Comments
Summary after investigating and finding resolution options:
By default, the maximum age of a client record (clientRecordMaxAge) is set to shardCheckFrequency*5. At low shardCheckFrequency values (e.g. 500ms, or when the app is running slowly for some reason), this can cause a race condition in which the owner of a shard is processing a record normally but another client claims ownership of that shard. It will also cause issues with leadership: if the leader is rebooting while another client runs refreshShards(), there's a good chance that the second client incorrectly claims leadership. At low shardCheckFrequency values, rebooting takes longer than clientRecordMaxAge, and a chain of incorrect shard changes will be detected across other clients too. Solutions:
Resolved in #13
Because a record's 'staleness' interval is tied to shardCheckFrequency, setting this frequency to a low value means that one client can capture a shard and then have another client claim ownership of that shard before it is released.
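To make the timing concrete, here is a minimal sketch of the staleness check. It assumes a record counts as stale once it has gone unrefreshed for longer than five times the check frequency. The function names and the 3-second processing time are illustrative assumptions, not the library's actual API.

```python
from datetime import timedelta


# Hypothetical helpers for illustration; not the library's actual API.
def max_age_for_client_record(shard_check_frequency: timedelta) -> timedelta:
    """Coupled default described in this issue: max age = shardCheckFrequency * 5."""
    return shard_check_frequency * 5


def is_stale(time_since_last_update: timedelta, max_age: timedelta) -> bool:
    """A record not refreshed within max_age looks abandoned to other clients."""
    return time_since_last_update > max_age


shard_check_frequency = timedelta(milliseconds=500)
max_age = max_age_for_client_record(shard_check_frequency)  # 2.5 seconds

# Assumed scenario: the owner spends 3 seconds on one record and does not
# refresh its ownership record in the meantime.
processing_time = timedelta(seconds=3)
owner_looks_dead = is_stale(processing_time, max_age)

print(max_age.total_seconds(), owner_looks_dead)
```

In this scenario the owner is still working, yet its record already looks stale, so another client may claim the shard.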
We can resolve this race condition by decoupling shardCheckFrequency from maxAgeForClientRecord, since it is reasonable that we may take longer than 2.5 seconds to process a record when we poll for records every half-second.
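One possible shape for the decoupling is sketched below. It assumes the max age becomes an optional setting that falls back to the coupled default when unset. The `Config` class, its field names, and the 30-second value are hypothetical and chosen only for illustration; they are not the change made in this repository.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class Config:
    """Hypothetical consumer configuration, for illustration only."""

    shard_check_frequency: timedelta
    # None means "fall back to the coupled default of shard_check_frequency * 5".
    max_age_for_client_record: Optional[timedelta] = None

    def effective_max_age(self) -> timedelta:
        """Return the explicit max age if set, otherwise the coupled default."""
        if self.max_age_for_client_record is not None:
            return self.max_age_for_client_record
        return self.shard_check_frequency * 5


# Coupled: polling every half-second gives a 2.5-second staleness window.
coupled = Config(shard_check_frequency=timedelta(milliseconds=500))

# Decoupled: keep the fast polling but allow records to age for longer
# (30 seconds is an arbitrary illustrative value).
decoupled = Config(
    shard_check_frequency=timedelta(milliseconds=500),
    max_age_for_client_record=timedelta(seconds=30),
)

print(coupled.effective_max_age(), decoupled.effective_max_age())
```

Keeping the fallback means existing configurations behave as before, while anyone who polls quickly can set a longer staleness window.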
Also investigate whether maxAgeForLeaderRecord has a similar issue.