
DataShard: keep small outgoing readsets in memory #10772

Open
snaury opened this issue Oct 23, 2024 · 0 comments
snaury commented Oct 23, 2024

Historically, DataShards could exchange large readsets (containing actual read results), so it made sense for a DataShard not to keep them in memory after sending, and to re-read them on every resend attempt. However, modern "generic" readsets are usually very small (3 bytes), and it doesn't make any sense to re-read them on every disconnect. We should just store these small readsets in memory and avoid unnecessary transactions.
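A minimal sketch of the idea: cache the serialized body of an outgoing readset when it is under some small threshold, so a resend after a disconnect can reuse the cached body instead of starting a read transaction. All names here (`TOutReadSetCache`, `kSmallReadSetLimit`) are illustrative, not the actual DataShard identifiers, and the threshold value is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Assumed threshold: "generic" readsets are ~3 bytes, so anything
// this small is cheap to keep resident until it is acknowledged.
constexpr size_t kSmallReadSetLimit = 64;

class TOutReadSetCache {
public:
    // Remember the body only when it is small enough to keep in memory.
    void OnSend(uint64_t seqNo, const std::string& body) {
        if (body.size() <= kSmallReadSetLimit) {
            Cache[seqNo] = body;
        }
    }

    // On resend: return the cached body, or nullopt, meaning the caller
    // must fall back to re-reading the body from local storage.
    std::optional<std::string> GetForResend(uint64_t seqNo) const {
        auto it = Cache.find(seqNo);
        if (it != Cache.end()) {
            return it->second;
        }
        return std::nullopt;
    }

    // Once the target shard acknowledges the readset, drop the body.
    void OnAck(uint64_t seqNo) {
        Cache.erase(seqNo);
    }

private:
    std::unordered_map<uint64_t, std::string> Cache;
};
```

Large readsets would still take the existing re-read path; only the common small case avoids the extra transaction.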

Additionally, we don't use column families for OutReadSets, so readset data is stored alongside all other columns. This means we read all of that data at init time anyway. We should probably move the data column into a separate column family, or keep that data in memory until we send these readsets for the first time.
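The "keep until first send" variant could be sketched as follows: since the bodies are read at init time anyway, retain them in memory and release each one after its first send, rather than discarding them and re-reading later. This is a hypothetical sketch; the class and method names are illustrative, not actual DataShard code.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Bodies loaded while replaying the OutReadSets table at init time.
// Each body is held only until the readset is sent for the first time.
class TInitLoadedBodies {
public:
    // Called for each row during init; the body was read anyway, so
    // keeping it costs no extra I/O, only memory until the first send.
    void OnInitLoad(uint64_t seqNo, std::string body) {
        Bodies.emplace(seqNo, std::move(body));
    }

    // First send consumes the body; a later resend must re-read it
    // from local storage (or hit a small-readset cache, if present).
    std::string TakeForFirstSend(uint64_t seqNo) {
        auto it = Bodies.find(seqNo);
        if (it == Bodies.end()) {
            return {};
        }
        std::string body = std::move(it->second);
        Bodies.erase(it);
        return body;
    }

    bool Has(uint64_t seqNo) const {
        return Bodies.count(seqNo) != 0;
    }

private:
    std::unordered_map<uint64_t, std::string> Bodies;
};
```

The column-family alternative would instead keep the data column out of the init-time scan entirely, trading memory for an extra read on first send.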

Next, the current progress queue may be quadratic in nature, since we don't clear the sent-readsets hashset when adding its entries to the progress queue. This means that in the unlikely case where the pipe fails again before the queue is drained, we could re-add these readsets multiple times unnecessarily. This should be an easy fix too.
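The fix can be sketched in a few lines: when a pipe fails, move the sent-but-unacked entries into the progress queue and clear the hashset in the same step, so a second failure before the queue drains cannot enqueue the same readsets again. The structure and names below are illustrative, not the actual DataShard code.

```cpp
#include <cstdint>
#include <deque>
#include <unordered_set>

struct TResendState {
    std::unordered_set<uint64_t> Sent;   // sent, awaiting ack
    std::deque<uint64_t> ProgressQueue;  // scheduled for resend

    void OnSent(uint64_t seqNo) {
        Sent.insert(seqNo);
    }

    void OnPipeFailed() {
        for (uint64_t seqNo : Sent) {
            ProgressQueue.push_back(seqNo);
        }
        // Clearing here is the fix: without it, every subsequent
        // disconnect re-adds all entries still in Sent, so repeated
        // failures grow the queue quadratically.
        Sent.clear();
    }
};
```

With the clear in place, a readset re-enters `Sent` only after it is actually resent, so each entry appears in the queue at most once per send.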

@snaury snaury self-assigned this Oct 23, 2024