
DataShard: keep small outgoing readsets in memory #10772

Open
snaury opened this issue Oct 23, 2024 · 0 comments
snaury commented Oct 23, 2024

Historically, DataShards could exchange large readsets (containing actual read results), so it made sense for a DataShard not to keep them in memory after sending, and to re-read them on every resend attempt. However, modern "generic" readsets are usually very small (3 bytes), and it doesn't make any sense to re-read them on every disconnect. We should just store these small readsets in memory and avoid unnecessary transactions.
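A minimal sketch of the idea: cache the serialized body of an outgoing readset when it is under some small threshold, so a resend after a disconnect can reuse the cached body instead of starting a read transaction. All names here (`TOutReadSetCache`, `kSmallReadSetLimit`) are illustrative, not the actual DataShard identifiers, and the threshold value is an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Assumed threshold: "generic" readsets are ~3 bytes, so anything
// this small is cheap to keep resident until it is acknowledged.
constexpr size_t kSmallReadSetLimit = 64;

class TOutReadSetCache {
public:
    // Remember the body only when it is small enough to keep in memory.
    void OnSend(uint64_t seqNo, const std::string& body) {
        if (body.size() <= kSmallReadSetLimit) {
            Cache[seqNo] = body;
        }
    }

    // On resend: return the cached body, or nullopt, meaning the caller
    // must fall back to re-reading the body from local storage.
    std::optional<std::string> GetForResend(uint64_t seqNo) const {
        auto it = Cache.find(seqNo);
        if (it != Cache.end()) {
            return it->second;
        }
        return std::nullopt;
    }

    // Once the target shard acknowledges the readset, drop the body.
    void OnAck(uint64_t seqNo) {
        Cache.erase(seqNo);
    }

private:
    std::unordered_map<uint64_t, std::string> Cache;
};
```

Large readsets would still take the existing re-read path; only the common small case avoids the extra transaction.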

Additionally, we don't use column families for OutReadSets, so readset data is stored alongside all other columns. This means we read all of that data at init time anyway. We should probably move the data column into a separate column family, or keep that data in memory until we send these readsets for the first time.
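The "keep until first send" variant could be sketched as follows: since the bodies are read at init time anyway, retain them in memory and release each one after its first send, rather than discarding them and re-reading later. This is a hypothetical sketch; the class and method names are illustrative, not actual DataShard code.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Bodies loaded while replaying the OutReadSets table at init time.
// Each body is held only until the readset is sent for the first time.
class TInitLoadedBodies {
public:
    // Called for each row during init; the body was read anyway, so
    // keeping it costs no extra I/O, only memory until the first send.
    void OnInitLoad(uint64_t seqNo, std::string body) {
        Bodies.emplace(seqNo, std::move(body));
    }

    // First send consumes the body; a later resend must re-read it
    // from local storage (or hit a small-readset cache, if present).
    std::string TakeForFirstSend(uint64_t seqNo) {
        auto it = Bodies.find(seqNo);
        if (it == Bodies.end()) {
            return {};
        }
        std::string body = std::move(it->second);
        Bodies.erase(it);
        return body;
    }

    bool Has(uint64_t seqNo) const {
        return Bodies.count(seqNo) != 0;
    }

private:
    std::unordered_map<uint64_t, std::string> Bodies;
};
```

The column-family alternative would instead keep the data column out of the init-time scan entirely, trading memory for an extra read on first send.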

Next, the current progress queue may be quadratic in nature, since we don't clear the sent-readsets hashset when adding its entries to the progress queue. This means that in the unlikely case where the pipe fails again before the queue is drained, we could re-add these readsets multiple times unnecessarily. This should be an easy fix too.
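The fix can be sketched in a few lines: when a pipe fails, move the sent-but-unacked entries into the progress queue and clear the hashset in the same step, so a second failure before the queue drains cannot enqueue the same readsets again. The structure and names below are illustrative, not the actual DataShard code.

```cpp
#include <cstdint>
#include <deque>
#include <unordered_set>

struct TResendState {
    std::unordered_set<uint64_t> Sent;   // sent, awaiting ack
    std::deque<uint64_t> ProgressQueue;  // scheduled for resend

    void OnSent(uint64_t seqNo) {
        Sent.insert(seqNo);
    }

    void OnPipeFailed() {
        for (uint64_t seqNo : Sent) {
            ProgressQueue.push_back(seqNo);
        }
        // Clearing here is the fix: without it, every subsequent
        // disconnect re-adds all entries still in Sent, so repeated
        // failures grow the queue quadratically.
        Sent.clear();
    }
};
```

With the clear in place, a readset re-enters `Sent` only after it is actually resent, so each entry appears in the queue at most once per send.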

@snaury snaury self-assigned this Oct 23, 2024