
Don't use pure managed memory for the target threshold #7421

Open

crusaderky opened this issue Dec 19, 2022 · 0 comments · Fixed by dask/zict#101

crusaderky commented Dec 19, 2022

Current design

  1. Every time a new key is inserted in Worker.data, if the managed memory (the output of sizeof()) exceeds the target threshold, keys are spilled from the bottom of the LRU cache until the managed memory goes back below target (see the sketch after this list).
    This is a synchronous process that does not release the event loop. That isn't great, but it is bounded: it will never spill more bytes than the size of the key that has just been inserted.

  2. Every 100ms (distributed.worker.memory.monitor-interval), measure the process memory through psutil. If the process memory exceeds the spill threshold, spill keys until the process memory goes below the target threshold (a hysteresis cycle). Along the way, re-measure the process memory, invoke garbage collection, and release the event loop multiple times; this can potentially take many seconds.
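
To make point 1 concrete, here is a minimal, self-contained sketch of the insert-time eviction. It is illustrative only and does not mirror the real SpillBuffer/zict internals; the class and attribute names are made up:

```python
from collections import OrderedDict

class NaiveSpillBuffer:
    """Illustrative stand-in for Worker.data: spill on insert, by managed memory only."""

    def __init__(self, target_bytes, sizeof):
        self.target_bytes = target_bytes  # memory_limit * distributed.worker.memory.target
        self.sizeof = sizeof              # same role as dask.sizeof.sizeof
        self.fast = OrderedDict()         # in-memory LRU, oldest key first
        self.slow = {}                    # stand-in for the on-disk store
        self.managed = 0                  # sum of sizeof() over fast

    def __setitem__(self, key, value):
        self.fast[key] = value
        self.managed += self.sizeof(value)
        # Synchronous eviction that never releases the event loop:
        # spill from the bottom of the LRU until managed memory is below target.
        # Note that only managed (sizeof) memory is considered here.
        while self.managed > self.target_bytes and self.fast:
            old_key, old_value = self.fast.popitem(last=False)
            self.managed -= self.sizeof(old_value)
            self.slow[old_key] = old_value  # in reality: serialize and write to disk
```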

The intent of this design is to have a very responsive, cheap, but inaccurate first threshold and a slow-to-notice, expensive, but accurate second one. The design, however, is problematic in two cases:

  1. When unmanaged memory (process minus managed) is very high, e.g. due to a leak, a large heap from the running user functions, or an underestimating sizeof(). In the extreme case of a memory leak, you will reach the spill threshold without ever having hit the target threshold, and then spill the whole contents of Worker.data all at once.
  2. When unmanaged memory is negative, due to an overestimating sizeof(). This causes target to start spilling too soon, while there is still plenty of memory available. (A worked example follows this list.)
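
For example, take a worker with memory_limit=16 GiB and the default thresholds of target=0.6 (9.6 GiB) and spill=0.7 (11.2 GiB). If 10 GiB of the process memory is an unmanaged leak, the spill threshold is crossed when managed memory is only about 1.2 GiB, long before managed memory alone ever reaches the 9.6 GiB target, and the monitor then spills essentially the whole contents of Worker.data in one go. Conversely, if sizeof() overestimates by 2x, managed memory nominally reaches the 9.6 GiB target when real heap usage is only about 4.8 GiB, so spilling starts while most of the 16 GiB is still free.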

Proposed design

In zict:

  • Add an offset property to zict.LRU. This property is added to the LRU's total weight for the purpose of deciding when to evict (a sketch of the intended semantics follows).
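
A minimal sketch of the intended semantics, not zict's actual implementation; the class below, its internals, and the evict_until_below_target() name (reused in the monitor sketch further down) are illustrative:

```python
from collections import OrderedDict

class OffsetLRU:
    """LRU mapping that evicts while total_weight + offset exceeds the capacity n."""

    def __init__(self, n, d, weight):
        self.n = n                 # capacity, in the same unit returned by weight()
        self.d = d                 # slow mapping that evicted values go to
        self.weight = weight
        self.data = OrderedDict()  # fast mapping, oldest key first
        self.weights = {}
        self.total_weight = 0
        self.offset = 0            # externally-set estimate of unmanaged memory

    def evict_until_below_target(self):
        # The offset counts towards the eviction decision, so a large
        # unmanaged-memory estimate triggers eviction earlier.
        while self.total_weight + self.offset > self.n and self.data:
            key, value = self.data.popitem(last=False)
            self.total_weight -= self.weights.pop(key)
            self.d[key] = value

    def __setitem__(self, key, value):
        if key in self.data:  # replacing an existing key
            self.total_weight -= self.weights.pop(key)
            del self.data[key]
        self.data[key] = value
        w = self.weight(key, value)
        self.weights[key] = w
        self.total_weight += w
        self.evict_until_below_target()

    def __getitem__(self, key):
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]
```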

In distributed.worker_memory:

  • Every 100ms, measure process memory and calculate unmanaged memory.
  • If process memory is above the spill threshold and there is data in Worker.fast, garbage collect and re-measure it.
  • Update Worker.data.fast.offset to the amount of unmanaged memory.
  • Manually trigger spilling in zict (see the sketch after this list).
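
Roughly, one monitor iteration under the proposed design could look like the sketch below; the function, the evict_until_below_target() hook, and the way the worker exposes its thresholds are all illustrative, not the actual distributed.worker_memory API:

```python
import gc
import psutil

def memory_monitor_step(fast, spill_threshold_bytes):
    """One (hypothetical) 100ms monitor iteration.

    `fast` is assumed to be an LRU with the proposed `offset` attribute,
    e.g. the OffsetLRU sketch above.
    """
    process = psutil.Process().memory_info().rss
    managed = fast.total_weight
    unmanaged = process - managed  # may be negative if sizeof() overestimates

    if process > spill_threshold_bytes and len(fast.data):
        # Part of the unmanaged memory may just be garbage that has not been
        # collected yet, so collect and re-measure before acting on it.
        gc.collect()
        process = psutil.Process().memory_info().rss
        unmanaged = process - managed

    # Feed the unmanaged-memory estimate into the LRU, so that both this manual
    # trigger and every future insertion evict against managed + unmanaged
    # memory rather than managed memory alone. A negative offset (overestimating
    # sizeof) correspondingly delays spilling.
    fast.offset = unmanaged
    fast.evict_until_below_target()
```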

In distributed.worker_state_machine._transition_to_memory, distributed.Worker.execute, and distributed.Worker.get_data: no change is needed, but the offset is now taken into account every time a key is inserted into fast.

Notes

  • This could cause zict to synchronously spill many GiBs at once, without ever releasing the event loop. This change should be paired with Asynchronous Disk Access in Workers #4424.
  • With the current thresholds left unchanged, spilling will start a lot earlier; effectively, target becomes the new spill. I think it's safe to bump both by 0.1, making spill the same as pause (see the config sketch below).
  • We should rename "spill" to "aggressive_gc" to clarify its new meaning.
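
For instance, bumping both thresholds by 0.1 while leaving pause and terminate alone could look like this (a sketch assuming the existing distributed.worker.memory.* configuration keys and their 0.6/0.7 defaults):

```python
import dask

# Under the proposed design, target effectively takes over the role
# that spill plays today, so both thresholds are raised by 0.1.
dask.config.set({
    "distributed.worker.memory.target": 0.7,  # default: 0.6
    "distributed.worker.memory.spill": 0.8,   # default: 0.7 (now equal to pause's default)
})
```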