Monitor spilled data that is still referenced #6220

crusaderky · 2022-04-27T13:05:28Z

Follow-up to #5936

Normally, a worker holds some in-memory tasks that are relatively at rest while it actively works on others.
Under memory pressure, the tasks at rest are spilled to disk using an LRU algorithm.
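
To make the normal case concrete, here is a minimal sketch of LRU spilling. It is a toy, not the worker's actual SpillBuffer: `ToyLRUSpill`, `target`, `fast` and `slow` are names made up for illustration, values are assumed to be `bytes` weighed by `len()`, and a plain dict stands in for the disk.

```python
import pickle
from collections import OrderedDict


class ToyLRUSpill:
    """Hypothetical toy: keep values in memory until their total size
    exceeds `target`, then move least recently used keys to `slow`.
    """

    def __init__(self, target):
        self.target = target
        self.fast = OrderedDict()  # in memory, least recently used first
        self.slow = {}             # stands in for disk; holds pickled bytes

    def __setitem__(self, key, value):
        self.slow.pop(key, None)
        self.fast[key] = value
        self.fast.move_to_end(key)  # mark as most recently used
        self._evict()

    def __getitem__(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key]
        # Unspill: load from "disk" and put back in memory
        value = pickle.loads(self.slow.pop(key))
        self[key] = value
        return value

    def _evict(self):
        # Spill from the cold end until we are back under the target,
        # always keeping at least the most recent key in memory
        while len(self.fast) > 1 and sum(map(len, self.fast.values())) > self.target:
            key, value = self.fast.popitem(last=False)
            self.slow[key] = pickle.dumps(value)


buf = ToyLRUSpill(target=250)
buf["x"] = b"a" * 100
buf["y"] = b"b" * 100
_ = buf["x"]            # touching x makes y the least recently used key
buf["z"] = b"c" * 100   # 300 bytes > 250: y is spilled
spilled_keys = list(buf.slow)
```

In this toy the key that was touched least recently is the one that lands in `slow`; the keys the worker is actively using stay in memory.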

In case of extreme memory pressure, however, a key may be spilled to disk while it is in use, either because it's an input to another task that's currently running or because it's being sent to another worker. When that happens, the data is spilled but the memory is not released until the compute or send has finished; in the GUI, its RAM will move from "managed" to being counted twice, as "unmanaged recent" and as "spilled".
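
The effect can be reproduced in miniature with plain Python references. This is only a sketch: `Payload`, `fast`, `slow` and `in_use` are hypothetical stand-ins, and the "disk" is a dict that records the size of what was written.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for task data (bytes cannot be weakly referenced)."""

    def __init__(self, nbytes):
        self.buf = bytearray(nbytes)


fast = {"x": Payload(1_000_000)}  # counted as "managed"
slow = {}                         # stands in for disk

in_use = fast["x"]                # a running task or an outgoing transfer holds the data
probe = weakref.ref(in_use)       # lets us observe when the object is really freed

# Spill: record the data on "disk" and drop the buffer's own reference
slow["x"] = len(fast.pop("x").buf)

# The key is now "spilled", but the object is still alive in RAM
still_alive_after_spill = probe() is not None

# Only when the compute or send finishes is the memory actually released
del in_use
gc.collect()  # not needed on CPython, harmless elsewhere
released_after_use = probe() is None
```

Between the spill and the final `del`, the bytes are on "disk" and in RAM at the same time, which is the window this issue proposes to measure.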

It would be valuable to separate this kind of memory usage from the opaque unmanaged blob.
This is straightforward after #5936:

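class SpillBuffer:
    @property
    def spilled_but_still_referenced(self) -> int:
        """Bytes of spilled data that are still held in memory
        by a running task or an outgoing transfer
        """
        if not has_zict_220:
            return 0
        # self.slow is a Cache wrapped around the actual Slow mapping
        cache = cast(Cache, self.slow)
        slow = cast(Slow, cache.data)
        # Sum the in-memory weight of every key that is still in the cache
        return sum(slow.weight_by_key[key].memory for key in cache.cache)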

The above is O(n) in the number of active computations and transfers, so it is negligible most of the time.
The output could be sent to the scheduler during the heartbeat and contribute to distributed.scheduler.MemoryState, as already happens for SpillBuffer.spilled_total.

TODO

Come up with a good way to visualize this info in the GUI

OUT OF SCOPE

Use the new metric in algorithms (but feel free to discuss here)


crusaderky added the memory label Apr 27, 2022
