Monitor spilled data that is still referenced #6220

Open
crusaderky opened this issue Apr 27, 2022 · 0 comments

crusaderky commented Apr 27, 2022

Follow-up to #5936

Normally, a worker holds some in-memory tasks that are at rest while it actively works on others.
Under memory pressure, the tasks at rest are then spilled to disk using an LRU algorithm.

Under extreme memory pressure, however, a key may be spilled to disk while it is in use, either because it's an input to another task that's currently running or because it's being sent to another worker. When that happens, the data is written to disk but the memory is not released until the compute or send has finished; in the GUI, its RAM moves from "managed" to being counted under both "unmanaged recent" and "spilled".
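The effect described above can be reproduced in isolation. The following is a minimal sketch, not the real SpillBuffer: `ToySpillBuffer`, `Payload`, and `bytes_still_in_ram` are hypothetical names invented for illustration, the "disk" is a plain dict, and the immediate release after `del` relies on CPython's reference counting.

```python
import weakref


class Payload:
    """Stand-in for a task's output (plain objects support weak references)."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


class ToySpillBuffer:
    """Hypothetical toy model; not the real distributed SpillBuffer."""

    def __init__(self):
        self.fast = {}  # key -> value held in memory ("managed")
        self.slow = {}  # key -> bytes written; stands in for the disk
        # Spilled keys whose value is still alive somewhere in the process
        self.alive = weakref.WeakValueDictionary()

    def spill(self, key):
        value = self.fast.pop(key)
        self.slow[key] = value.nbytes  # pretend to serialize to disk
        self.alive[key] = value  # weak reference only; does not keep it alive

    def bytes_still_in_ram(self):
        # Sum the size of every spilled key whose value has not been freed yet
        return sum(self.slow[key] for key in list(self.alive))


buf = ToySpillBuffer()
buf.fast["x"] = Payload(1000)
in_use = buf.fast["x"]  # a running task or an outgoing transfer holds the value
buf.spill("x")
during = buf.bytes_still_in_ram()  # "x" is on disk and still occupies RAM
del in_use  # the compute or send finishes
after = buf.bytes_still_in_ram()  # in CPython the value is freed at this point
```

While `in_use` holds the value, the key counts as spilled yet its memory is still occupied, which is the portion that currently shows up as unmanaged.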

It would be valuable to separate this kind of memory usage from the opaque unmanaged blob.
This is straightforward after #5936:

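class SpillBuffer:
    @property
    def spilled_but_still_referenced(self) -> int:
        """In-memory size of keys that were spilled while still in use"""
        if not has_zict_220:
            return 0
        cache = cast(Cache, self.slow)
        slow = cast(Slow, cache.data)
        # cache.cache only contains the spilled keys that are still referenced
        return sum(slow.weight_by_key[key].memory for key in cache.cache)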

The above is O(n) in the number of active computations and transfers, so its cost is negligible in most cases.
The output could be sent to the scheduler during the heartbeat and contribute to distributed.scheduler.MemoryState, as already happens for SpillBuffer.spilled_total.
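As a rough illustration of how the new number could be separated from the unmanaged total once it reaches the scheduler, here is a sketch under stated assumptions. `ToyMemoryState` and all of its fields and properties are hypothetical and deliberately simplified; it is not the real distributed.scheduler.MemoryState, and the arithmetic is only one possible way to do the accounting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMemoryState:
    """Hypothetical per-worker memory breakdown, all values in bytes."""

    process: int  # total memory of the worker process as seen by the OS
    managed_in_memory: int  # keys held in memory and not spilled
    spilled_total: int  # bytes currently on disk
    spilled_but_still_referenced: int  # the proposed new metric

    @property
    def unmanaged(self) -> int:
        # Everything the process holds that is not a managed in-memory key
        return max(0, self.process - self.managed_in_memory)

    @property
    def unmanaged_other(self) -> int:
        # Unmanaged memory left over once the spilled-but-referenced
        # portion has been carved out
        return max(0, self.unmanaged - self.spilled_but_still_referenced)


# Hypothetical numbers, as they might arrive in a heartbeat
state = ToyMemoryState(
    process=1500,
    managed_in_memory=400,
    spilled_total=1000,
    spilled_but_still_referenced=1000,
)
```

With these made-up numbers, 1000 of the 1100 unmanaged bytes are explained by data that was spilled while in use, leaving 100 bytes that are genuinely opaque.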

TODO

Come up with a good way to visualize this info in the GUI

OUT OF SCOPE

Use the new metric in algorithms (but feel free to discuss here)
