Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and test spill->target hysteresis cycle #5813

Merged
merged 16 commits into from
Feb 21, 2022

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 15, 2022

Follow-up to #5788.

  • Add documentation and write unit tests around the (to me previously unknown) behaviour where, once a worker reaches the spill threshold, it won't stop spilling until it goes below the target threshold.
  • Fix bug when target=False and spill is not False, where the worker would be flushed completely empty.
  • Tentatively fix flakiness in test_pause_executor; see Memory may not shrink fast enough #5840.

@crusaderky crusaderky self-assigned this Feb 15, 2022
@crusaderky crusaderky marked this pull request as ready for review February 16, 2022 11:15
@github-actions
Copy link
Contributor

github-actions bot commented Feb 16, 2022

Unit Test Results

       11 files   -        1         11 suites   - 1   6h 25m 25s ⏱️ - 46m 6s
  2 608 tests +       1    2 528 ✔️ +       2    80 💤 +    1  0  - 1 
14 179 runs   - 1 387  13 291 ✔️  - 1 174  888 💤  - 210  0  - 2 

Results for commit f32a7cd. ± Comparison against base commit 43dfb61.

♻️ This comment has been updated with latest results.

@crusaderky
Copy link
Collaborator Author

All test failures are unrelated; ready for review and merge

Comment on lines 163 to 168
1. At 60% of memory load (as estimated by ``sizeof``), spill least recently used data
to disk
to disk.
2. At 70% of memory load (as reported by the OS), spill least recently used data to
disk regardless of what is reported by ``sizeof``; this accounts for memory used by
the python interpreter, modules, global variables, memory leaks, etc.
the python interpreter, modules, global variables, memory leaks, etc. The spilling
stops when the memory goes below 60%, in a hysteresis cycle.
Copy link
Member

@fjetter fjetter Feb 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These descriptions always confused me and without reading (and understanding) the entire code it is almost impossible to infer the actual behaviour. How about something like this


Passive spilling

At 60% (memory_target_threshold) memory load (managed memory (xref to page with memory description) , as estimated by sizeof), spill least recently used data to disk whenever a new task finishes. This typically happens whenever a task finishes or remote data is fetched.

Active spilling / Hysteresis

At 70% (memory_spill_fraction) of memory load (RSS memory as reported by the OS (xref total memory in page with memory description), start to spill least recently used data to disk until process memory falls below 60% (memory_target_threshold). This happens in the background even while tasks are being executed.


Note: Passive/active may not make a lot of sense. I think it helps giving these two mechanisms a different name since they act on different measurements and behave very, very differently. Just listing them in an ordered list doesn't give this justice, imo.

Writing out the two mechanisms above the way I did, I'm also wondering if we should change the configuration to make this distinction more explicit instead of reusing the same config parameter for an actually very different mechanism

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I like "This happens in the background even while tasks are being executed", as it is misleading for multi-threaded workers where the target threshold is also being hit while tasks are being executed.

I rewrote the section; see if you like it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I think the new docs are much clearer on the behaviour than what we had before 👏

docs/source/worker.rst Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants