Remove unbuffered mode #414

bdice · 2021-01-02T05:46:06Z

Feature description

Currently we support an "unbuffered" mode in signac-flow that disables signac's buffering capabilities, by disabling the config option use_buffered_mode (it's on by default). I don't know if it still makes sense to support an unbuffered mode because we require signac >= 1.0. I believe buffering was initially an optional feature (rather than always-on) because signac didn't support buffering in some earlier versions. Our CI takes a long time to run and testing both options (on/off) seems unnecessary at this point.

Proposed solution

The option defaults to True, and I think this feature could be immediately removed with no deprecation cycle -- in my understanding, it's basically a flag that disables an important performance optimization. However, I don't want to immediately remove it from the config validator (that's more of a breaking change than just requiring buffered mode). I suggest the following.

Deprecation:

Change the behavior of FlowProject._potentially_buffered to warn if use_buffered_mode is specified and ignore the option (always buffer).
Remove the tests for unbuffered projects.

Removal (after 2 minor versions):

Remove use_buffered_mode option from the config validator (config files with the option specified are no longer valid).
The FlowProject._potentially_buffered context manager could be removed and replaced with: with signac.buffered(): or we could just change the name to FlowProject._buffered.

Before anyone begins work on this issue, I would like to get input from @glotzerlab/signac-committers.

The text was updated successfully, but these errors were encountered:

atravitz · 2021-01-02T16:11:26Z

Doing a short deprecation cycle for this seems reasonable to me 👍

vyasr · 2021-01-04T19:00:25Z

The main hesitation I would have with changing this immediately is relating to glotzerlab/signac#397. It should be relatively easy to modify the new code for glotzerlab/signac#249 to handle thread parallelism transparently (and I plan to do this), but I don't know of any way to make it intrinsically safe for process parallelism. If a signac-flow user sets the status_parallelization option to 'process', I think signac-flow will have to be responsible for doing the update in a parallel-safe manner. It's possible that future refactorings in flow will make this easier to achieve (or a moot point), but so long as that limitation exists the options are:

Always disable buffering when using process-parallel status updates. This will always be safe, but may end up making things slow enough that process parallelism becomes useless. Since switching to process-parallel status updates indicates that the user has identified processing rather than I/O as the bottleneck, there's no way to know for sure without testing on a particular workflow.
Always set a massive (or unlimited) buffer size when using process-parallel status updates. This would mostly work, but could silently fail once there's enough data, or in the unlimited case lead to deadlock at some point if there's a massive number of I/O requests at once (Lustre filesystems in particular can have this problem).
Simply include user warnings to the effect that process-parallel status updates may not be safe for large documents. This solution is just asking for users to shoot themselves in the foot, though.
Allow the user to disable buffering using the flag in this issue.

Option (4) isn't great, but none of the others currently are either so we may need to consider that possibility before deciding whether or not to deprecate this.

bdice · 2021-01-04T19:10:48Z

@vyasr Fetching scheduler status is done before the parallel section begins. That’s the only part that writes to the project document. The parallel portion of the code is read-only, evaluating conditions and determining eligibility.

vyasr · 2021-01-04T20:05:52Z

Ah right. So in that case, am I correct that status_parallelization='thread' is useful only if users write I/O-bound label functions (which is admittedly not unlikely if people are using job documents etc)? Is the call to _fetch_scheduler_status not done in buffered mode? Did I change it to just update the cached_status?

vyasr · 2021-01-04T20:28:19Z

I discussed my previous comment offline with @bdice to explain my point a little more clearly, and I took a look at the code. Tl;dr, I'm fine with deprecating this flag.

My point here was that since we default to thread-parallel execution of job/group label acquisition, these tasks are only accelerated to the extent that they are I/O-bound because Python is inherently single-threaded. If label functions are expensive because they're actually doing some significant amount of processing, then process parallelism is the way to go. Having said that, all of this parallel code happens after the serial writing of the scheduler status to the project document as Bradley mentioned above, so I think we're fine to move forward with this deprecation.

bdice · 2021-01-20T19:05:12Z

I'm going to mark this issue as resolved - the design decisions have all been made and implemented in #425, and the only remaining task is to remove the deprecation warning (and remove the default value from the flow config monkeypatch) in a later release.

bdice added enhancement New feature or request refactor Code refactoring labels Jan 2, 2021

bdice mentioned this issue Jan 13, 2021

Deprecate unbuffered mode #425

Merged

12 tasks

bdice closed this as completed Jan 20, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove unbuffered mode #414

Remove unbuffered mode #414

bdice commented Jan 2, 2021 •

edited

Loading

atravitz commented Jan 2, 2021

vyasr commented Jan 4, 2021

bdice commented Jan 4, 2021

vyasr commented Jan 4, 2021

vyasr commented Jan 4, 2021

bdice commented Jan 20, 2021

Remove unbuffered mode #414

Remove unbuffered mode #414

Comments

bdice commented Jan 2, 2021 • edited Loading

Feature description

Proposed solution

atravitz commented Jan 2, 2021

vyasr commented Jan 4, 2021

bdice commented Jan 4, 2021

vyasr commented Jan 4, 2021

vyasr commented Jan 4, 2021

bdice commented Jan 20, 2021

bdice commented Jan 2, 2021 •

edited

Loading