Next: Refactor internals to support aggregation, improved performance and documentation. #427

bdice · 2021-01-18T19:42:07Z

Description

This PR merges the next branch. This branch refactors the internals of signac-flow to support aggregation (execution of operations/groups on multiple jobs at once). The next branch contains the following high-level changes:

Refactor internal workings of signac-flow to support aggregation (Google Summer of Code project by @kidrahahjo). While aggregation itself is not yet supported by the user API, the internal refactoring will allow us to enable the feature in a subsequent release.
Significant improvements to public/developer APIs (including a switch to NumPy-style docstrings)
Many internal API changes to environments, schedulers, status rendering, etc.
Extensive optimization of the FlowProject's major features (status, run, submit). Performance should now meet or exceed that of the versions released before groups were introduced.

To-do before merging:

Finalize New _fetch_status code. #417 and Refactor aggregation #422
Make aggregator class private (leaving it internal for now), adapt make_group as needed (done in Make class aggregator and method get_aggregate_id private #432)
Update the changelog; some important changes are not yet recorded
Check code for any remaining TODO items

Committers: this PR should NOT be squashed when merging.

After this PR is merged:

Review our documentation again; deprecate any remaining public APIs that should be private
Encourage developers to try out the new features, installing from source
Wait around 2 weeks to let users try it out in practice / feature freeze
Release signac-flow 0.12.0 🎉

Types of Changes

Documentation update
Bug fix
New feature
Breaking change¹

¹The change breaks (or has the potential to break) existing functionality.

Checklist:

I am familiar with the Contributing Guidelines.
I agree with the terms of the Contributor Agreement.
My name is on the list of contributors.
My code follows the code style guideline of this project.
The changes introduced by this pull request are covered by existing or newly introduced tests.

If necessary:

I have updated the API documentation as part of the package doc-strings.
I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
I have updated the changelog.

Changes the submission id generation to prepare for aggregation. The `__str__` method of `_JobOperation` is also changed to better handle job aggregation. The unique id generated using a hash of all the `_JobOperation` object's data is still the same, so the uniqueness should not be in question; the only change is to the readable part of the id. The change to the `str` implementation mimics NumPy, and the submission id favors brevity.
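
The id scheme described above (a human-readable part followed by a hash of all of the operation's data) can be sketched roughly as follows. This is only an illustration: the function name `make_submission_id`, its parameters, the `agg-N` label, and the exact layout of the id are hypothetical and are not taken from signac-flow.

```python
import hashlib
import json


def make_submission_id(project, operation_name, job_ids):
    """Hypothetical sketch: short readable prefix plus a hash of all data."""
    # The hash covers every identifying field, so uniqueness never depends
    # on the readable part of the id.
    payload = json.dumps(
        {"project": project, "operation": operation_name, "jobs": list(job_ids)},
        sort_keys=True,
    )
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    # The readable part favors brevity: a shortened job id for a single job,
    # or just the number of jobs for a larger aggregate.
    if len(job_ids) == 1:
        readable = job_ids[0][:8]
    else:
        readable = f"agg-{len(job_ids)}"
    return f"{project}/{readable}/{operation_name}/{digest}"
```

Changing only the readable part leaves the hash, and therefore the uniqueness guarantee, untouched.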

…ing (#351)

* Use sort_ascending instead of reverse_order.
* Fix and simplify hash methods.
* Clarify docstrings.
* Initialize list/dict with literals.
* Use one-line function instead of defining a separate method.
* Don't wrap line.
* class -> object.
* use itertools reference while using groupby
* Use reference for zip_longest too
* Document _get_unique_function_id

Co-authored-by: Hardik Ojha

* Clean up docs and variables relating to pre-conditions and post-conditions.
* Clean up docstrings, variable names, defaults, remove obsolete helper functions.
* Use f-strings, avoid second-person "you".
* Use full name for template_filters.
* Docstring revisions.

* Remove the use of argument
* Remove the use of env attribute while submit
* Pass environment as keyword argument.
* Remove env argument in a few missed places.
* Fix template reference data. It was using mpiexec (default) and should use the cluster-specific MPI commands. I don't know why the reference data was wrong.
* Remove extra comment about status parallelization.

Co-authored-by: Bradley Dice

* Refactor status rendering into a private function.
* Renamed fakescheduler to fake_scheduler.
* Make environment metaclass private.
* Remove unused status.py file.
* Remove out argument from templates.
* Refactor scheduler classes, unify code.
* Remove metaclass from docs.
* Update docstrings.
* Update docstrings.
* Update changelog.
* Update docstrings.
* Update changelog.txt
* Update changelog.txt

Co-authored-by: Alyssa Travitz
Co-authored-by: Hardik Ojha

Cleans up various internals associated with aggregation, introducing additional abstraction layers and adding docstrings.

* Refactor aggregation.
* Fix failing tests from refactor.
* Improve clarity and coverage of tests.
* Add more tests to enforce that aggregates are tuples.
* Expand tests of default aggregator.
* Update tests.
* Unify __iter__.
* __getitem__ should raise a KeyError if not found.
* Update error message.
* Update flow/aggregates.py

Co-authored-by: Vyas Ramasubramani
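
Two of the points in this commit (aggregates are enforced to be tuples, and `__getitem__` raises a `KeyError` for unknown ids) can be illustrated with a small mapping-like container. This is a sketch under assumptions: the class name `AggregateStore`, the id format, and the error messages are hypothetical and do not reproduce the code in flow/aggregates.py.

```python
import hashlib
from collections.abc import Mapping


class AggregateStore(Mapping):
    """Hypothetical read-only mapping of aggregate id -> tuple of job ids."""

    def __init__(self, aggregates):
        self._aggregates = {}
        for aggregate in aggregates:
            # Enforce that every aggregate is a tuple, not a list or generator.
            if not isinstance(aggregate, tuple):
                raise TypeError("Each aggregate must be a tuple of jobs.")
            self._aggregates[self._make_id(aggregate)] = aggregate

    @staticmethod
    def _make_id(aggregate):
        # Illustrative id: a prefix plus a hash of the joined job ids.
        joined = ",".join(str(job_id) for job_id in aggregate)
        return "agg-" + hashlib.sha1(joined.encode("utf-8")).hexdigest()

    def __getitem__(self, aggregate_id):
        try:
            return self._aggregates[aggregate_id]
        except KeyError:
            raise KeyError(f"Aggregate with id {aggregate_id} not found.") from None

    def __iter__(self):
        return iter(self._aggregates)

    def __len__(self):
        return len(self._aggregates)
```

Deriving from `collections.abc.Mapping` gives `keys()`, `values()`, `items()`, and `in` for free once `__getitem__`, `__iter__`, and `__len__` are defined.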

Rewrite the _fetch_status method, which was extremely large and unwieldy. The new code is more efficient since it avoids making multiple passes through the jobs. It also uses some newly introduced functions and context managers as well as newer tqdm functions to clean up existing code.

* Initial rewrite of _fetch_status.
* Update _fetch_scheduler_status to require aggregates.
* Clarify docstring for status_parallelization.
* Experiment with callback approach (only iterate over aggregates/groups once).
* Skip evaluation of post-conditions during eligibility if known to be incomplete.
* Change operations to groups.
* Use context manager for updating the cached status in bulk.
* Separate scheduler query from cached status update.
* Fix docstring.
* Use context manager for _fetch_scheduler_status cache update.
* Use cached status context manager for _fetch_status.
* Refactor status updates so the scheduler query happens on-the-fly and only considers selected groups/aggregates.
* Intermediate commit with notes.
* Simplify eligibility check.
* Update docstring.
* Update tests to account for removed method.
* Rename cid to cluster_id.
* Define parallel executor.
* Only update status if the update dictionary has data.
* Add parallel execution.
* Remove status callback.
* Clean up status fetching.
* Remove unnecessary internal methods.
* Fix test (scheduler is queried internally).
* Rename jobs -> aggregate.
* Fix bug in tests (the dynamically altered jobs should NOT be resubmitted, this was probably an error due to the use of cached status rather than querying the scheduler).
* Show aggregate id in str of _JobOperation.
* Fix script output.
* Remove keyword argument.
* Reset MockScheduler in setUp for all tests.
* Mock root directory (can't override Project.root_directory() because it is used for all job document paths and buffering, and must reflect an actual path on disk).
* Update template reference data.
* Refactor mocked root.
* Pickle function to be executed in parallel using cloudpickle so that it can be sent across threads.
* Fix pre-commit hooks.
* Fix root directory mocking.
* Improve progress iterators.
* Use chunked job label fetching.
* Refactor parallel executor to accept one iterable and use Pool for process parallelism.
* Use integers in the cached status update.
* Fix mocking of system executable. Resolves #413.
* Update changelog.
* Mock environment directly rather than using **kwargs for compatibility with signac 1.0.0.
* Buffer during project status update.
* Use ordered results.
* Don't buffer during status update (no performance difference).
* Refactor job labels so that the list of individual jobs is generated during the same single loop over the project to generate aggregates.
* Update flow/util/misc.py
* Fix function parameters in test, use kwargs for _fetch_status.
* Use process_map.
* Use MOCK_EXECUTABLE.
* Add comments explaining use of FakeScheduler.
* Collect errors that occur during status evaluation.
* Mock the id generation method instead of injecting mocked attributes.

Co-authored-by: Vyas Ramasubramani
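
The "context manager for updating the cached status in bulk" and the "only update status if the update dictionary has data" items can be pictured with the following minimal sketch. The name `bulk_status_update` and the use of a plain dict-like cache are assumptions made for illustration; the real implementation operates on the project's cached status and is not shown in this PR description.

```python
from contextlib import contextmanager


@contextmanager
def bulk_status_update(cache):
    """Hypothetical helper: collect status updates and apply them in one write."""
    updates = {}
    # Callers fill this dictionary while iterating over groups/aggregates.
    yield updates
    # Skip the write entirely when nothing was collected. If the body of the
    # `with` block raises, this line is never reached and the cache is untouched.
    if updates:
        cache.update(updates)
```

A caller would use it as `with bulk_status_update(cache) as updates: updates[some_id] = status`, so the cache is written once per pass rather than once per entry.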

codecov · 2021-01-20T15:59:18Z

Codecov Report

Merging #427 (eb27f7b) into master (9cad854) will increase coverage by 3.58%.
The diff coverage is 78.32%.

@@            Coverage Diff             @@
##           master     #427      +/-   ##
==========================================
+ Coverage   66.85%   70.44%   +3.58%     
==========================================
  Files          29       29              
  Lines        2794     2994     +200     
  Branches      496      553      +57     
==========================================
+ Hits         1868     2109     +241     
+ Misses        795      737      -58     
- Partials      131      148      +17

Impacted Files	Coverage Δ
flow/__main__.py	`0.00% <0.00%> (ø)`
flow/environments/incite.py	`90.32% <ø> (+12.90%)`	⬆️
flow/environments/umich.py	`75.00% <ø> (ø)`
flow/errors.py	`95.65% <ø> (ø)`
flow/labels.py	`58.82% <ø> (ø)`
flow/project.py	`73.64% <ø> (+0.51%)`	⬆️
flow/scheduling/__init__.py	`100.00% <ø> (ø)`
flow/testing.py	`50.00% <ø> (ø)`
flow/util/template_filters.py	`64.06% <ø> (ø)`
flow/operations.py	`42.22% <37.50%> (ø)`
... and 23 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9cad854...eb27f7b.

* Make aggregator and get_aggregate_id private
* Remove reference from docs
* Use private name for _aggregator in test_aggregates.py.
* Remove aggregates from documentation, add TODOs.
* Fix docstring reference.
* Update changelog.

Co-authored-by: Bradley Dice

kidrahahjo and others added 30 commits August 13, 2020 11:30

Add aggregator classes in flow (#348)

6109bb3

Add a new decorator by which operations can be tagged for various types of aggregation, along with classes to store actual aggregates once they have been generated.
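
To give a feel for what "tagging an operation for aggregation" might look like, here is a minimal sketch of a decorator factory that attaches a grouping rule to a function. The names `aggregate_in_groups_of` and `_aggregator` are hypothetical and chosen for this example; the aggregation API was kept private in this PR, so this does not show the actual signac-flow interface.

```python
from itertools import zip_longest

# Sentinel used to pad the last chunk; never confused with a real job.
_FILL = object()


def aggregate_in_groups_of(size):
    """Hypothetical decorator factory: tag a function with a chunking rule."""
    if size < 1:
        raise ValueError("size must be a positive integer")

    def grouper(jobs):
        # Split the jobs into tuples of at most `size` elements.
        chunks = zip_longest(*[iter(jobs)] * size, fillvalue=_FILL)
        return [tuple(job for job in chunk if job is not _FILL) for chunk in chunks]

    def decorator(func):
        # Store the rule on the function; the function itself is unchanged.
        func._aggregator = grouper
        return func

    return decorator
```

The decorated function stays callable as before; a framework can later read the attached rule to decide which jobs are passed to it together.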

Update the branch with master

67cc864

Merge branch 'feature/enable-aggregation' of https://github.com/glotzerlab/signac-flow into feature/enable-aggregation

f3a405d

Merge remote-tracking branch 'origin/master' into feature/enable-aggregation

ad0e0df

Merge remote-tracking branch 'origin/master' into feature/enable-aggregation

f204c78

Merge branch 'feature/enable-aggregation' of https://github.com/glotzerlab/signac-flow into feature/enable-aggregation

19c2d8a

Add default aggregate support to flow (#335)

c2ad3c4

Change the internals of flow so that everything operates on default aggregates (aggregates of size 1) rather than individual jobs.
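
The idea of a default aggregate can be shown in a few lines: every job is wrapped in a tuple of size 1, so the same code path handles single jobs and larger aggregates. The function names below are hypothetical and serve only as an illustration.

```python
def default_aggregates(jobs):
    """Wrap each job in its own 1-tuple (a "default aggregate")."""
    return [(job,) for job in jobs]


def run_operation(operation, aggregate):
    """Call an operation with the jobs of one aggregate, whatever its size."""
    # Unpacking means a single-job operation still receives one positional job.
    return operation(*aggregate)
```

With this convention, an operation written for a single job keeps working unchanged, because it is simply called with an aggregate of size 1.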

Merge pull request #364 from glotzerlab/feature/enable-aggregation

699873f

Feature/enable aggregation

Merge branch 'feature/blacken' into feature/next/blacken-apply

dbbac16

Merge branch 'feature/blacken-apply' into feature/next/blacken-apply

7357241

Use tqdm.auto for better notebook compatibility. (#371)

1fda006

Remove unused attributes/methods. (#380)

b7b65bd

* Remove unused __init__, the implicit parent constructor is equivalent.
* Use public .job attribute.
* Remove unused argument from _add_operation_selection_arg_group.
* Use logger.warning instead of deprecated logger.warn.

Refactor error variables and messages. (#373)

088150b

Remove unused **kwargs from _FlowProjectClass metaclass constructor.

ea0885b

Raise ImportError instead of RuntimeWarning if pprofile is not installed (it's a hard error).

72d1a1e

Merge branch 'next' of github.com:glotzerlab/signac-flow into next

d5d6a4d

Deprecate CPUEnvironment and GPUEnvironment (#381)

9baec4d

* Deprecate CPUEnvironment and GPUEnvironment. The classes are unused and untested.
* Remove unused private functions for importing from files.
* Update changelog.

Make _verify_group_compatibility a class method. (#379)

9d5bfbc

Use isinstance check for IgnoreConditions. (#378)

307ac0f

Remove dangling else/elif clauses after raise/return statements. (#377)

08f1652

Introduces stylistic changes suggested by pylint.

Use f-string.

f1b6740

Linting dicts. (#374)

7ef25fc

* Refactor dicts, OrderedDict, and comprehensions. Resolves #323.
* Import Mapping from collections.abc.
* Use {}

Use descriptive variable names. (#376)

5205433

* Use longer/clearer variable names.
* Update _no_aggregation.
* Rename _operation_to/from_tuple to _job_operation_to/from_tuple

Co-authored-by: Brandon Butler

Make _to_hashable a private function. (#384)

db3ea44

* Make _to_hashable private.
* Copy _to_hashable method from signac.

Adding/improving docstrings and comments. (#375)

15833b8

* Add and improve docstrings.
* Minor edits to docstrings.
* Use shorter one-line summaries in some long docstrings.

Use job_id consistently (adds to #363). (#386)

2428899

* Fix other instances of job_id missed in #363.
* Update changelog.
* Use descriptive variable name job_id.

Add all missing docstrings, enforce pydocstyle. (#387)

3f70417

Validates docstrings using pydocstyle and adds all the necessary documentation for this to pass.

bdice and others added 5 commits January 15, 2021 17:14

Merge remote-tracking branch 'origin/master' into next

488031d

Update tqdm requirement.

a53506e

Deprecate unbuffered mode (#425)

3eecb61

* Always buffer, add deprecation warning if use_buffered_mode=False.
* Remove unbuffered tests.
* Reduce number of jobs to speed up tests.
* Update changelog.
* Update deprecation notice.
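
The "always buffer, warn if use_buffered_mode=False" behavior could look roughly like the sketch below. The helper name `resolve_buffered_mode`, the warning category, and the message text are assumptions made for illustration and are not copied from signac-flow.

```python
import warnings


def resolve_buffered_mode(use_buffered_mode=True):
    """Hypothetical helper: buffering is always on; opting out only warns."""
    if not use_buffered_mode:
        warnings.warn(
            "use_buffered_mode=False is deprecated and is ignored; "
            "buffered mode is always used.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
    # Regardless of the argument, buffered mode stays enabled.
    return True
```

Keeping the argument but ignoring its value lets existing user code run unchanged during the deprecation period.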

bdice changed the title ~~Merge next: Add aggregation feature, improved performance and documentation.~~ Next: Add aggregation feature, improved performance and documentation. Jan 18, 2021

bdice self-assigned this Jan 18, 2021

bdice added this to the v0.12.0 milestone Jan 18, 2021

bdice added the aggregation and enhancement labels Jan 18, 2021

bdice changed the title ~~Next: Add aggregation feature, improved performance and documentation.~~ Next: Refactor internals to support aggregation, improved performance and documentation. Jan 18, 2021

bdice and others added 4 commits January 19, 2021 13:51

Simplify _register_aggregates method. (#430)

a688445

Change reference name from fakescheduler to fake_scheduler (#431)

4f8bf3f

bdice marked this pull request as ready for review January 20, 2021 18:49

bdice requested review from a team as code owners January 20, 2021 18:49

bdice requested review from b-butler and cbkerr and removed request for a team, cbkerr and b-butler January 20, 2021 18:49

bdice enabled auto-merge January 20, 2021 18:52

bdice disabled auto-merge January 20, 2021 18:56

bdice merged commit 03fe709 into master Jan 20, 2021

bdice deleted the next branch January 20, 2021 18:59
