-
Notifications
You must be signed in to change notification settings - Fork 167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Sort __globals__ dict to make pickle string more deterministic #418
Conversation
Separate out the retrieval of the pickle string from the subprocess with the unpickling, so the string can be directly compared. This patch also adds an add_env parameter to allow subprocesses to be passed environment variables. The main use case under consideration is setting PYTHONHASHSEED, allowing these functions to be used to confirm (in)consistency of strings under different initial conditions.
Codecov Report
@@ Coverage Diff @@
## master #418 +/- ##
==========================================
- Coverage 91.76% 91.75% -0.02%
==========================================
Files 4 4
Lines 668 667 -1
Branches 136 136
==========================================
- Hits 613 612 -1
Misses 34 34
Partials 21 21
Continue to review full report at Codecov.
|
a2237d3
to
ca7b0e8
Compare
@@ -2321,6 +2323,22 @@ def __type__(self): | |||
o = MyClass() | |||
pickle_depickle(o, protocol=self.protocol) | |||
|
|||
@pytest.mark.skipif( | |||
sys.version_info < (3, 6, 0), | |||
reason="Dict determinism is a lost cause in Python < 3.6") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With 3.5 EOL, it does not seem worth putting any effort into this. Downstream tools that care about determinism have likely been 3.6+ anyway.
Note that failure appears to be related to pytest-dev/pytest#8539 / https://bugs.python.org/issue43798. Should be fixed at next pytest release. |
@ogrisel - any thoughts on this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think __globals__
should never be so large than sorting yield a significant performance overhead but I would still be interested in a quick non-regression benchmark with a case where there are 1000+ global variables.
Other than that, LGTM.
@ogrisel I put in a test. Let me know if that's what you had in mind. |
_TEST_BIG_GLOBAL_SPACE.__code__) == gvars | ||
tic = time.time() | ||
subprocess_pickle_string(_TEST_BIG_GLOBAL_SPACE, protocol=self.protocol) | ||
assert time.time() - tic < 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather not put tests that depend on the runtime because the CI workers can overloaded from time to time and those tests would fail at random.
Instead just run a one-off benchmark and post the results in the comments of this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to address this if we chose to go for #428 (which I think we should ;).
Discussing with @pierreglaser we found an alternative way to achieve the same results without sorting explicitly: #428. |
I feel like I used the I'm good with closing this in favor of #428, but I'll leave when to you. |
Closing in favor of #428. |
I have been told that deterministic pickle strings are not an explicit goal of this package, but this is a small patch that allows us to exclude a source of non-determinism that is related to the Python random number generator, rather than the complexity of the pickled object.
Specifically, this patch sorts the
__globals__
dict, the ordering of which can be sensitive to the global RNG. This required some changes to allow us to access the results of pickling under different states.For more information on the use case, see nipype/pydra#457.