Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream reflectiontable msgpack directly to file #1115

Merged
merged 12 commits into from
Jan 30, 2020
Merged

Stream reflectiontable msgpack directly to file #1115

merged 12 commits into from
Jan 30, 2020

Conversation

Anthchirp
Copy link
Member

The previous implementation held the reflection table up to 3 times in
memory during the output phase. This implementation passes the python
filehandle into C++ space and then writes the msgpack output directly
to the file. As a result memory consumption is significantly reduced.

Fixes #1112

The previous implementation held the reflection table up to 3 times in
memory during the output phase. This implementation passes the python
filehandle into C++ space and then writes the msgpack output directly
to the file. As a result memory consumption is significantly reduced.

Fixes #1112
@Anthchirp
Copy link
Member Author

Fails on Python3 because boost_adaptbx streambuf is not Python3 compatible.

Presumably this was never noticed because the tests have been intentionally broken for half a year
cctbx/cctbx_project#367

@Anthchirp
Copy link
Member Author

Anthchirp commented Jan 29, 2020

Compared maximum memory use in kBytes on writing between pickle, existing msgpack implementation and msgpack streaming method in development.

reflection table size P2 pickle P3 pickle P2 msgpack P3 msgpack P2 msgpack stream P3 msgpack stream
1 59348 59512 59076 59196 59528 59384
1000 59624 59944 60280 60324 59380 59700
10000 62112 63604 71560 70036 61828 62252
20000 65072 67296 82484 81724 64760 65176
40000 71116 75180 104060 103304 70616 71040
80000 83072 90336 147220 146872 82336 82140
1000000 350196 445224 1168244 1168508 344076 344484

@Anthchirp
Copy link
Member Author

Compared maximum memory use in kBytes on reading between pickle and the existing msgpack implementation. A streaming read implementation is possible but would very likely be a lot more complex than the current implementation.

reflection table size P2 pickle P3 pickle P2 msgpack P3 msgpack
1 59328 59504 58140 58360
1000 59856 60136 58704 58932
10000 64520 65228 63520 63672
20000 69920 70920 69060 69500
40000 80720 82308 80680 80868
80000 101996 105268 103232 103532
1000000 597188 631700 627184 626508

Fix compilation

Compiler appears confused between iotbx::mtz::object and
boost::python::object. Be explicit that we want the latter.
Functions that encapsulate up to three separate functions and
distinguish which one to call based on a string parameter are not sane.
Refactor the test extension functions into simple, flat functions
with a sane API.
Remove unused or irrelevant code.
instead of a home-grown solution.
The streambuf interface deals exclusively with binary data.
We need to be explicit about this so that it works in Python 3.
So make functions return bytes, not strings.

Incidentally, we do not need a precompiler macro, as in Python 2
PyBytes == PyString
Otherwise ostream only flushes on Python garbage collection,
which may not happen at deterministic times.
@Anthchirp
Copy link
Member Author

Thanks to @dagewa, @ndevenish, and @graeme-winter for their assistance in getting this PR off the ground.

@Anthchirp
Copy link
Member Author

Anthchirp commented Jan 30, 2020

DIALS and xia2 full regression tests passed (tested on Python 2 only)

@Anthchirp Anthchirp merged commit 50d6250 into master Jan 30, 2020
@Anthchirp Anthchirp deleted the streambuf branch January 30, 2020 12:14
Anthchirp added a commit that referenced this pull request Feb 3, 2020
The previous implementation held the reflection table up to 3 times in
memory during the output phase. This implementation passes the python
filehandle into C++ space and then writes the msgpack output directly
to the file. As a result memory consumption is significantly reduced.
@Anthchirp Anthchirp mentioned this pull request Feb 5, 2020
Anthchirp added a commit that referenced this pull request Feb 10, 2020
* Make resolution estimation more stable in presence of ice and powder rings and with small molecule data (#1097)
* Fix spot finding and integration of files with index 0 (#1128, cctbx/dxtbx#133)
* Fix cutoff value on recent data files from DLS I03 (cctbx/dxtbx#136)
* Reduce memory usage when writing .refl files (#1115)
* `dials.integrate`: Fix broken memory check in cases of high multiplicity (#1121)
* `dials.symmetry`: Prevent failures when dealing with small numbers of reflections (#1130, cctbx/cctbx_project#435)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MessagePack requires much more memory than pickle
2 participants