Stream reflectiontable msgpack directly to file #1115

Anthchirp · 2020-01-27T10:58:45Z

The previous implementation held up to three copies of the reflection
table in memory during the output phase. This implementation passes the
Python file handle into C++ space and writes the msgpack output directly
to the file, which significantly reduces memory consumption.

Fixes #1112
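
To illustrate the difference in Python terms, here is a minimal sketch that does not use the actual DIALS or msgpack code. `pack_column`, `write_buffered` and `write_streamed` are hypothetical names, and `struct` stands in for the msgpack encoder. The buffered variant assembles the whole payload before writing, while the streamed variant hands each encoded column to the file handle as soon as it is produced.

```python
import io
import struct


def pack_column(name, values):
    """Encode one column as: name length, name, value count, float64 values.

    Hypothetical wire format for illustration only; it is not msgpack.
    """
    encoded_name = name.encode("ascii")
    header = struct.pack("<I", len(encoded_name)) + encoded_name
    body = struct.pack("<I%dd" % len(values), len(values), *values)
    return header + body


def write_buffered(columns, handle):
    # Build the complete payload in memory, then write it in one call.
    payload = b"".join(pack_column(n, v) for n, v in columns.items())
    handle.write(payload)


def write_streamed(columns, handle):
    # Write each column as soon as it is encoded; no full payload is built.
    for name, values in columns.items():
        handle.write(pack_column(name, values))


columns = {"intensity": [1.0, 2.5, 4.0], "variance": [0.1, 0.2, 0.3]}
buffered, streamed = io.BytesIO(), io.BytesIO()
write_buffered(columns, buffered)
write_streamed(columns, streamed)
```

Both functions produce identical bytes; they differ only in how much encoded data exists in memory at any one time.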

Anthchirp · 2020-01-27T11:21:05Z

Fails on Python 3 because the boost_adaptbx streambuf is not Python 3 compatible.

Presumably this was never noticed because the tests have been intentionally broken for half a year:
cctbx/cctbx_project#367

Anthchirp · 2020-01-29T12:04:06Z

Compared maximum memory use in kB when writing with pickle, the existing msgpack implementation, and the msgpack streaming method in development. P2 and P3 denote Python 2 and Python 3.

| Reflection table size | P2 pickle | P3 pickle | P2 msgpack | P3 msgpack | P2 msgpack stream | P3 msgpack stream |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | 59348 | 59512 | 59076 | 59196 | 59528 | 59384 |
| 1000 | 59624 | 59944 | 60280 | 60324 | 59380 | 59700 |
| 10000 | 62112 | 63604 | 71560 | 70036 | 61828 | 62252 |
| 20000 | 65072 | 67296 | 82484 | 81724 | 64760 | 65176 |
| 40000 | 71116 | 75180 | 104060 | 103304 | 70616 | 71040 |
| 80000 | 83072 | 90336 | 147220 | 146872 | 82336 | 82140 |
| 1000000 | 350196 | 445224 | 1168244 | 1168508 | 344076 | 344484 |
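
As a quick reading aid for the table above, the following sketch subtracts the near-empty baseline (1 reflection) from the peak at 1,000,000 reflections, which approximates the memory attributable to the table itself. The figures are copied from the table; the variable names are only illustrative. On these numbers the existing msgpack writer grows by a little under four times as much as the streaming writer.

```python
# Peak memory in kB, copied from the table above:
# (1 reflection, 1,000,000 reflections).
peak_kb = {
    "P2 pickle": (59348, 350196),
    "P3 pickle": (59512, 445224),
    "P2 msgpack": (59076, 1168244),
    "P3 msgpack": (59196, 1168508),
    "P2 msgpack stream": (59528, 344076),
    "P3 msgpack stream": (59384, 344484),
}

# Growth between the smallest and largest table for each method.
growth_kb = {name: large - small for name, (small, large) in peak_kb.items()}

for name, growth in growth_kb.items():
    print("%-18s %8d kB" % (name, growth))

# How much more the existing msgpack writer grows than the streaming writer.
ratio_p2 = growth_kb["P2 msgpack"] / float(growth_kb["P2 msgpack stream"])
ratio_p3 = growth_kb["P3 msgpack"] / float(growth_kb["P3 msgpack stream"])
print("P2 ratio: %.2f, P3 ratio: %.2f" % (ratio_p2, ratio_p3))
```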

Anthchirp · 2020-01-29T13:07:44Z

Compared maximum memory use in kBytes on reading between pickle and the existing msgpack implementation. A streaming read implementation is possible but would very likely be a lot more complex than the current implementation.

| Reflection table size | P2 pickle | P3 pickle | P2 msgpack | P3 msgpack |
| ---: | ---: | ---: | ---: | ---: |
| 1 | 59328 | 59504 | 58140 | 58360 |
| 1000 | 59856 | 60136 | 58704 | 58932 |
| 10000 | 64520 | 65228 | 63520 | 63672 |
| 20000 | 69920 | 70920 | 69060 | 69500 |
| 40000 | 80720 | 82308 | 80680 | 80868 |
| 80000 | 101996 | 105268 | 103232 | 103532 |
| 1000000 | 597188 | 631700 | 627184 | 626508 |
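
The thread does not say how these peak figures were recorded. One way to collect comparable numbers, offered purely as an assumption, is to read the process's maximum resident set size through the standard-library `resource` module, which reports kilobytes on Linux. The list below is only a stand-in for a large reflection table.

```python
import resource


def peak_rss_kb():
    """Return the peak resident set size of this process.

    On Linux ru_maxrss is reported in kilobytes; macOS reports bytes instead.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


before = peak_rss_kb()
table = [float(i) for i in range(1000000)]  # stand-in for a large table
after = peak_rss_kb()
print("peak RSS grew by %d kB" % (after - before))
```

Because `ru_maxrss` is a high-water mark, it captures transient copies made during reading or writing even after they have been freed.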

Functions that encapsulate up to three separate functions and distinguish which one to call based on a string parameter are not sane. Refactor the test extension functions into simple, flat functions with a sane API. Remove unused or irrelevant code.

The streambuf interface deals exclusively with binary data. We need to be explicit about this so that it works in Python 3, so the functions now return bytes, not strings. Incidentally, no preprocessor macro is needed, because in Python 2 the PyBytes names are aliases for PyString.
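
The bytes-versus-text distinction can be seen from the Python side with the standard `io` classes alone. This is a small sketch, independent of the streambuf code, and the payload is arbitrary: a binary stream accepts bytes, whereas a text stream in Python 3 rejects them.

```python
import io

payload = b"\x92\x01\x02"  # arbitrary binary data

binary_stream = io.BytesIO()
binary_stream.write(payload)  # accepted: bytes into a binary stream

text_stream = io.StringIO()
try:
    text_stream.write(payload)  # Python 3 refuses bytes on a text stream
    rejected = False
except TypeError:
    rejected = True

# Reading back from a binary stream yields bytes, not str.
roundtrip = io.BytesIO(binary_stream.getvalue()).read()
```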

Anthchirp · 2020-01-29T14:20:37Z

Thanks to @dagewa, @ndevenish, and @graeme-winter for their assistance in getting this PR off the ground.

Anthchirp · 2020-01-30T12:04:25Z

DIALS and xia2 full regression tests passed (tested on Python 2 only)

* Make resolution estimation more stable in presence of ice and powder rings and with small molecule data (#1097)
* Fix spot finding and integration of files with index 0 (#1128, cctbx/dxtbx#133)
* Fix cutoff value on recent data files from DLS I03 (cctbx/dxtbx#136)
* Reduce memory usage when writing .refl files (#1115)
* `dials.integrate`: Fix broken memory check in cases of high multiplicity (#1121)
* `dials.symmetry`: Prevent failures when dealing with small numbers of reflections (#1130, cctbx/cctbx_project#435)

Anthchirp added 10 commits January 29, 2020 13:53

Start with boost_adaptbx streambuf

4732c3f

Fix compilation

Compiler appears confused between iotbx::mtz::object and boost::python::object. Be explicit that we want the latter.

add extension used for testing

452f0b0

Migrate test to pytest, apply black

5348398

StringIO -> BytesIO

bcf17b6

Use mock for instrumentation

7566813

instead of a home-grown solution.

Add missing flush() calls to test extension

021d4c1

Otherwise ostream only flushes on Python garbage collection, which may not happen at deterministic times.

Apply clang format

32fa893

Switch to the fixed streambuf implementation

0f24025

add newsfragment

1791dbe
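
The reason for the "Add missing flush() calls" commit above can be shown with an analogy in pure Python. This sketch uses the standard `io.BufferedWriter`, not the boost_adaptbx streambuf: data written to a buffered stream is not visible in the underlying object until the buffer is flushed explicitly.

```python
import io

raw = io.BytesIO()
writer = io.BufferedWriter(raw)  # buffers small writes in memory

writer.write(b"reflections")
before_flush = raw.getvalue()  # nothing has reached the underlying stream yet

writer.flush()
after_flush = raw.getvalue()  # the data is now visible
```

Relying on object destruction to perform the flush makes the moment of the write depend on the garbage collector, which is why an explicit call is the safer choice in tests.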

dagewa approved these changes Jan 30, 2020

Anthchirp merged commit 50d6250 into master Jan 30, 2020

Anthchirp deleted the streambuf branch January 30, 2020 12:14

Anthchirp mentioned this pull request Feb 5, 2020

DIALS 2.1.3 #1136

Merged
