Don't share host_array when receiving from network #8308

Merged
crusaderky merged 2 commits into dask:main from share_host_array on Nov 3, 2023

Conversation

@crusaderky (Collaborator) commented on Oct 27, 2023

Resolve issues with memory deallocation:

  1. Two or more numpy or pandas objects are packed into the same network message by WorkerStateMachine._select_for_gather, scatter, or Client.gather. After they are received, one of the objects is dereferenced, but its memory won't be released until all objects with a buffer in the original message have been dereferenced (see the sketch after the notes below).

  2. An object with both buffers and non-trivial amounts of pure-pickle data - such as a pandas.DataFrame with object columns - is sent over the network. For as long as the object lives, the memory holding the pickled version of the object column won't be released.

  3. In Zero-copy array shuffle #8282, when using a MemoryBuffer the shards that have already been merged into output chunks are not dereferenced until all shards on the same worker have been merged. This is because shards belonging to different output chunks were sent over within the same RPC call.

Notes

  • These issues only apply to uncompressed data.
  • Use case 2 also afflicts the SpillBuffer. It is out of scope of this PR.
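
A minimal standalone sketch of issue 1 (hypothetical sizes; plain numpy rather than distributed's actual deserialization path): two objects deserialized zero-copy out of one shared receive buffer keep the whole buffer alive until both are gone.

import numpy as np

def deserialize_shared(nbytes_a: int, nbytes_b: int) -> tuple[np.ndarray, np.ndarray]:
    # One receive buffer holding two frames back to back
    buf = bytearray(nbytes_a + nbytes_b)
    a = np.frombuffer(buf, dtype=np.uint8, count=nbytes_a)
    b = np.frombuffer(buf, dtype=np.uint8, count=nbytes_b, offset=nbytes_a)
    return a, b

a, b = deserialize_shared(100 * 2**20, 2**20)  # 100 MiB + 1 MiB in one message
del a
# The 100 MiB are not released: `b` is a zero-copy view that keeps the whole
# 101 MiB buffer alive until it, too, is dereferenced.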

@crusaderky crusaderky self-assigned this Oct 27, 2023
@crusaderky crusaderky changed the title from "[WIP] Don't share host_array when receiving from network" to "Don't share host_array when receiving from network" on Oct 27, 2023
github-actions bot (Contributor) commented on Oct 27, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ±0      27 suites  ±0    15h 7m 53s ⏱️ +54m 42s
 3 963 tests  +20    3 841 ✔️  +18      117 💤  ±0     5 ❌  +3
49 786 runs  +260   47 371 ✔️ +217    2 409 💤 +41     6 ❌  +3

For more details on these failures, see this check.

Results for commit bf43081. ± Comparison against base commit 954e9d0.

♻️ This comment has been updated with latest results.

@crusaderky crusaderky force-pushed the share_host_array branch 4 times, most recently from f4a902d to 14df449 on October 29, 2023 at 14:23
@crusaderky (Collaborator Author) commented:

A/B test results:

  • no wall time changes whatsoever
  • test_dataframe_align uses 10% less avg memory and 15% less peak memory
  • test_set_index_on_uber_lyft[tasks] uses 10% less avg memory and 15% less peak memory

[two screenshots: A/B test result plots]

@crusaderky (Collaborator Author) commented on Oct 29, 2023

This PR is blocked by and incorporates #8312
Ready for review (it's made of exactly 2 commits)

@milesgranger (Contributor) left a comment

Not sure I'd depend on my review alone, but looks good to me, with one clarifying comment. :) Thanks!

Comment on lines 369 to 370
n = await stream.read_into(chunk) # type: ignore[arg-type]
assert n == chunk_nbytes, (n, chunk_nbytes)
Contributor:

Apologies for any naive questions here; for my own understanding: this assert is a sanity check, since according to the docs for read_into it won't return until chunk (which is chunk_nbytes in length) is completely filled. Is there a potential that it won't be filled because chunk is larger than any remaining bytes, and thus sits idle?

I would suppose not, judging by how the caller sets n, but I got slightly confused by the OpenSSL sizes and whether they may make a chunk larger than any remaining stream data.

Contributor:

Or, now that I look at it again: the bytes in the stream would always be larger than the buffer, and thus the stream would always have bytes to write, given that we're accounting for OpenSSL sizing. Do I have that right?

Collaborator Author:

Yes, that assertion is just defensive programming.
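
To spell out the semantics being discussed (a hypothetical helper, not the actual distributed.comm code, relying on tornado's documented behaviour): read_into() with the default partial=False resolves only once the buffer is completely filled, and raises StreamClosedError if the connection drops first, so a short read never reaches the assertion.

from tornado.iostream import IOStream

async def read_exactly(stream: IOStream, nbytes: int) -> memoryview:
    chunk = memoryview(bytearray(nbytes))  # one dedicated, writable buffer
    n = await stream.read_into(chunk)      # waits until all nbytes have arrived
    assert n == nbytes, (n, nbytes)        # defensive programming, as noted above
    return chunk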

@fjetter (Member) left a comment

Changes overall LGTM. Most/all of my comments are for clarification, but as long as I haven't entirely misunderstood this, I'm happy to merge.

chunk_nbytes = chunk.nbytes
n = await stream.read_into(chunk)
assert n == chunk_nbytes, (n, chunk_nbytes)
# Don't store multiple numpy or parquet buffers into the same buffer, or
Member:

nit: To my knowledge, we never send parquet over the network, unless of course a user decides to do so themselves.

Member:

I guess you are referring to pyarrow Table objects, or anything else that can be directly instantiated from a buffer.

Collaborator Author:

Wrong location for the comment?
parquet was a typo; I meant arrow.

frames_nbytes = [header_nbytes, *frames_nbytes]
frames_nbytes_total += header_nbytes

if frames_nbytes_total < 2**17: # 128kiB
Member:

Given the testing you did recently, do you think this number still makes sense? Something to look into, or not worth the effort?

Collaborator Author:

It looks about right given my recent testing.
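
For context, a hypothetical sketch of the branching that this 128 kiB threshold controls (an illustration only, not the actual distributed.comm code): messages below the threshold are read into one shared buffer to save allocations and syscalls, while larger messages get one dedicated buffer per frame so that each deserialized object can release its memory independently.

from itertools import accumulate

SHARED_BUFFER_MAX = 2**17  # 128 kiB, same threshold as in the diff above

async def read_frames(stream, frames_nbytes: list[int]) -> list[memoryview]:
    total = sum(frames_nbytes)
    if total < SHARED_BUFFER_MAX:
        # Small message: one allocation, one read; frames are views into it
        buf = memoryview(bytearray(total))
        n = await stream.read_into(buf)
        assert n == total, (n, total)
        offsets = [0, *accumulate(frames_nbytes)]
        return [buf[start:stop] for start, stop in zip(offsets, offsets[1:])]
    # Large message: one allocation per frame, so dropping any single frame
    # releases its memory independently of the others
    frames = []
    for nbytes in frames_nbytes:
        chunk = memoryview(bytearray(nbytes))
        n = await stream.read_into(chunk)
        assert n == nbytes, (n, nbytes)
        frames.append(chunk)
    return frames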

(1, 0, False), # <2 kiB (including prologue and msgpack header)
(1, 1800, False), # <2 kiB
(1, 2100, True), # >2 kiB
(200_000, 9500, False), # <5% of numpy array
Member:

IIUC the 9500 extra bytes here will be written to the same memory buffer the numpy array is using, i.e. those 9500 bytes will only be released once the numpy array is released. Similarly, the numpy array will only be released once the other thing has been released. However, that other thing is a bytes object, some header information, or some other garbage that is guaranteed to be released after the message is deserialized.

So, in other words, we're accepting a memory overhead of up to 5% for numpy arrays/arrow tables/etc. (and previously this could've been a multiple, depending on how large a single fetch was)

Collaborator Author:

So, in other words, we're accepting a memory overhead of up to 5% for numpy arrays/arrow tables/etc.

This is correct. It is really only material for pandas objects with substantial pure-Python index / columns / other metadata; numpy objects tend to carry <100 bytes worth of metadata.

(and previously this could've been a multiple, depending on how large a single fetch was)

It was worse than a multiple.
There were two nightmare scenarios:

  • A pandas DataFrame heavy with object string columns, plus some numerical columns. The whole serialized data for the object columns remains alive for as long as the deserialized object is alive, because the same buffer is referenced by the numerical columns.
  • The key at the top of the WorkerStateMachine.fetch heap is a 49 MiB numpy array. The second object in the heap from the same worker is a 1 MiB numpy array (or vice versa). The two are fetched together (distributed.worker.transfer.message-bytes-limit: 50 MiB). The 49 MiB array will survive its own free-keys command, as its memory is still referenced by the 1 MiB array (rough numbers in the sketch below).
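
Back-of-the-envelope arithmetic for the second scenario (numbers taken from the comment above; the constants are illustrative, not measured):

MiB = 2**20
message_bytes_limit = 50 * MiB   # distributed.worker.transfer.message-bytes-limit
big, small = 49 * MiB, 1 * MiB   # the two keys fetched in a single message

# Before this PR: both keys are views over one shared receive buffer, so after
# free-keys releases the 49 MiB key, this message still pins:
before = big + small             # 50 MiB, held alive by the surviving 1 MiB key

# After this PR: each key owns its buffer, so only the surviving key remains:
after = small                    # 1 MiB
print(f"{before / MiB:.0f} MiB -> {after / MiB:.0f} MiB resident")  # 50 -> 1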

@crusaderky crusaderky merged commit c91a735 into dask:main Nov 3, 2023
26 of 30 checks passed
@crusaderky crusaderky deleted the share_host_array branch November 3, 2023 21:05