Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-40078: [C++] Import/Export ArrowDeviceArrayStream #40807

Merged
merged 41 commits into from
May 21, 2024

Conversation

zeroshade
Copy link
Member

@zeroshade zeroshade commented Mar 26, 2024

Rationale for this change

The original PRs for adding support for importing and exporting the new C Device interface (#36488 / #36489) only added support for the Arrays themselves, not for the stream structure. We should support both.

What changes are included in this PR?

Adding parallel functions for Import/Export of streams that accept ArrowDeviceArrayStream.

Are these changes tested?

Test writing in progress, wanted to get this up for review while I write tests.

Are there any user-facing changes?

No, only new functions have been added.

Copy link

⚠️ GitHub issue #40078 has been automatically assigned in GitHub to PR creator.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm excited this will work for ChunkedArray as well!

cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
cpp/src/arrow/c/bridge.h Outdated Show resolved Hide resolved
cpp/src/arrow/c/helpers.h Show resolved Hide resolved
cpp/src/arrow/record_batch.h Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes awaiting change review Awaiting change review and removed awaiting committer review Awaiting committer review awaiting changes Awaiting changes labels Mar 27, 2024
@github-actions github-actions bot added awaiting changes Awaiting changes Component: Python awaiting change review Awaiting change review and removed awaiting change review Awaiting change review awaiting changes Awaiting changes labels Mar 27, 2024
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

Comment on lines 260 to 278
/// If all of the data for this record batch is in host memory, then this
/// should return null (the default impl). If the data for this batch is
/// on a device, then if synchronization is needed before accessing the
/// data the returned sync event will allow for it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this part. AFAIK a RecordBatch currently is agnostic the device of its buffers. So for example you can have a RecordBatch backed by buffers that live in CUDA memory, but then this method will always hardcoded return NULL, which is not correct in that case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, at the moment I don't have an implementation there. I put a note on the ImportDeviceRecordBatchReader documentation if you look at bridge.h:

/// We are not yet bubbling the sync events from the buffers up to
/// the `GetSyncEvent` method of an imported RecordBatch. This will be added in a future
/// update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the idea that longer term there would be a generic implementation here in RecordBatch that checks the sync events of its underlying buffers? Because in practice, we don't subclass RecordBatch in Arrow C++ to add CUDA support, so that method cannot be overriden

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was either that or introducing a subclass, yes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've propagated the event and device_type throughout the record batch and the Make functions (defaulting it to nullptr) which should allow us to ensure this is all correct now. I'll update the documentation comments accordingly

/// \param[in] reader RecordBatchReader object to export
/// \param[out] out C struct to export the stream to
ARROW_EXPORT
Status ExportDeviceRecordBatchReader(std::shared_ptr<RecordBatchReader> reader,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially related to my other comment, but the existing ExportDeviceArray has a sync keyword that the user of this API needs to provide. Is there a reason those methods don't have that?
The returned ArrowDeviceArrayStream itself doesn't have a sync event member, but the ArrowDeviceArrays that it will return still have that. The user shouldn't pass the sync event to set in those arrays?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why I put the GetSyncEvent on the RecordBatch object since there could potentially be a different sync event for each record batch returned by the reader. Right now it currently is hardcoded to return null, but the GetSyncEvent function is virtual, so someone could potentially have a RecordBatchReader that returns a class which inherits from RecordBatch that implements GetSyncEvent to return the correct event object etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add GetSyncEvent() when it actually does something? It seems like this is perhaps guessing at a future API that we don't know will exist yet or that we are not sure will be used?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made a bunch of changes propagating the sync event and the device type now, so GetSyncEvent now does something :)

Thoughts?

Copy link
Contributor

@felipecrv felipecrv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Seems to match the structure of the single array x non-stream case.

cpp/src/arrow/c/helpers.h Show resolved Hide resolved
cpp/src/arrow/c/bridge.cc Outdated Show resolved Hide resolved
/// \brief Get the device type for record batches this reader produces
///
/// default implementation is to return ARROW_DEVICE_CPU
virtual DeviceAllocationType device_type() const { return DeviceAllocationType::kCPU; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be wrong if we import a RecordBatchReader from a non-CPU ArrowDeviceArrayStream? Or is that right now not yet possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of yet, I haven't implemented figuring out what type to send here. Since we aren't storing the device type at the record batch level, this would need to spin through the buffers of its columns which could potentially be from multiple devices if a user did something weird. So I'm not sure what the best solution here is. I'd rather not add a RecordBatch level device_type member, but I don't yet know how to best handle the situation if a consumer does something bad

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like for now this will have to be supplied by the caller on construction? That seems like a better intermediate state than returning a value that could lead to segfaults?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is an abstract class, so there's no constructor here :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the suggestion is we should add a device_type member to the RecordBatch class and have it be provided?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've propagated the device_type throughout the readers now and to the static Make methods for constructing the readers, so now this should be correct.

@jorisvandenbossche
Copy link
Member

Needs some tests?

cpp/src/arrow/record_batch.h Outdated Show resolved Hide resolved
cpp/src/arrow/record_batch.h Outdated Show resolved Hide resolved
@@ -23,6 +23,7 @@
#include <vector>

#include "arrow/compare.h"
#include "arrow/device.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead add the necessary declarations to type_fwd.h?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would have to move the nested Device::SyncEvent outside of the Device class to do that since you can't forward declare an inner class like that. It would also require pushing the entire DeviceAllocationType declaration to type_fwd.h which I'm not sure if we want to do. So I don't think we can do this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least moving DeviceAllocationType to type_fwd.h would make sense.

As for Device::SyncEvent, that's indeed a problem with nested classes (and a good reason to avoid them :-)). That could be fixed by moving SyncEvent out of Device and adding a compatibility alias.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least moving DeviceAllocationType to type_fwd.h would make sense.

I've done that here #43853

cpp/src/arrow/record_batch.h Outdated Show resolved Hide resolved
cpp/src/arrow/record_batch.h Outdated Show resolved Hide resolved
@zeroshade zeroshade force-pushed the import-export-device-stream branch from 5cf3bc4 to 9b14645 Compare May 9, 2024 18:41
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 9, 2024
@zeroshade zeroshade merged commit 8169d6e into apache:main May 21, 2024
37 checks passed
@zeroshade zeroshade removed the awaiting change review Awaiting change review label May 21, 2024
@zeroshade zeroshade deleted the import-export-device-stream branch May 21, 2024 19:40
Copy link

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 8169d6e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.

int type = 0;
for (const auto& buf : buffers) {
if (!buf) continue;
if (type == 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition implies that, conversely, in non-debug mode we could immediately return when we encounter a buffer? Instead of continue looping on all buffers and children...

@github-actions github-actions bot added the awaiting committer review Awaiting committer review label May 23, 2024
@@ -358,6 +369,16 @@ struct ARROW_EXPORT ArrayData {
/// \see GetNullCount
int64_t ComputeLogicalNullCount() const;

/// \brief Returns the device_type of the underlying buffers and children
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but we tend to use infinitives in docstring ("return", not "returns")

vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
)

### Rationale for this change
The original PRs for adding support for importing and exporting the new C Device interface (apache#36488 / apache#36489) only added support for the Arrays themselves, not for the stream structure. We should support both.

### What changes are included in this PR?
Adding parallel functions for Import/Export of streams that accept `ArrowDeviceArrayStream`.

### Are these changes tested?
Test writing in progress, wanted to get this up for review while I write tests.

### Are there any user-facing changes?
No, only new functions have been added.

* GitHub Issue: apache#40078

Lead-authored-by: Matt Topol <[email protected]>
Co-authored-by: Felipe Oliveira Carvalho <[email protected]>
Co-authored-by: Benjamin Kietzman <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
zeroshade added a commit that referenced this pull request May 28, 2024
Responding to feedback on #40807:

              This condition implies that, conversely, in non-debug mode we could immediately return when we encounter a buffer? Instead of continue looping on all buffers and children...

_Originally posted in #40807 (comment)

Authored-by: Matt Topol <[email protected]>
Signed-off-by: Matt Topol <[email protected]>
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Sep 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants