GH-37212: [C++] IO: BufferReader always owned buffer #37271
It seems I've run into a problem like nlohmann/json#586. Should I use
This approach will work, but I'm reluctant to introduce a code path where we allocate-and-copy on read in a class from which users could otherwise reasonably expect trivial IO costs.
As a potential alternative:
These non-owning constructors don't seem widely used; mostly tests and deserialization of FunctionOptions. Perhaps instead of demoting BufferReader to only sometimes zero-copy, it'd be better to delete these constructors and require users to own the buffers to be read (probably doing a single copy all at once on construction instead of one copy per read).
Also reasonable to me, though you'd have to go through a deprecation cycle I believe.
Or you could have those constructors do the copy for you (and if you really want to zero-copy from a temporary, you can always explicitly construct the Buffer yourself).
I also vote for this; let me implement it.
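To make the "copy once on construction" idea concrete, here is a minimal sketch. `OwningReader` and `MakeReaderFromTemporary` are hypothetical names invented for illustration; this is not `arrow::io::BufferReader`, only a stand-in showing why a reader that owns its bytes stays valid after the source string is gone, while each read remains a cheap view.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical reader, NOT arrow::io::BufferReader.
// It copies its input exactly once, at construction, and then serves
// reads as views into the copy it owns.
class OwningReader {
 public:
  explicit OwningReader(std::string_view data) : storage_(data) {}

  // Returns up to nbytes from the current position without copying again.
  std::string_view Read(std::size_t nbytes) {
    assert(pos_ <= storage_.size());
    const std::size_t n = std::min(nbytes, storage_.size() - pos_);
    std::string_view out(storage_.data() + pos_, n);
    pos_ += n;
    return out;
  }

 private:
  std::string storage_;  // the reader keeps the bytes alive itself
  std::size_t pos_ = 0;
};

// The argument is a temporary std::string that is destroyed as soon as the
// constructor call finishes; the reader stays valid because it made a copy.
inline OwningReader MakeReaderFromTemporary() {
  return OwningReader(std::string("hello ") + "world");
}
```

With a non-owning reader, the same function would return an object pointing at freed memory; here the only cost is the single copy made in the constructor.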
@@ -83,7 +83,8 @@ Result<std::unique_ptr<FunctionOptions>> GenericOptionsType::Deserialize(

Result<std::unique_ptr<FunctionOptions>> DeserializeFunctionOptions(
    const Buffer& buffer) {
  io::BufferReader stream(buffer);
  // FIXME: Would ToString be too heavy?
I wonder whether this would be a bit heavy. Should I implement a DeserializeFunctionOptions(const std::shared_ptr<Buffer>& buffer) overload?
You can explicitly construct std::make_shared<Buffer>(data_, nbytes) to preserve the prior behavior.
Sorry, I don't fully understand. io::BufferReader stream(buffer.ToString()); is the same as io::BufferReader. What I mean is: should I change DeserializeFunctionOptions(const Buffer& buffer) to DeserializeFunctionOptions(const std::shared_ptr<Buffer>& buffer)?
What I mean is that you can keep const Buffer& and just explicitly wrap it in a non-owning shared_ptr<Buffer> to pass to BufferReader, which keeps the old behavior (zero-copy without owning).
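The wrapping described here can be sketched roughly as follows. `SketchBuffer`, `SketchReader`, and `ReadWithoutCopy` are hypothetical stand-ins written for this example and are not the Arrow classes; the assumption is only that the buffer type has a pointer-plus-size constructor that does not take ownership, and that the reader accepts a shared pointer to a buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical stand-ins; these are NOT arrow::Buffer / arrow::io::BufferReader.
class SketchBuffer {
 public:
  // A non-owning view: the caller must keep `data` alive.
  SketchBuffer(const uint8_t* data, int64_t size) : data_(data), size_(size) {
    assert(size >= 0);
  }
  const uint8_t* data() const { return data_; }
  int64_t size() const { return size_; }

 private:
  const uint8_t* data_;
  int64_t size_;
};

class SketchReader {
 public:
  // The reader only accepts shared ownership of a buffer object.
  explicit SketchReader(std::shared_ptr<SketchBuffer> buffer)
      : buffer_(std::move(buffer)) {}
  const uint8_t* data() const { return buffer_->data(); }
  int64_t size() const { return buffer_->size(); }

 private:
  std::shared_ptr<SketchBuffer> buffer_;
};

// Keeps the `const SketchBuffer&` signature. The new shared_ptr owns only a
// small view object; the bytes themselves are neither copied nor owned.
inline SketchReader ReadWithoutCopy(const SketchBuffer& buffer) {
  auto view = std::make_shared<SketchBuffer>(buffer.data(), buffer.size());
  return SketchReader(std::move(view));
}
```

The trade-off is the one discussed in this thread: no bytes are copied, but the caller remains responsible for keeping the original buffer alive for as long as the reader is used.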
I got it, thanks!
Applied your advice and changed the implementation, thanks!
@bkietz @lidavidm I've removed the non-owned style, and found that #37271 (comment) might introduce some cost. Would you mind taking a look?
No idea why it failed...
explicit BufferReader(const Buffer& buffer);
BufferReader(const uint8_t* data, int64_t size);

/// \brief Instantiate from std::string or std::string_view. Does not
/// own data
explicit BufferReader(std::string_view data);
Maybe we should deprecate instead of remove?
We could even introduce static factories that have the same behavior as these constructors, but whose names explicitly indicate that they may be unsafe.
OK, so it seems I should mark them deprecated but keep their behavior as before (including zero-copy)?
Then how do I fix #37271 (comment)? By using a BufferReader::FromString, or some other way?
Ah, that's annoying. FromString is probably best.
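As a rough illustration of the API shape being discussed (deprecate the non-owning constructor, add an owning `FromString`, and optionally expose the zero-copy path under a name that spells out the hazard), consider the sketch below. `SketchReader`, `FromUnownedView`, `contents()`, and `owns_data()` are hypothetical and exist only for this example; the `FromString` signature and return type shown here are assumptions, not the declaration that ended up in Arrow.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical reader used only to illustrate the API shape; NOT Arrow's class.
class SketchReader {
 public:
  // Old behavior kept for compatibility, but flagged at compile time.
  [[deprecated("does not own data; prefer FromString")]]
  explicit SketchReader(std::string_view data) : view_(data) {}

  // Owning factory: the string is moved (or copied once) up front,
  // so passing a temporary is safe.
  static std::unique_ptr<SketchReader> FromString(std::string data) {
    auto owned = std::make_shared<std::string>(std::move(data));
    std::string_view view(*owned);
    return std::unique_ptr<SketchReader>(
        new SketchReader(std::move(owned), view));
  }

  // Hypothetical name that spells out the hazard: zero-copy, and the
  // caller must keep `data` alive for the reader's whole lifetime.
  static std::unique_ptr<SketchReader> FromUnownedView(std::string_view data) {
    return std::unique_ptr<SketchReader>(new SketchReader(nullptr, data));
  }

  std::string_view contents() const { return view_; }
  bool owns_data() const { return owned_ != nullptr; }

 private:
  SketchReader(std::shared_ptr<const std::string> owned, std::string_view view)
      : owned_(std::move(owned)), view_(view) {
    // When the reader owns its data, the view must point into that data.
    assert(!owned_ || view_.data() == owned_->data());
  }

  std::shared_ptr<const std::string> owned_;
  std::string_view view_;
};
```

Call sites that previously relied on the implicit zero-copy constructor would then pick one of the two factories explicitly, which makes the ownership decision visible in the code under review.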
Rationale for this change
Previously, when given a non-owned string, arrow::io::BufferReader would zero-copy it, which could cause lifetime problems. This patch forces a copy in this case.
What changes are included in this PR?
Change arrow::io::BufferReader to be "owned" with BufferReader(std::string), and remove the non-owned ctors.
Are these changes tested?
Yes
Are there any user-facing changes?
Users can use BufferReader safely, but might incur a few more memory copies.