-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-44464: [C++] Added rvalue-reference-qualified overload for arrow::Result::status() returning value instead of reference #44477
base: main
Are you sure you want to change the base?
Conversation
…rrow::Result::status() returning value instead of reference
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM since seems that abseil do the same. Also paper like https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2718r0.html might prevent from this behavior...
/// | ||
/// \return The stored non-OK status object, or an OK status if this object | ||
/// has a value. | ||
Status status() && { return status_; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Status status() && { return status_; } | |
Status status() && { return ok() ? Status::OK() : std::move(status_); } |
Without the move in here, we wind up copying status out of this and into the return value. See also absl::StatusOr<T>::status()&&
:
https://github.com/abseil/abseil-cpp/blob/master/absl/status/statusor.h#L703
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, this requires default construction of the value in the result or constructing an error status placeholder. Maybe not worth it, but we should add a comment here since others will also assume this is an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with this is that std::move(status_)
will leave status_
in an OK
state (since the internal state_
pointer will be NULL
afterward). As a result, the Result destructor will attempt to call a destructor on the storage_
field, leading to undefined behavior because storage_
is uninitialized when the original Result
was constructed in an error state. One possible solution (assuming we need to minimize the size of Result
without simply putting status_
and storage_
in a variant
) is for the move constructor of Status
to preserve the binary OK/error state of the moved-out object. This could be achieved without changing the size of Status
by using pointer tagging for its state_
field. There is a proposal for this, which will likely make it into C++26, but until then, there’s no truly safe way to do this while complying with the current C++ standard…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkietz I'd love to hear your feedback on that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with this is that std::move(status_) will leave status_ in an OK state
Please mark the result as an error with a placeholder status and move the error status out. This won't be any more expensive than what you've already written, and we can replace the placeholder with a static status after #44493
Status status() && { return status_; } | |
Status status() && { | |
if (ok()) return Status::OK(); | |
auto out = std::move(status_); | |
status_ = Status::UnknownError("Uninitialized Result<T>"); | |
return out; | |
} |
There is a proposal for this, which will likely make it into C++26
Whenever this is ready, it'll make constructs like static error statuses much nicer.
Rationale for this change
In the current implementation,
arrow::Result::status()
always returns the internalstatus_
field by a const lvalue reference, regardless of the value category ofResult
. This can lead to potential bugs. For example, consider the following code:In this case, the call to
status.ok()
results in undefined behavior becausestatus
is a dangling const lvalue reference that points to an object returned byfunctionReturningArrowResult()
, which is destroyed after the semicolon.If
arrow::Result
had two overloads of thestatus()
method for different reference qualifiers:This would prevent such bugs and potentially allow for better optimization, as the
Status
could be moved from an expiringResult
object.What changes are included in this PR?
This PR adds the proposed overload for the
arrow::Result::status()
method and makes other rvalue-qualifiedarrow::Result
methods preserve object ref-category during tailstatus()
calls.Unfortunately, we can't move the
status_
field in the rvalue-qualifiedstatus()
method, as the state ofstatus_
must be preserved until the destructor is called. This is because thestorage_
field is either destructed or considered empty based on the state ofstatus_
.Are these changes tested?
Since this change is trivial (the new overload doesn't modify the
Result
object and returnsStatus
by value), there's nothing significant to test, so no new tests were added.Are there any user-facing changes?
No existing code will be broken by this change. In all cases where
status()
is called on an lvalueResult
, the same reference-returning overload will be called. Meanwhile, code callingstatus()
on an rvalueResult
will invoke the new overload, returningStatus
by value instead.