GH-44464: [C++] Added rvalue-reference-qualified overload for arrow::Result::status() returning value instead of reference #44477

Open
wants to merge 1 commit into
base: main


@igor-anferov igor-anferov commented Oct 19, 2024

Rationale for this change

In the current implementation, arrow::Result::status() always returns the internal status_ field by const lvalue reference, regardless of the value category of the Result. This can lead to bugs. For example, consider the following code:

if (auto&& status = functionReturningArrowResult().status(); status.ok())
  return 0;
return -1;

In this case, the call to status.ok() results in undefined behavior because status is a dangling const lvalue reference: it refers to the status_ member of the temporary returned by functionReturningArrowResult(), and that temporary is destroyed at the end of the init-statement (at the semicolon).

Suppose arrow::Result instead provided two overloads of the status() method, one for each reference qualifier:

template <…>
class Result {
  …
  auto status() const & -> const Status& { ... }
  auto status() && -> Status { ... }
  …
};

This would prevent such bugs and potentially allow for better optimization, as the Status could be moved from an expiring Result object.
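
As a minimal sketch of how ref-qualified overloads select a return type from the value category of the object, consider the toy types below. MiniStatus, MiniResult, MakeFailed and FailedResultIsOk are hypothetical stand-ins for illustration, not Arrow's classes, and the rvalue overload here copies rather than moves.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for arrow::Status: an empty message means OK.
struct MiniStatus {
  std::string message;
  bool ok() const { return message.empty(); }
};

// Hypothetical stand-in for arrow::Result<T>, reduced to the status() overloads.
class MiniResult {
 public:
  explicit MiniResult(MiniStatus status) : status_(std::move(status)) {}

  // Lvalue Result: hand out a reference, since the object outlives the call.
  const MiniStatus& status() const& { return status_; }
  // Expiring Result: return by value so that nothing can dangle.
  MiniStatus status() && { return status_; }

 private:
  MiniStatus status_;
};

// Overload resolution is driven by the value category of the object expression.
static_assert(std::is_same<decltype(std::declval<MiniResult&>().status()),
                           const MiniStatus&>::value,
              "lvalue Result yields a reference");
static_assert(std::is_same<decltype(std::declval<MiniResult>().status()),
                           MiniStatus>::value,
              "rvalue Result yields a value");

MiniResult MakeFailed() { return MiniResult(MiniStatus{"boom"}); }

// With the rvalue overload, `status` binds to a prvalue, so the reference
// extends its lifetime and the later uses are well defined.
bool FailedResultIsOk() {
  auto&& status = MakeFailed().status();
  assert(status.message == "boom");
  return status.ok();
}
```

With only the const& overload, the same FailedResultIsOk body would read through a reference to a member of a destroyed temporary.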

What changes are included in this PR?

This PR adds the proposed overload for the arrow::Result::status() method and makes the other rvalue-qualified arrow::Result methods preserve the object's value category when they end with a call to status().

Unfortunately, we can't move the status_ field in the rvalue-qualified status() method, as the state of status_ must be preserved until the destructor is called. This is because the storage_ field is either destructed or considered empty based on the state of status_.

Are these changes tested?

Since this change is trivial (the new overload doesn't modify the Result object and returns Status by value), there's nothing significant to test, so no new tests were added.

Are there any user-facing changes?

No existing code will be broken by this change. In all cases where status() is called on an lvalue Result, the same reference-returning overload will be called. Meanwhile, code calling status() on an rvalue Result will invoke the new overload, returning Status by value instead.


⚠️ GitHub issue #44464 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Oct 21, 2024
Member

@mapleFU mapleFU left a comment

This LGTM, since it seems that abseil does the same. Also, a paper like https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2718r0.html might prevent this behavior...

///
/// \return The stored non-OK status object, or an OK status if this object
/// has a value.
Status status() && { return status_; }
Member

@bkietz bkietz Oct 21, 2024

Suggested change
- Status status() && { return status_; }
+ Status status() && { return ok() ? Status::OK() : std::move(status_); }

Without the move in here, we wind up copying status out of this and into the return value. See also absl::StatusOr<T>::status()&&:

https://github.com/abseil/abseil-cpp/blob/master/absl/status/statusor.h#L703

Member

@bkietz bkietz Oct 21, 2024

Hmmm, this requires either default construction of the value in the result or constructing an error status placeholder. Maybe not worth it, but we should add a comment here, since others will also assume the copy is a mistake.

Member

@bkietz bkietz Oct 21, 2024

I've opened #44491 to suggest how we could make it cheaper to put a placeholder error status in here

draft PR #44493

Author

@igor-anferov igor-anferov Oct 21, 2024

The issue with this is that std::move(status_) will leave status_ in an OK state (since the internal state_ pointer will be null afterward). As a result, the Result destructor will attempt to destroy the storage_ field, which is undefined behavior because storage_ was never initialized when the original Result was constructed in an error state.

One possible solution (assuming we need to minimize the size of Result rather than simply putting status_ and storage_ in a variant) is for the move constructor of Status to preserve the binary OK/error state of the moved-from object. This could be achieved without changing the size of Status by using pointer tagging for its state_ field. There is a proposal for this, which will likely make it into C++26, but until then there is no truly safe way to do this while complying with the current C++ standard…
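
To make the hazard concrete, here is a minimal sketch, assuming the status keeps its error state behind a nullable pointer as described above. MiniStatus and MovedFromStatusLooksOk are hypothetical names for illustration, not arrow::Status. The sketch only shows that a moved-from status answers ok() == true, which is the condition a Result-like destructor would consult before destroying its value storage.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical model of a status: a null state pointer means OK.
class MiniStatus {
 public:
  MiniStatus() = default;  // OK status
  explicit MiniStatus(std::string message)
      : state_(new std::string(std::move(message))) {}

  bool ok() const { return state_ == nullptr; }

 private:
  std::unique_ptr<std::string> state_;
};

// Reports what the source status claims after its state has been moved out.
bool MovedFromStatusLooksOk() {
  MiniStatus error("disk full");
  assert(!error.ok());
  MiniStatus taken = std::move(error);  // steals state_, leaving it null
  assert(!taken.ok());
  return error.ok();  // the source now looks like an OK status
}
```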

Author

@bkietz I'd love to hear your feedback on that

Member

> The issue with this is that std::move(status_) will leave status_ in an OK state

Please mark the result as an error with a placeholder status and move the error status out. This won't be any more expensive than what you've already written, and we can replace the placeholder with a static status after #44493.

Suggested change
- Status status() && { return status_; }
+ Status status() && {
+   if (ok()) return Status::OK();
+   auto out = std::move(status_);
+   status_ = Status::UnknownError("Uninitialized Result<T>");
+   return out;
+ }
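
For illustration, the placeholder approach can be sketched on a toy Result-like type whose destructor depends on the status, as discussed earlier in the thread. MiniStatus, MiniResult and TakeErrorMessage are hypothetical names, and the anonymous-union storage is an assumption about the layout rather than Arrow's actual implementation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical model of a status: a null state pointer means OK.
class MiniStatus {
 public:
  MiniStatus() = default;
  explicit MiniStatus(std::string message)
      : state_(new std::string(std::move(message))) {}
  bool ok() const { return state_ == nullptr; }
  std::string message() const { return ok() ? std::string() : *state_; }

 private:
  std::unique_ptr<std::string> state_;
};

// Hypothetical Result-like type: value_ is alive only while status_ is OK.
template <typename T>
class MiniResult {
 public:
  explicit MiniResult(T value) : value_(std::move(value)) {}
  explicit MiniResult(MiniStatus error) : status_(std::move(error)) {}
  MiniResult(const MiniResult&) = delete;
  MiniResult& operator=(const MiniResult&) = delete;
  ~MiniResult() {
    if (status_.ok()) value_.~T();  // must never run for an error Result
  }

  bool ok() const { return status_.ok(); }

  MiniStatus status() && {
    if (ok()) return MiniStatus();
    MiniStatus out = std::move(status_);
    // Re-arm the error flag so the destructor keeps skipping value_.
    status_ = MiniStatus("Uninitialized Result<T>");
    return out;
  }

 private:
  MiniStatus status_;
  union {
    T value_;  // left uninitialized when constructed from an error
  };
};

// Moves the error out of an expiring Result and returns its message.
std::string TakeErrorMessage() {
  MiniResult<std::string> failed(MiniStatus("disk full"));
  MiniStatus taken = std::move(failed).status();
  assert(!failed.ok());  // the placeholder keeps the Result in an error state
  return taken.message();
}
```

In this sketch the placeholder costs one allocation per call, which is the expense a static placeholder status would avoid.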

> There is a proposal for this, which will likely make it into C++26

Whenever this is ready, it'll make constructs like static error statuses much nicer.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Oct 21, 2024