Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-34535: [C++] Move ChunkResolver to the public API #40226

Closed
wants to merge 6 commits into from

Conversation

SChakravorti21
Copy link
Contributor

@SChakravorti21 SChakravorti21 commented Feb 25, 2024

Rationale for this change

See #34535.

What changes are included in this PR?

  • Check unit test coverage and expand as necessary to handle edge cases
  • Move ChunkResolver out of the internal namespace
    • Refactor current usages of the class in Arrow C++ as necessary
  • Update the documentation
    • Update the API/reference docs as necessary
    • Add a section, perhaps in the User Guide or Examples, on how to use this class

As requested by @felipecrv, this PR also contains a minor simplification to ChunkResolver's existing interface.

Are these changes tested?

Opening PR in draft state, unit tests still need to be implemented.

Are there any user-facing changes?

Yes, users will now be able to use the ChunkResolver from the public ::arrow namespace. There are no breaking changes.

Opening PR in draft state, some documentation may still need to be added.

Copy link

⚠️ GitHub issue #34535 has been automatically assigned in GitHub to PR creator.

Comment on lines +64 to +73
struct ChunkedArrayResolver : protected ::arrow::ChunkResolver {
ChunkedArrayResolver(const ChunkedArrayResolver& other)
: ::arrow::internal::ChunkResolver(other.chunks_), chunks_(other.chunks_) {}
: ::arrow::ChunkResolver(other.chunks_), chunks_(other.chunks_) {}

explicit ChunkedArrayResolver(const std::vector<const Array*>& chunks)
: ::arrow::internal::ChunkResolver(chunks), chunks_(chunks) {}
: ::arrow::ChunkResolver(chunks), chunks_(chunks) {}

template <typename ArrayType>
ResolvedChunk<ArrayType> Resolve(int64_t index) const {
const auto loc = ::arrow::internal::ChunkResolver::Resolve(index);
const auto loc = ::arrow::ChunkResolver::Resolve(index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could actually remove the ::arrow now because the compile won't be confused by ::arrow::internal and ::arrow::compute::internal anymore.

Comment on lines +82 to +89
# TODO: Why is this failing?
#
# > [10/16] RUN gem install --no-document bundler && bundle install --gemfile /arrow/c_glib/Gemfile:
# 24.94 ERROR: Error installing bundler:
# 24.94 The last version of bundler (>= 0) to support your Ruby & RubyGems was 2.4.22. Try installing it with `gem install bundler -v 2.4.22`
# 24.94 bundler requires Ruby version >= 3.0.0. The current ruby version is 2.7.0.0.
COPY c_glib/Gemfile /arrow/c_glib/
RUN gem install --no-document bundler && \
RUN gem install --no-document bundler -v 2.4.22 && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locally or in CI?

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 26, 2024
Comment on lines +90 to 92
arrow::ChunkLocation locate(int64_t index) {
return resolver_.Resolve(index);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The linter wants this to be in a single line.

@felipecrv
Copy link
Contributor

I implemented Valid and Next in #40368. Feel free to copy and paste here.

@anjakefala
Copy link
Collaborator

Hey @SChakravorti21!

I was thinking about adopting this PR. Would you be okay with that?

@SChakravorti21
Copy link
Contributor Author

Hey @SChakravorti21!

I was thinking about adopting this PR. Would you be okay with that?

Hey @anjakefala! I'm so sorry for dropping the ball on this. Yes, please free to take it over if you'd like :)

@assignUser
Copy link
Member

Superseded by #44357

@assignUser assignUser closed this Oct 18, 2024
felipecrv pushed a commit that referenced this pull request Oct 21, 2024
### Rationale for this change

Adopting #40226. 

 The creation and return of a shared_ptr does result in some performance overhead, that makes a difference for a performance-sensitive application.

If someone could use ChunkResolver to learn the indices, they could then instead access the data directly. 

### What changes are included in this PR?

- [X] Updates to documentation (thanks to @ SChakravorti21 )
- [X] Moving `ChunkResolver` to public API, and updating all references to it in the code

### Are these changes tested?

There seemed to be comprehensive tests already: https://github.com/apache/arrow/blob/main/cpp/src/arrow/chunked_array_test.cc#L324 If an edgecase is missing, I'd be happy to add it.

### Are there any user-facing changes?

`ChunkResolver` and `TypedChunkLocation` are now in the public API.
* GitHub Issue: #34535

Lead-authored-by: Anja Kefala <[email protected]>
Co-authored-by: anjakefala <[email protected]>
Co-authored-by: Bryce Mecum <[email protected]>
Co-authored-by: SChakravorti21 <[email protected]>
Co-authored-by: SChakravorti21<[email protected]>
Co-authored-by: Jacob Wujciak-Jens <[email protected]>
Signed-off-by: Felipe Oliveira Carvalho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants