Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Support Hudi reader #2011

Merged
merged 12 commits into from
Apr 1, 2024
Merged

[FEAT] Support Hudi reader #2011

merged 12 commits into from
Apr 1, 2024

Conversation

xushiyan
Copy link
Contributor

@xushiyan xushiyan commented Mar 13, 2024

  • Add basic Hudi APIs
  • Add basic Hudi COW reader support

for #2070

Copy link

codecov bot commented Mar 13, 2024

Codecov Report

Attention: Patch coverage is 88.86311% with 48 lines in your changes are missing coverage. Please review.

Project coverage is 84.96%. Comparing base (3c43394) to head (9d99688).
Report is 7 commits behind head on main.

❗ Current head 9d99688 differs from pull request most recent head 34ed977. Consider uploading reports for the commit 34ed977 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2011      +/-   ##
==========================================
+ Coverage   84.72%   84.96%   +0.24%     
==========================================
  Files          62       68       +6     
  Lines        6840     7271     +431     
==========================================
+ Hits         5795     6178     +383     
- Misses       1045     1093      +48     
Files Coverage Δ
daft/__init__.py 24.24% <ø> (ø)
daft/hudi/pyhudi/timeline.py 100.00% <100.00%> (ø)
daft/io/__init__.py 95.65% <100.00%> (+0.19%) ⬆️
daft/io/_hudi.py 100.00% <100.00%> (ø)
daft/hudi/pyhudi/table.py 99.15% <99.15%> (ø)
daft/hudi/pyhudi/utils.py 98.30% <98.30%> (ø)
daft/hudi/pyhudi/filegroup.py 95.00% <95.00%> (ø)
daft/hudi/hudi_scan.py 66.14% <66.14%> (ø)

@xushiyan xushiyan marked this pull request as ready for review March 20, 2024 07:26
Copy link
Contributor

@jaychia jaychia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems pretty good to me overall!

We're happy to keep code in Daft for now in the daft/hudi/pyhudi folder, but we should probably chat about how you envision the split and API between our libraries.

Couple of important things we should verify:

  • Does this work with S3? We should try reading a Hudi table from AWS S3 and make sure that works. We do have some examples of integration tests using minio too that you can look at if you want a locally runnable example (see: tests/integration/iceberg)
  • Does this work with the full range of Hudi's typesystem?
  • We should also throw an error early if there features we don't support and detect that early (e.g. merge-on-read)

daft/hudi/hudi_scan.py Outdated Show resolved Hide resolved
daft/hudi/pyhudi/table.py Outdated Show resolved Hide resolved
daft/hudi/hudi_scan.py Outdated Show resolved Hide resolved
@xushiyan xushiyan requested a review from jaychia March 28, 2024 17:06
Copy link
Contributor

@jaychia jaychia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!!! 🚀 🚀 🚀 🚀

tests/io/hudi/test_table_read.py Outdated Show resolved Hide resolved
tests/io/hudi/test_table_read.py Show resolved Hide resolved
@jaychia jaychia added the enhancement New feature or request label Apr 1, 2024
@jaychia jaychia merged commit ba94773 into Eventual-Inc:main Apr 1, 2024
28 of 29 checks passed
@xushiyan xushiyan deleted the read-hudi branch April 1, 2024 23:49
@xushiyan xushiyan mentioned this pull request Apr 1, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants