Add Snapshots table metadata #524

Merged
merged 3 commits into from
Mar 21, 2024

Conversation

Fokko
Contributor

@Fokko Fokko commented Mar 15, 2024

No description provided.

Contributor

@kevinjqliu kevinjqliu left a comment

woot! this is great, first metadata table

tests/integration/test_writes.py (review thread resolved)
@Fokko Fokko mentioned this pull request Mar 18, 2024
8 tasks
Contributor

@HonahX HonahX left a comment

LGTM! This is a great start for metadata tables, @Fokko!

Just one question: I was wondering whether we will later need metadata table classes such as StaticTableScan and StaticDataTask, as in the Java implementation. These may become useful when other engines (Daft, Ray) want to represent the metadata tables in their dataframes. But since the metadata tables are normally not very large, using PyArrow as a bridge may be enough?
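
To make the question concrete, here is a minimal sketch of what such classes could look like in Python. It is not the Java implementation and not PyIceberg code: the class bodies, the method names `read`, `plan_tasks`, and `to_rows`, and the sample rows are all assumptions made for illustration. The idea shown is only that a metadata "scan" plans a single task over rows that are already in memory, so an engine could consume it without going through PyArrow.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class StaticDataTask:
    """Hypothetical task wrapping rows that are already in memory."""

    rows: List[Row]

    def read(self) -> Iterator[Row]:
        # No file I/O is needed: the rows come straight from table metadata.
        yield from self.rows


@dataclass(frozen=True)
class StaticTableScan:
    """Hypothetical scan over a metadata table; it plans one static task."""

    rows: List[Row]

    def plan_tasks(self) -> List[StaticDataTask]:
        # Metadata tables are usually small, so a single task is assumed here.
        return [StaticDataTask(self.rows)]

    def to_rows(self) -> List[Row]:
        # Drain every planned task into a flat list of rows.
        return [row for task in self.plan_tasks() for row in task.read()]


# Made-up snapshot rows, for illustration only.
scan = StaticTableScan(
    rows=[
        {"snapshot_id": 1, "operation": "append"},
        {"snapshot_id": 2, "operation": "overwrite"},
    ]
)
operations = [row["operation"] for row in scan.to_rows()]
```

An engine such as Daft or Ray would presumably call something like `plan_tasks()` and convert each task into its own dataframe representation; whether that is worth the extra classes is exactly the open question.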

@Gowthami03B
Contributor

@Fokko Can we merge this? I am almost done with "Files" table, so I can rebase my code before creating a PR.

@Fokko
Contributor Author

Fokko commented Mar 21, 2024

> Just one question: I was wondering whether we will later need metadata table classes such as StaticTableScan and StaticDataTask, as in the Java implementation. These may become useful when other engines (Daft, Ray) want to represent the metadata tables in their dataframes. But since the metadata tables are normally not very large, using PyArrow as a bridge may be enough?

I'm open to that, but I would like to defer it to a later PR. I don't like the hard dependency on PyArrow and would love to get rid of it, but I'm not sure what the best replacement format would be. An Arrow table can be used in most engines without any copying.

I'll move this forward so @Gowthami03B can continue her work.

@Fokko Fokko merged commit 69b9e39 into apache:main Mar 21, 2024
7 checks passed
@Fokko Fokko deleted the fd-snapshots-table-metadata branch March 21, 2024 20:09
@kevinjqliu kevinjqliu mentioned this pull request May 14, 2024
39 tasks
4 participants