Implement partitioned read in listing table provider #1139

Closed
rdettai opened this issue Oct 18, 2021 · 0 comments · Fixed by #1141
Labels
enhancement New feature or request

Comments

Contributor

rdettai commented Oct 18, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Data files are commonly organized into partitions. There are many ways to do that, but Hive-style partitioning, where each directory level encodes a key=value pair, is the most common:

/table_path/customer=1/year=2020/file001.parquet
...
/table_path/customer=1/year=2020/file009.parquet
/table_path/customer=2/year=2020/filexxx.parquet
/table_path/customer=1/year=2021/filexxx.parquet
/table_path/customer=3/year=2021/filexxx.parquet

Describe the solution you'd like
In the ListingTableProvider, when resolving the list of files:

  • Each file's path should be parsed to extract its partition values. The resulting PartitionedFile should carry the value of every partition dimension.
  • Files that belong to partitions excluded by the query filter should be skipped.
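
To make the two steps above concrete, here is a minimal sketch in plain Rust. It is not DataFusion's actual implementation: `PartitionedFileSketch`, `parse_partitions` and `prune_files` are hypothetical names introduced for illustration, and the filter is assumed to be a simple list of `column = value` equality constraints rather than a real expression tree.

```rust
/// Hypothetical stand-in for the `PartitionedFile` mentioned above:
/// a file path plus the partition values recovered from that path.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionedFileSketch {
    pub path: String,
    /// One value per partition column, in the same order as the columns.
    pub partition_values: Vec<String>,
}

/// Parse the hive-style `key=value` directories between `table_path` and the
/// file name. Returns `None` when the file is not under `table_path` or when
/// its directories do not match `partition_cols` in order.
pub fn parse_partitions(
    table_path: &str,
    file_path: &str,
    partition_cols: &[&str],
) -> Option<PartitionedFileSketch> {
    let table = table_path.trim_end_matches('/');
    // Require a '/' right after the table path so that "/table_path_2/..."
    // is not mistaken for a child of "/table_path".
    let relative = file_path.strip_prefix(table)?.strip_prefix('/')?;

    let mut segments: Vec<&str> = relative.split('/').collect();
    // The last segment is the file name, not a partition directory.
    segments.pop()?;
    if segments.len() != partition_cols.len() {
        return None;
    }

    let mut values = Vec::with_capacity(partition_cols.len());
    for (segment, expected) in segments.iter().zip(partition_cols) {
        let (key, value) = segment.split_once('=')?;
        if key != *expected {
            return None;
        }
        values.push(value.to_string());
    }

    Some(PartitionedFileSketch {
        path: file_path.to_string(),
        partition_values: values,
    })
}

/// Keep only the files whose partition values satisfy every equality
/// constraint. Constraints on non-partition columns cannot be checked from
/// the path alone, so they never exclude a file here.
pub fn prune_files(
    files: Vec<PartitionedFileSketch>,
    partition_cols: &[&str],
    equality_filters: &[(&str, &str)],
) -> Vec<PartitionedFileSketch> {
    files
        .into_iter()
        .filter(|file| {
            equality_filters.iter().all(|(col, wanted)| {
                match partition_cols.iter().position(|c| c == col) {
                    Some(idx) => file
                        .partition_values
                        .get(idx)
                        .map_or(false, |v| v.as_str() == *wanted),
                    None => true,
                }
            })
        })
        .collect()
}
```

With the layout shown earlier and partition columns `["customer", "year"]`, a filter such as `year = 2021` would let `prune_files` drop every file under a `year=2020` directory before any of them is opened. A real implementation would have to evaluate arbitrary filter expressions and typed partition values, which this sketch deliberately leaves out.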

Additional context
Closing #133 and #204 in favor of this.
