fix: Scanning hive partitioned files where hive columns are partially included in the file #18626
crates/polars-io/src/hive.rs
};
// Insert these hive columns in the order they are stored in the file.
for s in hive_columns {
    let i = match df.get_columns().binary_search_by_key(
This feels very expensive. Is there a way we can build a hashmap up front to amortize the cost?
Yeah, I didn't like this either. I've cooked something up and pushed it in a commit 😄
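The reviewer's suggestion (build a lookup table once instead of searching the columns for every hive column) can be sketched roughly as follows. This is a hypothetical, std-only illustration, not the code pushed in the commit: `order_columns` and its three name slices are assumptions, and hive columns that are not stored in the file are assumed to go at the end.

```rust
use std::collections::HashMap;

/// Hypothetical helper: returns column names in file order, with hive columns
/// placed where the file stores them. Hive columns absent from the file are
/// appended at the end (an assumption made for this sketch only).
fn order_columns(file_schema: &[&str], df_columns: &[&str], hive_columns: &[&str]) -> Vec<String> {
    // Build the name -> file position map once: O(n) up front, then each
    // lookup is O(1) on average instead of a search per hive column.
    let position: HashMap<&str, usize> = file_schema
        .iter()
        .enumerate()
        .map(|(i, name)| (*name, i))
        .collect();

    // Pair every column with its file position; columns not stored in the
    // file get usize::MAX so they sort after everything else.
    let mut keyed: Vec<(usize, &str)> = df_columns
        .iter()
        .chain(hive_columns.iter())
        .map(|name| (position.get(*name).copied().unwrap_or(usize::MAX), *name))
        .collect();

    // sort_by_key is stable, so columns missing from the file keep the
    // relative order in which the hive columns were given.
    keyed.sort_by_key(|&(pos, _)| pos);
    keyed.into_iter().map(|(_, name)| name.to_string()).collect()
}
```

For a file storing `a, year, b, month` and a hive path that also contributes `day`, the sketch yields `a, year, b, month, day`. A stable sort over precomputed keys keeps the example short; an in-place insertion scheme would avoid allocating `keyed` but needs the same up-front map to stay cheap.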
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #18626   +/-   ##
=======================================
  Coverage   79.91%   79.91%
=======================================
  Files        1506     1506
  Lines      203047   203086   +39
  Branches     2889     2891    +2
=======================================
+ Hits       162271   162306   +35
- Misses      40226    40230    +4
  Partials      550      550
No description provided.