-
Notifications
You must be signed in to change notification settings - Fork 159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] Filter out marker files from glob scan #1999
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1999 +/- ##
==========================================
+ Coverage 84.67% 84.70% +0.03%
==========================================
Files 58 58
Lines 6363 6363
==========================================
+ Hits 5388 5390 +2
+ Misses 975 973 -2 |
src/daft-io/src/object_store_glob.rs
Outdated
@@ -309,6 +309,15 @@ fn _should_return(fm: &FileMetadata) -> bool { | |||
{ | |||
false | |||
} | |||
// These are marker files used by Spark that we don't want to return | |||
FileType::File | |||
if fm.filepath.ends_with("_metadata") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do a .lower()
on the file names and have these stored as constants?
Ideally it should also be configurable by the user but I think we can ignore that for now.
src/daft-io/src/object_store_glob.rs
Outdated
FileType::File | ||
if MARKER_SUFFIXES | ||
.iter() | ||
.any(|suffix| fm.filepath.to_lowercase().ends_with(suffix)) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for _metadata
, _common_metadata
, and _success
this may be bug prone.
Imagine you have a parquet file called image_metadata
. this would filter that file out. Instead you want to find an exact match of filename
where filename
is basepath / filename
for those 3.
Closes #1987