[FEAT] Native glob functionality #1450
}
Ok(PyList::new(py, to_rtn))
}

#[pyfunction]
fn io_list(
Let's drop the io_list functionality.
let full_fragment = GlobFragment::new(glob);
if !full_fragment.has_special_character() {
let glob = full_fragment.escaped_str().to_string();
return Ok(stream! {
No need for a stream! macro here. You should be able to use StreamExt::filter:
source.iter_dir(glob.as_str(), Some("/"), None).await?
.filter(|fm| Ok(matches!(fm.filetype, FileType::Directory)))
.boxed()
I've been having some trouble with streams and lifetimes. The stream! macro seems to correctly move values into the Future so that it owns them? For instance:
return Ok(source.iter_dir(glob.as_str(), Some("/"), None).await?.boxed());
The above throws an error:
cannot return value referencing local variable `source`
returns a value referencing data owned by the current function
I'm guessing this is because source is on the local stack and does not get moved into the returned Future: the function signature here is async iter_dir(&self, ...), so the returned Future holds a reference to the local stack's source.
One workaround I found is to change the function signature of iter_dir to: iter_dir(self: Arc<Self>, ...) -> BoxStream<'a, ...> where Self: 'a
This forces the BoxStream to own an Arc<dyn ObjectSource> instead of a &dyn ObjectSource, so it doesn't hold a reference to some local stack variable.
This in turn lets us return BoxStreams that contain local Arc<dyn ObjectSource> objects. Otherwise, we will have to immediately consume them in the local block before the Arc goes out of scope.
Seems a little heavy-handed though? Wdyt?
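The ownership idea behind the Arc<Self> workaround can be sketched with plain std iterators. This is only a synchronous analogue, not the PR's code: the real iter_dir is async and returns a BoxStream, and the names InMemorySource, iter_dir_owned and list_prefix are hypothetical, introduced here for illustration.

```rust
use std::sync::Arc;

// Hypothetical stand-in for the PR's ObjectSource trait. The real trait is
// async and returns a BoxStream; a boxed Iterator is used here as an analogue.
trait ObjectSource {
    // Taking `self: Arc<Self>` lets the returned iterator own the source,
    // so its type carries no borrow of a local variable.
    fn iter_dir_owned(self: Arc<Self>, prefix: &str) -> Box<dyn Iterator<Item = String>>;
}

// Hypothetical in-memory source, used only to make the sketch runnable.
struct InMemorySource {
    keys: Vec<String>,
}

impl ObjectSource for InMemorySource {
    fn iter_dir_owned(self: Arc<Self>, prefix: &str) -> Box<dyn Iterator<Item = String>> {
        let prefix = prefix.to_string();
        let mut idx = 0;
        // The `move` closure captures the Arc itself, keeping the source alive
        // for as long as the iterator exists.
        Box::new(std::iter::from_fn(move || {
            while idx < self.keys.len() {
                let key = &self.keys[idx];
                idx += 1;
                if key.starts_with(&prefix) {
                    return Some(key.clone());
                }
            }
            None
        }))
    }
}

// `source` is a local variable, but it is moved into the returned iterator,
// so nothing returned from this function references the local stack.
fn list_prefix(keys: Vec<String>, prefix: &str) -> Box<dyn Iterator<Item = String>> {
    let source: Arc<dyn ObjectSource> = Arc::new(InMemorySource { keys });
    source.iter_dir_owned(prefix)
}
```

With a &self receiver instead, the boxed iterator would have to borrow source, and returning it from list_prefix would hit the same "cannot return value referencing local variable" error quoted above.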
That's because stream! is moving all the values into a closure, whereas source in source.iter_dir is being coerced into a reference. You can do your own explicit move closure instead.
We can go through this tomorrow in person, I wasn't able to get lifetimes working correctly...
return stream!{ let s = source.iter_dir(...); while let Some(_) = s.next().await{...} }; moves source into a wrapper stream, which then owns both the source and the resultant BoxStream from source.iter_dir, making this new stream legal to return.
Returning the stream directly, however (return source.iter_dir(...).filter(...).boxed();), returns a stream that holds a reference to source, which is on the local stack. Even if I wrap this in a move closure, source will still remain on the local stack.
@jaychia I also don't see any test cases for delimited special characters or local filepaths. I'm a little worried about globbing on Windows local filepaths, so we should add a few test cases there. They should also not be under the integration test path, since they don't require fixtures.
This PR only implements tests for S3. I'm going to tackle each of the other backends separately in follow-up PRs.
…ck against glob before recursing
Adds a new io_glob Python function that performs globbing using the rules provided by the globset crate (https://docs.rs/globset/latest/globset/)
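To illustrate what matching a path against a glob involves, here is a small hand-rolled matcher using only std. It is not globset and does not reproduce globset's rules: it is a sketch that assumes just two wildcards, `*` (any run of characters other than `/`) and `?` (one character other than `/`). The function names glob_match and match_from are hypothetical.

```rust
// Simplified glob matcher for illustration only; this is NOT the globset crate.
// Assumed rules: `*` matches zero or more non-'/' characters, `?` matches
// exactly one non-'/' character, everything else matches literally.
fn glob_match(pattern: &str, path: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = path.chars().collect();
    match_from(&p, &s)
}

fn match_from(p: &[char], s: &[char]) -> bool {
    match p.split_first() {
        // Pattern exhausted: match only if the path is exhausted too.
        None => s.is_empty(),
        Some((&'*', rest)) => {
            // Let `*` consume 0, 1, 2, ... characters, but never a '/'.
            for i in 0..=s.len() {
                if match_from(rest, &s[i..]) {
                    return true;
                }
                if i < s.len() && s[i] == '/' {
                    break;
                }
            }
            false
        }
        Some((&'?', rest)) => match s.split_first() {
            Some((&c, tail)) if c != '/' => match_from(rest, tail),
            _ => false,
        },
        Some((&c, rest)) => match s.split_first() {
            Some((&d, tail)) if c == d => match_from(rest, tail),
            _ => false,
        },
    }
}
```

Under these assumed rules, data/*.csv matches data/a.csv but not data/sub/a.csv, because `*` stops at the path separator.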