Add stdlib-compatible shim layer. #114

Open
asford opened this issue Aug 25, 2019 · 11 comments
Comments

@asford
Contributor

asford commented Aug 25, 2019

@martindurant This is a trial-balloon issue for this idea, mostly to see if fsspec is the "right" place for this feature and/or if I've missed an existing implementation. If so, I've a working proof-of-concept built off an older version of dask that could serve as a starting-point PR.

fsspec is a great module for new development or projects built on the existing dask ecosystem and enables an amazing S3-is-my-filesystem paradigm. However, the vast majority of projects make use of Python's built-in file operations and are tightly coupled to a local filesystem. This causes major development friction when integrating existing third-party libraries into a project, as one almost invariably needs to work out an integration-specific flow between local files and remote storage.

One can work around this problem with FUSE-based mounts; however, this complicates deployment and containerization. Alternatively, any of several filesystem abstractions (fsspec, smart_open, pyfilesystem2, et al.) could be used, but updating a third-party component to an alternative filesystem interface is a painful and risky development lift. One either needs to maintain a private fork 😳 or open a massive and risky PR 😬.

I've found that a solid majority of filesystem use cases are covered by a relatively small set of operations, all of which are already covered by fsspec. By providing strictly compatible shims for a small set of the stdlib (e.g. open, os.path.exists, os.remove, glob.glob, shutil.rmtree, et al.) and then swapping these in via import-level changes, one can quickly teach most libraries to seamlessly interact with all the file systems supported by fsspec.

This shim layer would mandate strict adherence to standard library semantics for local file operations, likely by directly forwarding all local paths into the standard library and forwarding non-local paths through fsspec-based implementations. The explicit goal would be to enable a majority of basic use cases, deferring to fsspec interfaces for more robust integration and/or specialized use cases. This would turn fsspec into a massively useful layer for updating existing systems to cloud-compatible storage, as updating a library to support s3 and gcsfs would be as simple as:

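try:
    # Prefer the fsspec-backed shims when the (proposed) module is available.
    from fsspec.stdlib import open
    from fsspec.stdlib import os, shutil
except ImportError:
    # Fall back to the standard library; the built-in open is used as-is.
    import os.path
    import shutil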
@martindurant
Member

Seems a reasonable thing to try - I don't know how much demand there is for this. Of course, I planned the API of the library to try to be a reasonable set of methods with names and functionality inspired from a number of sources, the stdlib being just one.

@asford
Contributor Author

asford commented Aug 26, 2019

Agreed on the existing demand; there are n=3 folks I work with, but this may not be a widespread need. Would you be open to a PR here for this idea?

@asford
Contributor Author

asford commented Sep 19, 2019

I've prototyped this a bit more internally and think the best approach would be an fsspec-backed implementation of the pathlib.Path interface. There's already some prior art implementing this interface for HTTP URL paths, which could be generalized comparatively easily for fsspec paths.

Would you be open to having that land in fsspec?
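
As a rough illustration of the idea, a pathlib-style wrapper could pair a filesystem object with a path string and delegate I/O to it. The sketch below is an assumption about the shape of such a class, not the prototype mentioned above: `FSPath` is a hypothetical name, it covers only a small part of the `pathlib.Path` interface, and it relies only on the `exists`, `open`, and `glob` methods that fsspec filesystems provide.

```python
import posixpath


class FSPath:
    """Hypothetical pathlib-style wrapper around an fsspec filesystem."""

    def __init__(self, fs, path):
        self.fs = fs      # any fsspec filesystem instance
        self.path = path  # path string as understood by that filesystem

    def __truediv__(self, other):
        # Mirror pathlib's "/" operator for joining path segments.
        return FSPath(self.fs, posixpath.join(self.path, str(other)))

    @property
    def name(self):
        return posixpath.basename(self.path)

    @property
    def parent(self):
        return FSPath(self.fs, posixpath.dirname(self.path))

    def exists(self):
        return self.fs.exists(self.path)

    def open(self, mode="r"):
        return self.fs.open(self.path, mode)

    def read_text(self):
        with self.open("r") as f:
            return f.read()

    def glob(self, pattern):
        matches = self.fs.glob(posixpath.join(self.path, pattern))
        return [FSPath(self.fs, match) for match in matches]

    def __repr__(self):
        return f"FSPath({self.path!r})"
```

Something like `FSPath(fsspec.filesystem("memory"), "/data") / "a.txt"` would then behave much like a `pathlib.Path` for these few operations; the pure path manipulation (`/`, `name`, `parent`) does not touch the filesystem at all.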

@martindurant
Member

Sure, I'm not opposed so long as it doesn't affect non-Path users. I don't think there are other people asking for it, but I could easily be wrong.

@asford
Contributor Author

asford commented Sep 19, 2019

Yes, this would be a layer sitting above the current fsspec implementation and wouldn't impact the existing APIs. I don't think there's a major outstanding call for this; I'm primarily looking for a logical spot for features I've already built over fsspec and think this would be a logical (and discoverable) home.

@ghost

ghost commented Nov 3, 2019

+1 - I've been using pathlib pretty heavily and just found / started using fsspec, so I'd find a pathlib-compatible API shim quite useful.

@martindurant
Member

Perhaps between the two of you, you could work something out?

@ghost

ghost commented Nov 15, 2019

Here's a (mostly) pathlib-compatible wrapper that I'm using. I take it there's not a ton of interest, so I'll just throw it out there as a gist in case it is of use to anyone.

@martindurant
Member

That is interesting. If you would like to post it as a PR, it may get more attention - if you feel it is mostly ready for the public, of course.

@quantology

quantology commented Apr 6, 2020

fwiw, I'm using this gist to shim pytest-datadir to work with s3. I'd certainly +1 this being included in fsspec or being its own module.

@martindurant
Member

Yes, please, someone wrap this as a PR.
