LocalFileSystem fix _strip_protocol for root directory #1477
I'd like to see this go into the |
The trouble here is that for real filesystems (local, ftp, ssh, ...), files cannot end with "/", even though directory names are sometimes printed with the extra character. So for those, stripping is always the right thing to do. Similarly, in s3 and gcs, "/" is the "separation" character used with the remote API and also has a special place, if not exactly the same. So I think I would for now recommend that implementations that need it should provide their own _strip_protocol rather than changing the default one. |
I see your point, but on the other hand, trailing slashes are also the defining hint of directories in many implementations - in a different world, in those implementations, all file-ops could raise In my case, I am working with URIs similar to S3, of the structure But since the subclass implementation works fine in our case, I see no harm in keeping that in place for now. |
My apologies if the updates have been noisy. I'm a novice when it comes to git and pull requests. The other history has been removed when I switched to using a nested function so that remove_trailing_slash is always respected. I note there is an error for test_fsmap_error_on_protocol_keys.
I haven't used
should be without the trailing separator?
fsspec/implementations/local.py
Outdated
    return path.rstrip("/") if remove_trailing_slash else path


def _make_path_posix(path, sep=os.sep):
Why split up the function? The windows-root check is a rare one and should ideally be checked after more common things. This would be a second case within "# windows full path"?
My reasoning was that the original make_path_posix function has several return statements. Wrapping the original function enables detection of special edge cases but also adds support for stripping the trailing suffix where appropriate.
fsspec/implementations/local.py
Outdated
    path = _make_path_posix(path, sep)
    if os.name == "nt" and len(path) == 3 and path[1] == ":":
        return path  # windows root
    return path.rstrip("/") if remove_trailing_slash else path
Presumably this is what breaks the mapper test; the key in question, I think, would have hit "if path.startswith("/"):" previously.
The trouble here is that for real filesystems (local, ftp, ssh, ...), files cannot end with "/", even though directory names are sometimes printed with the extra character.
The filename ends with a trailing '/'. Is it supposed to?
I should mention that I also added that check in open, however the issue still occurs if the check is disabled.
The issue can be resolved by stripping the trailing suffix in mapping.py->__getitem__. Doing so doesn't appear to cause any regression.
k = self._key_to_str(key).rstrip("/")
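As a rough illustration of that suggestion, here is a standalone sketch. It is not the actual FSMap code: `ToyMap`, its `store` dict and the example paths are hypothetical names chosen for the demo. Stripping the trailing "/" from the resolved key lets "a" and "a/" reach the same entry:

```python
class ToyMap:
    """Hypothetical stand-in for a filesystem-backed mapping (not fsspec's FSMap)."""

    def __init__(self, root, store):
        self.root = root
        # A plain dict of full path -> bytes stands in for the filesystem.
        self.store = store

    def _key_to_str(self, key):
        # Join the key onto the root to get the full path.
        return f"{self.root}/{key}"

    def __getitem__(self, key):
        # Same normalization as the one-line change quoted above.
        k = self._key_to_str(key).rstrip("/")
        try:
            return self.store[k]
        except KeyError:
            raise KeyError(key) from None


m = ToyMap("/tmp/data", {"/tmp/data/a": b"1"})
print(m["a"] == m["a/"])
```

Whether the same normalization is wanted in `__setitem__` and `__delitem__` is a separate design question that this sketch does not address.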
…th_posix. make_path_posix with some optimisation tweaks such as change re.match to a string comparison. Should also properly format concatenated windows paths as posix.
…with all arguments
test_fsmap_access_with_suffix - enable for windows.
- now converts to string if not.
fsspec/mapping.py
Outdated
        self.root = fs._strip_protocol(root).rstrip("/")
        try:
            self.root = fs._strip_protocol(root, remove_trailing_slash=True)
        except TypeError:
This is unfortunate; in what case does rstrip fail? In this case, I think we might be explicitly joining on the separator anyway.
On Windows, if the root is something like c:/, rstrip will produce a root of c:, which is treated like '.', the current working directory, instead of the root.
On posix, rstrip will make / change to an empty string.
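The two edge cases can be reproduced with the standard library alone. The sketch below is only an illustration: `strip_trailing_slash_keep_root` is a hypothetical helper, not the function added in this PR.

```python
import ntpath


def strip_trailing_slash_keep_root(path):
    """Hypothetical helper: drop trailing "/" unless that would destroy a root."""
    stripped = path.rstrip("/")
    if stripped == "":
        # posix root: "/" must not collapse to the empty string
        return path[:1]
    if len(stripped) == 2 and stripped[1] == ":" and stripped != path:
        # windows drive root such as "c:/"; a bare "c:" is drive-relative
        return stripped + "/"
    return stripped


# Plain rstrip loses both roots.
print(repr("/".rstrip("/")))
print(repr("c:/".rstrip("/")))

# ntpath applies Windows path rules on any platform.
print(ntpath.isabs("c:/"), ntpath.isabs("c:"))

print(strip_trailing_slash_keep_root("c:/"), strip_trailing_slash_keep_root("/"))
```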
This is the only place that the special case for LocalFileSystem leaks, and I don't like it :|. I think it may be better to let through strange behaviour if the user thinks they want to map their whole C drive (and gets only cwd files instead).
I've changed this back.
But without .rstrip("/")
make_path_posix - removed handling of `file://` (should be done in _strip_protocol).
py38 does not support removeprefix
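For reference, `str.removeprefix` was added in Python 3.9. A version-independent equivalent is short; `remove_prefix` below is a hypothetical helper name, not something taken from this PR:

```python
def remove_prefix(text, prefix):
    """Behaves like str.removeprefix (Python 3.9+) but also runs on 3.8."""
    if prefix and text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix("file:///tmp/x", "file://"))
```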
I think |
disable test_jupyter.test_simple for windows
The latest changes remove |
@@ -38,6 +38,7 @@ def jupyter(tmpdir):
    P.terminate()


@pytest.mark.skipif(WIN, reason="Subprocess gets stuck in a loop")
I didn't notice this before. Any idea what's going on?
I found the problem - Something to do with the temp dir passed to Jupyterlab in the launch call showing up as an invalid path. Surrounding the path with "" fixed it. The test now passes when I run it on Windows.
cmd = f'jupyter notebook --notebook-dir="{tmpdir}" --no-browser --port=5566'
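One mechanism that would produce this symptom is POSIX-style tokenizing of the command string, for example with `shlex.split`. That is an assumption: the thread does not say how the test splits the command. Under that assumption, backslashes outside quotes are consumed as escape characters, while a double-quoted path keeps them. The `tmpdir` value below is made up for the demo.

```python
import shlex

# Hypothetical Windows temp dir; the real tmpdir value is not shown in the thread.
tmpdir = r"C:\Users\me\AppData\Local\Temp\pytest-0"

unquoted = f"jupyter notebook --notebook-dir={tmpdir} --no-browser --port=5566"
quoted = f'jupyter notebook --notebook-dir="{tmpdir}" --no-browser --port=5566'

# Third token is the --notebook-dir argument in both cases.
unquoted_arg = shlex.split(unquoted)[2]
quoted_arg = shlex.split(quoted)[2]

print(unquoted_arg)  # backslashes have been eaten by the tokenizer
print(quoted_arg)    # path survives intact
```

Passing the command as a list of arguments to `subprocess` avoids the tokenizing step entirely, which may be a sturdier choice than relying on quoting.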
This line was to be removed according to the thread above
Quite right - I made the changes but forgot to push them. It is done now.
please merge from master to pick up the constraint on pytest that should fix CI |
Did you mean something other than to sync the fork? |
That's exactly what I meant - sorry I didn't get a notification |
No problem. |
@nils-braun , @a24lorie , any idea why DBFS should start failing, but only for some python versions? It is also happening on other PRs. @fleming79 , you may wish to take the commit from #1533 , which should at least squash the intermittent SMB failures. |
@martindurant I have found this to be an issue with the pytest-vcr and urllib3 version 2.x with python 3.10 and 3.11. |
Please update your fork following #1533 |
I synced the fork with GitHub. |
Sorry this has been taking so long...
I wonder, since we now are essentially calling _strip_protocol from within LocalFileSystem, would it be easy to split off the optional remove_trailing_slash= into a separate method, to maintain the previous signature? I am still not sure what the best default is, since in most places we think of paths not having terminal sep characters even when they refer to directories.
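A minimal sketch of what such a split could look like, under loud assumptions: `ToyLocalFS` and `_strip_scheme` are hypothetical names, Windows handling is left out, and none of this is the code that ended up in the PR. The point is only that `_strip_protocol` keeps its one-argument signature while the slash-preserving step lives in its own method.

```python
class ToyLocalFS:
    """Hypothetical stand-in, not fsspec's LocalFileSystem."""

    @classmethod
    def _strip_scheme(cls, path):
        # Hypothetical helper: removes only a leading "file://" or "file:"
        # and leaves any trailing "/" untouched.
        for prefix in ("file://", "file:"):
            if path.startswith(prefix):
                return path[len(prefix):]
        return path

    @classmethod
    def _strip_protocol(cls, path):
        # Unchanged single-argument signature: trailing "/" is always removed,
        # except that the posix root "/" is kept rather than collapsed to "".
        path = cls._strip_scheme(path)
        stripped = path.rstrip("/")
        return stripped if stripped else path[:1]


print(ToyLocalFS._strip_scheme("file:///tmp/dir/"))
print(ToyLocalFS._strip_protocol("file:///tmp/dir/"))
```

Callers that care about the trailing separator would use the helper directly, and everything else keeps calling `_strip_protocol` as before.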
The fork has been synced, with no other changes. |
:) |
This modifies LocalFileSystem to fix #945.
All local tests pass on Windows.