-
-
Notifications
You must be signed in to change notification settings - Fork 30.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent behavior of pathlib.WindowsPath with drive paths #80486
Comments
The behavior of WindowsPath is undefined when used with drive paths, specifically with no trailing slash: WindowsPath('cc:').absolute() -> WindowsPath('C:/Users/maor/cc:') |
This is correct. "cc:" is not a drive, i.e. the Note that Windows allows a trailing colon for DOS devices such as "con:". Colons can also be used for file streams, but "cc:" isn't valid. The anonymous stream can only be specified explicitly by including the stream type, e.g. "cc::$DATA".
These are correct. Drive-relative paths depend on the current working directory of the drive. By definition they cannot be absolute. And the last one can be no more resolved than WindowsPath('spam/eggs') / 'p'. Note that Windows gets the working directory for drives from 'hidden' environment variables such as "=C:". If no value is set for a drive, it defaults to the root directory. Windows uses these values, but never sets them itself. Applications and libraries such as the C runtime library are responsible for setting the values. CMD and Python both set them, but PowerShell does not.
This is a bug. absolute() should resolve "C:" to the current directory on the drive and join it with the remaining parts. |
I agree.
|
Update after editing my PR - the bugs are:
It'd be great if someone could review the PR so we can make progress with fixing the bugs. |
(Note: I consider all of these to be *extremely* obscure corner cases)
[Note there is no absolute() method - I assume you mean resolve()] Why? If './b:a' resulted from joining '.' and 'b:a', then it seems to To put it another way, I'm comfortable with WindowsPath("./b:a") I'd be reluctant to "solve" this issue with a fix that doesn't address
Same as (2)
I don't think we should worry about the PR until it's clearly |
In conclusion, I stand by my original fix offers. They are correct according to windows' documentation and behavior. |
Huh, weird. It's not in
OK, sure. My point is that "relative path to the current directory on
Windows is inconsistent here - it can *also* be interpreted as a path
That second case only results in a valid filename if b:a is viewed as Also, in effect it means that Path(".") / some_path can return a
I remain of the view that the Windows documentation introduces a I'm happy to agree to differ on this point, though. If the new BTW, was there an actual use case for this issue, or was it simply a |
In pathlib, each path object has a drive, a root and path parts. Seeing pathlib's logic and tests (many tests are specifically testing the behavior of drive-relative paths), I think pathlib completely supports drive relative paths, with the exception of a few small bugs.
I think that when parsing windows paths, pathlib should behave exactly like the windows API does. This is crucial for interaction with the windows API itself or with other applications that might use it. I don't see any other way to parse windows paths other than according to the normal windows behavior.
This behavior is completely normal. Should WindowsPath('C:\\f') / WindowsPath('D:\\f2') return anything other than WindowsPath('D:/f2')?
As I wrote earlier, I think this is incorrect as the pathlib.Path class holds the attributes _drv, _root and _parts, which allows it to fully support drive-relative paths, by having a _drv and not having a _root.
I'm still that my case is convincing enough, but if not - does that require me to make any changes in order to make progress with my PR?
I've had an annoying bug using pathlib, traced it to the first bug I've presented in this issue, and discovered a few similar unhandled edge cases. Again, the "bugginess" I set upon to fix (call it a bug or an undefined behavior) is an incompatibility issue with the way paths are normally treated in windows. |
I'm not going to block this PR. I'd prefer it if we at least Ultimately the only person whose views matter are yours (as the person |
Alright, documentation is always good :) |
Paul, I agree that joining Path(".") and Path("c:a") should yield Path("c:a"). However, I disagree that we should always be able to construct Path("a/b") from components Path("a") and Path("b"). It doesn't apply to Path("./c:a"). There's no way to split it up as the joining of two pathlib paths because there is no way to represent "c:a" by itself as anything other than a drive-relative path. The name "./c:a" has to be taken as a unit, which is fundamentally different from "c:a". pathlib agrees: >>> p1 = Path('./c:a')
>>> p1
WindowsPath('c:a')
>>> [p1.drive, p1.root, p1.parts]
['', '', ('c:a',)]
>>> p2 = Path('.') / Path('c:a')
>>> p2
WindowsPath('c:a')
>>> [p2.drive, p2.root, p2.parts]
['c:', '', ('c:', 'a')] Path('./c:a') is correctly parsed as a relative filename (no root and no drive). So, if it helps any, on the PR I wasn't requesting to change how it's parsed. The ambiguity is due to the way pathlib always collapses all "." components. I would like it to retain an initial "." component. That way the string representation will come out correctly as ".\\c:a" as opposed to the drive-relative path "c:a". Some Windows API and runtime library functions behave differently depending whether a relative path has a leading "." or ".." component. We're at a disadvantage if we throw this information away. For example, "./spam/eggs.ext" and "spam/eggs.ext" can yield different results when searching for the file via SearchPathW, CreateProcessW (if using lpCommandLine, not lpApplicationName), or LoadLibraryExW (data/image DLL loading, not normal module loading). "./spam/eggs.ext" will be resolved relative to the process working directory, but "spam/eggs.ext" will be tried against every directory in the default file, executable, or library search path, which may not even include the working directory. (The latter behavior is unique to Windows. POSIX never searches for a name with a slash in it.) The CreateProcessW case is a generalization of the case that we're used to across various platforms, in which, for the sake of security, the "." entry is excluded from PATH. In this case, the only way to run an executable in the working directory is to reference it explicitly. For example (in Linux): >>> p = Path('./test.sh')
>>> open(p, 'w').write('#!/bin/sh\necho spam\n')
20
>>> os.chmod(p, 0o700)
>>> subprocess.call(['./test.sh'])
spam
0
>>> try: subprocess.call([str(p)])
... except FileNotFoundError: print('eggs')
...
eggs
>>> str(p)
'test.sh' This would work if pathlib kept the initial "." component. An example where we currently retain information that's not obviously needed is with ".." components. Even Path.absolute() retains ".." components. It's important in POSIX. For example, "spam/../eggs" shouldn't be reduced to "eggs" because "spam" might be a symlink. This doesn't generally matter in Windows, since it normalizes paths in user mode as strings before they're passed to the kernel, but we still don't throw the information away because it could be useful to code that implements POSIX-like behavior. |
Ah, I think I follow now. But I'm not sure what you mean by wanting it
Yes, I see your point now. Whether the initial string representation
Thanks, this is a really useful example, as it makes it clear that
Yes. Maybe not stripping an initial ./ can be modelled on that example
The exact behaviour needs to be clarified, of course: Path('./a')
Path('./a:b')
Path('.', 'a')
Path('./', 'a')
Path('.', '.', 'a') ... etc. We should have well-defined behaviour for all of these (I'm Having said all of this, I'm not at all sure how much it relates to |
I've dealt with that in my PR by checking in the __str__ method if the path could be misinterpreted, and if so prepending a './'.
In my opinion, this is a different problem, that its solution doesn't necessarily belong in pathlib. Pathlib is responsible to parse, manipulate and display paths correctly (which it fails to do with the case of Path('./a:b') -> Path('a:b')). In contrast, in the case of CreateProcessW, both Path('./exec') and Path('exec') are the *correct* paths to the executable. The ./ is not necessary in that case to display the path correctly, but just to interact correctly with CreateProcessW's interface.
I completely agree that any change made to support retaining the initial './' for process invocation should be made in a different bpo item and PR. I do disagree though, that such change should be made, as like I wrote above, this is only needed for CreateProcessW's interface, and isn't related to pathlib's logic itself. |
What can we do in order to make progress here? |
It's semantic information, just for different reasons than the "./c:a" case. In this case, it's about what a path means in a search context. I think specifying a filesystem path for a search disposition is as valid a use case as is specifying a path for an open or create disposition. In Windows, the qualified path r".\exec" is resolved relative to just the working directory. Classically, for an unqualified name such as "exec", the working directory is checked after the application directory. Starting with Windows Vista, CreateProcessW and the CMD shell won't check the working directory at all for unqualified "exec" if NoDefaultCurrentDirectoryInExePath is set in the environment, not unless a "." entry is explicitly added to PATH. The behavior of CreateProcessW also differs for r".\spam\exec" vs r"spam\exec". Without explicitly including the leading ".", the system searches every directory in the executable search path (i.e. the application directory, working directory, system directories, and PATH). This differs from Unix, in which any path with a slash in it is always resolved relative to the working directory. Getting this behavior in Windows requires using the qualified path r".\spam\exec". We may want either behavior. I think if a path is specified explicitly with a leading "." component, or joined from one, the "." component should be preserved, just as we already preserve a leading ".." component. This is a separate issue, as I suggested in the PR comment. It was just supposed to be a related issue that you might be interested in. I didn't intend to conflate it with this issue. |
Alright. And what is left to do with the current issue? |
Is this related to the weird paths I am seeing when getting different output in venv compared to without venv: from pathlib import Path
filename = Path(__file__)
filename2 = Path('C:\\path\\to\\file.py')
print(filename)
print(filename2) Where the result is: Venv run: |
This is probably not related, sounds like something caused by the venv implementation. On a different note, how do I get my PR reviewed? (#12361) |
This can be solved quite simply once #101667 lands. |
It looks like |
…102454) Co-authored-by: Maor Kleinberger <[email protected]>
pythonGH-102454) Co-authored-by: Maor Kleinberger <[email protected]>
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
Linked PRs
The text was updated successfully, but these errors were encountered: