Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-101362: Optimise PurePath(PurePath(...)) #101667

Merged

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Feb 7, 2023

The previous _parse_args() method pulled the _parts out of any supplied PurePath objects; these were subsequently joined in _from_parts() using os.path.join(). This is actually a slower form of joining than calling fspath() on the path object, because it doesn't take advantage of the fact that the contents of _parts is normalized!

This reduces the time taken to run PurePath("foo", "bar") by ~20%, and the time taken to run PurePath(p, "cheese"), where p = PurePath("/foo", "bar", "baz"), by ~40%.

Automerge-Triggered-By: GH:AlexWaygood

The previous `_parse_args()` method pulled the `_parts` out of any
supplied `PurePath` objects; these were subsequently joined in
`_from_parts()` using `os.path.join()`. This is actually a slower form of
joining than calling `fspath()` on the path object, because it doesn't take
advantage of the fact that the contents of `_parts` is normalized!

This reduces the time taken to run `PurePath("foo", "bar") by ~20%, and
the time taken to run `PurePath(p, "cheese")`, where
`p = PurePath("/foo", "bar", "baz")`, by ~40%.
@barneygale
Copy link
Contributor Author

barneygale commented Feb 25, 2023

@zooba per our discussion in #101362, I've tweaked the docs in this PR. I'm trying to convey that we treat PurePath objects like any other kind of os.PathLike object, which implies we call os.fspath(). Let me know if you'd like us to be more explicit? But I don't want to consume too much docs space on this, as it's a bit off the trodden path, and it stops making sense when paths have drives IMO.

@zooba
Copy link
Member

zooba commented Feb 27, 2023

I guess my only question now is whether join handles differing types the same way as parse_args used to? Specifically, each element is converted to a string before being joined, rather than failing with TypeError.

If it works the same, then I have no concerns.

@barneygale
Copy link
Contributor Author

os.path.join() requires that you don't mix str and bytes arguments (after fspath()) - it raises TypeError if you do. But it supports subclasses of str and bytes and won't convert them to true strings/bytes.

pathlib requires that every argument is a str (after fspath()). It also supports subclasses of str, and will convert them to true strings.

I think this patch maintains all previous behaviour. You still get a TypeError if any argument you supply, after fspath(), isn't a subclass of str.

@barneygale
Copy link
Contributor Author

Actually, there don't seem to be good tests for this. I'll write some!

@zooba
Copy link
Member

zooba commented Feb 28, 2023

LGTM!

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A few minor suggestions

Doc/library/pathlib.rst Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Lib/test/test_pathlib.py Outdated Show resolved Hide resolved
Lib/pathlib.py Outdated Show resolved Hide resolved
Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@barneygale
Copy link
Contributor Author

Thanks so much, Alex! \o/

@AlexWaygood AlexWaygood added the performance Performance or resource usage label Mar 5, 2023
@miss-islington
Copy link
Contributor

Status check is done, and it's a success ✅.

@miss-islington miss-islington merged commit 6716254 into python:main Mar 5, 2023
hugovk pushed a commit to hugovk/cpython that referenced this pull request Mar 6, 2023
The previous `_parse_args()` method pulled the `_parts` out of any supplied `PurePath` objects; these were subsequently joined in `_from_parts()` using `os.path.join()`. This is actually a slower form of joining than calling `fspath()` on the path object, because it doesn't take advantage of the fact that the contents of `_parts` is normalized!

This reduces the time taken to run `PurePath("foo", "bar")` by ~20%, and the time taken to run `PurePath(p, "cheese")`, where `p = PurePath("/foo", "bar", "baz")`, by ~40%.

Automerge-Triggered-By: GH:AlexWaygood
carljm added a commit to carljm/cpython that referenced this pull request Mar 6, 2023
* main: (21 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
carljm added a commit to carljm/cpython that referenced this pull request Mar 7, 2023
* main: (37 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage topic-pathlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants