Optimize path joins with os.sep.join(...). #515
Merged
Description
Extends the work started in #511. In a few performance-critical locations that use internal paths (not user-provided paths), we can sacrifice the safety benefits of `os.path.join(*paths)` in favor of the speedier `os.sep.join(paths)`. This yields about a 20-25% speedup when iterating over a project without accessing job state points.
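As a rough illustration of the trade-off (a minimal sketch, not code from this PR), the snippet below compares the two joins on a made-up workspace path and job id. The names `workspace` and `job_id` are hypothetical. For clean components the results are identical, while an input such as a trailing separator is normalized only by `os.path.join`.

```python
import os
import timeit

# Hypothetical internal path components (illustrative names only).
workspace = os.sep.join(["", "tmp", "project", "workspace"])
job_id = "0123456789abcdef0123456789abcdef"

# For clean components, both joins produce the same string.
fast = os.sep.join((workspace, job_id))
safe = os.path.join(workspace, job_id)
assert fast == safe

# os.path.join normalizes a trailing separator; os.sep.join does not,
# so it would produce a doubled separator here.
trailing = workspace + os.sep
assert os.path.join(trailing, job_id) == safe
assert os.sep.join((trailing, job_id)) != safe

# Micro-benchmark of the join alone; absolute numbers vary by machine.
t_safe = timeit.timeit(lambda: os.path.join(workspace, job_id), number=100_000)
t_fast = timeit.timeit(lambda: os.sep.join((workspace, job_id)), number=100_000)
print(f"os.path.join: {t_safe:.4f}s  os.sep.join: {t_fast:.4f}s")
```

This times the join in isolation, so the ratio it prints is not comparable to the end-to-end iteration speedup quoted above.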
We can only apply this optimization in places that deal with internal paths, like "project_workspace / job_id" but not like `job.fn(some_user_path)`. User paths must be joined with `os.path.join`, and that includes the project workspace, which is configurable. Regardless, we are able to safely handle the most important cases (the job workspace and the job state point filename) that are called O(N) times.
Benchmarks
Below I have benchmarks of the following script, which just iterates over the project without accessing any job properties. There are 1000 jobs in the workspace.
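As a stand-in for this kind of measurement (not the script used for the numbers in this PR), the sketch below builds 1000 empty directories in a temporary workspace and profiles one pass over them with each join. It uses only the standard library. The helper names `iterate`, `join_safe`, `join_fast`, and `profile` are hypothetical.

```python
import cProfile
import io
import os
import pstats
import tempfile


def iterate(workspace, join):
    """Visit every job directory, issuing one stat-backed check per job."""
    count = 0
    for job_id in os.listdir(workspace):
        if os.path.isdir(join(workspace, job_id)):
            count += 1
    return count


def join_safe(workspace, job_id):
    return os.path.join(workspace, job_id)


def join_fast(workspace, job_id):
    return os.sep.join((workspace, job_id))


def profile(workspace, join):
    """Profile one iteration pass and return (job count, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    count = iterate(workspace, join)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(8)
    return count, out.getvalue()


with tempfile.TemporaryDirectory() as workspace:
    # 1000 fake job directories with 32-character hexadecimal names.
    for i in range(1000):
        os.mkdir(os.path.join(workspace, f"{i:032x}"))
    count_before, report_before = profile(workspace, join_safe)
    count_after, report_after = profile(workspace, join_fast)

print(report_before)
print(report_after)
```

Comparing the two printed reports should show the path-join function contributing to the first pass and not the second, with the directory checks remaining in both.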
Before
The call to `os.path.join` (displayed as `posixpath.py:71(join)`) takes a chunk of time, which is eliminated in the "After".
After
The bulk of the iteration is `stat` calls, which cannot be further optimized. 👍
Motivation and Context
Seeking some final obvious optimizations for synced collections (#484).