Optimize path joins with os.sep.join(...). #515

Merged on Feb 22, 2021 (2 commits)


@bdice bdice commented Feb 20, 2021

Description

Extends the work started in #511. In a few performance-critical locations that build internal paths (not user-provided paths), we can trade the safety benefits of os.path.join(*paths) for the faster os.sep.join(paths).

This yields about a 20-25% speedup when iterating over a project without accessing job state points.
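A rough way to compare the two joins in isolation is a small timeit sketch like the one below. The workspace path and job id are hypothetical placeholders rather than values from signac, and this measures only the join call itself, not the end-to-end iteration speedup quoted above.

```python
import os
import timeit

# Hypothetical internal path components (placeholders, not taken from signac).
workspace = os.sep.join(["", "tmp", "project", "workspace"])
job_id = "0123456789abcdef0123456789abcdef"


def join_with_os_path():
    # Handles absolute components and trailing separators.
    return os.path.join(workspace, job_id)


def join_with_sep():
    # Plain string join; assumes both components are already well formed.
    return os.sep.join((workspace, job_id))


# For well-formed internal components, both approaches give the same path.
assert join_with_os_path() == join_with_sep()

t_path = timeit.timeit(join_with_os_path, number=100_000)
t_sep = timeit.timeit(join_with_sep, number=100_000)
print(f"os.path.join: {t_path:.3f} s, os.sep.join: {t_sep:.3f} s")
```

Absolute timings will vary by machine and Python version, so treat the printed numbers as relative only.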

We can only apply this optimization in places that deal with internal paths, like "project_workspace / job_id", and not in calls like job.fn(some_user_path). User paths must still go through os.path.join; this includes the project workspace, which is configurable. Regardless, we can safely handle the two most important cases (the job workspace and the job state point filename), each of which is called O(N) times.
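To illustrate why the restriction to internal paths matters, here is a minimal sketch of where the two joins agree and where they diverge. The helper name fast_join and the example paths are assumptions made up for this illustration; they are not part of signac.

```python
import os


def fast_join(*parts):
    """Hypothetical helper: plain string join with no normalization."""
    return os.sep.join(parts)


root = os.sep + "workspace"

# Well-formed internal components: both joins agree.
assert fast_join(root, "abc123") == os.path.join(root, "abc123")

# An absolute second component: os.path.join discards the earlier parts,
# while the plain join concatenates blindly and doubles the separator.
absolute = os.sep + "elsewhere"
assert os.path.join(root, absolute) == absolute
assert fast_join(root, absolute) == root + os.sep + absolute

# A trailing separator on the first component: os.path.join does not
# duplicate it, while the plain join does.
assert os.path.join(root + os.sep, "abc123") == root + os.sep + "abc123"
assert fast_join(root + os.sep, "abc123") == root + os.sep * 2 + "abc123"
```

Because user-supplied components can be absolute or carry trailing separators, they need the handling that os.path.join provides; a job id generated internally does not.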

Benchmarks

Below are benchmarks of the following script, which iterates over the project without accessing any job properties. The workspace contains 1000 jobs.

import signac

project = signac.get_project()

def no_load_data():
    data = [job for job in project]

no_load_data()

Before

The call to os.path.join (displayed as posixpath.py:71(join)) takes a noticeable share of the runtime and no longer appears in the "After" profile.
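A profile entry like this can be reproduced with the standard-library profiler. The sketch below uses a hypothetical stand-in workload (build_paths, with a made-up workspace and job ids) instead of a real signac project, so it only shows how the join call surfaces in the report; the line number after posixpath.py depends on the Python version.

```python
import cProfile
import io
import os
import pstats

# Hypothetical stand-in for project iteration: build 1000 job directory paths.
WORKSPACE = os.sep.join(["", "tmp", "workspace"])
JOB_IDS = [f"{i:032x}" for i in range(1000)]


def build_paths():
    return [os.path.join(WORKSPACE, job_id) for job_id in JOB_IDS]


profiler = cProfile.Profile()
profiler.enable()
paths = build_paths()
profiler.disable()

# Restrict the report to entries whose name matches "join".
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats("join")
report = buffer.getvalue()
print(report)
```

For the real benchmark, the same approach should apply with no_load_data() from the script above in place of build_paths().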
[image: profiler output before the change]

After

The bulk of the iteration is stat calls, which cannot be further optimized. 👍
[image: profiler output after the change]

Motivation and Context

Seeking some final obvious optimizations for synced collections (#484).

Types of Changes

  • Documentation update
  • Bug fix
  • New feature
  • Breaking change¹

¹ The change breaks (or has the potential to break) existing functionality.

Checklist:

If necessary:

  • I have updated the API documentation as part of the package doc-strings.
  • I have created a separate pull request to update the framework documentation on signac-docs and linked it here.
  • I have updated the changelog and added all related issue and pull request numbers for future reference (if applicable). See example below.

@bdice bdice requested review from a team as code owners February 20, 2021 17:59
@bdice bdice requested review from kidrahahjo and tommy-waltmann and removed request for a team February 20, 2021 17:59
@bdice bdice self-assigned this Feb 20, 2021
@bdice bdice added the enhancement New feature or request label Feb 20, 2021
@bdice bdice added this to the 1.7.0 milestone Feb 20, 2021

codecov bot commented Feb 20, 2021

Codecov Report

Merging #515 (54a8b9a) into master (29c0749) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #515   +/-   ##
=======================================
  Coverage   78.05%   78.05%           
=======================================
  Files          63       63           
  Lines        6986     6986           
  Branches     1310     1310           
=======================================
  Hits         5453     5453           
  Misses       1228     1228           
  Partials      305      305           

Impacted Files               Coverage Δ
signac/contrib/job.py        89.96% <100.00%> (ø)
signac/contrib/project.py    85.08% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@kidrahahjo kidrahahjo left a comment

This is an interesting find; the changes look good!

@vyasr vyasr merged commit ac97b88 into master Feb 22, 2021
@vyasr vyasr deleted the feature/optimize-joins branch February 22, 2021 01:48