feat: find Python files in Rust #591

Merged (3 commits) on Mar 24, 2024

mkniewallner (Collaborator) commented Mar 14, 2024

PR Checklist

  • A description of the changes is added to the description of this PR.
  • If there is a related issue, make sure it is linked to this PR.
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Description of changes

Leverage Rust to find Python files by using https://crates.io/crates/ignore. The crate provides options similar to those of https://pypi.org/project/pathspec/ for handling .gitignore, and it also handles walking the file tree. For exclusion, the regex crate that we already depend on is used, to match the behaviour of the Python implementation. At some point I would really like to move from regexes to globsets, although this would be a breaking change.
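To make the regex-based exclusion described above concrete, here is a minimal Python sketch of the general idea. It is not deptry's actual code, in either its Python or its Rust form: the function name `find_python_files`, its parameters, and the choice to match each regex against the start of the path relative to the root are all assumptions made for illustration, and .gitignore handling is left out entirely.

```python
import os
import re
from pathlib import Path


def find_python_files(root, exclude=()):
    """Return sorted relative POSIX paths of .py files under root.

    `exclude` is a sequence of regexes. A path is skipped when any of them
    matches at the start of its path relative to root (assumed semantics).
    """
    root = Path(root)
    # Combine all exclusion regexes into one alternation; None means "exclude nothing".
    excluded = re.compile("|".join(f"(?:{p})" for p in exclude)) if exclude else None

    def is_excluded(relative_path):
        return excluded is not None and excluded.match(relative_path) is not None

    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if not is_excluded((rel_dir / d).as_posix())]
        for name in filenames:
            rel_file = (rel_dir / name).as_posix()
            if name.endswith(".py") and not is_excluded(rel_file):
                found.append(rel_file)
    return sorted(found)
```

Calling `find_python_files(".", exclude=[r"\.venv", "tests"])` would skip anything whose relative path starts with `.venv` or `tests`. Because these are regexes rather than globs, a pattern such as `tests` also excludes a top-level file named `tests_helper.py`, which is one reason a later move to globsets could change behaviour.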

In terms of performance, testing the changes on a repository with ~7k files shows a run time roughly 27% lower, dropping from 1.43s to 1.04s.
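For reference, the figure can be checked from the two timings quoted above. The variable names below are only illustrative. The 27% corresponds to the reduction in run time; expressed as a throughput ratio, the same measurements come out at a little under 1.4x.

```python
old_seconds = 1.43  # run time before the change, as reported above
new_seconds = 1.04  # run time after the change, as reported above

# Share of the original run time that was saved.
reduction_pct = (old_seconds - new_seconds) / old_seconds * 100

# How many times faster the new run is compared to the old one.
throughput_ratio = old_seconds / new_seconds

print(f"run time reduced by {reduction_pct:.1f}%")
print(f"throughput ratio: {throughput_ratio:.3f}x")
```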

@mkniewallner mkniewallner force-pushed the feat/python-file-finder-in-rust branch 2 times, most recently from d7909a0 to fe5cdb3 Compare March 14, 2024 22:15

codecov bot commented Mar 14, 2024

Codecov Report

Attention: Patch coverage is 50.00000%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 90.0%. Comparing base (4f697a1) to head (8fd8a07).

❗ Current head 8fd8a07 differs from pull request most recent head 920fa9c. Consider uploading reports for the commit 920fa9c to get more accurate results

Files                   Patch %   Lines
python/deptry/core.py   28.5%     5 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main    #591     +/-   ##
=======================================
- Coverage   91.3%   90.0%   -1.4%     
=======================================
  Files         34      33      -1     
  Lines        996     951     -45     
  Branches     202     191     -11     
=======================================
- Hits         910     856     -54     
- Misses        69      78      +9     
  Partials      17      17             


@mkniewallner mkniewallner force-pushed the feat/python-file-finder-in-rust branch 4 times, most recently from 28492a6 to e87980a Compare March 18, 2024 23:18
@fpgmaas fpgmaas added this to the 0.15 milestone Mar 22, 2024
@fpgmaas fpgmaas mentioned this pull request Mar 22, 2024
@mkniewallner mkniewallner force-pushed the feat/python-file-finder-in-rust branch 3 times, most recently from c49a62c to 3113e5b Compare March 23, 2024 22:27
@mkniewallner mkniewallner force-pushed the feat/python-file-finder-in-rust branch from 3113e5b to 8fd8a07 Compare March 24, 2024 00:57
@mkniewallner mkniewallner marked this pull request as ready for review March 24, 2024 01:06
fpgmaas (Owner) commented Mar 24, 2024

Looks good! I see that the verbose logging has become a bit more verbose:

Collecting Python files to scan...
built glob set; 0 literals, 0 basenames, 0 extensions, 0 prefixes, 0 suffixes, 2 required extensions, 0 regexes
glob converted to regex: Glob { glob: "**/*.py[cod]", re: "(?-u)^(?:/?|.*/)[^/]*\\.py[cod]$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true, empty_alternates: false }, tokens: Tokens([RecursivePrefix, ZeroOrMore, Literal('.'), Literal('p'), Literal('y'), Class { negated: false, ranges: [('c', 'c'), ('o', 'o'), ('d', 'd')] }]) }
glob converted to regex: Glob { glob: "**/.coverage.*", re: "(?-u)^(?:/?|.*/)\\.coverage\\.[^/]*$", opts: GlobOptions { case_insensitive: false, literal_separator: true, backslash_escape: true, empty_alternates: false }, tokens: Tokens([RecursivePrefix, Literal('.'), Literal('c'), Literal('o'), Literal('v'), Literal('e'), Literal('r'), Literal('a'), Literal('g'), Literal('e'), Literal('.'), ZeroOrMore]) }
built glob set; 5 literals, 63 basenames, 10 extensions, 0 prefixes, 0 suffixes, 2 required extensions, 2 regexes
whitelisting python/deptry/config.py: Whitelist(IgnoreMatch(Types(Glob(Matched { def: FileTypeDef { name: "python", globs: ["*.py"] } }))))
whitelisting python/deptry/dependency_getter/pdm.py: Whitelist(IgnoreMatch(Types(Glob(Matched { def: FileTypeDef { name: "python", globs: ["*.py"] } }))))
...
Python files to scan for imports:
python/deptry/config.py
python/deptry/dependency_getter/pdm.py
python/deptry/dependency_getter/pep_621.py

Do we want that level of detail exposed to the user in the verbose logging?

@fpgmaas fpgmaas self-requested a review March 24, 2024 08:38
mkniewallner (Collaborator, Author) commented

> Do we want that level of detail exposed to the user in the verbose logging?

Unfortunately I don't think there's a way to control that, at least not easily. Setting the debug log level in Python also sets the same level in Rust through pyo3, so both languages emit debug logs when debug mode is enabled.
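One possible way to reduce the noise, sketched here under assumptions the thread does not confirm: if the Rust logs are forwarded to Python's `logging` module under logger names derived from the emitting crate (for example `globset` and `ignore`, which would match the messages shown above), then those loggers could be capped at a higher level while the application's own logger stays at debug. The function name `configure_logging`, the `deptry` logger name, and the list of crate names are all hypothetical.

```python
import logging

# Assumption: Rust-side records arrive under loggers named after the emitting crate.
NOISY_RUST_LOGGERS = ("globset", "ignore")


def configure_logging(verbose, app_logger_name="deptry"):
    """Enable debug output for the application logger only.

    Loggers assumed to carry Rust crate output are capped at WARNING, so their
    debug records are dropped even when verbose mode is on.
    """
    app_level = logging.DEBUG if verbose else logging.INFO
    logging.getLogger(app_logger_name).setLevel(app_level)
    for name in NOISY_RUST_LOGGERS:
        # Child loggers such as "ignore.walk" inherit this level.
        logging.getLogger(name).setLevel(logging.WARNING)


configure_logging(verbose=True)
```

Whether this works in practice depends on how the Rust-to-Python logging bridge names its loggers and on whether it caches levels; if it does cache them, the levels would presumably need to be set before the first Rust log call.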

@fpgmaas fpgmaas merged commit b58868b into fpgmaas:main Mar 24, 2024
13 of 27 checks passed
@mkniewallner mkniewallner deleted the feat/python-file-finder-in-rust branch March 24, 2024 10:56