Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Improvement Ideas #4

Open
alexevanczuk opened this issue May 10, 2022 · 4 comments
Open

Performance Improvement Ideas #4

alexevanczuk opened this issue May 10, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@alexevanczuk
Copy link
Contributor

It's important for code_ownership to be fast in various contexts. Here are different situations we want to be fast:

  1. When getting code ownership for a single class/file in production. This is important at Gusto since we tag requests and async jobs with code ownership, so we don't want to add a lot of overhead here.
  2. When running code ownership validations locally. We want this to be fast when run on all files OR on a subset of files in the commit hook

On very large codebases, the slowest part of this is file annotations – reading all files and checking for annotations can be slow. I wanted to share a laundry list of some ideas for improvement here:

  1. Switch to team-based annotations for these files and give the option to turn off file annotations.
  • Pro: This would be much faster
  • Con: A possibly worse UX for annotating file ownership
  1. Switch file annotations to run after checking for ownership via packages and team globs
  • Pros: This will often short circuit the more files that are owned in other ways
  • Cons: This changes public API (we can possibly defer this by allowing code_ownership.yml to specify an order that mappers run in. This wouldn't improve the speed of generating code owners since we ask all mappers to show ownership for all files.
  1. Have mappers "water fall down" the files they have already mapped. This would allow file annotations to not need to look at files we already know we have owners for.
  • Pros: This could speed a lot of different things up.
  • Cons: Increases complexity
  1. Have file annotations open and read files in parallel
  • Pros: File IO is a great parallelizable task, so it might be well suited
  • Cons: Adds a potentially complex dependency. All of codenwership is also already often run in parallel in the context of commit hooks.

Separately, we should consider a test/build step which measures code ownership speed on a large system. At Gusto we might include a post-build step which sends a metric to datadog about how long different commands take to run so we can track and observe changes over time.

@alexevanczuk alexevanczuk added the enhancement New feature or request label May 10, 2022
@schoblaska
Copy link
Contributor

@alexevanczuk Now that there's a Rust implementation for code_ownership (https://github.com/rubyatscale/codeowners-rs) do you think this issue is worth keeping open?

@alexevanczuk
Copy link
Contributor Author

Totally up to you! When I left, both the rust and ruby version were "fast enough" for local usage in pre-commit hooks and in production. I'd say there's only a need to keep investing in performance if it's a problem!

@technicalpickles
Copy link
Collaborator

I have been doing some digging in this again because our bin/codeownership validate is taking upwards of 40s for me locally 😅

The crux of the problem is time spent in Dir.glob. I used singed to get a flamegraph, and some 30s out of 40s were spent in Dir.glob. I asked around about Dir.glob performance, and got some pointers towards hungry expressions, ie **/*.{rb,rbi}, which will be inherently slow in large directories.

The code that does this is here:

@tracked_files ||= Dir.glob(configuration.owned_globs) - Dir.glob(configuration.unowned_globs)

In practice for our app, I would describe how we are using the globs as:

  • owned_globs to list only directories that can have ownable code, ie {app,spec}/**/*
  • owned_globs to select specific file types, ie *.{rb,js}
  • unowned_globs to exclude some subdirectories that will never have owned code, ie **/node_modules/**
  • unowned_globs to exclude specific files, ie **/{.eslintrc.js,.prettierrc.js,.eslintrc.js}

I experimented a big with File.find, ie https://docs.ruby-lang.org/en/3.3/Find.html and it looks very promising. It does involve changing from globs though, which would be an API change.

@alexevanczuk
Copy link
Contributor Author

Thanks for looking into this @technicalpickles !

A different implementation to find the files sounds great to me! It could be an opt-in (deprecating the old implementation) to avoid breaking the API right now.

Also big plug for @mzruya 's rust port:
https://github.com/mzruya/codeowners ... would probably solve the performance problem 😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants