Skip to content
This repository has been archived by the owner on Mar 25, 2021. It is now read-only.

Run linting logic in spawned threads #2328

Closed
JoshuaKGoldberg opened this issue Mar 11, 2017 · 11 comments
Closed

Run linting logic in spawned threads #2328

JoshuaKGoldberg opened this issue Mar 11, 2017 · 11 comments

Comments

@JoshuaKGoldberg
Copy link
Contributor

JoshuaKGoldberg commented Mar 11, 2017

Followup to #389 & #1131. I'm taking a look at this for work because we have a few thousand files to lint. Running them in serial takes a long time. Our options are:

  1. Parallelize TSLint.MSBuild (users of other build systems wouldn't get the benefits, so probably not)
  2. Have TSLint itself run across multiple threads / cores.

Within option 2 I see there being two possibilities:

  1. Refactor the internals of Runner and Linter to be asynchronous
    • I think this would be a nice but making Linter in particular would be a breaking API change
    • Do you think you'd want to start to migrate to asynchronous file reading anyway?
  2. Create a new ProcessSplitter class between tslint-cli.ts and Runner
    • Don't use it unless --parallel is provided and > 1
    • Have it spawn processes equal to --parallel
      • It would do its own file globbing to get the full list of files to lint
      • Each process would get 1/<--parallel> of those files
    • After the processes are done it would collect and sort the results and report them as if they were synchronous

Thoughts?

@ajafff
Copy link
Contributor

ajafff commented Mar 11, 2017

Some random thoughts:

async processing:

  • does async I/O work with --type-check or do we get race conditions from that?
  • +1 to make I/O in Runner async
    • could use glob-stream for globbing to start linting as fast as possible (although that's not the bottleneck)
    • use a concurrent transform stream to read the file
    • use a write stream to do the linting
    • Runner should not throw away the ts.Program (created when --project is passed) when not type checking. We could then avoid the globbing and reuse the SourceFiles instead of reading the files again and creating new SourceFiles
  • there should also be an async version of Configuration and RuleLoader
  • making Linter async does not seem to be necessary?
    • the only I/O made here is used to apply fixes
    • unrelated: after applying fixes, use ts.updateSourceFile instead of this.getSourceFile

multiple threads:

  • each thread would need to set up typescript, require all rules, cache the paths to the rule, ...
    • don't know if that would slow things down
    • requires much more memory

result cache:

  • cache the linting result and only lint that file again if it changed in the meantime
  • eslint has an option for this, too

@andy-hanson
Copy link
Contributor

andy-hanson commented Mar 11, 2017

Should probably do #2282 first.
Also, #2235 would address the problem of reading files multiple times.

@JoshuaKGoldberg
Copy link
Contributor Author

Glad the reaction has been favorable so far. :) I'll start work on some of the low hanging fruit.

Re the downsides of multiple threads: agreed, and this should definitely not be used for "normal" (most) projects. The scenario I'm approaching this for is overpowered 16-core monstrosities (dev machines, build lab machines, obscene Azure VMs, and the like) made to deal with huge code bases. They're fantastic at MSBuild/gcc/etc. but amusingly running TSLint tasks slows them down something terrible. The argument could (and should) be made that projects big enough to slow down TSLint should be split up into sub-projects, and those sub-projects' TSLint tasks should be parallelized... but that's not always possible.

@beenotung
Copy link

maybe merge-able with #943

@beenotung
Copy link

beenotung commented Oct 12, 2018

parallel linting is not supported so far, my workaround is

find src -name '*.ts' | parallel node_modules/.bin/tslint -p . --fix

sometimes also use git status to skip some files.

@timocov
Copy link
Contributor

timocov commented Jan 12, 2019

Hi there,

I have one suggestion for this: what if we'll move all rules, which doesn't require a type checking into separate processes and split all files between them, but all rules, which require a type checking we'll leave in the main thread (to avoid compilation of the project multiple times)? 🤔

I just played with that and have some results.

  • Linted files count - 1894
  • Enabled rules requires type checking - 9
  • Enabled rules doesn't require type checking - 123
  • CPU count - 4

I've created 3 separate processes (os.cpus().length - 1) to linting by non-typed rules and leave linting by typed rules in the main thread. So, each separate process lints ~630 files by 123 rules, the main thread lints all 1894 files by 9 rules.

In my runs each process works ~70-85 seconds, the main thread lints all files ~70 seconds. Time to create ts.Program is 15-16 seconds. So, the total time of linting the whole project is ~100-105 seconds. tslint from npm does it in ~170 seconds (parallel version is ~40% faster).

To summarize the results:

  • Parallel linter takes ~100-105 seconds
  • Single-thread linter (from npm) takes ~170 seconds

So, the parallel version is ~40% faster in my case.

What do you think about that?

For me (I'm not familiar with tslint's code a lot) there is still some open questions:

  1. We can't send/receive the code/nodejs state between processes, so we can't transport the whole LintResult object to the main thread. In my case I just use errorCount and output from that object - it seems that it's enough to print the error(s).
  2. (derives from 1) We can't apply fixes in this case (or we need to transfer serialized state of the fix to the main process and then apply them).
  3. (derives from 1) For now a sorting of the output is lost, but I'm not sure that this is a big problem.

/cc @JoshuaKGoldberg @ajafff

@JoshuaKGoldberg
Copy link
Contributor Author

This looks like a great start @timocov! Super exciting that you got this to work 👏

  1. We can't send/receive the code/nodejs state between processes

Why not? Can you elaborate? https://nodejs.org/api/child_process.html#child_process_options_stdio implies there are some built-in options there.

Applying fixes is done by a formatter AFAICS. For both 2/fixes and 3/sorting, from my perspective it should be possible to run something like:

  1. Set up linting options, etc. (same as usual)
  2. Spawn off to different threads as you've specified, collecting all their output from the full JSON formatter
  3. Combine output from the threads into one big output
  4. Deal with that output as if it came from a single thread (same as usual)

Thoughts?

@timocov
Copy link
Contributor

timocov commented Jan 19, 2019

Why not? Can you elaborate? https://nodejs.org/api/child_process.html#child_process_options_stdio implies there are some built-in options there.

I meant that we can't just send the whole instance of some class from one process to another process without implementing (de)serialization code. Or we can? If so, I can't realize how nodejs does it 🙂
Of course, we can send serialized state/JSON between processes to share result (and this is what I did in my the first implementation).

If you wish I can prepare the code (actually you already can see rough solution here, but I need to make it more readable and remove duplications) of the first implementation and we can discuss on it.

@JoshuaKGoldberg
Copy link
Contributor Author

we can send serialized state/JSON between processes to share result (and this is what I did in my the first implementation)

Awesome, that's exactly what we'd need. +1 to preparing the code!

@timocov timocov mentioned this issue Jan 27, 2019
4 tasks
@timocov
Copy link
Contributor

timocov commented Jan 27, 2019

@JoshuaKGoldberg @adidahiya I just created #4483 with a little bit prepared code of parallel running the linter.

As I said before - I'm not familiar with tslint's code base a lot and can made an epic fails in the PR - keep this in mind 🙂.

@JoshuaKGoldberg
Copy link
Contributor Author

☠️ TSLint's time has come! ☠️

TSLint is no longer accepting most feature requests per #4534. See typescript-eslint.io for the new, shiny way to lint your TypeScript code with ESLint. ✨

It was a pleasure open sourcing with you all!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants