Batch invokes of linters (and other dynamically sized process invokes) #13462

ryanking · 2021-11-02T03:18:41Z

Pants does not limit the number of files passed to black and isort at the same time, which leads to errors like Error launching process: Os { code: 7, kind: Other, message: "Argument list too long" }.

Pants version
2.7.0

OS
MacOS

Additional info
We have about 4500 python files in our repo, so when I run a lint across the whole repo at once I get something like "Run Black on 4434 files.",.

The text was updated successfully, but these errors were encountered:

stuhood · 2021-11-02T04:19:20Z

Sorry for the trouble! We really ought to be generating batches in this case for performance reasons anyway: we only support either "run for all" or "run per file" (https://www.pantsbuild.org/docs/reference-lint#advanced-options), and the performance sweet spot (not to mention the size that would avoid issues like this) is likely to be batches of a size somewhere in between.

stuhood · 2022-01-05T21:43:24Z

Assuming that we do batching, it seems like there is also an opportunity to align with #9964, and to dynamically batch based on how much parallelism is available. But dynamically choosing batch sizes based on parallelism would require manipulating cache keys (because otherwise hosts with different numbers of cores would never get remote cache hits)... and that is a risky business, which also requires figuring out how to safely split the outputs of tools (or to require that batched processes opt-out of capturing outputs when they succeed, for example).

thejcannon · 2022-01-07T15:38:41Z

So it turns out there is a "hack" that can be used in the meantime.

Setting the relevant parallelism options for each tool turns out to be a "poor-mans-batch" is faster than not setting it (obviously) but also faster than the --per-file-caching.

E.g.
If I wanted to run fmt with isort and black and I have a 64 core machine. Running with --black-args='-W 64' --isort-args='--jobs=64' is always the fastest option.
Whereas with lint (which runs in parallel) you probably want to take N cores and divide by M checkers... --black-args='-W 16' --isort-args='--jobs=16' --flake8-args='--jobs=16' --pylint-args='--jobs=16'.

Depending on how long fixing the issue might take, it might be worth finding/writing clever little plugins to set the num jobs smartly based on CPU count

stuhood · 2022-01-07T20:20:12Z

@thejcannon : Thanks for the report: that's very useful information. I don't think that I realized that so many Python linters supported parallelization arguments, but it makes sense. Will post a summary soon.

thejcannon · 2022-01-07T20:42:46Z

FWIW as well, I'd expect those linters to have better parallelization in-tool than out-of-tool because they know what can/can't be shared across processes and have the ability to play clever tricks.

That being said out-of-tool parallelization still beats no parallelization 😉

stuhood · 2022-01-07T21:26:35Z

There are a few angles to this issue and to #9964:

In the long term, we want to lower the per-process overhead (from 1. sandbox creation, 2. process startup, 3. inability to reuse shared inputs) to the point where running per-file is clearly the right choice to maximize cache hit rates (by reducing/reusing sandboxes, and even reusing processes in sandboxes with nailgun).

But on the (long) road to zero per-process overhead, we should implement batching both for the purposes of:

correctness in the case of very large argument lists (when processes don't have support for @argfiles... i.e., the original problem of this ticket)
parallelism (when processes don't have native support for parallelizing themselves)

Because batching inherently affects the output of processes, it will require use of an explicit API. While it would be very interesting to be able to automatically batch all single-file processes, we cannot generically split outputs. It would also mean doing lots of redundant work in @rules, and potentially throwing away knowledge that files will be processed identically, only to rediscover it later.

AFAICT, there is no need for the rust code's cooperation in this, so I'm planning to implement this as a new @rule to ease iteration: probably something like:

@dataclass
class BatchProcess:
  # A process representing all inputs, but with a partial `args` list which will be filled in with (portions
  # of) the `xargs` list.
  process_template: Process
  # A list of arguments which will be partitioned as needed and appended to the `args`.
  xargs: tuple[str, ...]

Future implementations could support templating into other positions in the args list. Additionally, how to handle process outputs can be configurable, but will likely start out as per-process stdout/stderr concatenated, and output digests merged.

This will get us a small amount of additional parallelism in some cases, but as @thejcannon pointed out: the vast majority of the processes we invoke support their own parallellism, and so BatchProcess should not partition the xargs for the purposes of parallelism (by default): only to avoid OS arg length limits (EDIT: having written up #9964, there might still be some advantage to lowering the batch size here, since it allows better behavior when we're overcommitted. Note to self: use stable hash prefix partitioning).

To take advantage of that internal parallelism, we will likely also implement #9964: expect more there soon.

thejcannon · 2022-01-08T16:52:14Z

I like the idea! Some thoughts (which also apply to #9964 so I'll repeat there):

For clarification, I assume caching won't be any different here than with --no-per-file-caching, right? This is essentially still an "all-in-one" run, just now parallel under-the-hood.
You likely were already going to do this, but it likely makes sense to error if this and --per-file-caching are enabled.
It is worth some time to think about how this change affects the dynamic UI. Seeing Running pylint on XYZ files in parallel in one swimlane and having N other swimlanes idle is likely very confusing. I can't say I have a good idea for how to fix this (other than maybe prodash in Render dynamic UI with prodash #13481 ), but it warrants thought.

Additionally for this issue (and not #9964) it might also be prudent to warn if this is enabled for a tool that is supported via #9964, since that will almost certainly result in faster runs

thejcannon · 2022-01-08T17:10:20Z

(Also duping with #9964)

As a single datapoint, on my 64-core machine with formatters yapf and isort and linters pylint and flake8. I see that with a warm cache (all prereqs already handled via warmup run, then touching N files with a timestamp comment and re-running):

Running fmt on 5 files:
- --per-file-caching: 1.3 seconds
- --no-per-file-caching: 1.5 seconds
- --no-per-file-caching and running tools in parallel via --tool-args: .974 seconds
Running fmt on 100 files:
- --per-file-caching: 6.8 seconds
- --no-per-file-caching: 13.6 seconds
- --no-per-file-caching and running tools in parallel via --tool-args: 3.1 seconds
Running lint (with pylint + flake8 only) on 30 files:
- --per-file-caching: 10.9 seconds
- --no-per-file-caching: 32.3 seconds
- --no-per-file-caching and running tools in parallel via --tool-args: 13.8 seconds
Running lint (with pylint + flake8 only) on 250 files:
- --per-file-caching: 59.9 seconds
- --no-per-file-caching: 120.2 seconds
- --no-per-file-caching and running tools in parallel via --tool-args: 30.7 seconds

So I expect this issue to land less performant than bullet 3 in each case (which would be #9964), but not by much.

thejcannon · 2022-01-08T17:24:01Z

With enough similar metrics you might be able to deprecate ``per-file-caching` 🤔

To prepare to add batching for formatters as part of #13462, this change removes the need to implement per-backend `@rules` that pipeline `FmtRequest`s that apply to a particular language. Instead, targets are grouped by which `FmtRequest`s apply to them, and then those requests are run sequentially. There will be further changes to the formatting API in support of #13462, so this API is not final. [ci skip-rust] [ci skip-build-wheels]

As described in #13462, there are correctness concerns around not breaking large batches of files into smaller batches in `lint` and `fmt`. But there are other reasons to batch, including improving the performance of linters which don't support internal parallelism (by breaking them into multiple processes which _can_ be parallelized). This change adds a function to sequentially partition a list of items into stable batches, and then uses it to create batches by default in `lint` and `fmt`. Sequential partitioning was chosen rather than bucketing by hash, because it was easier to reason about in the presence of minimum and maximum bucket sizes. Additionally, this implementation is at the level of the `lint` and `fmt` goals themselves (rather than within individual `lint`/`fmt` `@rule` sets, as originally suggested [on the ticket](#13462 (comment))) because that reduces the effort of implementing a linter or formatter, and would likely ease doing further "automatic"/declarative partitioning in those goals (by `Field` values, for example). `./pants --no-pantsd --no-local-cache --no-remote-cache-read fmt lint ::` runs about ~4% faster than on main. Fixes #13462. [ci skip-build-wheels]

…ility. (#14210) As a followup to #14186, this change improves the stability (and thus cache hit rates) of batching by removing the minimum bucket size. It also fixes an issue in the tests, and expands the range that they test. As mentioned in the expanded comments: capping bucket sizes (in either the `min` or the `max` direction) can cause streaks of bucket changes: when a bucket hits a `min`/`max` threshold and ignores a boundary, it increases the chance that the next bucket will trip a threshold as well. Although it would be most-stable to remove the `max` threshold entirely, it is necessary to resolve the correctness issue of #13462. But we _can_ remove the `min` threshold, and so this change does that. [ci skip-rust] [ci skip-build-wheels]

ryanking added the bug label Nov 2, 2021

stuhood changed the title ~~pants sends to many files to black and isort~~ pants sends too many files to black and isort Nov 2, 2021

stuhood mentioned this issue Dec 7, 2021

scala: add scalafmt support for fmt and lint goals #13814

Merged

ryanking mentioned this issue Dec 14, 2021

pants fmt sometimes ignoring pyproject.toml? #13886

Closed

stuhood mentioned this issue Jan 4, 2022

Add duration and cache source to fmt/lint/check output. #14066

Open

stuhood added this to 2022 Performance Push Jan 5, 2022

stuhood added the estimate: <1W label Jan 5, 2022

stuhood added estimate: ~2W and removed estimate: <1W labels Jan 5, 2022

stuhood changed the title ~~pants sends too many files to black and isort~~ Batch invokes of linters Jan 5, 2022

stuhood changed the title ~~Batch invokes of linters~~ Batch invokes of linters (and other dynamically sized process invokes) Jan 5, 2022

stuhood self-assigned this Jan 5, 2022

stuhood mentioned this issue Jan 7, 2022

Per-process parallelism should be chosen dynamically, and possibly per platform/host #9964

Closed

stuhood mentioned this issue Jan 13, 2022

Remove per-language indirection for formatter plugins #14166

Merged

This was referenced Jan 18, 2022

Batching of lint and fmt invokes #14186

Merged

Should Python linters/checkers stream their partitioned results? #13380

Open

stuhood added the 2022-idea label Jan 19, 2022

stuhood closed this as completed in #14186 Jan 19, 2022

stuhood moved this to Done in 2022 Performance Push Jan 19, 2022

stuhood mentioned this issue Jan 19, 2022

[internal] Remove the minimum bucket size of batching to improve stability. #14210

Merged

stuhood added this to 2022 Priorities Aug 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Batch invokes of linters (and other dynamically sized process invokes) #13462

Batch invokes of linters (and other dynamically sized process invokes) #13462

ryanking commented Nov 2, 2021

stuhood commented Nov 2, 2021 •

edited

Loading

stuhood commented Jan 5, 2022 •

edited

Loading

thejcannon commented Jan 7, 2022 •

edited

Loading

stuhood commented Jan 7, 2022

thejcannon commented Jan 7, 2022

stuhood commented Jan 7, 2022 •

edited

Loading

thejcannon commented Jan 8, 2022

thejcannon commented Jan 8, 2022 •

edited

Loading

thejcannon commented Jan 8, 2022

Batch invokes of linters (and other dynamically sized process invokes) #13462

Batch invokes of linters (and other dynamically sized process invokes) #13462

Comments

ryanking commented Nov 2, 2021

stuhood commented Nov 2, 2021 • edited Loading

stuhood commented Jan 5, 2022 • edited Loading

thejcannon commented Jan 7, 2022 • edited Loading

stuhood commented Jan 7, 2022

thejcannon commented Jan 7, 2022

stuhood commented Jan 7, 2022 • edited Loading

thejcannon commented Jan 8, 2022

thejcannon commented Jan 8, 2022 • edited Loading

thejcannon commented Jan 8, 2022

stuhood commented Nov 2, 2021 •

edited

Loading

stuhood commented Jan 5, 2022 •

edited

Loading

thejcannon commented Jan 7, 2022 •

edited

Loading

stuhood commented Jan 7, 2022 •

edited

Loading

thejcannon commented Jan 8, 2022 •

edited

Loading