Batching of `lint` and `fmt` invokes #14186
@@ -18,6 +18,7 @@
 from pants.engine.rules import Get, MultiGet, collect_rules, goal_rule, rule
 from pants.engine.target import SourcesField, Targets
 from pants.engine.unions import UnionMembership, union
+from pants.util.collections import partition_sequentially
 from pants.util.logging import LogLevel
 from pants.util.strutil import strip_v2_chroot_path
@@ -135,6 +136,11 @@ def register_options(cls, register) -> None:
             advanced=True,
             type=bool,
             default=False,
+            removal_version="2.11.0.dev0",
+            removal_hint=(
+                "Formatters are now broken into multiple batches by default using the "
+                "`--batch-size` argument."
+            ),
             help=(
                 "Rather than formatting all files in a single batch, format each file as a "
                 "separate process.\n\nWhy do this? You'll get many more cache hits. Why not do "
@@ -145,11 +151,35 @@ def register_options(cls, register) -> None:
                 "faster than `--no-per-file-caching` for your use case."
             ),
         )
+        register(
+            "--batch-size",
+            advanced=True,
+            type=int,
+            default=128,
Review thread on `default=128`:

> We technically shouldn't change this default after we release this. Is it worth us trying to benchmark what the optimal number is? I imagine that's hard to arrive at, not least because it depends on your machine's specs.

> Not just machine specs: each tool will likely exhibit different characteristics affected by batch size 🤔 Could be fun to benchmark though 😉

> I did a bit of benchmarking to settle on

> Yea, there are potentially a lot of dimensions here. But I think that from a complexity perspective, we're not going to want to expose per-tool batch-size knobs without some quantitative justification (post-landing).

> OK, this one had me stumped until I tried it. My assumption was that the batches would fill all the threads, so in-tool parallelization would only result in over-allocating your resources. What I see, though, is that depending on the number of available threads, the number of files in the target list, and the batch size, there are many points in time where you're running fewer batches than available threads. Of course (and I hope to show this via data), using the poor man's in-tool parallelization is likely not ideal, since it isn't dynamic and would over-allocate resources in the "bursts" where there are more rules than threads.

> Does this need to be a power of two?

> No.
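To make the thread's scheduling point concrete, here is a hypothetical back-of-the-envelope check (the thread count and file counts are invented for illustration, not taken from the PR):

# Illustration of the scheduling point above: with a fixed batch size,
# the number of batches can easily drop below the number of available
# worker threads, leaving cores idle unless the tool parallelizes
# internally.
import math

BATCH_SIZE = 128  # the default proposed in this PR
THREADS = 8       # assumed machine concurrency

for num_files in (100, 300, 1_000, 10_000):
    num_batches = math.ceil(num_files / BATCH_SIZE)
    status = "under" if num_batches < THREADS else "at or over"
    print(f"{num_files:>6} files -> {num_batches:>3} batches ({status} {THREADS} threads)")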
+            help=(
+                "The target minimum number of files that will be included in each formatter batch.\n"
+                "\n"
+                "Formatter processes are batched for a few reasons:\n"
+                "\n"
+                "1. to avoid OS argument length limits (in processes which don't support argument "
+                "files)\n"
+                "2. to support more stable cache keys than would be possible if all files were "
+                "operated on in a single batch.\n"
+                "3. to allow for parallelism in formatter processes which don't have internal "
+                "parallelism, or -- if they do support internal parallelism -- to improve scheduling "
+                "behavior when multiple processes are competing for cores and so internal "
+                "parallelism cannot be used perfectly.\n"
+            ),
+        )

     @property
     def per_file_caching(self) -> bool:
         return cast(bool, self.options.per_file_caching)

+    @property
+    def batch_size(self) -> int:
+        return cast(int, self.options.batch_size)
+

 class Fmt(Goal):
     subsystem_cls = FmtSubsystem
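For context, `partition_sequentially` (imported above from `pants.util.collections`) splits a sorted sequence into stable batches of at least the target size. A minimal sketch of one way such a function might work, assuming batch boundaries chosen by hashing keys so that edits only perturb nearby batches (an illustration, not the actual Pants implementation):

from hashlib import sha256
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def partition_sketch(
    items: Iterable[T], *, key: Callable[[T], str], size_min: int
) -> Iterator[List[T]]:
    # Sort by a stable key so the same inputs always produce the same order.
    # Close a batch once it holds at least `size_min` items and the current
    # key hashes to a "boundary". Because boundaries depend on key values
    # rather than on positions, adding or removing one item tends to perturb
    # only the batch it falls into, keeping the other batches (and their
    # process cache keys) stable.
    batch: List[T] = []
    for item in sorted(items, key=key):
        batch.append(item)
        at_boundary = int(sha256(key(item).encode()).hexdigest(), 16) % size_min == 0
        if len(batch) >= size_min and at_boundary:
            yield batch
            batch = []
    if batch:
        yield batch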
@@ -187,9 +217,12 @@ async def fmt(
     per_language_results = await MultiGet(
         Get(
             _LanguageFmtResults,
-            _LanguageFmtRequest(fmt_requests, Targets(targets)),
+            _LanguageFmtRequest(fmt_requests, Targets(target_batch)),
        )
         for fmt_requests, targets in targets_by_fmt_request_order.items()
+        for target_batch in partition_sequentially(
+            targets, key=lambda t: t.address.spec, size_min=fmt_subsystem.batch_size
+        )
     )

     individual_results = list(
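Reason 2 in the help text (stable cache keys) can be demonstrated with the `partition_sketch` above, using made-up file paths (not from the PR):

# Hypothetical demo of batch stability: add one file and count how many
# of the original batches are reproduced unchanged.
paths = [f"src/python/mod_{i:03}.py" for i in range(400)]
before = list(partition_sketch(paths, key=lambda p: p, size_min=64))
after = list(
    partition_sketch([*paths, "src/python/new_file.py"], key=lambda p: p, size_min=64)
)
unchanged = sum(1 for batch in before if batch in after)
print(f"{unchanged} of {len(before)} original batches survive adding one file")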
> (Not specific to this PR.) I find it somewhat noteworthy that as Pants evolves its way of doing things, little idiosyncrasies come out of the woodwork (like this one). In #14182 I noticed GofmtRequest didn't inherit from LintRequest. It seems to me potentially hazardous that code can "seemingly work" in one dimension or another, but then a change brings to light that it was incorrectly configured. I wonder, when third-party plugins become more standard, how we can be proactive about avoiding these possible idiosyncrasies.
> This particular change is because mypy can't check some usages of ClassVar, unfortunately. This is an abstract ClassVar that it failed to check was implemented for this subclass of FmtRequest. That is because @union doesn't actually require that a type extend any particular interface, although @kaos has prototyped changing that: #12577. There is rough consensus that we should change unions, but not exactly how.