Batching of lint and fmt invokes #14186

Merged (5 commits) on Jan 19, 2022

Changes from 1 commit
23 changes: 7 additions & 16 deletions src/python/pants/core/goals/fmt.py
@@ -8,7 +8,7 @@
from dataclasses import dataclass
from typing import TypeVar, cast

from pants.core.goals.style_request import StyleRequest
from pants.core.goals.style_request import StyleRequest, style_batch_size_help
from pants.core.util_rules.source_files import SourceFiles, SourceFilesRequest
from pants.engine.console import Console
from pants.engine.engine_aware import EngineAwareReturnType
@@ -139,7 +139,11 @@ def register_options(cls, register) -> None:
removal_version="2.11.0.dev0",
removal_hint=(
"Formatters are now broken into multiple batches by default using the "
"`--batch-size` argument."
"`--batch-size` argument.\n"
"\n"
"To keep (roughly) this option's behavior, set [fmt].batch_size = 1. However, "
"you'll likely get better performance by using a larger batch size because of "
"reduced overhead launching processes."
),
help=(
"Rather than formatting all files in a single batch, format each file as a "
@@ -156,20 +160,7 @@ def register_options(cls, register) -> None:
advanced=True,
type=int,
default=128,
Contributor:

We technically shouldn't change this default after we release it. Is it worth trying to benchmark what the optimal number is? I imagine that's hard to arrive at, not least because it depends on your machine's specs.

Member:

Not just machine specs, but each tool will likely exhibit different characteristics affected by batch size 🤔

Could be fun to benchmark though 😉

Member Author (@stuhood, Jan 19, 2022):

I did a bit of benchmarking to settle on 128 here, on an integration branch with #9964 included: 128 was best by ~1%. Additional benchmarking and adjustment after both this and #9964 have landed will be good, since they interplay strongly with one another; I'll include more numbers over there.

> Not just machine specs, but each tool will likely exhibit different characteristics affected by batch size 🤔

Yea, there are potentially a lot of dimensions here. But I think that from a complexity perspective, we won't want to expose per-tool batch size knobs without some quantitative justification (post landing).

Member:

OK, this one had me stumped until I tried it. My assumption was that the batches would fill all the threads, so in-tool parallelization would only over-allocate your resources. What I see, though, is that depending on the number of threads available, the number of files in the target list, and the batch size, there are many points in time where you're running fewer batches than available threads.

Of course (and I hope to show this via data), using the poor man's in-tool parallelization is likely not ideal: it isn't dynamic, and it would over-allocate resources in the "bursts" where there are more rules than threads.
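
As a hypothetical illustration of the point above (the file and core counts below are made up for the example, not taken from this PR):

```python
import math

# Assumed numbers for illustration only.
files, batch_size, cores = 640, 128, 16
batches = math.ceil(files / batch_size)  # 5 batches
print(
    f"{batches} batches on {cores} cores leaves {cores - batches} cores idle "
    "unless the tool parallelizes internally"
)
```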

Member:

Does this need to be a power of two?

Member Author:

No.

help=(
"The target minimum number of files that will be included in each formatter batch.\n"
"\n"
"Formatter processes are batched for a few reasons:\n"
"\n"
"1. to avoid OS argument length limits (in processes which don't support argument "
"files)\n"
"2. to support more stable cache keys than would be possible if all files were "
"operated on in a single batch.\n"
"3. to allow for parallelism in formatter processes which don't have internal "
"parallelism, or -- if they do support internal parallelism -- to improve scheduling "
"behavior when multiple processes are competing for cores and so internal "
"parallelism cannot be used perfectly.\n"
),
help=style_batch_size_help(uppercase="Formatter", lowercase="formatter"),
)

@property
29 changes: 10 additions & 19 deletions src/python/pants/core/goals/lint.py
@@ -8,7 +8,7 @@
from dataclasses import dataclass
from typing import Any, Iterable, cast

from pants.core.goals.style_request import StyleRequest, write_reports
from pants.core.goals.style_request import StyleRequest, style_batch_size_help, write_reports
from pants.core.util_rules.distdir import DistDir
from pants.engine.console import Console
from pants.engine.engine_aware import EngineAwareReturnType
@@ -157,7 +157,11 @@ def register_options(cls, register) -> None:
removal_version="2.11.0.dev0",
removal_hint=(
"Linters are now broken into multiple batches by default using the "
"`--batch-size` argument."
"`--batch-size` argument.\n"
"\n"
"To keep (roughly) this option's behavior, set [lint].batch_size = 1. However, "
"you'll likely get better performance by using a larger batch size because of "
"reduced overhead launching processes."
),
help=(
"Rather than linting all files in a single batch, lint each file as a "
@@ -174,20 +178,7 @@ def register_options(cls, register) -> None:
advanced=True,
type=int,
default=128,
help=(
"The target minimum number of files that will be included in each linter batch.\n"
"\n"
"Linter processes are batched for a few reasons:\n"
"\n"
"1. to avoid OS argument length limits (in processes which don't support argument "
"files)\n"
"2. to support more stable cache keys than would be possible if all files were "
"operated on in a single batch.\n"
"3. to allow for parallelism in linter processes which don't have internal "
"parallelism, or -- if they do support internal parallelism -- to improve scheduling "
"behavior when multiple processes are competing for cores and so internal "
"parallelism cannot be used perfectly.\n"
),
help=style_batch_size_help(uppercase="Linter", lowercase="linter"),
)

@property
@@ -235,10 +226,10 @@ def address_str(fs: FieldSet) -> str:
return fs.address.spec

all_batch_results = await MultiGet(
Get(LintResults, LintRequest, request.__class__(field_sets))
Get(LintResults, LintRequest, request.__class__(field_set_batch))
for request in requests
if request.field_sets
for field_sets in partition_sequentially(
for field_set_batch in partition_sequentially(
request.field_sets, key=address_str, size_min=lint_subsystem.batch_size
)
)
@@ -262,7 +253,7 @@ def key_fn(results: LintResults):
sorted_all_batch_results, key=key_fn
)
),
key=lambda results: results.linter_name,
key=key_fn,
)
)

17 changes: 17 additions & 0 deletions src/python/pants/core/goals/style_request.py
@@ -24,6 +24,23 @@
_FS = TypeVar("_FS", bound=FieldSet)


def style_batch_size_help(uppercase: str, lowercase: str) -> str:
return (
f"The target minimum number of files that will be included in each {lowercase} batch.\n"
"\n"
f"{uppercase} processes are batched for a few reasons:\n"
"\n"
"1. to avoid OS argument length limits (in processes which don't support argument "
"files)\n"
"2. to support more stable cache keys than would be possible if all files were "
"operated on in a single batch.\n"
f"3. to allow for parallelism in {lowercase} processes which don't have internal "
"parallelism, or -- if they do support internal parallelism -- to improve scheduling "
"behavior when multiple processes are competing for cores and so internal "
"parallelism cannot be used perfectly.\n"
)


@frozen_after_init
@dataclass(unsafe_hash=True)
class StyleRequest(Generic[_FS], EngineAwareParameter, metaclass=ABCMeta):
14 changes: 11 additions & 3 deletions src/python/pants/util/collections.py
@@ -6,7 +6,7 @@
import collections
import collections.abc
import math
from typing import Any, Callable, Iterable, Iterator, MutableMapping, Sequence, TypeVar
from typing import Any, Callable, Iterable, Iterator, MutableMapping, TypeVar

from pants.engine.internals import native_engine

@@ -77,7 +77,7 @@ def ensure_str_list(val: str | Iterable[str], *, allow_single_str: bool = False)


def partition_sequentially(
items: Sequence[_T],
items: Iterable[_T],
*,
key: Callable[[_T], str],
size_min: int,
@@ -95,7 +95,15 @@
# To stably partition the arguments into ranges of at least `size_min`, we sort them, and
# create a new batch sequentially once we have the minimum number of entries, _and_ we encounter
# an item hash prefixed with a threshold of zeros.
Contributor:

I'm definitely missing what the leading-zeros check is about. You explain trying to keep added items from disturbing batches, but I don't understand the mechanism of how this helps. Is it tied to characteristics of the Rust hash function used? Maybe add a unit test of this function showing that adding an item to an N-bucket chain only changes the contents of ~1 bucket?

Contributor:

+1 to unit tests of this function.

Contributor:

I can sort of see what's going on here, but some comments explaining it would be really helpful, especially justifying the selection of zero_prefix_threshold.

Member Author:

Expanded the comment and added a test.

zero_prefix_threshold = math.log(size_min // 8, 2)
#
# The hashes act like a (deterministic) series of rolls of an evenly distributed die. The
# probability of a hash prefixed with Z zero bits is 1/2^Z, and so to break after N items on
# average, we look for `Z == log2(N)` zero bits.
#
# Breaking on these deterministic boundaries means that adding any single item will affect
Contributor:

Thanks. OK, but this assumes ... the same CLI specs with file edits/adds in between? In other words, this whole scheme does ~nothing for ./pants lint bob:: followed by ./pants lint ::; those will likely have completely different buckets and no hits at all. If that's right, and all this fanciness is to support repeatedly running the same Pants command, that's great, but it seems worth calling out somewhere. I'm not sure where; here is not the place, since this is about the larger picture and the call-after-call inputs to this function from a higher layer. If I've got this wrong and this always works, that's awesome, though I'm still mystified how it works. If I've got it right, it would be great to add a comment spelling out that we're optimizing a particular use pattern.

Member Author (@stuhood, Jan 19, 2022):

> Thanks. OK, but this assumes ... the same CLI specs with file edits/adds in between? In other words, this whole scheme does ~nothing for ./pants lint bob:: followed by ./pants lint ::; those will likely have completely different buckets and no hits at all.

This is mostly about optimizing the re-running of a single command over time, yea. But because the inputs are sorted before hashing, bob:: should fall entirely into sequential buckets within the entire set of ::, and some of those might be able to hit in the larger set.

Unfortunately, it's hard to make promises about that in the presence of the min/max thresholds. If we were breaking purely along bucket boundaries (without additionally setting min/max sizes), we could make better guarantees, because the bucket boundaries within bob:: would be the same when the rest of :: was added.

Thanks for raising this point, though... it does actually seem like a desirable enough use case to try to strengthen further.

Contributor:

I'm not sure that strengthening this further, or even using the current trickery, belongs in this PR. IIUC, batching is purely a win over the status quo today, even if the batches are simply formed from fixed-size buckets over the sorted input and nothing more. If that's true, then this PR is only muddied by the fiddly bits here, and the fiddly bits could be added en masse in a follow-up that gives magic speed-ups for free.

Member Author (@stuhood, Jan 19, 2022):

The reason I think doing it here might be useful is that it would adjust the meaning of the --batch-size flag (from "minimum size" to "target size")... it should be a quick change, I think. Smaller batches are actually desirable for improving cache hit rates (which is why the per-file-caching flag existed in the first place).

Member:

@Eric-Arellano brought up a good point above about how changing the batch size or implementation could have negative repercussions (due to the cache being invalidated). However, as @jsirois points out, any batching is a win.

IMO the solution with #9964 is the "baseline" and any improvement over that is gravy. If Pants finds better solutions/numbers over time, I'm OK with a one-time cache bust that results in speedier runs going forward.

Because of the amount of variance here across tools/machines/repos, I think the best approach is probably a data-driven one. You might try building a performance test suite that exercises various numbers/algorithms and asking the community (and/or Toolchain's customers) to run it overnight (or during a dead time) on a machine or two. I know I'd happily volunteer some CPU cycles to help find the optimal config.
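
As a rough, hypothetical sketch of what such a data-gathering harness could look like (the --lint-batch-size spelling is assumed from the [lint].batch_size option and may differ; cache effects between runs would also need to be controlled for):

```python
import subprocess
import time

# Hypothetical timing loop: runs `lint ::` at several batch sizes and prints
# wall-clock times. Only meaningful if caching between runs is disabled or
# otherwise accounted for.
for batch_size in (1, 32, 64, 128, 256, 512):
    start = time.monotonic()
    subprocess.run(
        ["./pants", f"--lint-batch-size={batch_size}", "lint", "::"],
        capture_output=True,
    )
    print(f"batch_size={batch_size}: {time.monotonic() - start:.1f}s")
```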

Contributor:

> The reason I think doing it here might be useful is that it would adjust the meaning of the --batch-size flag (from "minimum size" to "target size").

IMO it's fine to change the behavior up until 2.10.0rc0. This seems worth landing as-is.

Member Author:

...it was not, in fact, a "quick change": removing the min threshold and using only the hash to choose boundaries requires raising the threshold to the point where our sweet-spot bucket size (~128) is too likely to never encounter a good boundary.

Since the code as-is already snaps to boundaries, I feel fairly confident that we'll already get some matches in the bob:: vs. :: case, but we can defer looking at that to a follow-up.

Member Author:

I couldn't help myself. #14210.

# either one bucket (if the item does not create a boundary) or two (if it does create a
# boundary).
zero_prefix_threshold = math.log(max(4, size_min) // 4, 2)
size_max = size_min * 2 if size_max is None else size_max

batch: list[_T] = []
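
To make the boundary mechanism above concrete, here is a minimal, runnable sketch of the batching idea. It is not the Pants implementation: the real partition_sequentially hashes via native_engine and also enforces a size_max; blake2b below is just a stand-in hash, and the function name is illustrative.

```python
import hashlib
import math
from typing import Callable, Iterator, TypeVar

_T = TypeVar("_T")


def sketch_partition_sequentially(
    items: list[_T], *, key: Callable[[_T], str], size_min: int
) -> Iterator[list[_T]]:
    """Batch sorted items, breaking on hash-prefix boundaries (illustrative only).

    A key whose hash starts with Z zero bits occurs with probability 1/2^Z, so
    once a batch holds at least `size_min` items, requiring roughly
    log2(size_min / 4) leading zero bits makes boundaries land deterministically
    and stay put when other items are added or removed from the sorted input.
    """
    zero_prefix_threshold = math.log(max(4, size_min) // 4, 2)
    batch: list[_T] = []
    for item in sorted(items, key=key):
        batch.append(item)
        # Count leading zero bits of the key's hash (a stand-in for the Rust-side hash).
        digest = hashlib.blake2b(key(item).encode()).digest()
        leading_zero_bits = 256 - int.from_bytes(digest[:32], "big").bit_length()
        if len(batch) >= size_min and leading_zero_bits >= zero_prefix_threshold:
            yield batch
            batch = []
    if batch:
        yield batch


# Adding or removing one file shifts at most one or two of the printed batch sizes.
files = [f"src/app/file{i}.py" for i in range(300)]
print([len(b) for b in sketch_partition_sequentially(files, key=str, size_min=32)])
```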
21 changes: 21 additions & 0 deletions src/python/pants/util/collections_test.py
@@ -12,6 +12,7 @@
assert_single_element,
ensure_list,
ensure_str_list,
partition_sequentially,
recursively_update,
)

@@ -85,3 +86,23 @@ def test_ensure_str_list() -> None:
ensure_str_list(0) # type: ignore[arg-type]
with pytest.raises(ValueError):
ensure_str_list([0, 1]) # type: ignore[list-item]


@pytest.mark.parametrize("size_min", [0, 1, 16, 32, 64, 128])
def test_partition_sequentially(size_min: int) -> None:
# Adding an item at any position in the input sequence should affect either 1 or 2 (if the added
# item becomes a boundary) buckets in the output.

def partitioned_buckets(items: list[str]) -> set[tuple[str, ...]]:
return set(tuple(p) for p in partition_sequentially(items, key=str, size_min=size_min))

# We start with base items containing every other element from a sorted sequence.
all_items = sorted((f"item{i}" for i in range(0, 64)))
Contributor:

Please do not burn a tree for this, but FYI you already had a generator expression, so the extra () is superfluous:

Suggested change:
-    all_items = sorted((f"item{i}" for i in range(0, 64)))
+    all_items = sorted(f"item{i}" for i in range(0, 64))

Member:

@Eric-Arellano I think there is a flake8 plugin that detects those... maybe we should add it to this repo.
https://pypi.org/project/flake8-comprehensions/

Contributor:

That'd be great!

Member:

OK, will work on this. PRs coming soon.

base_items = [item for i, item in enumerate(all_items) if i % 2 == 0]
base_partitions = partitioned_buckets(base_items)

# Then test that adding any of the remaining items (which will be interspersed among the
# base items) only affects 1 or 2 buckets in the output.
for to_add in [item for i, item in enumerate(all_items) if i % 2 == 1]:
updated_partitions = partitioned_buckets([to_add, *base_items])
assert 1 <= len(base_partitions ^ updated_partitions) <= 2