Per-process parallelism should be chosen dynamically, and possibly per platform/host #9964
See #9253 for the motivation. We didn't land the change then because we had to account for both v1 and v2 usages, and we didn't want to optimize for v2 at the harm of v1. Now we only have v2 to worry about. This partially closes #9964, although in a much more naive way. [ci skip-rust] [ci skip-build-wheels]
While this was naively fixed in #10584, I'm reopening because this wasn't fixed as described above.
Addressed by #11006.
Reopened because I still think that we can fully automate this.
This also applies to pylint: adjusted the title.
This has become more relevant recently since there are now a variety of PEX shapes, and not all invokes are likely to run concurrently. In particular, when using a lockfile/constraints.txt, the run to build the constraints PEX will generally run alone (and take a while to run), while lots of dependent PEX builds will run in parallel (and run quickly). This makes it very challenging to set the …
As mentioned on #13462, we should implement this ticket in order to take better advantage of internal parallelism of processes. We should likely add the following fields to class Process:
```python
class Process:
    ..
    # If non-zero, the amount of parallelism that this process is capable of given its inputs. This value
    # does not directly set the number of cores allocated to the process: that is computed based on
    # availability, and provided as a template value in the arguments of the process.
    maximum_parallelism: int = 0
```

This will require changes to the … The initial strategy I'm thinking of is to overcommit by default: when a process enters the semaphore, it will record its own concurrency as …

Overcommitting has some downsides: so, optionally in a followup (unsure if worth the complexity / or whether it would actually give a speedup): if the computed parallelism for a given process ends up not matching its maximum, a task can check while the process runs (maybe only once, a short fixed period after it has started: e.g. 250ms or so) to see if it can cancel and re-start it with a better parallelism value.
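The "overcommit by default" admission described above could be sketched roughly as follows. This is a hypothetical helper, not Pants' actual implementation: the function name, the fair-share formula (total cores divided by running processes), and the parameters are all assumptions for illustration.

```python
def choose_parallelism(
    maximum_parallelism: int, running_processes: int, total_cores: int
) -> int:
    """Pick a parallelism value for a newly admitted process (hypothetical sketch).

    The process gets its fair share of the cores given how many processes are
    currently running, capped at what it says it can actually use. A value of 0
    for maximum_parallelism means the process declared no internal parallelism.
    """
    # Each running process "deserves" an equal share of the cores, at least 1.
    fair_share = max(1, total_cores // max(1, running_processes))
    if maximum_parallelism <= 0:
        # No internal parallelism declared: a single core is enough.
        return 1
    # Overcommit is possible here: shares are computed against the processes
    # running *now*, so later arrivals can push the total above total_cores.
    return min(maximum_parallelism, fair_share)
```

For example, a process capable of 16-way parallelism admitted while one other process runs on an 8-core machine would get `min(16, 8 // 2) == 4` cores.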
I like the idea! Some thoughts (which also apply to #13462 so it's repeated there):
(Also duping with #13462) As a single datapoint, on my 64-core machine with formatters …
With enough similar metrics you might be able to deprecate `per-file-caching` 🤔
…14184) When tools support internal concurrency and cannot be partitioned (either because they don't support it, such as in the case of a PEX resolve, or because of the overhead of partitioning as fine-grained as desired), Pants' own concurrency currently makes it ~impossible for them to set their concurrency settings correctly. As sketched in #9964, this change adjusts Pants' local runner to dynamically choose concurrency values per process based on the current concurrency.

1. When acquiring a slot on the `bounded::CommandRunner`, a process takes as much concurrency as it a) is capable of, as configured by a new `Process.concurrency_available` field, b) deserves for the purposes of fairness (i.e. half, for two processes). This results in some amount of over-commit.
2. Periodically, a balancing task runs and preempts/re-schedules processes which have been running for less than a very short threshold (`200ms` currently) and which are the largest contributors to over/under-commit.

This fixes some over/under-commit, but not all of it, because if a process becomes over/under-committed after it has been running a while (because other processes started or finished), we will not preempt it. Combined with #14186, this change results in an additional 2% speedup for `lint` and `fmt`. But it should also have a positive impact on PEX processes, which were the original motivation for #9964. Fixes #9964. [ci skip-build-wheels]
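The balancing step described in that change (preempt only very young processes, largest over/under-commit first) can be sketched as follows. This is an illustrative model, not the actual Rust implementation in `bounded::CommandRunner`: the `Running` type, field names, and selection function are assumptions; only the `200ms` threshold comes from the description above.

```python
from dataclasses import dataclass

# Threshold from the change description: processes older than this are
# considered too expensive to preempt and restart.
PREEMPTION_AGE_MS = 200


@dataclass
class Running:
    """Hypothetical view of a running process, for illustration only."""
    name: str
    age_ms: int    # how long the process has been running
    assigned: int  # cores granted when it was admitted
    desired: int   # what it could use now, given the current load


def preemption_candidates(procs: list[Running]) -> list[Running]:
    """Choose which processes the balancer would preempt and re-schedule.

    Only processes young enough to be cheap to restart are eligible, and
    among those, the largest contributors to over/under-commit come first.
    """
    young = [
        p for p in procs
        if p.age_ms < PREEMPTION_AGE_MS and p.assigned != p.desired
    ]
    return sorted(young, key=lambda p: abs(p.assigned - p.desired), reverse=True)
```

A process that started 50ms ago holding 8 cores it can no longer justify would be restarted before one that is off by a single core; a process already running for 400ms would be left alone even if badly over-committed, which is exactly the residual imbalance the description acknowledges.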
As described on #9253:
This applies to more than just CI though: in any context, either on a laptop, in CI, or in the context of remote execution, this type of value is challenging to set correctly.
@jsirois is working on splitting out resolve from install in PEX, and that might help mitigate the particular case of PEX, but this will continue to apply generally to parallel tools.
In the context of remote execution we can probably assume that workers are opaque, and safely configure a value on a "per-host"/destination/toolchain/platform basis (in a Multi-Platform Speculation sense).
But locally, we should likely choose this kind of value dynamically based on concurrency. As explained on #9253, hardcoding this value too low will hurt you when you're waiting for only one resolve to complete. A sketch of native support for this would maybe look like adding support for a `_PANTS_CONCURRENCY` environment variable that our executors would set dynamically (post cache-key calculation in the case of a local environment) based on how many other processes were running.
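The "set dynamically, post cache-key calculation" idea above could look something like this. Everything here is a hypothetical sketch: the `_PANTS_CONCURRENCY` name comes from the proposal, but the placeholder-in-argv templating scheme, the function name, and its signature are assumptions for illustration. The key property is that the cache key is computed over the original argv/env, and the dynamic value is injected only afterwards, so it does not invalidate the cache.

```python
def finalize_argv(
    argv: list[str], env: dict[str, str], parallelism: int
) -> tuple[list[str], dict[str, str]]:
    """Inject a dynamically chosen concurrency value into a process spec.

    Called by the executor after the cache key has been computed, so the
    chosen value never participates in caching. Tools can consume the value
    either via a placeholder in their arguments or via the environment.
    """
    value = str(parallelism)
    # Substitute a template marker in the arguments, if the tool uses one.
    new_argv = [a.replace("{_PANTS_CONCURRENCY}", value) for a in argv]
    # Also expose the value in the environment for tools that read it there.
    new_env = {**env, "_PANTS_CONCURRENCY": value}
    return new_argv, new_env
```

For example, a pylint invoke declared as `["pylint", "-j", "{_PANTS_CONCURRENCY}"]` would become `["pylint", "-j", "4"]` when the executor decides 4 cores are available.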