Turn on parallelism by default, ensure Commands always run serially at the end #3265

lihaoyi · 2024-07-17T05:24:51Z

Remove the hardcoded setting of threadCount = Some(1)
Split Tasks and Commands into two separate batches, to only run Commands serially after all Tasks have run
Introduce a MILL_TEST_DEST_FOLDER environment variable in our standard TestModule. This is something we needed to do to avoid collisions between parallel test modules in Mill's own build (e.g. everyone stomping over target/worksources) and will be something that Mill users will need as well

Commands are typically the less well-behaved tasks, whereas Targets and others are meant to be pure functional and thread safe without side effects. So running commands serially at the end is a decent tradeoff the do the right thing by default, and if someone wants parallelism they can always use a task (e.g. .testCached instead of .test)

Commands called from other tasks, e.g. those tested by example/misc/5-module-run-task, are run in parallel together with the main group. We assume that these commands are well behaved, as their results will potentially be cached (as part of downstream tasks), and so I run them as part of the primary task graph.

(An alternative could have been to run these sequentially as well, but given the nature of commands-run-as-part-of-tasks I think that would be unnecessarily conservative)

I disable the thread-number log prefixes in the integration tests, since all they would do there is add clutter.

Pull request: #3265

lefou · 2024-07-17T08:02:24Z

Since it's not forbidden, it's possible to depend on commands from other targets, even if this is not recommended. When we now delay command execution to happen serially at the end, it could be problematic. We should have at least one test with a target depending on a command and assert correct behavior.

lihaoyi · 2024-07-17T08:10:33Z

We have one such test in example/misc/5-module-run-task, and it's failing as expected. Will need to adjust the code to make it passable

lihaoyi · 2024-07-18T03:00:53Z

All tests are green, should be ready for a review. This can go into a binary compatible 0.12.x

lefou · 2024-07-18T08:39:00Z

main/eval/src/mill/eval/EvaluatorCore.scala

+    // given but run the commands in linear order
+    evaluateTerminals(
+      tasks,
+      if (sys.env.contains("MILL_TEST_SUITE")) _ => ""


Can you add a comment with some explanation here?

I guess, time is ripe to have a table with all used environment variables in our readme.

Done. This is an internal environment variable, so users should not need to know about it

I had other developers (myself included) in mind. I thought, our readme is targeting devlopers.

Besides, nothing we need to do in this PR, just some loud thinking.

lefou · 2024-07-18T08:51:31Z

main/eval/src/mill/eval/EvaluatorCore.scala

+    val (_, tasksTransitive0) = Plan.plan(Agg.from(tasks0.map(_.task)))
+
+    val tasksTransitive = tasksTransitive0.toSet
+    val (tasks, commands) = terminals0.partition {


Can we either add some comment or rename commands to make it more obvious, it's only those commands that don't have downstream dependencies?

I renamed it leafCommands. Ideally I would want terminalCommands, but Terminal already has another meaning in this code

So, IIUC, we don't evaluate all command serially, only those, that don't have downstream dependencies. That's a bit odd, as I was hoping, we could the user some guarantee, that commands are always executed serially.

Now, that I think about it, we probably only want commands, that are interactive to run serially. But we currently can't determite, if a tasks wants to be interactive.

Not just interactive commands, there's probably lots of commands to do side effects, tests that are not well behaved (e.g. writing to shared paths on disk), and commands that print stuff (and probably don't want the [#threadid] prefix). Those are all solvable, but I think "only commands at the end of the evaluation graph run serially" is a reasonable compromise that gets us most of the way there

Understood.

I don't know how often targets depend on commands, but I could imagine, it would be fine to just run those downstream dependencies serially too. So we could make this guarantee for all commands without implementing multiple parallel/serial blocks.

lefou · 2024-07-18T09:21:36Z

As a consequence of this PR, users will no longer be able to run command-based tests in parallel. I guess, we should promote our testCached more and/or review the whole usability of our test target setup in the next breaking window.

lihaoyi · 2024-07-18T09:33:21Z

Yeah we can add testCached to the docs, so people know it's there

lihaoyi added 2 commits July 17, 2024 13:20

.

8cdff60

.

cc29c49

lihaoyi changed the title ~~Turn on parallelism by default, ensure Commands always run serially~~ Turn on parallelism by default, ensure Commands always run serially at the end Jul 17, 2024

lihaoyi added 4 commits July 17, 2024 13:26

.

06da986

.

00b9dea

.

d54e932

.

13907e4

lihaoyi added 5 commits July 17, 2024 16:17

TEST_DEST_FOLDER

27ab50a

TEST_DEST_FOLDER

4c3244b

.

21c85d7

.

b4506cc

.

728ceff

lihaoyi requested review from lefou and lolgab July 18, 2024 03:00

lihaoyi added this to the 0.12.0 milestone Jul 18, 2024

lefou reviewed Jul 18, 2024

View reviewed changes

.

02b2086

lihaoyi added 3 commits August 5, 2024 20:51

Merge branch 'main' into par-default

7dbc991

Merge branch 'main' into par-default

6eda4a7

.

f619ea5

lihaoyi merged commit 044b2a4 into com-lihaoyi:main Aug 8, 2024
29 of 30 checks passed

lefou mentioned this pull request Aug 9, 2024

mill.scalalib.Dependency/showUpdates seems much slower due to forced non-parallel execution #3359

Closed

lihaoyi mentioned this pull request Sep 23, 2024

Mill 0.12.0-RC2-17-07e173 is incorrectly counting/reporting the tasks #3598

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Turn on parallelism by default, ensure Commands always run serially at the end #3265

Turn on parallelism by default, ensure Commands always run serially at the end #3265

lihaoyi commented Jul 17, 2024 •

edited by lefou

Loading

lefou commented Jul 17, 2024

lihaoyi commented Jul 17, 2024

lihaoyi commented Jul 18, 2024

lefou Jul 18, 2024

lihaoyi Jul 18, 2024

lefou Jul 18, 2024 •

edited

Loading

lefou Jul 18, 2024

lihaoyi Jul 18, 2024

lefou Jul 18, 2024

lihaoyi Jul 18, 2024

lefou Jul 18, 2024 •

edited

Loading

lefou commented Jul 18, 2024

lihaoyi commented Jul 18, 2024

Turn on parallelism by default, ensure Commands always run serially at the end #3265

Turn on parallelism by default, ensure Commands always run serially at the end #3265

Conversation

lihaoyi commented Jul 17, 2024 • edited by lefou Loading

lefou commented Jul 17, 2024

lihaoyi commented Jul 17, 2024

lihaoyi commented Jul 18, 2024

lefou Jul 18, 2024

Choose a reason for hiding this comment

lihaoyi Jul 18, 2024

Choose a reason for hiding this comment

lefou Jul 18, 2024 • edited Loading

Choose a reason for hiding this comment

lefou Jul 18, 2024

Choose a reason for hiding this comment

lihaoyi Jul 18, 2024

Choose a reason for hiding this comment

lefou Jul 18, 2024

Choose a reason for hiding this comment

lihaoyi Jul 18, 2024

Choose a reason for hiding this comment

lefou Jul 18, 2024 • edited Loading

Choose a reason for hiding this comment

lefou commented Jul 18, 2024

lihaoyi commented Jul 18, 2024

lihaoyi commented Jul 17, 2024 •

edited by lefou

Loading

lefou Jul 18, 2024 •

edited

Loading

lefou Jul 18, 2024 •

edited

Loading