
Slow solving with build-tool-depends on local packages with tests #7472

Closed
michaelpj opened this issue Jul 2, 2021 · 16 comments · Fixed by #7490 or #7532


@michaelpj
Collaborator

Describe the bug
Suppose I have a local package B which provides a build tool, and which either has tests or depends on a package A that has tests. Finally, I have another local package C which depends on B's build tool.

Finally, I have tests: true in my cabal.project, since I naturally want to test my local packages.
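For concreteness, a minimal sketch of that setup (the directory layout, the executable name B-tool, and the exact fields are illustrative, not taken from the real project):

-- cabal.project
packages: A/ B/ C/
tests: true

-- C/C.cabal (the relevant part): C needs the build tool that B provides
library
  build-depends:      base
  build-tool-depends: B:B-tool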

The result is that the solver becomes slower, because (I hypothesize) it tries to create a build plan for A without tests to satisfy the build-tool dependency, then finds that it can't do that (due to the project constraint) and has to redo the plan.

That looks something like this (from cabal configure -v3):

[_13] trying: C:B:exe.A~>A-0.1.0.0 (dependency of C:B:exe.B)
[_14] rejecting: C:B:exe.A:!test (dependencies not linked: stanza "test" incompatible; conflict set: A, B, C:B:exe.A, C:B:exe.B, A:test, C:B:exe.A:test)

This is not a big deal in a small project. However, if we have a long chain of packages up to the build tool dependency like so:

A->B-> ... ->Z->Tool

all of which have tests, then the solver will redo most of the work over and over until it finally realises it has to build all of A to Z with tests. This can cause a very large slowdown in solving time.

cabal freeze does not help, since it only pins the versions and the test/no-test incompatibility still bites.

To Reproduce

I made a repository that exhibits the behaviour I described in the small: https://github.com/michaelpj/cabal-build-tool-depends-solver-repro

Changing tests:true to tests:false in cabal.project stops the solver from failing and having to backtrack.

Expected behavior

I don't understand how cabal's solver works at all, but I would like this to not be slow! Maybe the test/no-tests constraint needs to be pushed down eagerly?

System information

  • NixOS
  • cabal 3.4.0.0, ghc 8.10.4
@michaelpj
Collaborator Author

Quick check for "does this affect me":

cabal configure -v3 > out.txt
grep -c 'stanza "test"' out.txt
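(A nonzero count suggests the solver is hitting the stanza "test" incompatibility described above.)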

@michaelpj
Collaborator Author

A few more notes for the interested:

@michaelpj
Collaborator Author

For newer versions of cabal which don't run the solver when calling configure, the easiest way to see the issue is probably cabal build --dry-run -v3 all.

@gbaz
Collaborator

gbaz commented Jul 22, 2021

Here's the result of my investigation thus far. The planPackages call constructs a DepResolverParams (this gets printed out at v3) that in turn includes "preferences" which include, in this example:

  A [BenchStanzas]
  B [BenchStanzas]
  C [BenchStanzas]

Those preferences say to try each package with benchmarks enabled before benchmarks disabled. In this case we should set the preferences to also include TestStanzas.

Since this section also sets up constraints, and tests are definitely included in the constraints, all the data should be there to include them in preferences as well.

If this is done the search tree will be reordered to try packages first with test enabled (the success path!) and only then disabled. Not quite sure how to tweak the code to do this though, if anyone else wants to take a run at it first :-)
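For reference, a hedged sketch of the kind of change this suggests. PackagePreference is from cabal-install's Distribution.Client.Dependency and OptionalStanza from the solver types, though module locations vary between cabal versions, and the real change lives wherever DepResolverParams is assembled:

import Distribution.Client.Dependency (PackagePreference (..))
import Distribution.Solver.Types.OptionalStanza (OptionalStanza (..))
import Distribution.Types.PackageName (PackageName)

-- Roughly the current behaviour: prefer only benchmarks enabled
-- for a local package.
currentPref :: PackageName -> PackagePreference
currentPref pkg = PackageStanzasPreference pkg [BenchStanzas]

-- The idea: also prefer tests enabled, so that with tests: true
-- the solver tries the success path first.
proposedPref :: PackageName -> PackagePreference
proposedPref pkg = PackageStanzasPreference pkg [TestStanzas, BenchStanzas]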

@gbaz
Collaborator

gbaz commented Jul 23, 2021

Anyone affected by this issue is welcome to try out the linked PR and see if it improves things. (Assuming it gets a good review, I think it should probably be merged even if there's no noticeable speedup, since it seems slightly more correct, and at worst will remove some "line noise" from solver output by eliminating some unnecessary local backjumps.)

@grayjay
Collaborator

grayjay commented Jul 27, 2021

I ran the reproduction, but I'm not sure it reproduces the performance issue, even though it contains the log message about incompatible stanzas. I didn't see any backtracking with cabal 3.4.0.0.

Full solver log:

[__0] trying: A-0.1.0.0 (user goal)
[__1] trying: base-4.14.1.0/installed-4.14.1.0 (dependency of A)
[__2] trying: rts-1.0/installedrts (dependency of base)
[__3] trying: integer-gmp-1.0.3.0/installed-1.0.3.0 (dependency of base)
[__4] trying: ghc-prim-0.6.1/installed-0.6.1 (dependency of base)
[__5] rejecting: A:!test (constraint from config file, command line flag, or user target requires opposite flag selection)
[__5] trying: A:*test
[__6] trying: B-0.1.0.0 (user goal)
[__7] trying: C-0.1.0.0 (user goal)
[__8] trying: C:B:exe.B~>B-0.1.0.0 (dependency of C)
[__9] trying: C:B:exe.base~>base-4.14.1.0/installed-4.14.1.0 (dependency of C:B:exe.B)
[_10] trying: C:B:exe.rts~>rts-1.0/installedrts (dependency of C:B:exe.base)
[_11] trying: C:B:exe.integer-gmp~>integer-gmp-1.0.3.0/installed-1.0.3.0 (dependency of C:B:exe.base)
[_12] trying: C:B:exe.ghc-prim~>ghc-prim-0.6.1/installed-0.6.1 (dependency of C:B:exe.base)
[_13] trying: C:B:exe.A~>A-0.1.0.0 (dependency of C:B:exe.B)
[_14] rejecting: C:B:exe.A:!test (dependencies not linked: stanza "test" incompatible; conflict set: A, B, C:B:exe.A, C:B:exe.B, A:test, C:B:exe.A:test)
[_14] trying: C:B:exe.A:*test
[_15] done

The log shows that cabal is trying to use the same instance of A for both the top-level build target and C's build tool dependency (level 13), which is an optimization. Using the same instance of A means that the two uses must make the same choice for enabling tests. The solver first tries disabling tests for the build tool dependency at level 14 but immediately encounters the conflict and enables tests instead. This is how I would expect the solver to behave, even with more packages in the chain of dependencies.

I think I would need to see an example of how increasing the chain of dependencies causes backtracking, especially if it causes the solver to make the same choice multiple times.

I also think that cabal should add the same preferences for all local packages as in #7472 (comment), for consistency.

@michaelpj
Collaborator Author

I've attached the actual constraint-solver log from the project in question. It's quite long, but perhaps illuminating.

out.txt

@michaelpj
Collaborator Author

In particular it has:

  • Lots of steps even in the happy path
  • Lots of backtracking when it hits the test/non-test conflict

@michaelpj
Collaborator Author

(Sorry that the reproduction wasn't obviously slow, I was trying to just show the test/non-test conflict issue, which seemed more self-contained!)

@grayjay
Collaborator

grayjay commented Aug 10, 2021

Thanks for the logs! I graphed the level in the search tree vs line number and saw that there were four main instances of backtracking:

(attached graph: solver-levels, search-tree level vs. log line number)

All four have similar conflicts involving build tools and enabling testing, so I focused on the first one, which has these relevant lines:

[1016] trying: plutus-pab:cardano-node:exe.plutus-tx-0.1.0.0 (dependency of plutus-pab:cardano-node:exe.plutus-ledger-api)
[1017] trying: plutus-pab:cardano-node:exe.plutus-tx:!test
...
[1510] trying: plutus-tx~>plutus-pab:cardano-node:exe.plutus-tx-0.1.0.0 (user goal)
[1511] rejecting: plutus-tx:!test (constraint from config file, command line flag, or user target requires opposite flag selection)
[1511] rejecting: plutus-tx:*test (dependencies not linked: stanza "test" incompatible; conflict set: plutus-ledger-api, plutus-tx, plutus-pab:cardano-cli:exe.plutus-ledger-api, plutus-pab:cardano-cli:exe.plutus-tx, plutus-pab:cardano-node:exe.plutus-ledger-api, plutus-pab:cardano-node:exe.plutus-tx, plutus-tx:test, plutus-pab:cardano-cli:exe.plutus-tx:test, plutus-pab:cardano-node:exe.plutus-tx:test)
[1511] fail (backjumping, conflict set: plutus-ledger-api, plutus-tx, plutus-pab:cardano-cli:exe.plutus-ledger-api, plutus-pab:cardano-cli:exe.plutus-tx, plutus-pab:cardano-node:exe.plutus-ledger-api, plutus-pab:cardano-node:exe.plutus-tx, plutus-tx:test, plutus-pab:cardano-cli:exe.plutus-tx:test, plutus-pab:cardano-node:exe.plutus-tx:test)
[1510] rejecting: plutus-tx-0.1.0.0 (multiple instances)
[1510] fail (backjumping, conflict set: plutus-ledger-api, plutus-tx, plutus-pab:cardano-cli:exe.plutus-ledger-api, plutus-pab:cardano-cli:exe.plutus-tx, plutus-pab:cardano-node:exe.plutus-ledger-api, plutus-pab:cardano-node:exe.plutus-tx, plutus-tx:test, plutus-pab:cardano-cli:exe.plutus-tx:test, plutus-pab:cardano-node:exe.plutus-tx:test)
...
[1017] trying: plutus-pab:cardano-node:exe.plutus-tx:*test
[1018] trying: plutus-tx~>plutus-pab:cardano-node:exe.plutus-tx-0.1.0.0 (user goal)
[1019] rejecting: plutus-tx:!test (constraint from config file, command line flag, or user target requires opposite flag selection)
[1019] trying: plutus-tx:*test

This conflict differs from the simple example and leads to backtracking because of a difference in goal order. The solver chose the version of the build tool dependency plutus-tx (plutus-pab:cardano-node:exe.plutus-tx) before the version of the top level build target plutus-tx. Only the top level build target is constrained to enable testing, so the solver initially chose to disable tests for plutus-pab:cardano-node:exe.plutus-tx. Then when it chose the version for the top level build target, it tried linking plutus-tx to plutus-pab:cardano-node:exe.plutus-tx, which requires both to make the same choice for enabling testing. The solver was unable to resolve the conflict by unlinking plutus-tx, because cabal doesn't currently support building the same version of a package twice. It was also unable to choose an alternative version for plutus-tx, because it is a local package. Those conflicts meant that the solver had to backtrack many levels to enable testing in plutus-pab:cardano-node:exe.plutus-tx.

I think that the solution is to remove the restriction on building the same version of a package twice. Then the solver could have unlinked the two goals and avoided backtracking. If I understand correctly, cabal could still build each component of the package at most once (avoiding increasing build time) while fulfilling the requirement to build both with and without tests. There is already an issue for removing the restriction to only build each version of a package once: #6459

Avoiding backtracking would only decrease the number of steps by about 60%, so there may be other performance issues related to solving for build tools on the happy path.

@gbaz
Collaborator

gbaz commented Aug 10, 2021

@grayjay thanks for the detailed analysis!

Note that my patch (#7490) would add a preference on build-tool dependencies to enable testing as well, which tends to bypass the path you pointed out -- and I think that in fact accounts for its improved performance. Looking at the cabal project now, I note some other places besides enabling tests that would induce the same backjumping. In particular, there are ghc-options and flags also set at the top level alone:

package cardano-api
  ghc-options: -Werror

package cardano-cli
  ghc-options: -Werror

package cardano-config
  ghc-options: -Werror

package cardano-node
  ghc-options: -Werror

package cardano-node-chairman
  ghc-options: -Werror

package tx-generator
  ghc-options: -Werror

package cryptonite
  -- Using RDRAND instead of /dev/urandom as an entropy source for key
  -- generation is dubious. Set the flag so we use /dev/urandom by default.
  flags: -support_rdrand

Inspecting the logs, I noticed the cryptonite flag in particular led to some significant backjumping.

The current patch only pushes stanza prefs through. Do you think it would make sense to have a patch to push through options and flags explicitly as well?

@gbaz
Collaborator

gbaz commented Aug 10, 2021

In particular I'm proposing that, at

PackageConstraint (scopeToplevel pkgname)

and the following definition, we change scopeToplevel to ScopeAnyQualifier.
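For illustration, a hedged sketch of that substitution (types from Distribution.Solver.Types.PackageConstraint; the flags property is just an example of a constraint being applied):

import Distribution.PackageDescription (FlagAssignment)
import Distribution.Solver.Types.PackageConstraint
  (ConstraintScope (..), PackageConstraint (..), PackageProperty (..), scopeToplevel)
import Distribution.Types.PackageName (PackageName)

-- Current: the constraint applies only to the top-level goal for pkgname.
toplevelOnly :: PackageName -> FlagAssignment -> PackageConstraint
toplevelOnly pkgname flags =
  PackageConstraint (scopeToplevel pkgname) (PackagePropertyFlags flags)

-- Proposed: the constraint applies to pkgname under every qualifier,
-- including build-tool dependencies.
allScopes :: PackageName -> FlagAssignment -> PackageConstraint
allScopes pkgname flags =
  PackageConstraint (ScopeAnyQualifier pkgname) (PackagePropertyFlags flags)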

@grayjay
Collaborator

grayjay commented Aug 10, 2021

@gbaz I think that how we treat ghc options and flags should depend on the meaning of a package section. I'm not sure whether the section is supposed to apply to only the top level use of the package or all uses. I had thought that there was an issue about this distinction, but the closest thing I can find now is issues related to specifying local packages vs all packages, which are listed in #3720. I think that whichever behavior we choose should be made consistent across all options that can be specified in a package section. Do you know how ghc options are currently handled?

One issue with making constraints in a package section apply to all uses of the package is that it might seem inconsistent with the way that constraints are applied with the --constraint flag. --constraint="my-package == 1.0" only constrains the top-level use of my-package, and --constraint="any.my-package == 1.0" is required to constrain all uses.
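For comparison, the two forms in cabal.project syntax (my-package is a placeholder):

-- applies only to the top-level use of my-package
constraints: my-package == 1.0

-- applies to every use of my-package, including qualified goals
constraints: any.my-package == 1.0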

@gbaz gbaz reopened this Aug 11, 2021
@gbaz
Collaborator

gbaz commented Aug 11, 2021

Merged #7490 but still not done. Empirical testing shows that the change I suggest above -- which applies flag constraints at all scopes -- yields a significant further speedup. However, it is not a correct change, as there may be times when constraints need not be uniform (when the dependency is solely on the binary). Further, while we can add constraints such as any.foo to indicate constraints across all scopes for version bounds, we cannot add such constraints for flags, so there's no workaround through clever use of cabal.project syntax.

One thing we should do, eventually, is enable that scope qualified syntax for flag constraints as well as version constraints. However, that's pretty invasive.

I think a lightweight change would be to apply qualified version and flag constraints both as preferences at other scopes -- this is a uniform improvement to the solver with no changes to the semantics of the solution space. It just means we try harder, earlier, to keep things uniform. I'll work on a PR towards this goal.

@gbaz
Collaborator

gbaz commented Aug 11, 2021

Ok, with that last breakthrough, now we're done!

@michaelpj
Collaborator Author

Finally got the chance to try this out, and it looks like the patches perform as advertised! Thanks again! 🙇
