
evaluator: performance regression on 0.6.0 #2559

Closed
MarcWort opened this issue Aug 22, 2023 · 10 comments

@MarcWort

What version of CUE are you using (cue version)?

➜  cue git:(33b7393) ✗ ./cue-33b7393 version
cue version v0.0.0-20230808110549-33b73930b8fc

go version go1.21.0
      -buildmode exe
       -compiler gc
  DefaultGODEBUG panicnil=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
             vcs git
    vcs.revision 33b73930b8fc8a719a88978be0b7e019ba34e247
        vcs.time 2023-08-08T11:05:49Z
    vcs.modified false
➜  cue git:(97d7109d) ✗ ./cue-97d7109 version
cue version v0.0.0-20230809090532-97d7109df4d0

go version go1.21.0
      -buildmode exe
       -compiler gc
  DefaultGODEBUG panicnil=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
             vcs git
    vcs.revision 97d7109df4d03e7ecc77daa12dbc8b6a247e4d37
        vcs.time 2023-08-09T09:05:32Z
    vcs.modified true

Does this issue reproduce with the latest stable release?

Yes, it also occurs on 0.6.0.

What did you do?

I noticed that cue 0.6.0 is approximately 10x slower on my rules.
Bisecting led me to commit 97d7109, where the performance regression started, but only when using a "_tool.cue" file.
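For context, build-rules is a workflow command defined in a "_tool.cue" file and run through CUE's tooling layer; the benchmarks below invoke it via the build-rules shorthand for cue cmd build-rules. The sketch below only illustrates what such a command typically looks like; the package, field, and import choices are assumptions, not the actual contents of the example repo linked below:

package rules

import (
	"encoding/yaml"
	"tool/cli"
)

// Illustrative sketch only: print the evaluated rules as YAML.
// "rules" stands in for fields defined in the package's other .cue files.
command: "build-rules": cli.Print & {
	text: yaml.Marshal(rules)
}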

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 build-rules" && hyperfine "../cue/cue-97d7109 build-rules"
Benchmark 1: ../cue/cue-33b7393 build-rules
  Time (mean ± σ):     282.4 ms ±  15.0 ms    [User: 436.7 ms, System: 33.0 ms]
  Range (min … max):   266.5 ms … 308.8 ms    10 runs

Benchmark 1: ../cue/cue-97d7109 build-rules
  Time (mean ± σ):      5.881 s ±  0.450 s    [User: 8.591 s, System: 0.117 s]
  Range (min … max):    5.337 s …  6.769 s    10 runs

but not when using cue export directly:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 export rules/rule.cue --out yaml" && hyperfine "../cue/cue-97d7109 export rules/rule.cue --out yaml"
Benchmark 1: ../cue/cue-33b7393 export rules/rule.cue --out yaml
  Time (mean ± σ):     157.8 ms ±   7.6 ms    [User: 241.0 ms, System: 21.8 ms]
  Range (min … max):   142.5 ms … 176.3 ms    18 runs

Benchmark 1: ../cue/cue-97d7109 export rules/rule.cue --out yaml
  Time (mean ± σ):     160.7 ms ±  11.3 ms    [User: 252.4 ms, System: 15.9 ms]
  Range (min … max):   147.7 ms … 189.6 ms    19 runs
 

You can reproduce my tests with my example repo: https://github.com/MarcWort/example-cue

What did you expect to see?

Not such a strong performance impact. My actual project now takes 4 minutes instead of 30 seconds. I tried some changes to the ruleset but haven't found any particular section where the performance hit appears.

@MarcWort added the NeedsInvestigation and Triage labels Aug 22, 2023
@mvdan
Member

mvdan commented Aug 22, 2023

What about CUE versions f6e49bd (when Recurse was first added) and 0554d4e (the commit right before)?

97d7109 was only intended to restore the original behavior: Marcel's change to add Recurse was just meant to be a refactor, but it introduced some regressions.

@MarcWort
Author

0554d4e is a bit slower, but not as bad as 97d7109:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 build-rules" && hyperfine "../cue/cue-97d7109 build-rules" && hyperfine "../cue/cue-f6e49bd build-rules" && hyperfine "../cue/cue-0554d4e build-rules"
Benchmark 1: ../cue/cue-33b7393 build-rules
  Time (mean ± σ):     238.8 ms ±   9.9 ms    [User: 382.5 ms, System: 19.5 ms]
  Range (min … max):   226.6 ms … 258.9 ms    12 runs
 
Benchmark 1: ../cue/cue-97d7109 build-rules
  Time (mean ± σ):      5.774 s ±  0.448 s    [User: 8.408 s, System: 0.123 s]
  Range (min … max):    5.279 s …  6.421 s    10 runs
 
Benchmark 1: ../cue/cue-f6e49bd build-rules
  Time (mean ± σ):     272.4 ms ±   8.1 ms    [User: 434.0 ms, System: 24.8 ms]
  Range (min … max):   261.6 ms … 284.6 ms    10 runs
 
Benchmark 1: ../cue/cue-0554d4e build-rules
  Time (mean ± σ):     824.0 ms ±  33.7 ms    [User: 1204.3 ms, System: 30.2 ms]
  Range (min … max):   788.9 ms … 904.5 ms    10 runs

When using export, all of them are fast:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 export rules/rule.cue --out yaml" && hyperfine "../cue/cue-97d7109 export rules/rule.cue --out yaml" && hyperfine "../cue/cue-f6e49bd export rules/rule.cue --out yaml" && hyperfine "../cue/cue-0554d4e export rules/rule.cue --out yaml"
Benchmark 1: ../cue/cue-33b7393 export rules/rule.cue --out yaml
  Time (mean ± σ):     133.9 ms ±   7.6 ms    [User: 207.6 ms, System: 15.2 ms]
  Range (min … max):   121.0 ms … 151.2 ms    21 runs
 
Benchmark 1: ../cue/cue-97d7109 export rules/rule.cue --out yaml
  Time (mean ± σ):     136.5 ms ±   6.5 ms    [User: 208.3 ms, System: 15.7 ms]
  Range (min … max):   123.1 ms … 148.9 ms    22 runs
 
Benchmark 1: ../cue/cue-f6e49bd export rules/rule.cue --out yaml
  Time (mean ± σ):     133.8 ms ±   6.0 ms    [User: 205.0 ms, System: 14.2 ms]
  Range (min … max):   120.8 ms … 150.8 ms    22 runs
 
Benchmark 1: ../cue/cue-0554d4e export rules/rule.cue --out yaml
  Time (mean ± σ):     128.0 ms ±   6.4 ms    [User: 201.6 ms, System: 12.5 ms]
  Range (min … max):   119.8 ms … 141.5 ms    23 runs

@mvdan removed the Triage label Aug 23, 2023
@mvdan
Member

mvdan commented Aug 23, 2023

We are currently focusing on v0.7, whose main goal is performance improvements. Marcel already has some big changes coming up in that direction, so I'm inclined to let them hit master first and then see where we stand with your benchmark.

@MarcWort
Author

Sure, let me know if there is an early dev branch to test. I'm willing to help.

@myitcv changed the title from "Performance regression on 0.6.0" to "evaluator: performance regression on 0.6.0" Aug 23, 2023
@MarcWort
Author

MarcWort commented Dec 4, 2023

I know that 0.7.0 has no performance improvements, but just to keep you updated, here is a performance test on the current 0.7.0-rc.1 tag:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-b36b477 build-rules" 
Benchmark 1: ../cue/cue-b36b477 build-rules
  Time (mean ± σ):      6.676 s ±  0.657 s    [User: 9.428 s, System: 0.120 s]
  Range (min … max):    5.711 s …  7.553 s    10 runs
 
➜  example-cue-gh git:(main) hyperfine "../cue/cue-97d7109 build-rules" 
Benchmark 1: ../cue/cue-97d7109 build-rules
  Time (mean ± σ):      5.715 s ±  0.417 s    [User: 8.266 s, System: 0.111 s]
  Range (min … max):    5.401 s …  6.477 s    10 runs

@mvdan
Member

mvdan commented Jul 29, 2024

I'm looking at this again after @mxey provided some more information on Slack and also started helping with the debugging. Thank you again @MarcWort for the information above, and apologies that I hadn't had the time to come back to this before.

Starting from your helpful example at https://github.com/MarcWort/example-cue, I've been able to build a worst-case reproducer:

# run one command
exec cue cmd test

-- build_tool.cue --
package p

import (
	"list"
	"tool/cli"
)

command: test: cli.Print & { text: "\(len(list.FlattenN(indir1, 10)))" }

indir1: [if len(indir2) > 0 { indir2 }]
indir2: [if len(indir3) > 0 { indir3 }]
indir3: [if len(indir4) > 0 { indir4 }]
indir4: indir5
indir5: indir6
indir6: indir7
indir7: [for i in list.Range(1,20,4) {
	#pair & {
		one: n: 0
		two: n: i
	}
}]

#pair: {
	one: #unit & {
		n: <2
	}
	two: #unit & {
		n: >=one.n
		if n == one.n {
			n: <= one.n
		}
	}
	if two.n > one.n {
		one: n: 0
	}
}

#unit: {
	n: int
	#padding: {in: <10, out: "0\(in)"} | {in: >=10, out: "\(in)"}
	str: (#padding & {in: n}).out
}

With CUE as of master (3f84ba6) this takes about 7s:

# run one command (7.534s)
> exec cue cmd test
[stdout]
5
PASS

@mxey looked at https://review.gerrithub.io/c/cue-lang/cue/+/556460 and https://review.gerrithub.io/c/cue-lang/cue/+/557477 and suggested this surgical fix, which does not seem to cause any test failures and speeds up your command quite a lot:

diff --git tools/flow/tasks.go tools/flow/tasks.go
index 019a218f..cf684da5 100644
--- tools/flow/tasks.go
+++ tools/flow/tasks.go
@@ -257,7 +257,7 @@ func (c *Controller) markTaskDependencies(t *Task, n *adt.Vertex) {
 				return nil
 			}
 			c.nodes[d.Node] = cycleMarker
-			d.Recurse()
+			c.markTaskDependencies(t, d.Node)
 			c.nodes[d.Node] = nil
 		}
 		return nil

And it indeed helps with my reproducer massively, by a factor of nearly 100x:

# run one command (0.081s)
> exec cue cmd test
[stdout]
5
PASS

I still can't really tell why there is such a big performance difference between the two very similar versions of the code. Recurse was really only meant to save on allocations, and it wasn't meant to be doing significantly more (or any different) work.

Funnily enough, my reproducer causes a stack overflow with CUE v0.5.0, so I can't really validate whether this is the same regression that you saw with v0.6.0.

@mvdan
Member

mvdan commented Jul 29, 2024

I should note that the reproducer above is akin to a worst-case benchmark for the current code, so we can't really use it as a regression test. We could still add it to the test suite, though, and trust that, if we were to cause a similar slow-down in the future, we would notice by virtue of go test spinning the CPU for much longer than normal.

@mvdan
Member

mvdan commented Jul 29, 2024

Tentative fix with test at https://review.gerrithub.io/c/cue-lang/cue/+/1198566; I will need to talk through it with Marcel to verify it's sane.

@mvdan self-assigned this Jul 30, 2024
cueckoo pushed a commit that referenced this issue Jul 30, 2024
The method calls markTaskDependencies, which in some cases like
the added test would mean more evaluator work,
but those counters were not being added to the task's stats.

The added test would fail before the fix:

    stats: task totals different from controller:
    [...]
    task totals - controller totals:
    Leaks:  -1
    Freed:  -5
    Reused: -3
    Allocs: -3
    Retain: -5

    Unifications: -6
    Conjuncts:    -5
    Disjuncts:    -9

For #2559.

Signed-off-by: Daniel Martí <[email protected]>
Change-Id: I0e7e441b27a3cebad2669fe729cf5805eee019f7
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1198616
Unity-Result: CUE porcuepine <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Reviewed-by: Marcel van Lohuizen <[email protected]>
cueckoo pushed a commit that referenced this issue Jul 30, 2024
The added CUE takes 7s to test with tools/flow or run with `cue cmd`,
as it's adapted from https://github.com/MarcWort/example-cue
to show the worst case scenario of tools/flow being slow by nesting
many references to non-task nodes which do some underlying work.

Thanks to Marc W for providing the original reproducer on GitHub.

For #2559.

Signed-off-by: Daniel Martí <[email protected]>
Change-Id: I25543e7c22b1dda6a8a8528198d2d7b4a63bb411
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1198565
Unity-Result: CUE porcuepine <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Reviewed-by: Roger Peppe <[email protected]>
@mvdan
Member

mvdan commented Jul 30, 2024

This should be resolved now in master, as well as in the upcoming alpha release. Thanks again!

@andrewjeffree

@mvdan is there any chance this fix might be cherry-picked back into the 0.9 releases and result in a 0.9.3 release?
