
evaluator: performance regression on 0.6.0 #2559

Closed
MarcWort opened this issue Aug 22, 2023 · 10 comments

@MarcWort

What version of CUE are you using (cue version)?

➜  cue git:(33b7393) ✗ ./cue-33b7393 version
cue version v0.0.0-20230808110549-33b73930b8fc

go version go1.21.0
      -buildmode exe
       -compiler gc
  DefaultGODEBUG panicnil=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
             vcs git
    vcs.revision 33b73930b8fc8a719a88978be0b7e019ba34e247
        vcs.time 2023-08-08T11:05:49Z
    vcs.modified false
➜  cue git:(97d7109d) ✗ ./cue-97d7109 version
cue version v0.0.0-20230809090532-97d7109df4d0

go version go1.21.0
      -buildmode exe
       -compiler gc
  DefaultGODEBUG panicnil=1
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v1
             vcs git
    vcs.revision 97d7109df4d03e7ecc77daa12dbc8b6a247e4d37
        vcs.time 2023-08-09T09:05:32Z
    vcs.modified true

Does this issue reproduce with the latest stable release?

Yes, it also occurs on 0.6.0.

What did you do?

I noticed that cue 0.6.0 is approximately 10x slower on my rules.
Bisecting led me to commit 97d7109, where the performance regression started, but only when using a "_tool.cue" file.
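For context, build-rules is a workflow command defined in a "_tool.cue" file and run through CUE's tooling layer; the benchmarks below invoke it via the build-rules shorthand for cue cmd build-rules. The sketch below only illustrates what such a command typically looks like; the package, field, and import choices are assumptions, not the actual contents of the example repo linked below:

package rules

import (
	"encoding/yaml"
	"tool/cli"
)

// Illustrative sketch only: print the evaluated rules as YAML.
// "rules" stands in for fields defined in the package's other .cue files.
command: "build-rules": cli.Print & {
	text: yaml.Marshal(rules)
}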

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 build-rules" && hyperfine "../cue/cue-97d7109 build-rules"
Benchmark 1: ../cue/cue-33b7393 build-rules
  Time (mean ± σ):     282.4 ms ±  15.0 ms    [User: 436.7 ms, System: 33.0 ms]
  Range (min … max):   266.5 ms … 308.8 ms    10 runs

Benchmark 1: ../cue/cue-97d7109 build-rules
  Time (mean ± σ):      5.881 s ±  0.450 s    [User: 8.591 s, System: 0.117 s]
  Range (min … max):    5.337 s …  6.769 s    10 runs

but not when using cue export directly:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 export rules/rule.cue --out yaml" && hyperfine "../cue/cue-97d7109 export rules/rule.cue --out yaml"
Benchmark 1: ../cue/cue-33b7393 export rules/rule.cue --out yaml
  Time (mean ± σ):     157.8 ms ±   7.6 ms    [User: 241.0 ms, System: 21.8 ms]
  Range (min … max):   142.5 ms … 176.3 ms    18 runs

Benchmark 1: ../cue/cue-97d7109 export rules/rule.cue --out yaml
  Time (mean ± σ):     160.7 ms ±  11.3 ms    [User: 252.4 ms, System: 15.9 ms]
  Range (min … max):   147.7 ms … 189.6 ms    19 runs
 

You can reproduce my tests with my example repo: https://github.com/MarcWort/example-cue

What did you expect to see?

Not such a strong performance impact. My actual project now takes 4 minutes instead of 30 seconds. I tried some changes to the ruleset but haven't found any particular section where the performance hit appears.

@MarcWort added the NeedsInvestigation and Triage labels Aug 22, 2023
@mvdan
Member

mvdan commented Aug 22, 2023

What about CUE versions f6e49bd (when Recurse was first added) and 0554d4e (the commit right before)?

97d7109 was only intended to restore the original behavior: Marcel's change to add Recurse was just meant to be a refactor, but it introduced some regressions.

@MarcWort
Author

0554d4e is a bit slower, but not as bad as 97d7109:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 build-rules" && hyperfine "../cue/cue-97d7109 build-rules" && hyperfine "../cue/cue-f6e49bd build-rules" && hyperfine "../cue/cue-0554d4e build-rules"
Benchmark 1: ../cue/cue-33b7393 build-rules
  Time (mean ± σ):     238.8 ms ±   9.9 ms    [User: 382.5 ms, System: 19.5 ms]
  Range (min … max):   226.6 ms … 258.9 ms    12 runs
 
Benchmark 1: ../cue/cue-97d7109 build-rules
  Time (mean ± σ):      5.774 s ±  0.448 s    [User: 8.408 s, System: 0.123 s]
  Range (min … max):    5.279 s …  6.421 s    10 runs
 
Benchmark 1: ../cue/cue-f6e49bd build-rules
  Time (mean ± σ):     272.4 ms ±   8.1 ms    [User: 434.0 ms, System: 24.8 ms]
  Range (min … max):   261.6 ms … 284.6 ms    10 runs
 
Benchmark 1: ../cue/cue-0554d4e build-rules
  Time (mean ± σ):     824.0 ms ±  33.7 ms    [User: 1204.3 ms, System: 30.2 ms]
  Range (min … max):   788.9 ms … 904.5 ms    10 runs

When using export, all of them are fast:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-33b7393 export rules/rule.cue --out yaml" && hyperfine "../cue/cue-97d7109 export rules/rule.cue --out yaml" && hyperfine "../cue/cue-f6e49bd export rules/rule.cue --out yaml" && hyperfine "../cue/cue-0554d4e export rules/rule.cue --out yaml"
Benchmark 1: ../cue/cue-33b7393 export rules/rule.cue --out yaml
  Time (mean ± σ):     133.9 ms ±   7.6 ms    [User: 207.6 ms, System: 15.2 ms]
  Range (min … max):   121.0 ms … 151.2 ms    21 runs
 
Benchmark 1: ../cue/cue-97d7109 export rules/rule.cue --out yaml
  Time (mean ± σ):     136.5 ms ±   6.5 ms    [User: 208.3 ms, System: 15.7 ms]
  Range (min … max):   123.1 ms … 148.9 ms    22 runs
 
Benchmark 1: ../cue/cue-f6e49bd export rules/rule.cue --out yaml
  Time (mean ± σ):     133.8 ms ±   6.0 ms    [User: 205.0 ms, System: 14.2 ms]
  Range (min … max):   120.8 ms … 150.8 ms    22 runs
 
Benchmark 1: ../cue/cue-0554d4e export rules/rule.cue --out yaml
  Time (mean ± σ):     128.0 ms ±   6.4 ms    [User: 201.6 ms, System: 12.5 ms]
  Range (min … max):   119.8 ms … 141.5 ms    23 runs

@mvdan removed the Triage label Aug 23, 2023
@mvdan
Member

mvdan commented Aug 23, 2023

We are currently focusing on v0.7, whose main goal is performance improvements. Marcel already has some big changes coming up in that direction, so I'm inclined to let them hit master first and then see where we stand with your benchmark.

@MarcWort
Author

Sure, let me know if there is an early dev branch to test. I'm willing to help.

@myitcv changed the title from "Performance regression on 0.6.0" to "evaluator: performance regression on 0.6.0" Aug 23, 2023
@MarcWort
Author

MarcWort commented Dec 4, 2023

I know that 0.7.0 has no performance improvements, but just to keep you updated, here is a performance test on the current 0.7.0-rc.1 tag:

➜  example-cue-gh git:(main) hyperfine "../cue/cue-b36b477 build-rules" 
Benchmark 1: ../cue/cue-b36b477 build-rules
  Time (mean ± σ):      6.676 s ±  0.657 s    [User: 9.428 s, System: 0.120 s]
  Range (min … max):    5.711 s …  7.553 s    10 runs
 
➜  example-cue-gh git:(main) hyperfine "../cue/cue-97d7109 build-rules" 
Benchmark 1: ../cue/cue-97d7109 build-rules
  Time (mean ± σ):      5.715 s ±  0.417 s    [User: 8.266 s, System: 0.111 s]
  Range (min … max):    5.401 s …  6.477 s    10 runs

@mvdan
Member

mvdan commented Jul 29, 2024

I'm looking at this again after @mxey provided some more information on Slack and also started helping with the debugging. Thank you again @MarcWort for the information above, and apologies that I hadn't had the time to come back to this before.

Starting from your helpful example at https://github.com/MarcWort/example-cue, I've been able to build a worst-case reproducer:

# run one command
exec cue cmd test

-- build_tool.cue --
package p

import (
	"list"
	"tool/cli"
)

command: test: cli.Print & { text: "\(len(list.FlattenN(indir1, 10)))" }

indir1: [if len(indir2) > 0 { indir2 }]
indir2: [if len(indir3) > 0 { indir3 }]
indir3: [if len(indir4) > 0 { indir4 }]
indir4: indir5
indir5: indir6
indir6: indir7
indir7: [for i in list.Range(1,20,4) {
	#pair & {
		one: n: 0
		two: n: i
	}
}]

#pair: {
	one: #unit & {
		n: <2
	}
	two: #unit & {
		n: >=one.n
		if n == one.n {
			n: <= one.n
		}
	}
	if two.n > one.n {
		one: n: 0
	}
}

#unit: {
	n: int
	#padding: {in: <10, out: "0\(in)"} | {in: >=10, out: "\(in)"}
	str: (#padding & {in: n}).out
}

With CUE as of master (3f84ba6) this takes about 7s:

# run one command (7.534s)
> exec cue cmd test
[stdout]
5
PASS

@mxey looked at https://review.gerrithub.io/c/cue-lang/cue/+/556460 and https://review.gerrithub.io/c/cue-lang/cue/+/557477 and suggested this surgical fix, which does not seem to cause any test failures and speeds up your command quite a lot:

diff --git tools/flow/tasks.go tools/flow/tasks.go
index 019a218f..cf684da5 100644
--- tools/flow/tasks.go
+++ tools/flow/tasks.go
@@ -257,7 +257,7 @@ func (c *Controller) markTaskDependencies(t *Task, n *adt.Vertex) {
 				return nil
 			}
 			c.nodes[d.Node] = cycleMarker
-			d.Recurse()
+			c.markTaskDependencies(t, d.Node)
 			c.nodes[d.Node] = nil
 		}
 		return nil

And it indeed helps with my reproducer massively, by a factor of nearly 100x:

# run one command (0.081s)
> exec cue cmd test
[stdout]
5
PASS

I still can't really tell why there is such a big performance difference between the two very similar versions of the code. Recurse was really only meant to save on allocations, and it wasn't meant to be doing significantly more (or any different) work.

Funnily enough, my reproducer causes a stack overflow with CUE v0.5.0, so I can't really validate whether this is the same regression that you saw with v0.6.0.

@mvdan
Member

mvdan commented Jul 29, 2024

I should note that the reproducer above is akin to a worst-case benchmark for the current code, so we can't really use it as a regression test. We could still add it to the test suite, though, and trust that, if we were to cause a similar slow-down in the future, we would notice by virtue of go test spinning the CPU for much longer than normal.

@mvdan
Member

mvdan commented Jul 29, 2024

Tentative fix with test at https://review.gerrithub.io/c/cue-lang/cue/+/1198566; I will need to talk through it with Marcel to verify it's sane.

@mvdan self-assigned this Jul 30, 2024
cueckoo pushed a commit that referenced this issue Jul 30, 2024
The method calls markTaskDependencies, which in some cases like
the added test would mean more evaluator work,
but those counters were not being added to the task's stats.

The added test would fail before the fix:

    stats: task totals different from controller:
    [...]
    task totals - controller totals:
    Leaks:  -1
    Freed:  -5
    Reused: -3
    Allocs: -3
    Retain: -5

    Unifications: -6
    Conjuncts:    -5
    Disjuncts:    -9

For #2559.

Signed-off-by: Daniel Martí <[email protected]>
Change-Id: I0e7e441b27a3cebad2669fe729cf5805eee019f7
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1198616
Unity-Result: CUE porcuepine <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Reviewed-by: Marcel van Lohuizen <[email protected]>
cueckoo pushed a commit that referenced this issue Jul 30, 2024
The added CUE takes 7s to test with tools/flow or run with `cue cmd`,
as it's adapted from https://github.com/MarcWort/example-cue
to show the worst case scenario of tools/flow being slow by nesting
many references to non-task nodes which do some underlying work.

Thanks to Marc W for providing the original reproducer on GitHub.

For #2559.

Signed-off-by: Daniel Martí <[email protected]>
Change-Id: I25543e7c22b1dda6a8a8528198d2d7b4a63bb411
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1198565
Unity-Result: CUE porcuepine <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Reviewed-by: Roger Peppe <[email protected]>
@mvdan
Member

mvdan commented Jul 30, 2024

This should be resolved now in master, as well as in the upcoming alpha release. Thanks again!

@andrewjeffree

@mvdan is there any chance this fix might be cherry-picked back into the 0.9 releases and result in a 0.9.3 release?
