Skip to content
This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

internal/eval: performance issue with recursive definitions #803

Closed
verdverm opened this issue Mar 1, 2021 · 8 comments
Closed

internal/eval: performance issue with recursive definitions #803

verdverm opened this issue Mar 1, 2021 · 8 comments

Comments

@verdverm
Copy link
Contributor

verdverm commented Mar 1, 2021

What version of CUE are you using (cue version)?

beta.5

What did you do?

Trying to work with Cue from the Go API. The following is extracted from a much more complex application, but captures the essence of what is happening. Some cue is loaded, some values are extracted and merged. There are notes below the txtar on how to change just a few lines to vary the performance and time to run.

exec go run main.go

-- main.go --
package main

import (
        "fmt"

        "cuelang.org/go/cue"
        "cuelang.org/go/cue/load"
)

func main() {
        // We need a Cue.Runtime, the zero value is ready to use
        var RT cue.Runtime

        // The entrypoints are the same as the files you'd specify at the command line
        entrypoints := []string{"repro.cue"}

        // Load Cue files into Cue build.Instances slice
        // the second arg is a configuration object, we'll see this later
        bis := load.Instances(entrypoints, nil)

        var (
                I *cue.Instance
                V cue.Value
                err error
        )


        // Loop over the instances, checking for errors and printing
        for _, bi := range bis {
                // check for errors on the instance
                // these are typically parsing errors
                if bi.Err != nil {
                        fmt.Println("Error during load:", bi.Err)
                        continue
                }

                // Use cue.Runtime to build.Instance to cue.INstance
                I, err = RT.Build(bi)
                if err != nil {
                        fmt.Println("Error during build:", bi.Err)
                        continue
                }

                // get the root value and print it
                V = I.Value()
                // fmt.Println("root value:", V)

                // Validate the value
                err = V.Validate()
                if err != nil {
                        fmt.Println("Error during validate:", err)
                        continue
                }
        }

        empty, err := RT.Compile("", "")
        if err != nil {
                panic(err)
        }
        // fmt.Println("empty:", empty.Value())

        base := I.Lookup("base")
        // fmt.Println("base:", base)

        input := I.Lookup("input")
        // fmt.Println("input:", input)

        output := I.Lookup("output")
        // fmt.Println("output:", output)

        // fmt.Println("===============")

        merged := empty

        fmt.Println("merge.base")
        merged, err = merged.Fill(base)
        if err != nil {
                panic(err)
        }

        fmt.Println("merge.input")
        merged, err = merged.Fill(input)
        if err != nil {
                panic(err)
        }

        fmt.Println("merge.output")
        merged, err = merged.Fill(output)
        if err != nil {
                panic(err)
        }

        fmt.Println("merge.value")
        final := merged.Value()

        fmt.Println("merge.final")
        fmt.Println(final)

}

-- repro.cue

package repro

base: {
        repository: #Dir

        build: #Build & {
                source:   repository
                packages: "./cmd"
                output:   "/usr/local/bin/cmd"
        }
        help: {
                steps: [#Load & {
                        from: build
                }, #Exec & {
                        args: ["cmd", "-h"]
                }]
        }
}

input: {
        repository: {
                steps: [{
                        // do:  "local"
                        dir: "."
                        include: []
                }]
        }
}

output: {
        help: {
                steps: [#Load & {
                        from: build
                }, #Exec & {
                        args: ["cmd", "-h"]
                }]
        }

        build: {
          version: *"1.16" | string
                source: {
                        steps: [{
                                // do:  "local"
                                dir: "."
                                include: []
                        }]
                }

                // Packages to build
                packages: "./cmd"

                // Specify the targeted binary name
                output: "/usr/local/bin/cmd"
                env: [string]: string
                steps: [#Copy & {
                        from: #Go & {
                                version: version
                                "source":  source
                                "env":     env
                        }
                        src:  output
                        dest: output
                }]
        }
        repository: {
                steps: [{
                        // do:  "local"
                        dir: "."
                        include: []
                }]
        }
}

#Dir: steps: [...#Op]

// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load

#Local: {
        // do:      "local"
        dir:     string
        include: [...string] | *[]
}

#Load: {
        // do:   "load"
        from: _
}

#Exec: {
        // do: "exec"
        args: [...string]
        env?: [string]: string
        always?: true | *false
        dir:     string | *"/"
        mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}

#Fetch: {
        // do:  "fetch-container"
        ref: string
}

#Copy: {
        // do:   "copy"
        from: _
        src:  string | *"/"
        dest: string | *"/"
}

#Go: {
        // Go version to use
        version: *"1.16" | string

        // Arguments to the Go binary
        args: [...string]

        // Source Directory to build
        source: #Dir

        // Environment variables
        env: [string]: string

        steps: [
                #Fetch & {
                        ref: "docker.io/golang:\(version)-alpine"
                },
                #Exec & {
                        "args": ["go"] + args

                        "env": env
                        env: CGO_ENABLED: "0"

                        dir: "/src"
                        mount: "/src": from: source

                        mount: "/root/.cache": "cache"
                },
        ]
}

#Build: {
        // Go version to use
        version: *#Go.version | string

        // Source Directory to build
        source: #Dir

        // Packages to build
        packages: *"." | string

        // Specify the targeted binary name
        output: string

        env: [string]: string

        steps: [
                #Copy & {
                        from: #Go & {
                                "version": version
                                "source":  source
                                "env":     env
                                args: ["build"]
                        }
                        src:  output
                        dest: output
                },
        ]
}

#Test: {
        // Go version to use
        version: *#Go.version | string

        // Source Directory to build
        source: #Dir

        // Packages to test
        packages: *"." | string

        #Go & {
                "version": version
                "source":  source
                args: ["test", "-v", packages]
        }
}

notes:

  • cue eval repro.cue has no issue finishing in 0.1s
  • uncommenting the // do: <...> reduces runtime to a few seconds, but still much slower
  • making steps: [] (an empty list) in #Build removes the runtime issue all together.

would love some help reducing this repro further

What did you expect to see?

consistent performance regardless of the line changes

What did you see instead?

CPU pinned to 100% and growing memory (via htop), program "hangs" at the merged.Value() line.

delve has show some very strange slice sizes for internal slices (in the original), slices with lengths that would make their mem footprint in the Petabytes! Lots of time is seemingly spent in Go's garbage collector.

@myitcv
Copy link
Contributor

myitcv commented Mar 4, 2021

Thanks for the repro, @verdverm.

Unfortunately like #808 the reproducer is incomplete (indeed the txtar archive is invalid). If I could gently reiterate the suggestions in that issue to test out the repros as you're creating the issue - it means we can jump straight to looking at the problem, and don't have to second guess what was intended.

(In this case it's probably not worth causing the program to fail in the error case - the fact that it takes a noticeable length of time to complete is sufficient).

Here is my version of what I believe your repro was trying to show, which I think demonstrates the slowdown you're experiencing:

exec go mod tidy
exec go run main.go

-- go.mod --
module blah.com

go 1.16

require cuelang.org/go v0.3.0-beta.5
-- main.go --
package main

import (
        "fmt"

        "cuelang.org/go/cue"
        "cuelang.org/go/cue/load"
)

func main() {
        // We need a Cue.Runtime, the zero value is ready to use
        var RT cue.Runtime

        // The entrypoints are the same as the files you'd specify at the command line
        entrypoints := []string{"repro.cue"}

        // Load Cue files into Cue build.Instances slice
        // the second arg is a configuration object, we'll see this later
        bis := load.Instances(entrypoints, nil)

        var (
                I *cue.Instance
                V cue.Value
                err error
        )


        // Loop over the instances, checking for errors and printing
        for _, bi := range bis {
                // check for errors on the instance
                // these are typically parsing errors
                if bi.Err != nil {
                        fmt.Println("Error during load:", bi.Err)
                        continue
                }

                // Use cue.Runtime to build.Instance to cue.INstance
                I, err = RT.Build(bi)
                if err != nil {
                        fmt.Println("Error during build:", bi.Err)
                        continue
                }

                // get the root value and print it
                V = I.Value()
                // fmt.Println("root value:", V)

                // Validate the value
                err = V.Validate()
                if err != nil {
                        fmt.Println("Error during validate:", err)
                        continue
                }
        }

        empty, err := RT.Compile("", "")
        if err != nil {
                panic(err)
        }
        // fmt.Println("empty:", empty.Value())

        base := I.Lookup("base")
        // fmt.Println("base:", base)

        input := I.Lookup("input")
        // fmt.Println("input:", input)

        output := I.Lookup("output")
        // fmt.Println("output:", output)

        // fmt.Println("===============")

        merged := empty

        fmt.Println("merge.base")
        merged, err = merged.Fill(base)
        if err != nil {
                panic(err)
        }

        fmt.Println("merge.input")
        merged, err = merged.Fill(input)
        if err != nil {
                panic(err)
        }

        fmt.Println("merge.output")
        merged, err = merged.Fill(output)
        if err != nil {
                panic(err)
        }

        fmt.Println("merge.value")
        final := merged.Value()

        fmt.Println("merge.final")
        fmt.Println(final)

}

-- repro.cue --

package repro

base: {
        repository: #Dir

        build: #Build & {
                source:   repository
                packages: "./cmd"
                output:   "/usr/local/bin/cmd"
        }
        help: {
                steps: [#Load & {
                        from: build
                }, #Exec & {
                        args: ["cmd", "-h"]
                }]
        }
}

input: {
        repository: {
                steps: [{
                        // do:  "local"
                        dir: "."
                        include: []
                }]
        }
}

output: {
        help: {
                steps: [#Load & {
                        from: build
                }, #Exec & {
                        args: ["cmd", "-h"]
                }]
        }

        build: {
          version: *"1.16" | string
                source: {
                        steps: [{
                                // do:  "local"
                                dir: "."
                                include: []
                        }]
                }

                // Packages to build
                packages: "./cmd"

                // Specify the targeted binary name
                output: "/usr/local/bin/cmd"
                env: [string]: string
                steps: [#Copy & {
                        from: #Go & {
                                version: version
                                "source":  source
                                "env":     env
                        }
                        src:  output
                        dest: output
                }]
        }
        repository: {
                steps: [{
                        // do:  "local"
                        dir: "."
                        include: []
                }]
        }
}

#Dir: steps: [...#Op]

// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load

#Local: {
        // do:      "local"
        dir:     string
        include: [...string] | *[]
}

#Load: {
        // do:   "load"
        from: _
}

#Exec: {
        // do: "exec"
        args: [...string]
        env?: [string]: string
        always?: true | *false
        dir:     string | *"/"
        mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}

#Fetch: {
        // do:  "fetch-container"
        ref: string
}

#Copy: {
        // do:   "copy"
        from: _
        src:  string | *"/"
        dest: string | *"/"
}

#Go: {
        // Go version to use
        version: *"1.16" | string

        // Arguments to the Go binary
        args: [...string]

        // Source Directory to build
        source: #Dir

        // Environment variables
        env: [string]: string

        steps: [
                #Fetch & {
                        ref: "docker.io/golang:\(version)-alpine"
                },
                #Exec & {
                        "args": ["go"] + args

                        "env": env
                        env: CGO_ENABLED: "0"

                        dir: "/src"
                        mount: "/src": from: source

                        mount: "/root/.cache": "cache"
                },
        ]
}

#Build: {
        // Go version to use
        version: *#Go.version | string

        // Source Directory to build
        source: #Dir

        // Packages to build
        packages: *"." | string

        // Specify the targeted binary name
        output: string

        env: [string]: string

        steps: [
                #Copy & {
                        from: #Go & {
                                "version": version
                                "source":  source
                                "env":     env
                                args: ["build"]
                        }
                        src:  output
                        dest: output
                },
        ]
}

#Test: {
        // Go version to use
        version: *#Go.version | string

        // Source Directory to build
        source: #Dir

        // Packages to test
        packages: *"." | string

        #Go & {
                "version": version
                "source":  source
                args: ["test", "-v", packages]
        }
}

Will work with @mpvl on investigating/reducing this today.

@myitcv myitcv changed the title API: performance issue, only shows when running from Go program cue: performance issue finalising Fill-ed value Mar 4, 2021
@myitcv
Copy link
Contributor

myitcv commented Mar 4, 2021

Analysis: this doesn't appear to be specific to the cue API.

cue eval repro.cue has no issue finishing in 0.1s

With repro.cue file stands you're not comparing apples with apples. In the Go code you're effectively unifying {} & base & input & output. If you create a top-level field with that value then the issue can be reproduced with cue eval:

exec cue eval

-- repro.cue --
package repro

x: {} & base & input & output

base: {
	repository: #Dir

	build: #Build & {
		source:   repository
		packages: "./cmd"
		output:   "/usr/local/bin/cmd"
	}
	help: {
		steps: [#Load & {
			from: build
		}, #Exec & {
			args: ["cmd", "-h"]
		}]
	}
}

input: {
	repository: {
		steps: [{
			// do:  "local"
			dir: "."
			include: []
		}]
	}
}

output: {
	help: {
		steps: [#Load & {
			from: build
		}, #Exec & {
			args: ["cmd", "-h"]
		}]
	}

	build: {
		version: *"1.16" | string
		source: {
			steps: [{
				// do:  "local"
				dir: "."
				include: []
			}]
		}

		// Packages to build
		packages: "./cmd"

		// Specify the targeted binary name
		output: "/usr/local/bin/cmd"
		env: [string]: string
		steps: [#Copy & {
			from: #Go & {
				version:  version
				"source": source
				"env":    env
			}
			src:  output
			dest: output
		}]
	}
	repository: {
		steps: [{
			// do:  "local"
			dir: "."
			include: []
		}]
	}
}

#Dir: steps: [...#Op]

// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load

#Local: {
	do:      "local"
	dir:     string
	include: [...string] | *[]
}

#Load: {
	do:   "load"
	from: _
}

#Exec: {
	do: "exec"
	args: [...string]
	env?: [string]: string
	always?: true | *false
	dir:     string | *"/"
	mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}

#Fetch: {
	do:  "fetch-container"
	ref: string
}

#Copy: {
	do:   "copy"
	from: _
	src:  string | *"/"
	dest: string | *"/"
}

#Go: {
	// Go version to use
	version: *"1.16" | string

	// Arguments to the Go binary
	args: [...string]

	// Source Directory to build
	source: #Dir

	// Environment variables
	env: [string]: string

	steps: [
		#Fetch & {
			ref: "docker.io/golang:\(version)-alpine"
		},
		#Exec & {
			"args": ["go"] + args

			"env": env
			env: CGO_ENABLED: "0"

			dir: "/src"
			mount: "/src": from: source

			mount: "/root/.cache": "cache"
		},
	]
}

#Build: {
	// Go version to use
	version: *#Go.version | string

	// Source Directory to build
	source: #Dir

	// Packages to build
	packages: *"." | string

	// Specify the targeted binary name
	output: string

	env: [string]: string

	steps: [
		#Copy & {
			from: #Go & {
				"version": version
				"source":  source
				"env":     env
				args: ["build"]
			}
			src:  output
			dest: output
		},
	]
}

#Test: {
	// Go version to use
	version: *#Go.version | string

	// Source Directory to build
	source: #Dir

	// Packages to test
	packages: *"." | string

	#Go & {
		"version": version
		"source":  source
		args: ["test", "-v", packages]
	}
}

Now working to reproduce this example.

@myitcv myitcv changed the title cue: performance issue finalising Fill-ed value internal/eval: performance issue with recursive definitions Mar 4, 2021
@myitcv
Copy link
Contributor

myitcv commented Mar 4, 2021

Further analysis.

In the following repro, I have created a top-level x field, that unifies base and output:

# Base version
exec cue eval

-- repro.cue --
package repro

x: base & output

base: {
	repository: #Dir

	build: #Build & {
		source:   repository
		packages: "./cmd"
		output:   "/usr/local/bin/cmd"
	}
	help: {
		steps: [#Load & {
			from: build
		}]
	}
}

output: {
	help: {
		steps: [#Load & {
			from: build
		}]
	}

	build: {
		version: *"1.16" | string
		source: {
			steps: [{
				do:  "local"
				dir: "."
				include: []
			}]
		}

		// Packages to build
		packages: "./cmd"

		// Specify the targeted binary name
		output: "/usr/local/bin/cmd"
		env: [string]: string
		steps: [#Copy & {
			from: #Go & {
				version:  version
				"source": source
				"env":    env
			}
			src:  output
			dest: output
		}]
	}
	repository: {
		steps: [{
			do:  "local"
			dir: "."
			include: []
		}]
	}
}

#Dir: steps: [...#Op]

// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load

#Local: {
	do:      "local"
	dir:     string
	include: [...string] | *[]
}

#Load: {
	do:   "load"
	from: _
}

#Exec: {
	do: "exec"
	args: [...string]
	env?: [string]: string
	always?: true | *false
	dir:     string | *"/"
	mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}

#Fetch: {
	do:  "fetch-container"
	ref: string
}

#Copy: {
	do:   "copy"
	from: _
	src:  string | *"/"
	dest: string | *"/"
}

#Go: {
	// Go version to use
	version: *"1.16" | string

	// Arguments to the Go binary
	args: [...string]

	// Source Directory to build
	source: #Dir

	// Environment variables
	env: [string]: string

	steps: [
		#Fetch & {
			ref: "docker.io/golang:\(version)-alpine"
		},
		#Exec & {
			"args": ["go"] + args

			"env": env
			env: CGO_ENABLED: "0"

			dir: "/src"
			mount: "/src": from: source

			mount: "/root/.cache": "cache"
		},
	]
}

#Build: {
	// Go version to use
	version: *#Go.version | string

	// Source Directory to build
	source: #Dir

	// Packages to build
	packages: *"." | string

	// Specify the targeted binary name
	output: string

	env: [string]: string

	steps: [
		#Copy & {
			from: #Go & {
				"version": version
				"source":  source
				"env":     env
				args: ["build"]
			}
			src:  output
			dest: output
		},
	]
}

#Test: {
	// Go version to use
	version: *#Go.version | string

	// Source Directory to build
	source: #Dir

	// Packages to test
	packages: *"." | string

	#Go & {
		"version": version
		"source":  source
		args: ["test", "-v", packages]
	}
}

This is sufficient to reproduce the slow evaluation. What's interesting, is that moving all of the fields from output into base causes the problem to go away, despite the fact that logically the resulting x will be the same:

package repro

x: base & output

base: {
	repository: #Dir

	build: #Build & {
		source:   repository
		packages: "./cmd"
		output:   "/usr/local/bin/cmd"
	}
	help: {
		steps: [#Load & {
			from: build
		}]
	}
	help: {
		steps: [#Load & {
			from: build
		}]
	}

	build: {
		version: *"1.16" | string
		source: {
			steps: [{
				do:  "local"
				dir: "."
				include: []
			}]
		}

		// Packages to build
		packages: "./cmd"

		// Specify the targeted binary name
		output: "/usr/local/bin/cmd"
		env: [string]: string
		steps: [#Copy & {
			from: #Go & {
				version:  version
				"source": source
				"env":    env
			}
			src:  output
			dest: output
		}]
	}
	repository: {
		steps: [{
			do:  "local"
			dir: "."
			include: []
		}]
	}
}

output: {
}

#Dir: steps: [...#Op]

// One operation in a script
#Op: #Fetch | #Exec | #Local | #Copy | #Load

#Local: {
	do:      "local"
	dir:     string
	include: [...string] | *[]
}

#Load: {
	do:   "load"
	from: _
}

#Exec: {
	do: "exec"
	args: [...string]
	env?: [string]: string
	always?: true | *false
	dir:     string | *"/"
	mount: [string]: "tmp" | "cache" | {from: _, path: string | *"/"}
}

#Fetch: {
	do:  "fetch-container"
	ref: string
}

#Copy: {
	do:   "copy"
	from: _
	src:  string | *"/"
	dest: string | *"/"
}

#Go: {
	// Go version to use
	version: *"1.16" | string

	// Arguments to the Go binary
	args: [...string]

	// Source Directory to build
	source: #Dir

	// Environment variables
	env: [string]: string

	steps: [
		#Fetch & {
			ref: "docker.io/golang:\(version)-alpine"
		},
		#Exec & {
			"args": ["go"] + args

			"env": env
			env: CGO_ENABLED: "0"

			dir: "/src"
			mount: "/src": from: source

			mount: "/root/.cache": "cache"
		},
	]
}

#Build: {
	// Go version to use
	version: *#Go.version | string

	// Source Directory to build
	source: #Dir

	// Packages to build
	packages: *"." | string

	// Specify the targeted binary name
	output: string

	env: [string]: string

	steps: [
		#Copy & {
			from: #Go & {
				"version": version
				"source":  source
				"env":     env
				args: ["build"]
			}
			src:  output
			dest: output
		},
	]
}

#Test: {
	// Go version to use
	version: *#Go.version | string

	// Source Directory to build
	source: #Dir

	// Packages to test
	packages: *"." | string

	#Go & {
		"version": version
		"source":  source
		args: ["test", "-v", packages]
	}
}

@mpvl perhaps this gives a sufficient steer as to where the problem might lie?

myitcv added a commit to myitcvscratch/20210305_tony that referenced this issue Mar 5, 2021
@myitcv
Copy link
Contributor

myitcv commented Mar 6, 2021

Further analysis.

TL;DR - there is a CUE problem here; we have a couple of options for fixing it.

  1. I've established what I think is a base case for performance and correctness here, by moving all the fields/constraints involved into a single struct. Evaluation time: 0.072s. Link
  2. Using that as a frame of reference I then performed a number of experiments which are captured in the https://github.com/myitcvscratch/20210305_tony repo as commits on the main branch. The following bullet points discuss those various experiments, with evaluation times, diffs of the x field with respect to the reference value, and links to the relevant diff/code.
  3. For completeness, I removed the discriminator field do. I don't think this is sensible comparison because nobody is advocating we go down this path; if anything CUE is going to be encouraging people more strongly to add discriminator fields to help with performance. Note the diff in this case is the "wrong way around" - the change to remove the do fields is on the LHS, as are the timings and any diffs with respect to the reference. Evaluation time: 1m52secs. Link
  4. If now we change x from being a single struct value to the unification of two other struct values, base and output, with base containing all the fields/constraints that were previously in x, and output empty, the evaluation time jumps ~40%. This should not be happening, because logically the value of x remains unchanged in this scenario. Evaluation time: 0.111s. Link
  5. If we go one step further and duplicate the declaration of the help field within output, then the evaluation time leaps to ~20s. This appears to be the main trigger of the performance blow up. Evaluation time: 20s. Link
  6. If we instead duplicate the help declaration within the same struct then the performance is far better. Evaluation time: 0.1s. Link
  7. If we inline the declaration of repository where it is referenced by build the performance problem also largely disappears. Evaluation time: 0.25s. Link
  8. If we remove the redundant specification of steps in build then the problem entirely disappears, regardless of the split unification. Evaluation time: 0.06s. Link

The observation in point 5 is, I think, the main trigger in this instance. Despite the fact that we are adding an identical declaration of help to the output struct, the evaluation time jumps. There is some intuition as to why this might be an issue: the context of help and build are different within the two conjuncts, base and output. Marcel has a hunch as to what might be going wrong here too (will let him comment separately).

What's interesting to observe is that specific conditions are required to trigger the problem, as demonstrated by points 6, 7 and 8. i.e. a slight tweak and the problem can "disappear".

Therefore, we've established case 5 as the current best repro of the problem, i.e. the error case (diff with respect to the base case).

For completeness I did a quick analysis over time to see whether there were any commits that stood out:

  • v0.3.0-alpha6 took 0.2s, but the changes to handling of disjunction and closedness since then have been significant
  • a086f74 (immediately prior to v0.3.0-beta.2) introduced a severe memory problem, a problem that was fixed in 0e401d6 (v0.3.0-beta.3), so during this time we can't infer anything
  • immediately prior to the memory problem evaluation time was ~3s
  • immediately after the memory problem the evaluation time was ~20s

Marcel will be able to say more precisely, but I'm not convinced there is much benefit to trying to find a single commit as the root cause here. We have some pretty good ideas on the root cause.

To summarise:

  • there is an underlying CUE issue here
  • we have a good steer on what that might be, with a couple of options on how to fix it (again, will let Marcel say more on that)

I also looked to see whether there is a more immediate term fix we employ here whilst the proper fix (in whatever form that takes) is being worked on. That workaround involved optimising for the fact that the "collected" value is ultimately, for now, only needed in concrete form. However, it looks like that workaround will not, unfortunately, work in this case. The reason being is that the zero value of a list type is the empty list. Hence a round trip to/from a concrete representation is lossy results in @verdverm's original repro failing because we end up trying to unify two fields values, one with a zero-length list (which should in fact be the type) with a non-zero-length list.

@mpvl
Copy link
Contributor

mpvl commented Mar 7, 2021

v0.3.0-alpha6 took 0.2s, but the changes to handling of disjunction and closedness since then have been significant
Also note that the handling of disjunctions was wrong in alpha6. :) So it is not a fair comparison.

Either way, it it does give an indication of what is possible. It tells me that reintroducing some of the techniques of v0.2.2 should be able to get it does to this 0.1s range

Basically, the current approach filters disjunctions looking at struct properties. The theory there was that there is a limited number of syntactic element so that the number of duplicates converges to a small number quickly, stoping exponential growth, and that the check itself is quick.

This turned out to be a flawed assumption: conjuncts need to be uniqued not only by syntactical element, but also by evaluator, of which there is an unbounded number. So nice try, but it doesn't work.

Rather than refining the duplicate elimination model, the plan is to just brute force the comparison. This adds a constant overhead, but that beats the hell out of exponential slowdown. Also, some of the tricks now can be used to short-circuit the comparison when possible, gaining back some of the speed.

Brute forcing the comparison is a bit involved: v0.3 does not compute arcs for optional fields. This would then be necessary. This chance will also help with the required field and several other features, though.

@mpvl
Copy link
Contributor

mpvl commented Mar 7, 2021

Possible workaround:

Note that

include: [...string] | *[]

is equivalent to

include: [...string]

as the default value of [...string] is []

If I make that substitution, it runs well below a second.

Of course there is no excuse for CUE to become so slow if the more verbose form is used, but hopefully this will help for now.

@mpvl
Copy link
Contributor

mpvl commented Mar 7, 2021

To confirm the above hypothesis, this is a small reproducer:

x: #A & #B & #C
#A: #C // same as #C, the references forces a new closedInfo and Environment
#B: #C // same here, repeat pattern to make disjunction in `x` grow exponentially
#C: [...string] | *[]

Because each conjunct of #C represents a new copy (because the Environment is different), and because the ...string in list counts as an optional field, the Equality check is skipped.

Including optional fields in the equality check (as was done in v0.2, well that used subsumption, but same difference), would address this issue. Note that equality cannot be implemented accurately for bulk optional fields. But it can at least be implemented if the bulk optional field originated from the same struct. So by doing so, we achieve the dedup goal of the original v0.3 implementation, because it effectively takes the Environment factor out of the equation.

Note that in this case it would be sufficient: if we rewrite x as x: #C & #C & #C, performance is great. This is effectively the performance one would get with the proposed technique (minus the constant overhead).

A completely different approach would be to detect equivalence of Environments. This is complicated though, and is a less general mechanism.

@cueckoo
Copy link

cueckoo commented Jul 3, 2021

This issue has been migrated to cue-lang/cue#803.

For more details about CUE's migration to a new home, please see cue-lang/cue#1078.

@cueckoo cueckoo closed this as completed Jul 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants