cue/load: implement shared syntax cache
We define a new optional interface, `module.ReadCUEFS`, which an
`io/fs.FS` implementation can satisfy to let the `modpkgload` code (or,
more specifically, the `modimports` code) access the cached syntax
maintained by `cue/load`.

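We then implement that interface in the `fs.FS` implementation
provided by `cue/load`, so that `cue/load` and the module dependency
calculation code share a single cache. `fileSystem.ioFS` now always
returns that implementation, rather than falling back to
`module.OSDirFS` when there is no overlay for the root directory.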

One slightly adverse implication is that, when invoked from
`cue/load`, the dependency code now reads and parses entire CUE files
rather than just their import declarations. This should not have a
large impact in practice: those CUE files are usually included in the
`build.Instance` and so need to be fully parsed anyway. If it does turn
out to be a problem, we could address it with a somewhat more
sophisticated cache in the future.

To verify that the caching works as expected, I ran the following
testscript (with `testscript -v`). It creates a large module with many
instances spanning multiple directories (5,555 CUE files across 556
directories) and then times `cue fmt --check ./...` over it.

    exec go run writemodule.go
    exec sh -c 'time cue fmt --check ./...'

    -- go.mod --
    module test
    go 1.22
    -- writemodule.go --
    package main

    import (
        "fmt"
        "log"
        "os"
        "path/filepath"
        "strings"
    )

    var modDir = os.Getenv("WORK")

    func main() {
        if err := os.MkdirAll(modDir, 0o777); err != nil {
            log.Fatal(err)
        }
        for i := range 5 {
            writePackage(fmt.Sprintf("x%d.cue", i))
            for j := range 10 {
                writePackage(fmt.Sprintf("d%d/x%d.cue", i, j))
                for k := range 10 {
                    writePackage(fmt.Sprintf("d%d/d%d/x%d.cue", i, j, k))
                    for l := range 10 {
                        writePackage(fmt.Sprintf("d%d/d%d/d%d/x%d.cue", i, j, k, l))
                    }
                }
            }
        }
        writeFile("cue.mod/module.cue", `
    module: "test.example"
    language: version: "v0.9.2"
    `)
    }

    const fillerLines = 100

    var filler = func() string {
        var buf strings.Builder
        for i := range fillerLines {
            fmt.Fprintf(&buf, "\n_d%d: true\n", i)
        }
        return buf.String()
    }()

    func writePackage(name string) {
        contents := fmt.Sprintf(`
    package x

    value: %q: true
    `[1:], name)
        writeFile(name, contents+filler)
    }

    func writeFile(name string, contents string) {
        name = filepath.Join(modDir, name)
        if err := os.MkdirAll(filepath.Dir(name), 0o777); err != nil {
            log.Fatal(err)
        }
        if err := os.WriteFile(name, []byte(contents), 0o666); err != nil {
            log.Fatal(err)
        }
    }

At the start of this CL chain (commit 27adbac) the result was:

    9.85user 0.51system 0:05.40elapsed 191%CPU (0avgtext+0avgdata 436104maxresident)k
    0inputs+0outputs (0major+122110minor)pagefaults 0swaps

As of this CL, I get this:

    6.22user 0.20system 0:03.24elapsed 198%CPU (0avgtext+0avgdata 476548maxresident)k
    0inputs+0outputs (0major+142320minor)pagefaults 0swaps

This is a 40% reduction in elapsed time (5.40s down to 3.24s),
indicating that the caching is working as expected. There is still more
to be done: we do not yet cache the results of directory reads, which
should provide a further speedup in a subsequent CL.
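As a cross-check, the relative changes can be computed directly from the two GNU `time` reports above. This small Go program is only an arithmetic aid; `pctChange` is a hypothetical helper.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Numbers taken from the two GNU time reports above.
	fmt.Printf("elapsed:     %+.1f%%\n", pctChange(5.40, 3.24))
	fmt.Printf("user CPU:    %+.1f%%\n", pctChange(9.85, 6.22))
	fmt.Printf("max RSS (k): %+.1f%%\n", pctChange(436104, 476548))
}
```

It prints `-40.0%` for elapsed time, `-36.9%` for user CPU time and `+9.3%` for peak resident memory, so the speedup comes with a modest increase in memory use.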

Fixes #3177.

Signed-off-by: Roger Peppe <[email protected]>
Change-Id: I1f8328df09b5b6e457b202de2337d2dac2be1d19
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1197531
Reviewed-by: Paul Jolly <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Unity-Result: CUE porcuepine <[email protected]>
rogpeppe committed Jul 11, 2024
1 parent afa222f commit fc16ef8
Showing 3 changed files with 101 additions and 23 deletions.
53 changes: 48 additions & 5 deletions cue/load/fs.go
@@ -24,6 +24,7 @@ import (
 	"path/filepath"
 	"slices"
 	"strings"
+	"sync"
 	"time"
 
 	"cuelang.org/go/cue"
@@ -86,10 +87,6 @@ func (fs *fileSystem) getDir(dir string, create bool) map[string]*overlayFile {
 // paths required by most of the `cue/load` package
 // implementation.
 func (fs *fileSystem) ioFS(root string) iofs.FS {
-	dir := fs.getDir(root, false)
-	if dir == nil {
-		return module.OSDirFS(root)
-	}
 	return &ioFS{
 		fs: fs,
 		root: root,
@@ -342,6 +339,49 @@ func (fs *ioFS) ReadFile(name string) ([]byte, error) {
 	return os.ReadFile(fpath)
 }
 
+var _ module.ReadCUEFS = (*ioFS)(nil)
+
+// ReadCUEFile implements [module.ReadCUEFS] by
+// reading and updating the syntax file cache, which
+// is shared with the cache used by the [fileSystem.getCUESyntax]
+// method.
+func (fs *ioFS) ReadCUEFile(path string) (*ast.File, error) {
+	fpath, err := fs.absPathFromFSPath(path)
+	if err != nil {
+		return nil, err
+	}
+	cache := fs.fs.fileCache
+	cache.mu.Lock()
+	entry, ok := cache.entries[fpath]
+	cache.mu.Unlock()
+	if ok {
+		return entry.file, entry.err
+	}
+	var data []byte
+	if fi := fs.fs.getOverlay(fpath); fi != nil {
+		if fi.file != nil {
+			// No need for a cache if we've got the contents in *ast.File
+			// form already.
+			return fi.file, nil
+		}
+		data = fi.contents
+	} else {
+		data, err = os.ReadFile(fpath)
+		if err != nil {
+			cache.mu.Lock()
+			defer cache.mu.Unlock()
+			cache.entries[fpath] = fileCacheEntry{nil, err}
+			return nil, err
+		}
+	}
+	return fs.fs.getCUESyntax(&build.File{
+		Filename: fpath,
+		Encoding: build.CUE,
+		// Form: build.Schema,
+		Source: data,
+	})
+}
+
 // ioFSFile implements [io/fs.File] for the overlay filesystem.
 type ioFSFile struct {
 	fs *fileSystem
@@ -394,12 +434,14 @@ func (f *ioFSFile) ReadDir(n int) ([]iofs.DirEntry, error) {
 }
 
 func (fs *fileSystem) getCUESyntax(bf *build.File) (*ast.File, error) {
+	fs.fileCache.mu.Lock()
+	defer fs.fileCache.mu.Unlock()
 	if bf.Encoding != build.CUE {
 		panic("getCUESyntax called with non-CUE file encoding")
 	}
 	// When it's a regular CUE file with no funny stuff going on, we
 	// check and update the syntax cache.
-	useCache := bf.Form == build.Schema && bf.Interpretation == ""
+	useCache := bf.Form == "" && bf.Interpretation == ""
 	if useCache {
 		if syntax, ok := fs.fileCache.entries[bf.Filename]; ok {
 			return syntax.file, syntax.err
@@ -431,6 +473,7 @@ func newFileCache(c *Config) *fileCache {
 type fileCache struct {
 	config encoding.Config
 	ctx *cue.Context
+	mu sync.Mutex
 	entries map[string]fileCacheEntry
 }
 
55 changes: 37 additions & 18 deletions internal/mod/modimports/modimports.go
@@ -1,6 +1,7 @@
 package modimports
 
 import (
+	"errors"
 	"fmt"
 	"io/fs"
 	"path"
@@ -214,26 +215,44 @@ func yieldPackageFile(fsys fs.FS, fpath string, selectPackage func(pkgName strin
 	pf := ModuleFile{
 		FilePath: fpath,
 	}
-	f, err := fsys.Open(fpath)
-	if err != nil {
-		return "", yield(pf, err)
+	var syntax *ast.File
+	var err error
+	if cueFS, ok := fsys.(module.ReadCUEFS); ok {
+		// The FS implementation supports reading CUE syntax directly.
+		// A notable FS implementation that does this is the one
+		// provided by cue/load, allowing that package to cache
+		// the parsed CUE.
+		syntax, err = cueFS.ReadCUEFile(fpath)
+		if err != nil && !errors.Is(err, errors.ErrUnsupported) {
+			return "", yield(pf, err)
+		}
 	}
-	defer f.Close()
+	if syntax == nil {
+		// Either the FS doesn't implement [module.ReadCUEFS]
+		// or the ReadCUEFile method returned ErrUnsupported,
+		// so we need to acquire the syntax ourselves.
 
-	// Note that we use cueimports.Read before parser.ParseFile as cue/parser
-	// will always consume the whole input reader, which is often wasteful.
-	//
-	// TODO(mvdan): the need for cueimports.Read can go once cue/parser can work
-	// on a reader in a streaming manner.
-	data, err := cueimports.Read(f)
-	if err != nil {
-		return "", yield(pf, err)
-	}
-	// Add a leading "./" so that a parse error filename is consistent
-	// with the other error filenames created elsewhere in the codebase.
-	syntax, err := parser.ParseFile("./"+fpath, data, parser.ImportsOnly)
-	if err != nil {
-		return "", yield(pf, err)
+		f, err := fsys.Open(fpath)
+		if err != nil {
+			return "", yield(pf, err)
+		}
+		defer f.Close()
+
+		// Note that we use cueimports.Read before parser.ParseFile as cue/parser
+		// will always consume the whole input reader, which is often wasteful.
+		//
+		// TODO(mvdan): the need for cueimports.Read can go once cue/parser can work
+		// on a reader in a streaming manner.
+		data, err := cueimports.Read(f)
+		if err != nil {
+			return "", yield(pf, err)
+		}
+		// Add a leading "./" so that a parse error filename is consistent
+		// with the other error filenames created elsewhere in the codebase.
+		syntax, err = parser.ParseFile("./"+fpath, data, parser.ImportsOnly)
+		if err != nil {
+			return "", yield(pf, err)
+		}
 	}
 
 	if !selectPackage(syntax.PackageName()) {
16 changes: 16 additions & 0 deletions mod/module/dirfs.go
@@ -3,6 +3,8 @@ package module
 import (
 	"io/fs"
 	"os"
+
+	"cuelang.org/go/cue/ast"
 )
 
 // SourceLoc represents the location of some CUE source code.
@@ -13,6 +15,20 @@ type SourceLoc struct {
 	Dir string
 }
 
+// ReadCUEFS can be implemented by an [fs.FS]
+// to provide an optimized (cached) way of
+// reading and parsing CUE syntax.
+type ReadCUEFS interface {
+	fs.FS
+
+	// ReadCUEFile reads CUE syntax from the given path.
+	//
+	// If this method is implemented, but the implementation
+	// does not support reading CUE files,
+	// it should return [errors.ErrUnsupported].
+	ReadCUEFile(path string) (*ast.File, error)
+}
+
 // OSRootFS can be implemented by an [fs.FS]
 // implementation to return its root directory as
 // an OS file path.