This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

Ensure we read from overlay when calling readDir #786

Closed

tonyhb

@tonyhb tonyhb commented Feb 21, 2021

fileSystem.readDir always reads from the actual filesystem, even if
there is an overlay directory present. Read from the overlay directory
in these circumstances; we shouldn't be hitting the disk.

Also, if the root path is "/" we can skip loading and return true. This
allows fully in-memory packages as per
#607.

This helps fix certain aspects of #607, but does not refactor fileSystem
to use any of the newer interfaces that it could adopt.
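
A minimal sketch of the overlay-first lookup described above. The `overlay` type and both helper functions are hypothetical names made up for illustration; they are not CUE's internal `fileSystem` API, and the paths assume a Unix-style separator.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// overlay maps absolute file paths to in-memory contents. It is a
// hypothetical stand-in for this sketch, not CUE's internal fileSystem type.
type overlay map[string]string

// overlayNames returns the sorted names of the direct children of dir
// that the overlay defines. A nested file implies a child directory.
func overlayNames(o overlay, dir string) []string {
	sep := string(filepath.Separator)
	seen := map[string]bool{}
	for p := range o {
		rel, err := filepath.Rel(dir, p)
		if err != nil || rel == "." || rel == ".." || strings.HasPrefix(rel, ".."+sep) {
			continue // p is not below dir
		}
		seen[strings.SplitN(rel, sep, 2)[0]] = true
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// readDirNames prefers the overlay and touches the disk only when the
// overlay has nothing under dir.
func readDirNames(o overlay, dir string) ([]string, error) {
	if names := overlayNames(o, dir); len(names) > 0 {
		return names, nil
	}
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	return names, nil
}

func main() {
	o := overlay{
		"/cue.mod/module.cue": `module: "example.com"`,
		"/x.cue":              "package x\nx: 5",
	}
	names, err := readDirNames(o, "/")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Note that this sketch replaces the disk listing whenever the overlay has entries for a directory, which is the behaviour the PR proposes; it does not merge the two sources.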

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



@google-cla google-cla bot added the cla: no label Feb 21, 2021
@tonyhb tonyhb changed the title Ensure we read from overlay when hitting readDir Ensure we read from overlay when calling readDir Feb 21, 2021
@tonyhb
Author

tonyhb commented Feb 21, 2021

@googlebot I signed it!

@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@tonyhb tonyhb force-pushed the fix/read-overlay-directories-fix-wasm branch from e52901b to 60d8365 Compare February 21, 2021 14:20
@verdverm
Contributor

verdverm commented Mar 1, 2021

How would this impact someone using Go created WASM from a NodeJS script, where the filesystem is available?

I'm not super familiar, but this thought occurred to me

@verdverm
Contributor

verdverm commented Mar 1, 2021

https://github.com/cuelang/cue/tree/master/pkg/path

This may be the preferred way to handle GOOS specific differences

@tonyhb
Author

tonyhb commented Mar 1, 2021

How would this impact someone using Go created WASM from a NodeJS script, where the filesystem is available?

I'm not super familiar, but this thought occurred to me

Oh wow, yeah, that's a great question. I'm not super familiar either, but as far as I know this logic applies:

  • WASM is entirely sandboxed, and so any filesystem inside is memfs / tmpfs.
  • You can't access the real filesystem via wasm just yet
  • If you could, I believe you would have to grant the FS capabilities when creating the WASM VM. This is all theoretical since it doesn't work yet, but I would assume you could choose which files and roots are available.

In this case, maybe we should have documentation that specifies how this would work? I'm not entirely certain that you would be looking to mutate files when running in WASM; it's usually used to ensure you're running in a sandbox specifically so that mutations don't occur.

So, short answer: I think this might be a risk but a very, very slim one, esp. as disk functionality is not supported by WASM yet. What are your thoughts?

RE. GOOS specific differences, thanks for the pointer - will change!

@verdverm
Contributor

verdverm commented Mar 1, 2021

Regarding WASM generally, my hunch is that it becomes a new JVM-like platform with an LLVM flavor. That is to say, you get an intermediate assembly format that has a runtime for different platforms, and you get N language frontends that all compile to it. So I see it becoming more ubiquitous, not just inside the browser. There are already projects pushing on this front, see https://github.com/wasmerio/wasmer

@tonyhb
Author

tonyhb commented Mar 1, 2021

Yep, I use these in my project (wasmtime, not wasmer - wasmer seems to be a UI over wasmtime) and these are still isolated.
There is no spec for filesystems yet - there are still questions around interface types for integrating (eg. things better than pointers).

Edit: After some research, there is WASI, the WebAssembly System Interface: https://github.com/bytecodealliance/wasmtime/blob/main/docs/WASI-overview.md

I don't think this is supported in Go just yet - which is why os.Stat etc fail in the Go runtime. We will either be blocked on Go developing this, or have to revisit overlays when WASI compatibility is reached in Go.

And, as anticipated in the third bullet point above, you explicitly tell the VM which directories and files are available before running:

The --dir= option instructs wasmtime to preopen a directory, and make it available to the program as a capability which can be used to open files inside that directory
When a program calls open, they look up the file name in the map, and automatically supply the appropriate directory capability

Essentially, it is the same as your Overlay implementation :)
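
The lookup described in the quoted WASI overview can be sketched roughly as follows. This is an illustration only: `resolve` and the preopen list are hypothetical, and the real WASI libc logic may differ in its details.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolve finds the most specific preopened directory that contains name
// and returns that directory plus the path relative to it. It mimics, in
// simplified form, how an open call is mapped to a directory capability.
func resolve(preopens []string, name string) (dir, rel string, ok bool) {
	name = path.Clean(name)
	for _, d := range preopens {
		d = path.Clean(d)
		prefix := d
		if prefix != "/" {
			prefix += "/"
		}
		var r string
		switch {
		case name == d:
			r = "."
		case strings.HasPrefix(name, prefix):
			r = name[len(prefix):]
		default:
			continue // name is outside this preopened directory
		}
		if !ok || len(d) > len(dir) {
			dir, rel, ok = d, r, true
		}
	}
	return dir, rel, ok
}

func main() {
	preopens := []string{"/data", "/data/cue"} // e.g. passed via --dir=
	dir, rel, ok := resolve(preopens, "/data/cue/x.cue")
	fmt.Println(dir, rel, ok)
	dir, rel, ok = resolve(preopens, "/etc/passwd")
	fmt.Println(dir, rel, ok)
}
```

A path outside every preopened directory has no capability to attach to, so the open is refused before it reaches the host filesystem.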

That said, once I make the path changes it would be nice to merge this in so that Cue actually works with overlay packages in wasm?

@verdverm
Contributor

verdverm commented Mar 1, 2021

We're still waiting for @mpvl and @myitcv to chime in. They are the core maintainers and have permissions on the repo. I just hang out a lot and contribute where and when I can. I'm not sure how they will feel about conditionals on GOOS; it seems like an anti-pattern to me personally. They also tend to prefer holistic solutions to piecemeal fixes for edge cases. This might be part of a larger WASM or FileSystem story. It would be worthwhile to open a broader discussion of the particulars and see where that takes us. There have been previous discussions around how to make Cue work well in the context of an API server that are probably related to this. Generally, it's better to start with an issue or discussion that ends up with a proposal, and then work on code. Marcel and Paul come from the Go team, so a lot of good philosophy and practices carry over with them. It may feel slow at times, but it results in much better designed and implemented software in the long term.

Separately, you can take a look at the contributing docs and get set up with Gerrit. Most development happens over there. This repo is more of a mirror than the source of truth. There are some sync services set up that I'm not familiar with (for a bit of context).

Contributor

@mpvl mpvl left a comment

Thanks for the changes.

This still needs a test to expose the problem.

@@ -142,6 +142,10 @@ func (l *loader) importPkg(pos token.Pos, p *build.Instance) []*build.Instance {

found := false
for _, d := range dirs {
	if d[1] == "/" {
Contributor

Will this also work for Windows volumes and other root indicators?
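
One possible way to make the root check independent of the separator is sketched below. `isRoot` is a hypothetical helper, not something in the PR; it relies on `filepath.Dir` returning its argument unchanged for a root, which holds for `/` and should also hold for Windows volume roots such as `C:\` when built for Windows.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// isRoot reports whether p names a filesystem root on the current OS.
// A root is an absolute path that is its own parent directory.
func isRoot(p string) bool {
	if !filepath.IsAbs(p) {
		return false
	}
	p = filepath.Clean(p)
	return filepath.Dir(p) == p
}

func main() {
	for _, p := range []string{"/", "/a/..", "/a", "."} {
		fmt.Printf("%q root=%v\n", p, isRoot(p))
	}
}
```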

@@ -414,6 +415,9 @@ func IsEllipsis(x ast.Decl) bool {

// GenPath reports the directory in which to store generated files.
func GenPath(root string) string {
	if runtime.GOOS == "js" {
Contributor

Why would "js" have to be special-cased here?

Is it to avoid picking up the "pkg" directory? That lookup really should be disabled for all OSes, but probably only after a 3.0 release. Is there any reason why JS is special here? If so, please add a comment or TODO.

@@ -167,6 +167,20 @@ func hasSubdir(root, dir string) (rel string, ok bool) {

func (fs *fileSystem) readDir(path string) ([]os.FileInfo, errors.Error) {
	path = fs.makeAbs(path)

	if fi := fs.getDir(path, false); fi != nil {
Contributor

There is also a fs.getDir call below that is ostensibly there to achieve the same. So if the below code is buggy, it should be fixed there.

Author

This is one of the difficulties of lacking the fs.DirEntry interface. We call ReadDir to iterate through, but this means we must have a concrete directory to read from. This breaks WebAssembly, as there is no underlying directory to read. This is why we must special-case unless we conform to fs.DirEntry and support only Go 1.16 or later.

I do think that @myitcv was correct regarding this patch: it fixes this code such that we use overlays if possible, but the underlying root cause is the lack of io/fs support.

To be honest, I'm fairly blocked on supporting cue with embedded packages without these fixes. I really understand that we should strive towards an io/fs driver with interfaces - that's the correct solution here.

As you can probably tell, I'm not familiar with cue's internals enough to take this on - I could give it a shot, but there will be lots of feedback.

Contributor

Then my suggestion is that we close this PR and move discussion to #607 or another issue if that one is not appropriate. In those issues, let's focus on specific use cases to help shape the solution.

Author

This PR aims to fix the bugs in #607 - we might want to start an entirely new discussion on the architecture of overlays, files, and cue loading using io/fs as the interface to account for windows, linux, and webassembly (virtual fs) usage.

Contributor

This PR aims to fix the bugs in #607

Per my comments in #786 (comment), I'm not clear that the behaviour described in #607 is actually a bug (the issue itself is ostensibly a feature request). I'll comment on #607 to see if we can find a way forward.

@myitcv
Contributor

myitcv commented Mar 12, 2021

Sorry for the delay in getting around to this.

Read from the overlay directory in these circumstances; we shouldn't be hitting the disk.

It's not clear to me that this is correct, and therefore I'm not clear that this change is solving the right problem. An overlay, in its current form, exists in order to complement what is on disk, rather than wholesale replace its contents.

-- blah.txt --
Hello
-- go.mod --
module blah.com

go 1.16

require cuelang.org/go v0.3.0-beta.6
-- main.go --
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/load"
)

func main() {
	cwd, err := os.Getwd()
	if err != nil {
		log.Fatal(err)
	}
	abs := func(p string) string {
		return filepath.Join(cwd, p)
	}
	loadCfg := &load.Config{
		ModuleRoot: "/",
		Overlay: map[string]load.Source{
			abs("cue.mod/module.cue"): load.FromString(`module: "example.com"`),
			abs("x.cue"):              load.FromString("package x\nx: y"),
		},
	}
	bps := load.Instances([]string{"."}, loadCfg)
	is := cue.Build(bps)
	x, err := is[0].Lookup("x").Int64()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("x = %v\n", x)
}
-- y.cue --
package x

y: 5

In that respect it is a different beast to io/fs.FS.

What I think you're looking for here is an io/fs.FS-based solution. That being the case, I suggest we open an issue discussing that.

FWIW the CUE playground (which is WASM-based) does not run into this because it consumes its input over stdin.

@myitcv myitcv closed this Mar 12, 2021
@myitcv
Contributor

myitcv commented Mar 12, 2021

Closing per #786 (comment)

@tonyhb
Author

tonyhb commented Jun 3, 2021

FWIW the CUE playground (which is WASM-based) does not run into this because it consumes its input over stdin.

Quickly wanted to reply to this and say that the playground avoids this not because it consumes its input from stdin, but because it has no cue.mod packages. Any time you include a cue.mod package in WASM, GenPath fails. It seems as though any cue.mod package runs through the typical os functions within load.go :)

@tonyhb tonyhb deleted the fix/read-overlay-directories-fix-wasm branch June 3, 2021 18:36