Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fsapi: migrates PollRead to Poll with Pflag #1599

Merged
merged 5 commits into from
Jul 30, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
129 changes: 129 additions & 0 deletions RATIONALE.md
Original file line number Diff line number Diff line change
Expand Up @@ -616,6 +616,135 @@ act differently and document `ModuleConfig` is more about emulating, not necessa

## File systems

### Motivation on `sys.FS`

The `sys.FS` abstraction in wazero was created because of limitations in
`fs.FS`, and `fs.File` in Go. Compilers targeting `wasip1` may access
functionality that writes new files. The ability to overcome this was requested
even before wazero was named this, via issue #21 in March 2021.

A month later, golang/go#45757 was raised by someone else on the same topic. As
of July 2023, this has not resolved to a writeable file system abstraction.

Over the next year more use cases accumulated, consolidated in March 2022 into
#390. This closed in January 2023 with a milestone of providing more
functionality, limited to users giving a real directory. This didn't yet expose
a file abstraction for general purpose use. Internally, this used `os.File`.
However, a wasm module instance is a virtual machine. Only supporting `os.File`
breaks sand-boxing use cases. Moreover, `os.File` is not an interface. Even
though this abstracts functionality, it does allow interception use cases.

Hence, a few days later in January 2023, we had more issues asking to expose an
abstraction, #1013 and later #1532, on use cases like masking access to files.
In other words, the use case requests never stopped, and aren't solved by
exposing only real files.

In summary, the primary motivation for exposing a replacement for `fs.FS` and
`fs.File` was around repetitive use case requests for years, around
interception and the ability to create new files, both virtual and real files.
While some use cases are solved with real files, not all are. Regardless, an
interface approach is necessary to ensure users can intercept I/O operations.

### Why doesn't `sys.File` have a `Fd()` method?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I'm using sys.File as even though we may have forgotten, we did promise to expose our filesystem abstraction in the next release. Even if that ends up experimentalsys.File, the end name would be this.


There are many features we could expose. We could make File expose underlying
file descriptors in case they are supported, for integration of system calls
that accept multiple ones, namely `poll` for multiplexing. This special case is
described in a subsequent section.

As noted above, users have been asking for a file abstraction for over two
years, and a common answer was to wait. Making users wait is a problem,
especially so long. Good reasons to make people wait are stabilization. Edge
case features are not a great reason to hold abstractions from users.

Another reason is implementation difficulty. Go did not attempt to abstract
file descriptors. For example, unlike `fs.ReadFile` there is no `fs.FdFile`
interface. Most likely, this is because file descriptors are an implementation
detail of common features. Programming languages, including Go, do not require
end users to know about file descriptors. Types such as `fs.File` can be used
without any knowledge of them. Implementations may or may not have file
descriptors. For example, in Go, `os.DirFS` has underlying file descriptors
while `embed.FS` does not.

Despite this, some may want to expose a non-standard interface because
`os.File` has `Fd() uintptr` to return a file descriptor. Mainly, this is
handy to integrate with `syscall` package functions (on `GOOS` values that
declare them). Notice, though that `uintptr` is unsafe and not an abstraction.
Close inspection will find some `os.File` types internally use `poll.FD`
instead, yet this is not possible to use abstractly because that type is not
exposed. For example, `plan9` uses a different type than `poll.FD`. In other
words, even in real files, `Fd()` is not wholly portable, despite it being
useful on many operating systems with the `syscall` package.

The reasons above, why Go doesn't abstract `FdFile` interface are a subset of
reasons why `sys.File` does not. If we exposed `File.Fd()` we not only would
have to declare all the edge cases that Go describes including impact of
finalizers, we would have to describe these in terms of virtualized files.
Then, we would have to reason with this value vs our existing virtualized
`sys.FileTable`, mapping whatever type we return to keys in that table, also
in consideration of garbage collection impact. The combination of issues like
this could lead down a path of not implementing a file system abstraction at
all, and instead a weak key mapped abstraction of the `syscall` package. Once
we finished with all the edge cases, we would have lost context of the original
reason why we started.. simply to allow file write access!

When wazero attempts to do more than what the Go programming language team, it
has to be carefully evaluated, to:
* Be possible to implement at least for `os.File` backed files
* Not be confusing or cognitively hard for virtual file systems and normal use.
* Affordable: custom code is solely the responsible by the core team, a much
smaller group of individuals than who maintain the Go programming language.

Due to problems well known in Go, consideration of the end users who constantly
ask for basic file system functionality, and the difficulty virtualizing file
descriptors at multiple levels, we don't expose `Fd()` and likely won't ever
expose `Fd()` on `sys.File`.

### Why does `sys.File` have a `Poll()` method, while `sys.FS` does not?

wazero exposes `File.Poll` which allows one-at-a-time poll use cases,
requested by multiple users. This not only includes abstract tests such as
Go 1.21 `GOOS=wasip1`, but real use cases including python and container2wasm
repls, as well listen sockets. The main use cases is non-blocking poll on a
single file. Being a single file, this has no risk of problems such as
head-of-line blocking, even when emulated.

The main use case of multi-poll are bidirectional network services, something
not used in `GOOS=wasip1` standard libraries, but could be in the future.
Moving forward without a multi-poller allows wazero to expose its file system
abstraction instead of continuing to hold back it back for edge cases. We'll
continue discussion below regardless, as rationale was requested.

You can loop through multiple `sys.File`, using `File.Poll` to see if an event
is ready, but there is a head-of-line blocking problem. If a long timeout is
used, bad luck could have a file that has nothing to read or write before one
that does. This could cause more blocking than necessary, even if you could
poll the others just after with a zero timeout. What's worse than this is if
unlimited blocking was used (`timeout=-1`). The host implementations could use
goroutines to avoid this, but interrupting a "forever" poll is problematic. All
of these are reasons to consider a multi-poll API, but do not require exporting
`File.Fd()`.

Should multi-poll becomes critical, `sys.FS` could expose a `Poll` function
like below, despite it being the non-portable, complicated if possible to
implement on all platforms and virtual file systems.
```go
ready, errno := fs.Poll([]sys.PollFile{{f1, sys.POLLIN}, {f2, sys.POLLOUT}}, timeoutMillis)
```

A real filesystem could handle this by using an approach like the internal
`unix.Poll` function in Go, passing file descriptors on unix platforms, or
returning `sys.ENOSYS` for unsupported operating systems. Implementation for
virtual files could have a strategy around timeout to avoid the worst case of
head-of-line blocking (unlimited timeout).

Let's remember that when designing abstractions, it is not best to add an
interface for everything. Certainly, Go doesn't, as evidenced by them not
exposing `poll.FD` in `os.File`! Such a multi-poll could be limited to
built-in filesystems in the wazero repository, avoiding complexity of trying to
support and test this abstractly. This would still permit multiplexing for CLI
users, and also permit single file polling as exists now.

### Why doesn't wazero implement the working directory?

An early design of wazero's API included a `WithWorkDirFS` which allowed
Expand Down
3 changes: 2 additions & 1 deletion imports/wasi_snapshot_preview1/poll.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ import (

"github.com/tetratelabs/wazero/api"
"github.com/tetratelabs/wazero/experimental/sys"
"github.com/tetratelabs/wazero/internal/fsapi"
internalsys "github.com/tetratelabs/wazero/internal/sys"
"github.com/tetratelabs/wazero/internal/wasip1"
"github.com/tetratelabs/wazero/internal/wasm"
Expand Down Expand Up @@ -174,7 +175,7 @@ func pollOneoffFn(_ context.Context, mod api.Module, params []uint64) sys.Errno
return sys.EBADF
}
// Wait for the timeout to expire, or for some data to become available on Stdin.
stdinReady, errno := stdin.File.PollRead(int32(timeout.Milliseconds()))
stdinReady, errno := stdin.File.Poll(fsapi.POLLIN, int32(timeout.Milliseconds()))
if errno != 0 {
return errno
}
Expand Down
14 changes: 10 additions & 4 deletions imports/wasi_snapshot_preview1/poll_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -554,8 +554,11 @@ type neverReadyTtyStdinFile struct {
ttyStat
}

// PollRead implements the same method as documented on fsapi.File
func (neverReadyTtyStdinFile) PollRead(timeoutMillis int32) (ready bool, errno experimentalsys.Errno) {
// Poll implements the same method as documented on fsapi.File
func (neverReadyTtyStdinFile) Poll(flag fsapi.Pflag, timeoutMillis int32) (ready bool, errno experimentalsys.Errno) {
if flag != fsapi.POLLIN {
return false, experimentalsys.ENOTSUP
}
switch {
case timeoutMillis <= 0:
return
Expand All @@ -570,7 +573,10 @@ type pollStdinFile struct {
ready bool
}

// PollRead implements the same method as documented on fsapi.File
func (p *pollStdinFile) PollRead(int32) (ready bool, errno experimentalsys.Errno) {
// Poll implements the same method as documented on fsapi.File
func (p *pollStdinFile) Poll(flag fsapi.Pflag, timeoutMillis int32) (ready bool, errno experimentalsys.Errno) {
if flag != fsapi.POLLIN {
return false, experimentalsys.ENOTSUP
}
return p.ready, 0
}
4 changes: 2 additions & 2 deletions internal/fsapi/dir.go
Original file line number Diff line number Diff line change
Expand Up @@ -87,8 +87,8 @@ func (DirFile) Pread([]byte, int64) (int, experimentalsys.Errno) {
return 0, experimentalsys.EISDIR
}

// PollRead implements File.PollRead
func (DirFile) PollRead(int32) (ready bool, errno experimentalsys.Errno) {
// Poll implements File.Poll
func (DirFile) Poll(Pflag, int32) (ready bool, errno experimentalsys.Errno) {
return false, experimentalsys.ENOSYS
}

Expand Down
18 changes: 11 additions & 7 deletions internal/fsapi/file.go
Original file line number Diff line number Diff line change
Expand Up @@ -205,32 +205,36 @@ type File interface {
// of io.Seeker. See https://pubs.opengroup.org/onlinepubs/9699919799/functions/fseek.html
Seek(offset int64, whence int) (newOffset int64, errno experimentalsys.Errno)

// PollRead returns if the file has data ready to be read or an error.
// Poll returns if the file has data ready to be read or written.
//
// # Parameters
//
// The `timeoutMillis` parameter is how long to block for data to become
// readable, or interrupted, in milliseconds. There are two special values:
// The `flag` parameter determines which event to await, such as POLLIN,
// POLLOUT, or a combination like `POLLIN|POLLOUT`.
//
// The `timeoutMillis` parameter is how long to block for an event, or
// interrupted, in milliseconds. There are two special values:
// - zero returns immediately
// - any negative value blocks any amount of time
//
// # Results
//
// `ready` means there was data ready to read or false if not or when
// `errno` is not zero.
// `ready` means there was data ready to read or written. False can mean no
// event was ready or `errno` is not zero.
//
// A zero `errno` is success. The below are expected otherwise:
// - sys.ENOSYS: the implementation does not support this function.
// - sys.ENOTSUP: the implementation does not the flag combination.
// - sys.EINTR: the call was interrupted prior to an event.
//
// # Notes
//
// - This is like `poll` in POSIX, for a single file.
// See https://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html
// - No-op files, such as those which read from /dev/null, should return
// immediately true, as data will never become readable.
// immediately true, as data will never become available.
// - See /RATIONALE.md for detailed notes including impact of blocking.
PollRead(timeoutMillis int32) (ready bool, errno experimentalsys.Errno)
Poll(flag Pflag, timeoutMillis int32) (ready bool, errno experimentalsys.Errno)

// Readdir reads the contents of the directory associated with file and
// returns a slice of up to n Dirent values in an arbitrary order. This is
Expand Down
20 changes: 20 additions & 0 deletions internal/fsapi/poll.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
package fsapi

// Pflag are bit flags used for File.Poll. Values, including zero, should not
// be interpreted numerically. Instead, use by constants prefixed with 'POLL'.
//
// # Notes
//
// - This is like `pollfd.events` flags for `poll` in POSIX. See
// https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/poll.h.html
type Pflag uint32

// Only define bitflags we support and are needed by `poll_oneoff` in wasip1
// See https://github.com/WebAssembly/WASI/blob/snapshot-01/phases/snapshot/docs.md#eventrwflags
const (
// POLLIN is a read event.
POLLIN Pflag = 1 << iota

// POLLOUT is a write event.
POLLOUT
)
4 changes: 2 additions & 2 deletions internal/fsapi/unimplemented.go
Original file line number Diff line number Diff line change
Expand Up @@ -153,8 +153,8 @@ func (UnimplementedFile) Readdir(int) (dirents []Dirent, errno experimentalsys.E
return nil, experimentalsys.ENOSYS
}

// PollRead implements File.PollRead
func (UnimplementedFile) PollRead(int32) (ready bool, errno experimentalsys.Errno) {
// Poll implements File.Poll
func (UnimplementedFile) Poll(Pflag, int32) (ready bool, errno experimentalsys.Errno) {
return false, experimentalsys.ENOSYS
}

Expand Down
7 changes: 5 additions & 2 deletions internal/sys/stdio.go
Original file line number Diff line number Diff line change
Expand Up @@ -48,8 +48,11 @@ func (noopStdinFile) Read([]byte) (int, experimentalsys.Errno) {
return 0, 0 // Always EOF
}

// PollRead implements the same method as documented on fsapi.File
func (noopStdinFile) PollRead(int32) (ready bool, errno experimentalsys.Errno) {
// Poll implements the same method as documented on fsapi.File
func (noopStdinFile) Poll(flag fsapi.Pflag, timeoutMillis int32) (ready bool, errno experimentalsys.Errno) {
if flag != fsapi.POLLIN {
return false, experimentalsys.ENOTSUP
}
return true, 0 // always ready to read nothing
}

Expand Down
27 changes: 24 additions & 3 deletions internal/sysfs/file_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -319,7 +319,9 @@ func TestFileReadAndPread(t *testing.T) {
}
}

func TestFilePollRead(t *testing.T) {
func TestFilePoll_POLLIN(t *testing.T) {
pflag := fsapi.POLLIN

// Test using os.Pipe as it is known to support poll.
r, w, err := os.Pipe()
require.NoError(t, err)
Expand All @@ -332,7 +334,7 @@ func TestFilePollRead(t *testing.T) {
timeout := int32(0) // return immediately

// When there's nothing in the pipe, it isn't ready.
ready, errno := rF.PollRead(timeout)
ready, errno := rF.Poll(pflag, timeout)
require.EqualErrno(t, 0, errno)
require.False(t, ready)

Expand All @@ -342,7 +344,7 @@ func TestFilePollRead(t *testing.T) {
require.NoError(t, err)

// We should now be able to poll ready
ready, errno = rF.PollRead(timeout)
ready, errno = rF.Poll(pflag, timeout)
require.EqualErrno(t, 0, errno)
require.True(t, ready)

Expand All @@ -353,6 +355,25 @@ func TestFilePollRead(t *testing.T) {
require.Equal(t, expected, buf[:len(expected)])
}

func TestFilePoll_POLLOUT(t *testing.T) {
pflag := fsapi.POLLOUT

// Test using os.Pipe as it is known to support poll.
r, w, err := os.Pipe()
require.NoError(t, err)
defer r.Close()
defer w.Close()

wF, err := NewStdioFile(false, w)
require.NoError(t, err)
timeout := int32(0) // return immediately

// We don't yet implement write blocking.
ready, errno := wF.Poll(pflag, timeout)
require.EqualErrno(t, experimentalsys.ENOTSUP, errno)
require.False(t, ready)
}

func requireRead(t *testing.T, f fsapi.File, buf []byte) {
n, errno := f.Read(buf)
require.EqualErrno(t, 0, errno)
Expand Down
6 changes: 3 additions & 3 deletions internal/sysfs/osfile.go
Original file line number Diff line number Diff line change
Expand Up @@ -180,9 +180,9 @@ func (f *osFile) Seek(offset int64, whence int) (newOffset int64, errno experime
return
}

// PollRead implements the same method as documented on fsapi.File
func (f *osFile) PollRead(timeoutMillis int32) (ready bool, errno experimentalsys.Errno) {
return pollRead(f.fd, timeoutMillis)
// Poll implements the same method as documented on fsapi.File
func (f *osFile) Poll(flag fsapi.Pflag, timeoutMillis int32) (ready bool, errno experimentalsys.Errno) {
return poll(f.fd, flag, timeoutMillis)
}

// Readdir implements File.Readdir. Notably, this uses "Readdir", not
Expand Down
15 changes: 10 additions & 5 deletions internal/sysfs/poll.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,17 @@

package sysfs

import "github.com/tetratelabs/wazero/experimental/sys"
import (
"github.com/tetratelabs/wazero/experimental/sys"
"github.com/tetratelabs/wazero/internal/fsapi"
)

// pollRead implements `PollRead` as documented on fsapi.File via a file
// descriptor.
func pollRead(fd uintptr, timeoutMillis int32) (ready bool, errno sys.Errno) {
// poll implements `Poll` as documented on fsapi.File via a file descriptor.
func poll(fd uintptr, flag fsapi.Pflag, timeoutMillis int32) (ready bool, errno sys.Errno) {
if flag != fsapi.POLLIN {
return false, sys.ENOTSUP
}
fds := []pollFd{newPollFd(fd, _POLLIN, 0)}
count, errno := poll(fds, timeoutMillis)
count, errno := _poll(fds, timeoutMillis)
return count > 0, errno
}
4 changes: 2 additions & 2 deletions internal/sysfs/poll_darwin.go
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,8 @@ func newPollFd(fd uintptr, events, revents int16) pollFd {
// _POLLIN subscribes a notification when any readable data is available.
const _POLLIN = 0x0001

// poll implements poll on Darwin via the corresponding libc function.
func poll(fds []pollFd, timeoutMillis int32) (n int, errno sys.Errno) {
// _poll implements poll on Darwin via the corresponding libc function.
func _poll(fds []pollFd, timeoutMillis int32) (n int, errno sys.Errno) {
var fdptr *pollFd
nfds := len(fds)
if nfds > 0 {
Expand Down
4 changes: 2 additions & 2 deletions internal/sysfs/poll_linux.go
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,8 @@ func newPollFd(fd uintptr, events, revents int16) pollFd {
// _POLLIN subscribes a notification when any readable data is available.
const _POLLIN = 0x0001

// poll implements poll on Linux via ppoll.
func poll(fds []pollFd, timeoutMillis int32) (n int, errno sys.Errno) {
// _poll implements poll on Linux via ppoll.
func _poll(fds []pollFd, timeoutMillis int32) (n int, errno sys.Errno) {
var ts syscall.Timespec
if timeoutMillis >= 0 {
ts = syscall.NsecToTimespec(int64(time.Duration(timeoutMillis) * time.Millisecond))
Expand Down
Loading
Loading