Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filesystem4 #4196

Closed
wants to merge 38 commits into from
Closed

Filesystem4 #4196

wants to merge 38 commits into from

Conversation

dubiousjim
Copy link
Contributor

op_fxxx operations on open rid/fds for #4017 (PR 4 of 4).

This is based upon PRs #4192, #4193, and #4195.

Sometimes for clarity, other times for consistency of formatting and/or
wording. (Examples: after periods there were sometimes two spaces,
sometimes one. Sometimes asynch functions "resolved to" a value, other
times "resolved with"...)

Aimed also to copy JSDoc changes introduced by denoland#4170 over to the
corresponding TypeScript modules.

Many of these comments will be changed again by later commits in this
sequence of PRs that refine/extend the interface or function being
described.

Not sure how the `// @url js/...` directives work, but these seemed to
be inconsistently present, so I added them where I noticed. Didn't try
to clean up everything, just the filesystem modules I was looking at.

Note: The comment on the `FileInfo.created` field replaces `Unix` with
`Mac/BSD`. The Rust libs do handle this correctly for Linux, but the
story about how they do so there is more complicated (see
https://doc.rust-lang.org/std/fs/struct.Metadata.html#method.created).
More comment cleanup, in same spirit as previous commit. Separated
because these touch parts of lib.deno.ns.d.ts that I wasn't working on,
but got introduced just because of search/replacing in attempt to make
things consistent.
Various supposed-to-be innocuous changes.  Shouldn't have semantic
effects (that anyone cares about).

For example: `#[cfg(any(unix))]` attributes -- the `any` does nothing
here.
Changed one use of std::fs to just fs, and uses of tokio::fs to
tokio_fs.
The point of cli/rs.rs seemed to be to collect all the implementations
for filesystem operations that depend on third-party modules. So I moved
remove_dir_all, set_file_times there. Also unix::fs::symlink.
Comments in the code proposed this might be coming.

readdir is the spelling used by the underlying libc call, and by Node.
It was hard to remember which Options interfaces were spelled in the
singular and which in the plural (and anyway this one contained two
options).

Also added MkdirOptions to cli/js/deno.ts. All the analogous interfaces
were exported there.
This was the last remaining Option interface spelled in the singular.
Easier to remember if they're all Option**s**; plus we may want to add extra
options here later.
There's a lot of variation in doc comments and internal code about
whether the first parameter to file system calls is `path` or `name` or
`filename`. For consistency, have made it always be `path`.

Changes to: files.ts, readLink, utime, stat, truncate, writeFile, readFile
The previous implementation would clobber anything Rust calls
`permissions` that aren't visible to Unix chmod (for example,
`metadata.permissions().readonly()`).

Replaces with the method recommended at
https://doc.rust-lang.org/std/os/unix/fs/trait.PermissionsExt.html#tymethod.set_mode.
There's a lot of variation in doc comments and internal code about
whether chmod/0o777-style permissions are called `mode` or `perm`. (For
example, mkdir and writeFile choose differently.) `mode` is also used in
other ways, for example to label whether a file is being opened as `r+`,
`w`, etc. I've made the first things consistently be called `perm`.

I left the field in the (internal) interface StatResponse named `mode`,
as it contains more than just file permissions. (It has all the bits
from st_mode.)

Changes to: mkdir, chmod, FileInfo, and tests/other modules that depend
on them.
These changes are speculative: since the core API `FileInfo.mode` was
renamed to `perm` in a previous commit, perhaps we want to change the
`fileMode` field in tar headers to `filePerm`? And rename the
`EntryInfo.mode` field in std/http/file_server to `perm` too? (Also make
corresponding changes to labels and errors in that server's output.)
This still needs tests (they'd have to be integration tests to be very
effective).

In later commits we change permission tests for some other operations to
check that the process's umask is being applied. There we assume the
test machine is running with the usual default umask 0o022 (as Github's
CI machines do); we could also query it using this function.

On Windows, `umask(value)` currently throws, but `umask()` just returns
0. Should we make the latter throw too?
Verify that the `perm` supplied to mkdir is subject to the process's
umask (here just assumed to be the default 0o022 on Unix, or 0 on
Windows).
Add `perm` option for makeTempDir and makeTempFile. As with mkdir, this
argument is only used when a new dir or file is created. Defaults to
0o700 for directories and 0o600 for files.

We make the tests verify that the `perm` used is subject to the
process's umask (here just assumed to be the default 0o022 on Unix, or 0
on Windows).
Add `perm` option for truncate. As with mkdir and the makeTemp*
functions, this argument is only used when a new file is created.
Defaults to 0o666.

We make the tests verify that the `perm` used is subject to the
process's umask (here just assumed to be the default 0o022 on Unix, or 0
on Windows).

The comments say "error to specify perm when create is set to `false`";
but the create option is only introduced in a later commit.
Add `perm` option for open and create. As with mkdir, makeTemp*, and
truncate, this argument is only used when a new file is created.
Defaults to 0o666 (the process's umask will then be applied).

Currently we throw if `perm` is specified but a read-only opening mode
was requested.

Need to decide how to handle `perm` on Windows: NOOP? or throw? The
codebase doesn't seem to be consistent on this.
This is the first in a series of commits adding a `clobber` option to
filesystem operations. The term `clobber` comes from the shell's
`noclobber` option.

Mostly, `clobber: false` is just another way to write `createNew: true`.
There are some cases though where this functionality could be useful but
it'd be unnatural to use `createNew` to express it. For example,
`rename(oldpath, newpath, { clobber: false })` (not yet implemented)
could fail if there's already a file at `newpath`, instead of the
current (default) behavior of overwriting newpath.
Implement createNew for writeFile (also the variant spelling `clobber`,
with inverse boolean value).

This introduces a change of functionality. To make the operation more
atomic, and conform better to other parts of the API, writeFile will now
only apply `perm` if it creates a new file, not if it overwrites or
appends to an existing file.

Previously, writeFile did a chmod on the target file no matter what. If
that's the behavior a user wants, they can do it easily enough with
chmod. (And it never was atomic.)
We verify that the `perm` applied by writeFile are only used when
creating a new file; also that they are subject to the process's umask
(here just assumed to be the default 0o022 on Unix, or 0 on Windows).

Perhaps this test should be moved from stat_test.ts to
write_file_test.ts?
Add create and createNew (and clobber) options for copyFile. These work
as for writeFile. The fastest implementation is used when { create:
true, createNew: false } (this is the default, and matches the older
implementation).

To implement createNew atomically, we have to fall back to what I think
is a slower implementation, where the data gets copied from kernel space
to user space, and back.
Add create and createNew (and clobber) options for truncate. These work
as for writeFile and copyFile.
Make seek and seekSync return new position in file, as discussed
in denoland#3826.
This is the first of a few commits that tweak the numeric types of some
parameters and fields.
The whence argument in op_seek was received as i32 and cast to u32.
Can't see any good reason for the recast?

The offset parameter to op_seek was i32, which we were casting to u64
(so using it as a u32) for SEEK_START, and extending to i64 for the
other seek modes.

Instead made offset i64, requiring it to be non-negative (so effectively
u63) for SEEK_START, permitting to be negative for other seek modes.
That's what libc also seems to be doing: it uses off_t, which is usually
i64.
The len parameter to truncate was u64. Odd to use a bigger type for
truncate than for seek (where we were effectively letting offset be
u32); and indeed the underlying libc expects an off_t, which is i64.

Instead made this variable i64, requiring it to be non-negative (so
effectively u63). This matches the treatment of offset for op_seek when
whence is SEEK_START.
The time fields in StatResponse were being returned as u64. libc's
time_t is i64, and the nix package's FileStat structure also uses i64. I
cast our time fields to i64.

Made the same change to the atime, mtime parameters to utime.
Also cast the blksize and blocks fields to i64, as nix and many libc's
have them. (Of course they'll only use the non-negative bits of these
types, so they're really all u63s.)
This reports the `ctime` field from Unix (and that's what the field is
named in the internal StatResponse structure).

Better names to use on the TypeScript side are welcome. `ctime` is
problematic because it often gets confused with creation-time.
`metadataModified` is a reasonable candidate, because this field (unlike
`modified`) gets refreshed whenever a file is chmod-ed, etc. But this
field is *also* refreshed whenever data is written. (So it will always
be at least as new as `modified`.)
Comments in the code proposed that the `FileInfo.len` field might be
renamed in this way. `len` is what the Rust API uses, but most of the
corresponding Javascript APIs use `length`.

Node uses `size`, and the underlying field in the libc structure is
st_size.

Renamed StatResponse.len to size. StatResponse is an internal interface;
in the spirit of keeping its field names close to those of the
underlying libc structure, renamed `len` which corresponds to st_size to
`size`.
Note we report any suid etc bits read via stat (masking to 0o7777) but
our chmod and other perm-setting operations (mkdir, writeFile) mask to
0o0777.
We add both (1) variants of the functions like Deno.chmod that take an
rid in place of a path, and (2) methods like chmod on the File class.
@dubiousjim
Copy link
Contributor Author

I tried to split the PRs into four segments of related commits (Filesystem1-4), but I wasn't sure how to generate PRs in Github that were premised on other PRs pending review. What happened is that now Filesystem4 includes all the previous 3 PRs, and so on. I guess that'll make reviewing harder; sorry! (Or you could just look at Filesystem4 and ignore the other 3, but it includes a lot of stuff: 30 commits.)

@hayd
Copy link
Contributor

hayd commented Feb 29, 2020

but it includes a lot of stuff

This should probably be 8+ PRs that don't depend on each other and are much more focused. IMO this PR is too large and there's too many independent things going on.

E.g. Option->Options, readDir -> readdir, fs -> tokio_fs, improve debug calls, add perms, should each be their own PRs (and not depend on any unrelated commits).

(You want to create a new branch off of master for each PR and cherry-pick those commits you want to include.)

@dubiousjim
Copy link
Contributor Author

Yeah, I appreciate that. I'm sorry it's grown so big. But after a couple of days rebasing this stuff, and limits to the amount of time I can give to it, I thought it'd be best to put it out there for other eyes to have a look at it now, rather than wait even longer until I could figure out how to parcel it better. Most of this stuff was ready some weeks ago, but it got hard to keep up with advances with denoland/master.

The earlier commits in the sequence (Option->Options, & so on) would indeed be easy to separate off. From the middle of the sequence on, though, it becomes a lot harder to make them stand alone. I was hoping someone could review the sequence as is, then based on their feedback I'd pick off smaller pieces to push one at a time. (And with the later pushes relying on earlier pushes already being pulled into master.)

@dubiousjim dubiousjim closed this Mar 1, 2020
@ecyrbe
Copy link
Contributor

ecyrbe commented Mar 1, 2020

hi @dubiousjim , this is a lot of work, so first things first, thanks.

Now i do agree that you should try to make things as separate small PR.

For exemple, each new feature i do that only add one function to ops is a PR because it add already a lot of stuff to a lot of files.
for exemple as i'm doing std/node/os this mean i'm preparing around 16 PR.
i do no refactoring in these PR, and they are already individually long to review.

It only takes discipline to create a branch for everything you do that are not related. once you are used to do it, you'll never go back.

@dubiousjim
Copy link
Contributor Author

Thanks, I did close these big PRs and am breaking things into smaller pieces. I had filtered all of the non-semantic changes to the top of the queue though (rewording comments, except where it didn't make sense until the new feature was added, renaming internal variables, etc.). And when I tried to make smaller standalone PRs, I kept finding them too difficult to rebase off of (and then remerge again onto) those changes. So I'm first proposing #4205. That touches a lot of files, but none of the changes it contains should be substantial. Once that (or some descendent of it) gets merged into master, I'll push the rest of the work out as smaller PRs, as focused and standalone as I can make them. (There are some other blockers in the sequence, too, though, that later PRs will have to wait on, like renaming mode->perm, and filename->path.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants