-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filesystem4 #4196
Filesystem4 #4196
Conversation
Sometimes for clarity, other times for consistency of formatting and/or wording. (Examples: after periods there were sometimes two spaces, sometimes one. Sometimes asynch functions "resolved to" a value, other times "resolved with"...) Aimed also to copy JSDoc changes introduced by denoland#4170 over to the corresponding TypeScript modules. Many of these comments will be changed again by later commits in this sequence of PRs that refine/extend the interface or function being described. Not sure how the `// @url js/...` directives work, but these seemed to be inconsistently present, so I added them where I noticed. Didn't try to clean up everything, just the filesystem modules I was looking at. Note: The comment on the `FileInfo.created` field replaces `Unix` with `Mac/BSD`. The Rust libs do handle this correctly for Linux, but the story about how they do so there is more complicated (see https://doc.rust-lang.org/std/fs/struct.Metadata.html#method.created).
More comment cleanup, in same spirit as previous commit. Separated because these touch parts of lib.deno.ns.d.ts that I wasn't working on, but got introduced just because of search/replacing in attempt to make things consistent.
Various supposed-to-be innocuous changes. Shouldn't have semantic effects (that anyone cares about). For example: `#[cfg(any(unix))]` attributes -- the `any` does nothing here.
Changed one use of std::fs to just fs, and uses of tokio::fs to tokio_fs.
The point of cli/rs.rs seemed to be to collect all the implementations for filesystem operations that depend on third-party modules. So I moved remove_dir_all, set_file_times there. Also unix::fs::symlink.
Comments in the code proposed this might be coming. readdir is the spelling used by the underlying libc call, and by Node.
It was hard to remember which Options interfaces were spelled in the singular and which in the plural (and anyway this one contained two options). Also added MkdirOptions to cli/js/deno.ts. All the analogous interfaces were exported there.
This was the last remaining Option interface spelled in the singular. Easier to remember if they're all Option**s**; plus we may want to add extra options here later.
There's a lot of variation in doc comments and internal code about whether the first parameter to file system calls is `path` or `name` or `filename`. For consistency, have made it always be `path`. Changes to: files.ts, readLink, utime, stat, truncate, writeFile, readFile
The previous implementation would clobber anything Rust calls `permissions` that aren't visible to Unix chmod (for example, `metadata.permissions().readonly()`). Replaces with the method recommended at https://doc.rust-lang.org/std/os/unix/fs/trait.PermissionsExt.html#tymethod.set_mode.
There's a lot of variation in doc comments and internal code about whether chmod/0o777-style permissions are called `mode` or `perm`. (For example, mkdir and writeFile choose differently.) `mode` is also used in other ways, for example to label whether a file is being opened as `r+`, `w`, etc. I've made the first things consistently be called `perm`. I left the field in the (internal) interface StatResponse named `mode`, as it contains more than just file permissions. (It has all the bits from st_mode.) Changes to: mkdir, chmod, FileInfo, and tests/other modules that depend on them.
These changes are speculative: since the core API `FileInfo.mode` was renamed to `perm` in a previous commit, perhaps we want to change the `fileMode` field in tar headers to `filePerm`? And rename the `EntryInfo.mode` field in std/http/file_server to `perm` too? (Also make corresponding changes to labels and errors in that server's output.)
This still needs tests (they'd have to be integration tests to be very effective). In later commits we change permission tests for some other operations to check that the process's umask is being applied. There we assume the test machine is running with the usual default umask 0o022 (as Github's CI machines do); we could also query it using this function. On Windows, `umask(value)` currently throws, but `umask()` just returns 0. Should we make the latter throw too?
Verify that the `perm` supplied to mkdir is subject to the process's umask (here just assumed to be the default 0o022 on Unix, or 0 on Windows).
Add `perm` option for makeTempDir and makeTempFile. As with mkdir, this argument is only used when a new dir or file is created. Defaults to 0o700 for directories and 0o600 for files. We make the tests verify that the `perm` used is subject to the process's umask (here just assumed to be the default 0o022 on Unix, or 0 on Windows).
Add `perm` option for truncate. As with mkdir and the makeTemp* functions, this argument is only used when a new file is created. Defaults to 0o666. We make the tests verify that the `perm` used is subject to the process's umask (here just assumed to be the default 0o022 on Unix, or 0 on Windows). The comments say "error to specify perm when create is set to `false`"; but the create option is only introduced in a later commit.
Add `perm` option for open and create. As with mkdir, makeTemp*, and truncate, this argument is only used when a new file is created. Defaults to 0o666 (the process's umask will then be applied). Currently we throw if `perm` is specified but a read-only opening mode was requested. Need to decide how to handle `perm` on Windows: NOOP? or throw? The codebase doesn't seem to be consistent on this.
This is the first in a series of commits adding a `clobber` option to filesystem operations. The term `clobber` comes from the shell's `noclobber` option. Mostly, `clobber: false` is just another way to write `createNew: true`. There are some cases though where this functionality could be useful but it'd be unnatural to use `createNew` to express it. For example, `rename(oldpath, newpath, { clobber: false })` (not yet implemented) could fail if there's already a file at `newpath`, instead of the current (default) behavior of overwriting newpath.
Implement createNew for writeFile (also the variant spelling `clobber`, with inverse boolean value). This introduces a change of functionality. To make the operation more atomic, and conform better to other parts of the API, writeFile will now only apply `perm` if it creates a new file, not if it overwrites or appends to an existing file. Previously, writeFile did a chmod on the target file no matter what. If that's the behavior a user wants, they can do it easily enough with chmod. (And it never was atomic.)
We verify that the `perm` applied by writeFile are only used when creating a new file; also that they are subject to the process's umask (here just assumed to be the default 0o022 on Unix, or 0 on Windows). Perhaps this test should be moved from stat_test.ts to write_file_test.ts?
Add create and createNew (and clobber) options for copyFile. These work as for writeFile. The fastest implementation is used when { create: true, createNew: false } (this is the default, and matches the older implementation). To implement createNew atomically, we have to fall back to what I think is a slower implementation, where the data gets copied from kernel space to user space, and back.
Add create and createNew (and clobber) options for truncate. These work as for writeFile and copyFile.
Make seek and seekSync return new position in file, as discussed in denoland#3826.
This is the first of a few commits that tweak the numeric types of some parameters and fields.
The whence argument in op_seek was received as i32 and cast to u32. Can't see any good reason for the recast? The offset parameter to op_seek was i32, which we were casting to u64 (so using it as a u32) for SEEK_START, and extending to i64 for the other seek modes. Instead made offset i64, requiring it to be non-negative (so effectively u63) for SEEK_START, permitting to be negative for other seek modes. That's what libc also seems to be doing: it uses off_t, which is usually i64.
The len parameter to truncate was u64. Odd to use a bigger type for truncate than for seek (where we were effectively letting offset be u32); and indeed the underlying libc expects an off_t, which is i64. Instead made this variable i64, requiring it to be non-negative (so effectively u63). This matches the treatment of offset for op_seek when whence is SEEK_START.
The time fields in StatResponse were being returned as u64. libc's time_t is i64, and the nix package's FileStat structure also uses i64. I cast our time fields to i64. Made the same change to the atime, mtime parameters to utime.
Also cast the blksize and blocks fields to i64, as nix and many libc's have them. (Of course they'll only use the non-negative bits of these types, so they're really all u63s.)
This reports the `ctime` field from Unix (and that's what the field is named in the internal StatResponse structure). Better names to use on the TypeScript side are welcome. `ctime` is problematic because it often gets confused with creation-time. `metadataModified` is a reasonable candidate, because this field (unlike `modified`) gets refreshed whenever a file is chmod-ed, etc. But this field is *also* refreshed whenever data is written. (So it will always be at least as new as `modified`.)
Comments in the code proposed that the `FileInfo.len` field might be renamed in this way. `len` is what the Rust API uses, but most of the corresponding Javascript APIs use `length`. Node uses `size`, and the underlying field in the libc structure is st_size. Renamed StatResponse.len to size. StatResponse is an internal interface; in the spirit of keeping its field names close to those of the underlying libc structure, renamed `len` which corresponds to st_size to `size`.
Note we report any suid etc bits read via stat (masking to 0o7777) but our chmod and other perm-setting operations (mkdir, writeFile) mask to 0o0777.
We add both (1) variants of the functions like Deno.chmod that take an rid in place of a path, and (2) methods like chmod on the File class.
I tried to split the PRs into four segments of related commits (Filesystem1-4), but I wasn't sure how to generate PRs in Github that were premised on other PRs pending review. What happened is that now Filesystem4 includes all the previous 3 PRs, and so on. I guess that'll make reviewing harder; sorry! (Or you could just look at Filesystem4 and ignore the other 3, but it includes a lot of stuff: 30 commits.) |
This should probably be 8+ PRs that don't depend on each other and are much more focused. IMO this PR is too large and there's too many independent things going on. E.g. Option->Options, readDir -> readdir, fs -> tokio_fs, improve debug calls, add perms, should each be their own PRs (and not depend on any unrelated commits). (You want to create a new branch off of master for each PR and cherry-pick those commits you want to include.) |
Yeah, I appreciate that. I'm sorry it's grown so big. But after a couple of days rebasing this stuff, and limits to the amount of time I can give to it, I thought it'd be best to put it out there for other eyes to have a look at it now, rather than wait even longer until I could figure out how to parcel it better. Most of this stuff was ready some weeks ago, but it got hard to keep up with advances with denoland/master. The earlier commits in the sequence (Option->Options, & so on) would indeed be easy to separate off. From the middle of the sequence on, though, it becomes a lot harder to make them stand alone. I was hoping someone could review the sequence as is, then based on their feedback I'd pick off smaller pieces to push one at a time. (And with the later pushes relying on earlier pushes already being pulled into master.) |
hi @dubiousjim , this is a lot of work, so first things first, thanks. Now i do agree that you should try to make things as separate small PR. For exemple, each new feature i do that only add one function to ops is a PR because it add already a lot of stuff to a lot of files. It only takes discipline to create a branch for everything you do that are not related. once you are used to do it, you'll never go back. |
Thanks, I did close these big PRs and am breaking things into smaller pieces. I had filtered all of the non-semantic changes to the top of the queue though (rewording comments, except where it didn't make sense until the new feature was added, renaming internal variables, etc.). And when I tried to make smaller standalone PRs, I kept finding them too difficult to rebase off of (and then remerge again onto) those changes. So I'm first proposing #4205. That touches a lot of files, but none of the changes it contains should be substantial. Once that (or some descendent of it) gets merged into master, I'll push the rest of the work out as smaller PRs, as focused and standalone as I can make them. (There are some other blockers in the sequence, too, though, that later PRs will have to wait on, like renaming mode->perm, and filename->path.) |
op_fxxx operations on open rid/fds for #4017 (PR 4 of 4).
This is based upon PRs #4192, #4193, and #4195.