-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
touch some files when we use them #6477
Conversation
r? @ehuss (rust_highfive has picked a reviewer for you, use r? to override) |
This comment has been minimized.
This comment has been minimized.
c700f8d
to
4ae5abc
Compare
I think this would be useful. I'm not sure if it really needs a dedicated file. I did some tests on an old Windows machine, and I didn't see a noticeable performance difference. However, creating lots of small files generally isn't desirable. Would it be possible to just adjust the mtime of one of the other files? Touching dep-KIND-PKG-HASH may make #2426 worse, so that may not be a good idea, but I think the other two files would be possible options. It should probably get a test, too. |
Good points. It would be nice if there was a reliable way to know if the fingerprint was made with a cargo that is maintaining the mtime or if falling back to the atime is required.
The tooling can decide to be to conservative, not support older cargos, or do version number based feature detection. (assuming we don't add a way to opt out.) A separate file makes it very clear if Cargo is making this data available to the tool. (file mtime iif file exists else max atime.) If there is another signale you would prefer, let me know. I will add a test when we decided what we want the behavior to be and next time I look at the code. |
4ae5abc
to
920f4db
Compare
Thinking about it more, I don't think there is a reasonable way for a cleaner script to handle when cargo is sometimes opting out of this. Version number detection sounds approximately good enough. What is the best way to cross platform touch a file? |
Could we perhaps touch the mtime of the artifacts themselves? (like all the rlibs, binaries, etc). I think that Cargo doesn't compare the mtime on those artifacts (even historical Cargos) and that may make it easiest for other tools too? IIRC we already check for existence of all artifacts on nop builds, so opening up a file handle to set the mtime on it may not take too much longer. If this becomes a performance concern we could always add configuration/env vars to turn it off, but it seems reasonable to me to have it on by default. |
In what way? It would make a simple tool, like the published version of cargo-sweep, that does not know about the structure of the target dir closer to being correct but not really useable. I think a simple tool would start leaving the outputs, and delling the fingerprints leading to a complete rebuild. A tool that knows about the structure, like my branch of cargo-sweep, doesn't really care which file we pick as the authoritative one. |
Oh I was just thinking that if the output artifacts had mtimes on them then a tool could just delete any artifact older than N days, and the .fingerprint metadata would need cleaning eventually but it's not really large enough to warrant lots of scrutiny |
Ok, I am convinced that that could be a good next step!
Feels kinda dependent on the |
Hm that's what I would naively say we should do, but I'll admit I have no idea how the filesystem clock and |
41b80df
to
b144ad3
Compare
@alexcrichton where do we check for existence of all artifacts on nop builds? |
I think here |
b144ad3
to
41bf6f2
Compare
41bf6f2
to
3eaa70e
Compare
This has been updated. I rebased. I also switch to touching existing files instead of adding a new one. Tuch was imped with The files being touched are:
when CI is green, I will update the title and op. |
Added tests. Do people have thoughts, or is this good to go? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good. Alex?
@bors: r=ehuss 👍 |
📌 Commit 97363ca has been approved by |
touch some files when we use them This is a small change to improve the ability for a third party subcommand to clean up a target folder. I consider this part of the push to experiment with out of tree GC, as discussed in #6229. how it works? -------- This updates the modification time of a file in each fingerprint folder and the modification time of the intermediate outputs every time cargo checks that they are up to date. This allows a third party subcommand to look at the modification time of the timestamp file to determine the last time a cargo invocation required that file. This is far more reliable then the current practices of looking at the `accessed` time. `accessed` time is not available or disabled on many operating systems, and is routinely set by arbitrary other programs. is this enough to be useful? -------- The current implementation of cargo sweep on master will automatically use this data with no change to the code. With this PR, it will work even on systems that do not update `accessed` time. This also allows a crude script to clean some of the largest subfolders based on each files modification time. is this worth adding, or should we just build `clean --outdated` into cargo? ------ I would love to see a `clean --outdated` in cargo! However, I think there is a lot of design work before we can make something good enough to deserve the cargo teams stamp of approval. Especially as an in tree version will have to work with many use cases some of witch are yet to be designed (like distributed builds). Even just including `cargo-sweep`s existing functionality opens a full bike shop about what arguments to take, and in what form (`cargo-sweep` takes a days argument, but maybe we should have a minutes or a ISO standard time or ...). This PR, or equivalent, allows out of tree experimentation with all different interfaces, and is basically required for any `LRU` based system. (For example [Crater](rust-lang/crater#346) wants a GC that cleans files in an `LRU` manner to maintain a target folder below a target size. This is not a use case that is widely enough needed to be worth adding to cargo but one supported by this PR.) what are the downsides? ---- 1. There are legitimate performance concerns about writing so many small files during a NOP build. 2. There are legitimate concerns about unnecessary wrights on read-only filesystems. 3. If we add this, and it starts seeing widespread use, we may be de facto stabilizing the folder structure we use. (This is probably true of any system that allows out of tree experimentation.) 4. This may not be an efficient way to store the data. (It does have the advantage of not needing different cargos to manipulate the same file. But if you have a better idea please make a suggestion.)
☀️ Test successful - checks-travis, status-appveyor |
Update cargo Unblocks #56884 cc @euclio 6 commits in 2b4a5f1f0bb6e13759e88ea9512527b0beba154f..ffe65875fd05018599ad07e7389e99050c7915be 2019-01-12 04:13:12 +0000 to 2019-01-17 23:57:50 +0000 - Better error message for bad manifest with `cargo install`. (rust-lang/cargo#6560) - relax rustdoc output assertion (rust-lang/cargo#6559) - touch some files when we use them (rust-lang/cargo#6477) - Add documentation for new package/publish feature flags. (rust-lang/cargo#6553) - Update chat link to Discord. (rust-lang/cargo#6554) - Fix typo (rust-lang/cargo#6552) r? @alexcrichton
Put mtime-on-use behind a feature flag. This places #6477 behind the `-Z mtime-on-use` feature flag. The change to update the mtime each time a crate is used has caused a performance regression on the rust playground (rust-lang/rust#57774). It is using about 241 pre-built crates in a Docker container. Due to the copy-on-write nature of Docker, it can take a significant amount of time to update the timestamps (over 10 seconds on slower systems). cc @Mark-Simulacrum
This is a small change to improve the ability for a third party subcommand to clean up a target folder. I consider this part of the push to experiment with out of tree GC, as discussed in #6229.
how it works?
This updates the modification time of a file in each fingerprint folder and the modification time of the intermediate outputs every time cargo checks that they are up to date. This allows a third party subcommand to look at the modification time of the timestamp file to determine the last time a cargo invocation required that file. This is far more reliable then the current practices of looking at the
accessed
time.accessed
time is not available or disabled on many operating systems, and is routinely set by arbitrary other programs.is this enough to be useful?
The current implementation of cargo sweep on master will automatically use this data with no change to the code. With this PR, it will work even on systems that do not update
accessed
time.This also allows a crude script to clean some of the largest subfolders based on each files modification time.
is this worth adding, or should we just build
clean --outdated
into cargo?I would love to see a
clean --outdated
in cargo! However, I think there is a lot of design work before we can make something good enough to deserve the cargo teams stamp of approval. Especially as an in tree version will have to work with many use cases some of witch are yet to be designed (like distributed builds). Even just includingcargo-sweep
s existing functionality opens a full bike shop about what arguments to take, and in what form (cargo-sweep
takes a days argument, but maybe we should have a minutes or a ISO standard time or ...). This PR, or equivalent, allows out of tree experimentation with all different interfaces, and is basically required for anyLRU
based system. (For example Crater wants a GC that cleans files in anLRU
manner to maintain a target folder below a target size. This is not a use case that is widely enough needed to be worth adding to cargo but one supported by this PR.)what are the downsides?