touch some files when we use them #6477

Eh2406 · 2018-12-22T18:19:42Z

This is a small change to improve the ability for a third party subcommand to clean up a target folder. I consider this part of the push to experiment with out of tree GC, as discussed in #6229.

how it works?

This updates the modification time of a file in each fingerprint folder and the modification time of the intermediate outputs every time cargo checks that they are up to date. This allows a third-party subcommand to look at the modification time of the timestamp file to determine the last time a cargo invocation required that file. This is far more reliable than the current practice of looking at the `accessed` time. `accessed` time is not available or is disabled on many operating systems, and is routinely set by arbitrary other programs.

is this enough to be useful?

The current implementation of `cargo-sweep` on master will automatically use this data with no change to the code. With this PR, it will work even on systems that do not update `accessed` time.

This also allows a crude script to clean some of the largest subfolders based on each file's modification time.
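As a rough illustration of what such a crude cleaner could look like, here is a minimal sketch in Rust. The function name `stale_files` and the idea of passing `now` in explicitly are assumptions made for this example, not part of cargo or cargo-sweep. It only lists candidates and deletes nothing.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{Duration, SystemTime};

/// Hypothetical helper: returns the regular files directly inside `dir`
/// whose modification time is more than `max_age` before `now`.
/// Nothing is deleted here; the caller decides what to do with the list.
fn stale_files(dir: &Path, max_age: Duration, now: SystemTime) -> io::Result<Vec<PathBuf>> {
    let mut stale = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let meta = entry.metadata()?;
        if !meta.is_file() {
            continue;
        }
        // `duration_since` fails when the mtime is later than `now`;
        // such files are treated as fresh and skipped.
        if let Ok(age) = now.duration_since(meta.modified()?) {
            if age > max_age {
                stale.push(entry.path());
            }
        }
    }
    // Sort for a deterministic order.
    stale.sort();
    Ok(stale)
}
```

A tool might call this on a subfolder such as `deps` with a `max_age` of a few days and then remove the returned paths. Whether the fingerprint data should be cleaned alongside them is a separate question.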

is this worth adding, or should we just build `clean --outdated` into cargo?

I would love to see a `clean --outdated` in cargo! However, I think there is a lot of design work before we can make something good enough to deserve the cargo team's stamp of approval, especially as an in-tree version will have to work with many use cases, some of which are yet to be designed (like distributed builds). Even just including `cargo-sweep`'s existing functionality opens a full bike shop about what arguments to take, and in what form (`cargo-sweep` takes a days argument, but maybe we should have a minutes argument or an ISO standard time or ...). This PR, or an equivalent, allows out-of-tree experimentation with all the different interfaces, and is basically required for any LRU-based system. (For example, Crater (rust-lang/crater#346) wants a GC that cleans files in an LRU manner to keep a target folder below a target size. This use case is not widely enough needed to be worth adding to cargo, but it is supported by this PR.)
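To make the LRU use case concrete, here is a minimal sketch of how an out-of-tree tool might choose what to delete once it has an mtime per artifact. The `Artifact` struct and `select_for_eviction` function are hypothetical names for this example. This is not how Crater or cargo-sweep is implemented.

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical record a cleaner might build while walking a target folder.
struct Artifact {
    path: PathBuf,
    /// Taken from the file's modification time.
    last_used: SystemTime,
    /// Size in bytes.
    size: u64,
}

/// Picks the least recently used artifacts to delete so that the total
/// size of what remains is at most `budget` bytes.
fn select_for_eviction(mut artifacts: Vec<Artifact>, budget: u64) -> Vec<PathBuf> {
    let mut total: u64 = artifacts.iter().map(|a| a.size).sum();
    // Oldest first, so the least recently used entries are evicted first.
    artifacts.sort_by_key(|a| a.last_used);
    let mut evict = Vec::new();
    for a in artifacts {
        if total <= budget {
            break;
        }
        total -= a.size;
        evict.push(a.path);
    }
    evict
}
```

The selection only makes sense if `last_used` reflects real use, which is the piece of data this PR is meant to make available.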

what are the downsides?

1. There are legitimate performance concerns about writing so many small files during a NOP build.
2. There are legitimate concerns about unnecessary writes on read-only filesystems.
3. If we add this, and it starts seeing widespread use, we may be de facto stabilizing the folder structure we use. (This is probably true of any system that allows out-of-tree experimentation.)
4. This may not be an efficient way to store the data. (It does have the advantage of not needing different cargos to manipulate the same file. But if you have a better idea, please make a suggestion.)

rust-highfive · 2018-12-22T18:19:44Z

r? @ehuss

(rust_highfive has picked a reviewer for you, use r? to override)

ehuss · 2018-12-23T19:08:49Z

I think this would be useful. I'm not sure if it really needs a dedicated file. I did some tests on an old Windows machine, and I didn't see a noticeable performance difference. However, creating lots of small files generally isn't desirable. Would it be possible to just adjust the mtime of one of the other files? Touching dep-KIND-PKG-HASH may make #2426 worse, so that may not be a good idea, but I think the other two files would be possible options. It should probably get a test, too.

Eh2406 · 2018-12-23T23:03:10Z

Good points.

It would be nice if there was a reliable way to know if the fingerprint was made with a cargo that is maintaining the mtime or if falling back to the atime is required.
Cargo can decide not to have a clear signal.

If the tool always uses max(atime, mtime) then it will be safely conservative, but I fear some system that always returns 'now' for atime breaking the tooling.
If the tool always uses mtime then it will be too aggressive for older Cargos.

The tooling can decide to be too conservative, not support older cargos, or do version-number-based feature detection (assuming we don't add a way to opt out). A separate file makes it very clear if Cargo is making this data available to the tool. (Use the file's mtime if the file exists, else max(atime, mtime).) If there is another signal you would prefer, let me know.
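The fallback rule described in this comment can be written down as a small pure function. This is only a sketch of the rule as stated here. The function name `last_used` and its parameters are assumptions for the example and do not come from cargo or cargo-sweep.

```rust
use std::time::SystemTime;

/// Estimates when an artifact was last used.
///
/// `stamp_mtime` is the mtime of the dedicated timestamp file, or `None`
/// when that file does not exist (for example, because an older cargo
/// produced the fingerprint). `atime` and `mtime` belong to the artifact.
fn last_used(stamp_mtime: Option<SystemTime>, atime: SystemTime, mtime: SystemTime) -> SystemTime {
    match stamp_mtime {
        // Cargo is maintaining the stamp, so trust it.
        Some(t) => t,
        // No stamp: be conservative and take the later of atime and mtime.
        None => atime.max(mtime),
    }
}
```

A tool could obtain the inputs from `std::fs::Metadata::modified` and `std::fs::Metadata::accessed`. Note that `accessed` can return an error on platforms that do not support it.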

I will add a test once we decide what we want the behavior to be and the next time I look at the code.

Eh2406 · 2018-12-27T14:51:48Z

Thinking about it more, I don't think there is a reasonable way for a cleaner script to handle when cargo is sometimes opting out of this. Version number detection sounds approximately good enough. What is the best way to cross platform touch a file?

alexcrichton · 2019-01-02T15:42:56Z

Could we perhaps touch the mtime of the artifacts themselves? (like all the rlibs, binaries, etc). I think that Cargo doesn't compare the mtime on those artifacts (even historical Cargos) and that may make it easiest for other tools too? IIRC we already check for existence of all artifacts on nop builds, so opening up a file handle to set the mtime on it may not take too much longer. If this becomes a performance concern we could always add configuration/env vars to turn it off, but it seems reasonable to me to have it on by default.

Eh2406 · 2019-01-02T16:39:51Z

> that may make it easiest for other tools too?

In what way? It would make a simple tool, like the published version of cargo-sweep, that does not know about the structure of the target dir closer to being correct but not really usable. I think a simple tool would start leaving the outputs and deleting the fingerprints, leading to a complete rebuild. A tool that knows about the structure, like my branch of cargo-sweep, doesn't really care which file we pick as the authoritative one.

alexcrichton · 2019-01-02T19:21:16Z

Oh I was just thinking that if the output artifacts had mtimes on them then a tool could just delete any artifact older than N days, and the .fingerprint metadata would need cleaning eventually but it's not really large enough to warrant lots of scrutiny

Eh2406 · 2019-01-02T19:42:38Z

Ok, I am convinced that that could be a good next step!
What is the best way to cross platform touch a file?

let t = FileTime::from_system_time(SystemTime::now());
filetime::set_file_times(path, t, t)?;

Feels kinda dependent on the SystemTime being the same clock as the File System.
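For comparison, a similar touch can be sketched with only the standard library. This assumes Rust 1.75 or later, where `File::set_modified` is available. It is not the code used in this PR, which relies on the `filetime` crate, and the helper name `touch` is chosen for illustration.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Sets the modification time of an existing file to the current system time.
/// Hypothetical helper; fails if the file does not exist or is not writable.
fn touch(path: &Path) -> io::Result<()> {
    // Opened for writing because some platforms require write access
    // to change a file's timestamps.
    let file = OpenOptions::new().write(true).open(path)?;
    file.set_modified(SystemTime::now())
}
```

Like the snippet above, this takes the timestamp from `SystemTime::now()`, so the same concern about the system clock versus the filesystem clock applies.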

alexcrichton · 2019-01-02T19:47:14Z

Hm that's what I would naively say we should do, but I'll admit I have no idea how the filesystem clock and SystemTime clocks are related...

Eh2406 · 2019-01-09T22:15:42Z

@alexcrichton where do we check for existence of all artifacts on nop builds?

alexcrichton · 2019-01-10T17:23:45Z

I think here

Eh2406 · 2019-01-10T19:51:11Z

This has been updated. I rebased. I also switched to touching existing files instead of adding a new one. Touch was implemented with SystemTime. After our discussion in #2426, I am not convinced that there is a better alternative, nor that the problems are all that severe. In addition, none of this change is for cargo's correctness.

The files being touched are:

- the hash file in the fingerprint. This allows cargo-sweep to clean all of target on systems that do not support atime, with no change to its code.
- the output artifacts. This allows a simple shell script to clean all files in deps and examples older than a target.

When CI is green, I will update the title and OP.

Eh2406 · 2019-01-13T04:16:17Z

Added tests. Do people have thoughts, or is this good to go?

ehuss

Seems good. Alex?

alexcrichton · 2019-01-16T23:27:46Z

@bors: r=ehuss

👍

bors · 2019-01-16T23:27:47Z

📌 Commit 97363ca has been approved by ehuss

bors · 2019-01-16T23:28:08Z

⌛ Testing commit 97363ca with merge 513d230...


bors · 2019-01-16T23:56:26Z

☀️ Test successful - checks-travis, status-appveyor
Approved by: ehuss
Pushing 513d230 to master...

@euclio

Update cargo

Unblocks #56884

cc @euclio

6 commits in 2b4a5f1f0bb6e13759e88ea9512527b0beba154f..ffe65875fd05018599ad07e7389e99050c7915be
2019-01-12 04:13:12 +0000 to 2019-01-17 23:57:50 +0000
- Better error message for bad manifest with `cargo install`. (rust-lang/cargo#6560)
- relax rustdoc output assertion (rust-lang/cargo#6559)
- touch some files when we use them (rust-lang/cargo#6477)
- Add documentation for new package/publish feature flags. (rust-lang/cargo#6553)
- Update chat link to Discord. (rust-lang/cargo#6554)
- Fix typo (rust-lang/cargo#6552)

r? @alexcrichton

@Mark-Simulacrum

Put mtime-on-use behind a feature flag.

This places #6477 behind the `-Z mtime-on-use` feature flag.

The change to update the mtime each time a crate is used has caused a performance regression on the rust playground (rust-lang/rust#57774). It is using about 241 pre-built crates in a Docker container. Due to the copy-on-write nature of Docker, it can take a significant amount of time to update the timestamps (over 10 seconds on slower systems).

cc @Mark-Simulacrum

rust-highfive assigned ehuss Dec 22, 2018

Eh2406 force-pushed the add-a-timestamp-file branch from c700f8d to 4ae5abc Compare December 22, 2018 18:31

This was referenced Dec 22, 2018

Suggestion: Add a flag to clean all currently unused build artifacts holmgr/cargo-sweep#2

Open

Improve performance and accuracy by baking in targets structure knowledge? holmgr/cargo-sweep#6

Closed

Eh2406 force-pushed the add-a-timestamp-file branch from 4ae5abc to 920f4db Compare December 27, 2018 01:43

Eh2406 mentioned this pull request Dec 27, 2018

Rebuild on mid build file modification #6484

Merged

Eh2406 mentioned this pull request Jan 4, 2019

fix cargo not doing anything when the input and output mtimes are equal #5919

Merged

Eh2406 force-pushed the add-a-timestamp-file branch 2 times, most recently from 41b80df to b144ad3 Compare January 7, 2019 21:15

Eh2406 mentioned this pull request Jan 9, 2019

Automatically purge target directories after reaching max size rust-lang/crater#346

Open

adds a timestamp file in the fingerprint folder

d25305b

Eh2406 force-pushed the add-a-timestamp-file branch from b144ad3 to 41bf6f2 Compare January 10, 2019 19:24

just touch some of the files we use.

3eaa70e

Eh2406 force-pushed the add-a-timestamp-file branch from 41bf6f2 to 3eaa70e Compare January 10, 2019 19:45

Eh2406 changed the title ~~adds a timestamp file in the fingerprint folder~~ touch some files when we use them Jan 10, 2019

Eh2406 added 2 commits January 12, 2019 22:50

add a test for touching deps

80f7b90

add a test for touching fingerprint

e31003a

fix tests on HFS

97363ca

ehuss approved these changes Jan 16, 2019

bors merged commit 97363ca into rust-lang:master Jan 16, 2019

ehuss mentioned this pull request Jan 18, 2019

Update cargo rust-lang/rust#57721

Merged

ehuss mentioned this pull request Jan 20, 2019

Put mtime-on-use behind a feature flag. #6573

Merged

Eh2406 mentioned this pull request Jan 20, 2019

Nightly cargo is sometimes slow to start rust-lang/rust#57774

Closed

Eh2406 deleted the add-a-timestamp-file branch January 23, 2019 02:08

Eh2406 mentioned this pull request May 23, 2019

mtime-on-use as a config option #6978

Closed

Eh2406 mentioned this pull request Jun 14, 2019

-Zmtime-on-use causes spurious rebuilds in workspace #6972

Closed

ehuss mentioned this pull request Jul 19, 2019

Cache usage meta tracking issue #7150

Open

ehuss added this to the 1.34.0 milestone Feb 6, 2022
