Auto merge of rust-lang#12634 - ehuss:last-use, r=epage
Add cache garbage collection

### What does this PR try to resolve?

This introduces a new garbage collection system which can track the last time files were used in cargo's global cache, and delete old, unused files either automatically or manually.

### How should we test and review this PR?

This is broken up into a large number of commits, and each commit should have a short overview of what it does. I am breaking some of these out into separate PRs as well (unfortunately GitHub doesn't really support stacked pull requests). I expect to reduce the size of this PR if those other PRs are accepted.

I would first review `unstable.md` to get an idea of what the user side of this looks like, then skim each commit message for an overview of all the changes. The core change is the introduction of the `GlobalCacheTracker`, an interface to a sqlite database used for tracking the timestamps.
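
To make the idea concrete, here is a minimal sketch of last-use tracking with age-based collection. It is not the `GlobalCacheTracker` from this PR: the real tracker stores timestamps in sqlite, while this illustration keeps them in an in-memory `HashMap`. The `LastUseTracker` type, its method names, and the cache paths are all hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, SystemTime};

/// Hypothetical in-memory stand-in for a last-use tracker.
/// (The PR's tracker persists this data in a sqlite database instead.)
struct LastUseTracker {
    /// Maps a cache entry (for example, a path under the cargo home) to its last-use time.
    last_use: HashMap<String, SystemTime>,
}

impl LastUseTracker {
    fn new() -> Self {
        Self { last_use: HashMap::new() }
    }

    /// Records that `path` was used at time `now`, overwriting any older timestamp.
    fn mark_used(&mut self, path: &str, now: SystemTime) {
        self.last_use.insert(path.to_string(), now);
    }

    /// Forgets every entry whose last use is more than `max_age` before `now`
    /// and returns the forgotten paths in sorted order. A real collector would
    /// also delete the corresponding files on disk.
    fn collect_garbage(&mut self, now: SystemTime, max_age: Duration) -> Vec<String> {
        let mut stale = Vec::new();
        self.last_use.retain(|path, used| {
            // Entries with a timestamp in the future are treated as not expired.
            let expired = now
                .duration_since(*used)
                .map_or(false, |age| age > max_age);
            if expired {
                stale.push(path.clone());
            }
            !expired
        });
        stale.sort();
        stale
    }
}
```

Passing `now` explicitly rather than calling `SystemTime::now()` inside the methods keeps the sketch deterministic and easy to test.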

### Additional information

I think the interface for this will almost certainly change over time. This is just a first stab at creating a starting point from which we can begin testing and discussing what actual user flags should be exposed. This is also intended to start the process of getting experience using sqlite, and getting some testing in real-world environments to see how things might fail.

I'd like to ask for the review to not focus too much on bikeshedding flag names and options. I expect them to change, so this is by no means a concrete proposal for where it will end up. For example, the options are very granular, and I would like to have fewer options. However, it isn't clear how that might best work. The size-tracking options almost certainly need to change, but I do not know exactly what the use cases for size-tracking are, so that will need some discussion with people who are interested in that.

I decided to place the gc commands in cargo's `cargo clean` command because I would like to have a single place for users to go for deleting cache artifacts. They may be moved to another command later; however, introducing new subcommands is quite difficult (due to shadowing existing third-party commands). Other options might be `cargo gc`, `cargo maintenance`, `cargo cache`, etc., but existing third-party extensions would conflict with those names.

There are also more directions to go in the future. For example, we could add a `cargo clean info` subcommand which could be used for querying cache information (like the sizes and such). There is also the rest of the steps in the original proposal at https://hackmd.io/U_k79wk7SkCQ8_dJgIXwJg for rolling out sqlite support.

See rust-lang#12633 for the tracking issue
bors committed Nov 11, 2023
2 parents 6ef771d + 0cd970b commit 9a1b092
Showing 37 changed files with 5,704 additions and 38 deletions.
91 changes: 85 additions & 6 deletions Cargo.lock
4 changes: 4 additions & 0 deletions Cargo.toml
@@ -73,6 +73,8 @@ pretty_assertions = "1.4.0"
proptest = "1.3.1"
pulldown-cmark = { version = "0.9.3", default-features = false }
rand = "0.8.5"
regex = "1.9.3"
rusqlite = { version = "0.29.0", features = ["bundled"] }
rustfix = "0.6.1"
same-file = "1.0.6"
security-framework = "2.9.2"
@@ -162,6 +164,8 @@ pasetors.workspace = true
pathdiff.workspace = true
pulldown-cmark.workspace = true
rand.workspace = true
regex.workspace = true
rusqlite.workspace = true
rustfix.workspace = true
semver.workspace = true
serde = { workspace = true, features = ["derive"] }
39 changes: 36 additions & 3 deletions benches/README.md
@@ -9,7 +9,23 @@ cd benches/benchsuite
cargo bench
```

The tests involve downloading the index and benchmarking against some
However, running all benchmarks would take many minutes, so in most cases it
is recommended to just run the benchmarks relevant to whatever section of code
you are working on.

## Benchmarks

There are several different kinds of benchmarks in the `benchsuite/benches` directory:

* `global_cache_tracker` — Benchmarks saving data to the global cache tracker
database using samples of real-world data.
* `resolve` — Benchmarks the resolver against simulations of real-world workspaces.
* `workspace_initialization` — Benchmarks initialization of a workspace
against simulations of real-world workspaces.

### Resolve benchmarks

The resolve benchmarks involve downloading the index and benchmarking against some
real-world and artificial workspaces located in the [`workspaces`](workspaces)
directory.

@@ -21,15 +37,32 @@ faster. You can (and probably should) specify individual benchmarks to run to
narrow it down to a more reasonable set, for example:

```sh
cargo bench -- resolve_ws/rust
cargo bench -p benchsuite --bench resolve -- resolve_ws/rust
```

This will only download what's necessary for the rust-lang/rust workspace
(which is about 330MB) and run the benchmarks against it (which should take
about a minute). To get a list of all the benchmarks, run:

```sh
cargo bench -- --list
cargo bench -p benchsuite --bench resolve -- --list
```

### Global cache tracker

The `global_cache_tracker` benchmark tests saving data to the global cache
tracker database using samples of real-world data. This benchmark should run
relatively quickly.

The real-world data is based on a capture of my personal development
environment which has accumulated a large cache. So it is somewhat arbitrary,
but hopefully representative of a challenging environment. Capturing of the
data is done with the `capture-last-use` binary, which you can run if you need
to rebuild the database. Just try to run on a system with a relatively full
cache in your cargo home directory.

```sh
cargo bench -p benchsuite --bench global_cache_tracker
```

## Viewing reports
6 changes: 6 additions & 0 deletions benches/benchsuite/Cargo.toml
@@ -11,8 +11,10 @@ publish = false

[dependencies]
cargo.workspace = true
cargo-util.workspace = true
criterion.workspace = true
flate2.workspace = true
rand.workspace = true
tar.workspace = true
url.workspace = true

@@ -26,3 +28,7 @@ harness = false
[[bench]]
name = "workspace_initialization"
harness = false

[[bench]]
name = "global_cache_tracker"
harness = false