
Caching tracking issue #21

Open
svartalf opened this issue Nov 27, 2019 · 12 comments
Labels
enhancement New feature or request

Comments

@svartalf
Member

There are two things that could be improved by adding caching:

  1. Cache target/ and Cargo folders (see an example)
  2. Cache the compiled binary from the cargo install {pkgid} command
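A minimal sketch of item 1 might look like the following, assuming actions/cache with a lockfile-hash key (the paths and key names here are illustrative, not a settled recommendation):

    - name: cache cargo registry
      uses: actions/cache@v1
      with:
        path: ~/.cargo/registry
        key: cargo-registry-${{ hashFiles('**/Cargo.lock') }}
    - name: cache target directory
      uses: actions/cache@v1
      with:
        path: target
        key: cargo-target-${{ hashFiles('**/Cargo.lock') }}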

Right now this affects the following workflows:

This issue could be solved by a proper implementation of the Cargo.installCached function in actions-rs/core (see actions-rs/core#31), but that work is blocked by actions/cache#55.

For now, this issue serves only as a tracking issue for a possible proper caching implementation.

@CAD97

CAD97 commented Mar 3, 2020

In rust-analyzer, we remove the artifacts for workspace-local crates before the cache is stored. While this isn't required to reduce cache churn the way it is with auto-keyed caching (e.g. Travis), since here a matching key will not re-store the cache, it does still reduce cache sizes.

If actions-rs ever provides a "best practice" caching wrapper, it should probably allow scrubbing these files, so only dependencies' artifacts are cached. (Specifically, scrubbing files directly in ./target/{debug,release}/, as well as folders/files directly in ./target/{debug,release}/{.fingerprint,deps,incremental}/ that contain the names of workspace-local libs/bins.)
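A rough sketch of such a scrub step; my-crate stands in for each workspace-local crate name, and the exact glob patterns are assumptions on my part (real artifact names also carry hash suffixes):

    - name: scrub workspace-local artifacts before caching
      shell: bash
      run: |
        # "my-crate" is a placeholder for a workspace-local lib/bin name.
        name=my-crate
        lib=${name//-/_}
        rm -f ./target/debug/"$name" ./target/debug/lib"$lib".*
        rm -rf ./target/debug/.fingerprint/"$name"-* \
               ./target/debug/deps/"$lib"-* \
               ./target/debug/deps/lib"$lib"-* \
               ./target/debug/incremental/"$lib"-*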

@svartalf
Member Author

svartalf commented Mar 3, 2020

@CAD97 noted, thank you! I do want to create some kind of actions-rs/cache action that caches all these things automatically, but as this issue points out, there is no programmatic access to the cache routines right now.

@anp

anp commented Apr 12, 2020

Lucky timing, but it looks like a few hours before I came across this issue, someone posted a possible workaround to the upstream issue: actions/cache#55 (comment).

Also worth noting that in addition to the target/ directory examples, currently I have these cache keys set up in my jobs:

    - name: cache cargo index
      uses: actions/cache@v1
      with:
        path: ~/.cargo/git
        key: cargo-index-${{ hashFiles('**/Cargo.lock') }}
        restore-keys: |
          cargo-index-
    - name: cache cargo registry
      uses: actions/cache@v1
      with:
        path: ~/.cargo/registry
        key: cargo-registry-${{ hashFiles('**/Cargo.lock') }}
        restore-keys: |
          cargo-registry-
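For completeness, a target/ cache step in the same style might look like the following (a sketch; the path and key layout are assumptions):

    - name: cache cargo build
      uses: actions/cache@v1
      with:
        path: target
        key: cargo-build-${{ hashFiles('**/Cargo.lock') }}
        restore-keys: |
          cargo-build-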

@thomaseizinger

thomaseizinger commented May 16, 2020

There is an initial version of @actions/cache available now: https://github.com/actions/toolkit/tree/master/packages/cache

@mzabaluev

One problem with using a hash of Cargo.lock in the cache key is that for library crates, checking Cargo.lock into version control is discouraged. The repository checkout may therefore not contain enough data for the cache action to score a hit with exactly the right dependencies for the build, before cargo fetch or cargo update generates Cargo.lock locally. But those commands also fetch the dependencies, so they would benefit from having a hopefully close snapshot of ~/.cargo/registry already restored from the cache.

I have developed a workflow solution that maintains a Cargo.lock checked in, and automatically updated, on a special git branch. In every workflow run that uses cargo to build, this Cargo.lock is grafted on top of the commit that triggered the build, and its hash is used as the last part of the cache key for restoring ~/.cargo/registry. The hope is that a lockfile generated from the latest master tree often suits the build of a somewhat divergent revision with no new dependency crates to fetch. This is probably too much complexity for a single GitHub action to hide, though.
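A stripped-down sketch of the grafting idea, assuming the maintained lockfile lives on a branch named ci/cargo-lock (the branch name is made up for illustration):

    - uses: actions/checkout@v2
    - name: graft the maintained Cargo.lock onto this checkout
      run: |
        git fetch --depth=1 origin ci/cargo-lock
        git checkout FETCH_HEAD -- Cargo.lock
    - name: cache cargo registry
      uses: actions/cache@v1
      with:
        path: ~/.cargo/registry
        key: cargo-registry-${{ hashFiles('Cargo.lock') }}
        restore-keys: |
          cargo-registry-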

@CAD97

CAD97 commented May 17, 2020

for library crates, checking Cargo.lock into version control is discouraged.

This is true, but I both do and don't understand why. If you're caching the build anyway, you're (probably) caching a Cargo.lock anyway. Committing a Cargo.lock is good for libraries, imho, for the exact same reasons it is good for binaries.

So there may not be enough data in the repository checkout so that the cache action could score the hit with the dependencies exactly right for the build, before cargo fetch or cargo update generates Cargo.lock locally. But these commands also fetch the dependencies, so they would benefit from some hopefully close snapshot of ~/.cargo/registry to be restored from the cache already.

You probably want cargo generate-lockfile then. Also, general knowledge that I've absorbed is that .cargo/registry isn't worth caching because 1) it costs a "similar" amount to download that file from the cache or to download it via cargo, and 2) cargo will update an outdated registry anyway for most commands unless you use --offline.


So I think there are two useful "simple enough" behaviors here:

  • Use a checked-in Cargo.lock. Use its hash as part of the cache key.
  • Don't check in Cargo.lock, and always run CI against the latest versions of dependencies. Run cargo generate-lockfile first, then use the generated lockfile's hash as part of the cache key (see the sketch after this list).
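A minimal sketch of the second behavior; this assumes hashFiles is evaluated when the cache step runs, so it picks up the lockfile generated in the previous step (the key name is illustrative):

    - name: generate a fresh lockfile
      run: cargo generate-lockfile
    - name: cache dependencies keyed on the generated lockfile
      uses: actions/cache@v1
      with:
        path: target
        key: cargo-target-${{ hashFiles('Cargo.lock') }}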

And the idealized version of the second case, which I don't think is possible (or will be in the near future):

  • Don't check in Cargo.lock, and always run CI against the latest versions of dependencies. Run cargo generate-lockfile first, then use the generated lockfile's hash as part of the cache key, restoring from the head's cache, but only persisting the new or reused partial compilation artifacts, so there are no unused artifacts bloating the cache.

Personally, I think checking in lockfiles is the better plan, even for libraries, because it makes CI builds more deterministic and less likely to break because of the outside world. I never want to be in a situation where CI passed on the head but a PR fails through no fault of its own, just because a library updated in a minorly breaking way. Rather, I think the better model is to put Cargo.lock into source control and have dependabot (or a similar bot) PR and bors r+ any lockfile updates as libraries release new versions.

@mzabaluev

You probably want cargo generate-lockfile then.

That command needs at least ~/.cargo/registry/index to be up to date, so it too would benefit from a cache restore. However, that could well be a separate smaller cache, because the primary key of the index state should be the head commit in the rust-lang/crates.io-index GitHub repo, not the Cargo.lock of any particular project.
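A sketch of that separate index cache, keyed on the current head commit of rust-lang/crates.io-index (the step id and key prefix are illustrative):

    - name: look up the crates.io index head
      id: index-head
      run: echo "::set-output name=commit::$(git ls-remote https://github.com/rust-lang/crates.io-index.git HEAD | cut -f1)"
    - name: cache the registry index
      uses: actions/cache@v1
      with:
        path: ~/.cargo/registry/index
        key: cargo-index-${{ steps.index-head.outputs.commit }}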

Also, general knowledge that I've absorbed is that .cargo/registry isn't worth caching because 1) it costs a "similar" amount to download that file from the cache or to download it via cargo,

A full restoration including the repository index takes a long time, and is getting longer as the index grows.
Another motivation for caching is to avoid unnecessarily taxing the crates.io infrastructure.

2) cargo will update an outdated registry anyway for most commands unless you use --offline.

With a close enough state of the registry restored from the cache, there is not much to synchronize. A job in the workflow can perform cargo fetch or cargo update and store the perfect cache state for the other jobs to pick up; those jobs can then run cargo with --frozen --offline.
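A sketch of that hand-off between jobs, using the cache as the shared state (the job names and the cache key are illustrative):

    jobs:
      fetch:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - uses: actions/cache@v1
            with:
              path: ~/.cargo/registry
              key: cargo-deps-${{ hashFiles('**/Cargo.lock') }}
          - run: cargo fetch
      test:
        needs: fetch
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - uses: actions/cache@v1
            with:
              path: ~/.cargo/registry
              key: cargo-deps-${{ hashFiles('**/Cargo.lock') }}
          - run: cargo test --frozen --offline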

  • Don't check in Cargo.lock, and always run CI against the latest versions of dependencies. Run cargo generate-lockfile first, then use the generated lockfile's hash as part of the cache key, restoring from the head's cache, but only persisting the new or reused partial compilation artifacts, so there are no unused artifacts bloating the cache.

One current problem with this is the lack of an easy way to prune outdated dependencies. There is a cargo-cache command where this could be implemented: matthiaskrgr/cargo-cache#76.

Personally, I think checking in lockfiles is the better plan, even for libraries, because it makes CI builds more deterministic, and less likely to break because of the outside world.

It may be good for reproducible CI builds, but for regular developer working copies of library code, a version-controlled Cargo.lock just gets in the way. There is no single golden state of the build for a library; everyone may want to build/test it locally against their own choice of dependency versions, and keep that preference locked in Cargo.lock while they are working on it.

Checking the baseline lockfile into a CI-managed branch, away from the branch(es) that developers check out, could be a way to get the best of both worlds.

@mzabaluev

mzabaluev commented May 17, 2020

For reference, here's a workflow using two stages of caching. First, a job runs cargo generate-lockfile and updates the crate cache keyed by the hash of Cargo.lock. Then a matrix of test jobs uses the cached dependency sources on all three supported OSes. The cargo registry filesystem, git checkouts, and crate sources archived on Linux seem to work fine when restored on macOS or Windows.

Some follow-up on my earlier comments:

A full restoration including the repository index is long and getting longer as the index is growing.

cargo generate-lockfile, which makes a full clone of the crates.io index repository, takes about 5 s in this workflow, which is comparable to the time it takes to store the index in the cache. Restoration takes about 2 s.

Another motivation for caching is to avoid unnecessarily taxing the crates.io infrastructure.

The crates.io index is hosted on GitHub, so access to it from GitHub CI should be fast and cheap (free?) and may use any backplane optimization magic invented by GitHub. The CDN hosting the crates is currently on AWS, though.

@mzabaluev

mzabaluev commented May 18, 2020

More observations on windows-latest:

  • The ~/.cargo directory, with a weeks-old clone of the repository index and some cached crate sources, is present in the vanilla build environment (Old cargo cache left over from the build in Windows and Linux actions/runner-images#895).
  • cargo fetch on that state takes more than a minute to pull the index repository.
  • Unpacking the crates from ~/.cargo/registry/cache into ~/.cargo/registry/src can take quite some time. This can be measured by removing the src directory and running cargo fetch --locked --offline (see the sketch after this list).
  • When building without prefetched crates, cargo may unpack the crate sources as needed in parallel with building the artifacts. The above overhead had no significant impact on the build time in my testing.
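As a workflow step, the measurement from the third bullet might look like this sketch (shell: bash also works on windows-latest via Git Bash):

    - name: measure crate unpack time
      shell: bash
      run: |
        rm -rf ~/.cargo/registry/src
        time cargo fetch --locked --offline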

@anp

anp commented May 31, 2020

Looks like the npm package for making use of the cache actions from other actions has been released: https://github.blog/changelog/2020-05-27-github-actions-v2-cache-actions/.

@Swatinem

👋 Reaching out here, since I have created https://github.com/Swatinem/rust-cache and we have been successfully test-driving it for a few Sentry projects.
Swatinem/rust-cache#3 asked about maybe moving it into actions-rs.

So far it's very opinionated about how I think caching should ideally work, but I'm open to ideas.

@CAD97

CAD97 commented Jun 13, 2021

For tools specifically, https://github.com/actions/toolkit/tree/main/packages/tool-cache exists now and seems useful.

alterae added a commit to alterae/alterae.github.io that referenced this issue Oct 15, 2022
This will (eventually) give us a huge build time boost when
actions-rs/meta#21 is implemented.