Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(faq): Recommend always committing Cargo.lock #12275

Closed

Conversation

demurgos
Copy link
Contributor

@demurgos demurgos commented Jun 16, 2023

This commit updates the FAQ to recommend to always commit Cargo.lock in all projects, both binaries and libraries. The FAQ previously discouraged Cargo.lock in libraries.

The question of changing the recommendation to always commit the lock file was raised in #8728, as well as in a few Rust Internals threads in the previous months (Feedback of cargo-upgrade, Cargo yank is a misfeature) and was a relatively popular proposition.

Cargo.lock enables reproducible builds, I argue that this is a desirable property for all projects. Reproducible builds are a safe and powerful default both for binaries and libraries. Wanting a non-reproducible build is less common, and still easy to achieve even if a lock file is committed.

Applications have binaries compiled through cargo build and executed by cargo run. Libraries also have binaries, they're just executed through cargo test instead. In both cases, it is desirable to be able to work with the version control system when debugging some regression. A Cargo.lock enables workflows such as git bisect or reproducing a failed test across all environments (e.g. CI and dev machines).

  1. A lockfile enables reproducible builds across commits. This means that you can browse the git history and reproduce your tests as they were at this time. This unlocks git bisect workflow and is extremely useful when working on transitive dependency issues.

  2. A lockfile ensures that you can reproduce an older build even if it contains yanked dependencies or weakly constrained versions (semver requirement *, git repository without a revision, ...).

  3. Even if a lock file is present, it is easy to refresh it / force dependency resolution again. On the other hand, if the lockfile is missing then it is very hard to retrieve the resolution at this time (or even impossible). Having a lock file is a safer default (no information loss).

  4. When multiple developers contribute to the same project (even a library), a lock file ensures that all contributors test with the same dependencies. Without a lockfile, developers may see different behavior when testing. This also enables reproducing CI errors locally for example.

  5. The usual objection to committing a lockfile is that it may take longer to detect regressions in transitive dependencies. In practice, this does not change much: refreshing the dependencies is always an explicit operation in Cargo anyway. The only time when it's done implicitly is when cloning a project for the first if it does not have a lock file yet. Checking transitive dependencies is better handled explicitly in CI or through tools such as Dependabot. Catching problems with new dependencies should not happen when someone is trying to test an unrelated change. There is also a magnitude more builds in consumer projects than CI builds for a lib, so you should be ready to get messaged by your users anyway if you break something :)

This change to always commit lock files also aligns with prior art from other ecosystems.

  • Yarn for Node.js

    Which files should be gitignored?

    [...]

  • Poetry for Python

    As a library developer

    A simple way to avoid such a scenario [testing with latest transitive dependencies] is to omit the poetry.lock file. However, by doing so, you sacrifice reproducibility and performance to a certain extent. Without a lockfile, it can be difficult to find the reason for failing tests, because in addition to obvious code changes an unnoticed library update might be the culprit. [...]

    If you do not want to give up the reproducibility and performance benefits, consider a regular refresh of poetry.lock to stay up-to-date and reduce the risk of sudden breakage for users.

  • Bundler for Ruby

    Q: Should I commit my Gemfile.lock when writing a gem?

    A: Yes, you should commit it. The presence of a Gemfile.lock in a gem’s repository ensures that a fresh checkout of the repository uses the exact same set of dependencies every time. We believe this makes repositories more friendly towards new and existing contributors. Ideally, anyone should be able to clone the repo, run bundle install, and have passing tests. If you don’t check in your Gemfile.lock, new contributors can get different versions of your dependencies, and run into failing tests that they don’t know how to fix.

This commit also updates cargo new to no longer ignore Cargo.lock.

Fixes #8728

@rustbot
Copy link
Collaborator

rustbot commented Jun 16, 2023

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @epage (or someone else) soon.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added A-documenting-cargo-itself Area: Cargo's documentation S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 16, 2023
@demurgos demurgos force-pushed the docs/always-commit-cargo-lock branch 3 times, most recently from a7e07b6 to 5f7669a Compare June 16, 2023 07:14
@zackw
Copy link

zackw commented Jun 16, 2023

I object to this change. We should instead recommend that Cargo.lock never be committed.

Committing lock files optimizes for a tiny handful of use cases (that are primarily encountered by core developers, hence core developers tend to think this is a good idea) at the expense of all other use cases. In particular, committed lockfiles are a hindrance for anyone who wants to take some existing software and reuse it in a context other than exactly how its core developers were using it.

(I'm honestly tempted to write and submit a PR that completely removes the lockfile feature, that's how strongly I feel about this.)

@tiziano88
Copy link

IMO this is an unequivocally good change to commit. There is no valid reason for dealing with non-deterministic build / tests nowadays. Especially with dependabot which keeps track of dependencies and bumps them by preserving reproducibility. If you go back to an old commit at any point, everything should keep working in the same exact way; all the information about the dependencies must be contained within the repo at that commit.

@zackw
Copy link

zackw commented Jun 16, 2023

The mere existence of dependabot should clue you in to the fact that dependency locking is a mistake. It makes so much extra work for everyone that we had to create a bot to automate part of the process! And it's still extra work for everyone, vs. the alternative of not locking dependencies in the first place.

Reproducibility is exactly what I was talking about when I said "a tiny handful of use cases primarily encountered by core developers". Yes, it's valuable to be able to know what versions were used to test an old revision of the code --- for core developers. For everyone who isn't developing a piece of software, who just wants to use the damn thing, that information is only of interest if something goes wrong when they try to use the program with set of dependencies they want. Apart from that, dependency locks are nothing but an obstacle to their ability to control that set.

@zackw
Copy link

zackw commented Jun 16, 2023

Maybe this will help you understand where I'm coming from: Suppose cargo was changed so that it would still create Cargo.lock (possibly with a different name) but it would not read that file at all by default, only when manually instructed to do so. In that case I would have no problem with checking the file in. It would be a durable report of what version of each dependency was used to develop each commit, which is valuable information to have, but its existence would not normally constrain what dependencies are to be used with that commit in the future.

@tiziano88
Copy link

@zackw

I'm not sure why it would be better, as a "user" (not core developer) of a crate (e.g. a binary), to try your luck and get the latest version of all the deps, which has a nonzero chance of not working, vs just use the version of the deps that was checked in (and hopefully tested at least once) by the developer.

AFAICT you are assuming not only that things will keep working, but that presumably they will automatically improve the experience for users by virtue of picking a later version of all the dependencies?

@Nemo157
Copy link
Member

Nemo157 commented Jun 16, 2023

Cargo.lock is not active when you simply use some software, for libraries used as a dependency it's completely ignored, for applications installed via cargo install it's ignored by default but can be enabled with cargo install --locked. Only when you are building from within the source tree is it active (e.g. via cloning the repo, or manually downloading and extracting the archive).

@tiziano88
Copy link

for applications installed via cargo install it's ignored by default but can be enabled with cargo install --locked

If anything, personally I would also like to see this reversed at some point: use the checked-in Cargo.lock by default, unless you invoke cargo install --latest or something like that. But I'm not holding my breath for that 😅

@zackw
Copy link

zackw commented Jun 16, 2023

AFAICT you are assuming not only that things will keep working, but that presumably they will automatically
improve the experience for users by virtue of picking a later version of all the dependencies?

I'm not assuming that, my experience over the past 25 years has invariably been that the user experience is improved by picking the latest version of all the dependencies, consistent with the actual constraints as stated in Cargo.toml or equivalent.

@tiziano88
Copy link

AFAICT you are assuming not only that things will keep working, but that presumably they will automatically
improve the experience for users by virtue of picking a later version of all the dependencies?

I'm not assuming that, my experience over the past 25 years has invariably been that the user experience is improved by picking the latest version of all the dependencies, consistent with the actual constraints as stated in Cargo.toml or equivalent.

I have yet to see a single instance in which blindly picking a later version makes anything better -- OTOH I have certainly observed many cases in which nothing at all was working, or things broke in subtle ways.

But I guess at least now we know what we agree to disagree about 👍

@zackw
Copy link

zackw commented Jun 16, 2023

Cargo.lock is not active when you simply use some software ... only when you are building from within the source tree is it active (e.g. via cloning the repo, or manually downloading and extracting the archive).

Those semantics are still an obstacle for non-core developers, particularly of library crates. Consider the following scenario:

  1. You are developing a shiny new program. You want to use a library that hasn't been updated in a while.
  2. Because Cargo ignores the library's lock file when the library is used as a dependency, the library is built against newer versions of its own dependencies. The library authors were insufficiently cautious in stating their dependency constraints in Cargo.toml and so something breaks.
  3. You figure you'll have a try at fixing the problem, so you attempt to build the library yourself.
  4. Now the checked-in Cargo.lock is honored and you can't reproduce the problem until you delete it.

It would be better to honor Cargo.lock only when specifically requested, because then the behavior in step 3 would match the behavior in step 2.

@zackw
Copy link

zackw commented Jun 16, 2023

my experience over the past 25 years has invariably been that the user experience is improved by picking the latest version of all the dependencies, consistent with the actual constraints as stated in Cargo.toml or equivalent.

I have yet to see a single instance in which blindly picking a later version makes anything better

I think you've had bad experiences with software where the authors were sloppy about stating their actual version constraints. Ironically, dependency locking encourages that kind of sloppiness...

@joshtriplett
Copy link
Member

Without commenting on whether we should make this change or not (I'll wade into that one at some point but not right this minute): if we make this change, we should also change cargo new to not include Cargo.lock in .gitignore. We shouldn't have our documentation and our code steering people in inconsistent directions.

@demurgos demurgos force-pushed the docs/always-commit-cargo-lock branch 2 times, most recently from ccc6ac6 to a002f09 Compare June 16, 2023 20:20
@demurgos demurgos force-pushed the docs/always-commit-cargo-lock branch from a002f09 to 67fcc3d Compare June 16, 2023 20:21
@demurgos
Copy link
Contributor Author

@joshtriplett Thank you for your comment, you're right that cargo new should be updated accordingly. I amended my PR to include the cargo new change.

@demurgos demurgos force-pushed the docs/always-commit-cargo-lock branch from 67fcc3d to 7272d86 Compare June 16, 2023 20:26
@demurgos
Copy link
Contributor Author

demurgos commented Jun 16, 2023

@zackw
You described a meaningful scenario, but my conclusion is different.

Those semantics are still an obstacle for non-core developers, particularly of library crates. Consider the following scenario:

  1. You are developing a shiny new program. You want to use a library that hasn't been updated in a while.
  2. Because Cargo ignores the library's lock file when the library is used as a dependency, the library is built against newer versions of its own dependencies. The library authors were insufficiently cautious in stating their dependency constraints in Cargo.toml and so something breaks.
  3. You figure you'll have a try at fixing the problem, so you attempt to build the library yourself.
  4. Now the checked-in Cargo.lock is honored and you can't reproduce the problem until you delete it.

It would be better to honor Cargo.lock only when specifically requested, because then the behavior in step 3 would match the behavior in step 2.

From my point of view, the fact that the library has a lock file committed is already a very good starting point. Since we're talking about a lib with loose requirements (e.g. clap = "*"), it's good to have a trace recorded of what was used. "Oh it was tested with clap 3.x last time, but it got 4.x in my project; let's add a patch to address it". This enables you to go from the lib maintainer left off and continue from there.

I feel that we can at least agree that it's useful information to have available when debugging such problems? Your concern seems mostly directed at the fact that cargo uses lock files by default.

My opinion is that it's a safer default that makes software more approachable actually. I like how the Ruby community captured it (see quote in original post). Having the lockfile enabled by default means that anyone can clone a project and confidently run cargo test / start adding its changes without getting involved in some accidental transitive dependency issues.

@demurgos demurgos force-pushed the docs/always-commit-cargo-lock branch from 7272d86 to 1a34081 Compare June 16, 2023 20:48
@weihanglo
Copy link
Member

Echo what joshtriplett said, there are places more than cargo new may need a change. For example, include_lockfile() should always return true for cargo package.

/// Returns if package should include `Cargo.lock`.
pub fn include_lockfile(&self) -> bool {
self.targets().iter().any(|t| t.is_example() || t.is_bin())
}

(not commenting on whether should make this change either)

@demurgos demurgos force-pushed the docs/always-commit-cargo-lock branch from 1a34081 to 878c2d4 Compare June 16, 2023 23:38
@epage
Copy link
Contributor

epage commented Jun 17, 2023

Whatever my own thoughts are on this, can we move this conversation to #8728? That is where the discussion should have originally happened before opening a PR and this is unlikely to be decided within the confines of this PR, so I think it would help if we did not bifurcate the discussion between here and the issue.

This commit updates the FAQ to recommend to always commit `Cargo.lock` in all projects, both binaries and libraries. The FAQ previously discouraged `Cargo.lock` in libraries.

The question of changing the recommendation to always commit the lock file was raised in rust-lang#8728, as well as in a few Rust Internals threads in the previous months ([Feedback of cargo-upgrade](https://internals.rust-lang.org/t/feedback-on-cargo-upgrade-to-prepare-it-for-merging/17101/124?u=demurgos), [Cargo yank is a misfeature](https://internals.rust-lang.org/t/suggestion-cargo-yank-is-a-misfeature-and-should-be-deprecated-and-eventually-removed/18486/15)) and was a relatively popular proposition.

`Cargo.lock` enables reproducible builds, I argue that this is a desirable property for _all_ projects. Reproducible builds are a safe and powerful default both for binaries and libraries. Wanting a non-reproducible build is less common, and still easy to achieve even if a lock file is committed.

Applications have binaries compiled through `cargo build` and executed by `cargo run`. Libraries also have binaries, they're just executed through `cargo test` instead. In both cases, it is desirable to be able to work with the version control system when debugging some regression. A `Cargo.lock` enables workflows such as `git bisect` or reproducing a failed test across all environments (e.g. CI and dev machines).

1. A lockfile enables reproducible builds across commits. This means that you can browse the git history and reproduce your tests as they were at this time. This unlocks `git bisect` workflow and is extremely useful when working on transitive dependency issues.

2. A lockfile ensures that you can reproduce an older build even if it contains  yanked dependencies or weakly constrained versions (semver requirement `*`, git repository without a revision, ...).

3. Even if a lock file is present, it is easy to refresh it / force dependency resolution again. On the other hand, if the lockfile is missing then it is very hard to retrieve the resolution at this time (or even impossible). Having a lock file is a safer default (no information loss).

4. When multiple developers contribute to the same project (even a library), a lock file ensures that all contributors test with the same dependencies. Without a lockfile, developers may see different behavior when testing. This also enables reproducing CI errors locally for example.

5. The usual objection to committing a lockfile is that it may take longer to detect regressions in transitive dependencies. In practice, this does not change much: refreshing the dependencies is always an explicit operation in Cargo anyway. The only time when it's done implicitly is when cloning a project for the first if it does not have a lock file yet. Checking transitive dependencies is better handled explicitly in CI or through tools such as Dependabot. Catching problems with new dependencies should not happen when someone is trying to test an unrelated change. There is also a magnitude more builds in consumer projects than CI builds for a lib, so you should be ready to get messaged by your users anyway if you break something :)

This change to always commit lock files also aligns with prior art from other ecosystems.
- [Yarn for Node.js](https://yarnpkg.com/getting-started/qa#which-files-should-be-gitignored)
  > Which files should be gitignored?
  >
  > [...]
  >
  > - yarn.lock should always be stored within your repository ([even if you develop a library](https://yarnpkg.com/getting-started/qa#should-lockfiles-be-committed-to-the-repository)).
- [Poetry for Python](https://python-poetry.org/docs/basic-usage/#committing-your-poetrylock-file-to-version-control)
  > As a library developer
  >
  > A simple way to avoid such a scenario [testing with latest transitive dependencies] is to omit the `poetry.lock` file. However, by doing so, you sacrifice reproducibility and performance to a certain extent. Without a lockfile, it can be difficult to find the reason for failing tests, because in addition to obvious code changes an unnoticed library update might be the culprit. [...]
  >
  > If you do not want to give up the reproducibility and performance benefits, consider a regular refresh of `poetry.lock` to stay up-to-date and reduce the risk of sudden breakage for users.
- [Bundler for Ruby](https://bundler.io/guides/faq.html#using-gemfiles-inside-gems)
  > Q: Should I commit my `Gemfile.lock` when writing a gem?
  >
  > A: Yes, you should commit it. The presence of a `Gemfile.lock` in a gem’s repository ensures that a fresh checkout of the repository uses the exact same set of dependencies every time. We believe this makes repositories more friendly towards new and existing contributors. Ideally, anyone should be able to clone the repo, run `bundle install`, and have passing tests. If you don’t check in your `Gemfile.lock`, new contributors can get different versions of your dependencies, and run into failing tests that they don’t know how to fix.

This commit also updates `cargo new` to no longer ignore `Cargo.lock`.

Fixes rust-lang#8728
@demurgos demurgos force-pushed the docs/always-commit-cargo-lock branch from 878c2d4 to dda69ae Compare June 17, 2023 14:46
@demurgos
Copy link
Contributor Author

This was initially a documentation only PR so discussion felt adequate here, but as I see that it needs more and more code changes I agree that the PR should remain for technical issue and the general discussion should move to the issue.

@epage
Copy link
Contributor

epage commented Jun 17, 2023

This was initially a documentation only PR so discussion felt adequate here, but as I see that it needs more and more code changes I agree that the PR should remain for technical issue and the general discussion should move #8728.

This isn't just a documentation change but a change in policy which needs to be hashed out in the issue and a FCP with the cargo team before we can consider merging any docs/code changes.

@bors
Copy link
Collaborator

bors commented Jul 24, 2023

☔ The latest upstream changes (presumably #12397) made this pull request unmergeable. Please resolve the merge conflicts.

@weihanglo
Copy link
Member

This has been superseded and resolved by #12382. See #8728 as well for the full thread of the discussion.

Thank you all for participating in this :)

@weihanglo weihanglo closed this Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-documenting-cargo-itself Area: Cargo's documentation Command-new S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check Cargo.lock in version control for libraries
9 participants