fetch specific commits even if the github fast path fails #13946
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @weihanglo (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
src/cargo/sources/git/utils.rs
@@ -977,6 +977,13 @@ pub fn fetch(
// The reason we write to `refs/remotes/origin/HEAD` is that it's of special significance
// when during `GitReference::resolve()`, but otherwise it shouldn't matter.
refspecs.push(format!("+{0}:refs/remotes/origin/HEAD", rev));
} else if Url::parse(remote_url).map_or(false, |url| is_github(&url)) |
Thank you for the nice finding and write-up!
I understand that supporting other Git hosts would be awesome, though this is still a GitHub-specific fast path. I wonder if we could move this patch under `github_fast_path`. Maybe something like this?
if let Some(local_object) = local_object {
if is_short_hash_of(rev, local_object) {
debug!("github fast path already has {local_object}");
return Ok(FastPathRev::UpToDate);
}
}
+ if let Ok(oid) = rev.parse::<Oid>() {
+ debug!("github fast path is already a full commit hash {rev}");
+ return Ok(FastPathRev::NeedsFetch(oid))
+ }
rev
} else {
debug!("can't use github fast path with `rev = \"{}\"`", rev);
return Ok(FastPathRev::Indeterminate);
}
Oh that's clever! Let me give that a shot.
Co-authored-by: Weihang Lo <[email protected]>
745fc3b to 5d7a06b
I added a … It looks like that approach also has the benefit of making the behavior around what the ref is fetched as more consistent; previously, if the GitHub API request failed, the ref would instead be fetched to …
Looks great! Thank you for the contribution.
Let's move on and leave aside the issue of handling 64-hex-digit commit hashes, since this won't be the only place that needs to change at that time 😬.
@bors r+
☀️ Test successful - checks-actions
Update cargo 7 commits in 84dc5dc11a9007a08f27170454da6097265e510e..a8d72c675ee52dd57f0d8f2bae6655913c15b2fb 2024-05-20 18:57:08 +0000 to 2024-05-24 03:34:17 +0000 - Improve error description when deserializing partial field struct (rust-lang/cargo#13956) - fix: remove symlink dir on Windows (rust-lang/cargo#13910) - Fix wrong type of rustc-flags in documentation (rust-lang/cargo#13957) - Add more high level traces (rust-lang/cargo#13951) - upgrade gix from 0.62 to 0.63 (rust-lang/cargo#13948) - Use `i32` rather than `usize` as "default integer" in library template (rust-lang/cargo#13939) - fetch specific commits even if the github fast path fails (rust-lang/cargo#13946) r? ghost
What does this PR try to resolve?
This PR fixes #13555, which describes a regression from 1.64.0 to 1.65.0 where the inability to fetch commit information from api.github.com (the "GitHub fast path") silently changes Cargo's behavior.
Cargo can fetch a specific Git commit from a remote without having to fetch all refs. Prior to #10807, this functionality required a repository hosted on github.com and providing the full commit hash (usually available from the Cargo.lock); after that change, any revision (including abbreviated revisions) that could be resolved by GitHub's API could be fetched directly. However, this logic requires the "GitHub fast path", which was not intended to be robust, to successfully return the resolved commit hash; if a client is currently rate-limited by api.github.com (very common in CI and shared cloud / corporate environments) this fails and Cargo falls back to fetching all refs.
Usually this is not noticeable. However, GitHub allows fetching commits that are related to the repository but not actually part of any of its refs, including commits pushed to a fork. This results in the same command working fine in some environments where api.github.com is accessible, and not working in other environments that are rate-limited, which is very confusing and difficult to debug.
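To make the failure mode concrete, here is a minimal sketch of the fast-path decision, including an early return for full commit hashes. It is a simplified model rather than Cargo's actual code: `FastPathRev` here holds a `String` instead of a `git2::Oid`, and `fast_path`, `is_full_sha1`, and the `api_lookup` closure (standing in for the request to api.github.com) are hypothetical names chosen for illustration.

```rust
/// Simplified stand-in for Cargo's `FastPathRev` (assumption: the real enum
/// carries a `git2::Oid`, not a `String`).
#[derive(Debug, PartialEq)]
enum FastPathRev {
    UpToDate,
    NeedsFetch(String),
    Indeterminate,
}

/// Hypothetical helper: true for a 40-character hexadecimal (SHA-1) hash.
fn is_full_sha1(rev: &str) -> bool {
    rev.len() == 40 && rev.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Hypothetical model of the fast path. `api_lookup` returns `None` when the
/// API request fails, for example because the client is rate-limited.
fn fast_path(
    rev: &str,
    local_object: Option<&str>,
    api_lookup: impl Fn(&str) -> Option<String>,
) -> FastPathRev {
    // We already have an object that `rev` abbreviates: nothing to fetch.
    if let Some(local) = local_object {
        if local.starts_with(rev) {
            return FastPathRev::UpToDate;
        }
    }
    // A full hash needs no API round trip to be resolved.
    if is_full_sha1(rev) {
        return FastPathRev::NeedsFetch(rev.to_ascii_lowercase());
    }
    // Abbreviated revisions still depend on the API being reachable.
    match api_lookup(rev) {
        Some(full) => FastPathRev::NeedsFetch(full),
        None => FastPathRev::Indeterminate,
    }
}

fn main() {
    // Simulate an unreachable or rate-limited API.
    let rate_limited = |_: &str| -> Option<String> { None };
    let full = "0123456789abcdef0123456789abcdef01234567";
    println!("{:?}", fast_path(full, None, rate_limited));
    println!("{:?}", fast_path("0123456", None, rate_limited));
}
```

In this model, a full hash yields `NeedsFetch` even when the API is unavailable, while an abbreviated revision still degrades to `Indeterminate`, which corresponds to falling back to fetching all refs.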
This change adds another branch to cover the regression case: if we are going through the GitHub fast path with a full commit hash, return early indicating that we need to fetch it. (Previously: when the GitHub fast path was unsuccessful, the user is not using the unstable shallow clone options, and we have a full commit hash and expect to be able to fetch it directly because we know it's a github.com repository.)
How should we test and review this PR?
I have been testing this PR by temporarily adding a `0.0.0.0 api.github.com` entry to my `/etc/hosts`, which causes the GitHub fast path to always fail, then running:
This refers to a particular commit on a fork of the repo which won't be found by the fallback path or current Cargo.
Note that you will need to delete `~/.cargo/git/checkouts/unstable-test-repo-*` and `~/.cargo/git/db/unstable-test-repo-*` after a successful run with this change in order to reproduce the broken behavior of the current release.
I am having trouble getting the test suite to run at all on my system so I haven't experimented with writing a specific test for this case, but I probably should.
Additional information
This uses the same logic as the unstable shallow clone support to detect if the revision is a full commit hash. This is not compatible with SHA-256 commit hashes; `git2::Oid` specifically expects a 40-character hexadecimal string. Given that the change introducing this bug was meant to future-proof SHA-256 support (despite only doing so for GitHub repositories), it might be good to make the logic more explicit within Cargo and allow either 40- or 64-character hex strings.
I wanted to keep this change focused on the regression fix, but in testing, pretty much every Git repository I could think of (including non-forges, like git.kernel.org and some repositories I host on my own infrastructure with cgit) supports fetching directly from a commit, so it would be ideal to eventually relax the GitHub requirement for this functionality. However, it would need some sort of fallback logic because I suspect the HTTP dumb protocol doesn't support commit references, and I haven't researched when this functionality was added to the smart protocol.
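The idea of making the check explicit and accepting either 40- or 64-character hex strings could look roughly like the sketch below. This is not part of this PR or of Cargo; `HashKind` and `full_commit_hash_kind` are hypothetical names, and the check is purely lexical, so a branch or tag whose name happens to be 40 or 64 hex characters would also match.

```rust
/// Hypothetical classification of a full commit hash by its hex length.
#[derive(Debug, PartialEq)]
enum HashKind {
    Sha1,
    Sha256,
}

/// Hypothetical helper: classify `rev` as a full SHA-1 (40 hex digits) or
/// SHA-256 (64 hex digits) hash, or return `None` for anything else,
/// including abbreviated hashes and ref names.
fn full_commit_hash_kind(rev: &str) -> Option<HashKind> {
    if !rev.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    match rev.len() {
        40 => Some(HashKind::Sha1),
        64 => Some(HashKind::Sha256),
        _ => None,
    }
}

fn main() {
    let sha1 = "a".repeat(40);
    let sha256 = "b".repeat(64);
    println!("{:?}", full_commit_hash_kind(&sha1));
    println!("{:?}", full_commit_hash_kind(&sha256));
    println!("{:?}", full_commit_hash_kind("abc1234"));
}
```

Checking the length and character set directly, instead of relying on `git2::Oid` parsing, keeps the rule independent of which hash formats the linked libgit2 build accepts.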