reset lockfile information between resolutions #8274

Eh2406 · 2020-05-24T21:38:44Z

#8249 pointed out that some kind of lockfile data was leaking between calls to the resolver. @ehuss made a reproducing test case. This PR resets the LockedMap data structure when calling register_previous_locks.

lets see if CI likes it.
fix #8249

rust-highfive · 2020-05-24T21:38:47Z

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton

Man would I love to remove this second call to the resolver...

This looks like it works in CI at least! Sorry I haven't been following the issue too too closely, but mind explaining a bit why this call is necessary? I'm also interested in how we haven't ever noticed this until now?

alexcrichton · 2020-05-27T15:33:08Z

src/cargo/core/registry.rs

-        Vec<(PackageId, Vec<PackageId>)>,
-    >,
+    // The next level is keyed by the name of the package...
+    (SourceId, InternedString),


This is just a small refactoring, right, not required for this change to be correct?

(ok either way, just wanted to confirm)

Yes, I did some refactorings to make debugging easier. This one felt worth keeping.

ehuss · 2020-05-27T19:55:54Z

I think this makes sense. Can you tell me if my summary is correct?

In the test, it builds a Cargo.lock (generate-lockfile). The test then adds a new dependency, and Cargo then:

Loads Cargo.lock
Resolves workspace:
2a. Locks things in PackageRegistry
2b. Runs resolver (update due to new dependency)
2c. Writes new Cargo.lock
Build feature-specific resolver:
3a. Locks things in PackageRegistry (again!)
3b. Runs resolver with features selected on the command-line

The issue is that the locked entries in PackageRegistry from step 2a are no longer correct because step 2b updated the dependency graph. In step 3a, it is starting with stale entries. The solution is in step 3a to clear PackageRegistry first.

Does that sound correct? If so, I'm surprised this hasn't been a problem in the past. PackageRegistry is one of the things that I struggle to understand.

Eh2406 · 2020-05-27T19:58:00Z

mind explaining a bit why this call is necessary?

I can try, it is not clear which layers of abstraction the explanation is best in. So it may take some back and forth.

register_previous_locks takes a previous resolution or lockfile and builds a LockedMap by calling register_lock.

Note 1, that the package is added to the end of the list in register_lock.
Note 2, before this PR the LockedMap was append only.

The LockedMap is read in 2 places here and here.

Note 3. in both .iter().find( is used get the first thing that matches.

Ok now we can step thru the bug, at this level of abstraction.

The original lock file is read. The only thing in it is Foo has no dependencies.
The first run of the resolver looks at this LockedMap and figures out that it is outdated. (How? Different level of abstraction.) And builds the correct resolution, the same one as gened without a lockfile or with the full lockfile.
The second call to the resolver calls register_previous_locks with the result of the first run.
This adds a full lock to the end of the list for Foo specifying its dependencies. And it adds a full lock for all the transitive deps.
The second call to the resolver sees an inconsistent Foo has no dependencies and locked versions of the dependencies. This looks like a bug at this level of abstraction. It should just use the results from the first run not a mix. (But why did this mix make the resolver fail? Different level of abstraction)

I'm also interested in how we haven't ever noticed this until now?

IDK. One thing is that it only makes a problem for the first call to cargo after making a change to the dependencies. So it takes some clever debugging to notese it. I would have just bin like "it works now" and moved on with my life. Also the change to the dependencies need to make an inconsistent LockedMap that has a visibly different result. I wonder if we have had deps rebuilt on the second run and just not had it reported.

Eh2406 · 2020-05-28T19:48:12Z

One of the questions asked yesterday was to follow up on "But why did this mix make the resolver fail?". So I instrumented the registry_cache/summary_cache, witch caches the results of questions to the registry. I converted that into a test for the resolver and pushed a test. The test shows that the "mix lockfile" as provided to the resolver should not have a resolution.

alexcrichton · 2020-06-01T16:01:05Z

@bors: r+

Ok I've thought about this enough that I'm ok with this. I unfortunately don't have the time right now to really dig deep into this. I can't really answer why this hasn't constantly come up in the past or how we haven't noticed this until now.

bors · 2020-06-01T16:01:07Z

📌 Commit 8ce0d02 has been approved by alexcrichton

bors · 2020-06-01T16:01:15Z

⌛ Testing commit 8ce0d02 with merge 5847787...

bors · 2020-06-01T16:33:38Z

☀️ Test successful - checks-azure
Approved by: alexcrichton
Pushing 5847787 to master...

Update cargo 9 commits in 9fcb8c1d20c17f51054f7aa4e08ff28d381fe096..40ebd52206e25c7a576ee42c137cc06a745a167a 2020-05-25 16:25:36 +0000 to 2020-06-01 22:35:00 +0000 - Warn if using hash in git URL, Fixes rust-lang/cargo#8241 (rust-lang/cargo#8297) - reset lockfile information between resolutions (rust-lang/cargo#8274) - Disable strip_works test on macos. (rust-lang/cargo#8301) - Fix typo in impl Display for Strip (rust-lang/cargo#8299) - Add support for rustdoc root URL mappings. (rust-lang/cargo#8287) - Fix tests with enoent error message on non-english systems. (rust-lang/cargo#8295) - Fix fingerprinting for lld on Windows with dylib. (rust-lang/cargo#8290) - Fix a typo (rust-lang/cargo#8289) - Fix several issues with close_output test. (rust-lang/cargo#8286)

ehuss and others added 5 commits May 24, 2020 17:31

Repro

771b2bc

minimize test

46b50bb

don't need to clone this string

053d827

we dont need both maps

8418a3a

remove all residual state from previous lock files

61e5927

rust-highfive assigned alexcrichton May 24, 2020

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 24, 2020

alexcrichton reviewed May 27, 2020

View reviewed changes

confirm that it is not a bug at resolver level

8ce0d02

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 1, 2020

bors merged commit 5847787 into rust-lang:master Jun 1, 2020

ehuss mentioned this pull request Jun 2, 2020

Update cargo rust-lang/rust#72901

Merged

Eh2406 deleted the 8249-repro branch June 5, 2020 17:08

ehuss added this to the 1.46.0 milestone Feb 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reset lockfile information between resolutions #8274

reset lockfile information between resolutions #8274

Eh2406 commented May 24, 2020

rust-highfive commented May 24, 2020

alexcrichton left a comment

alexcrichton May 27, 2020

Eh2406 May 27, 2020

ehuss commented May 27, 2020

Eh2406 commented May 27, 2020

Eh2406 commented May 28, 2020

alexcrichton commented Jun 1, 2020

bors commented Jun 1, 2020

bors commented Jun 1, 2020

bors commented Jun 1, 2020

reset lockfile information between resolutions #8274

reset lockfile information between resolutions #8274

Conversation

Eh2406 commented May 24, 2020

rust-highfive commented May 24, 2020

alexcrichton left a comment

Choose a reason for hiding this comment

alexcrichton May 27, 2020

Choose a reason for hiding this comment

Eh2406 May 27, 2020

Choose a reason for hiding this comment

ehuss commented May 27, 2020

Eh2406 commented May 27, 2020

Eh2406 commented May 28, 2020

alexcrichton commented Jun 1, 2020

bors commented Jun 1, 2020

bors commented Jun 1, 2020

bors commented Jun 1, 2020