RFC of WIP: Concurrent Cargo #1764

Byron · 2015-06-27T22:03:52Z

In the current state, using cargo concurrently on the same system results in undesired behaviour.

One way to alleviate this would be to use the local file-system's locking facilities to lock resources before trying to use or adjust them. Both Windows (LockFileEx(...)) and Unix (fcntl(...)) provide such facilities, which should allow a cross-platform implementation with equivalent behaviour.

There are two ways to deal with locks:

* Abort gracefully when a lock cannot be acquired without blocking (similar to try_lock()). This is equivalent to the current behaviour, except that the cargo process that managed to acquire the lock will not be disturbed by the competing process.
* Wait for the lock, and try to avoid duplicating work once the lock has been acquired (similar to lock()). This allows multiple concurrent cargo processes to work collaboratively.

The choice above should be configurable.
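A minimal sketch of the two strategies, using an in-process `std::sync::Mutex` as a stand-in for a file lock. The names `Strategy` and `with_lock` are hypothetical and do not come from cargo or the file-lock crate; the sketch only illustrates the difference between aborting and waiting.

```rust
use std::sync::{Mutex, TryLockError};

/// How a caller reacts when the resource is already locked (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Strategy {
    /// Give up immediately, like `try_lock()`.
    Abort,
    /// Block until the lock is free, like `lock()`.
    Wait,
}

/// Runs `work` while holding `resource`, following the chosen strategy.
/// Returns `None` only when the strategy is `Abort` and the lock was busy.
pub fn with_lock<T, R>(
    resource: &Mutex<T>,
    strategy: Strategy,
    work: impl FnOnce(&mut T) -> R,
) -> Option<R> {
    match strategy {
        Strategy::Abort => match resource.try_lock() {
            Ok(mut guard) => Some(work(&mut *guard)),
            // Someone else holds the lock: leave them undisturbed.
            Err(TryLockError::WouldBlock) => None,
            // A previous holder panicked; the data is still reachable.
            Err(TryLockError::Poisoned(poisoned)) => {
                let mut guard = poisoned.into_inner();
                Some(work(&mut *guard))
            }
        },
        Strategy::Wait => {
            // Blocks until the current holder releases the lock.
            let mut guard = resource.lock().unwrap_or_else(|p| p.into_inner());
            Some(work(&mut *guard))
        }
    }
}
```

With `Strategy::Abort` the caller has to decide what to do with `None` (for cargo, presumably a controlled error); with `Strategy::Wait` the call always runs `work` eventually.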

Over the past few days I have been adjusting the file-lock crate, which will eventually provide the required cross-platform locking functionality. Currently it is only implemented for non-Windows platforms, though.

Caveats

* File-system based locking may not be fully supported on all file-systems. Network file-systems can be problematic, even though both SMB and NFS provide locking support. If the file-system cannot be trusted, cargo will involuntarily fall back to the previous, somewhat undefined behaviour.
* It's unsafe to delete a lock-file after creation, which means there will be additional (possibly hidden) files in the CARGO_HOME and target-dirs.

RFC

I am submitting the PR early to get a chance for early feedback. This should help to get the code on track soon, instead of having a whole pile of unusable nonsense later. As these are my first steps within the cargo codebase (which is teaching me every day), code quality will certainly need to improve in many ways. Therefore I will be happy to make adjustments as soon as a review comes in.

This PR's text will be updated throughout its lifetime, based on comments and reviews.

Progress

Issues to fix

Issues mentioned here are the ones I am well aware of, even though I have no clear idea of how to alleviate them. Input on these topics is particularly welcome.

* cargo-rustc now allows invalid/half-finished files (see code)
* CargoLock::lock(...) method is hacky (look here and here)

Testing

Currently I test manually, which is accomplished as follows:

$ git clone https://github.com/Byron/google-apis-rs
$ cd google-apis-rs
# Build 4 targets with plenty of dependencies in parallel. Depending on the `build.jobs` variable
# in `.cargo/config`, there is more or less intra-process contention.
$ CARGO_HOME=$PWD/cargo_home make -j4 groupsmigration1-cli-cargo discovery1-cli-cargo translate2-cli-cargo audit1-cli-cargo ARGS=build

The .cargo/config contained in the aforementioned repository is used to configure cargo. Currently it looks like this:

[build]
target-dir = "target"
lock-kind = "wait"

At the moment, intra-process contention seems to be the most troublesome, as I sometimes get `Deadlock avoided` errors when trying to wait on a lock.

The lock type wraps the file_lock::FileLock into an interface suitable for cargo's error handling. It uses the cargo configuration to determine the locking semantics.

* build.lock-kind = "nowait" is non-blocking. It is the default, so that cargo behaves similarly to what people have grown to expect. After all, cargo currently fails ungracefully if there is any kind of concurrent access. In future, it would fail in a controlled fashion. Once we are sure about this, the default might change to 'wait'.
* build.lock-kind = "wait" is blocking.
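As a rough sketch of how the two `build.lock-kind` values could map onto a typed setting with `nowait` as the default: the `LockKind` enum and `lock_kind_from_config` helper are hypothetical names for illustration, not the code in this PR.

```rust
use std::str::FromStr;

/// Locking semantics selected by `build.lock-kind` (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockKind {
    /// Fail immediately if the lock is held elsewhere.
    NoWait,
    /// Block until the lock becomes available.
    Wait,
}

impl Default for LockKind {
    /// `nowait` is the default, matching the description above.
    fn default() -> Self {
        LockKind::NoWait
    }
}

impl FromStr for LockKind {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "nowait" => Ok(LockKind::NoWait),
            "wait" => Ok(LockKind::Wait),
            other => Err(format!(
                "invalid build.lock-kind {:?}, expected \"nowait\" or \"wait\"",
                other
            )),
        }
    }
}

/// Resolves the setting from an optional raw config value.
pub fn lock_kind_from_config(value: Option<&str>) -> Result<LockKind, String> {
    match value {
        None => Ok(LockKind::default()),
        Some(raw) => raw.parse(),
    }
}
```

Parsing once into an enum means the rest of the code can match on `LockKind` instead of comparing strings.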

Namely these are:

* `open(...)`
* `do_update(...)`
* `download_package(...)`

The general intention is to add an additional check for the originally desired result and to verify it in case it already exists. That way, duplication of work can be prevented (for example in case of `download_package(...)`).
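The check-after-lock idea can be sketched as follows. This is an assumption-laden illustration: `download_once` is a hypothetical helper, an in-process `Mutex` stands in for the file lock, and the verification step mentioned above is reduced to a plain existence check.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::Mutex;

/// Fetches `dest` unless another holder of `lock` already produced it.
/// Returns `Ok(true)` if this call did the work, `Ok(false)` if it was skipped.
pub fn download_once(
    lock: &Mutex<()>,
    dest: &Path,
    fetch: impl FnOnce() -> io::Result<Vec<u8>>,
) -> io::Result<bool> {
    // Wait for the lock; whoever held it before may have finished the job.
    let _guard = lock.lock().unwrap_or_else(|p| p.into_inner());

    // Re-check the desired result now that we own the lock.
    if dest.is_file() {
        return Ok(false);
    }

    let bytes = fetch()?;
    // Write to a temporary name first so `dest` never exists half-written.
    let tmp = dest.with_extension("partial");
    fs::write(&tmp, &bytes)?;
    fs::rename(&tmp, dest)?;
    Ok(true)
}
```

The second caller to obtain the lock finds the file in place and returns without calling `fetch`, which is the duplication of work the commit message wants to avoid.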

rust-highfive · 2015-06-27T22:04:05Z

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

Previously, a non-existent branch was set by accident.

It might be easier to read as well, and doesn't incur unnecessary overhead for converting the desired value to a String and parsing it back. Also added the latest version of Cargo.lock, which contains the actual branch we need in the `file_lock` crate.

It's primarily cool to see that there is concurrency going on, but the user shouldn't have to care. In most cases, there is no attempt to download anything concurrently, as the crate will already be present. Here is [an example](https://goo.gl/JqBrlI) of how it looked previously.

This will make the type easier to use, especially in conjunction with utilities that don't necessarily have a `Config` instance available at all times. The adjustment will help with implementing locking for the `Layout` type.

It's somewhat unlikely the directory creation fails, and I'd clearly prefer to lock the entire build process on a higher level.

This is interesting when locks cannot be acquired. It also shows that we probably want to provide the lock file path as well in case a lock fails (for instance due to the `nowait` mode). Added `debug!` information to indicate when we try to get a lock.

This is an intermediate version, as it shows how locking can be done on the JobQueue level. However, as we usually run multiple targets for the same package and stage pair at the same time, the FileLock implementation may fail with `Deadlock avoided` errors. Apparently it doesn't like multiple threads of the same process trying to get an exclusive lock. Admittedly, I don't understand why this is. Nonetheless, I am confident that each thread should simply stay on its own resource lock, which would require us to provide additional information with each job. The `TargetKind` *should* do it.

The `lock` implementation will now keep trying to acquire a lock for as long as the failure is a `deadlock` error; any other result is returned right away. When there is high intra-process contention due to threads, this issue occurs even though I would hope that we don't have multiple threads trying to obtain a lock on the same lock file. However, apparently this does happen. Even if it does, I don't understand why it is treated as a deadlock. The good news is that building now works in its entirety in my manual tests, which serves as a proof of concept.
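The retry behaviour described in this commit message can be sketched like this. `LockError` and `lock_retrying_on_deadlock` are hypothetical names; the real code works with the errors reported by the file-lock crate, which are not modelled here.

```rust
/// Simplified error model for the sketch (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum LockError {
    /// The lock attempt was rejected with a "deadlock avoided" style error.
    Deadlock,
    /// Any other failure, reported to the caller as-is.
    Other(String),
}

/// Calls `try_acquire` until it yields something other than a deadlock error.
/// Success and non-deadlock errors are both returned immediately.
pub fn lock_retrying_on_deadlock<G>(
    mut try_acquire: impl FnMut() -> Result<G, LockError>,
) -> Result<G, LockError> {
    loop {
        match try_acquire() {
            // Give other threads a chance to release the lock, then retry.
            Err(LockError::Deadlock) => std::thread::yield_now(),
            other => return other,
        }
    }
}
```

As in the commit, the loop has no upper bound; a production version would probably want a retry limit or a timeout.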

Just because this is not really an issue to handle. [skip ci]

alexcrichton · 2015-06-29T18:22:16Z

Nice work! Of the two strategies you've outlined (abort or handle locking) I personally favor aborting currently as it seems like there's not a great way to force Cargo to think about file locking just yet. I haven't given it an inordinate amount of thought of what needs to be locked and where, but whenever I've done so I've reached the conclusion that it's easiest to just have a global lock on all operations.

I think that it'd be a good idea to handle more fine-grained locking, but I'm worried about the ability to statically ensure that we handle locking at all the appropriate locations. For example this PR currently sprinkles around a few locks here and there, but I'd be much more comfortable if there was one central location to acquire a lock from and otherwise paths just weren't accessible at all.
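One way to read "one central location to acquire a lock from and otherwise paths just weren't accessible at all" is a guard type that is the only route to the paths. The sketch below is an illustration under that assumption; `Workspace` and `LockedWorkspace` are hypothetical names, and an in-process `Mutex` stands in for the global file lock.

```rust
use std::path::{Path, PathBuf};
use std::sync::{Mutex, MutexGuard};

/// Owns the paths but exposes none of them directly.
pub struct Workspace {
    target_dir: PathBuf, // private: unreachable without a lock
    lock: Mutex<()>,
}

/// Proof that the lock is held; the only way to reach the paths.
pub struct LockedWorkspace<'a> {
    workspace: &'a Workspace,
    _guard: MutexGuard<'a, ()>,
}

impl Workspace {
    pub fn new(target_dir: impl Into<PathBuf>) -> Self {
        Workspace { target_dir: target_dir.into(), lock: Mutex::new(()) }
    }

    /// Central place to acquire the lock. Returns `None` if it is busy
    /// (a poisoned lock is also treated as unavailable in this sketch).
    pub fn try_lock<'a>(&'a self) -> Option<LockedWorkspace<'a>> {
        self.lock
            .try_lock()
            .ok()
            .map(|guard| LockedWorkspace { workspace: self, _guard: guard })
    }
}

impl<'a> LockedWorkspace<'a> {
    /// Only callable while the lock is held, enforced by the type system.
    pub fn target_dir(&self) -> &Path {
        &self.workspace.target_dir
    }
}
```

Code outside this module that wants `target_dir()` must go through `try_lock()`, so a forgotten lock becomes a compile error rather than a runtime race.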

For now I feel like it might be best to just try to acquire an exclusive lock on a global file for each Cargo invocation and just abort if there's another Cargo instance running elsewhere as it's what I feel is the most robust solution that can be added now. I also want to make sure that if Cargo dies unexpectedly (e.g. receives a signal) that the locks are cleaned up (e.g. released).

Byron · 2015-06-30T07:28:48Z

> ... but whenever I've done so I've reached the conclusion that it's easiest to just have a global lock on all operations.

I agree, and believe this would be a great starting point for cargo. It should also be something that could be working relatively soon.

> I also want to make sure that if Cargo dies unexpectedly (e.g. receives a signal) that the locks are cleaned up (e.g. released).

Please note that the lock implementation is unlike the one used by Git. The latter opens a file with O_EXCL and can thus tell whether a lock is already present; its only option is then to abort. The implementation you see here relies on flock and friends, which work with existing files only but also allow waiting on a lock. However, this locking style cannot safely remove the lock file it created, as others might be waiting on it, and as it isn't created with O_EXCL.
This shouldn't be an issue though, it would just mean the said global lock would have to remain on disk.
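For contrast, the Git-style approach mentioned above can be sketched with `OpenOptions::create_new`, which fails if the file already exists (the standard library's counterpart to opening with O_EXCL). `ExclusiveLockFile` is a hypothetical name, and this is not what the PR implements.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};

/// A lock that exists exactly as long as its file does (hypothetical type).
pub struct ExclusiveLockFile {
    path: PathBuf,
}

impl ExclusiveLockFile {
    /// Creates the lock file, failing with `AlreadyExists` if someone
    /// else holds the lock. There is no way to wait with this scheme.
    pub fn acquire(path: &Path) -> io::Result<ExclusiveLockFile> {
        OpenOptions::new().write(true).create_new(true).open(path)?;
        Ok(ExclusiveLockFile { path: path.to_path_buf() })
    }
}

impl Drop for ExclusiveLockFile {
    /// Releasing the lock means deleting the file.
    fn drop(&mut self) {
        let _ = fs::remove_file(&self.path);
    }
}
```

Here the file can be removed safely because its existence is the lock. The trade-off is that a process killed before `drop` runs leaves a stale file behind, whereas an flock-style lock is released by the operating system when the process exits.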

> For example this PR currently sprinkles around a few locks here and there [...]

That is true, and it's what I came up with while searching for the 'highest possible' lock locations. Maybe there are better ones, especially for locks on the folder structure initialization. The lock in the JobQueue, though, seems to work particularly well, and is very fine-grained too.

> For now I feel like it might be best to just try to acquire an exclusive lock on a global file for each Cargo invocation and just abort if there's another Cargo instance running elsewhere as it's what I feel is the most robust solution that can be added now.

I agree, and hope you are fine with me using the current lock implementation, i.e. wait is still supported, and could be provided as an option in cargo/config.
As this is quite a different route compared to the fine-grained locking I attempt here, I would certainly create a new PR for it.

However, I am not sure if I should abandon this one, as it actually works for me except for some hiccups that can surely be fixed. This PR is like a proof of concept that would just need more work to function correctly. Having two PRs (fine-grained and global locking) with different times to completion seems acceptable to me. What do you think?

alexcrichton · 2015-06-30T16:24:44Z

> Having two PRs (fine-grained and global locking) with different times to completion seems acceptable to me. What do you think ?

My worry about the approach taken in this PR is that there's very little static guarantee that the appropriate locks are held for various operations, and as Cargo grows to do more things in the future it seems inevitable that locks will be forgotten and/or mismanaged. I would prefer to see an approach that provides more of a static guarantee that a lock is held rather than sprinkling a few locks here and there throughout the codebase.

The case I'm trying to avoid is that we get a constant trickle of bug reports about concurrent invocations causing weird bugs (if they're expected to work) and we continue to patch up small areas where locks are needed and/or need to be cleaned up. If we instead just take a system-package-manager or git-like approach where we just abort if something else is running I feel like it's not necessarily the end of the world. We can guarantee that corruption and weird results won't happen, give nicer error messages, etc.

I suspect that running cargo concurrently is not used that widely, so I'm not sure if it's worth adding all the support to do so.

Byron · 2015-07-01T07:20:58Z

I agree, and am closing this PR in favour of a new one which will focus on a single, global cargo lock. As there are legitimate uses for non-blocking, wait-like functionality (see comment), I will keep the respective part from this PR alive, knowing that reviews will bring it into shape.

Thanks for your input, I found it very valuable.

This allows building everything concurrently without failure, provided the latest cargo is used (rust-lang/cargo#1764). It's still very early in development, but works for me nevertheless. [skip ci]

colin-kiegel · 2015-07-17T08:19:09Z

Speaking of 'wait-like' functionality: I have written a tiny wrapper script for cargo on Linux. Of course it's only a temporary workaround, not a final solution.

Byron added 2 commits June 27, 2015 18:16

rust-highfive assigned alexcrichton Jun 27, 2015

Use the correct branch in file-lock crate

b45bbfe


Byron mentioned this pull request Jun 28, 2015

Concurrent Builds Byron/google-apis-rs#122

Closed

Byron added 9 commits June 28, 2015 07:16

Refactor CargoLock to not require a shared Config

d34af1b


Lock Layout preparation if Config is available

aa4b12c


Made TODO into a note

c16146a


Update Cargo.lock to use latest of file_lock crate

deef1bf

Byron closed this Jul 1, 2015

Byron mentioned this pull request Jul 4, 2015

RFC: Globally Locked Cargo #1781

Closed

2 tasks

alexcrichton mentioned this pull request Mar 15, 2016

Fix running Cargo concurrently #2486

Merged
