
On Windows, dvc repro spends an awful lot of time in rwlock #3653

Closed
charlesbaynham opened this issue Apr 19, 2020 · 20 comments · Fixed by #4052
Assignees
Labels
awaiting response we are waiting for your reply, please respond! :) p1-important Important, aka current backlog of things to do performance improvement over resource / time consuming tasks research

Comments

@charlesbaynham
Contributor

Windows 7 / 10, dvc 0.93.0+f5d89f (after merging #3645)

I ran cProfile on dvc repro -P in a repository with around 200 steps, all of which are up to date. On my system this takes 65s, of which 42s is spent in _edit_rwlock.

I wonder, is it necessary to lock each individual file here? Couldn't the whole repository be locked instead?

Maybe a broader question: what are the respective purposes of .dvc/tmp/rwlock, .dvc/lock and .dvc/updater.lock?

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Apr 19, 2020
@skshetry skshetry added performance improvement over resource / time consuming tasks research labels Apr 20, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Apr 20, 2020
@efiop
Contributor

efiop commented Apr 20, 2020

I ran cProfile on dvc repro -P in a repository with around 200 steps, all of which are up to date. On my system this takes 65s, of which 42s is spent in _edit_rwlock.

Thanks for the investigation! That is indeed pretty strange, we need to take a closer look. Could you show your dvc version output btw?

I wonder, is it necessary to lock each individual file here? Couldn't the whole repository be locked instead?

Actually, that's what we used to do, but rwlock was introduced to support granular locking, so that you could run dvc in parallel.

Maybe a broader question: what are the respective purposes of .dvc/tmp/rwlock, .dvc/lock and .dvc/updater.lock?

Useful link: https://dvc.org/doc/user-guide/dvc-files-and-directories 🙂 rwlock is for granular locking of specific deps and outs, lock is for locking the whole repo (when you need to push/pull/etc or need to create/modify/delete dvc-file). updater.lock is for our updater worker, that checks for updates once a week.
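To make the granular-locking idea concrete, here is a minimal sketch of what a path-level lock table might look like, loosely inspired by the .dvc/tmp/rwlock description above. The file format and function names here are illustrative assumptions, not DVC's actual implementation:

```python
import json
import os

# Hypothetical sketch of granular path-level locking: a JSON file maps each
# path to its current readers and (at most one) writer, so two dvc processes
# can work on disjoint sets of deps/outs in parallel.

def load_rwlock(lock_file):
    """Read the lock table, tolerating a missing file."""
    if not os.path.exists(lock_file):
        return {"read": {}, "write": {}}
    with open(lock_file) as f:
        return json.load(f)

def acquire_write(lock_file, target, pid):
    """Grant a write lock on `target` unless someone is reading or writing it."""
    locks = load_rwlock(lock_file)
    if target in locks["read"] or target in locks["write"]:
        return False  # conflicting access: caller should wait or abort
    locks["write"][target] = pid
    with open(lock_file, "w") as f:
        json.dump(locks, f)
    return True
```

A whole-repo lock (.dvc/lock) would instead serialize every operation; the trade-off discussed in this thread is that per-path bookkeeping costs one file write per stage checked.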

@efiop
Contributor

efiop commented Apr 20, 2020

@charlesbaynham Btw, have you tried running that on Linux? Wondering if this is Windows-specific or if we have trouble everywhere.

@efiop efiop added the awaiting response we are waiting for your reply, please respond! :) label Apr 20, 2020
@charlesbaynham
Contributor Author

Thanks for the run down @efiop, that's helpful. The version that ran the above was 0.93.0+f5d89f, which is v0.93.0 + the changes in PR #3645. That's necessary because otherwise the effect is swamped by all the time spent reproducing duplicate steps.

I'll give it a try on Linux: annoyingly, the repository I tested on has some case-sensitivity problems which I've glossed over so far on Windows but which prevent it from working on Linux. I should fix that anyway.

The cProfile results for my repo on Windows:
repro_nowork.zip

@charlesbaynham
Contributor Author

Here's the cProfile results for dvc repro -P in the same repo, run on Linux and Windows.

repro_profile.zip

It looks like the same fraction of time is spent writing to rwlock in both (just via posix.fsync instead of nt.fsync).

@charlesbaynham
Contributor Author

Here's a screenshot of the profile for Windows:

[screenshot: profiler output]

Zooming in, you can see that the actual calls to Stage.changed() don't take much time at all, though that's where I'd have expected most of the load to be.

[screenshot: zoomed profiler output]

@efiop efiop added the p1-important Important, aka current backlog of things to do label Apr 28, 2020
@efiop
Contributor

efiop commented Apr 28, 2020

@charlesbaynham Sorry for the delay. Admittedly I didn't have much time to look deeply into it, but could you give my patch a try, please?

pip uninstall -y dvc
# make sure there is no old dvc version installed
pip install git+https://github.com/efiop/dvc@3653

@charlesbaynham
Contributor Author

Yes, that makes a lot of difference. Here are two profiles in the same repo on Windows 7, one at v0.93.0+f5d89f and the other at your patch:

data_processing.zip

Your patch cuts the time spent in _reproduce_stages from 62s:
[screenshot: profile before patch]

to 23s

[screenshot: profile after patch]

@charlesbaynham
Contributor Author

But doesn't that just effectively disable the lock?

@charlesbaynham
Contributor Author

I also notice that we're making 1.4 million calls to normpath, which seems like a lot for a repo with ~200 files in it!

@pmrowla
Contributor

pmrowla commented Apr 29, 2020

Took a quick look at the profiler results to see if the excessive normpath calls were PathInfo-related; they are not, but we do make 220k calls to relpath in dvc.utils (with string paths).

@efiop
Contributor

efiop commented May 10, 2020

@charlesbaynham Sorry for the delay. It doesn't really disable the lock, it just stops being paranoid about syncing it to disk. We'll need to take a closer look. We don't have the capacity to do it right now, but we'll try to get to it in the near future.

@charlesbaynham
Contributor Author

No need for any apology, the work you guys do is amazing.

Maybe a compromise could be to only flush the lock if dvc determines that actual work needs to be done to reproduce a stage. In that scenario it'd be a small fraction of the time taken.

@efiop
Contributor

efiop commented May 11, 2020

Maybe a compromise could be to only flush the lock if dvc determines that actual work needs to be done to reproduce a stage. In that scenario it'd be a small fraction of the time taken.

The thing is that it modifies rwlock before checking that something has changed, so that no other stage is in the middle of modifying something that we will be checking. 🙁 It might be as simple as my patch that removes paranoid fsync-ing, but need to double check that. If not, I'm sure we'll find another way to optimize this.
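The "paranoid fsync-ing" being discussed can be illustrated with a minimal sketch (this is not DVC's code): the durable variant forces the bytes to disk before returning, while the fast variant leaves flushing to the OS, which is what makes it so much cheaper per stage.

```python
import json
import os

# Illustrative sketch of the trade-off: fsync blocks until the disk
# acknowledges the write, which is safe but slow when done per-stage;
# skipping it trades durability for speed, acceptable for a temp file.

def write_durable(path, data):
    """Write JSON and fsync so the bytes survive a crash or power loss."""
    with open(path, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())  # expensive: waits for the storage device

def write_fast(path, data):
    """Write JSON and let the OS flush the page cache whenever it likes."""
    with open(path, "w") as f:
        json.dump(data, f)
```

Both produce identical file contents under normal operation; only the crash-consistency guarantee differs.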

You and @pmrowla have also noticed something that we know is problematic: relpath calls. The solution there is to stop trying to abspath every path we use and just use whatever paths we get. That would also fix the issues with our abspaths getting longer than the PATH_MAX limit. dvc doesn't really chdir at any point in its operation, so that should be very safe and efficient. The reason we didn't do that from day 0 is that we weren't really sure which direction dvc would go, so we wanted to be safe with deterministic abspaths, but we can surely do the switch now.

@skshetry
Member

@efiop, I think we can get rid of the fsync(): the rwlock is a temporary file, so it's fine to prefer performance over durability even if it occasionally gets corrupted. Also, only a single instance of dvc can write to the rwlock at a time.

In my synthetic benchmark, I see 12X improvement (146s vs 12s) just by removing fsync().
Regarding relpath calls, it's mostly coming from Stage::__eq__ or Stage::__hash__ used by networkx.

We can assume Stage to be immutable and cache the relpath and path_in_repo calls (used by __eq__ and __hash__) to further reduce the time to 7.5s (a ~20X improvement). Again, take this with a grain of salt, as it is a very silly/synthetic benchmark.

Also, it's interesting to see that the fsync calls on Windows are twice as fast as on Linux.
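The caching idea above can be sketched roughly as follows, assuming stages are immutable once constructed. The class and attribute names here are illustrative, not DVC's actual API:

```python
from functools import cached_property
import os

# Hypothetical sketch: if a stage's path never changes after construction,
# relpath can be computed once and reused by __eq__/__hash__, instead of
# calling os.path.relpath on every graph comparison networkx performs.

class Stage:
    def __init__(self, path, repo_root):
        self.path = path
        self.repo_root = repo_root

    @cached_property
    def relpath(self):
        # Computed on first access only; later hashing reuses the result.
        return os.path.relpath(self.path, self.repo_root)

    def __eq__(self, other):
        return isinstance(other, Stage) and self.relpath == other.relpath

    def __hash__(self):
        return hash(self.relpath)
```

With ~200 stages, graph construction and set membership checks can trigger hundreds of thousands of __eq__/__hash__ calls, which is why memoizing this one path computation pays off so visibly in the profile.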

@efiop
Contributor

efiop commented Jun 16, 2020

@skshetry Sounds good. Let's get rid of it for now.

@Suor
Contributor

Suor commented Jun 16, 2020

BTW, we already discussed this fsync() here :)

@charlesbaynham
Contributor Author

You guys are great, thank you!

@skshetry
Member

@charlesbaynham, can you give dvc==1.0.0b5 a try? Would love to see the difference. 🙂

@efiop
Contributor

efiop commented Jun 18, 2020

Btw, for the record: we'll add a benchmark soon for this (or related) iterative/dvc-bench#73 .

@charlesbaynham
Contributor Author

Sorry for the absolute age it has taken me to respond here. You might be interested to see the performance on the same repository using a newer version of the code, v1.6.0:

[screenshot: v1.6.0 profile]

So that's 13s instead of the initial 65s, a 5x improvement. No obvious bottlenecks remaining either:

[screenshot: v1.6.0 profile, zoomed]
