Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get/import: check target path before downloading #3673

Closed
jorgeorpinel opened this issue Apr 24, 2020 · 3 comments · Fixed by #4738
Closed

get/import: check target path before downloading #3673

jorgeorpinel opened this issue Apr 24, 2020 · 3 comments · Fixed by #4738
Labels
bug Did we break something? p1-important Important, aka current backlog of things to do research

Comments

@jorgeorpinel
Copy link
Contributor

λ dvc version
DVC version: 0.93.0
Python version: 3.7.5
Platform: Windows-10-10.0.18362-SP0
Binary: True
Package: exe
Supported remotes: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss
Cache: reflink - not supported, hardlink - supported, symlink - not supported
Filesystem type (cache directory): ('NTFS', 'C:\\')
Repo: dvc, git
Filesystem type (workspace): ('NTFS', 'C:\\')

λ echo data > data.xml
λ la
...
-rw-r--r-- 1 usd grp        5 Apr 23 21:31 data.xml
λ dvc get https://github.com/iterative/dataset-registry get-started/data.xml -o data.xml
# HERE I SEE THE DOWNLOAD BAR FIRST, and the whole transfer is done up to 100%
ERROR: unexpected error - [WinError 5] Access is denied: 'C:\\Users\\poj12\\DVC-repos\\dvc-get-started\\data.xml.Cf4LfW3t5gZ7aK9F2qXquP'
λ la
...
-r--r--r-- 1 usd grp 37916850 Apr 23 21:26 data.xml
-rw-r--r-- 1 usd grp 37916850 Apr 23 21:26 data.xml.Cf4LfW3t5gZ7aK9F2qXquP

The result is I end up with the download having overwritten the small data.xml I created, but with an error message, and with the tmp file not deleted either, as it can be seen above.

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Apr 24, 2020
@jorgeorpinel jorgeorpinel added the bug Did we break something? label Apr 25, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Apr 25, 2020
@efiop efiop added p1-important Important, aka current backlog of things to do research labels Apr 28, 2020
@efiop
Copy link
Contributor

efiop commented Oct 18, 2020

Not able to reproduce. Probably was related to some workspace issues. Feel free to reopen if you are still able to reproduce.

@efiop efiop closed this as completed Oct 18, 2020
@efiop
Copy link
Contributor

efiop commented Oct 18, 2020

Ok, there was a misunderstanding on my side, the error itself is irrelevant, as it is related to some setup issues, but the fact that the thing is downloaded before checking if file is already there is questionable. The temporary file is from the link failure, probably @jorgeorpinel had the symlink configured in global/system config without the rights to use them. Would've been great if we had a verbose log for the error 🙁

Keeping open for the further research, as we might want to revisit ExternalRepo.get_external to not download beforehand, but actually do that dynamically when checking out, so existing target could be checked for before.

Regarding the link issue, the logic change proposal from above is not really related to it, because we couldn't really know that the link will fail before trying to link the file in an atomic way. So need to give it a thought.

@efiop efiop reopened this Oct 18, 2020
@efiop efiop reopened this Nov 11, 2020
@jorgeorpinel
Copy link
Contributor Author

This is fixed now (tested on DVC 2.3.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p1-important Important, aka current backlog of things to do research
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants