cache file name automatically changed #3680

Closed
lmxs1237 opened this issue Apr 25, 2020 · 9 comments
Labels
awaiting response we are waiting for your reply, please respond! :)

Comments

@lmxs1237

lmxs1237 commented Apr 25, 2020

dvc version 0.66.0

I used dvc run to run Python code and dvc add for its dependencies.
Then I changed one of the dependencies, ran dvc run again, and ran dvc add for the changed file.

The names of the cache files then changed on their own.
For example, for a csv file:
cache
- e2
- 83742f43b84506f6417e43f6ba666b

became
cache
- e2
- 83742f43b84506f6417e43f6ba666b.csv

which makes me unable to run dvc checkout.
Does anyone have an idea what causes this? Thanks
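
To see how many cache entries are affected, one could scan the cache for file names that carry an extra suffix. The following is a minimal sketch, not part of DVC. It assumes the layout shown in the listing above (a two-character directory holding a 30-character hex file name) and treats the `.dir` suffix mentioned later in the thread as legitimate. The function name and the pattern are illustrative.

```python
import re
from pathlib import Path

# Assumed layout, taken from the listing above:
#   <cache>/<2 hex chars>/<30 hex chars>
# with an optional ".dir" suffix for directory entries.
EXPECTED_NAME = re.compile(r"[0-9a-f]{30}(\.dir)?")


def find_unexpected_cache_names(cache_dir):
    """Return cache files whose names do not match the expected pattern."""
    unexpected = []
    for path in sorted(Path(cache_dir).glob("*/*")):
        if path.is_file() and not EXPECTED_NAME.fullmatch(path.name):
            unexpected.append(path)
    return unexpected
```

Calling `find_unexpected_cache_names(".dvc/cache")` from the project root would list entries such as the `.csv` one above, assuming the cache lives in the default location.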

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Apr 25, 2020
@skshetry
Member

skshetry commented Apr 25, 2020

Hi @lmxs1237, I notice that you are using a very old version (the most recent release is v0.93.0). Can you please retry your operations with an updated version of dvc? For now, you can just rename the cache file and retry. If you have an external remote, dvc pull might work with an updated version (assuming the bug is fixed).
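
If many entries are affected, the suggested rename could be scripted. The sketch below is a hypothetical helper, not a DVC feature. It assumes the cache layout from the report (30-character hex file names), leaves `.dir` entries alone, and defaults to a dry run so nothing is changed until the planned renames have been reviewed.

```python
import re
from pathlib import Path

# A 30-character hex name followed by any suffix other than ".dir".
SUFFIXED = re.compile(r"([0-9a-f]{30})\.(?!dir$)\w+")


def restore_cache_names(cache_dir, dry_run=True):
    """Strip unexpected suffixes from cache files.

    Returns the list of (old_path, new_path) pairs. With dry_run=True
    the pairs are only reported and no file is touched.
    """
    renames = []
    for path in sorted(Path(cache_dir).glob("*/*")):
        match = SUFFIXED.fullmatch(path.name)
        if not (path.is_file() and match):
            continue
        target = path.with_name(match.group(1))
        if target.exists():
            continue  # never overwrite an existing cache entry
        renames.append((path, target))
        if not dry_run:
            path.rename(target)
    return renames
```

This only undoes the symptom. If something keeps renaming the files, the suffixes will come back.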

@skshetry skshetry added the awaiting response we are waiting for your reply, please respond! :) label Apr 25, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Apr 25, 2020
@lmxs1237
Author

Hi @skshetry, I updated dvc to v0.93.0, but it happened again. I also tried renaming the cache file back with
mv 80cc95bf0407188578fb59eda1c2a7.csv 80cc95bf0407188578fb59eda1c2a7
Then other files' names changed instead. It looks as if cache files are picked at random and renamed.

@skshetry
Member

Can you check via grep where the hashes are being used?

$ grep -r <hash_with_csv> <dvc directory>

Can you also check whether the content of that specific cache file is JSON? To me, it looks like someone changed the md5 of a directory stage file from <hash>.dir to <hash>.csv.
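
The JSON check can be done by simply trying to parse the file. This is a rough sketch under one assumption that does not come from this thread: that a directory entry is a JSON list of objects with `md5` and `relpath` keys. The function name is illustrative.

```python
import json


def looks_like_dir_listing(path):
    """Return True if the file parses as a JSON list of directory entries.

    Assumption: each entry is an object with "md5" and "relpath" keys.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            data = json.load(fh)
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False  # binary data or plain text such as CSV or source code
    return isinstance(data, list) and all(
        isinstance(entry, dict) and "md5" in entry and "relpath" in entry
        for entry in data
    )
```

A `False` result for a suffixed file would argue against the renamed `.dir` theory, since the file would then be an ordinary data file.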

@lmxs1237
Author

@skshetry
This time I ran
dvc run -f train.dvc -d data.csv -d code.py -o output python code.py
Then I ran dvc add data.csv code.py, then git add for all the dvc files.
After that, it picked the Python code and changed its cache file name.
So I grepped for the hash of this Python file, and it returned
Binary file <dvc directory>/.dvc/state matches
<dvc directory>/train.dvc: - md5: 02a545fad57b4355030ff35c220daef4
<dvc directory>/src/code.py.dvc: - md5: 02a545fad57b4355030ff35c220daef4
The cache content for the Python file is just the code, and for the csv file it is plain comma-separated text.
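
To confirm that only the name changed and not the content, the md5 of a cache file can be compared with the hash encoded in its path. This is a sketch under two assumptions: that the hash is the directory name plus the file name up to the first dot, as in the listings above, and that the hash is a plain md5 of the file's bytes. If the tool normalizes line endings before hashing text files, files with CRLF line endings may not match. All function names are illustrative.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_from_cache_path(path):
    """Rebuild the hash from <2 chars>/<30 chars>[.suffix]."""
    path = Path(path)
    return path.parent.name + path.name.split(".", 1)[0]


def cache_entry_is_intact(path):
    """True if the file's md5 equals the hash encoded in its path."""
    return md5_of_file(path) == hash_from_cache_path(path)
```

A suffixed file for which `cache_entry_is_intact` returns `True` has been renamed but not modified.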

Funny thing: by the time I finished typing the above, three more cache files' names had changed.

@efiop
Contributor

efiop commented Apr 25, 2020

Whoa, that is extremely strange 👀 Having those .csv/.py suffixes shouldn't happen ever, something very serious is going on.

@efiop
Contributor

efiop commented Apr 25, 2020

@lmxs1237 It would help us a lot if you could come up with a minimal reproducible script.

@efiop
Contributor

efiop commented Apr 25, 2020

@lmxs1237 Are you sure you are not doing something weird with your cache? Or maybe this project is also used by someone else, and they might be doing something odd to it? This looks really odd, and I have a hard time believing that it has been a bug since at least 0.66.0 (but maybe it is so obscure that it only happens in specific circumstances, we'll see). As a sanity check, could you run $ dvc version (it contains more than the version itself) and show us the output, please?

@karajan1001
Contributor

The cache file is exactly the same as the original file. It sounds like some program is automatically adding a suffix to a cache file, based on its file format, whenever it reads the file.
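
One way to test this theory would be to poll the cache directory and log the time of each rename, then compare those times with what else was running (an editor, a sync client, an indexer). The sketch below is a hypothetical helper that uses simple polling. It assumes the two-level cache layout from the report and pairs a vanished entry with a new one when both share the same name up to the first dot.

```python
import time
from pathlib import Path


def snapshot(cache_dir):
    """Return the set of cache entries as 'subdir/filename' strings."""
    root = Path(cache_dir)
    return {
        p.relative_to(root).as_posix() for p in root.glob("*/*") if p.is_file()
    }


def detect_renames(before, after):
    """Pair vanished and new entries that share the same hash stem."""
    def stem(entry):
        return entry.split(".", 1)[0]

    gone = {stem(e): e for e in before - after}
    new = {stem(e): e for e in after - before}
    return sorted((gone[key], new[key]) for key in gone.keys() & new.keys())


def watch(cache_dir, interval=1.0, rounds=60):
    """Poll the cache and print a timestamp for every rename observed."""
    previous = snapshot(cache_dir)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot(cache_dir)
        for old, new in detect_renames(previous, current):
            print(f"{time.strftime('%H:%M:%S')} renamed {old} -> {new}")
        previous = current
```

Polling cannot say which process did the rename. It only narrows down when it happened.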

@efiop
Contributor

efiop commented Jun 3, 2020

Closing as stale.

@efiop efiop closed this as completed Jun 3, 2020