Various issues in example-dvc-experiments #98

Closed
iesahin opened this issue Dec 7, 2021 · 11 comments
Labels
bug (Something isn't working), priority-p0 (Blocker, critical user facing issue, critical to hit deadlines)

Comments

@iesahin
Contributor

iesahin commented Dec 7, 2021

These are reported by @tapadipti (thanks). I'm moving here to discuss and follow:

I was running experiments by following the docs (https://dvc.org/doc/start/experiments) and encountered the following issues. Sharing here for any required action.

  1. dvc is not installed by pip install -r requirements.txt. So, if someone is trying to use a new virtual env, they need to install dvc separately. Would be good to include dvc in requirements.txt.

  2. dvc pull gave this error:

    ERROR: failed to pull data from the cloud - Checkout failed for following targets:
    models/model.h5
    metrics
    Is your cache up to date?
    <https://error.dvc.org/missing-files>
    
  3. dvc exp run lists all the images when running the extract stage. It would be good to remove -v from tar -xvzf data/images.tar.gz --directory data

  4. The "If you used dvc repro before" section in the doc is a little unclear. Does dvc exp run replace dvc repro? If yes, can we state this clearly? It would also be great to change the statement "We use dvc repro to run the pipeline..." to "dvc repro runs the pipeline..."

@jorgeorpinel
Contributor

This seems high priority.

@jorgeorpinel added the bug and priority-p0 labels Dec 7, 2021
@jorgeorpinel
Contributor

I think we can remove the bug label and change the priority to p1 once item 2, at least, is addressed.

@iesahin
Contributor Author

iesahin commented Dec 8, 2021

> 1. dvc is not installed by pip install -r requirements.txt. So, if someone is trying to use a new virtual env, they need to install dvc separately. Would be good to include dvc in requirements.txt.

This was partly intentional, to let users install DVC themselves, and partly to prevent version conflicts. Some setups (like installing DVC both system-wide and in a venv, with different dependencies) cause weird behavior.

We can go this route though; it's a single-line change. Is it better to add dvc to requirements.txt, @shcheklein?

@tapadipti
Contributor

If this was intentional and we don't want to include dvc in requirements.txt, then we should add an instruction telling the user to install dvc. Currently, such an instruction is missing. It is unlikely that many people will reach the experiments page of the tutorial without first having installed dvc. But if they try to work in a new venv, it can be a little confusing.
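To make the new-venv case concrete, here is a minimal sketch of a check a user could run inside the active environment. The helper name `tool_status` is hypothetical and not part of DVC or the example project; it only uses the standard library to see whether a package is importable and whether its command is on PATH.

```python
import importlib.util
import shutil
import sys


def tool_status(module_name: str, command_name: str) -> dict:
    """Report whether a package is importable here and whether its CLI is on PATH."""
    return {
        # True if the current interpreter can import the package
        "importable": importlib.util.find_spec(module_name) is not None,
        # True if an executable with this name is found on PATH
        "on_path": shutil.which(command_name) is not None,
        # True if the interpreter is running inside a virtual environment
        "in_venv": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    status = tool_status("dvc", "dvc")
    print(status)
    if not status["importable"]:
        print("dvc is not installed in this environment; try: pip install dvc")
```

If `on_path` is true but `importable` is false, the `dvc` command being picked up probably comes from outside the venv, which is the kind of mixed setup mentioned earlier in this thread.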

@iesahin
Contributor Author

iesahin commented Dec 8, 2021

I remembered why I left -v in tar: it was taking some time after extract to start running, so the experiment looked like it was frozen. I've now updated the project not to use -v in tar, and also updated model.h5 in the remote. (We had a bug in DVC that was preventing experiments from being uploaded.) Could you now check whether the project works as intended, @tapadipti?

I'll create separate PRs in the docs for content updates. Thank you.
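One possible middle ground between listing every file and printing nothing is a single message before and after extraction. The following is only a sketch under that assumption, not what the project does: `extract_quietly` is a hypothetical helper built on the standard `tarfile` module.

```python
import tarfile
import time
from pathlib import Path


def extract_quietly(archive: str, dest: str) -> int:
    """Extract a .tar.gz archive without listing members; return the member count."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    print(f"Extracting {archive} into {dest} ...", flush=True)
    start = time.monotonic()
    # Use the safer "data" extraction filter where this Python version offers it
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        tar.extractall(dest, **kwargs)
    elapsed = time.monotonic() - start
    print(f"Extracted {len(members)} entries in {elapsed:.1f}s", flush=True)
    return len(members)
```

For the archive discussed here, the call would be `extract_quietly("data/images.tar.gz", "data")`, which prints two lines instead of one line per image.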

@tapadipti
Contributor

Thanks @iesahin

dvc pull gave this error:

    ERROR: failed to pull data from the cloud - Checkout failed for following targets:
    /Users/tapadiptisitaula/Documents/test/example-dvc-experiments/models/model.h5
    Is your cache up to date?
    <https://error.dvc.org/missing-files>

So it looks like metrics worked but model.h5 did not. And this time, the full file path is displayed.

Removing -v worked. The files are not listed anymore.

@iesahin
Contributor Author

iesahin commented Dec 9, 2021

> ERROR: failed to pull data from the cloud - Checkout failed for following targets:
> /Users/tapadiptisitaula/Documents/test/example-dvc-experiments/models/model.h5

Interesting. I double-checked yesterday that the script pushing the artifacts completed successfully. Now I've checked again, and it says:

    dvc push
    Everything is up to date.

Could you check the MD5 line in dvc.lock, corresponding to this line: https://github.com/iterative/example-dvc-experiments/blob/main/dvc.lock#L36

What's the MD5 hash value there, in your installation?
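For anyone who has the file in their workspace, a rough way to compare it with the value in dvc.lock is to hash it locally. This is a sketch under the assumption that, for a binary file such as models/model.h5, the md5 recorded in dvc.lock is the plain MD5 of the file's bytes. The helper names `file_md5` and `matches_lock` are hypothetical, and the hash from dvc.lock is pasted in by hand rather than parsed.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lock(path: str, md5_from_lock: str) -> bool:
    """Compare a local file's MD5 with a value copied by hand from dvc.lock."""
    return file_md5(path) == md5_from_lock.strip().lower()
```

A call like `matches_lock("models/model.h5", "<md5 copied from dvc.lock>")` returning False would suggest the local file and the locked version differ.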

@iesahin
Contributor Author

iesahin commented Dec 9, 2021

Also, I've checked after cloning the repository:

[image]

@tapadipti

@iesahin
Contributor Author

iesahin commented Jan 18, 2022

The current staging version in https://github.com/iterative/example-dvc-staging resolves all of these issues. I think we can push it to example-dvc-experiments.

@shcheklein
Member

@iesahin sounds good.

@iesahin
Contributor Author

iesahin commented Jan 19, 2022

The most recent https://github.com/iterative/example-dvc-experiments resolves all these issues. The codification changes are in #97. Closing this.

@iesahin closed this as completed Jan 19, 2022

4 participants