Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get: handle non-DVC repositories #3089

Closed
shcheklein opened this issue Jan 8, 2020 · 5 comments · Fixed by #3097
Closed

get: handle non-DVC repositories #3089

shcheklein opened this issue Jan 8, 2020 · 5 comments · Fixed by #3097
Labels
feature request Requesting a new feature p1-important Important, aka current backlog of things to do

Comments

@shcheklein
Copy link
Member

Even though #2977 solved this for dvc import, it looks like it is still not supported for dvc get. It might be a simple bug or code path is different. We need to fix and add tests for this.

Version:

VC version: 0.80.0+82cdce
Python version: 3.7.5
Platform: Darwin-18.2.0-x86_64-i386-64bit
Binary: False
Package: None
Cache: reflink - True, hardlink - True, symlink - True
Filesystem type (cache directory): ('apfs', '/dev/disk1s1')
Filesystem type (workspace): ('apfs', '/dev/disk1s1') 

Reproduce:

mkdir git-repo
cd git-repo
git init
touch test
git commit -a -m "add test"
cd ..
dvc get ./git-repo test

outputs:

ERROR: failed to get 'test' from './git-repo' - URL './git-repo' is not a dvc repository.

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Jan 8, 2020
@efiop efiop added the feature request Requesting a new feature label Jan 8, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Jan 8, 2020
@efiop efiop added p1-important Important, aka current backlog of things to do triage Needs to be triaged labels Jan 8, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Jan 8, 2020
@fabiosantoscode
Copy link
Contributor

Hello world! I am on this right now.

@fabiosantoscode
Copy link
Contributor

fabiosantoscode commented Jan 9, 2020

I need some help choosing an implementation.

The only way to download something seems to be in the dvc.remote.* namespace. The problem with the remotes is that they always require a Repo instance to be passed in to be initialised.

I figure there's a few solutions here, and would greatly appreciate some input on which direction to take:

  • My preferred option: Remove the Repo dependency from the remotes (they don't seem to use it much, the base class uses repo.cache which could be a separate argument, and the local remote needs a tree argument). This enables you to create a remote and download from a remote source without having a repository, which is what we need for get here.
  • Refactor-heavier option which might be good for code organisation: Create a new namespace, svc.download.* by factoring out the download function of each remote. There would be a function in its __init__ that chooses which downloader to use in each situation. The remotes would access the download function directly, since they already know what kind of remote they are and therefore which download function they need.
  • Hacky option: Create a valid DVC repository in a temporary directory, initiate an import, and then copy the imported file over to the get target path.

@fabiosantoscode
Copy link
Contributor

I just realised that I'm probably trying to fix too much. A closer look at #2977 reveals that it only fixes local repositories, which is probably an easier problem to fix for get here.

@shcheklein
Copy link
Member Author

Hi @fabiosantoscode !

This enables you to create a remote and download from a remote source without having a repository, which is what we need for get here.

could you elaborate on this, please? As far as I understand in this ticket actual file comes not from the Repo, and not even from a remote. It comes from the cloned git repo.

Regarding the other options - the same, I'm a bit confused, probably because we are not on the same page on the terminology and/or logic behind get/import.

@fabiosantoscode
Copy link
Contributor

Right. I probably jumped into looking at remotes because the get documentation states that it can download from any DVC-enabled git repository. My train of thought was to make it download from any git repository. Then I went into a tangent because I associated dvc.remote.* as the same concept as git remotes, which are git repositories. On a closer look they seem to be single files.

Got it, will use the existing git facilities to retrieve the thing. They probably work for local and remote repositories alike.

Thanks for clearing this up!

fabiosantoscode added a commit to fabiosantoscode/dvc that referenced this issue Jan 9, 2020
Allows us to `dvc get` from non-DVC source repositories.

Fixes iterative#3089
fabiosantoscode added a commit to fabiosantoscode/dvc that referenced this issue Jan 10, 2020
Allows us to `dvc get` from non-DVC source repositories.

Fixes iterative#3089
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requesting a new feature p1-important Important, aka current backlog of things to do
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants