Download only subdirectory for GitHub packages when appropriate #2012

Closed
jashapiro opened this issue Oct 10, 2024 · 3 comments

Comments

@jashapiro

When installing a package from GitHub where the package is in a subdirectory of the main repository, renv::install() downloads the entire repository. For some repositories this may be a non-issue, but if the repository is large (and the R package is only a small part of the repository), this may result in a long, slow download.

Might it be possible to implement a sparse-checkout or equivalent to reduce downloads and improve install times in this situation?

@kevinushey
Collaborator

It should be possible, but I'm not sure if there's anything made available by the GitHub API. We could implement support for this when installing packages via git remotes though, e.g.

renv::install("git::git@github.com:user/repo.git:subdir")

For reference, here's the git command we generate to check out sources from a particular repository.

renv/R/retrieve.R

Lines 499 to 504 in 1c8c64c

template <- heredoc('
git init ${QUIET}
git remote add origin "${ORIGIN}"
git fetch ${QUIET} --depth=1 origin "${REF}"
git reset ${QUIET} --hard FETCH_HEAD
')

All that said -- it's also possible that R packages within a project sub-directory might still depend on files outside of that sub-directory, so I'm not sure if this is something we could do by default.
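
For illustration, the sparse-checkout idea raised above could be sketched as a variant of that template. This is only a rough sketch and not renv's implementation: the function name `sparse_fetch` and its arguments are hypothetical, it assumes git 2.25 or newer (for `git sparse-checkout`), and `--filter=blob:none` only reduces the download when the server supports partial clone filters (otherwise git warns and fetches everything). In the default cone mode, files at the repository root are still checked out alongside the chosen subdirectory.

```shell
# Hypothetical helper (not part of renv): fetch REF from ORIGIN into DEST,
# materializing only SUBDIR (plus root-level files) in the working tree.
sparse_fetch() (
  origin="$1" ref="$2" subdir="$3" dest="$4"

  # Runs in a subshell, so the caller's working directory is unchanged.
  mkdir -p "$dest" && cd "$dest" || exit 1

  git init --quiet &&
  git remote add origin "$origin" &&
  # Restrict the working tree to the package subdirectory before fetching.
  git sparse-checkout set "$subdir" &&
  # Shallow fetch; ask the server to omit blobs until they are needed.
  git fetch --quiet --depth=1 --filter=blob:none origin "$ref" &&
  git reset --quiet --hard FETCH_HEAD
)
```

A call would look like `sparse_fetch "$ORIGIN" "$REF" subdir ./sources`, with the package sources then expected under `./sources/subdir`. As noted above, a package that depends on files outside its subdirectory would not build from such a checkout.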

@jashapiro
Author

Yes, after some investigation on other fronts, I am not sure it is a great idea, or even a particularly useful one. When I tried to figure out how this might actually work, I found that the .git directory alone was quite large for the repository that prompted this request. So the best solution ended up being to move the R package into its own repository anyway.

@kevinushey
Collaborator

Thanks! In that case, I think this issue can be closed? Let me know if there's anything else I can improve on the renv side.
