Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flawed logic in add_dockerfile() #533

Closed
antoine-sachet opened this issue Oct 13, 2020 · 3 comments
Closed

Flawed logic in add_dockerfile() #533

antoine-sachet opened this issue Oct 13, 2020 · 3 comments

Comments

@antoine-sachet
Copy link
Contributor

antoine-sachet commented Oct 13, 2020

I am working on a PR touching the add_dockerfile_XXX functions. They all depend on the internal dock_from_desc function which needed a good clean up. There is a big inconsistency between CRAN and non-CRAN packages.

We have some "direct dependencies", i.e. the packages listed in DESCRIPTION. Some may be from CRAN and some from other services. We also have "extended dependencies" which are the direct dependencies of the direct dependencies.

Currently, add_dockerfile will install all direct dependencies and no indirect dependencies.

Unless... we happen to have a non-CRAN direct dependency, in which case all non-CRAN indirect dependencies will be added to the Dockerfile.

To reproduce the bug:

  • create a new golem project
  • remove all dependencies in the description
  • pick a package you've installed from CRAN and add it in Imports -- I used ggplot2
  • install a dependency of ggplot2 from github -- for example remotes::install_github("r-lib/prettyunits")
  • call golem::add_dockerfile

The dockerfile only contains an install of ggplot2, with no mention of prettyunits or any other dependencies.

RUN R -e 'install.packages("remotes")'
RUN Rscript -e 'remotes::install_version("ggplot2",upgrade="never", version = "3.3.2")'

Now, add another package installed from github as an import. Let's say we install golem with remotes::install_github("ThinkR-open/golem")

Your imports look like:

Imports: 
  ggplot2,
  golem

Generate a new dockerfile with golem::add_dockerfile() and this times it contains:

RUN R -e 'install.packages("remotes")'
RUN Rscript -e 'remotes::install_version("ggplot2",upgrade="never", version = "3.3.2")'
RUN Rscript -e 'remotes::install_github("ThinkR-open/golem@aaae5c8788802a7b4aef4df23691902a286dd964")'
RUN Rscript -e 'remotes::install_github("r-lib/rlang@4e2bbe18695d9fd17951e9d7ae92535eb095ff6d")'
RUN Rscript -e 'remotes::install_github("r-lib/prettyunits@b1cdad88b900cabb1622a1dfd2c7e6361c2a66f2")'
RUN Rscript -e 'remotes::install_github("r-lib/lifecycle@355dcba8530bcce57c15847165ca568f9f81b43e")'
RUN Rscript -e 'remotes::install_github("antoine-sachet/usethis@02e1e76519ec0415dfa8b1ac34ca881104be8c22")'
# The number of lines you get will depend on how many packages you have installed from github

That's obviously a bug, but it got me thinking: is it a good idea to set in stone the indirect dependencies?

Instead of fixing the bug and removing all indirect dependencies, maybe we should actually add all indrect dependencies, CRAN or not?

Pros:

  • More robust. If an indirect dependency is updated between 2 deployments and something breaks, the app won't be impacted as it will use a known-good version.

Cons:

  • I don't actually want to use all those dependencies from github, I'd rather use the CRAN version when I deploy!
  • the dockerfile may grow out of control, or at least be much bigger than it used to be
  • it is more intuitive for the end-user to have a one-to-one mapping between DESCRIPTION and Dockerfile

Note both options are a potentially breaking change.

@ColinFay @VincentGuyader Any thoughts? Apologies for the long message!

@ColinFay
Copy link
Member

Thanks a lot @antoine-sachet for this.

I'm a little bit torn on this issue: on one hand, we would like the Dockerfile to exactly match the user library. On the other hand, we would like to make it more robust, so indeed depending on a fork of a package is kind of weird. One the other hand (yes, the third one), we would love to be able to use a dependency that comes from GitHub. For example I might want to depend on [email protected] for some reasons.

I think the safest way to go would be to fit exactly to the versions from the computer, because chances are that in most cases that's what we want.

But as a safeguard, we might want to print a message to the console when the dependencies do not come from an known repository, i.e. when it comes to GitHub, we might want to print to the console something like:

Package xx has been locally installed from XX/XX, Docker will install from that very same source

Also, I think it might be safe to warn about a package that has no remote ? Something like:

We didn't find a repository for package {XX}, don't forget to specify it in your Dockerfile.

What do you think?

@antoine-sachet
Copy link
Contributor Author

antoine-sachet commented Oct 20, 2020

Hi Colin,

I agree, we should always use the versions installed in the user library.

My question was more about whether we should also specify the version of the indirect dependencies.

To take an example:

DESCRIPTION

Imports: shiny

will generate

Dockerfile

# If shiny is installed from CRAN
RUN Rscript -e 'remotes::install_version("shiny",upgrade="never", version = "1.5.0")'

# If shiny is installed from Github
RUN Rscript -e 'remotes::install_github("rstudio/shiny@4e2bbe18695d9fd17951e9d7ae92535eb095ff6d")'

But shiny itself depends on a bunch of packages:

#> as.data.frame(remotes::package_deps("shiny"))
                package                                installed                                available diff is_cran remote
BH                   BH                                 1.72.0-3                                 1.72.0-3    0    TRUE   CRAN
magrittr       magrittr                                      1.5                                      1.5    0    TRUE   CRAN
rlang             rlang 4e2bbe18695d9fd17951e9d7ae92535eb095ff6d 5210e6b91eb12b7cc811c081090a2d7d0948a61b   -1   FALSE GitHub
later             later                                    1.0.0                                  1.1.0.1   -1    TRUE   CRAN
Rcpp               Rcpp                                  1.0.4.6                                    1.0.5   -1    TRUE   CRAN
R6                   R6                                    2.4.1                                    2.4.1    0    TRUE   CRAN
base64enc     base64enc                                    0.1-3                                    0.1-3    0    TRUE   CRAN
digest           digest                                   0.6.25                                   0.6.26   -1    TRUE   CRAN
promises       promises                                    1.1.0                                    1.1.1   -1    TRUE   CRAN
glue               glue                                    1.4.1                                    1.4.2   -1    TRUE   CRAN
commonmark   commonmark                                      1.7                                      1.7    0    TRUE   CRAN
withr             withr                                    2.2.0                                    2.3.0   -1    TRUE   CRAN
fastmap         fastmap                                    1.0.1                                    1.0.1    0    TRUE   CRAN
crayon           crayon                                    1.3.4                                    1.3.4    0    TRUE   CRAN
sourcetools sourcetools                                    0.1.7                                    0.1.7    0    TRUE   CRAN
htmltools     htmltools                                    0.5.0                                    0.5.0    0    TRUE   CRAN
xtable           xtable                                    1.8-4                                    1.8-4    0    TRUE   CRAN
jsonlite       jsonlite                                    1.7.0                                    1.7.1   -1    TRUE   CRAN
mime               mime                                      0.9                                      0.9    0    TRUE   CRAN
httpuv           httpuv                                    1.5.2                                    1.5.4   -1    TRUE   CRAN
shiny             shiny                                    1.5.0                                    1.5.0    0    TRUE   CRAN

I don't think we should also lock those. If we did it would look like:

Dockerfile

# If shiny is installed from CRAN
RUN Rscript -e 'remotes::install_version("shiny",upgrade="never", version = "1.5.0")'
RUN Rscript -e 'remotes::install_version("BH",upgrade="never", version = "1.72.0-3")'
RUN Rscript -e 'remotes::install_version("magrittr",upgrade="never", version = "1.5")'
RUN Rscript -e 'remotes::install_github("r-lib/rlang@4e2bbe18695d9fd17951e9d7ae92535eb095ff6d",upgrade="never")'
RUN Rscript -e 'remotes::install_version("later",upgrade="never", version = "1.0.0")'
# etc

Note I have a github version of rlang installed so the Dockerfile installs it from github as well.

Currently, we lock non-cran indirect dependencies (that's what my example above was trying to demonstrate) but I don't think it was intended.

To summarise, the choices are:

  1. Only install direct dependencies (here, only shiny). They can be from CRAN or not.
  2. Install direct dependencies and non-cran indirect dependencies (here, shiny and rlang) -- current behaviour
  3. Install both direct and indirect dependencies (here, shiny and all its 20 dependencies)
  4. Why stop there? Install the dependencies of the dependencies of the dependencies...

I think only option 1 is reasonable, so that's what I'll implement unless you disagree.
Option 2 makes no sense (why not lock CRAN package versions as well?) and 3 is a slippery slope to 4.
Option 4 is nuclear and would literally install hundreds of package for any medium size app.

The messages when using a non-cran package are also a good idea, I'll do that.

Cheers
Antoine

@ColinFay
Copy link
Member

Hey,
All docker-related functions have moved to {dockerfiler}.
We'll keep track of the issues here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants