Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MNT: Use conda-lock lock file for reproducible environment #32

Merged
merged 2 commits into from
Oct 6, 2023

Conversation

matthewfeickert
Copy link
Member

@matthewfeickert matthewfeickert commented Aug 11, 2023

This PR does 3 things:

  1. Define the uploader's high level dependencies in a conda environment.yml file.
  2. Provide a Bash script (could have also been a noxfile but I thought that people might want to keep things less fancy and just go with something very straightforward) that uses the same Docker image that the Action's Dockerfile uses as a base image to install conda-lock and then create a conda-lock.yml lock file for the environment.yml which fully specifies the environment down to the hash level.
  3. Using this lock file in the Action's Dockerfile have the Action install the specified locked environment efficiently and reproducibly.

Advantages of this method

  • Updating to a new release of anaconda-client can be tested through the CI tests with high confidence that no breaking changes will follow. anaconda-client has many dependencies, that are (correctly) not restricted, which means that specifying the anaconda-client version number alone is not enough as all these dependencies can float beneath it and in addition to potentially breaking any anaconda-client release in the future can be a security risk. The only way around this is through a hash level lock file which specifies the exact binary that should be installed.
    • With no blame towards the Anaconda open source team, I have had to submit patches to conda-forge to unbreak anaconda-client before as its SemVer nature is not checked explicitly, and so securing reproducible environments for it is rather important.
  • The update is easy:
    1. Update the environment.yml and then run bash lock.sh.
    2. Commit the updated environment.yml and conda-lock.yml to version control and make a new PR.
    3. If the upload tests pass then this new environment can be released to everyone.
  • Using a lock file means that no environment solve is done during the build. It was already done when the lock file was created, so to create the environment micromamba just needs to download each binary listed in the lock file. (Of course, as this environment is so simple this saves very little time, but it is true in general regardless of the size of the environment.)
  • This (hopefully!) means that the SemVer nature of this project's versioning can be leaned into and making new releases is viewed as a trivially verifiable and fast chore.

I'd like to make release 0.2.0 after this is merged.

@matthewfeickert matthewfeickert added the enhancement New feature or request label Aug 11, 2023
@matthewfeickert matthewfeickert added this to the 0.2.0 milestone Aug 11, 2023
@matthewfeickert matthewfeickert self-assigned this Aug 11, 2023
@matthewfeickert matthewfeickert temporarily deployed to remove-old-wheels August 11, 2023 04:32 — with GitHub Actions Inactive
Comment on lines +1 to +16
#!/bin/bash

# This should be the base image the Docker image built by the GitHub Action uses
BUILD_IMAGE="mambaorg/micromamba:1.4.9-bullseye-slim"
docker pull "${BUILD_IMAGE}"

# Don't need to ensure emulation possible on non-x86_64 platforms at the
# `docker run` level as the platform is specified in the conda-lock command.
docker run \
--rm \
--volume "${PWD}":/work \
--workdir /work \
"${BUILD_IMAGE}" \
/bin/bash -c "\
micromamba install --yes --channel conda-forge conda-lock && \
conda-lock lock --micromamba --platform linux-64 --file environment.yml"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For maintainability I would like to know if this Bash script is clear enough as to what is happening and why (hopefully the name lock.sh helps it a bit as well).

Comment on lines +11 to +12
# lock file created by running lock.sh in the top level of the repository
COPY --chown=mambauser conda-lock.yml /conda-lock.yml
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock file is now under version control so we have it accessible to us at the Docker image build time to be able to COPY in.

"${BUILD_IMAGE}" \
/bin/bash -c "\
micromamba install --yes --channel conda-forge conda-lock && \
conda-lock lock --micromamba --platform linux-64 --file environment.yml"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default user for mambaorg/micromamba:1.4.9-bullseye-slim is a non-root user with id 1000 so the generated conda-lock.yml is under the ownership of the user who ran the lock.sh script. So you don't have to mess around with getting a file owned by root and then needing to do permission and ownership changes on it.

@matthewfeickert matthewfeickert temporarily deployed to remove-old-wheels August 15, 2023 01:31 — with GitHub Actions Inactive
@bsipocz
Copy link
Member

bsipocz commented Aug 15, 2023

Do you, or anyone else know what this means on a PR? Is the action configured improperly?

Screenshot 2023-08-14 at 18 37 08

Also, we haven't stated preferred dev workflow, but I always went with the assumption of fork's feature branch+PR is the preferred way rather than keeping feature branches on the main repo. Could we do that in the future?

@matthewfeickert
Copy link
Member Author

matthewfeickert commented Aug 15, 2023

Do you, or anyone else know what this means on a PR? Is the action configured improperly?

Things are working as intended. "deployed to remove-old-wheels" means that the remove-old-wheels environment (added in PR #12) was used in the workflow.

This bit that uses the environment

cleanup:
runs-on: ubuntu-latest
needs: [test]
# Set required workflow secrets in the environment for additional security
# https://github.com/scientific-python/upload-nightly-action/settings/environments
environment:
name: remove-old-wheels

was added in PR #27 because you asked for the test upload to be removed in the same workflow it was uploaded in (#27 (comment)).

I always went with the assumption of fork's feature branch+PR is the preferred way rather than keeping feature branches on the main repo. Could we do that in the future?

No, as the CI needs to run

jobs:
test:
name: "test upload via action"
runs-on: ubuntu-latest
if: github.repository == 'scientific-python/upload-nightly-action'

and the CI requires secrets that are repo/org specific.

anaconda_nightly_upload_token: ${{ secrets.UPLOAD_TOKEN }}

anaconda --token ${{ secrets.ANACONDA_TOKEN }} remove \

There is pull_request_target triggers but that also introduces security threats and so should be avoided unless there is a very good reason to use otherwise.

I see no problems in having (short lived dev) branches other than main exist.

@matthewfeickert
Copy link
Member Author

matthewfeickert commented Oct 4, 2023

This PR would have prevented Issue #43 from affecting users. @Carreau can you review?

(Relocked and bumped to reshow that this works.)

* Add lock.sh which uses the same Docker image used for deployment to
  install conda-lock and then build the conda-lock lock file from the
  high level environment.yml.
* Add high level environment.yml and use conda-lock v2.x to create a
  hash level conda-lock.yml lock file of the environment.

  ```
  conda-lock lock --micromamba --platform linux-64 --file environment.yml
  ```

* COPY conda-lock.yml lock file to Dockerfile during build.
* Use the conda-lock.yml lock file to create an upload-nightly-action
  micromamba environment and activate it to have access to
  anaconda-client.
@matthewfeickert matthewfeickert temporarily deployed to remove-old-wheels October 4, 2023 16:28 — with GitHub Actions Inactive
Copy link
Member

@MridulS MridulS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, thanks @matthewfeickert!

We have 2 reviews, not sure if we should wait for more?
If not I can merge this and maybe we can do a 0.2.0 release?

@tupui
Copy link
Member

tupui commented Oct 4, 2023

Just adding a note here. If we start to use lock files, we need to be careful when integrating changes. There have been cases of supply chain attack using lock files. It's very easy to miss a change in a lock file as these are usually very large and people rarely check the diffs.

@bsipocz
Copy link
Member

bsipocz commented Oct 4, 2023

Does it have to be a full on conda lock, cannot we simply stay within pip land and use a much smaller file there?

@matthewfeickert
Copy link
Member Author

matthewfeickert commented Oct 4, 2023

Just adding a note here. If we start to use lock files, we need to be careful when integrating changes. There have been cases of supply chain attack using lock files. It's very easy to miss a change in a lock file as these are usually very large and people rarely check the diffs.

@tupui Can you elaborate? If you mean by injection of malicious software by manually editing the lock file then this is checked in two ways:

  1. Only allowing incremental updates (with conda-lock --update <target package>) and not entire re-locks by outside contributors.
  2. You can also check the file by checking out the PR branch and rerunning lock.sh. The diff should immeditaltey show if there is anything suspicious.

Does it have to be a full on conda lock, cannot we simply stay within pip land and use a much smaller file there?

Anaconda does not support anaconda-client on PyPI, so no. pip-compile is not going to be singificanly smaller though, so that wouldn't help anyway.

@MridulS
Copy link
Member

MridulS commented Oct 4, 2023

Just adding a note here. If we start to use lock files, we need to be careful when integrating changes. There have been cases of supply chain attack using lock files. It's very easy to miss a change in a lock file as these are usually very large and people rarely check the diffs.

We can also just make github run the lock.sh script and ask it to resolve and generate the lock file, but it would probably be a bit overkill. I don't really think there would be too much activity/changes in lock file here.

@matthewfeickert
Copy link
Member Author

If not I can merge this and maybe we can do a 0.2.0 release?

@MridulS, yes once @tupui and @bsipocz respond (assuming they don't have any further concerns) then I would very much like to get this merged and to get a v0.2.0 out.

@matthewfeickert
Copy link
Member Author

matthewfeickert commented Oct 6, 2023

Gentle ping to @tupui and @bsipocz (before it gets to be Friday night in Europe). If there are no further comments by end of the day US time I'm going to merge this as we already have 2 approving reviews from @Carreau and @MridulS. If there are additional comments later we can always iterate in a new Issue.

@tupui
Copy link
Member

tupui commented Oct 6, 2023

Sorry all good 👍 I just wanted to bring that up.

@bsipocz
Copy link
Member

bsipocz commented Oct 6, 2023

It's an ok from me too. But would advise against a Friday night release.

@matthewfeickert
Copy link
Member Author

But would advise against a Friday night release.

This is a fix for a currently broken procees.

@matthewfeickert matthewfeickert merged commit 5fb764c into main Oct 6, 2023
2 checks passed
@matthewfeickert matthewfeickert deleted the mnt/use-conda-lock branch October 6, 2023 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants