Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass CONDA_OVERRIDE_CUDA to with_cuda of conda-lock #721

Merged
merged 2 commits into from
Jan 16, 2024

Conversation

nkaretnikov
Copy link
Contributor

@nkaretnikov nkaretnikov commented Jan 5, 2024

Fixes #719.

Description

This pull request makes it possible to set the CUDA version by passing the value of the CONDA_OVERRIDE_CUDA specification variable to the with_cuda parameter of conda-lock.

See https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages

How I tested this:

  • Created this test env via the server admin environment page because conda-store UI strips variables even in raw YAML mode, so it doesn't work there:
channels:
- conda-forge
dependencies:
- pytorch
- ipykernel
- pip:
  - nothing
description: test2
name: test3
prefix: null
variables:
  CONDA_OVERRIDE_CUDA: '12.0'
  • Checked that the pytorch package in the generated lockfile has constraints matching this version of cuda. The url of the pytorch package should also reflect that, e.g., https://conda.anaconda.org/conda-forge/linux-64/pytorch-2.1.0-cuda120py312hfe5e8c6_301.conda.

  • Additionally, on a machine with an NVIDIA card that's configured to use CUDA:

% CONDA_STORE_USERNAME=test CONDA_STORE_PASSWORD=password  python -m conda_store --conda-store-url="http://localhost:8080/conda-store/" --auth=basic run test/test3:4 -- python
>>> import torch
>>> torch.__file__
'/tmp/conda-store/4/lib/python3.12/site-packages/torch/__init__.py'
>>> print(torch.version.cuda)
12.0

This version matches the one in the lockfile, which comes from the variable. Note: I had to set the url when calling conda-store CLI in order to match the url used by the server when running via docker.

  • Repeated the same with versions 10.0 and 11.0. Observed that these affect constraints in the pytorch package in the lockfile. Note that I've used different major versions as minor versions don't affect the constraints in the lockfile.

Pull request checklist

  • Did you test this change locally?
  • Did you update the documentation (if required)?
  • Did you add/update relevant tests for this change (if required)?

Additional information

Copy link

netlify bot commented Jan 5, 2024

Deploy Preview for kaleidoscopic-dango-0cf31d ready!

Name Link
🔨 Latest commit 8c9390c
🔍 Latest deploy log https://app.netlify.com/sites/kaleidoscopic-dango-0cf31d/deploys/65a6ec80fa29ce00088fc76f
😎 Deploy Preview https://deploy-preview-721--kaleidoscopic-dango-0cf31d.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@nkaretnikov
Copy link
Contributor Author

@amjames Do you have cycles to review this one?

@nkaretnikov nkaretnikov marked this pull request as ready for review January 5, 2024 09:56
@nkaretnikov nkaretnikov added the area: dependencies 📦 Issues related to conda-store dependencies label Jan 5, 2024
Copy link

@amjames amjames left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question otherwise looks good for a temp fix.

# https://github.com/conda-incubator/conda-store/issues/719
# https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages
if specification.variables is not None:
cuda_version = specification.variables.get("CONDA_OVERRIDE_CUDA")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CONDA_OVERRIDE_CUDA variable could come in as a string (expected by conda-lock) or a float.

The example from the description of this PR

...
variables:
  CONDA_OVERRIDE_CUDA: '12.0'

Will come through as a string and be forwarded to conda-lock, but the example in the original issue has:

...
variables:
  CONDA_OVERRIDE_CUDA: 11.8

Which would be parsed as float and be forwarded to conda-lock as a float.

I don't see a test update associated with this, so I am assuming we don't have any coverage here, could you try this out locally to see if we need to convert the value as we pull it off the specification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a print to action_solve_lockfile to verify. It's always a string, even if you don't quote it. Tried with 12, 12.0, and 12.2.

(In fact, the admin UI renders previously unquoted versions in quotes once you submit an env.)

So no action is needed here.


You also asked offline whether using CONDA_OVERRIDE_CUDA on a machine without a GPU allows for using PyTorch with CUDA on a GPU-enabled machine later. I've tried the following:

  • On a machine with an Intel GPU (so no CUDA):
    • added the env from the top comment to conda-store, which has CONDA_OVERRIDE_CUDA: '12.0'
    • waited for it to generate the lockfile (but didn't wait for it to build)
    • downloaded the lockfile
  • On a machine with an NVIDIA GPU (and CUDA configured):
    • created an env from the downloaded lockfile via conda-lock install -n mytest1 --micromamba ~/mytest1.json
    • activated the env
    • confirmed that CUDA works:
>>> import torch
>>> torch.__file__
'/home/nkaretnikov/.conda/envs/mytest1/lib/python3.12/site-packages/torch/__init__.py'
>>> print(torch.version.cuda)
12.0
>>> x = torch.tensor(1, device='cuda')
>>> x.device
device(type='cuda', index=0)
>>> x
tensor(1, device='cuda:0')

Please let me know if you have any other questions!

@nkaretnikov
Copy link
Contributor Author

@trallard Could you approve so I could merge? Andrew has reviewed this.

Comment on lines +40 to +42
# conda-lock ignores variables defined in the specification, so this code
# gets the value of CONDA_OVERRIDE_CUDA and passes it to conda-lock via
# the with_cuda parameter, see:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, so non blocking but since we said this is an "interim" solution it would be best to add also a TODO/something indicating that we should look at a more robust approach

@trallard
Copy link
Collaborator

Added a non-blocking comment otherwise we can merge

@nkaretnikov nkaretnikov merged commit f63b4d7 into conda-incubator:main Jan 16, 2024
18 checks passed
@nkaretnikov nkaretnikov deleted the env-vars-719 branch January 16, 2024 21:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: configuration area: dependencies 📦 Issues related to conda-store dependencies area: user experience 👩🏻‍💻 Items impacting the end-user experience type: bug 🐛 Something isn't working
Projects
Status: Done 💪🏾
Development

Successfully merging this pull request may close these issues.

[BUG] - Conda-Store strips environment vars from env.yaml spec.
3 participants