
Change cpuonly to a conda mutex #488

Closed · wants to merge 7 commits

Conversation

@scopatz (Contributor) commented Aug 11, 2020

This is in an effort to address pytorch/pytorch#40213

But I think I have ended up with more questions than solutions.

  1. Where does the main, templated meta.yaml actually live?
  2. The build_pytorch.sh seems like it would be better suited to being in Python or some kind of cross-platform language that actually has string formatting. I am curious as to why it is in Bash...
  3. How does this actually work? I have tried following the instructions in the README several times now locally on master, and I have gotten them to fail in a variety of ways, but they seem out of date in some ways. E.g. there is no pytorch-vX.Y.Z/ directory.
  4. Does the current way of templating the meta.yaml allow for multiple outputs? It seems like it may not, which is more or less required for moving to a mutex (see the sketch below).
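
For what it's worth, here is a rough sketch of what a multi-output recipe with a mutex could look like. This is purely illustrative; the pytorch-proc name and the variant layout are guesses, not this repo's actual recipe:

    outputs:
      - name: pytorch
        requirements:
          run:
            - pytorch-proc * cuda      # a cpu build would pin "pytorch-proc * cpu"
      - name: pytorch-proc
        version: "1.0"
        build:
          string: cpu                  # one output per variant: "cpu" or "cuda"
          # track_features on the cpu variant deprioritizes it, so the
          # solver prefers the cuda variant when both are installable
          track_features:
            - pytorch-proc-cpu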

CC @ezyang @rgommers & Thanks in advance!

@rgommers (Contributor)

Yep, I've gotten lost here a few times. I don't know the answers, so let's try to ping @seemethere and @malfet as the experts here. If one of you can help answer these questions, then @scopatz can return the favour by updating the README with the info he's currently missing.

@scopatz (Contributor, Author) commented Aug 11, 2020

It looks like everything is based on what is in nightly?

@seemethere (Member)

Yes this repository is a mess unfortunately :(

  1. The main meta.yaml for pytorch packages is here: https://github.com/pytorch/builder/blob/master/conda/pytorch-nightly/meta.yaml

  2. I don't disagree; we actually have an item on our roadmap to refactor these scripts in general to make them easier to understand and easier to run.

  3. I haven't actually built a PyTorch conda package locally in a long time, so I can't vouch for building one using our README instructions; the last command in my zsh history that did this looked something like:

docker run --rm -it \
    -e PACKAGE_TYPE=conda \
    -e DESIRED_CUDA=cu101 \
    -e DESIRED_PYTHON=3.8 \
    -e PYTORCH_BUILD_VERSION=1.5.0 \
    -e PYTORCH_BUILD_NUMBER=1 \
    -e OVERRIDE_PACKAGE_VERSION=1.5.0 \
    -e TORCH_CONDA_BUILD_FOLDER='pytorch-nightly' \
    -v ${path_to_pytorch}:/pytorch \
    -v ${path_to_build}:/builder \
    -v "$(pwd):/final_pkgs" \
    pytorch/conda-cuda \
    /builder/conda/build_pytorch.sh |& tee build.log

Which is admittedly pretty awful.

  4. I don't believe we support multiple outputs, so to answer your question, most likely no.

@scopatz (Contributor, Author) commented Aug 28, 2020

Hi @seemethere - thanks! This was extremely helpful. I was able to get it successfully building locally.

I have refactored the recipe to build with a gpu mutex package now, and also simplified some of the logic. I have tested this with cuda 10.1 and cpu-only at this point. Building locally, though, does leave some artifacts on the system that end up owned by root even when run as a regular user, which is not great.
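
To sketch what "cpuonly as a mutex" means here (illustrative, not the literal recipe): cpuonly stops carrying any logic of its own and just pins one side of the mutex, so installing it becomes mutually exclusive with anything that pins the gpu side:

    - name: cpuonly
      requirements:
        run:
          - pytorch-proc * cpu     # conflicts with anything requiring pytorch-proc * cuda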

There is clearly more work to do here in terms of cleanup and documentation, which I will be getting to next week. I wanted to mention this now, just as a status update and give the opportunity for comments and feedback.

@scopatz (Contributor, Author) commented Sep 16, 2020

Hi folks, I believe this is working now. I have tried it locally in conjunction with the nonroot branch (#518), and the mutex packages are built and docker run completes successfully. There are now three packages created: pytorch itself, a cudaXY convenience package, and the gpu mutex.
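
For reference, selecting between the variants should then look roughly like this from the user side (cuda101 stands in for a cudaXY package; exact names may still change):

    # cpu-only build
    conda install -c pytorch pytorch cpuonly
    # gpu build for CUDA 10.1, via the convenience package
    conda install -c pytorch pytorch cuda101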

@rgommers (Contributor) left a review comment

That sounds good! This PR LGTM as far as I can judge.

What would be the steps to deploy this? E.g., since this changes the current cpuonly, would it require a coordinated release of cpuonly, pytorch, torchvision, etc.?

Recipe excerpt under review:

    - pytorch-proc * cpu

    outputs:
    # A meta-package to select CPU or GPU build for faiss.
@rgommers (Contributor), on the diff:

This comment is a little confusing - why specifically for faiss? I assume it's for pytorch and possibly other packages too?

@scopatz (Contributor, Author):

Oops, this change shouldn't have made it in.

@ezyang (Contributor) commented Sep 16, 2020

The first step would be to get the nightly conda packages onto the new mutex scheme, so we can iron out bugs. This is going to need coordination with a release manager, e.g., @seemethere.

@scopatz (Contributor, Author) commented Oct 21, 2020

BTW I have fixed the conflicts here and this is good to go, as far as I am concerned. Let me know if you want to sync on it @seemethere!

@seemethere (Member) left a review comment

PR looks mostly good, going to hold off on merging until after the 1.7 release

@facebook-github-bot (Contributor)

Hi @scopatz!

Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but we do not have a signature on file.

In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@kev-zheng

Hey! Quick bump: we'd really like to use this!

@rgommers (Contributor) commented Jan 5, 2021

@kev-zheng thanks for the ping. Just curious, where are you planning to use this?

@rgommers (Contributor) commented Jan 5, 2021

I see the CLA check turned red, @scopatz would you mind signing it (takes about 10 seconds)?

@kev-zheng

> @kev-zheng thanks for the ping. Just curious, where are you planning to use this?

We have a large conda environment that includes pytorch, and cpuonly or not depending on GPU availability. Specifically, we're building environment specs with conda-lock, and mamba seems to massively speed up generating lockfiles. Not a huge issue, but a decent QOL improvement.
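
For context, the workflow is roughly the following, assuming an environment.yml that lists pytorch (and cpuonly on cpu-only machines); the --mamba flag is what gives the solver speedup mentioned above:

    # resolve the environment and write platform-specific lockfiles
    conda-lock --mamba -f environment.yml -p linux-64 -p osx-64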

@scopatz (Contributor, Author) commented Jan 18, 2021

Hi all, on the CLA issue: all code here was written on behalf of Quansight. Is this for some reason not covered by Quansight's corporate CLA (https://code.facebook.com/cla/corporate)?

@rgommers (Contributor) commented Jan 18, 2021

> Hi All, On the CLA issue, all code here was written on behalf of Quansight. Is this for some reason not covered by Quansight's corporate CLA (https://code.facebook.com/cla/corporate)?

We filled out both personal and corporate CLAs for each team member. You're not on the corporate CLA, because it was signed in early Nov just after you left (it's not retroactive).

@scopatz (Contributor, Author) commented Jan 27, 2021

Alright, in order to move this along, I have fully read through and signed the CLA. Sorry it took so long.

@rgommers (Contributor)

Thanks @scopatz!

@seemethere (Member) commented Feb 1, 2021

So I'm in the process of trying to merge this and I'm running into a weird issue when attempting to build:

$ docker run --rm \
    -it \
    -e DESIRED_PYTHON=3.8 \
    -e DESIRED_CUDA="110" \
    -e CUDA_VERSION=11.0 \
    -e PYTORCH_BUILD_VERSION=1.0.0 \
    -e PYTORCH_BUILD_NUMBER=1 \
    -v "$(pwd)":/builder \
    -v /raid/eliuriegas/pytorch:/pytorch \
    pytorch/conda-builder:cuda11.0 \
    /builder/conda/build_pytorch.sh
....
Traceback (most recent call last):
  File "/opt/conda/bin/conda-build", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/conda_build/cli/main_build.py", line 481, in main
    execute(sys.argv[1:])
  File "/opt/conda/lib/python3.8/site-packages/conda_build/cli/main_build.py", line 470, in execute
    outputs = api.build(args.recipe, post=args.post, test_run_post=args.test_run_post,
  File "/opt/conda/lib/python3.8/site-packages/conda_build/api.py", line 186, in build
    return build_tree(
  File "/opt/conda/lib/python3.8/site-packages/conda_build/build.py", line 3068, in build_tree
    packages_from_this = build(metadata, stats,
  File "/opt/conda/lib/python3.8/site-packages/conda_build/build.py", line 2031, in build
    output_metas = expand_outputs([(m, need_source_download, need_reparse_in_env)])
  File "/opt/conda/lib/python3.8/site-packages/conda_build/render.py", line 789, in expand_outputs
    for (output_dict, m) in deepcopy(_m).get_output_metadata_set(permit_unsatisfiable_variants=False):
  File "/opt/conda/lib/python3.8/site-packages/conda_build/metadata.py", line 2120, in get_output_metadata_set
    ensure_matching_hashes(conda_packages)
  File "/opt/conda/lib/python3.8/site-packages/conda_build/metadata.py", line 325, in ensure_matching_hashes
    dep.split(' ')[-1] != m.build_id() and _variants_equal(m, om)):
  File "/opt/conda/lib/python3.8/site-packages/conda_build/metadata.py", line 1364, in build_id
    check_bad_chrs(manual_build_string, 'build/string')
  File "/opt/conda/lib/python3.8/site-packages/conda_build/metadata.py", line 579, in check_bad_chrs
    if c in s:
TypeError: argument of type 'int' is not iterable

When I added a print statement to see what string it was getting caught up on, I found it was interpreting DESIRED_CUDA=110 as an integer value instead of a string. I tried appending | string() to the corresponding meta.yaml line, and it appears that doesn't work either.

@rgommers or @scopatz do you guys have any idea what might be happening?
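
For reference, this matches plain YAML scalar typing, where quoting is the usual way to force a string; whether conda-build's renderer preserves that after Jinja templating is exactly the open question:

    desired_cuda: 110          # YAML parses an unquoted 110 as the integer 110
    desired_cuda_str: "110"    # quoting yields the string "110"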

@ezyang (Contributor) commented Feb 2, 2021

Maybe @mattip might also have some ideas.

@mattip (Contributor) commented Feb 2, 2021

Digging around in the sources, it seems the metadata parser will always prefer to convert to int?

@wolfv commented Mar 16, 2021

Would be great to have this!

@facebook-github-bot (Contributor)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@jph00 commented Jul 13, 2021

Thanks so much for your work on this @scopatz, @rgommers, @seemethere and @ezyang! :) FYI, this issue has just started blocking PyTorch GPU installation with mamba from working. @wolfv tells me that this PR is required in order to get it working again.

It looks like it was just about ready to merge AFAIK, but perhaps it got forgotten?

@scopatz (Contributor, Author) commented Jul 26, 2021

Seems like it did. Also, I would like to remove my facebook CLA at this time. Can this please be merged / forked / etc ASAP?

@rgommers (Contributor)

> Seems like it did. Also, I would like to remove my facebook CLA at this time. Can this please be merged / forked / etc ASAP?

I'll follow up on this today.

@rgommers (Contributor)

pytorch/pytorch#54900 showed that this PR no longer worked as expected three months ago; something had changed in the meantime. It needs more work. Trying to hash out a plan together with @seemethere and @malfet. Probably we'll close and resubmit; we should have it figured out by tomorrow.

@rgommers changed the title from "Mutex" to "Change cpuonly to a conda mutex" on Jul 28, 2021
@rgommers (Contributor)

Okay, closing; we'll resubmit together with a more extended and documented test plan. I will comment once that is done, so everyone who is interested knows where to look for updates.

> Also, I would like to remove my facebook CLA at this time.

Please go ahead @scopatz, should be fine now. Thanks for the work on this!
