IFU-main-2023-07-31 (#35)
* Add 12.1 workflow for docker image build (pytorch#1367)

* add 12.1 workflow for docker image build

* add github workflow

* update cuDNN to 8.8.1 and location for archive

* Do not use ftp (pytorch#1369)

* Do not use ftp

`s#ftp://#https://#`
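For illustration only, a substitution of this shape is typically applied with `sed`; the target file below is a placeholder, not taken from the commit:

```bash
# Hypothetical example: rewrite ftp:// URLs to https:// in place (file path is illustrative)
sed -i 's#ftp://#https://#g' common/install_example.sh
```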

* Remove no-longer relevant comment

* add magma build for CUDA12.1 (pytorch#1368)

* add magma build for CUDA12.1

* copy and fix CMake.patch; drop sm_37 for CUDA 12.1

* remove CUDA 11.6 builds (pytorch#1366)

* remove CUDA 11.6 builds

* remove more 11.6 builds

* build libtorch and manywheel for 12.1 (pytorch#1373)

* enable nightly CUDA 12.1 builds (pytorch#1374)

* enable nightly CUDA 12.1 builds

* fix version typo

* Fix typo (bracket) in DEPS_LIST setting (pytorch#1377)

* Remove special cases for Python 3.11 (pytorch#1381)

* Remove special case for Python 3.11

* Remove install torch script

* Windows CUDA 12.1 changes (pytorch#1376)

* Windows CUDA 12.1 changes

* add CUDA version checks for Windows MAGMA builds

* use magma branch without fermi arch

* fix check for 12.1 (pytorch#1383)

* Update CUDA_UPGRADE_GUIDE.MD (pytorch#1384)

* add pytorch-cuda constraints for CUDA 12.1 (pytorch#1385)

* Fix `cuda-pytorch/meta.yaml`

And add `12.1` to the matrix

Test plan: `conda build . -c nvidia` and observe https://anaconda.org/malfet/pytorch-cuda/files

* [aarch64] ACL build script updates for multi-ISA build with armv8-a base architecture (pytorch#1370)

* Compile with C++17 standard in check_binary.sh

To unblock pytorch/pytorch#98209
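As a rough sketch of what the flag change amounts to (the source file, include/library paths, and link line below are assumptions, not the actual contents of check_binary.sh):

```bash
# Hypothetical libtorch smoke compile with the C++17 standard enabled.
# Everything except -std=c++17 is illustrative.
g++ simple_torch_test.cpp -o simple_torch_test \
    -std=c++17 \
    -I"${TORCH_INSTALL}/include" \
    -L"${TORCH_INSTALL}/lib" \
    -ltorch -ltorch_cpu -lc10
```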

* Use robocopy fix 256char limit (pytorch#1386)

* add constraints for pytorch-cuda (pytorch#1391)

* [Conda] Update MacOS target to 11.0

As 10.9 was released a decade ago and, for that reason, does not support the C++17 standard.

Similar to pytorch/pytorch#99857

* Update MACOSX_DEPLOYMENT_TARGET to 10.13

To fix builds, though we should really target 11.0 at the very least
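For reference, this target is usually just an environment variable consumed by the build; a minimal sketch (exact placement in the build scripts is assumed):

```bash
# Illustrative only: pin the macOS deployment target before building the packages.
export MACOSX_DEPLOYMENT_TARGET=10.13
```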

* add nvjitlink for Windows builds for CUDA 12.1 (pytorch#1393)

* pytorch-cuda: Added `nvjitlink` as dependency

* Add `fsspec` to list of packages

Added by pytorch/pytorch#99768

* Add ffmpeg build to audio aarch64 (pytorch#1396)

* [S3] Add all `cu117.with.pypi` deps to nightly CUDA index

* Fix nvjitlink inclusion in 12.1 wheels (pytorch#1397)

* Fix nvjitlink inclusion in 12.1 wheels

* Fix typo

* update winserver driver (pytorch#1388)

* Add pyyaml as PyTorch runtime dep (pytorch#1394)

Companion PR to pytorch/pytorch#100166

* Fix typo

* Temp: Comment out VS2019 installation

As it should be part of the AMI

* Use VS2022 for libtorch windows tests

* Revert "Use VS2022 for libtorch windows tests"

This reverts commit 2922d7d.

* Revert "Temp: Comment out VS2019 installation"

This reverts commit e919e17.

* Do not cleanup MSVC and CUDA on Windows non-ephemeral runners (pytorch#1398)

* Update CUDA_UPGRADE_GUIDE.MD (pytorch#1399)

Update related PRs

* Attempt to fix infinite copy into existing folder

Looks like robocopy is confused about what to do with symlinks

* Handle symlink when using robocopy (pytorch#1400)

* Use /xjd instead of /sl when dealing with symlinks on Windows

https://superuser.com/questions/1642106/how-does-robocopy-handle-file-system-links-symbolic-links-hard-links-and-junct
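A minimal sketch of the kind of invocation this refers to (the directories are placeholders): `/E` copies subdirectories, and `/XJD` excludes directory junctions/symlinked directories so robocopy does not follow a link back into the destination and copy forever.

```
robocopy "C:\source\dir" "C:\dest\dir" /E /XJD
```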

* Skip uninstalling other MSVC versions if they are found (pytorch#1402)

* Update driver for cuda 12.1 (pytorch#1403)

* Don't prepend system-wide PATH when installing Python for binary smoke test (pytorch#1404)

The PATH has already been set and restored manually in the script

* Pin MSVC version to 2019 (pytorch#1405)

* Pin numpy for windows builds (pytorch#1406)

* Pin numpy for windows builds

* bump mkl version, remove conda-forge

* Change python 3.9 mkl version

* Use pinned mkl

* Upgrade nightly wheels to rocm5.5 (pytorch#1407)

* Add MIOpen db files to wheel

* Update magma commits for various branches to include header path updates

* Add ROCm5.5 support with Navi31-tuned MIOpen branch

* Upgrade nightly wheels to rocm5.5

* Update build_docker.sh for gfx1100

* Update build_docker.sh for gfx1100

---------

Co-authored-by: Jithun Nair <[email protected]>
Co-authored-by: Jithun Nair <[email protected]>

* Update CUDA_UPGRADE_GUIDE.MD with CI update instructions

* Pin numpy to 1.21.3 for python 3.10 (pytorch#1409)

* Update cuda matrix CUDA_UPGRADE_GUIDE.MD

* Pin delocate (pytorch#1411)

* Fix pytorch#1410. (pytorch#1412)

The libdrm replacement for ROCm images will first search the typical
location for the amdgpu.ids file. If that fails, it will fall back to
searching the Python install location as it did before.

* One-step ROCm manywheel/libtorch docker build (pytorch#1418)

* Use rocm/dev-centos-7:<rocm_version>-complete image with full ROCm install

* Remove ROCm install step and reinstate magma and MIOpen build steps

* Install full package for MIOpen, including headers and db files

Retained some of the disk-cleanup-related code from pytorch@cb0912c

* Use rocm/dev-ubuntu-20.04:<rocm_version>-complete image with full ROCm install

* Remove ROCm install and reinstate magma build from source

* Use --offload-arch instead of --amdgpu-target to silence warnings (see the sketch after this list)

* Use beefier runner instance for ROCm docker builds

* Typo

* Simplify ROCm targets
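To illustrate the `--offload-arch` change called out above (the GPU architectures and source file are assumptions, not the actual build invocation):

```bash
# Old spelling, now deprecated and noisy in newer ROCm releases:
#   hipcc --amdgpu-target=gfx906 -c kernel.hip -o kernel.o
# New spelling:
hipcc --offload-arch=gfx906 --offload-arch=gfx90a -c kernel.hip -o kernel.o
```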

* Update wheel build scripts for ROCm5.5 (pytorch#1413)

* Fix lib search logic and lib list

* Add libhipsolver.so dependency for upstream Pytorch (needed since PyTorch PR 97370)

* Add MIOpen db files to share path only for ROCm5.5 and above

* Add cu12 packages to allow list for download.pytorch.org (pytorch#1420)

* Fix ROCm5.4.2 builds breakage (pytorch#1421)

* Use MIOpen db logic for ROCm5.5 or later

* Remove moved lines

* Fix bash logic to add elements of array to existing array

* Add nvidia-nvjitlink-cu12 to s3 manage (pytorch#1422)

* Add nvidia-nvjitlink-cu12 to s3 manage (pytorch#1423)

* Pin openssl for python 3.9 conda package (pytorch#1424)

Fixes pytorch/pytorch#103436
Test PR: pytorch/pytorch#103437

* Update CUDA_UPGRADE_GUIDE.MD

* Pin openssl for py3.8 (pytorch#1425)

* Add pytorch-triton to small wheel (pytorch#1426)

* Add poetry and pypi tests (pytorch#1419)

* smoke test poetry

Add a little more tests

test

Test poetry

test

Test poetry on python 3.10

Add more poetry tests

Test en us

test

test

Try verbose

testing

testing

try quiet install

Code refactoring

test

move linux pipy validation to workflow

test

test

Fix path

try test pipy

More torch installations

test

testing

test

test

test new

fix install 2

try poetry nightly

test nightly

test

test

Test poetry validation

test

test_new

test

* Put back executing this on pull

* [Manywheel] Add Python-3.12.0b2 (pytorch#1427)

To enable initial experiments with PyTorch builds

* smoke test poetry (pytorch#1428)

Add a little more tests

test

Test poetry

test

Test poetry on python 3.10

Add more poetry tests

Test en us

test

test

Try verbose

testing

testing

try quiet install

Code refactoring

test

move linux pipy validation to workflow

test

test

Fix path

try test pipy

More torch installations

test

testing

test

test

test new

fix install 2

try poetry nightly

test nightly

test

test

Test poetry validation

test

test_new

test

Put back executing this on pull

Print matrix variable

test

Fix conditional for pypi poetry tests

add quotes

Add nightly as supplemental requirement

Make sure we clone module only for first time

Fix python

test validate binaries

Add repo existence checks

test

Disable runtime error before final validation

fix typo

fix cwd

* smoke test poetry (pytorch#1429)

* s/master/main/

* Update aarch64 scripts for CI workflow (pytorch#1431)

* Run release validation testing once a day, run nightly a little later (pytorch#1434)

* Remove CUDA 11.7 builds (pytorch#1408)

* remove CUDA 11.7 builds

* remove CUDA 11.7 from MAGMA builds

* add pytorch-cuda back for 11.7

* add 11.7 back to pytorch-cuda

* Add safe directory to all dockerfiles (pytorch#1435)

* update cuDNN to 8.9.2.26 for CUDA 12.1 (pytorch#1436)

* update NCCL to 2.18.1 (pytorch#1437)

Co-authored-by: Nikita Shulga <[email protected]>

* Update CI for aarch64 (pytorch#1438)

* Fix wheel macos arm64 validations (pytorch#1441)

* Fix wheel validations

* Try using upgrade flag instead

* try uninstall

* test

* Try using python3

* use python3 vs python for validation

* Fix windows vs other os python execution

* Uninstall fix

* Remove previous installations on macos-arm64 before smoke testing (pytorch#1444)

More arm64 changes

test run under environment

sleep 15min allow investigate

add sleep

test

test

Test

test

test

Arm64 use python

fix

test

testing

test

tests

testing

test

test

* Update build_docker.sh (pytorch#1446)

Use [`nvidia/cuda:11.4.3-devel-centos7`](https://hub.docker.com/layers/nvidia/cuda/11.4.3-devel-centos7/images/sha256-e2201a4954dfd65958a6f5272cd80b968902789ff73f26151306907680356db8?context=explore) because `nvidia/cuda:10.2-devel-centos7` was deleted in accordance with [Nvidia's Container Support Policy](https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md):
> After a period of Six Months time, the EOL tags WILL BE DELETED from Docker Hub and Nvidia GPU Cloud (NGC). This deletion ensures unsupported tags (and image layers) are not left lying around for customers to continue using after they have long been abandoned.

Also delete redundant DEVTOOLSET=7 clause

* Fix aarch64 nightly (pytorch#1449)

* Update Docker base images for conda and libtorch (pytorch#1448)

Followup after pytorch#1446

CUDA-10.2 and moreover CUDA-9.2 docker images are gone per [Nvidia's Container Support Policy](https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md):
> After a period of Six Months time, the EOL tags WILL BE DELETED from Docker Hub and Nvidia GPU Cloud (NGC). This deletion ensures unsupported tags (and image layers) are not left lying around for customers to continue using after they have long been abandoned.

Also, as all our Docker scripts install the CUDA toolkit anyway, what's the point of using `nvidia/cuda` images at all instead of the `centos:7`/`ubuntu:18.04` images they are based on, according to https://gitlab.com/nvidia/container-images/cuda/-/blob/master/dist/11.4.3/centos7/base/Dockerfile

Explicitly install `g++` into the `libtorch/Docker` base image, as it's needed by `patchelf`

Please note that `libtorch/Docker` cannot be completed without BuildKit, as the `rocm` step depends on `python3`, which is not available in the `cpu` image

* Fix magma installation inside docker container (pytorch#1447)

Not sure what weird version of `wget` is getting installed, but the attempt to download https://anaconda.org/pytorch/magma-cuda121/2.6.1/download/linux-64/magma-cuda121-2.6.1-1.tar.bz2 fails with:
```
--2023-07-06 03:18:38--  https://anaconda.org/pytorch/magma-cuda121/2.6.1/download/linux-64/magma-cuda121-2.6.1-1.tar.bz2
Resolving anaconda.org (anaconda.org)... 104.17.93.24, 104.17.92.24, 2606:4700::6811:5d18, ...
Connecting to anaconda.org (anaconda.org)|104.17.93.24|:443... connected.
ERROR: cannot verify anaconda.org's certificate, issued by ‘/C=US/O=Let's Encrypt/CN=E1’:
  Issued certificate has expired.
To connect to anaconda.org insecurely, use `--no-check-certificate'.
```

Also, switch from NVIDIA container to a stock `centos:7` one, to make containers slimmer and fit on standard GitHub Actions runners.

* [Manywheel] Add `/usr/local/cuda` symlink

And add `nvcc` to path

Regression introduced by pytorch#1447 when NVIDIA image was dropped in favor of base `centos` image
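A minimal sketch of what this amounts to inside the image (the CUDA version and exact Dockerfile placement are assumptions):

```bash
# Illustrative only: restore the conventional layout the NVIDIA base image used to provide.
ln -sf /usr/local/cuda-12.1 /usr/local/cuda   # version is an assumption
export PATH=/usr/local/cuda/bin:${PATH}       # make nvcc discoverable again
nvcc --version                                # quick sanity check
```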

* Do not build PyTorch with LLVM (pytorch#1445)

As NNC is dead and the LLVM dependency has not been updated in the last 4 years

First step towards fixing pytorch/pytorch#103756

* Remove `DESIRED_CUDA` logic from `check_binary.sh`

As [`pytorch/manylinux-builder`](https://hub.docker.com/r/pytorch/manylinux-builder) containers have only one version of CUDA, there is no need to select one

Nor set up `LD_LIBRARY_PATH`, as it does not match the setup users might have on their systems (but keep it for libtorch tests)

Should fix a crash caused by the minor version of cuDNN installed in the docker container differing from the one specified as a dependency of the small wheel package, seen here https://github.com/pytorch/pytorch/actions/runs/5478547018/jobs/9980463690

* Revert "Remove `DESIRED_CUDA` logic from `check_binary.sh`"

This reverts commit ed9a2ae.

* Update CUDA_UPGRADE_GUIDE.MD to add small wheel update Cudnn step

* Set CUDA_VERSION in conda Docker environment

* Rebuild docker images on release builds (pytorch#1451)

* Rebuild docker images on release

* Include with-push

* Create `/usr/local/cuda` in libtorch builds

I.e. applying the same changes as in pytorch@4a7ed14 to libtorch docker builds

* Revert "Rebuild docker images on release builds (pytorch#1451)" (pytorch#1452)

This reverts commit 2ba03df as it essentially broke all the builds on trunk (fix is coming)

* Reland "Rebuild docker images on release builds"

This is a reland of pytorch#1451 with an important fix to the branches filter: entries in a multi-line array definition
should start with `-`, otherwise it was attempting to match the branch name `main\nrelease/*`.
I.e. just copy-n-paste the example from https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#using-filters

Test plan: actionlint .github/workflows/build-manywheel-images.yml

Original PR description:
Rebuild docker images on release builds. It should also tag images for release here: https://github.com/pytorch/builder/blob/3fc310ac21c9ede8d0ce13ec71096820a41eb9f4/conda/build_docker.sh#L58-L60
This is the first step in pinning docker images for release.

* Let's try to force the path this way

* Remove `DESIRED_CUDA` logic from `check_binary.sh`

As [`pytorch/manylinux-builder`](https://hub.docker.com/r/pytorch/manylinux-builder) containers have only one version of CUDA, there is no need to select one

Nor set up `LD_LIBRARY_PATH`, as it does not match the setup users might have on their systems (but keep it for libtorch tests for now)

Should fix a crash caused by the minor version of cuDNN installed in the docker container differing from the one specified as a dependency of the small wheel package, seen here https://github.com/pytorch/pytorch/actions/runs/5478547018/jobs/9980463690

* Advance libgfortran version (pytorch#1453)

* Update builder images to ROCm5.6  (pytorch#1443)

* Update manywheel and libtorch images to rocm5.6
* Add MIOpen branch for ROCm5.6

* Pin miniconda install to py310_23.5.2 for macos and windows (pytorch#1460)

* Cleanup unused builder files (pytorch#1459)

* Remove unused builder files (pytorch#1461)

* Add support for ROCm5.6 for nightly wheels (pytorch#1442)

* Add msccl-algorithms directory to PyTorch wheel

* Bundle msccl-algorithms into wheel

* Use correct src path for msccl-algorithms

(cherry picked from commit 95b5af3)

* Add hipblaslt dependency for ROCm5.6 onwards

* Update build_all_docker.sh to ROCm5.6

* [aarch64][build] Aarch64 lapack fix and ARMCL version update (pytorch#1462)

* Fix missing lapack and update armcl

* update ARMCL version

* Remove unused parameter to limit-win-builds from validation workflows (pytorch#1464)

* Run git update-index --chmod=+x on aarch64_ci_build.sh (pytorch#1466)

* Fix erroneous logic that was skipping msccl files even for ROCm5.6; update msccl path for ROCm5.7

(cherry picked from commit 36c10cc)

---------

Co-authored-by: ptrblck <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Andrey Talman <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: snadampal <[email protected]>
Co-authored-by: Richard Zou <[email protected]>
Co-authored-by: Huy Do <[email protected]>
Co-authored-by: Bo Li <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Mike Schneider <[email protected]>
Co-authored-by: Eli Uriegas <[email protected]>
12 people authored Aug 7, 2023
1 parent 3fe2e34 commit 4a72eb3
Showing 92 changed files with 718 additions and 2,017 deletions.
70 changes: 30 additions & 40 deletions .github/scripts/validate_binaries.sh
@@ -2,49 +2,39 @@ if [[ ${MATRIX_PACKAGE_TYPE} == "libtorch" ]]; then
curl ${MATRIX_INSTALLATION} -o libtorch.zip
unzip libtorch.zip
else
#special case for Python 3.11
if [[ ${MATRIX_PYTHON_VERSION} == '3.11' ]]; then
conda create -y -n ${ENV_NAME} python=${MATRIX_PYTHON_VERSION}
conda activate ${ENV_NAME}

INSTALLATION=${MATRIX_INSTALLATION/"-c pytorch"/"-c malfet -c pytorch"}
INSTALLATION=${INSTALLATION/"pytorch-cuda"/"pytorch-${MATRIX_CHANNEL}::pytorch-cuda"}
INSTALLATION=${INSTALLATION/"conda install"/"conda install -y"}

eval $INSTALLATION
python ./test/smoke_test/smoke_test.py
conda deactivate
conda env remove -n ${ENV_NAME}
else



# Special case Pypi installation package, only applicable to linux nightly CUDA 11.7 builds, wheel package
if [[ ${TARGET_OS} == 'linux' && ${MATRIX_GPU_ARCH_VERSION} == '11.7' && ${MATRIX_PACKAGE_TYPE} == 'manywheel' && ${MATRIX_CHANNEL} != 'nightly' ]]; then
conda create -yp ${ENV_NAME}_pypi python=${MATRIX_PYTHON_VERSION} numpy ffmpeg
INSTALLATION_PYPI=${MATRIX_INSTALLATION/"cu117"/"cu117_pypi_cudnn"}
INSTALLATION_PYPI=${INSTALLATION_PYPI/"torchvision torchaudio"/""}
INSTALLATION_PYPI=${INSTALLATION_PYPI/"index-url"/"extra-index-url"}
conda run -p ${ENV_NAME}_pypi ${INSTALLATION_PYPI}
conda run -p ${ENV_NAME}_pypi python ./test/smoke_test/smoke_test.py --package torchonly
conda deactivate
conda env remove -p ${ENV_NAME}_pypi
fi
# Please note ffmpeg is required for torchaudio, see https://github.com/pytorch/pytorch/issues/96159
conda create -y -n ${ENV_NAME} python=${MATRIX_PYTHON_VERSION} numpy ffmpeg
conda activate ${ENV_NAME}
INSTALLATION=${MATRIX_INSTALLATION/"conda install"/"conda install -y"}

export OLD_PATH=${PATH}
# Workaround macos-arm64 runners. Issue: https://github.com/pytorch/test-infra/issues/4342
if [[ ${TARGET_OS} == 'macos-arm64' ]]; then
export PATH="${CONDA_PREFIX}/bin:${PATH}"
fi

# Please note ffmpeg is required for torchaudio, see https://github.com/pytorch/pytorch/issues/96159
conda create -y -n ${ENV_NAME} python=${MATRIX_PYTHON_VERSION} numpy ffmpeg
conda activate ${ENV_NAME}
INSTALLATION=${MATRIX_INSTALLATION/"conda install"/"conda install -y"}
eval $INSTALLATION
# Make sure we remove previous installation if it exists, this issue seems to affect only
if [[ ${MATRIX_PACKAGE_TYPE} == 'wheel' ]]; then
pip3 uninstall -y torch torchaudio torchvision
fi
eval $INSTALLATION

if [[ ${TARGET_OS} == 'linux' ]]; then
export CONDA_LIBRARY_PATH="$(dirname $(which python))/../lib"
export LD_LIBRARY_PATH=$CONDA_LIBRARY_PATH:$LD_LIBRARY_PATH
${PWD}/check_binary.sh
fi
if [[ ${TARGET_OS} == 'linux' ]]; then
export CONDA_LIBRARY_PATH="$(dirname $(which python))/../lib"
export LD_LIBRARY_PATH=$CONDA_LIBRARY_PATH:$LD_LIBRARY_PATH
${PWD}/check_binary.sh
fi

if [[ ${TARGET_OS} == 'windows' ]]; then
python ./test/smoke_test/smoke_test.py
conda deactivate
conda env remove -n ${ENV_NAME}
else
python3 ./test/smoke_test/smoke_test.py
fi

if [[ ${TARGET_OS} == 'macos-arm64' ]]; then
export PATH=${OLD_PATH}
fi

conda deactivate
conda env remove -n ${ENV_NAME}
fi
12 changes: 12 additions & 0 deletions .github/scripts/validate_pipy.sh
@@ -0,0 +1,12 @@
conda create -yp ${ENV_NAME}_pypi python=${MATRIX_PYTHON_VERSION} numpy ffmpeg

if [[ ${MATRIX_CHANNEL} != "release" ]]; then
conda run -p ${ENV_NAME}_pypi pip3 install --pre torch --index-url "https://download.pytorch.org/whl/${MATRIX_CHANNEL}/${MATRIX_DESIRED_CUDA}_pypi_cudnn"
conda run -p ${ENV_NAME}_pypi pip3 install --pre torchvision torchaudio --index-url "https://download.pytorch.org/whl/${MATRIX_CHANNEL}/${MATRIX_DESIRED_CUDA}"
else
conda run -p ${ENV_NAME}_pypi pip3 install torch torchvision torchaudio
fi

conda run -p ${ENV_NAME}_pypi python ./test/smoke_test/smoke_test.py
conda deactivate
conda env remove -p ${ENV_NAME}_pypi
27 changes: 27 additions & 0 deletions .github/scripts/validate_poetry.sh
@@ -0,0 +1,27 @@

conda create -y -n ${ENV_NAME}_poetry python=${MATRIX_PYTHON_VERSION} numpy ffmpeg
conda activate ${ENV_NAME}_poetry
curl -sSL https://install.python-poetry.org | python3 - --git https://github.com/python-poetry/poetry.git@master
export PATH="/root/.local/bin:$PATH"

poetry --version
poetry new test_poetry
cd test_poetry

if [[ ${MATRIX_CHANNEL} != "release" ]]; then
# Installing poetry from our custom repo. We need to configure it before use and disable authentication
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
poetry source add --priority=explicit domains "https://download.pytorch.org/whl/${MATRIX_CHANNEL}/${MATRIX_DESIRED_CUDA}"
poetry source add --priority=supplemental pytorch-nightly "https://download.pytorch.org/whl/${MATRIX_CHANNEL}"
poetry source add --priority=supplemental pytorch "https://download.pytorch.org/whl/${MATRIX_CHANNEL}/${MATRIX_DESIRED_CUDA}_pypi_cudnn"
poetry --quiet add --source pytorch torch
poetry --quiet add --source domains torchvision torchaudio
else
export PYTHON_KEYRING_BACKEND=keyring.backends.null.Keyring
poetry --quiet add torch torchaudio torchvision
fi

python ../test/smoke_test/smoke_test.py
conda deactivate
conda env remove -p ${ENV_NAME}_poetry
cd ..
11 changes: 8 additions & 3 deletions .github/workflows/build-conda-images.yml
@@ -3,7 +3,12 @@ name: Build conda docker images
on:
push:
branches:
main
- main
- release/*
tags:
# NOTE: Binary build pipelines should only get triggered on release candidate or nightly builds
# Release candidate tags look like: v1.11.0-rc1
- v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+
paths:
- conda/Dockerfile
- 'common/*'
@@ -19,14 +24,14 @@ env:
DOCKER_BUILDKIT: 1
DOCKER_ID: ${{ secrets.DOCKER_ID }}
DOCKER_TOKEN: ${{ secrets.DOCKER_TOKEN }}
WITH_PUSH: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
WITH_PUSH: ${{ github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/release')) }}

jobs:
build-docker:
runs-on: ubuntu-22.04
strategy:
matrix:
cuda_version: ["11.6", "11.7", "11.8", "cpu"]
cuda_version: ["11.8", "12.1", "cpu"]
env:
CUDA_VERSION: ${{ matrix.cuda_version }}
steps:
15 changes: 10 additions & 5 deletions .github/workflows/build-libtorch-images.yml
@@ -3,7 +3,12 @@ name: Build libtorch docker images
on:
push:
branches:
main
- main
- release/*
tags:
# NOTE: Binary build pipelines should only get triggered on release candidate or nightly builds
# Release candidate tags look like: v1.11.0-rc1
- v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+
paths:
- .github/workflows/build-libtorch-images.yml
- libtorch/Dockerfile
@@ -21,14 +26,14 @@ env:
DOCKER_BUILDKIT: 1
DOCKER_ID: ${{ secrets.DOCKER_ID }}
DOCKER_TOKEN: ${{ secrets.DOCKER_TOKEN }}
WITH_PUSH: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
WITH_PUSH: ${{ github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/release')) }}

jobs:
build-docker-cuda:
runs-on: ubuntu-22.04
strategy:
matrix:
cuda_version: ["11.8", "11.7", "11.6"]
cuda_version: ["12.1", "11.8"]
env:
GPU_ARCH_TYPE: cuda
GPU_ARCH_VERSION: ${{ matrix.cuda_version }}
@@ -44,10 +49,10 @@
run: |
libtorch/build_docker.sh
build-docker-rocm:
runs-on: ubuntu-22.04
runs-on: linux.12xlarge
strategy:
matrix:
rocm_version: ["5.3", "5.4.2"]
rocm_version: ["5.5", "5.6"]
env:
GPU_ARCH_TYPE: rocm
GPU_ARCH_VERSION: ${{ matrix.rocm_version }}
2 changes: 1 addition & 1 deletion .github/workflows/build-magma-linux.yml
@@ -30,7 +30,7 @@ jobs:
runs-on: linux.2xlarge
strategy:
matrix:
cuda_version: ["118", "117", "116"]
cuda_version: ["121", "118"]
steps:
- name: Checkout PyTorch builder
uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/build-magma-windows.yml
@@ -17,7 +17,7 @@ jobs:
runs-on: windows-2019
strategy:
matrix:
cuda_version: ["118", "117", "116"]
cuda_version: ["121", "118"]
config: ["Release", "Debug"]
env:
CUDA_VERSION: ${{ matrix.cuda_version }}
15 changes: 10 additions & 5 deletions .github/workflows/build-manywheel-images.yml
@@ -3,7 +3,12 @@ name: Build manywheel docker images
on:
push:
branches:
main
- main
- release/*
tags:
# NOTE: Binary build pipelines should only get triggered on release candidate or nightly builds
# Release candidate tags look like: v1.11.0-rc1
- v[0-9]+.[0-9]+.[0-9]+-rc[0-9]+
paths:
- .github/workflows/build-manywheel-images.yml
- manywheel/Dockerfile
@@ -23,14 +28,14 @@ env:
DOCKER_BUILDKIT: 1
DOCKER_ID: ${{ secrets.DOCKER_ID }}
DOCKER_TOKEN: ${{ secrets.DOCKER_TOKEN }}
WITH_PUSH: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
WITH_PUSH: ${{ github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/heads/release')) }}

jobs:
build-docker-cuda:
runs-on: ubuntu-22.04
strategy:
matrix:
cuda_version: ["11.8", "11.7", "11.6"]
cuda_version: ["12.1", "11.8"]
env:
GPU_ARCH_TYPE: cuda
GPU_ARCH_VERSION: ${{ matrix.cuda_version }}
@@ -46,10 +51,10 @@
run: |
manywheel/build_docker.sh
build-docker-rocm:
runs-on: ubuntu-22.04
runs-on: linux.12xlarge
strategy:
matrix:
rocm_version: ["5.3", "5.4.2"]
rocm_version: ["5.5", "5.6"]
env:
GPU_ARCH_TYPE: rocm
GPU_ARCH_VERSION: ${{ matrix.rocm_version }}
10 changes: 0 additions & 10 deletions .github/workflows/validate-binaries.yml
@@ -22,10 +22,6 @@ on:
default: ""
required: false
type: string
limit-win-builds:
description: "Limit windows builds to single python/cuda config"
default: "disable"
type: string
workflow_dispatch:
inputs:
os:
@@ -53,11 +49,6 @@
default: ""
required: false
type: string
limit-win-builds:
description: "Limit windows builds to single python/cuda config"
default: "disable"
required: false
type: string

jobs:
win:
@@ -66,7 +57,6 @@ jobs:
with:
channel: ${{ inputs.channel }}
ref: ${{ inputs.ref || github.ref }}
limit-win-builds: ${{ inputs.limit-win-builds }}

linux:
if: inputs.os == 'linux' || inputs.os == 'all'
Expand Down
10 changes: 10 additions & 0 deletions .github/workflows/validate-linux-binaries.yml
@@ -55,4 +55,14 @@ jobs:
export ENV_NAME="conda-env-${{ github.run_id }}"
export TARGET_OS="linux"
eval "$(conda shell.bash hook)"
# Special case: PyPi installation package, and install of PyPi package via poetry
if [[ ${MATRIX_PACKAGE_TYPE} == "manywheel" ]] && \
([[ ${MATRIX_GPU_ARCH_VERSION} == "12.1" && ${MATRIX_CHANNEL} != "release" ]] || \
[[ ${MATRIX_GPU_ARCH_VERSION} == "11.7" && ${MATRIX_CHANNEL} == "release" ]]); then
source ./.github/scripts/validate_pipy.sh --runtime-error-check disabled
source ./.github/scripts/validate_poetry.sh --runtime-error-check disabled
fi
# Standard case: validate binaries
source ./.github/scripts/validate_binaries.sh
15 changes: 3 additions & 12 deletions .github/workflows/validate-nightly-binaries.yml
@@ -3,8 +3,8 @@ name: cron

on:
schedule:
# At 2:30 pm UTC (7:30 am PDT)
- cron: "30 14 * * *"
# At 3:30 pm UTC (8:30 am PDT)
- cron: "30 15 * * *"
# Have the ability to trigger this job manually through the API
workflow_dispatch:
push:
@@ -17,19 +17,10 @@ on:
- .github/workflows/validate-macos-binaries.yml
- .github/workflows/validate-macos-arm64-binaries.yml
- test/smoke_test/*
pull_request:
paths:
- .github/workflows/validate-nightly-binaries.yml
- .github/workflows/validate-linux-binaries.yml
- .github/workflows/validate-windows-binaries.yml
- .github/workflows/validate-macos-binaries.yml
- .github/workflows/validate-macos-arm64-binaries.yml
- .github/scripts/validate_binaries.sh
- test/smoke_test/*

jobs:
nightly:
uses: ./.github/workflows/validate-binaries.yml
with:
channel: nightly
os: all
limit-win-builds: enable
14 changes: 11 additions & 3 deletions .github/workflows/validate-release-binaries.yml
@@ -3,8 +3,8 @@ name: cron

on:
schedule:
# At 3 am and 2 pm UTC (7 am and 8 pm PDT)
- cron: "0 3,14 * * *"
# At 3 am UTC (7 am PDT)
- cron: "0 3 * * *"
# Have the ability to trigger this job manually through the API
workflow_dispatch:
push:
@@ -17,11 +17,19 @@ on:
- .github/workflows/validate-macos-binaries.yml
- .github/workflows/validate-macos-arm64-binaries.yml
- test/smoke_test/*
pull_request:
paths:
- .github/workflows/validate-nightly-binaries.yml
- .github/workflows/validate-linux-binaries.yml
- .github/workflows/validate-windows-binaries.yml
- .github/workflows/validate-macos-binaries.yml
- .github/workflows/validate-macos-arm64-binaries.yml
- .github/scripts/validate_binaries.sh
- test/smoke_test/*

jobs:
release:
uses: ./.github/workflows/validate-binaries.yml
with:
channel: release
os: all
limit-win-builds: enable