Add AMD GPU support #1546
Merged
Commits (11, all by mht-sharma):
- ac530b5 add amd gpu tests
- 84699a5 add docs
- e80a8e0 add docs
- 65df71b add docs
- b2641a5 Add ORT trainer docs and dockerfile
- 26861b1 addressed comments
- 1de36bd addressed comments
- e02f6ea addressed comments
- f798fd3 added pytorch installation step
- 0fe7545 update test
- 1551292 update
# Accelerated inference on AMD GPUs supported by ROCm

By default, ONNX Runtime runs inference on CPU devices. However, it is possible to place supported operations on an AMD Instinct GPU, while leaving any unsupported ones on CPU. In most cases, this allows costly operations to be placed on GPU and significantly accelerates inference.

Our testing primarily involved AMD Instinct GPUs; for specific GPU compatibility, refer to the [AMD ROCm GPU OS support matrix](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html).

This guide will show you how to run inference with the `ROCMExecutionProvider`, the execution provider that ONNX Runtime supports for AMD GPUs.
## Installation

The following setup installs ONNX Runtime with the ROCm Execution Provider, using ROCm 5.7.

#### 1. ROCm Installation

To install ROCm 5.7, please follow the [ROCm installation guide](https://rocm.docs.amd.com/en/latest/deploy/linux/index.html).
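After installation, you can sanity-check the ROCm stack from the shell. This is a sketch, not part of the official install steps, and the version-file path may vary between ROCm releases:

```shell
# List detected AMD GPUs along with temperature, utilization, and driver info
rocm-smi

# Print the installed ROCm version (path may differ across releases)
cat /opt/rocm/.info/version
```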
#### 2. PyTorch Installation with ROCm Support

Optimum's ONNX Runtime integration relies on some functionalities of Transformers that require PyTorch. For now, we recommend using PyTorch built against ROCm 5.7, which can be installed by following the [PyTorch installation guide](https://pytorch.org/get-started/locally/):
```bash
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
```
<Tip>
For a Docker installation, the following base image is recommended: `rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1`.
</Tip>
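To confirm that the installed PyTorch was built with ROCm support, a quick check (a minimal sketch, not taken from the original guide) is to inspect `torch.version.hip` and `torch.cuda.is_available()`; ROCm devices are exposed through PyTorch's CUDA device API:

```python
import torch

# On a ROCm build of PyTorch, torch.version.hip is a version string;
# on a CPU-only or CUDA build it is None.
print("HIP version:", torch.version.hip)

# ROCm GPUs are surfaced through the CUDA API, so this returns True
# when a supported AMD GPU is visible.
print("GPU available:", torch.cuda.is_available())
```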
#### 3. ONNX Runtime installation with ROCm Execution Provider
```bash
# Prerequisites
pip install -U pip
pip install cmake onnx
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Install ONNX Runtime from source
git clone --recursive https://github.com/ROCmSoftwarePlatform/onnxruntime.git
cd onnxruntime
git checkout rocm5.7_internal_testing_eigen-3.4.zip_hash

./build.sh --config Release --build_wheel --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) --use_rocm --rocm_home=/opt/rocm
pip install build/Linux/Release/dist/*
```
<Tip>
To avoid conflicts between `onnxruntime` and `onnxruntime-rocm`, make sure the package `onnxruntime` is not installed by running `pip uninstall onnxruntime` prior to installing `onnxruntime-rocm`.
</Tip>
### Checking the ROCm installation is successful

Before going further, run the following sample code to check whether the install was successful:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification
>>> from transformers import AutoTokenizer

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "philschmid/tiny-bert-sst2-distilled",
...     export=True,
...     provider="ROCMExecutionProvider",
... )

>>> tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")
>>> inputs = tokenizer("expectations were low, actual enjoyment was high", return_tensors="pt", padding=True)

>>> outputs = ort_model(**inputs)
>>> assert ort_model.providers == ["ROCMExecutionProvider", "CPUExecutionProvider"]
```
If this code runs gracefully, congratulations, the installation is successful! If you encounter the following error or similar,

```
ValueError: Asked to use ROCMExecutionProvider as an ONNX Runtime execution provider, but the available execution providers are ['CPUExecutionProvider'].
```

then something is wrong with the ROCm or ONNX Runtime installation.
### Use ROCm Execution Provider with ORT models

For ORT models, usage is straightforward. Simply specify the `provider` argument in the `ORTModel.from_pretrained()` method. Here's an example:

```python
>>> from optimum.onnxruntime import ORTModelForSequenceClassification

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english",
...     export=True,
...     provider="ROCMExecutionProvider",
... )
```
The model can then be used with the common 🤗 Transformers API for inference and evaluation, such as [pipelines](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/pipelines).
When using a Transformers pipeline, note that the `device` argument should be set to perform pre- and post-processing on GPU, as in the example below:

```python
>>> from optimum.pipelines import pipeline
>>> from transformers import AutoTokenizer

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

>>> pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
>>> result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
>>> print(result)  # doctest: +IGNORE_RESULT
# printing: [{'label': 'POSITIVE', 'score': 0.9997727274894714}]
```
Additionally, you can pass the session option `log_severity_level = 0` (verbose) to check whether all nodes are indeed placed on the ROCm execution provider:

```python
>>> import onnxruntime

>>> session_options = onnxruntime.SessionOptions()
>>> session_options.log_severity_level = 0

>>> ort_model = ORTModelForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased-finetuned-sst-2-english",
...     export=True,
...     provider="ROCMExecutionProvider",
...     session_options=session_options,
... )
```
### Observed time gains

Coming soon!
examples/onnxruntime/training/docker/Dockerfile-ort-nightly-rocm57 (43 additions, 0 deletions)
```dockerfile
# Use rocm image
FROM rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1
CMD rocm-smi

# Ignore interactive questions during `docker build`
ENV DEBIAN_FRONTEND noninteractive

# Versions
# available options 3.10
ARG PYTHON_VERSION=3.10

# Bash shell
RUN chsh -s /bin/bash
SHELL ["/bin/bash", "-c"]

# Install and update tools to minimize security vulnerabilities
RUN apt-get update
RUN apt-get install -y software-properties-common wget apt-utils patchelf git libprotobuf-dev protobuf-compiler cmake \
    bzip2 ca-certificates libglib2.0-0 libxext6 libsm6 libxrender1 mercurial subversion libopenmpi-dev ffmpeg && \
    apt-get clean
RUN apt-get autoremove -y

ARG PYTHON_EXE=/opt/conda/envs/py_$PYTHON_VERSION/bin/python

# (Optional) Install test dependencies
RUN $PYTHON_EXE -m pip install -U pip
RUN $PYTHON_EXE -m pip install git+https://github.com/huggingface/transformers
RUN $PYTHON_EXE -m pip install datasets accelerate evaluate coloredlogs absl-py rouge_score seqeval scipy sacrebleu nltk scikit-learn parameterized sentencepiece --no-cache-dir
RUN $PYTHON_EXE -m pip install deepspeed --no-cache-dir
RUN conda install -y mpi4py

# PyTorch
RUN $PYTHON_EXE -m pip install onnx ninja

# ORT Module
RUN $PYTHON_EXE -m pip install --pre onnxruntime-training -f https://download.onnxruntime.ai/onnxruntime_nightly_rocm57.html
RUN $PYTHON_EXE -m pip install torch-ort
RUN $PYTHON_EXE -m pip install --upgrade protobuf==3.20.2
RUN $PYTHON_EXE -m torch_ort.configure

WORKDIR .

CMD ["/bin/bash"]
```
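The image can be built and run along these lines. This is a sketch rather than part of the PR: the image tag is illustrative, and the `--device`/`--group-add` flags follow the usual ROCm container setup for exposing AMD GPUs:

```shell
# Build the image (the tag name is an example)
docker build -f examples/onnxruntime/training/docker/Dockerfile-ort-nightly-rocm57 -t ort-training-rocm57 .

# Run with the GPU devices exposed to the container
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video ort-training-rocm57
```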
Review comment (resolved):
> We could clarify that we have tested on Instinct GPUs, but that support matrix is https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html (unless ROCMExecutionProvider explicitly requires Instinct? In which case we can give a ref)

Reply:
> Done