Add AMD GPU support #1546

Merged (11 commits) on Nov 24, 2023

Conversation

@mht-sharma (Contributor):
What does this PR do?

Adds AMD GPU support for optimum-onnxruntime using the ROCMExecutionProvider.
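
For context, a minimal sketch of the usage this PR enables, assuming an Optimum install with a ROCm-enabled onnxruntime build (the model name is only illustrative, not taken from this PR):

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# Export the model to ONNX and place supported operations on the AMD GPU via
# ONNX Runtime's ROCMExecutionProvider; unsupported ops fall back to CPU.
model = ORTModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    provider="ROCMExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("ONNX Runtime inference on an AMD Instinct GPU."))
```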

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@mht-sharma marked this pull request as ready for review on November 20, 2023 at 12:17.
@HuggingFaceDocBuilderDev commented Nov 20, 2023:

The documentation is not available anymore as the PR was closed or merged.

@fxmarty (Contributor) left a comment:

great work!

@@ -0,0 +1,117 @@
# Accelerated inference on AMD GPUs

By default, ONNX Runtime runs inference on CPU devices. However, it is possible to place supported operations on an AMD Instinct GPU, while leaving any unsupported ones on CPU. In most cases, this allows costly operations to be placed on GPU and significantly accelerate inference.
Contributor:

We could clarify that we have tested on Instinct GPUs, but the support matrix is https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html (unless ROCMExecutionProvider explicitly requires Instinct? In which case we can give a ref).

Contributor Author:

Done

docs/source/onnxruntime/usage_guides/amdgpu.mdx (outdated, resolved)

## Installation

#### 1. ROCM Installation (V 5.7.X)
Contributor:

Suggested change:
- #### 1. ROCM Installation (V 5.7.X)
+ #### 1. ROCm Installation (V 5.7.X)

(V 5.7.X) means that ROCm 5.7 is a requirement? Can we put that in a sentence instead?

Contributor Author:

The branch the AMD team shared for onnxruntime was on ROCm 5.7, so this has been tested on ROCm 5.7. I could mention that the following instructions are for running the ROCm EP on an AMD GPU with ROCm 5.7 installed.

docs/source/onnxruntime/usage_guides/amdgpu.mdx (outdated, resolved)
Comment on lines 243 to 249
@require_torch_gpu
@pytest.mark.amdgpu_test
def test_load_model_rocm_provider(self):
model = ORTModel.from_pretrained(self.ONNX_MODEL_ID, provider="ROCMExecutionProvider")
self.assertListEqual(model.providers, ["ROCMExecutionProvider", "CPUExecutionProvider"])
self.assertListEqual(model.model.get_providers(), model.providers)
self.assertEqual(model.device, torch.device("cuda:0"))
Contributor:

Will this test fail if the install of ORT + ROCm EP is not done correctly? If so, can we add a @require_ort_rocm or something similar? If not, can we make these new tests @slow? The CI is already huge.

Contributor Author:

I don't think these tests should run by default, since the @pytest.mark.amdgpu_test marker is present?

@fxmarty (Contributor), Nov 23, 2023:

No, I believe this marker is useful only to select a subset of tests to run (pytest -m "amdgpu_test").

If you try pytest tests/onnxruntime -k "test_load_model_rocm_provider" -s -vvvvv, this test is IMO likely to run, while it should probably not unless the ROCm EP is installed.
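
For reference, a marker like amdgpu_test only lets you select or deselect tests on the command line; it does not skip anything on its own. A minimal sketch of registering such a marker in a conftest.py (an assumed setup, not necessarily the repository's actual config):

```python
# conftest.py (sketch)
def pytest_configure(config):
    # Register the custom marker so pytest does not warn about it; tests carrying it
    # can then be selected with `pytest -m amdgpu_test` or excluded with `-m "not amdgpu_test"`.
    config.addinivalue_line(
        "markers",
        "amdgpu_test: tests that require an AMD GPU with the ROCm execution provider",
    )
```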

Contributor Author:

Yes, the above line would run. What I meant was that in our CI, the CPU tests won't run it as it would require a GPU, and for the GPU tests only the gpu_test marker is selected: https://github.com/huggingface/optimum/blob/main/tests/onnxruntime/docker/Dockerfile_onnxruntime_gpu#L26

So these tests are not running in the CI? Am I missing something, or is there any other place the test might run?

But if a user/developer is running locally, something like @require_ort_rocm is nice?

Contributor:

Then the marker gpu_test is misleading and should be changed to cuda_ep_test or trt_ep_test.

> But if a user/developer is running locally, something like @require_ort_rocm is nice?

Yes, that's what I meant. If I run pytest tests/onnxruntime -s -vvvvv locally, I would like it to be (somewhat) green.

Contributor Author:

OK, will make the change with @require_ort_rocm. Could change gpu_test to cuda_and_trt_ep_tests, as the CUDA and TRT EPs are tested in the same method.

Contributor Author:

On second thought, creating two markers for the existing GPU tests, cuda_ep_test and trt_ep_test, as only some of them have TRT.
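
For illustration, a minimal sketch of what a @require_ort_rocm decorator could look like, using onnxruntime.get_available_providers() to detect the ROCm execution provider (the exact name and implementation here are assumptions, not the PR's final code):

```python
import unittest

import onnxruntime


def require_ort_rocm(test_case):
    """Skip the decorated test unless the installed ONNX Runtime build exposes ROCMExecutionProvider."""
    return unittest.skipUnless(
        "ROCMExecutionProvider" in onnxruntime.get_available_providers(),
        "test requires an ONNX Runtime build with the ROCm execution provider",
    )(test_case)
```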

tests/onnxruntime/test_modeling.py (resolved)
tests/onnxruntime/test_modeling.py (resolved)
tests/onnxruntime/test_modeling.py (outdated, resolved)
tests/onnxruntime/test_modeling.py (outdated, resolved)
@fxmarty (Contributor) left a comment:

LGTM, great work!

docs/source/onnxruntime/usage_guides/amdgpu.mdx (outdated, resolved)
@fxmarty (Contributor) commented Nov 23, 2023:
One test is failing:

2023-11-23T13:43:36.0587352Z FAILED onnxruntime/test_utils.py::ProviderAndDeviceGettersTest::test_get_provider_for_device - AssertionError: 'ROCMExecutionProvider' != 'CUDAExecutionProvider'
2023-11-23T13:43:36.0587540Z - ROCMExecutionProvider
2023-11-23T13:43:36.0587650Z ? -- ^
2023-11-23T13:43:36.0587755Z + CUDAExecutionProvider
2023-11-23T13:43:36.0587831Z ?  ^^^
2023-11-23T13:43:36.0588086Z ==== 1 failed, 756 passed, 788 skipped, 2765 warnings in 1773.38s (0:29:33) ====

@mht-sharma (Contributor Author):

> One test is failing: FAILED onnxruntime/test_utils.py::ProviderAndDeviceGettersTest::test_get_provider_for_device - AssertionError: 'ROCMExecutionProvider' != 'CUDAExecutionProvider'

Added a check for this to determine the provider.
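
For illustration, the kind of check described could look like the sketch below: when the device is a GPU, inspect the available execution providers to decide between ROCm and CUDA (a hypothetical version, not necessarily the code added in the PR):

```python
import onnxruntime
import torch


def get_provider_for_device(device: torch.device) -> str:
    # On ROCm builds of PyTorch the device type is still reported as "cuda",
    # so the installed ONNX Runtime providers are inspected to pick the right EP.
    if device.type == "cuda":
        if "ROCMExecutionProvider" in onnxruntime.get_available_providers():
            return "ROCMExecutionProvider"
        return "CUDAExecutionProvider"
    return "CPUExecutionProvider"
```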

@fxmarty (Contributor) commented Nov 24, 2023:

Hmm, it is still failing.

@mht-sharma merged commit e2d3c8b into huggingface:main on Nov 24, 2023.
48 of 52 checks passed.