RFC-0036-hardware-accelerators-pytorch.org #63
Conversation
Hi @bsochack! Thank you for your pull request and welcome to our community.

**Action Required:** In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process:** In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA signed`. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Thanks to Intel colleagues for the excellent proposal. We fully support it. This proposal not only provides an opportunity for all hardware accelerators integrated with PyTorch to demonstrate their capabilities but also offers PyTorch users more comprehensive installation and usage guidance.
### PyTorch integration types
There are at least two ways in which compute platforms are integrated with PyTorch:
1. In-tree – CPU, CUDA, and ROCm are developed, built, and tested within the PyTorch environment. PyTorch ensures the quality criteria. This approach is limited to only a few compute platforms and does not scale with the number of compute platforms.
2. Out-of-tree – Integration of other compute platforms, such as Intel Gaudi, is done via an additional Python package (extension) that needs to be installed on top of the PyTorch CPU package. Development, build, and testing are done outside of PyTorch. In this case:
Does "out of tree" hardware accelerator refer to an accelerator that implements logic outside of PyTorch upstream, including devices accessed via privateuse1?
If so, could we specify "out-of-tree (including privateuse keys)" here?
Yes, "out of tree" refers to "integration logic" that is implemented outside of PyTorch upstream. It is applicable to privateuse1 and other accelerators with device specific dispatch key (i.e. Gaudi uses HPU key)
PyTorch Foundation shall introduce minimal requirements for new compute platforms.
Report what kind of testing was done on a compute platform with a given PyTorch build:
* A common test report format – to be defined.
* A test report to contain: results of tests (PyTorch UTs, TorchBench, model runs, additional test suites) and tested OSes.
Due to variations in capabilities and implementation among different hardware accelerators, how should the minimum testing suite be defined?
This is a challenge. One option is to verify basic infrastructure and ops, e.g. tensor management, the dispatcher, copy ops, basic math, a hello-world model, etc.; a rough sketch of such a check follows below.
Device capabilities reporting in PyTorch is not very extensive, and extending it to cover all hardware variants would be overengineering.
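To make the "basic infrastructure and ops" idea concrete, here is a rough, non-normative sketch; the function name and the specific ops are illustrative placeholders, not a proposed test suite.

```python
# Rough illustration of "basic infrastructure and ops" checks, parameterized
# over a device string such as "cpu", "cuda", "hpu", or a privateuse1 backend.
import torch

def basic_device_sanity(device: str) -> None:
    # Tensor management: allocation on the device and round-trip copies.
    x = torch.randn(4, 4, device=device)
    assert x.device.type == device.split(":")[0]
    y = x.to("cpu").to(device)
    torch.testing.assert_close(x.cpu(), y.cpu())

    # Dispatcher / basic math: elementwise ops and a matmul.
    z = (x + y) * 2.0
    w = x @ y.t()
    torch.testing.assert_close(z.cpu(), (x.cpu() + y.cpu()) * 2.0)
    torch.testing.assert_close(w.cpu(), x.cpu() @ y.cpu().t(), rtol=1e-4, atol=1e-4)

    # "Hello world": a tiny module forward/backward round trip.
    model = torch.nn.Linear(4, 2).to(device)
    model(x).sum().backward()
    assert model.weight.grad is not None

if __name__ == "__main__":
    basic_device_sanity("cpu")  # swap in the accelerator's device string
```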
If possible, we would like to become co-authors, aiming to collaboratively promote the implementation of this proposal.
- added a new button "accelerators" with a dropdown list
- pytorch.org already provides a place for older releases
- example of a command to install a device extension package
Hey!
Thank you very much for this proposal.
It all makes a lot of sense.
In particular:
- Adding the Accelerators dropdown looks great
- Adding a "Compute Platforms" landing page with sub-page for each system sounds good as well. The website is being re-organized right now, we most likely want to wait for this to be done to see what is the best place to put it.
- I don't think we need to make too much distinction between in-tree and out-of-tree backends. Our goal is for the experience between the two to be the same and we need different install commands for all backends anyways.
- The "Quality Criteria" section is the one that we need to clarify the most. Leaving comments inline.
1. Stable builds – the hardware provider shall provide the installation commands once a compute platform has been tested against a given PyTorch version and meets the quality criteria set by the PyTorch Foundation.
2. Preview (nightly) – similar to the stable builds, but the hardware provider must implement a method to deliver quick fixes for PyTorch nightly. This should be optional.

### Quality criteria
I think we want to go a bit deeper into the details of what the criteria and expectations are going to be here.
I am also working on this, and I think we need to properly define a couple of "tiers" for backends, with expectations defined in detail. We can then refine how we present them from this proposal: top-tier backends will be on the main get-started page, and all others will be on the Compute Platforms page linked from get-started.
I'll take a stab at that proposal and come back to you here.
@albanD waiting for the proposal from your side.
Please let me know if I can help here.
My high-level thoughts about the quality criteria:
- There are many types of devices, and there should be a way for all of them to pass the quality criteria:
  - some of them are specialized, e.g. inference-only or machine-learning-specific, while others are more general purpose, like GPUs or CPUs
  - they differ in capabilities, e.g. supported data types or distributed ops support
  - there should be a way to mark relevant tests for a given device, either through test marking or through more extensive capability reporting by the device
- PyTorch has lots of unit tests, but they are very often specialized to CUDA:
  - it will take a few PyTorch releases to generalize the tests for other devices; e.g. we have a number of generic changes that will enable Intel Gaudi and other devices with PyTorch UTs, and we will follow up with RFCs and UT PRs (see the device-generic test sketch after this list)
  - PyTorch UTs are probably the best way to verify the quality of PyTorch support on a device
  - the quality criteria should be set to a reasonable level until the PyTorch UT changes are upstreamed, e.g. initially start with a 50% pass rate for PT 2.4 and gradually increase it in subsequent releases
- I agree that there should be a couple of "tiers"
- Do you expect that first-tier out-of-tree backends (the ones on the main get-started page) will work with the PyTorch CPU package from pytorch.org?
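To make the UT-generalization point concrete, here is a hedged sketch of how PyTorch's `torch.testing._internal.common_device_type` harness instantiates one test body per registered device type; the test case itself is invented for illustration.

```python
# Illustrative only: a made-up test showing how the common_device_type
# framework instantiates the same test body for every registered device type
# (cpu, cuda, and any out-of-tree device that plugs into the framework).
import torch
from torch.testing._internal.common_utils import TestCase, run_tests
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    dtypes,
)

class TestBasicOps(TestCase):
    @dtypes(torch.float32, torch.bfloat16)
    def test_add(self, device, dtype):
        # The harness injects `device` and `dtype`, producing variants such as
        # test_add_cpu_float32, test_add_cuda_bfloat16, test_add_hpu_float32, ...
        a = torch.ones(8, device=device, dtype=dtype)
        b = torch.full((8,), 2.0, device=device, dtype=dtype)
        self.assertEqual((a + b).cpu(), torch.full((8,), 3.0, dtype=dtype))

# Generate per-device subclasses for every device type registered with the framework.
instantiate_device_type_tests(TestBasicOps, globals())

if __name__ == "__main__":
    run_tests()
```

Marking tests this way (dtype decorators, device-specific skips) is what would let an out-of-tree device reuse the same UT corpus instead of maintaining forked adaptations.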
@albanD any update?
@albanD are there any details that can be shared with us?
Hey!
Sorry, I have been tied up with other projects (PT conference right now) and was not able to look at this yet.
…sts (#126970)

### Motivation
The Intel Gaudi accelerator (device name hpu) is seen to have a good pass rate with the PyTorch framework UTs; however, being an out-of-tree device, we face challenges in adapting the device to natively run the existing PyTorch UTs under pytorch/test. The UTs are nevertheless a good indicator of the device stack health, and as such we run them regularly with adaptations. Although we can add the Gaudi/HPU device to generate the device-specific tests using the TORCH_TEST_DEVICES environment variable, we miss out on a lot of features, such as executing for specific dtypes, and skipping and overriding opInfo. With significant changes introduced in every PyTorch release, maintaining these adaptations becomes difficult and time consuming. Hence, with this PR we introduce the Gaudi device in the common_device_type framework, so that the tests are instantiated for Gaudi when the library is loaded. The eventual goal is to make Gaudi out-of-tree support equivalent to in-tree devices.

### Changes
Add HPUTestBase of type DeviceTypeTestBase, specifying appropriate attributes for Gaudi/HPU. Include code to check whether the Intel Gaudi software library is loaded and, if so, add the device to the list of devices considered for instantiation of device-type tests.

### Additional Context
Please refer to the following RFC: pytorch/rfcs#63

Pull Request resolved: #126970
Approved by: https://github.com/albanD
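For readers following the RFC, below is a very rough sketch of what a DeviceTypeTestBase subclass for an out-of-tree device could look like; the attribute names and the library check are assumptions for illustration and do not reproduce the exact code merged in #126970.

```python
# Illustrative only: a device-specific test base in the style of the
# common_device_type framework; details differ from the merged HPUTestBase.
from torch.testing._internal.common_device_type import DeviceTypeTestBase

class MyAccelTestBase(DeviceTypeTestBase):
    device_type = "hpu"  # device string used when instantiating tests

    @classmethod
    def get_primary_device(cls):
        return "hpu:0"

# The framework would only pick this base up when the vendor library is
# importable, e.g. (hypothetical check):
#
#     if importlib.util.find_spec("habana_frameworks") is not None:
#         device_type_test_bases.append(MyAccelTestBase)
```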