
RFC-0036-hardware-accelerators-pytorch.org #63

Open
bsochack wants to merge 4 commits into master

Conversation


@bsochack commented Mar 8, 2024

No description provided.

@facebook-github-bot (Contributor)

Hi @bsochack!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot (Contributor)

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@hipudding left a comment

Thanks to Intel colleagues for the excellent proposal. We fully support it. This proposal not only provides an opportunity for all hardware accelerators integrated with PyTorch to demonstrate their capabilities but also offers PyTorch users more comprehensive installation and usage guidance.

### PyTorch integration types
There are at least two ways in which compute platforms are integrated with PyTorch:
1. In-tree – CPU, CUDA and ROCm are developed, built and tested in the PyTorch environment, and PyTorch ensures the quality criteria. This approach is limited to only a few compute platforms and does not scale with their number.
2. Out-of-tree – Integration of other compute platforms, such as Intel Gaudi, is done via an additional Python package (extension) that needs to be installed on top of the PyTorch CPU package. Development, building and testing are done outside of PyTorch. In this case:


Does "out of tree" hardware accelerator refer to an accelerator that implements logic outside of PyTorch upstream, including devices accessed via privateuse1?

If so, could we specify "out-of-tree (including privateuse keys)" here?

@bsochack (Author)

Yes, "out of tree" refers to "integration logic" that is implemented outside of PyTorch upstream. It is applicable to privateuse1 and other accelerators with device specific dispatch key (i.e. Gaudi uses HPU key)

The PyTorch Foundation shall introduce minimal requirements for new compute platforms.
Report what kind of testing was done on a compute platform with a given PyTorch build:
* A common test report format – to be defined.
* A test report to contain: results of tests (PyTorch UT, TorchBench, model runs, additional test suites) and tested OSes.
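For illustration only, since the RFC leaves the format to be defined, such a report might carry fields like the following; every name and number below is hypothetical:

```python
# Hypothetical test report payload; the common format is still "to be defined"
# in the RFC, so all fields and values here are illustrative only.
report = {
    "device": "gaudi2",
    "pytorch_version": "2.4.0",
    "extension_version": "1.17.0",
    "tested_oses": ["ubuntu-22.04", "rhel-9"],
    "results": {
        "pytorch_ut": {"passed": 41230, "failed": 120, "skipped": 5310},
        "torchbench": {"passed": 72, "failed": 3},
        "model_runs": ["resnet50", "bert-large"],
    },
}
```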


Due to variations in capabilities and implementation among different hardware accelerators, how should the minimum testing suite be defined?

@bsochack (Author)

This is a challenge. One option is to verify basic infrastructure and ops: tensor management, the dispatcher, copy ops, basic math, a hello-world run, etc.
Device capability reporting in PyTorch is not very extensive, and extending it to cover all hardware variants would be over-engineering.
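A minimal sketch of such a basic verification, assuming a backend registered under the device string "hpu" (any out-of-tree device name would do):

```python
# Smoke-test sketch for the "basic infrastructure and ops" idea: tensor
# management, copies through the dispatcher, and basic math on the device.
# Assumes a backend registered under the given device string is installed.
import torch

def smoke_test(device: str = "hpu") -> None:
    x = torch.randn(8, 8, device=device)   # tensor management: device allocation
    y = x.cpu().to(device)                 # copy ops: host <-> device round trip
    z = x @ y + 1.0                        # basic math through the dispatcher
    assert z.shape == (8, 8)
    assert torch.isfinite(z.cpu()).all()   # "hello world": sane values back on host

smoke_test()
```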

@hipudding

If possible, we would like to become co-authors, so we can collaboratively promote the implementation of this proposal.

- added a new button "accelerators" with dropdown list
- pytorch.org already provides a place for older releases
- example of command to install device extension package
@albanD (Contributor) left a comment


Hey!

Thank you very much for this proposal.
It all makes a lot of sense.
In particular:

  • Adding the Accelerators dropdown looks great
  • Adding a "Compute Platforms" landing page with a sub-page for each system sounds good as well. The website is being re-organized right now; we most likely want to wait for that to be done to see what the best place for it is.
  • I don't think we need to make too much of a distinction between in-tree and out-of-tree backends. Our goal is for the experience between the two to be the same, and we need different install commands for all backends anyway.
  • The "Quality Criteria" section is the one that we need to clarify the most. Leaving comments inline.

1. Stable builds – the hardware provider shall provide the installation commands once a compute platform has been tested against a given PyTorch version and meets the quality criteria set by the PyTorch Foundation (see the example below).
2. Preview (nightly) – similar to the stable builds, but the hardware provider must implement a method to quickly provide fixes for PyTorch nightly. This should be optional.
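For instance, a get-started entry for an out-of-tree backend could show a two-step install on top of the CPU wheel; the extension package name below is a hypothetical placeholder:

```bash
# Hypothetical install flow for an out-of-tree backend: the CPU wheel from
# pytorch.org plus the vendor's extension package installed on top of it.
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install my-accel-torch-extension   # placeholder for the vendor package
```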

### Quality criteria
@albanD (Contributor)

I think we want to go a bit deeper into the details of what the criteria and expectations here are going to be.
I am also working on this, and I think we need to properly define a couple of "tiers" for backends, with the expectations defined in detail. We can then refine how we present them from this proposal: top-tier backends will be on the main get-started page, and all others will be on the Compute Platforms page linked from get-started.

I'll take a stab at that proposal and come back to you here.

@bsochack (Author)

@albanD, waiting for the proposal from your side.
Please let me know if I can help here.

@bsochack (Author)

My high-level thoughts about quality criteria:

  1. There are many types of devices, and there should be a way for all of them to pass the quality criteria
  • some are specialized, e.g. inference-only or machine-learning-specific; others are more general purpose, like GPUs or CPUs
  • they differ in capabilities, e.g. supported data types or distributed ops
  • there should be a way to mark the tests relevant for a given device, through test marking or more extensive capability reporting by the device (see the sketch after this list)
  2. PyTorch has lots of unit tests, but they are very often specialized to CUDA
  • it will take a few PyTorch releases to generalize the tests for other devices; we have a number of generic changes that will enable Intel Gaudi and other devices with the PyTorch UTs, and we will follow up with RFCs and UT PRs
  • the PyTorch UTs are probably the best ones to verify the quality of PyTorch support by a device
  3. The quality criteria should be set at a reasonable level until the PyTorch UT changes are upstreamed, e.g. initially start with a 50% pass rate for PT 2.4 and gradually increase it in the next releases
  4. I agree that there should be a couple of "tiers"
  • Do you expect that 1st-tier out-of-tree backends (the ones from the main get-started page) will work with PyTorch CPU from pytorch.org?
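As a sketch of the test-marking idea in the first point, using PyTorch's existing common_device_type decorators (the dtype restriction stands in for per-device capability gating):

```python
# Sketch of marking tests as relevant for a given device using the existing
# torch.testing._internal.common_device_type machinery; the dtype list below
# is illustrative of gating tests on a device's declared capabilities.
import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    dtypes,
)
from torch.testing._internal.common_utils import TestCase, run_tests

class TestBasicOps(TestCase):
    # Only instantiated for the dtypes the device claims to support.
    @dtypes(torch.float32, torch.bfloat16)
    def test_add(self, device, dtype):
        x = torch.ones(4, device=device, dtype=dtype)
        self.assertEqual((x + x).sum().item(), 8.0)

# Generates TestBasicOpsCPU, TestBasicOpsCUDA, ... for the available devices.
instantiate_device_type_tests(TestBasicOps, globals())

if __name__ == "__main__":
    run_tests()
```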

@bsochack (Author)

@albanD any update?

@bsochack (Author)

@albanD are there any details that can be shared with us?

@albanD (Contributor)

Hey!
Sorry, I have been held up with other projects (PT conference right now) and was not able to look at this yet.

@thiagocrepaldi

FYI @wschin @xadupre

pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Jun 11, 2024
…sts (#126970)

### Motivation
The Intel Gaudi accelerator (device name hpu) is seen to have a good pass rate with the PyTorch framework UTs; however, being an out-of-tree device, we face challenges in adapting the device to natively run the existing PyTorch UTs under pytorch/test. The UTs are nevertheless a good indicator of the device stack's health, and as such we run them regularly with adaptations.
Although we can add the Gaudi/HPU device to generate the device-specific tests using the TORCH_TEST_DEVICES environment variable, we miss out on a lot of features, such as executing for specific dtypes and skipping and overriding opInfo. With significant changes introduced every PyTorch release, maintaining these adaptations becomes difficult and time consuming.
Hence, with this PR we introduce the Gaudi device in the common_device_type framework, so that the tests are instantiated for Gaudi when the library is loaded.
The eventual goal is to make Gaudi out-of-tree support equivalent to that of in-tree devices.

### Changes
Add HPUTestBase of type DeviceTypeTestBase, specifying appropriate attributes for Gaudi/HPU.
Include code to check whether the Intel Gaudi software library is loaded and, if so, add the device to the list of devices considered for instantiation of device type tests.

### Additional Context
Please refer to the following RFC: pytorch/rfcs#63

Pull Request resolved: #126970
Approved by: https://github.com/albanD
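A simplified sketch of the shape of this change; beyond the class and base names taken from the description above, the details are assumptions rather than the PR's exact code:

```python
# Simplified sketch in the spirit of pytorch#126970: register an out-of-tree
# device with the common_device_type framework only when its library is loaded.
# Details beyond HPUTestBase/DeviceTypeTestBase are assumptions, not PR code.
import importlib.util

from torch.testing._internal.common_device_type import DeviceTypeTestBase

class HPUTestBase(DeviceTypeTestBase):
    device_type = "hpu"

def maybe_add_hpu(test_bases: list) -> None:
    # Check whether the Intel Gaudi software stack is importable; if so,
    # add the device to the bases considered for device-type test instantiation.
    if importlib.util.find_spec("habana_frameworks") is not None:
        test_bases.append(HPUTestBase)
```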
TharinduRusira pushed a commit to TharinduRusira/pytorch that referenced this pull request Jun 14, 2024
…sts (pytorch#126970)

ignaciobartol pushed a commit to ignaciobartol/pytorch that referenced this pull request Jun 14, 2024
…sts (pytorch#126970)
