
Add 'sycl' devices to the context #9691

Merged · 5 commits · Oct 26, 2023

Conversation

@razdoburdin (Contributor) commented Oct 18, 2023

Hi,
I want to restart the discussion about the prospects of adding SYCL support as a plugin for XGBoost.

Supporting new devices will require some changes in the main part of the code, especially in the Context class. The current PR adds SYCL devices to the Context object.

Supported values of the device parameter are:

  • cpu: using the OpenMP backend with CPU threads.
  • cuda: using the CUDA backend on a CUDA device.
  • gpu: the same as cuda for now.
  • sycl: using the SYCL backend on the default SYCL device.
  • sycl:cpu: using the SYCL backend on CPU; threads and async evaluation are controlled by the SYCL runtime.
  • sycl:gpu: using the SYCL backend on GPU.
  • sycl:N: using the SYCL backend on the SYCL device with index N, which can be a CPU or a GPU depending on the user's setup.
  • sycl:cpu:N: using the SYCL backend on the CPU with index N.
  • sycl:gpu:N: using the SYCL backend on the GPU with index N.
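A minimal standalone sketch of how such device strings could be split into a backend kind and an ordinal (illustrative only; `ParseDevice` and the `Device` struct are hypothetical names, not XGBoost's actual parser):

```cpp
#include <regex>
#include <string>

// Hypothetical sketch: split a spec such as "sycl:gpu:1" into kind + ordinal.
struct Device {
  std::string kind;  // "cpu", "cuda", "sycl", "sycl:cpu", or "sycl:gpu"
  int ordinal;       // -1 means "default device"
};

Device ParseDevice(const std::string& spec) {
  static const std::regex pattern{"(cpu|cuda|gpu|sycl(:cpu|:gpu)?)(:([0-9]+))?"};
  std::smatch m;
  if (!std::regex_match(spec, m, pattern)) {
    return {"cpu", -1};  // fall back to CPU on unrecognized input
  }
  std::string kind = m[1];
  if (kind == "gpu") kind = "cuda";  // "gpu" is an alias for "cuda" for now
  int ordinal = m[4].matched ? std::stoi(m[4].str()) : -1;
  return {kind, ordinal};
}
```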

@trivialfis (Member)

Hi @razdoburdin, good to see you again, and thank you for continuing the work on SYCL support!

It would be great if you could share some more info:

  • What's sycl:cpu? How is it different from the OpenMP CPU, feature-wise and performance-wise?
  • How to add a plugin in the Intel repo? Is it a static plugin (an Intel variant of xgboost in conda-forge, for instance) or a dynamic plugin (dlopen)?
  • As for the plugin, I think it's a reasonable choice to have a separate repo, or the xgboost/plugin/sycl directory. Either way, we need some CI infra in XGBoost so that we don't immediately break the code or the logic that links to the Intel repo.

-  std::string s_device = std::regex_replace(input, std::regex{"gpu"}, DeviceSym::CUDA());
+  std::string s_device = input;
+  if (!std::regex_match(s_device, std::regex("sycl(:cpu|:gpu)?(:-1|:[0-9]+)?")))
+    s_device = std::regex_replace(s_device, std::regex{"gpu"}, DeviceSym::CUDA());
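The effect of this guard can be demonstrated in isolation. Below is a sketch with the literal "cuda" standing in for DeviceSym::CUDA(); `NormalizeSpec` is a hypothetical name, not a function in the PR:

```cpp
#include <regex>
#include <string>

// Only non-SYCL specs have "gpu" rewritten to "cuda"; SYCL specs pass through.
std::string NormalizeSpec(std::string s_device) {
  if (!std::regex_match(s_device, std::regex("sycl(:cpu|:gpu)?(:-1|:[0-9]+)?"))) {
    s_device = std::regex_replace(s_device, std::regex{"gpu"}, "cuda");
  }
  return s_device;
}
```

With this guard, "gpu:0" still normalizes to "cuda:0", but "sycl:gpu" is no longer mangled into "sycl:cuda".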
Contributor:
Shouldn't this mean: the user specifies any GPU on the system -> xgboost checks the available devices and picks a GPU device it can execute on?
I.e., if there is only a CUDA device, it will be used.
If there is only a SYCL device, it will be used.
The question is what to do when both are available. That might warrant a warning about non-deterministic selection, or even an error. I'm not sure how realistic this case is.

Contributor Author:
I guess in the future it should work as you have described, but let's postpone this fix for the future.

@razdoburdin (Contributor Author)
Hi, @trivialfis

  • What's sycl:cpu? How is it different from the OpenMP CPU, feature-wise and performance-wise?

SYCL allows the same code to be executed on CPU or on GPU. Currently it has fewer features and lower performance than the OpenMP implementation, but we plan to work on that later.

  • How to add a plugin in the Intel repo? Is it a static plugin (an Intel variant of xgboost in conda-forge, for instance) or a dynamic plugin (dlopen)?

I guess in the ideal case the user should be able to just install xgboost (like pip install xgboost) and use it on any device they have. But at the current stage it would be much easier to have a static plugin in the Intel anaconda channel. However, the source code of the plugin should be available in the main repo to allow users to build it from scratch.

  • As for the plugin, I think it's a reasonable choice to have a separate repo, or in the xgboost/plugin/sycl directory. Either way, we need to have some CI infra in XGBoost so that we don't break the code or the logic to link the Intel repo immediately.

xgboost/plugin/sycl looks nice, great.
As for CI infra, that is a tricky question. Launching the SYCL tests on CPU could be a good solution for this issue.

Would you like to have a call next week to discuss the integration stages?

@napetrov (Contributor)
I'd add a couple of cents.
On sycl:cpu, there are two parts here.
First, how well SYCL works on CPU. Overall this is not ideal and would require Level Zero support, not just the OpenCL runtime as is done currently.
Second, it would require tuning the SYCL kernels for CPUs.

We will not work on the second part until the first improves. So we should keep this as an option to run SYCL code on CPUs, but not recommend it to users at this point. It might still be useful for validation/development purposes.

On CI: in the meantime, we will work on some form of GPU system availability via https://cloud.intel.com/ so the code can also be tested on Intel GPUs.

@trivialfis (Member)
Would you like to have a call next week to discuss the integration stages?

That would be great!

@trivialfis (Member) left a comment:
Regarding the interface, please consider describing the semantics precisely. For instance:

  • cpu: using openmp with cpu threads.
  • sycl: default device selected by the sycl runtime.
  • sycl:cpu: using the sycl rt to manage threads and the async evaluation model.
  • sycl:cpu:0: ???
  • sycl:gpu: using the sycl gpu impl.
  • sycl:gpu:0: first sycl gpu device.
  • sycl:0: ???
  • gpu: ???

@@ -74,6 +109,12 @@ struct DeviceOrd {
return DeviceSym::CPU();
case DeviceOrd::kCUDA:
return DeviceSym::CUDA() + (':' + std::to_string(ordinal));
case DeviceOrd::kSyclDefault:
Member:
What is "sycl:{ordinal}"?

Contributor Author:
It means using the SYCL device with index {ordinal}. It can be a CPU or a GPU, depending on the user's system settings.

case DeviceOrd::kSyclDefault:
return DeviceSym::SYCL_default() + (':' + std::to_string(ordinal));
case DeviceOrd::kSyclCPU:
return DeviceSym::SYCL_CPU() + (':' + std::to_string(ordinal));
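For illustration, the concatenation in these branches produces names like "sycl:cpu:1". A minimal sketch with the DeviceSym constants inlined as plain literals (`SyclName` is a hypothetical helper, not part of the PR):

```cpp
#include <string>

// Mirrors the pattern above: "<symbol>" + ":" + "<ordinal>".
std::string SyclName(const std::string& sym, int ordinal) {
  return sym + (':' + std::to_string(ordinal));
}
```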
Member:
What is "sycl:cpu:ordinal"?

Contributor Author:
It means using the CPU with a specific index on multi-CPU systems.

Member:
multi-socket or multi-core?

Contributor Author:
multi-socket

*
* @param ordinal SYCL device ordinal.
*/
[[nodiscard]] constexpr static auto SYCL_default(bst_d_ordinal_t ordinal = -1) {
Member:
Please follow the function naming convention: "SYCLDefault"/"SyclDefault".

Contributor Author:
done

Comment on lines 25 to 27
static auto constexpr SYCL_default() { return "sycl"; }
static auto constexpr SYCL_CPU() { return "sycl:cpu"; }
static auto constexpr SYCL_GPU() { return "sycl:gpu"; }
Member:
So, what's the default by specification?

Contributor Author:
SYCL tries to run on the GPU. If the GPU isn't available for some reason, it falls back to the CPU.

@razdoburdin (Contributor Author)

Regarding the interface, please consider describing the semantics precisely. For instance:

I updated the PR description.

src/context.cc Outdated
Comment on lines 97 to 103
- cuda:<device ordinal> # e.g. cuda:0
- gpu
- gpu:<device ordinal> # e.g. gpu:0
- sycl
- sycl:<device ordinal> # e.g. sycl:0
- sycl:<cpu, gpu>
- sycl:<cpu, gpu>:<device ordinal> # e.g. sycl:gpu:0
Member:
Let's remove the error message change for now and bring it back when there's a publicly ready feature that can be enabled. At this point, the message would only confuse users.

Contributor Author:
done

@trivialfis (Member) left a comment:
Overall looks good to me, please keep the change internal for now.

@trivialfis (Member)

Please fix the tidy error

@razdoburdin (Contributor Author)

Please fix the tidy error

fixed

@trivialfis trivialfis merged commit f41a08f into dmlc:master Oct 26, 2023
24 of 25 checks passed
@trivialfis (Member) commented Oct 27, 2023

@razdoburdin Apologies for not catching this earlier. Could you please help fix the CI by conditioning the regex match on __MINGW32__? It hangs on the regex match.

https://github.com/dmlc/xgboost/actions/runs/6662511294/job/18106961526
https://github.com/dmlc/xgboost/actions/runs/6655456686/job/18085766529
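One possible shape for such a fix, assuming the goal is simply to bypass std::regex on MinGW builds where it hangs (`IsSyclSpec` is a hypothetical name; the real change in the follow-up PR may differ):

```cpp
#include <string>
#if !defined(__MINGW32__)
#include <regex>
#endif

// On MinGW, avoid std::regex entirely and fall back to a crude prefix
// check; elsewhere, keep the full pattern validation.
bool IsSyclSpec(const std::string& s) {
#if defined(__MINGW32__)
  return s.rfind("sycl", 0) == 0;  // prefix test only, no ordinal validation
#else
  return std::regex_match(s, std::regex("sycl(:cpu|:gpu)?(:-1|:[0-9]+)?"));
#endif
}
```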

@razdoburdin (Contributor Author)

@razdoburdin Apologies for not catching this earlier. Could you please help fix the CI by conditioning the regex match on __MINGW32__? It hangs on the regex match.

Got it, will fix ASAP.
