
Add 'sycl' devices to the context #9691

Merged · 5 commits · Oct 26, 2023

Conversation

@razdoburdin (Contributor) commented Oct 18, 2023

Hi,
I want to restart the discussion about the prospects of adding SYCL support as a plugin for XGBoost.

Supporting new devices will require some changes in the main part of the code, especially in the Context class. The current PR adds SYCL devices to the Context object.

Supported values of the device parameter are:

  • cpu: using the OpenMP backend with CPU threads.
  • cuda: using the CUDA backend on a CUDA device.
  • gpu: the same as cuda for now.
  • sycl: using the SYCL backend on the default SYCL device.
  • sycl:cpu: using the SYCL backend on CPU; threads and async evaluation are controlled by the SYCL runtime.
  • sycl:gpu: using the SYCL backend on GPU.
  • sycl:N: using the SYCL backend on the SYCL device with index N, which can be a CPU or a GPU depending on the user's setup.
  • sycl:cpu:N: using the SYCL backend on the CPU with index N.
  • sycl:gpu:N: using the SYCL backend on the GPU with index N.
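A minimal standalone sketch of how such device strings could be split into a backend kind and an ordinal (illustrative only; `ParseDevice` and the `Device` struct are hypothetical names, not XGBoost's actual parser):

```cpp
#include <regex>
#include <string>

// Hypothetical sketch: split a spec such as "sycl:gpu:1" into kind + ordinal.
struct Device {
  std::string kind;  // "cpu", "cuda", "sycl", "sycl:cpu", or "sycl:gpu"
  int ordinal;       // -1 means "default device"
};

Device ParseDevice(const std::string& spec) {
  static const std::regex pattern{"(cpu|cuda|gpu|sycl(:cpu|:gpu)?)(:([0-9]+))?"};
  std::smatch m;
  if (!std::regex_match(spec, m, pattern)) {
    return {"cpu", -1};  // fall back to CPU on unrecognized input
  }
  std::string kind = m[1];
  if (kind == "gpu") kind = "cuda";  // "gpu" is an alias for "cuda" for now
  int ordinal = m[4].matched ? std::stoi(m[4].str()) : -1;
  return {kind, ordinal};
}
```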

@trivialfis (Member)

Hi @razdoburdin, good to see you again, and thank you for continuing the work on SYCL support!

It would be great if you could share some more info:

  • What's sycl:cpu? How is it different from the OpenMP CPU, feature-wise and performance-wise?
  • How to add a plugin in the Intel repo? Is it a static plugin (an Intel variant of xgboost in conda-forge, for instance) or a dynamic plugin (dlopen)?
  • As for the plugin, I think it's a reasonable choice to have a separate repo, or the xgboost/plugin/sycl directory. Either way, we need some CI infra in XGBoost so that we don't immediately break the code or the logic that links to the Intel repo.

-  std::string s_device = std::regex_replace(input, std::regex{"gpu"}, DeviceSym::CUDA());
+  std::string s_device = input;
+  if (!std::regex_match(s_device, std::regex("sycl(:cpu|:gpu)?(:-1|:[0-9]+)?")))
+    s_device = std::regex_replace(s_device, std::regex{"gpu"}, DeviceSym::CUDA());
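The effect of this guard can be demonstrated in isolation. Below is a sketch with the literal "cuda" standing in for DeviceSym::CUDA(); `NormalizeSpec` is a hypothetical name, not a function in the PR:

```cpp
#include <regex>
#include <string>

// Only non-SYCL specs have "gpu" rewritten to "cuda"; SYCL specs pass through.
std::string NormalizeSpec(std::string s_device) {
  if (!std::regex_match(s_device, std::regex("sycl(:cpu|:gpu)?(:-1|:[0-9]+)?"))) {
    s_device = std::regex_replace(s_device, std::regex{"gpu"}, "cuda");
  }
  return s_device;
}
```

With this guard, "gpu:0" still normalizes to "cuda:0", but "sycl:gpu" is no longer mangled into "sycl:cuda".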
Contributor:
Shouldn't this mean: the user specifies any GPU on the system -> xgboost checks the available devices and picks a GPU device it can execute on?
I.e., if there is only a CUDA device, it will be used.
If there is only a SYCL device, it will be used.
The question is what to do when both are available. That might warrant a warning about non-deterministic selection, or even an error. I'm not sure how realistic this case is.

Contributor Author:
I guess in the future it should work as you have described, but let's postpone this fix for the future.

@razdoburdin (Contributor Author)
Hi, @trivialfis

  • What's sycl:cpu? How is it different from the OpenMP CPU, feature-wise and performance-wise?

SYCL allows the same code to be executed on CPU or on GPU. Currently it has fewer features and lower performance than the OpenMP implementation, but we plan to work on that later.

  • How to add a plugin in the Intel repo? Is it a static plugin (an Intel variant of xgboost in conda-forge, for instance) or a dynamic plugin (dlopen)?

I guess in the ideal case the user should be able to just install xgboost (like pip install xgboost) and use it on any device they have. But at the current stage it would be much easier to have a static plugin in the Intel anaconda channel. However, the source code of the plugin should be available in the main repo to allow users to build it from scratch.

  • As for the plugin, I think it's a reasonable choice to have a separate repo, or in the xgboost/plugin/sycl directory. Either way, we need to have some CI infra in XGBoost so that we don't break the code or the logic to link the Intel repo immediately.

xgboost/plugin/sycl looks nice, great.
As for CI infra, that is a tricky question. Launching the SYCL tests on CPU could be a good solution for this issue.

Would you like to have a call next week to discuss the integration stages?

@napetrov (Contributor)
I'd add a couple of cents.
On sycl:cpu, there are two parts here.
First, how well SYCL works on CPU. Overall this is not ideal and would require Level Zero support, not just the OpenCL runtime as is done currently.
Second, it would require tuning the SYCL kernels for CPUs.

We will not work on the second part until the first improves. So we should keep this as an option to run SYCL code on CPUs, but not recommend it to users at this point. It might still be useful for validation/development purposes.

On CI: in the meantime, we will work on some form of GPU system availability via https://cloud.intel.com/ so the code can also be tested on Intel GPUs.

@trivialfis (Member)
Would you like to have a call next week to discuss the integration stages?

That would be great!

@trivialfis (Member) left a comment:
Regarding the interface, please consider describing the semantics precisely. For instance:

  • cpu: using openmp with cpu threads.
  • sycl: default device selected by the sycl runtime.
  • sycl:cpu: using the sycl rt to manage threads and the async evaluation model.
  • sycl:cpu:0: ???
  • sycl:gpu: using the sycl gpu impl.
  • sycl:gpu:0: first sycl gpu device.
  • sycl:0: ???
  • gpu: ???

@@ -74,6 +109,12 @@ struct DeviceOrd {
return DeviceSym::CPU();
case DeviceOrd::kCUDA:
return DeviceSym::CUDA() + (':' + std::to_string(ordinal));
case DeviceOrd::kSyclDefault:
Member:
What is "sycl:{ordinal}"?

Contributor Author:
It means using the SYCL device with index {ordinal}. It can be a CPU or a GPU, depending on the user's system settings.

case DeviceOrd::kSyclDefault:
return DeviceSym::SYCL_default() + (':' + std::to_string(ordinal));
case DeviceOrd::kSyclCPU:
return DeviceSym::SYCL_CPU() + (':' + std::to_string(ordinal));
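For illustration, the concatenation in these branches produces names like "sycl:cpu:1". A minimal sketch with the DeviceSym constants inlined as plain literals (`SyclName` is a hypothetical helper, not part of the PR):

```cpp
#include <string>

// Mirrors the pattern above: "<symbol>" + ":" + "<ordinal>".
std::string SyclName(const std::string& sym, int ordinal) {
  return sym + (':' + std::to_string(ordinal));
}
```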
Member:
What is "sycl:cpu:ordinal"?

Contributor Author:
It means using the CPU with a specific index on multi-CPU systems.

Member:
multi-socket or multi-core?

Contributor Author:
multi-socket

*
* @param ordinal SYCL device ordinal.
*/
[[nodiscard]] constexpr static auto SYCL_default(bst_d_ordinal_t ordinal = -1) {
Member:
Please follow the function naming convention: "SYCLDefault"/"SyclDefault".

Contributor Author:
done

Comment on lines 25 to 27
static auto constexpr SYCL_default() { return "sycl"; }
static auto constexpr SYCL_CPU() { return "sycl:cpu"; }
static auto constexpr SYCL_GPU() { return "sycl:gpu"; }
Member:
So, what's the default by specification?

Contributor Author:
SYCL tries to run on the GPU. If the GPU isn't available for some reason, it falls back to the CPU.

@razdoburdin (Contributor Author)

Regarding the interface, please consider describing the semantics precisely. For instance:

I updated the PR description.

src/context.cc Outdated
Comment on lines 97 to 103
- cuda:<device ordinal> # e.g. cuda:0
- gpu
- gpu:<device ordinal> # e.g. gpu:0
- sycl
- sycl:<device ordinal> # e.g. sycl:0
- sycl:<cpu, gpu>
- sycl:<cpu, gpu>:<device ordinal> # e.g. sycl:gpu:0
Member:
Let's remove the error message change for now and bring it back when there's a publicly ready feature that can be enabled. At this point, the message would only confuse users.

Contributor Author:
done

@trivialfis (Member) left a comment:
Overall looks good to me, please keep the change internal for now.

@trivialfis (Member)

Please fix the tidy error

@razdoburdin (Contributor Author)

Please fix the tidy error

fixed

@trivialfis trivialfis merged commit f41a08f into dmlc:master Oct 26, 2023
24 of 25 checks passed
@trivialfis (Member) commented Oct 27, 2023

@razdoburdin Apologies for not catching this earlier. Could you please help fix the CI by conditioning the regex match on __MINGW32__? It hangs on the regex match.

https://github.com/dmlc/xgboost/actions/runs/6662511294/job/18106961526
https://github.com/dmlc/xgboost/actions/runs/6655456686/job/18085766529
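One possible shape for such a fix, assuming the goal is simply to bypass std::regex on MinGW builds where it hangs (`IsSyclSpec` is a hypothetical name; the real change in the follow-up PR may differ):

```cpp
#include <string>
#if !defined(__MINGW32__)
#include <regex>
#endif

// On MinGW, avoid std::regex entirely and fall back to a crude prefix
// check; elsewhere, keep the full pattern validation.
bool IsSyclSpec(const std::string& s) {
#if defined(__MINGW32__)
  return s.rfind("sycl", 0) == 0;  // prefix test only, no ordinal validation
#else
  return std::regex_match(s, std::regex("sycl(:cpu|:gpu)?(:-1|:[0-9]+)?"));
#endif
}
```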

@razdoburdin (Contributor Author)

@razdoburdin Apologies for not catching this earlier. Could you please help fix the CI by conditioning the regex match on __MINGW32__? It hangs on the regex match.

Got it, will fix ASAP.
