Add paddle.device.cuda.get_device_properties #35661

Yanxing-Shi · 2021-09-10T12:14:53Z

PR types

New features

PR changes

APIs

Describe

Add paddle.device.cuda.get_device_properties

paddle-bot-old · 2021-09-10T12:14:56Z

Thanks for your contribution!
Please wait for the CI results first. See Paddle CI Manual for details.


CLAassistant · 2021-09-14T14:47:43Z

All committers have signed the CLA.

sneaxiy · 2021-09-15T02:44:47Z

paddle/fluid/platform/gpu_info.cc

@@ -297,6 +314,64 @@ std::vector<int> GetSelectedDevices() {
  return devices;
 }

+#ifdef PADDLE_WITH_CUDA
+cudaDeviceProp *GetDeviceProperties(int id) {
+  std::call_once(init_flag, [&] {


There is too much duplicated code. I think we can unify the CUDA and HIP code paths using techniques like those in paddle/fluid/platform/type_defs.h.
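One way to act on this suggestion, sketched under assumptions: choose the vendor struct once behind a type alias and write the shared logic against that alias, so it is not duplicated under separate `#ifdef` branches. `CudaPropStub`, `HipPropStub`, and the `USE_HIP_BACKEND` macro are hypothetical stand-ins for `cudaDeviceProp`, `hipDeviceProp_t`, and `PADDLE_WITH_HIP`; only the alias name `gpuDeviceProp` appears in a later revision of this PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the vendor structs. In a real build these would
// be cudaDeviceProp (cuda_runtime.h) and hipDeviceProp_t (hip_runtime.h).
struct CudaPropStub {
  std::string name;
  int major = 0;
  int minor = 0;
};
struct HipPropStub {
  std::string name;
  int major = 0;
  int minor = 0;
};

// Select the backend in exactly one place. USE_HIP_BACKEND is a hypothetical
// macro playing the role of PADDLE_WITH_HIP.
#ifdef USE_HIP_BACKEND
using gpuDeviceProp = HipPropStub;
#else
using gpuDeviceProp = CudaPropStub;
#endif

// Shared logic is written once against the alias instead of being copied
// into a CUDA branch and a HIP branch.
std::string ComputeCapability(const gpuDeviceProp& prop) {
  return std::to_string(prop.major) + "." + std::to_string(prop.minor);
}
```

The same idea extends to the runtime calls themselves (for example, one wrapper that dispatches to the CUDA or HIP query function), which is what keeps the pybind binding from needing two copies as well.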

sneaxiy · 2021-09-15T02:45:45Z

paddle/fluid/platform/gpu_info.cc

@@ -39,6 +44,18 @@ DECLARE_uint64(gpu_memory_limit_mb);

 constexpr static float fraction_reserve_gpu_memory = 0.05f;

+static std::once_flag init_flag;
+static std::deque<std::once_flag> device_flags;
+static int gpu_num = -1;


I do not think you should add this global variable gpu_num. As I mentioned offline yesterday, you do not need to record gpu_num; you can use device_props.size() below instead.

sneaxiy · 2021-09-15T02:46:45Z

paddle/fluid/platform/gpu_info.cc

+static int gpu_num = -1;
+
+#ifdef PADDLE_WITH_CUDA
+static std::vector<cudaDeviceProp> device_props;


device_props -> g_device_props. Here, the prefix g_ means global variables.

sneaxiy · 2021-09-15T02:52:39Z

paddle/fluid/platform/gpu_info.cc

@@ -39,6 +44,18 @@ DECLARE_uint64(gpu_memory_limit_mb);

 constexpr static float fraction_reserve_gpu_memory = 0.05f;

+static std::once_flag init_flag;
+static std::deque<std::once_flag> device_flags;


This code is too fancy: it relies on std::deque::resize with a type that is neither copyable nor movable.

Although std::deque::resize can accept non-copyable, non-movable types like std::once_flag (note that std::once_flag is neither copyable nor movable, see https://en.cppreference.com/w/cpp/thread/once_flag), most of us would expect the elements of a std::deque to be at least movable. I do not think std::deque<std::once_flag> is a good idea.

Alternatively, you can use std::vector<std::unique_ptr<std::once_flag>>.
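A minimal sketch of the suggested alternative, assuming one lazily initialized slot per device: each `std::once_flag` lives on the heap behind a `std::unique_ptr`, so the vector only ever moves pointers. `OnceFlags`, `MakeOnceFlags`, and `RunOncePerSlot` are hypothetical helper names for illustration, not Paddle code.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// std::once_flag can be neither copied nor moved, so it cannot be stored
// directly in a std::vector that is resized (resize may move elements into
// new storage). std::unique_ptr<std::once_flag> is movable, so the vector can
// own one heap-allocated flag per slot.
using OnceFlags = std::vector<std::unique_ptr<std::once_flag>>;

OnceFlags MakeOnceFlags(int n) {
  OnceFlags flags(n);  // n empty pointers
  for (auto& flag : flags) {
    flag = std::make_unique<std::once_flag>();
  }
  return flags;
}

// Calls init(i) the first time slot i is used; later calls for the same i do
// nothing, even if several threads race on the same slot.
template <typename Init>
void RunOncePerSlot(OnceFlags& flags, int i, Init init) {
  std::call_once(*flags[i], init, i);
}
```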

sneaxiy · 2021-09-15T02:55:07Z

paddle/fluid/platform/gpu_info.cc

+
+  std::call_once(device_flags[id], [&] {
+    cudaDeviceProp device_prop;
+    PADDLE_ENFORCE_CUDA_SUCCESS(cudaGetDeviceProperties(&device_prop, id));


You do not need the temporary variable device_prop. You can write more direct code:

PADDLE_ENFORCE_CUDA_SUCCESS(cudaGetDeviceProperties(&device_props[id], id));

sneaxiy · 2021-09-15T02:57:36Z

paddle/fluid/platform/gpu_info.h

@@ -67,6 +67,15 @@ dim3 GetGpuMaxGridDimSize(int);
 //! Get a list of device ids from environment variable or use all.
 std::vector<int> GetSelectedDevices();

+//! Get the properties of device.
+#ifdef PADDLE_WITH_CUDA
+cudaDeviceProp *GetDeviceProperties(int id);


I have emphasized the Google coding style for a long time: use const references for values that should not change, and non-const pointers for values that may be changed.

You should not return a non-const pointer here. Returning a non-const pointer suggests that callers may modify the returned value, and cudaDeviceProp should not be changed.
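A minimal sketch of the signature being asked for, assuming a cache that has already been filled: the function returns a const reference into the cache, so callers read the properties without a copy and the compiler rejects writes through the result. `DevicePropStub` (standing in for `cudaDeviceProp`), `g_device_props_demo`, and `GetDevicePropertiesDemo` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for cudaDeviceProp; multiProcessorCount mirrors a
// real cudaDeviceProp field name.
struct DevicePropStub {
  std::string name;
  int multiProcessorCount;
};

// Hypothetical, already-populated cache standing in for g_device_props.
static std::vector<DevicePropStub> g_device_props_demo = {{"gpu0", 80},
                                                          {"gpu1", 108}};

// const& gives read-only access to the cached entry without copying it.
const DevicePropStub& GetDevicePropertiesDemo(int id) {
  return g_device_props_demo.at(id);
}
```

A caller writing `GetDevicePropertiesDemo(0).multiProcessorCount = 1;` would fail to compile, which is exactly the guarantee a non-const pointer would not give.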

sneaxiy · 2021-09-15T02:57:56Z

paddle/fluid/pybind/pybind.cc

@@ -2267,6 +2267,71 @@ All parameter, weight, gradient are variables in Paddle.
 #endif
 #endif

+#ifdef PADDLE_WITH_CUDA
+  m.def("get_device_properties",
+        [](int id) -> cudaDeviceProp * {


Duplicated code for CUDA and HIP.

sneaxiy · 2021-09-15T02:59:13Z

python/paddle/device/cuda/__init__.py

@@ -24,6 +24,7 @@
    'synchronize',
    'device_count',
    'empty_cache',
+    'get_device_properties'


Add a trailing comma.

sneaxiy · 2021-09-15T03:02:11Z

python/paddle/device/cuda/__init__.py

+        else:
+            raise ValueError("device type must be int or paddle.CUDAPlace")
+
+    return core.get_device_properties(device_id)


This error message is not user-friendly. If a user installs the CPU-only version and calls paddle.device.cuda.get_device_properties(), they would get an error message like "core" has no attribute "get_device_properties", which does not tell them what went wrong.

sneaxiy · 2021-09-17T03:50:11Z

paddle/fluid/platform/gpu_info.cc

@@ -22,6 +22,11 @@ limitations under the License. */
 #else
 #include "paddle/fluid/platform/dynload/cudnn.h"
 #endif
+
+#include <deque>


Remove unused headers.

sneaxiy · 2021-09-17T03:52:32Z

paddle/fluid/platform/gpu_info.cc

@@ -39,6 +44,10 @@ DECLARE_uint64(gpu_memory_limit_mb);

 constexpr static float fraction_reserve_gpu_memory = 0.05f;

+static std::once_flag g_init_flag;
+static std::vector<std::unique_ptr<std::once_flag>> g_device_flags;


The variable names g_init_flag and g_device_flags are not readable. Nobody can tell what they mean without reading the code that follows.

Try:

g_init_flag -> g_device_props_size_init_flag

g_device_flags -> g_device_prop_init_flags

sneaxiy · 2021-09-17T03:53:46Z

paddle/fluid/platform/gpu_info.cc

@@ -297,6 +306,43 @@ std::vector<int> GetSelectedDevices() {
  return devices;
 }

+const gpuDeviceProp *GetDeviceProperties(int id) {


Have you read my comment #35661 (comment) carefully? I meant a const reference.

sneaxiy · 2021-09-17T03:55:01Z

paddle/fluid/platform/gpu_info.cc

@@ -297,6 +306,43 @@ std::vector<int> GetSelectedDevices() {
  return devices;
 }

+const gpuDeviceProp *GetDeviceProperties(int id) {
+  int gpu_num = 0;


This variable does not need to live outside the std::call_once. This line can be moved inside the following std::call_once.

sneaxiy · 2021-09-17T03:55:33Z

paddle/fluid/platform/gpu_info.cc

+    g_device_flags.resize(gpu_num);
+    g_device_props.resize(gpu_num);
+    for (int i = 0; i < gpu_num; ++i) {
+      g_device_flags[i] = std::unique_ptr<std::once_flag>(new std::once_flag());


g_device_flags[i] = std::make_unique<std::once_flag>() would be simpler.
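Putting the suggestions on this function together (a const-reference return, the device count queried inside `std::call_once` rather than kept in a global, the size of `g_device_props` used as the device count, the renamed flags built with `std::make_unique`, writing directly into the cache, and the reworded out-of-range message proposed later in this review), a sketch might look like this. `DevicePropStub`, `QueryDeviceCount`, and `QueryDeviceProperties` are hypothetical stand-ins for `cudaDeviceProp`, `cudaGetDeviceCount`, and `cudaGetDeviceProperties`, and a thrown `std::out_of_range` replaces Paddle's `PADDLE_ENFORCE` macros; this is not the merged implementation.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <vector>

struct DevicePropStub {
  std::string name;
  int major;
};

// Hypothetical stand-ins for the CUDA runtime queries; g_query_calls counts
// how often a device is actually queried.
static int g_query_calls = 0;
int QueryDeviceCount() { return 2; }
void QueryDeviceProperties(DevicePropStub* prop, int id) {
  ++g_query_calls;
  prop->name = "gpu" + std::to_string(id);
  prop->major = 7;
}

static std::once_flag g_device_props_size_init_flag;
static std::vector<std::unique_ptr<std::once_flag>> g_device_prop_init_flags;
static std::vector<DevicePropStub> g_device_props;

const DevicePropStub& GetDeviceProperties(int id) {
  // Size both vectors exactly once. The count is a local inside the lambda,
  // so no global gpu_num is needed.
  std::call_once(g_device_props_size_init_flag, [] {
    int gpu_num = QueryDeviceCount();
    g_device_prop_init_flags.resize(gpu_num);
    g_device_props.resize(gpu_num);
    for (int i = 0; i < gpu_num; ++i) {
      g_device_prop_init_flags[i] = std::make_unique<std::once_flag>();
    }
  });

  // The cache size doubles as the number of devices.
  int gpu_num = static_cast<int>(g_device_props.size());
  if (id < 0 || id >= gpu_num) {
    throw std::out_of_range("The device id " + std::to_string(id) +
                            " is out of range [0, " + std::to_string(gpu_num) +
                            "), where " + std::to_string(gpu_num) +
                            " is the number of devices on this machine.");
  }

  // Query each device lazily, at most once, writing straight into the cache.
  std::call_once(*g_device_prop_init_flags[id],
                 [id] { QueryDeviceProperties(&g_device_props[id], id); });
  return g_device_props[id];
}
```

Because the vectors are sized once before any element is touched and each element is written under its own flag, concurrent callers asking for different devices do not block each other after the first call.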

sneaxiy · 2021-09-17T03:57:17Z

paddle/fluid/platform/gpu_info.cc

+        "The device id: %d exceeds out of range [0, the number of devices: %d "
+        "on this machine). Because the device id should be greater than or "
+        "equal to zero and smaller than the number of gpus. Please input "
+        "appropriate device again!",


Your error message would read:

The device id: 8 exceeds out of range [0, the number of devices: 4 on this machine).

This is:

wrong as mathematical notation, because the interval is malformed;

ungrammatical English, because the word "exceed" already means "out of range"!

I would prefer an error message like:

The device id 8 is out of range [0, 4), where 4 is the number of devices on this machine.

sneaxiy · 2021-09-17T04:09:36Z

python/paddle/device/cuda/__init__.py

 from paddle.fluid.wrapped_decorator import signature_safe_contextmanager
+from paddle.device import is_compiled_with_cuda


I suspect this line would cause a circular dependency. Are you sure it is correct? If you have checked, please let me know.

Here is why I think there is a circular dependency:

When users import paddle.device.cuda, Python imports python/paddle/device/cuda/__init__.py and reaches the line from paddle.device import is_compiled_with_cuda.

Python then tries to import paddle.device. Inside python/paddle/device/__init__.py there is the code from . import cuda, so Python would import python/paddle/device/cuda/__init__.py again. That is why I think a circular dependency occurs.

sneaxiy · 2021-09-17T04:11:41Z

python/paddle/device/cuda/__init__.py

+    '''
+
+    place = framework._current_expected_place()
+    if not isinstance(place, core.CUDAPlace) or not is_compiled_with_cuda():


I do not think you should check isinstance(place, core.CUDAPlace).

framework._current_expected_place() may return core.CPUPlace(): even with the GPU version of Paddle installed, users can choose to run on CPU instead of GPU.

sneaxiy · 2021-09-17T04:14:00Z

python/paddle/device/cuda/__init__.py

+        raise ValueError(
+            "The current device: {} is not expected. Because paddle.device.cuda."
+            "get_device_properties only support cuda device Please change device"
+            "and input device again!".format(place))


I am confused by this sentence. What is the difference between "device" and "input device"?

sneaxiy · 2021-09-22T08:17:32Z

python/paddle/device/cuda/__init__.py

+    Return the properties of given CUDA device.
+
+    Args:
+        device(paddle.CUDAPlace() or int or str): The device, the ID of the device 


paddle.CUDAPlace() -> paddle.CUDAPlace

sneaxiy · 2021-09-22T08:17:34Z

python/paddle/device/cuda/__init__.py

+            Default: None.
+
+    Returns:
+        _CudaDeviceProperties: the properties of the device which include ASCII string 


_CudaDeviceProperties -> _gpuDeviceProperties? I think you can fix this in a follow-up PR with test=document_fix.

sneaxiy · 2021-09-22T08:20:17Z

python/paddle/device/cuda/__init__.py

+
+    if not core.is_compiled_with_cuda():
+        raise ValueError(
+            "The current device {} is not expected. Because paddle.device.cuda."


The error message does not seem right. It should be something like: "The API paddle.device.cuda.get_device_properties() is not supported in CPU-only PaddlePaddle. Please reinstall PaddlePaddle with GPU support to call this API."

sneaxiy · 2021-09-22T08:20:55Z

python/paddle/device/cuda/__init__.py

+
+    device_id = -1
+
+    if device is not None:


What if device is None?

sneaxiy · 2021-09-22T08:21:16Z

python/paddle/device/cuda/__init__.py

+                raise ValueError(
+                    "The current string {} is not expected. Because paddle.device."
+                    "cuda.get_device_properties only support string which is like 'gpu:x'. "
+                    "Please input appropriat string again!".format(device))


Typo: appropriat should be appropriate.

sneaxiy

LGTM.

zhhsplendid

LGTM

* Initial Commit
* add unittest and add error information
* modify doc
* fix some error
* fix some word
* fix bug cudaDeviceProp* and modify error explanation
* fix cudaDeviceProp* error and unnitest samples
* fix hip error and PADDLE_WITH_HIP
* update style
* fix error is_compiled_with_cuda
* fix paddle.device.cuda.get_device_properties
* fix error for multi thread safe
* update style
* merge conflict
* modify after mentor review
* update style
* delete word
* fix unittest error for windows
* support string input and modify some code
* modify doc to support string input
* fix error for express information
* fix error for express information
* fix unnitest for windows
* fix device.startswith('gpu:')
* format error and doc
* fix after review
* format code
* fix error for doc compile
* fix error for doc compile
* fix error for doc compile
* fix error for doc compile
* fix error for doc compile
* fix py2 error
* fix wrong words and doc
* fix _gpuDeviceProperties

Yanxing-Shi added 5 commits September 9, 2021 07:05

Initial Commit

f54af31

add unittest and add error information

7037648

modify doc

04ff014

fix some error

cbd6b67

fix some word

83ab60c

Yanxing-Shi changed the title ~~Add get device properties~~ Add paddle.get_device_properties Sep 10, 2021

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…

ccbd8b5

… add_get_device_properties

Yanxing-Shi changed the title ~~Add paddle.get_device_properties~~ 【WIP】Add paddle.get_device_properties Sep 12, 2021

Yanxing-Shi changed the title ~~【WIP】Add paddle.get_device_properties~~ [WIP] Add paddle.get_device_properties Sep 12, 2021

Yanxing-Shi marked this pull request as draft September 12, 2021 10:01

fix bug cudaDeviceProp* and modify error explanation

c596407

Yanxing-Shi added 7 commits September 13, 2021 07:41

fix cudaDeviceProp* error and unnitest samples

06998e2

fix hip error and PADDLE_WITH_HIP

017daac

update style

529f236

fix error is_compiled_with_cuda

763f30d

fix paddle.device.cuda.get_device_properties

6427dea

fix error for multi thread safe

137a6e8

update style

eb84bb6

Yanxing-Shi added 2 commits September 14, 2021 15:04

merge conflict

3e6da93

merge conflict

b5d3d04

Yanxing-Shi marked this pull request as ready for review September 15, 2021 02:35

Yanxing-Shi changed the title ~~[WIP] Add paddle.get_device_properties~~ Add paddle.device.cuda.get_device_properties Sep 15, 2021

sneaxiy reviewed Sep 15, 2021


modify after mentor review

eda59b4

Yanxing-Shi added 4 commits September 16, 2021 14:18

fix error for express information

2d30961

fix unnitest for windows

7240bc8

fix device.startswith('gpu:')

afc33cb

format error and doc

5ca5627

sneaxiy reviewed Sep 17, 2021


Yanxing-Shi added 2 commits September 17, 2021 07:18

fix after review

a9f695c

format code

0d798cc

Yanxing-Shi mentioned this pull request Sep 17, 2021

Add paddle.device.cuda.get_device_properties doc PaddlePaddle/docs#3888 (merged)

Yanxing-Shi added 6 commits September 17, 2021 11:27

fix error for doc compile

78be131

fix error for doc compile

7a95b5a

fix error for doc compile

9890322

fix error for doc compile

c610b92

fix error for doc compile

ff503ec

fix py2 error

b207275

sneaxiy reviewed Sep 22, 2021


fix wrong words and doc

ff1f79a

fix _gpuDeviceProperties

55f8d4c

sneaxiy approved these changes Sep 27, 2021


sneaxiy requested review from XiaoguangHu01, TCChenlong and zhhsplendid September 27, 2021 08:10

zhhsplendid approved these changes Sep 27, 2021


TCChenlong approved these changes Sep 27, 2021


sneaxiy requested a review from lanxianghit September 27, 2021 08:49

lanxianghit approved these changes Sep 27, 2021


sneaxiy merged commit 4cbed9e into PaddlePaddle:develop Sep 28, 2021
