-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add paddle.device.cuda.get_device_properties #35661
Add paddle.device.cuda.get_device_properties #35661
Conversation
Thanks for your contribution! |
… add_get_device_properties
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -297,6 +314,64 @@ std::vector<int> GetSelectedDevices() { | |||
return devices; | |||
} | |||
|
|||
#ifdef PADDLE_WITH_CUDA | |||
cudaDeviceProp *GetDeviceProperties(int id) { | |||
std::call_once(init_flag, [&] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many duplicate codes. I think we can unify the codes of CUDA and HIP by some kinds of techniques shown in paddle/fluid/platform/type_defs.h
.
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -39,6 +44,18 @@ DECLARE_uint64(gpu_memory_limit_mb); | |||
|
|||
constexpr static float fraction_reserve_gpu_memory = 0.05f; | |||
|
|||
static std::once_flag init_flag; | |||
static std::deque<std::once_flag> device_flags; | |||
static int gpu_num = -1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think you should add this global variable gpu_num
. As I mentioned offline yesterday, you do not need to record the gpu_num
. You can use device_props.size()
below.
paddle/fluid/platform/gpu_info.cc
Outdated
static int gpu_num = -1; | ||
|
||
#ifdef PADDLE_WITH_CUDA | ||
static std::vector<cudaDeviceProp> device_props; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
device_props
-> g_device_props
. Here, the prefix g_
means global variables.
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -39,6 +44,18 @@ DECLARE_uint64(gpu_memory_limit_mb); | |||
|
|||
constexpr static float fraction_reserve_gpu_memory = 0.05f; | |||
|
|||
static std::once_flag init_flag; | |||
static std::deque<std::once_flag> device_flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too fancy codes to use std::deque::resize
for non-copyable and non-moveable types.
Although std::deque::resize
can accept the non-copyable and non-moveable types like std::once_flag
(note that std::once_flag
is neither copyable nor moveable, see https://en.cppreference.com/w/cpp/thread/once_flag), many of us should think that std::deque
should accept the moveable types at least. I do not think it is a good idea to use std::deque<std::once_flag>
.
Alternatively, you use use std::vector<std::unique_ptr<std::once_flag>>
.
paddle/fluid/platform/gpu_info.cc
Outdated
|
||
std::call_once(device_flags[id], [&] { | ||
cudaDeviceProp device_prop; | ||
PADDLE_ENFORCE_CUDA_SUCCESS(cudaGetDeviceProperties(&device_prop, id)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need not to use the temporary variable device_prop
. You can write the more direct codes:
PADDLE_ENFORCE_CUDA_SUCCESS(cudaGetDeviceProperties(&device_props[id], id));
paddle/fluid/platform/gpu_info.h
Outdated
@@ -67,6 +67,15 @@ dim3 GetGpuMaxGridDimSize(int); | |||
//! Get a list of device ids from environment variable or use all. | |||
std::vector<int> GetSelectedDevices(); | |||
|
|||
//! Get the properties of device. | |||
#ifdef PADDLE_WITH_CUDA | |||
cudaDeviceProp *GetDeviceProperties(int id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have emphasized the Google coding style for a long time. Use const reference for unchanged variables, and use non-const pointer for changeable variables.
You should not return non-const pointer here, because one should think that returning non-const pointer means the users may change the returned value. However, I do not think that cudaDeviceProp
should be changed.
paddle/fluid/pybind/pybind.cc
Outdated
@@ -2267,6 +2267,71 @@ All parameter, weight, gradient are variables in Paddle. | |||
#endif | |||
#endif | |||
|
|||
#ifdef PADDLE_WITH_CUDA | |||
m.def("get_device_properties", | |||
[](int id) -> cudaDeviceProp * { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate codes for CUDA and HIP.
@@ -24,6 +24,7 @@ | |||
'synchronize', | |||
'device_count', | |||
'empty_cache', | |||
'get_device_properties' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comma in the end.
else: | ||
raise ValueError("device type must be int or paddle.CUDAPlace") | ||
|
||
return core.get_device_properties(device_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not friendly error message. If the users install the CPU version and he/she calls paddle.device.cuda.get_device_properties()
, he/she would get the error message like "core" has no attribute "get_device_properties"
. It is not friendly to users.
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -22,6 +22,11 @@ limitations under the License. */ | |||
#else | |||
#include "paddle/fluid/platform/dynload/cudnn.h" | |||
#endif | |||
|
|||
#include <deque> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove useless headers.
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -39,6 +44,10 @@ DECLARE_uint64(gpu_memory_limit_mb); | |||
|
|||
constexpr static float fraction_reserve_gpu_memory = 0.05f; | |||
|
|||
static std::once_flag g_init_flag; | |||
static std::vector<std::unique_ptr<std::once_flag>> g_device_flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable names g_init_flag
and g_device_flags
are not readable. Nobody knows what they mean if he/she does not read the following codes.
Try:
g_init_flag
->g_device_props_size_init_flag
g_device_flags
->g_device_prop_init_flags
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -297,6 +306,43 @@ std::vector<int> GetSelectedDevices() { | |||
return devices; | |||
} | |||
|
|||
const gpuDeviceProp *GetDeviceProperties(int id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you read my comment #35661 (comment) carefully? I mean const reference.
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -297,6 +306,43 @@ std::vector<int> GetSelectedDevices() { | |||
return devices; | |||
} | |||
|
|||
const gpuDeviceProp *GetDeviceProperties(int id) { | |||
int gpu_num = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to put this variable outside the std::call_once
. I mean this line can be put inside the following std::call_once
.
paddle/fluid/platform/gpu_info.cc
Outdated
g_device_flags.resize(gpu_num); | ||
g_device_props.resize(gpu_num); | ||
for (int i = 0; i < gpu_num; ++i) { | ||
g_device_flags[i] = std::unique_ptr<std::once_flag>(new std::once_flag()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g_device_flags[i] = std::make_unique<std::once_flag>()
would be simpler.
"The device id: %d exceeds out of range [0, the number of devices: %d " | ||
"on this machine). Because the device id should be greater than or " | ||
"equal to zero and smaller than the number of gpus. Please input " | ||
"appropriate device again!", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your error message would be like:
The device id: 8 exceeds out of range [0, the number of devices: 4 on this machine).
This is:
- in the wrong format in math.
- wrong grammar in English. The word
exceed
has the meaning ofout of range
!
I prefer that the error message would be like:
The device id 8 is out of range [0, 4), where 4 is the number of devices on this machine.
from paddle.fluid.wrapped_decorator import signature_safe_contextmanager | ||
from paddle.device import is_compiled_with_cuda |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this line would cause circular dependencies. Are you sure it is right? If you have checked, please tell me.
My thought on why there are circular dependencies:
- When users import
paddle.device.cuda
, Python would importpython/paddle/device/cuda/__init__.py
and reach this linefrom paddle.device import is_compiled_with_cuda
. - Then, Python would try to import
paddle.device
. Insidepython/paddle/device/__init__.py
, there is the codefrom . import cuda
. So, Python would importpython/paddle/device/cuda/__init__.py
again. That is why I think the circular dependencies occur.
''' | ||
|
||
place = framework._current_expected_place() | ||
if not isinstance(place, core.CUDAPlace) or not is_compiled_with_cuda(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think that you should check isinstance(place, core.CUDAPlace)
.
The framework._current_expected_place()
may return core.CPUPlace()
. Even if users install the GPU version Paddle, they can use CPU instead of GPU.
raise ValueError( | ||
"The current device: {} is not expected. Because paddle.device.cuda." | ||
"get_device_properties only support cuda device Please change device" | ||
"and input device again!".format(place)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused with this sentence. What are the differences between device
and input device
?
Return the properties of given CUDA device. | ||
|
||
Args: | ||
device(paddle.CUDAPlace() or int or str): The device, the ID of the device |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paddle.CUDAPlace()
-> paddle.CUDAPlace
Default: None. | ||
|
||
Returns: | ||
_CudaDeviceProperties: the properties of the device which include ASCII string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_CudaDeviceProperties
-> _gpuDeviceProperties
? I guess you can fix this in the next PR with test=document_fix
.。。
|
||
if not core.is_compiled_with_cuda(): | ||
raise ValueError( | ||
"The current device {} is not expected. Because paddle.device.cuda." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the error message is not right. It should be something like The API paddle.device.cuda.get_device_properties() is not supported in CPU-only PaddlePaddle. Please reinstall PaddlePaddle with GPU support to call this API
.
|
||
device_id = -1 | ||
|
||
if device is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if device is None
?
raise ValueError( | ||
"The current string {} is not expected. Because paddle.device." | ||
"cuda.get_device_properties only support string which is like 'gpu:x'. " | ||
"Please input appropriat string again!".format(device)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrong word appropriat
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* Initial Commit * add unittest and add error information * modify doc * fix some error * fix some word * fix bug cudaDeviceProp* and modify error explanation * fix cudaDeviceProp* error and unnitest samples * fix hip error and PADDLE_WITH_HIP * update style * fix error is_compiled_with_cuda * fix paddle.device.cuda.get_device_properties * fix error for multi thread safe * update style * merge conflict * modify after mentor review * update style * delete word * fix unittest error for windows * support string input and modify some code * modify doc to support string input * fix error for express information * fix error for express information * fix unnitest for windows * fix device.startswith('gpu:') * format error and doc * fix after review * format code * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix error for doc compile * fix py2 error * fix wrong words and doc * fix _gpuDeviceProperties
PR types
New features
PR changes
APIs
Describe
Add paddle.device.cuda.get_device_properties