add custom init grad for backward function #31540
Conversation
Thanks for your contribution!
@@ -133,7 +133,7 @@ def set_value(self, value):
framework._current_expected_place())

@framework.dygraph_only
def backward(self, retain_graph=False):
def backward(self, retain_graph=False, grad_tensor=None):
Would it be more reasonable to put grad_tensor before retain_graph, since it will be used more often? Moving it to the front may introduce some compatibility risk, but it is more reasonable in the long run. Please check how many tests in the framework currently use retain_graph.
Done
@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
Since this is now a Tensor API, wouldn't it be easier to understand if initial gradient values of outputs were changed to initial gradient values of current Tensor?
Done
@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
the initial gradient values of `outputs` would be Tensor filled with 1;
Same as above.
Done
@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
the initial gradient values of `outputs` would be Tensor filled with 1;
if `grad_tensor` is not None, it must have the same length as `outputs`.
Same as above.
Done
grad_tensor=paddle.to_tensor(2.)
for i in range(5):
    y = paddle.pow(x, 4.0)
    y.backward(grad_tensor=grad_tensor)
If it is moved to the front, the keyword name can be omitted here, which makes the call more concise.
Done
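For illustration, a minimal sketch of the call-site difference (assuming the final argument order backward(grad_tensor=None, retain_graph=False) adopted later in this review; variable names are made up):

import paddle

x = paddle.to_tensor(3.0, stop_gradient=False)
y = paddle.pow(x, 4.0)

# With grad_tensor as the first parameter, the keyword can be omitted.
y.backward(paddle.to_tensor(2.0))
# Equivalent explicit form: y.backward(grad_tensor=paddle.to_tensor(2.0))

print(x.grad)  # 2 * 4 * x**3 = 216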
@@ -176,7 +191,12 @@ def backward(self, retain_graph=False):
scaled_loss._run_backward(framework._dygraph_tracer(),
retain_graph)
else:
self._run_backward(framework._dygraph_tracer(), retain_graph)
if grad_tensor is not None:
assert grad_tensor.shape == self.shape, "Variable Shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Variable -> Tensor, not match -> does not match; also rework the sentence, the grammar is a bit awkward, and use Tensor consistently instead of Variable.
Done
@@ -36,7 +36,7 @@ DECLARE_bool(sort_sum_gradient);
namespace paddle {
namespace imperative {

void BasicEngine::Init(VarBase* var, bool retain_graph) {
void BasicEngine::Init(VarBase* var, bool retain_graph, VarBase* grad_tensor) {
grad_tensor could be given a default argument of nullptr.
The default argument is nullptr at the declaration.
paddle/fluid/pybind/imperative.cc
Outdated
@@ -920,11 +920,11 @@ void BindImperative(py::module *m_ptr) {
)DOC")
.def("_run_backward",
[](imperative::VarBase &self, const imperative::Tracer &tracer,
bool retain_graph) {
bool retain_graph, imperative::VarBase &grad_tensor) {
The default argument needs to be handled here; append py::arg("grad_tensor") = nullptr.
Done
grad_var->Resize(fwd_var.dims());
grad_var->mutable_data(fwd_var.place(), fwd_var.type());
operators::math::set_constant(*dev_ctx, grad_var, 1.0);
paddle::framework::TensorCopy(grad_tensor->Var().Get<framework::LoDTensor>(),
When grad_tensor is nullptr, just set it to 1.
Done
@@ -176,7 +191,17 @@ def backward(self, retain_graph=False):
scaled_loss._run_backward(framework._dygraph_tracer(),
retain_graph)
else:
self._run_backward(framework._dygraph_tracer(), retain_graph)
if grad_tensor is None:
    grad_tensor = paddle.ones_like(self)
When grad_tensor is None, pass two arguments to _run_backward; when it is not None, pass three. That way the compatibility upgrade does not affect existing models at all.
After changing this to std::shared_ptr, the Python None is converted to nullptr, so all calls can uniformly pass three arguments.
LGTM
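As a hedged illustration of the default path shown in the diff above (not taken from the PR's tests), calling backward() with no grad_tensor should behave like passing a tensor of ones:

import paddle

x = paddle.to_tensor([1.0, 2.0, 3.0], stop_gradient=False)
y = x * x
# grad_tensor is None, so the engine starts from paddle.ones_like(y).
y.backward()
print(x.grad)  # 2 * x -> [2., 4., 6.]

x2 = paddle.to_tensor([1.0, 2.0, 3.0], stop_gradient=False)
y2 = x2 * x2
# Explicitly passing ones should give the same result.
y2.backward(grad_tensor=paddle.ones_like(y2))
print(x2.grad)  # [2., 4., 6.]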
if grad_tensor is not None:
    assert isinstance(
        grad_tensor, core.
        VarBase), "The type of grad_tensot must be paddle.VarBase"
paddle.VarBase -> paddle.Tensor?
assert isinstance(
    grad_tensor, core.
    VarBase), "The type of grad_tensot must be paddle.VarBase"
assert grad_tensor.shape == self.shape, "Variable shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Variable shape -> Tensor shape; use Tensor consistently instead of Variable in the descriptions.
Done
grad_var->mutable_data(fwd_var.place(), fwd_var.type());
operators::math::set_constant(*dev_ctx, grad_var, 1.0);
} else {
    paddle::framework::TensorCopy(
Do we need a check here that the dimensions of grad_tensor match the dimensions of var?
@@ -133,7 +133,7 @@ def set_value(self, value):
framework._current_expected_place())

@framework.dygraph_only
def backward(self, retain_graph=False):
def backward(self, grad_tensor=None, retain_graph=False):
This can only handle a single tensor, right? What if multiple grad tensors need to be handled?
Is handling multiple grad tensors really a hard requirement here?
Could they be handled in a loop? The standalone backward API is used very infrequently.
The error messages could also be polished: capitalize the first letter, end with a period, avoid wording like "must be" or "should be", and tell users directly what went wrong and how to fix it.
PADDLE_ENFORCE_EQ(
    tensors.size(), grad_tensors.size(),
    platform::errors::Unavailable(
        "the size of tensors must equal the size of grad_tensors, but"
Recommend capitalizing the first letter: the -> The.
Done
PADDLE_ENFORCE_EQ(
    var->HasGradVar(), true,
    platform::errors::NotFound("Grad variable not exist for variable %s",
The message is a bit strange; maybe it can tell users Tensor %s has no gradient directly.
Done
auto var = tensors[i];
auto grad_tensor = grad_tensors[i];

auto init_node_ = var->GradVarBase()->GradNode();
A temporary variable doesn't need a trailing _; use init_node as the name directly.
Done
@@ -74,6 +76,7 @@ class BasicEngine : public Engine {
std::vector<GradientAccumulator*> leaf_accumulators_;

bool retain_graph_;
bool create_graph_;
Where is create_graph_ used?
Not used, removed.
@@ -1412,6 +1418,19 @@ void BindImperative(py::module *m_ptr) {
},
py::call_guard<py::gil_scoped_release>());

m.def(
    "dygraph_run_backward",
This method does not need to be shown to users; start the method name with _, so maybe keeping _run_backward is better.
Done
dygraph_run_backward -> _run_backward; names that do not start with an underscore are public APIs, and this API does not need to be exposed to users.
paddle/fluid/pybind/imperative.cc
Outdated
@@ -919,12 +920,17 @@ void BindImperative(py::module *m_ptr) {
print(x.grad) # None
)DOC")
.def("_run_backward",
[](imperative::VarBase &self, const imperative::Tracer &tracer,
bool retain_graph) {
[](std::shared_ptr<imperative::VarBase> &self,
Remove this method and call core._run_backward directly in Python.
Removed this method, but core._run_backward is private and cannot be found in Python; core.dygraph_run_backward is used instead.
from . import backward_mode
from .backward_mode import backward

__all__ = ['grad']
backward also needs to be included in __all__ here.
The next line, __all__ += backward_mode.__all__, adds backward.
tensors = check_tensors(tensors, "tensors")

assert len(tensors) == len(set(
    tensors)), "the arg tensors should not contains same element"
Use complete wording, e.g.: The argument 'tensors' of paddle.autograd.backward contains duplicate paddle.Tensor object.
Done
if each_tensor is not None:
    assert isinstance(
        each_tensor, paddle.Tensor
    ), "grad_tensors must be None, Tensor or list containing None or Tensor"
This message is confusing; suggest: The argument 'grad_tensors' of paddle.autograd.backward is invalid, it can be 'None', 'paddle.Tensor' or 'list[None/paddle.Tensor]'.
Done
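For context, a hypothetical usage sketch of the list-based API these checks guard (assuming backward is exported from paddle.autograd as discussed in the __all__ thread above, and that None entries in grad_tensors fall back to tensors filled with 1):

import paddle
from paddle.autograd import backward

x = paddle.to_tensor([1.0, 2.0], stop_gradient=False)
y1 = x * x
y2 = 3.0 * x

# Each non-None entry in grad_tensors must match the shape of its tensor;
# None entries are treated as tensors filled with 1.
backward([y1, y2], grad_tensors=[None, paddle.to_tensor([0.5, 0.5])])
print(x.grad)  # 2*x*1 + 3*0.5 -> [3.5, 5.5]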
LGTM
LGTM
LGTM
Agree to exempt PR-CI-ROCM-Compile; the code is unrelated to ROCM.
PR types
New features
PR changes
APIs
Describe
When computing a tensor's backward pass, the initial grad_tensor is set to paddle.ones by default. This PR adds a new kwarg grad_tensor that lets users define the starting gradient themselves. If grad_tensor is not set, the default value, paddle.ones, is used.
Doc preview
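A small sketch of the behavior described above (the expected value is worked out by hand, not copied from the Doc preview):

import paddle

x = paddle.to_tensor(5.0, stop_gradient=False)
y = paddle.pow(x, 4.0)

# Start the backward pass from a user-defined gradient instead of paddle.ones.
y.backward(grad_tensor=paddle.to_tensor(2.0))
print(x.grad)  # 2 * 4 * x**3 = 1000.0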