
add custom init grad for backward function #31540

Merged

Conversation

MingMingShangTian (Contributor) commented Mar 11, 2021

PR types

New features

PR changes

APIs

Describe

When computing a tensor's backward pass, the initial grad_tensor is set to paddle.ones by default. This PR adds a new kwarg, grad_tensor, so users can define the starting gradient themselves. If grad_tensor is not set, the default value paddle.ones is used.
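
A minimal usage sketch of the new argument (assuming the signature backward(grad_tensor=None, retain_graph=False) that this review converges on; the numeric values are only illustrative):

import paddle

x = paddle.to_tensor(5., stop_gradient=False)
y = paddle.pow(x, 4.0)
y.backward()                      # default starting gradient is all ones
print(x.grad)                     # dy/dx = 4 * 5^3 = 500

x.clear_gradient()
y = paddle.pow(x, 4.0)
y.backward(paddle.to_tensor(2.))  # custom starting gradient of 2
print(x.grad)                     # 2 * 4 * 5^3 = 1000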

Doc preview
[screenshots of the updated backward() API documentation]

@paddle-bot-old commented:

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@@ -133,7 +133,7 @@ def set_value(self, value):
framework._current_expected_place())

@framework.dygraph_only
def backward(self, retain_graph=False):
def backward(self, retain_graph=False, grad_tensor=None):
Contributor:

Wouldn't it be more reasonable to put grad_tensor before retain_graph? It will be used relatively more often. Moving it to the front may introduce some compatibility risk, but it is more reasonable in the long run. It would be worth checking how many tests in the framework currently use retain_graph.

Contributor Author:

Done

@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
Contributor:

Since this is now a Tensor API, wouldn't it be easier to understand if "initial gradient values of outputs" were changed to "initial gradient values of current Tensor" here?

Contributor Author:

Done

@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
the initial gradient values of `outputs` would be Tensor filled with 1;
Contributor:

Same as above.

Contributor Author:

Done

@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
the initial gradient values of `outputs` would be Tensor filled with 1;
if `grad_tensor` is not None, it must have the same length as `outputs`.
Contributor:

Same as above.

Contributor Author:

Done

grad_tensor=paddle.to_tensor(2.)
for i in range(5):
y = paddle.pow(x, 4.0)
y.backward(grad_tensor=grad_tensor)
Contributor:

If it is moved to the front, the parameter name here can be omitted, which is more concise.

Contributor Author:

Done

@@ -176,7 +191,12 @@ def backward(self, retain_graph=False):
scaled_loss._run_backward(framework._dygraph_tracer(),
retain_graph)
else:
self._run_backward(framework._dygraph_tracer(), retain_graph)
if grad_tensor is not None:
assert grad_tensor.shape == self.shape, "Variable Shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Contributor:

Variable -> Tensor, not match -> does not match. Please reorganize the sentence; the grammar is a bit off. Use Tensor consistently instead of Variable.

Contributor Author:

Done

@@ -36,7 +36,7 @@ DECLARE_bool(sort_sum_gradient);
namespace paddle {
namespace imperative {

void BasicEngine::Init(VarBase* var, bool retain_graph) {
void BasicEngine::Init(VarBase* var, bool retain_graph, VarBase* grad_tensor) {
Contributor:

grad_tensor could be given a default argument of nullptr.

Contributor Author:

The default argument is nullptr at the declaration.

@@ -920,11 +920,11 @@ void BindImperative(py::module *m_ptr) {
)DOC")
.def("_run_backward",
[](imperative::VarBase &self, const imperative::Tracer &tracer,
bool retain_graph) {
bool retain_graph, imperative::VarBase &grad_tensor) {
Contributor:

The default argument needs to be handled here; append py::arg("grad_tensor") = nullptr.

Contributor Author:

Done

grad_var->Resize(fwd_var.dims());
grad_var->mutable_data(fwd_var.place(), fwd_var.type());
operators::math::set_constant(*dev_ctx, grad_var, 1.0);
paddle::framework::TensorCopy(grad_tensor->Var().Get<framework::LoDTensor>(),
Contributor:

When grad_tensor is nullptr, just set it to 1.

Contributor Author:

Done

@@ -176,7 +191,17 @@ def backward(self, retain_graph=False):
scaled_loss._run_backward(framework._dygraph_tracer(),
retain_graph)
else:
self._run_backward(framework._dygraph_tracer(), retain_graph)
if grad_tensor is None:
grad_tensor = paddle.ones_like(self)
Contributor:

When it is None, call _run_backward with two arguments; when it is not None, call _run_backward with three. That way the compatibility upgrade does not affect existing models at all.

Contributor Author (@MingMingShangTian, Mar 15, 2021):

After changing this to std::shared_ptr, None is converted to nullptr, so the call can be unified to always pass three arguments.

TCChenlong (Contributor) previously approved these changes Mar 16, 2021:

LGTM

if grad_tensor is not None:
assert isinstance(
grad_tensor, core.
VarBase), "The type of grad_tensot must be paddle.VarBase"
Contributor:

paddle.VarBase -> paddle.Tensor?

assert isinstance(
grad_tensor, core.
VarBase), "The type of grad_tensot must be paddle.VarBase"
assert grad_tensor.shape == self.shape, "Variable shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Contributor:

Variable shape -> Tensor shape; use Tensor consistently instead of Variable in the descriptions.

Contributor Author:

Done

grad_var->mutable_data(fwd_var.place(), fwd_var.type());
operators::math::set_constant(*dev_ctx, grad_var, 1.0);
} else {
paddle::framework::TensorCopy(
Member:

Is a check needed here that grad_tensor's dimensions match var's dimensions?

@@ -133,7 +133,7 @@ def set_value(self, value):
framework._current_expected_place())

@framework.dygraph_only
def backward(self, retain_graph=False):
def backward(self, grad_tensor=None, retain_graph=False):
Member:

This can only handle a single tensor, right? What if multiple grad tensors need to be handled?

Contributor:

Is handling multiple grad tensors here a hard requirement?

Contributor:

Could that be handled with a loop? The standalone backward interface is used very infrequently.
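
For example, a hypothetical sketch of the loop-based alternative mentioned above, relying on the single-tensor backward added by this PR and on gradient accumulation across calls (variable names are illustrative, not from this PR):

import paddle

x = paddle.to_tensor(3., stop_gradient=False)
outputs = [x * 2, x * x]                              # two separate outputs of x
grad_tensors = [paddle.to_tensor(1.), paddle.to_tensor(0.5)]

# Call single-tensor backward once per output; gradients accumulate in x.grad.
for out, grad in zip(outputs, grad_tensors):
    out.backward(grad, retain_graph=True)

print(x.grad)   # 1 * 2 + 0.5 * (2 * 3) = 5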

chenwhql (Contributor) left a comment:

Please polish the error messages further: capitalize the first letter, end with a period, and avoid wording like "must be" / "should be"; directly tell users what went wrong and how to fix it.

PADDLE_ENFORCE_EQ(
tensors.size(), grad_tensors.size(),
platform::errors::Unavailable(
"the size of tensors must equal the size of grad_tensors, but"
Contributor:

Recommend capitalizing the first letter: the -> The.

Contributor Author:

Done


PADDLE_ENFORCE_EQ(
var->HasGradVar(), true,
platform::errors::NotFound("Grad variable not exist for variable %s",
Contributor (@chenwhql, Mar 26, 2021):

The message is a bit strange; maybe tell users directly: "Tensor %s has no gradient".

Contributor Author:

Done

auto var = tensors[i];
auto grad_tensor = grad_tensors[i];

auto init_node_ = var->GradVarBase()->GradNode();
Contributor:

A temporary variable doesn't need a trailing _; use init_node as the name directly.

Contributor Author:

Done

@@ -74,6 +76,7 @@ class BasicEngine : public Engine {
std::vector<GradientAccumulator*> leaf_accumulators_;

bool retain_graph_;
bool create_graph_;
Contributor:

Where is create_graph_ used?

Contributor Author:

Not used, removed.

@@ -1412,6 +1418,19 @@ void BindImperative(py::module *m_ptr) {
},
py::call_guard<py::gil_scoped_release>());

m.def(
"dygraph_run_backward",
Contributor:

This method doesn't need to be exposed to users; prefix the method name with an underscore. Maybe keeping _run_backward is better.

Contributor Author:

Done

Contributor:

dygraph_run_backward -> _run_backward. Names that don't start with an underscore are public APIs, and this API doesn't need to be exposed to users.

@@ -919,12 +920,17 @@ void BindImperative(py::module *m_ptr) {
print(x.grad) # None
)DOC")
.def("_run_backward",
[](imperative::VarBase &self, const imperative::Tracer &tracer,
bool retain_graph) {
[](std::shared_ptr<imperative::VarBase> &self,
Contributor:

Remove this method and call core._run_backward directly in Python.

Contributor Author:

Removed this method, but core._run_backward is private and cannot be found from Python, so core.dygraph_run_backward is used instead.

from . import backward_mode
from .backward_mode import backward

__all__ = ['grad']
Contributor:

backward also needs to be added to __all__ here.

Contributor Author:

The next line, __all__ += backward_mode.__all__, adds backward.

tensors = check_tensors(tensors, "tensors")

assert len(tensors) == len(set(
tensors)), "the arg tensors should not contains same element"
Contributor:

Use complete words: "The argument 'tensors' of paddle.autograd.backward contains duplicate paddle.Tensor object."

Contributor Author:

Done

if each_tensor is not None:
assert isinstance(
each_tensor, paddle.Tensor
), "grad_tensors must be None, Tensor or list containing None or Tensor"
Contributor (@chenwhql, Mar 26, 2021):

Confusing; suggest: "The argument 'grad_tensors' of paddle.autograd.backward is invalid, it can be 'None', 'paddle.Tensor' or 'list[None/paddle.Tensor]'."

Contributor Author:

Done
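
A rough sketch of the argument validation being discussed in this thread, using the message wording suggested above (the helper name _check_grad_tensors is illustrative; the actual implementation in paddle.autograd may differ):

import paddle

def _check_grad_tensors(grad_tensors):
    # Accept None, a single Tensor, or a list mixing None and Tensors.
    if grad_tensors is None:
        return
    if isinstance(grad_tensors, paddle.Tensor):
        grad_tensors = [grad_tensors]
    for each_tensor in grad_tensors:
        if each_tensor is not None:
            assert isinstance(each_tensor, paddle.Tensor), (
                "The argument 'grad_tensors' of paddle.autograd.backward is "
                "invalid, it can be 'None', 'paddle.Tensor' or "
                "'list[None/paddle.Tensor]'.")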

@chenwhql (Contributor):

[screenshot of the rendered documentation navigation]
The documentation table of contents is not quite right here; this extra level should not exist.

chenwhql (Contributor) left a comment:

LGTM

ForFishes (Member) left a comment:

LGTM

TCChenlong (Contributor) left a comment:

LGTM

qili93 (Contributor) commented Mar 31, 2021:

Agreed to waive PR-CI-ROCM-Compile; the code is unrelated to ROCM.

@MingMingShangTian MingMingShangTian merged commit 83b953f into PaddlePaddle:develop Apr 1, 2021
@MingMingShangTian MingMingShangTian deleted the custom_staring_grad branch April 1, 2021 06:57