add custom init grad for backward function #31540
Conversation
Thanks for your contribution!
@@ -133,7 +133,7 @@ def set_value(self, value):
framework._current_expected_place())

@framework.dygraph_only
def backward(self, retain_graph=False):
def backward(self, retain_graph=False, grad_tensor=None):
Would it be more reasonable to put grad_tensor before retain_graph, since it will be used more often? Moving it to the front may introduce some compatibility risk, but it is more reasonable in the long run. Please check how many tests in the framework currently use retain_graph.
Done
@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
Since this is now a Tensor API, wouldn't it be easier to understand if initial gradient values of outputs were changed to initial gradient values of current Tensor?
Done
@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
the initial gradient values of `outputs` would be Tensor filled with 1;
Same as above.
Done
@@ -147,6 +147,10 @@ def backward(self, retain_graph=False):
:code:`retain_graph` to True, then the grads will be retained. Thus, seting it to False is much more memory-efficient.
Defaults to False.

grad_tensor(Tensor, optional): initial gradient values of `outputs` . If `grad_tensor` is None,
the initial gradient values of `outputs` would be Tensor filled with 1;
if `grad_tensor` is not None, it must have the same length as `outputs`.
Same as above.
Done
grad_tensor=paddle.to_tensor(2.)
for i in range(5):
    y = paddle.pow(x, 4.0)
    y.backward(grad_tensor=grad_tensor)
If it is moved to the front, the keyword name can be omitted here, which makes the call more concise.
Done
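For illustration, a minimal sketch of the call-site difference (assuming the final argument order backward(grad_tensor=None, retain_graph=False) adopted later in this review; variable names are made up):

import paddle

x = paddle.to_tensor(3.0, stop_gradient=False)
y = paddle.pow(x, 4.0)

# With grad_tensor as the first parameter, the keyword can be omitted.
y.backward(paddle.to_tensor(2.0))
# Equivalent explicit form: y.backward(grad_tensor=paddle.to_tensor(2.0))

print(x.grad)  # 2 * 4 * x**3 = 216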
@@ -176,7 +191,12 @@ def backward(self, retain_graph=False):
scaled_loss._run_backward(framework._dygraph_tracer(),
retain_graph)
else:
self._run_backward(framework._dygraph_tracer(), retain_graph)
if grad_tensor is not None:
assert grad_tensor.shape == self.shape, "Variable Shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Variable -> Tensor, not match -> does not match; also rework the sentence, the grammar is a bit awkward, and use Tensor consistently instead of Variable.
Done
@@ -36,7 +36,7 @@ DECLARE_bool(sort_sum_gradient);
namespace paddle {
namespace imperative {

void BasicEngine::Init(VarBase* var, bool retain_graph) {
void BasicEngine::Init(VarBase* var, bool retain_graph, VarBase* grad_tensor) {
grad_tensor could be given a default argument of nullptr.
The default argument is nullptr at the declaration.
paddle/fluid/pybind/imperative.cc
Outdated
@@ -920,11 +920,11 @@ void BindImperative(py::module *m_ptr) {
)DOC")
.def("_run_backward",
[](imperative::VarBase &self, const imperative::Tracer &tracer,
bool retain_graph) {
bool retain_graph, imperative::VarBase &grad_tensor) {
The default argument needs to be handled here; append py::arg("grad_tensor") = nullptr.
Done
grad_var->Resize(fwd_var.dims());
grad_var->mutable_data(fwd_var.place(), fwd_var.type());
operators::math::set_constant(*dev_ctx, grad_var, 1.0);
paddle::framework::TensorCopy(grad_tensor->Var().Get<framework::LoDTensor>(),
When grad_tensor is nullptr, just set it to 1.
Done
@@ -176,7 +191,17 @@ def backward(self, retain_graph=False):
scaled_loss._run_backward(framework._dygraph_tracer(),
retain_graph)
else:
self._run_backward(framework._dygraph_tracer(), retain_graph)
if grad_tensor is None:
    grad_tensor = paddle.ones_like(self)
When grad_tensor is None, pass two arguments to _run_backward; when it is not None, pass three. That way the compatibility upgrade does not affect existing models at all.
After changing this to std::shared_ptr, the Python None is converted to nullptr, so all calls can uniformly pass three arguments.
LGTM
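As a hedged illustration of the default path shown in the diff above (not taken from the PR's tests), calling backward() with no grad_tensor should behave like passing a tensor of ones:

import paddle

x = paddle.to_tensor([1.0, 2.0, 3.0], stop_gradient=False)
y = x * x
# grad_tensor is None, so the engine starts from paddle.ones_like(y).
y.backward()
print(x.grad)  # 2 * x -> [2., 4., 6.]

x2 = paddle.to_tensor([1.0, 2.0, 3.0], stop_gradient=False)
y2 = x2 * x2
# Explicitly passing ones should give the same result.
y2.backward(grad_tensor=paddle.ones_like(y2))
print(x2.grad)  # [2., 4., 6.]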
if grad_tensor is not None:
    assert isinstance(
        grad_tensor, core.
        VarBase), "The type of grad_tensot must be paddle.VarBase"
paddle.VarBase -> paddle.Tensor?
assert isinstance(
    grad_tensor, core.
    VarBase), "The type of grad_tensot must be paddle.VarBase"
assert grad_tensor.shape == self.shape, "Variable shape not match, Variable of grad_tensor [ {} ] with shape {} mismatch Variable [ {} ] with shape {}".format(
Variable shape -> Tensor shape; use Tensor consistently instead of Variable in the descriptions.
Done
grad_var->mutable_data(fwd_var.place(), fwd_var.type());
operators::math::set_constant(*dev_ctx, grad_var, 1.0);
} else {
    paddle::framework::TensorCopy(
Do we need a check here that the dimensions of grad_tensor match the dimensions of var?
@@ -133,7 +133,7 @@ def set_value(self, value):
framework._current_expected_place())

@framework.dygraph_only
def backward(self, retain_graph=False):
def backward(self, grad_tensor=None, retain_graph=False):
This can only handle a single tensor, right? What if multiple grad tensors need to be handled?
Is handling multiple grad tensors really a hard requirement here?
Could they be handled in a loop? The standalone backward API is used very infrequently.
The error messages could also be polished: capitalize the first letter, end with a period, avoid wording like "must be" or "should be", and tell users directly what went wrong and how to fix it.
PADDLE_ENFORCE_EQ(
    tensors.size(), grad_tensors.size(),
    platform::errors::Unavailable(
        "the size of tensors must equal the size of grad_tensors, but"
Recommend capitalizing the first letter: the -> The.
Done
PADDLE_ENFORCE_EQ(
    var->HasGradVar(), true,
    platform::errors::NotFound("Grad variable not exist for variable %s",
The message is a bit strange; maybe it can tell users Tensor %s has no gradient directly.
Done
auto var = tensors[i];
auto grad_tensor = grad_tensors[i];

auto init_node_ = var->GradVarBase()->GradNode();
A temporary variable doesn't need a trailing _; use init_node as the name directly.
Done
@@ -74,6 +76,7 @@ class BasicEngine : public Engine {
std::vector<GradientAccumulator*> leaf_accumulators_;

bool retain_graph_;
bool create_graph_;
Where is create_graph_ used?
Not used, removed.
@@ -1412,6 +1418,19 @@ void BindImperative(py::module *m_ptr) {
},
py::call_guard<py::gil_scoped_release>());

m.def(
    "dygraph_run_backward",
This method does not need to be shown to users; start the method name with _, so maybe keeping _run_backward is better.
Done
dygraph_run_backward -> _run_backward; names that do not start with an underscore are public APIs, and this API does not need to be exposed to users.
paddle/fluid/pybind/imperative.cc
Outdated
@@ -919,12 +920,17 @@ void BindImperative(py::module *m_ptr) {
print(x.grad) # None
)DOC")
.def("_run_backward",
[](imperative::VarBase &self, const imperative::Tracer &tracer,
bool retain_graph) {
[](std::shared_ptr<imperative::VarBase> &self,
Remove this method and call core._run_backward directly in Python.
Removed this method, but core._run_backward is private and cannot be found in Python; core.dygraph_run_backward is used instead.
from . import backward_mode
from .backward_mode import backward

__all__ = ['grad']
backward also needs to be included in __all__ here.
The next line, __all__ += backward_mode.__all__, adds backward.
tensors = check_tensors(tensors, "tensors")

assert len(tensors) == len(set(
    tensors)), "the arg tensors should not contains same element"
Use complete wording, e.g.: The argument 'tensors' of paddle.autograd.backward contains duplicate paddle.Tensor object.
Done
if each_tensor is not None:
    assert isinstance(
        each_tensor, paddle.Tensor
    ), "grad_tensors must be None, Tensor or list containing None or Tensor"
This message is confusing; suggest: The argument 'grad_tensors' of paddle.autograd.backward is invalid, it can be 'None', 'paddle.Tensor' or 'list[None/paddle.Tensor]'.
Done
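For context, a hypothetical usage sketch of the list-based API these checks guard (assuming backward is exported from paddle.autograd as discussed in the __all__ thread above, and that None entries in grad_tensors fall back to tensors filled with 1):

import paddle
from paddle.autograd import backward

x = paddle.to_tensor([1.0, 2.0], stop_gradient=False)
y1 = x * x
y2 = 3.0 * x

# Each non-None entry in grad_tensors must match the shape of its tensor;
# None entries are treated as tensors filled with 1.
backward([y1, y2], grad_tensors=[None, paddle.to_tensor([0.5, 0.5])])
print(x.grad)  # 2*x*1 + 3*0.5 -> [3.5, 5.5]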
LGTM
LGTM
LGTM
Agree to exempt PR-CI-ROCM-Compile; the code is unrelated to ROCM.
PR types
New features
PR changes
APIs
Describe
When computing a tensor's backward pass, the initial grad_tensor is set to paddle.ones by default. This PR adds a new kwarg grad_tensor that lets users define the starting gradient themselves. If grad_tensor is not set, the default value, paddle.ones, is used.
Doc preview
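A small sketch of the behavior described above (the expected value is worked out by hand, not copied from the Doc preview):

import paddle

x = paddle.to_tensor(5.0, stop_gradient=False)
y = paddle.pow(x, 4.0)

# Start the backward pass from a user-defined gradient instead of paddle.ones.
y.backward(grad_tensor=paddle.to_tensor(2.0))
print(x.grad)  # 2 * 4 * x**3 = 1000.0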