
Commit

Merge branch 'develop' of github.com:PaddlePaddle/Paddle into feature/dynamic-recurrent-op
Superjomn committed Oct 4, 2017
2 parents cf87cd6 + 15b35f9 commit f94f0a8
Showing 130 changed files with 1,165 additions and 571 deletions.
2 changes: 1 addition & 1 deletion cmake/configure.cmake
@@ -49,11 +49,11 @@ if(NOT WITH_GOLANG)
endif(NOT WITH_GOLANG)

if(NOT WITH_GPU)
-    add_definitions(-DPADDLE_ONLY_CPU)
add_definitions(-DHPPL_STUB_FUNC)

list(APPEND CMAKE_CXX_SOURCE_FILE_EXTENSIONS cu)
else()
+    add_definitions(-DPADDLE_WITH_GPU)
FIND_PACKAGE(CUDA REQUIRED)

if(${CUDA_VERSION_MAJOR} VERSION_LESS 7)
180 changes: 180 additions & 0 deletions doc/design/refactor/session.md
@@ -0,0 +1,180 @@
# Design Doc: Session

## Abstract

The *session* object encapsulates the environment in which the
computation graph is executed.

We will have a *local* session and a *remote* session; they offer the
same [interface](#interface). The local session encapsulates the local
runtime environment, and the remote session encapsulates the cluster
runtime environment.

The local runtime environment contains:

1. handles to the computation devices (e.g., CPU, GPU), and
1. the [scope](../scope.md) which holds all variables.

The remote runtime environment contains:

1. handles to the computation devices (e.g., the CPUs and GPUs on nodes
   0 and 1) in a cluster, and
1. the distributed [scope](../scope.md) in a cluster which holds all
variables.

The user can create a remote session on Paddle Cloud and evaluate the
computation graph with it. In this way, the user can control the
remote computation resources in a cluster from their local computer.


## Background

The current design has an implicit global session in which
`paddle.eval()` is executed. The pain point is:

Since the user cannot explicitly switch between runtime environments,
they cannot run one topology in two independent environments.

For example, in reinforcement learning, the user may want to have a
stale model for inference and a fresh model for training, and only
replace the stale model with the fresh model periodically.
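With explicit sessions, the two environments are simply two session
objects. The following is a minimal sketch assuming the `session()` and
`eval()` interfaces proposed below; `opt`, `inference_output`, and
`sync_params` are hypothetical names:

```python
import paddle as pd

# `opt` and `inference_output` stand for a training target and an
# inference target of the same topology; `sync_params` is a
# hypothetical helper, since parameter copying is outside this doc.
train_sess = pd.session()   # owns the scope holding the fresh model
infer_sess = pd.session()   # owns the scope holding the stale model

for step in range(10000):
    train_sess.eval(opt)                        # update the fresh model
    action = infer_sess.eval(inference_output)  # act with the stale model
    if step % 100 == 0:
        sync_params(src=train_sess, dst=infer_sess)  # refresh the stale model
```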

Furthermore, we have no concept that encapsulates a remote environment
that executes a computation graph.

We need the session object to address the above issues.


## Session

A session is an object that owns the runtime environment. All
computations are executed through `session.eval()`.


### Interface

```python
eval(
    targets,
    feed_dict=None,
)
```

Evaluates the target Operations or Variables in `targets`.

- *targets*: the evaluation targets. Can be a single Operation or
  Variable, or a list with Operations or Variables as elements. The
  value returned by `eval()` has the same shape as the `targets`
  argument: a single value for a single target, a list for a list (see
  the second sketch below).

  The PaddlePaddle program is represented by
  the [ProgramDesc](../design/program.md); `eval()` will infer the
  ProgramDesc from the given targets and run the PaddlePaddle
  program. Please see
  [this graph](./distributed_architecture.md#local-training-architecture)
  for a detailed illustration of the local session and
  [this graph](./distributed_architecture.md#distributed-training-architecture)
  for a detailed illustration of the remote session.

- *feed_dict*: a dictionary that contains the tensors which override
  the edges of the computation graph.

  `feed_dict` not only provides the input data; it can also override
  any OP's input:

```python
a = pd.constant(2.0, name="a")
b = pd.variable(name="b")
c = pd.mul(a, b)
sess.eval(targets=c, feed_dict={"b": 3.0})  # returns 6.0
```
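A list of targets returns a list of values in the same order (a sketch
under the same assumptions as above; `pd.add` is a hypothetical op):

```python
a = pd.constant(2.0, name="a")
b = pd.constant(3.0, name="b")
s = pd.add(a, b)  # hypothetical op
p = pd.mul(a, b)

sess.eval(targets=p)       # a single target returns a single value: 6.0
sess.eval(targets=[s, p])  # a list of targets returns a list: [5.0, 6.0]
```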

```python
close()
```

Closes the session and releases the scope that the session owns.


### Create a Local Session

```python
session(
    devices=None
)
```

Creates a new session. One session owns one global scope, so creating
multiple sessions will create different scopes.

- *devices*: a single `string` or a list of `string`s of device names;
  the corresponding devices will be the computation devices for
  `eval()`. If not specified, all available devices (e.g., all GPUs)
  will be used. The user doesn't need to specify the CPU device since
  it will always be used. Multiple sessions can use the same device.


#### Example

```python
a = paddle.constant(1.0)
b = paddle.constant(2.0)
c = a + b
sess = paddle.session(devices=["gpu:0", "gpu:1", "fpga:0"])
sess.eval(c)
sess.close()
```

### Create a Remote Session

```python
create_cloud_job(
    name,
    num_trainer,
    mem_per_trainer,
    gpu_per_trainer,
    cpu_per_trainer,
    num_ps,
    mem_per_ps,
    cpu_per_ps,
```

Creates a Paddle Cloud job. Fails if a job with the same name already exists.
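The positional call in the example at the end of this section maps to
the parameters above as follows (a keyword-argument spelling for
illustration; the `"1G"` memory strings follow that example):

```python
job = paddle.create_cloud_job(
    name="test",
    num_trainer=3,         # number of trainer processes
    mem_per_trainer="1G",  # memory quota per trainer
    gpu_per_trainer=1,     # GPUs per trainer
    cpu_per_trainer=1,     # CPUs per trainer
    num_ps=2,              # number of parameter servers
    mem_per_ps="1G",       # memory quota per parameter server
    cpu_per_ps=1,          # CPUs per parameter server
)
```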

```python
get_cloud_job(
    name
)
```

Gets an existing Paddle Cloud job by name.

```python
remote_session(
    job
)
```

Creates a remote session that evaluates the computation graph on the
given Paddle Cloud job.

- *job*: the Paddle Cloud job.

#### Example

```python
reader = paddle.reader.recordio("/pfs/home/peter/mnist-train-*") # data stored on Paddle Cloud
image = reader.column(0)
label = reader.column(1)
fc1 = paddle.op.fc(image, size=256, act="sigmoid")
fc2 = paddle.op.fc(fc1, size=10, act="softmax")
cost = paddle.op.cross_entropy(fc2, label)
opt = paddle.optimizer.sgd(cost)

job = paddle.create_cloud_job("test", 3, "1G", 1, 1, 2, "1G", 1)
sess = paddle.remote_session(job)
for i in range(1000):
    sess.eval(opt)
sess.close()
```
29 changes: 26 additions & 3 deletions doc/design/register_grad_op.md
@@ -33,22 +33,45 @@ The mapping relationship between an operator and its gradient operators is a function.
```cpp
// (OpDescBind) --> vector<std::unique_ptr<OpDescBind>>
using GradOpDescMaker =
    std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)>;
```

The function takes an `OpDescBind` of the forward operator and returns one or many gradient operator descriptions. `OpDescBind` is a C++ wrapper of the protobuf message `OpDesc` that lets us manipulate the `OpDesc` efficiently.

The `GradOpDescMaker` will be registered in `OpInfo` to replace the `grad_op_type_` field. The `OpInfo` should be

```cpp
struct OpInfo {
  std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> grad_op_maker_;
...
};
```
The `grad_op_maker_` is `nullptr` if the operator does not have any associated gradient operators.
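Callers (e.g., the backward pass) can therefore probe for gradient
support before invoking the maker. A minimal sketch; the `OpInfoMap`
lookup names are illustrative:

```cpp
// grad_op_maker_ is a std::function, so an empty function object
// means "no gradient operators for this op".
const OpInfo& info = OpInfoMap::Instance().Get(fwd_op.Type());
if (!info.grad_op_maker_) {
  // Skip this operator or report an error, depending on policy.
} else {
  auto grad_op_descs = info.grad_op_maker_(fwd_op);
}
```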
We propose a base class called `GradOpDescMakerBase` to let operator developers generate `Gradient Operators` easily. The public interface of that class is
```cpp
class GradOpDescMakerBase {
public:
  GradOpDescMakerBase(const OpDescBind&);
  virtual std::vector<std::unique_ptr<OpDescBind>> operator()() const = 0;
};
```

We can convert `GradOpDescMakerBase` to `std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)>` by

```cpp
using GradOpMaker = ...;
std::function<std::vector<std::unique_ptr<OpDescBind>>(const OpDescBind&)> func;
func = [] (const OpDescBind& fwd_op) {
GradOpMaker maker(fwd_op);
return maker();
};
```

We can write many helper functions since `GradOpDescMakerBase` is a class now. The basic helper functions get the variables of `Input`, `Output`, `InputGradient` and `OutputGradient` in the forward operator, as sketched below.
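For example, a gradient maker for a `mul`-like operator could be
written against those helpers. This is a sketch only: the helper names
(`Input`, `OutputGrad`, `InputGrad`) and the `OpDescBind` setters are
assumptions about the final interface:

```cpp
class MulGradOpDescMaker : public GradOpDescMakerBase {
 public:
  using GradOpDescMakerBase::GradOpDescMakerBase;

  std::vector<std::unique_ptr<OpDescBind>> operator()() const override {
    auto grad_op = std::unique_ptr<OpDescBind>(new OpDescBind());
    grad_op->SetType("mul_grad");
    // The gradient op consumes the forward inputs and the gradient of
    // the forward output, and produces the gradients of the inputs.
    grad_op->SetInput("X", Input("X"));
    grad_op->SetInput("Y", Input("Y"));
    grad_op->SetInput("Out@GRAD", OutputGrad("Out"));
    grad_op->SetOutput("X@GRAD", InputGrad("X"));
    grad_op->SetOutput("Y@GRAD", InputGrad("Y"));
    std::vector<std::unique_ptr<OpDescBind>> ret;
    ret.push_back(std::move(grad_op));
    return ret;
  }
};
```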

We should change the registration macros at the same time. In the current solution, there is no difference between forward operators and backward operators, so `REGISTER_OP` just registers one operator. If `REGISTER_OPERATOR` contains `OpProtoAndCheckerMaker` and `GradOpDescMaker`, we just list them in the same macro. It can be done by a macro containing `__VA_ARGS__`.

The user interface should be
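A sketch of what that could look like, assuming a variadic
`OperatorRegistrar` helper template that applies one `OpInfoFiller`
per argument type (all names below are assumptions):

```cpp
#define REGISTER_OPERATOR(op_type, op_class, ...)                        \
  static ::paddle::framework::OperatorRegistrar<op_class, ##__VA_ARGS__> \
      __op_registrar_##op_type##__(#op_type)

// Forward op, proto maker, and gradient maker in a single invocation:
REGISTER_OPERATOR(mul, MulOp, MulOpMaker, MulGradOpDescMaker);
```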
2 changes: 1 addition & 1 deletion paddle/api/Util.cpp
@@ -47,7 +47,7 @@ bool isUsingGpu() { return FLAGS_use_gpu; }
void setUseGpu(bool useGpu) { FLAGS_use_gpu = useGpu; }

bool isGpuVersion() {
-#ifdef PADDLE_ONLY_CPU
+#ifndef PADDLE_WITH_GPU
return false;
#else
return true;
2 changes: 1 addition & 1 deletion paddle/capi/Matrix.cpp
@@ -46,7 +46,7 @@ paddle_error paddle_matrix_set_row(paddle_matrix mat,
if (rowID >= ptr->mat->getHeight()) return kPD_OUT_OF_RANGE;
paddle::real* buf = ptr->mat->getRowBuf(rowID);
size_t width = ptr->mat->getWidth();
-#ifndef PADDLE_ONLY_CPU
+#ifdef PADDLE_WITH_GPU
hl_memcpy(buf, rowArray, sizeof(paddle::real) * width);
#else
std::copy(rowArray, rowArray + width, buf);
26 changes: 22 additions & 4 deletions paddle/framework/backward.cc
@@ -141,9 +141,26 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(
net->ops_[op_offset]->Rename(name, dup_outputs.back());
}
// collect all the offset to append `add` op for each alias
-      insert_position.push_back(
-          {dup_op.back(), OpRegistry::CreateOp("add", {{"X", {dup_outputs}}},
-                                               {{"Out", {name}}}, {})});
+      //
+      // one variable is shared between multiple operators.
+      // insert add operator one by one, then add it to output
+      for (size_t output_idx = 0; output_idx < dup_outputs.size() - 1;
+           ++output_idx) {
+        auto insert_add_x = dup_outputs[output_idx];
+        auto insert_add_y = dup_outputs[output_idx + 1];
+        auto insert_add_out = name + "@SHARED@" + std::to_string(output_idx);
+        // first add op inserted
+        if (output_idx == dup_outputs.size() - 2) {
+          insert_add_out = name;
+        }
+        if (output_idx != 0) {
+          insert_add_y = name + "@SHARED@" + std::to_string(output_idx - 1);
+        }
+        insert_position.push_back(
+            {dup_op.back(),
+             OpRegistry::CreateOp("sum", {{"X", {insert_add_x, insert_add_y}}},
+                                  {{"Out", {insert_add_out}}}, {})});
+      }
}

// make sure the inserted `add` ops follow the BFS order.
@@ -182,7 +199,8 @@ static std::unique_ptr<OperatorBase> BackwardRecursive(

// process recurrent gradient op as a special operator.
if (forwardOp.Type() == "recurrent") {
-    // NOTE clean up cycle call somewhere (RNN's stepnet contains itself), or
+    // NOTE clean up cycle call somewhere (RNN's stepnet contains itself),
+    // or
// this will result in infinite loop.
const auto& rnnop =
*static_cast<const operators::RecurrentOp*>(&forwardOp);
15 changes: 9 additions & 6 deletions paddle/framework/backward_test.cc
@@ -133,15 +133,18 @@ class FillZeroOpMaker : public OpProtoAndCheckerMaker {
}
};

-class AddOpMaker : public OpProtoAndCheckerMaker {
+class SumOpMaker : public framework::OpProtoAndCheckerMaker {
 public:
-  AddOpMaker(OpProto *proto, OpAttrChecker *op_checker)
+  SumOpMaker(framework::OpProto *proto, framework::OpAttrChecker *op_checker)
      : OpProtoAndCheckerMaker(proto, op_checker) {
-    AddInput("X", "x").AsDuplicable();
-    AddOutput("Out", "out");
+    AddInput("X", "the input tensors of sum operator.")
+        .AsDuplicable()
+        .NotInGradient();
+    AddOutput("Out", "the output tensor of sum operator.").NotInGradient();
AddComment("");
}
};

} // namespace framework
} // namespace paddle

@@ -154,7 +157,7 @@ REGISTER_OP(mul, f::NOP, f::MulOpMaker, mul_grad, f::NOP);
REGISTER_OP(sigmoid, f::NOP, f::SigmoidOpMaker, sigmoid_grad, f::NOP);
REGISTER_OP_WITHOUT_GRADIENT(nograd, f::NOP, f::NoGradOpMaker);
REGISTER_OP_WITHOUT_GRADIENT(fill_zeros_like, f::NOP, f::FillZeroOpMaker);
-REGISTER_OP(add, f::NOP, f::AddOpMaker, add_grad, f::NOP);
+REGISTER_OP(sum, f::NOP, f::SumOpMaker, sum_grad, f::NOP);
REGISTER_OP_WITHOUT_GRADIENT(fc, f::FcOp, f::FcOpMaker);
REGISTER_OP(many_output_op, f::NOP, f::ManyOutputOpMaker, many_output_op_grad,
f::NOP);
@@ -283,7 +286,7 @@ TEST(Backward, net_shared_weight) {
ASSERT_TRUE(bwd->IsNetOp());
auto bwd_net = static_cast<ops::NetOp *>(bwd.get());
ASSERT_EQ(3UL, bwd_net->ops_.size());
ASSERT_EQ("add", bwd_net->ops_[2]->Type());
ASSERT_EQ("sum", bwd_net->ops_[2]->Type());
}

TEST(Backward, op_register_grad_not_for_network) {
6 changes: 3 additions & 3 deletions paddle/framework/block_desc.h
@@ -19,6 +19,7 @@ limitations under the License. */
#include <vector>
#include "paddle/framework/op_desc.h"
#include "paddle/framework/var_desc.h"
#include "paddle/platform/macros.h"

namespace paddle {
namespace framework {
@@ -34,9 +35,6 @@ class BlockDescBind {
BlockDescBind(ProgramDescBind *prog, BlockDesc *desc)
: prog_(prog), desc_(desc), need_update_(false) {}

-  BlockDescBind(const BlockDescBind &o) = delete;
-  BlockDescBind &operator=(const BlockDescBind &o) = delete;

int32_t ID() const { return desc_->idx(); }

int32_t Parent() const { return desc_->parent_idx(); }
@@ -66,6 +64,8 @@ class BlockDescBind {

std::deque<std::unique_ptr<OpDescBind>> ops_;
std::unordered_map<std::string, std::unique_ptr<VarDescBind>> vars_;

+  DISABLE_COPY_AND_ASSIGN(BlockDescBind);
};
} // namespace framework
} // namespace paddle
6 changes: 5 additions & 1 deletion paddle/framework/details/op_registry.h
@@ -14,6 +14,7 @@

#pragma once

#include "paddle/framework/grad_op_desc_maker.h"
#include "paddle/framework/op_info.h"
#include "paddle/framework/op_proto_maker.h"
#include "paddle/framework/operator.h"
@@ -96,7 +97,10 @@ struct OpInfoFiller<T, kOpProtoAndCheckerMaker> {
template <typename T>
struct OpInfoFiller<T, kGradOpDescMaker> {
void operator()(const char* op_type, OpInfo* info) const {
-    info->grad_op_maker_ = new T();
+    info->grad_op_maker_ = [](const OpDescBind& fwd_op) {
+      T maker(fwd_op);
+      return maker();
+    };
}
};
} // namespace details