corrected rmsprop documentation
alicanb authored and ZengpanFan committed Aug 26, 2016
1 parent f868320 commit 5429980
Showing 1 changed file with 3 additions and 10 deletions.
13 changes: 3 additions & 10 deletions docs/tutorial/solver.md
@@ -209,18 +209,11 @@ What distinguishes the method from SGD is the weight setting $$ W $$ on which we
The **RMSprop** (`type: "RMSProp"`), suggested by Tieleman in a Coursera course lecture, is a gradient-based optimization method (like SGD). The update formulas are

 $$
-(v_t)_i =
-\begin{cases}
-(v_{t-1})_i + \delta, &(\nabla L(W_t))_i(\nabla L(W_{t-1}))_i > 0\\
-(v_{t-1})_i \cdot (1-\delta), & \text{else}
-\end{cases}
+\operatorname{MS}((W_t)_i)= \delta\operatorname{MS}((W_{t-1})_i)+ (1-\delta)(\nabla L(W_t))_i^2 \\
+(W_{t+1})_i= (W_{t})_i -\alpha\frac{(\nabla L(W_t))_i}{\sqrt{\operatorname{MS}((W_t)_i)}}
 $$
-
-$$
-(W_{t+1})_i =(W_t)_i - \alpha (v_t)_i,
-$$

-If the gradient updates results in oscillations the gradient is reduced by times $$1-\delta$$. Otherwise it will be increased by $$\delta$$. The default value of $$\delta$$ (`rms_decay`) is set to $$\delta = 0.02$$.
+The default value of $$\delta$$ (`rms_decay`) is set to $$\delta=0.99$$.

[1] T. Tieleman, and G. Hinton.
[RMSProp: Divide the gradient by a running average of its recent magnitude](http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf).
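
For readers checking the corrected formulas, the two added update lines (the running mean of squared gradients `MS`, then the scaled weight step) can be sketched in plain Python. This is a minimal illustration and not the solver's actual implementation: the function name `rmsprop_step`, the argument name `lr` (standing in for $$\alpha$$), the small `eps` term guarding against division by zero, and the zero initial value for `MS` are all assumptions made for the example. Only `rms_decay` and its default of 0.99 come from the documentation text.

```python
import math


def rmsprop_step(w, grad, ms, lr=0.01, rms_decay=0.99, eps=1e-8):
    """Apply one RMSprop update to flat lists of floats.

    w         -- current weights (W_t)_i
    grad      -- gradient of the loss at w
    ms        -- running mean of squared gradients from the previous step
    lr        -- learning rate alpha (hypothetical name)
    rms_decay -- decay delta; 0.99 is the documented default
    eps       -- assumption: not in the documented formula, added only to
                 avoid dividing by zero when ms is still zero
    """
    # MS_t = delta * MS_{t-1} + (1 - delta) * grad^2
    new_ms = [rms_decay * m + (1.0 - rms_decay) * g * g
              for m, g in zip(ms, grad)]
    # W_{t+1} = W_t - alpha * grad / sqrt(MS_t)
    new_w = [wi - lr * g / (math.sqrt(m) + eps)
             for wi, g, m in zip(w, grad, new_ms)]
    return new_w, new_ms


# One step on L(w) = 0.5 * w^2 per coordinate, whose gradient equals w.
w = [1.0, -2.0]
ms = [0.0, 0.0]  # assumption: running mean starts at zero
w, ms = rmsprop_step(w, grad=list(w), ms=ms)
print(w, ms)
```

With `MS` starting at zero, the first step has magnitude of roughly $$\alpha/\sqrt{1-\delta}$$ in every coordinate regardless of the gradient's scale, which is 0.1 for the values used here. This shows the per-coordinate normalization that the removed formulas did not express.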