Walkthrough: MNIST

Walkthrough: MNIST using Minerva

About MNIST

If you are not familiar with the MNIST dataset, please see here. Then download the MNIST dataset file mnist_all.mat.

Classify MNIST using a 3-layer perceptron

Model

The network used here consists of:

One input layer of size 784
One hidden layer of size 256 with ReLU non-linearity
One classifier layer of size 10 with a softmax loss function
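
As a quick sanity check on the layer sizes above, the following sketch lists the parameter shapes this model implies and counts the trainable values. It is plain Python and does not use Minerva; the names INPUT_SIZE, HIDDEN_SIZE, NUM_CLASSES, and num_params are chosen here for illustration only.

```python
# Layer sizes taken from the model description above.
INPUT_SIZE = 784    # 28 x 28 pixels, flattened
HIDDEN_SIZE = 256
NUM_CLASSES = 10

# (rows, cols) of each parameter: weights map from the previous layer
# to the next one, biases are column vectors.
shapes = {
    "w1": (HIDDEN_SIZE, INPUT_SIZE),
    "b1": (HIDDEN_SIZE, 1),
    "w2": (NUM_CLASSES, HIDDEN_SIZE),
    "b2": (NUM_CLASSES, 1),
}


def num_params(shapes):
    """Total number of scalar parameters across all matrices."""
    return sum(rows * cols for rows, cols in shapes.values())


total = num_params(shapes)
print(total)  # 203530
```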

Algorithm step by step

Suppose the minibatch size is 256. Each minibatch has already been converted into two matrices, data and label, of size 784x256 and 10x256, respectively. Then,

Initialization: Weight and bias matrices are initialized as follows:

w1 = owl.randn([256, 784], 0.0, 0.01)
w2 = owl.randn([10, 256], 0.0, 0.01)
b1 = owl.zeros([256, 1])
b2 = owl.zeros([10, 1])

Feed-forward Propagation:

a1 = owl.elewise.relu(w1 * data + b1)  # hidden layer
a2 = owl.conv.softmax(w2 * a1 + b2)  # classifier layer
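
For readers who want to see what the two non-linearities compute, here is a minimal pure-Python sketch. It assumes that softmax normalizes each column separately (one column per sample), which matches the 10x256 layout used here but is an assumption about owl.conv.softmax rather than something this page states. The function names relu and softmax_columns are illustrative and are not part of owl.

```python
import math


def relu(matrix):
    """Element-wise max(0, x) on a matrix stored as a list of rows."""
    return [[max(0.0, v) for v in row] for row in matrix]


def softmax_columns(matrix):
    """Softmax over each column; every column is one sample's class scores."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        col = [matrix[i][j] for i in range(rows)]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in col]
        total = sum(exps)
        for i in range(rows):
            out[i][j] = exps[i] / total
    return out


# Three classes, two samples.
scores = [[1.0, -2.0],
          [2.0, 0.0],
          [3.0, 2.0]]
probs = softmax_columns(scores)
col_sums = [sum(probs[i][j] for i in range(3)) for j in range(2)]
```

Each entry of col_sums is 1 up to rounding, so every column of probs can be read as a probability distribution over the classes.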

Backward Propagation:

s2 = a2 - label                                 # classifier layer
s1 = owl.elewise.relu_back(w2.trans() * s2, a1) # hidden layer
gw2 = s2 * a1.trans()                           # gradient of w2
gw1 = s1 * data.trans()                         # gradient of w1
gb2 = s2.sum(1)                                 # gradient of b2
gb1 = s1.sum(1)                                 # gradient of b1
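
The first line of the backward pass relies on a standard identity: for a softmax output combined with cross-entropy loss, the gradient of the loss with respect to the pre-softmax scores is simply the predicted probabilities minus the one-hot label. The sketch below checks this numerically for a single sample in plain Python. It assumes the loss is cross-entropy, and the helper names are illustrative and not part of owl.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, label):
    """Cross-entropy between a one-hot label and softmax(z)."""
    probs = softmax(z)
    return -sum(y * math.log(p) for y, p in zip(label, probs))


def numerical_grad(z, label, eps=1e-6):
    """Central finite differences of the loss with respect to each score."""
    grad = []
    for i in range(len(z)):
        hi = list(z)
        hi[i] += eps
        lo = list(z)
        lo[i] -= eps
        grad.append((cross_entropy(hi, label) - cross_entropy(lo, label)) / (2 * eps))
    return grad


z = [0.5, -1.2, 2.0]       # pre-softmax scores for one sample
label = [0.0, 0.0, 1.0]    # one-hot ground truth

analytic = [p - y for p, y in zip(softmax(z), label)]  # the "a2 - label" rule
numeric = numerical_grad(z, label)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
```

The two gradients agree up to finite-difference error, which is why no explicit derivative of the softmax appears in the backward pass.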

Update:

w1 -= lr * gw1
w2 -= lr * gw2
b1 -= lr * gb1
b2 -= lr * gb2
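
The gradients above are sums over all samples in the minibatch, because the matrix products and sum(1) accumulate across the 256 columns. That is presumably why the full script below sets lr = 0.01 / 256: dividing the learning rate by the batch size has the same effect as averaging the gradient. A minimal scalar sketch of this equivalence, with made-up per-sample gradients and an illustrative sgd_step helper:

```python
def sgd_step(weight, grad, lr):
    """One plain SGD update."""
    return weight - lr * grad


batch_size = 256
base_lr = 0.01

# Hypothetical per-sample gradients for a single scalar weight.
per_sample_grads = [0.002 * (i % 7 - 3) for i in range(batch_size)]

summed = sum(per_sample_grads)
mean = summed / batch_size

# Summed gradient with a scaled learning rate ...
w_sum = sgd_step(1.0, summed, base_lr / batch_size)
# ... matches the mean gradient with the base learning rate.
w_mean = sgd_step(1.0, mean, base_lr)
```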

Putting them together

import owl
import owl.conv as co
import owl.elewise as ele
import mnist_io, sys
# initialize the system
owl.initialize(sys.argv)
gpu = owl.create_gpu_device(0)
owl.set_device(gpu)
# training parameters and weights
lr = 0.01 / 256
w1 = owl.randn([256, 784], 0.0, 0.01)
w2 = owl.randn([10, 256], 0.0, 0.01)
b1 = owl.zeros([256, 1])
b2 = owl.zeros([10, 1])
(train_set, test_set) = mnist_io.load_mb_from_mat("mnist_all", 256)
# training
count = 1
for epoch in range(10):
  for (data_np, label_np) in train_set:
    count += 1
    data = owl.from_numpy(data_np)
    label = owl.from_numpy(label_np)
    # ff
    a1 = ele.relu(w1 * data + b1)  # hidden layer
    a2 = co.softmax(w2 * a1 + b2)    # classifier layer
    # bp
    s2 = a2 - label                                 # classifier layer
    s1 = ele.relu_back(w2.trans() * s2, a1) # hidden layer
    gw2 = s2 * a1.trans()                           # gradient of w2
    gw1 = s1 * data.trans()                         # gradient of w1
    gb2 = s2.sum(1)                                 # gradient of b2
    gb1 = s1.sum(1)                                 # gradient of b1
    # update
    w1 -= lr * gw1
    w2 -= lr * gw2
    b1 -= lr * gb1
    b2 -= lr * gb2
    # print accuracy
    if count % 20 == 0:
      pred = a2.argmax(0)
      truth = label.argmax(0)
      print "Accuracy: ", float((pred - truth).count_zero()) / 256
owl.wait_for_all()

To run the above code:
1. Copy and save it to /path/to/minerva/owl/apps/mnist as, for example, simple_mnist.py.
2. Download mnist_all.mat into the same folder.
3. Run python simple_mnist.py.
The mnist_io module provides a function, load_mb_from_mat, that loads minibatches as numpy.ndarray from a .mat file.
To convert a numpy.ndarray to an owl.NArray, use the owl.from_numpy function.
- ATTENTION: Minerva uses Fortran-style (column-major) arrays while numpy uses C-style (row-major) arrays, so the dimensions are reversed when converting a numpy.ndarray to an owl.NArray. Please read the documentation of this function here.
Minerva uses lazy evaluation, so most owl APIs are asynchronous. In the above example, without the final owl.wait_for_all() call, the main thread would exit while Minerva's worker threads are still computing in the backend, which leads to faults and errors. To avoid this, add a blocking call at the end of the program. For more information about blocking and non-blocking calls, please see this wiki page.
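
The submit-then-wait pattern can be illustrated with Python's standard library. This is only an analogy and says nothing about how Minerva is implemented: submit returns immediately, like a non-blocking owl call, and result() plays the role of a blocking call such as owl.wait_for_all(). The function slow_square is a stand-in for real work.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def slow_square(x):
    """Stand-in for a computation that runs on a worker thread."""
    time.sleep(0.05)
    return x * x


executor = ThreadPoolExecutor(max_workers=2)

# Non-blocking: each submit() returns a future right away,
# before the work has finished.
futures = [executor.submit(slow_square, n) for n in range(4)]

# Blocking: result() waits until the corresponding task is done.
results = [f.result() for f in futures]

executor.shutdown(wait=True)
```

Only after the blocking step is it safe to use the results or to let the program exit.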

Classify MNIST using a Convolutional Neural Network

Model

One input layer of size 28x28
One convolution layer:
- kernel: 5x5
- stride: 1x1
- num_filters: 16
One pooling layer:
- window: 2x2
- stride: 2x2
One convolution layer:
- kernel: 5x5
- stride: 1x1
- num_filters: 32
- padding: 2x2
One pooling layer:
- window: 3x3
- stride: 3x3
One classifier layer (softmax loss) of size 10
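
The classifier layer's input size follows from the layer settings above. The sketch below traces the spatial size through the network using the usual output-size formula, floor((size + 2 * pad - window) / stride) + 1, and assumes no padding wherever none is listed. The helper out_size is illustrative and not part of owl.

```python
def out_size(size, window, stride, pad=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * pad - window) // stride + 1


side = 28                              # input image is 28x28
side = out_size(side, 5, 1)            # conv 5x5, stride 1        -> 24
side = out_size(side, 2, 2)            # pool 2x2, stride 2        -> 12
side = out_size(side, 5, 1, pad=2)     # conv 5x5, stride 1, pad 2 -> 12
side = out_size(side, 3, 3)            # pool 3x3, stride 3        -> 4

num_filters = 32                       # channels after the second convolution
features = side * side * num_filters   # 4 * 4 * 32 = 512
```

Each sample therefore reaches the classifier as a vector of 512 values.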

Convolution ndarray format

Weight format for convolution: [kernel_width, kernel_height, in_channel, out_channel]
Bias format for convolution: [num_channels]
- ATTENTION: this differs from the bias format of a fully connected layer; see the example below.
Data format for convolution: [image_width, image_height, num_channels, batch_size]
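
To see how these formats relate to numpy arrays, recall that Minerva arrays are column-major while numpy arrays are row-major. Under that assumption, a column-major array of shape [28, 28, 1, 256] has the same memory layout as a row-major array of the reversed shape (256, 1, 28, 28). The following pure-Python sketch checks this for one element; both offset helpers are illustrative and not part of owl or numpy.

```python
def col_major_offset(index, shape):
    """Flat offset when the FIRST index varies fastest (Fortran style)."""
    offset, stride = 0, 1
    for i, n in zip(index, shape):
        offset += i * stride
        stride *= n
    return offset


def row_major_offset(index, shape):
    """Flat offset when the LAST index varies fastest (C style)."""
    offset, stride = 0, 1
    for i, n in zip(reversed(index), reversed(shape)):
        offset += i * stride
        stride *= n
    return offset


owl_shape = (28, 28, 1, 256)                # [width, height, channels, batch]
numpy_shape = tuple(reversed(owl_shape))    # (256, 1, 28, 28)

idx = (3, 5, 0, 17)                         # x=3, y=5, channel 0, sample 17
a = col_major_offset(idx, owl_shape)
b = row_major_offset(tuple(reversed(idx)), numpy_shape)
print(a, b)  # 13471 13471
```

Both offsets are equal, so reversing the dimension order describes the same bytes in memory.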

Algorithm step by step

Suppose the minibatch size is 256. Each minibatch has already been converted into two ndarrays, data and label, of size 28x28x1x256 and 10x256, respectively. Then,

Initialization:

w1 = owl.randn([5, 5, 1, 16], 0, 0.01)
w2 = owl.randn([5, 5, 16, 32], 0, 0.01)
w3 = owl.randn([10, 512], 0, 0.01)
b1 = owl.zeros([16])     # bias for convolution
b2 = owl.zeros([32])     # bias for convolution
b3 = owl.zeros([10, 1])  # bias for fully connected layer
conv1 = owl.conv.Convolver(pad_h=0, pad_w=0, stride_v=1, stride_h=1)
conv2 = owl.conv.Convolver(pad_h=2, pad_w=2, stride_v=1, stride_h=1)
pool1 = owl.conv.Pooler(h=2, w=2, stride_v=2, stride_h=2)
pool2 = owl.conv.Pooler(h=3, w=3, stride_v=3, stride_h=3)

owl.conv.Convolver and owl.conv.Pooler are two classes provided in the owl.conv module.

Feed-forward Propagation:

a1 = owl.elewise.relu(conv1.ff(data, w1, b1))
a2 = pool1.ff(a1)
a3 = owl.elewise.relu(conv2.ff(a2, w2, b2))
a4 = pool2.ff(a3)
a5 = owl.conv.softmax(w3 * a4.reshape([512, 256]) + b3)

Backward Propagation:

s5 = a5 - label
s4 = (w3.trans() * s5).reshape(a4.shape)
s3 = owl.elewise.relu_back(pool2.bp(s4, a4, a3), a3)
s2 = conv2.bp(s3, a2, w2)
s1 = owl.elewise.relu_back(pool1.bp(s2, a2, a1), a1)
# gradient
gw3 = s5 * a4.reshape([512, 256]).trans()
gb3 = s5.sum(1)
gw2 = conv2.weight_grad(s3, a2, w2)
gb2 = conv2.bias_grad(s3)
gw1 = conv1.weight_grad(s1, data, w1)
gb1 = conv1.bias_grad(s1)

Update: The same as in MLP example.

Run MNIST example

The examples above are already implemented, so you can run them directly through either the C++ or the Python interface.

C++ Example

Configure with BUILD_CXX_APPS=1 in configure.in.
Build Minerva.
Change to /path/to/minerva/release/apps; you should see three executables: mnist_mlp, mnist_cnn, and mnist_cnn_2gpu.
Download the pre-processed data here and extract it to the same folder.
Run ./mnist_mlp or one of the other executables.
You can also pass the --help flag to any of these applications to get help.

Python Example

Build and install Minerva and owl as described in Install Minerva.
Change to /path/to/minerva/owl/apps/mnist; you should see scripts such as mnist_mlp.py and mnist_cnn.py.
Download mnist_all.mat to that directory.
Run python mnist_mlp.py or one of the other scripts.
Pass --help for more information.
