Merge pull request #65 from acfr/feature/lbdn-examples
Feature/lbdn examples
nic-barbara authored May 15, 2023
2 parents 39b96ab + 94faa74 commit 54f9dd9
Showing 24 changed files with 1,251 additions and 201 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,2 +1,3 @@
**Manifest.toml**
docs/build/
**/.vscode/
2 changes: 2 additions & 0 deletions docs/Project.toml
@@ -1,4 +1,6 @@
[deps]
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589"
5 changes: 3 additions & 2 deletions docs/make.jl
@@ -15,10 +15,11 @@ makedocs(
"Contributing to the Package" => "introduction/developing.md",
],
"Examples" => Any[
"Image Classification" => "examples/lbdn.md",
"Fitting a Curve" => "examples/lbdn_curvefit.md",
"Image Classification" => "examples/lbdn_mnist.md",
"Reinforcement Learning" => "examples/rl.md",
"Nonlinear Control" => "examples/nonlinear_ctrl.md",
"PDE Observer" => "examples/pde_obsv.md",
"Nonlinear Control" => "examples/ren_ctrl.md",
],
"Library" => Any[
"Model Wrappers" => "lib/models.md",
Binary file added docs/src/assets/lbdn-mnist/lbdn_mnist.bson
Binary file not shown.
Binary file added docs/src/assets/lbdn-mnist/mnist_data.bson
Binary file not shown.
3 changes: 0 additions & 3 deletions docs/src/examples/lbdn.md

This file was deleted.

129 changes: 129 additions & 0 deletions docs/src/examples/lbdn_curvefit.md
@@ -0,0 +1,129 @@
# Fitting a Curve with LBDN

For our first example, let's fit a Lipschitz-bounded Deep Network (LBDN) to a curve in one dimension. Consider the multiple sine-wave function below as an example.
```math
f(x) = \sin(x) + \frac{1}{N}\sin(Nx)
```
Our aim is to demonstrate how to train a model in `RobustNeuralNetworks.jl`, and how to set constraints to ensure the model naturally satisfies some user-defined robustness certificate. We'll follow the steps below to fit an LBDN model to our function ``f(x)``:
1. Generate training data
2. Define a model with a Lipschitz bound (maximum slope) of `1.0`
3. Define a loss function
4. Train the model to minimise the loss function
5. Examine the trained model

## 1. Generate training data

Let's generate training data for ``f(x)`` on the interval ``[0, 2\pi]`` and choose ``N = 5`` as an example. We `zip()` the data up into a sequence of tuples `(x,y)` to make training with `Flux.jl` easier in Step 4.

```@example curve_fit
# Function to estimate
N = 5
f(x) = sin(x)+(1/N)*sin(N*x)
# Training data
dx = 0.1
xs = 0:dx:2π
ys = f.(xs)
data = zip(xs,ys)
```

## 2. Define a model

Since we are only dealing with a simple one-dimensional curve, we can afford to use a small model. Let's choose an LBDN with four hidden layers, each with 15 neurons, and a Lipschitz bound of `γ = 1.0`. This guarantees that the slope of the model between any two points is at most `1.0` by construction.

```@example curve_fit
using Random
using RobustNeuralNetworks
# Random seed for consistency
rng = MersenneTwister(42)
# Model specification
nu = 1 # Number of inputs
ny = 1 # Number of outputs
nh = fill(15,4) # 4 hidden layers, each with 15 neurons
γ = 1 # Lipschitz bound of 1
# Set up model: define parameters, then create model
model_ps = DenseLBDNParams{Float64}(nu, nh, ny, γ; rng=rng)
model = DiffLBDN(model_ps)
```

Notice that we first constructed the model parameters `model_ps` defining a (dense) LBDN with [`DenseLBDNParams`](@ref) and then created a callable `model` with the [`DiffLBDN`](@ref) wrapper. In `RobustNeuralNetworks.jl`, model parameterisations are separated from the "explicit" definition of the model used for evaluation on data. For convenience, the [`DiffLBDN`](@ref) wrapper combines the two into a single model structure more familiar to [`Flux.jl`](https://fluxml.ai/) users. See the [Package Overview](@ref) for more information.
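
If you'd rather keep the parameterisation and the explicit model separate (for example, when the learnable parameters change rarely), a rough sketch of that workflow is shown below. It assumes the [`LBDN`](@ref) wrapper can be constructed directly from the same `model_ps`; see the [Package Overview](@ref) for the exact API.

```julia
# Sketch only: build an explicit (non-Flux-style) model from the
# direct parameterisation defined above (assumes `model_ps` exists).
explicit_model = LBDN(model_ps)

# Evaluate on a single input (a 1-element vector, since nu = 1)
y = explicit_model([0.5])
```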

## 3. Define a loss function

Let's stick to a simple loss function based on the mean-squared error (MSE) for this example. All [`AbstractLBDN`](@ref) models take an `AbstractArray` as their input, which is why `x` and `y` are wrapped in vectors.
```@example curve_fit
# Loss function
loss(model,x,y) = Flux.mse(model([x]),[y])
```

## 4. Train the model

Our objective is to minimise the MSE loss function with a model that has a Lipschitz bound no greater than `1.0`. Let's set up a callback function to periodically check the fit error and slope of our model during training.

```@example curve_fit
using Flux
# Check fit error/slope during training
mse(model, xs, ys) = sum(loss.((model,), xs, ys)) / length(xs)
lip(model, xs, dx) = maximum(abs.(diff(model(xs'), dims=2)))/dx
# Callback function to show results while training
function progress(model, iter, xs, ys, dx)
fit_error = round(mse(model, xs, ys), digits=4)
slope = round(lip(model, xs, dx), digits=4)
@show iter fit_error slope
println()
end
```

We'll train the model for 200 epochs with a learning rate of `lr = 2e-4`. We'll also use the [`Adam`](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.Adam) optimiser from `Flux.jl` and the default [`Flux.train!`](https://fluxml.ai/Flux.jl/stable/training/reference/#Flux.Optimise.train!-NTuple{4,%20Any}) method.

```@example curve_fit
# Define hyperparameters and optimiser
num_epochs = 200
lr = 2e-4
opt_state = Flux.setup(Adam(lr), model)
# Train the model
for i in 1:num_epochs
Flux.train!(loss, model, data, opt_state)
(i % 100 == 0) && progress(model, i, xs, ys, dx)
end
```

Note that this training loop is for demonstration only. For a better fit, and indeed on more complex problems, we strongly recommend:
- Increasing the number of training epochs
- Defining your own [training loop](https://fluxml.ai/Flux.jl/stable/training/training/) (a rough sketch is given after this list)
- Using [ParameterSchedulers.jl](https://github.com/FluxML/ParameterSchedulers.jl) to vary the learning rate.
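
As a rough guide, a custom loop might look something like the sketch below. It reuses `model`, `data`, `loss` and `progress` from above; the epoch count and learning rate are placeholder values rather than tuned recommendations.

```julia
# Sketch of a custom training loop (not a tuned recipe)
opt_state = Flux.setup(Adam(1e-3), model)
for epoch in 1:1000
    for (x, y) in data
        # Loss and gradients with respect to the model parameters
        l, grads = Flux.withgradient(m -> loss(m, x, y), model)
        Flux.update!(opt_state, model, grads[1])
    end
    (epoch % 100 == 0) && progress(model, epoch, xs, ys, dx)
end
```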

## 5. Examine the trained model

We can now plot the results to see what our model looks like.

```@example curve_fit
using CairoMakie
# Create a figure
f1 = Figure(resolution = (600, 400))
ax = Axis(f1[1,1], xlabel="x", ylabel="y")
ŷ = map(x -> model([x])[1], xs)
lines!(xs, ys, label = "Data")
lines!(xs, ŷ, label = "LBDN")
axislegend(ax)
save("lbdn_curve_fit.svg", f1)
```
![](lbdn_curve_fit.svg)

The model roughly approximates the multiple sine-wave ``f(x)`` while keeping its Lipschitz constant (the maximum slope on the graph) below 1.

```@example curve_fit
# Estimate Lipschitz lower-bound
lip(model, xs, dx) = maximum(abs.(diff(model(xs'), dims=2)))/dx
println("Empirical lower Lipschitz bound: ", round(lip(model, xs, dx); digits=2))
```

The benefit of using an LBDN is that we have full control over the Lipschitz bound, and we can still use standard unconstrained gradient descent tools like `Flux.train!` to train our models. For examples in which setting the Lipschitz bound improves model performance and robustness, see [Image Classification with LBDN](@ref) and [Reinforcement Learning with LBDN](@ref).
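
For instance (a sketch under the same setup as above), loosening the bound only requires constructing new parameters with a larger `γ` and re-running the training loop from Step 4:

```julia
# Sketch: an LBDN with a looser Lipschitz bound of 10.0 (same architecture)
loose_ps = DenseLBDNParams{Float64}(nu, nh, ny, 10.0; rng=rng)
loose_model = DiffLBDN(loose_ps)
# ...retraining with the loop from Step 4 would allow a much steeper,
# closer fit to f(x), at the cost of a weaker robustness certificate.
```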
193 changes: 193 additions & 0 deletions docs/src/examples/lbdn_mnist.md
@@ -0,0 +1,193 @@
# Image Classification with LBDN

Our next example features an LBDN trained to classify the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. We showed in [Wang & Manchester (2023)](https://doi.org/10.48550/arXiv.2301.11526) that tuning the built-in Lipschitz bounds of LBDNs is an efficient way of designing neural networks that are robust to adversarial attacks. In this example, we will demonstrate how to train an LBDN model on the MNIST dataset with the following steps:
1. Load the training and test data
2. Define a Lipschitz-bounded model
3. Define a loss function
4. Train the model to minimise the loss function
5. Evaluate the trained model

## 1. Load the data

Let's start by loading the training and test data. [`MLDatasets.jl`](https://juliaml.github.io/MLDatasets.jl/stable/) contains a number of common machine-learning datasets, including the [MNIST dataset](https://juliaml.github.io/MLDatasets.jl/stable/datasets/vision/#MLDatasets.MNIST). To load the full dataset of 60,000 training images and 10,000 test images, one would run the following code.

```julia
using MLDatasets: MNIST

# Get MNIST training and test data
T = Float64
x_train, y_train = MNIST(T, split=:train)[:]
x_test, y_test = MNIST(T, split=:test)[:]
```

For the purposes of this example, we'll only load a small subset of the dataset.
```@example mnist
using BSON
data = BSON.load("../../src/assets/lbdn-mnist/mnist_data.bson")
x_train, y_train = data["x_train"], data["y_train"]
x_test, y_test = data["x_test"], data["y_test"]
println("Train/test features: ", size(x_train), " / ", size(x_test))
println("Train/test labels: ", size(y_train), " / ", size(y_test))
```

The features `x_train` and `x_test` are three-dimensional arrays in which each 28×28 slice contains the pixel data for a single handwritten digit from 0 to 9. The labels `y_train` and `y_test` are vectors containing the classification of each image as a number from 0 to 9. We can convert each of these to an input/output format better suited to training with [`Flux.jl`](https://fluxml.ai/).

```@example mnist
using Flux
using Flux: OneHotMatrix
# Reshape features for model input
x_train = Flux.flatten(x_train)
x_test = Flux.flatten(x_test)
# Encode categorical variables on output
y_train = Flux.onehotbatch(y_train, 0:9)
y_test = Flux.onehotbatch(y_test, 0:9)
println("Features: ", size(x_test))
println("Labels: ", size(y_test))
```

Features are now stored in a `Matrix` where each column contains the pixel data from a single image, and the labels have been converted to a `OneHotMatrix` in which each column contains a single 1 in the row corresponding to the image's class (e.g. row 4 for an image of the digit 3, since the classes 0–9 map to rows 1–10).
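
To make the one-hot layout concrete, here's a small illustrative snippet (separate from the training pipeline):

```julia
# Illustrative only: one-hot encode the labels 3 and 7 over the classes 0:9.
# The digit 3 sets row 4 to 1, and the digit 7 sets row 8 to 1.
Flux.onehotbatch([3, 7], 0:9)
```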


## 2. Define a model

We can now construct an LBDN model to train on the MNIST dataset. In our [paper](https://doi.org/10.48550/arXiv.2301.11526) we use LBDN models with three hidden layers of (256, 356, 128) neurons (respectively) to achieve a classification accuracy of approximately 99% on the full MNIST dataset. For this example, we'll consider a smaller network and set a Lipschitz bound of `γ = 5.0` to demonstrate the method.

```@example mnist
using Random
using RobustNeuralNetworks
# Random seed for consistency
rng = MersenneTwister(42)
# Model specification
nu = 28*28 # Number of inputs (size of image)
ny = 10 # Number of outputs (possible classifications)
nh = fill(64,2) # 2 hidden layers, each with 64 neurons
γ = 5 # Lipschitz bound of 5.0
# Set up model: define parameters, then create model
model_ps = DenseLBDNParams{Float64}(nu, nh, ny, γ; rng=rng)
model = Chain(DiffLBDN(model_ps), Flux.softmax)
println(typeof(model))
```

The `model` contains a callable [`DiffLBDN`](@ref) model constructed from its direct parameterisation, which is defined by an instance of [`DenseLBDNParams`](@ref) (see the [Package Overview](@ref) for more detail). The output is converted to a probability distribution using a [`softmax`](https://fluxml.ai/Flux.jl/stable/models/nnlib/#NNlib.softmax) layer. Note that all [`AbstractLBDN`](@ref) models can be combined with traditional neural network layers using [`Flux.Chain`](https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.Chain).
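
As a quick sanity check (a sketch only, not part of the documented pipeline), the untrained model can already be evaluated on a batch of images; each output column should sum to 1 thanks to the `softmax` layer:

```julia
# Sketch: evaluate the (untrained) model on the first 5 test images
ŷ0 = model(x_test[:, 1:5])
println(size(ŷ0))          # (10, 5)
println(sum(ŷ0; dims=1))   # each entry ≈ 1.0
```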


## 3. Define a loss function

A typical loss function for training on datasets with discrete labels is the cross entropy loss. We can use the [`crossentropy`](https://fluxml.ai/Flux.jl/stable/models/losses/#Flux.Losses.crossentropy) loss function shipped with `Flux.jl`.

```@example mnist
# Loss function
loss(model,x,y) = Flux.crossentropy(model(x), y)
```


## 4. Train the model

Before training the model to minimise the cross entropy loss, let's set up a callback function to evaluate the model performance during training.

```@example mnist
using Statistics
# Check test accuracy during training
compare(y::OneHotMatrix, ŷ) = maximum(ŷ, dims=1) .== maximum(y.*ŷ, dims=1)
accuracy(model, x, y::OneHotMatrix) = mean(compare(y, model(x)))
# Callback function to show results while training
function progress(model, iter)
train_loss = round(loss(model, x_train, y_train), digits=4)
test_acc = round(accuracy(model, x_test, y_test), digits=4)
@show iter train_loss test_acc
println()
end
```

Let's train the model over 600 epochs using two learning rates: `1e-3` for the first 300, and `1e-4` for the last 300. In both cases, we'll use the [`Adam`](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.Adam) optimiser and the default [`Flux.train!`](https://fluxml.ai/Flux.jl/stable/training/reference/#Flux.Optimise.train!-NTuple{4,%20Any}) method. Once the model has been trained, we can save it for later with the [`BSON`](https://github.com/JuliaIO/BSON.jl) package.

```julia
using BSON

# Define hyperparameters and zip up data
num_epochs = 300
lrs = [1e-3, 1e-4]
data = [(x_train, y_train)]

# Train with the Adam optimiser, and display progress every 50 steps
for k in eachindex(lrs)
opt_state = Flux.setup(Adam(lrs[k]), model)
for i in 1:num_epochs
Flux.train!(loss, model, data, opt_state)
(i % 50 == 0) && progress(model, i)
end
end

# Save the model for later
bson("lbdn_mnist.bson", Dict("model" => model))
```

Running the training loop can take a few minutes, so here's one we prepared earlier. The model was trained on the full MNIST dataset (60,000 training images, 10,000 test images).

```@example mnist
model = BSON.load("../../src/assets/lbdn-mnist/lbdn_mnist.bson")["model"]
println(typeof(model))
```

## 5. Evaluate the trained model

Our final model has a test accuracy of about 99% on this small subset of the MNIST dataset. For those interested, it achieves 97.5% accuracy on the full 10,000-image test set. We could improve this further by (for example) using a larger model, training the model for longer, or fine-tuning the learning rate.

```@example mnist
# Print final results
train_acc = accuracy(model, x_train, y_train)*100
test_acc = accuracy(model, x_test, y_test)*100
println("Training accuracy: $(round(train_acc,digits=2))%")
println("Test accuracy: $(round(test_acc,digits=2))%")
```

Let's have a look at some examples too.
```@example mnist
using CairoMakie
# Make a couple of example plots
indx = rand(rng, 1:100, 3)
f1 = Figure(resolution = (800, 300))
for i in eachindex(indx)
# Get data and do prediction
x = x_test[:,indx[i]]
y = y_test[:,indx[i]]
ŷ = model(x)
# Reshape data for plotting
xmat = reshape(x, 28, 28)
yval = (0:9)[y][1]
ŷval = (0:9)[ŷ .== maximum(ŷ)][1]
# Plot results
ax, _ = image(
f1[1,i], xmat, axis=(
yreversed = true,
aspect = DataAspect(),
title = "True class: $(yval), Prediction: $(ŷval)"
)
)
# Format the plot
ax.xticksvisible = false
ax.yticksvisible = false
ax.xticklabelsvisible = false
ax.yticklabelsvisible = false
end
display(f1)
save("lbdn_mnist.svg", f1)
```
![](lbdn_mnist.svg)
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/src/index.md
@@ -25,7 +25,7 @@ Depth = 1
## Examples

```@contents
Pages = ["examples/lbdn.md", "examples/rl.md", "examples/nonlinear_ctrl.md", "examples/pde_obsv.md"]
Pages = ["examples/lbdn_curvefit.md", "examples/lbdn_mnist.md", "examples/rl.md", "examples/pde_obsv.md", "examples/ren_ctrl.md"]
Depth = 1
```

11 changes: 4 additions & 7 deletions docs/src/introduction/getting_started.md
@@ -19,7 +19,7 @@ using RobustNeuralNetworks
# Setup
rng = MersenneTwister(42)
batches = 10
nu, nx, nv, ny = 4, 10, 20, 2
nu, nx, nv, ny = 4, 10, 20, 1
γ = 1
# Construct a REN
@@ -34,14 +34,11 @@ u0 = randn(rng, ren.nu, batches)
x1, y1 = ren(x0, u0)
# Print results for testing
yout = round.(y1; digits=2)
println(yout[1,:])
println(yout[2,:])
println(round.(y1; digits=2))
# output
[0.73, 0.72, -0.53, 0.25, 0.84, 0.97, 0.96, 1.13, 0.87, 1.07]
[1.13, 1.07, 1.44, 0.83, 0.94, 1.26, 0.86, 0.8, 0.96, 0.86]
[0.23 -0.01 -0.06 0.15 -0.03 -0.11 0.0 0.42 0.24 0.22]
```

See [Package Overview](@ref) for a detailed walkthrough of this example.
See [Package Overview](@ref) for a detailed walkthrough of this example. For an example of training models from `RobustNeuralNetworks.jl`, we recommend starting with [Fitting a Curve with LBDN](@ref).
2 changes: 1 addition & 1 deletion docs/src/lib/model_params.md
@@ -1,5 +1,5 @@
```@index
Pages = ["ren_params.md"]
Pages = ["model_params.md"]
```

# Model Parameterisations