Merge pull request #65 from acfr/feature/lbdn-examples

Feature/lbdn examples
acfr · May 15, 2023 · 54f9dd9 · 54f9dd9
2 parents 39b96ab + 94faa74
commit 54f9dd9
Show file tree

Hide file tree

Showing 24 changed files with 1,251 additions and 201 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,2 +1,3 @@
 **Manifest.toml**
 docs/build/
+**/.vscode/
diff --git a/docs/Project.toml b/docs/Project.toml
@@ -1,4 +1,6 @@
 [deps]
+BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
+CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
 Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
 Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
 LiveServer = "16fef848-5104-11e9-1b77-fb7a48bbb589"

diff --git a/docs/make.jl b/docs/make.jl
@@ -15,10 +15,11 @@ makedocs(
             "Contributing to the Package" => "introduction/developing.md",
         ],
         "Examples" => Any[
-            "Image Classification" => "examples/lbdn.md",
+            "Fitting a Curve" => "examples/lbdn_curvefit.md",
+            "Image Classification" => "examples/lbdn_mnist.md",
             "Reinforcement Learning" => "examples/rl.md",
-            "Nonlinear Control" => "examples/nonlinear_ctrl.md",
             "PDE Observer" => "examples/pde_obsv.md",
+            "Nonlinear Control" => "examples/ren_ctrl.md",
         ],
         "Library" => Any[
             "Model Wrappers" => "lib/models.md",

diff --git a/docs/src/assets/lbdn-mnist/lbdn_mnist.bson b/docs/src/assets/lbdn-mnist/lbdn_mnist.bson
diff --git a/docs/src/assets/lbdn-mnist/mnist_data.bson b/docs/src/assets/lbdn-mnist/mnist_data.bson
diff --git a/docs/src/examples/lbdn.md b/docs/src/examples/lbdn.md
diff --git a/docs/src/examples/lbdn_curvefit.md b/docs/src/examples/lbdn_curvefit.md
@@ -0,0 +1,129 @@
+# Fitting a Curve with LBDN
+
+For our first example, let's fit a Lipschitz-bounded Deep Network (LBDN) to a curve in one dimension. Consider the multiple sine-wave function below as an example.
+```math
+f(x) = \sin(x) + \frac{1}{N}\sin(Nx)
+```
+Our aim is to demonstrate how to train a model in `RobustNeuralNetworks.jl`, and how to set constraints to ensure the model naturally satisfies some user-defined robustness certificate. We'll follow the steps below to fit an LBDN model to our function ``f(x)``:
+1. Generate training data
+2. Define a model with a Lipshitz bound (maximum slope) of `1.0`
+3. Define a loss function
+4. Train the model to minimise the loss function
+5. Examine the trained model
+
+## 1. Generate training data
+
+Let's generate training data for ``f(x)`` on the interval ``[0, 2\pi]`` and choose ``N = 5`` as an example. We `zip()` the data up into a sequence of tuples `(x,y)` to make training with `Flux.jl` easier in Step 4.
+
+```@example curve_fit
+# Function to estimate
+N = 5
+f(x) = sin(x)+(1/N)*sin(N*x)
+
+# Training data
+dx = 0.1
+xs = 0:dx:2π
+ys = f.(xs)
+data = zip(xs,ys)
+```
+
+## 2. Define a model
+
+Since we are only dealing with a simple one-dimensional curve, we can afford to use a small model. Let's choose an LBDN with four hidden layers, each with 15 neurons, and a Lipschitz bound of `γ = 1.0`. This means that the maximum slope the model can achieve between two points will be exactly `1.0` by construction.
+
+```@example curve_fit
+using Random
+using RobustNeuralNetworks
+
+# Random seed for consistency
+rng = MersenneTwister(42)
+
+# Model specification
+nu = 1                  # Number of inputs
+ny = 1                  # Number of outputs
+nh = fill(15,4)         # 4 hidden layers, each with 15 neurons
+γ = 1                   # Lipschitz bound of 1
+
+# Set up model: define parameters, then create model
+model_ps = DenseLBDNParams{Float64}(nu, nh, ny, γ; rng=rng)
+model = DiffLBDN(model_ps)
+```
+
+Notice that we first constructed the model parameters `model_ps` defining a (dense) LBDN with [`DenseLBDNParams`](@ref) and then created a callable `model` with the [`DiffLBDN`](@ref) wrapper. In `RobustNeuralNetworks.jl`, model parameterisations are separated from the "explicit" definition of the model used for evaluation on data. The [`DiffLBDN`](@ref) model wrapper combines the two together in a model structure more familiar to [`Flux.jl`](https://fluxml.ai/) users for convenience. See the [Package Overview](@ref) for more information.
+
+## 3. Define a loss function
+
+Let's stick to a simple loss function based on the mean-squared error (MSE) for this example. All [`AbstractLBDN`](@ref) models take an `AbstractArray` as their input, which is why `x` and `y` are wrapped in vectors.
+```@example curve_fit
+# Loss function
+loss(model,x,y) = Flux.mse(model([x]),[y]) 
+```
+
+## 4. Train the model
+
+Our objective is to minimise the MSE loss function with a model that has a Lipschitz bound no greater than `1.0`. Let's set up a callback function to check the fit error and slope of our model at each training epoch.
+
+```@example curve_fit
+using Flux
+
+# Check fit error/slope during training
+mse(model, xs, ys) = sum(loss.((model,), xs, ys)) / length(xs)
+lip(model, xs, dx) = maximum(abs.(diff(model(xs'), dims=2)))/dx
+
+# Callback function to show results while training
+function progress(model, iter, xs, ys, dx) 
+    fit_error = round(mse(model, xs, ys), digits=4)
+    slope = round(lip(model, xs, dx), digits=4)
+    @show iter fit_error slope
+    println()
+end
+```
+
+We'll train the model for 200 training epochs a learning rate of `lr = 2e-4`. We'll also use the [`Adam`](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.Adam) optimiser from `Flux.jl` and the default [`Flux.train!`](https://fluxml.ai/Flux.jl/stable/training/reference/#Flux.Optimise.train!-NTuple{4,%20Any}) method.
+
+```@example curve_fit
+# Define hyperparameters and optimiser
+num_epochs = 200
+lr = 2e-4
+opt_state = Flux.setup(Adam(lr), model)
+
+# Train the model
+for i in 1:num_epochs
+    Flux.train!(loss, model, data, opt_state)
+    (i % 100 == 0) && progress(model, i, xs, ys, dx)
+end
+```
+
+Note that this training loop is for demonstration only. For a better fit, and indeed on more complex problems, we strongly recommend:
+- Increasing the number of training epochs
+- Defining your own [training loop](https://fluxml.ai/Flux.jl/stable/training/training/) 
+- Using [ParameterSchedulers.jl](https://github.com/FluxML/ParameterSchedulers.jl) to vary the learning rate.
+
+## 5. Examine the trained model
+
+We can now plot the results to see what our model looks like.
+
+```@example curve_fit
+using CairoMakie
+
+# Create a figure
+f1 = Figure(resolution = (600, 400))
+ax = Axis(f1[1,1], xlabel="x", ylabel="y")
+
+ŷ = map(x -> model([x])[1], xs)
+lines!(xs, ys, label = "Data")
+lines!(xs, ŷ, label = "LBDN")
+axislegend(ax)
+save("lbdn_curve_fit.svg", f1)
+```
+![](lbdn_curve_fit.svg)
+
+The model roughly approximates the multiple sine-wave ``f(x)``, but maintains a maximum Lipschitz constant (slope on the graph) below 1. 
+
+```@example curve_fit
+# Estimate Lipschitz lower-bound
+lip(model, xs, dx) = maximum(abs.(diff(model(xs'), dims=2)))/dx
+println("Empirical lower Lipschitz bound: ", round(lip(model, xs, dx); digits=2))
+```
+
+The benefit of using an LBDN is that we have full control over the Lipschitz bound, and can still use standard unconstrained gradient descent tools lile `Flux.train!` to train our models. For examples in which setting the Lipschitz bound improves model performance and robustness, see [Image Classification with LBDN](@ref) and [Reinforcement Learning with LBDN](@ref).
diff --git a/docs/src/examples/lbdn_mnist.md b/docs/src/examples/lbdn_mnist.md
@@ -0,0 +1,193 @@
+# Image Classification with LBDN
+
+Our next example features an LBDN trained to classify the [MNIST](https://en.wikipedia.org/wiki/MNIST_database) dataset. We showed in [Wang & Manchester (2023)](https://doi.org/10.48550/arXiv.2301.11526) that tuning the built-in Lipschitz bounds of LBDNs is an efficient way of designing neural networks that are robust to adversarial attacks. In this example, we will demonstrate how to train an LBDN model on the MNIST dataset with the following steps:
+1. Load the training and test data
+2. Define a Lipschitz-bounded model
+3. Define a loss function
+4. Train the model to minimise the loss function
+5. Evaluate the trained model
+
+## 1. Load the data
+
+Let's start by loading the training and test data. [`MLDatasets.jl`](https://juliaml.github.io/MLDatasets.jl/stable/) contains a number of common machine-learning datasets, including the [MNIST dataset](https://juliaml.github.io/MLDatasets.jl/stable/datasets/vision/#MLDatasets.MNIST). To load the full dataset of 60,000 training images and 10,000 test images, one would run the following code.
+
+```julia
+using MLDatasets: MNIST
+
+# Get MNIST training and test data
+T = Float64
+x_train, y_train = MNIST(T, split=:train)[:]
+x_test,  y_test  = MNIST(T, split=:test)[:]
+```
+
+For the purposes of this example, we'll only load a small subset of the dataset.
+```@example mnist
+using BSON
+
+data = BSON.load("../../src/assets/lbdn-mnist/mnist_data.bson")
+x_train, y_train = data["x_train"], data["y_train"]
+x_test,  y_test  = data["x_test"],  data["y_test"]
+
+println("Train/test features: ", size(x_train), " / ", size(x_test))
+println("Train/test labels:   ", size(y_train), " / ", size(y_test))
+```
+
+The feature matrices `x_train` and `x_test` are three-dimensional arrays where each 28x28 layer contains pixel data for a single handwritten number from 0 to 9. The labels `y_train` and `y_test` are vectors containing the classification of each image as a number from 0 to 9. We can convert each of these to an input/output format better suited to training with [`Flux.jl`](https://fluxml.ai/).
+
+```@example mnist
+using Flux
+using Flux: OneHotMatrix
+
+# Reshape features for model input
+x_train = Flux.flatten(x_train)
+x_test  = Flux.flatten(x_test)
+
+# Encode categorical variables on output
+y_train = Flux.onehotbatch(y_train, 0:9)
+y_test  = Flux.onehotbatch(y_test,  0:9)
+
+println("Features: ", size(x_test))
+println("Labels:   ", size(y_test))
+```
+
+Features are now stored in a `Matrix` where each column contains pixel data from a single image, and the labels have been converted to a `OneHotMatrix` where each column contains a 1 in the row corresponding to the image's classification (eg: row 2 for an image showing the number 3).
+
+
+## 2. Define a model
+
+We can now construct an LBDN model to train on the MNIST dataset. In our [paper](https://doi.org/10.48550/arXiv.2301.11526) we use LBDN models with three hidden layers of (256, 356, 128) neurons (respectively) to achieve a classification accuracy of approximately 99% on the full MNIST dataset. For this example, we'll consider a smaller network and set a Lipschitz bound of `γ = 5.0` to demonstrate the method.
+
+```@example mnist
+using Random
+using RobustNeuralNetworks
+
+# Random seed for consistency
+rng = MersenneTwister(42)
+
+# Model specification
+nu = 28*28              # Number of inputs (size of image)
+ny = 10                 # Number of outputs (possible classifications)
+nh = fill(64,2)         # 2 hidden layers, each with 64 neurons
+γ  = 5                  # Lipschitz bound of 5.0
+
+# Set up model: define parameters, then create model
+model_ps = DenseLBDNParams{Float64}(nu, nh, ny, γ; rng=rng)
+model = Chain(DiffLBDN(model_ps), Flux.softmax)
+
+println(typeof(model))
+```
+
+The `model` contains a callable [`DiffLBDN`](@ref) model constructed from its direct parameterisation, which is defined by an instance of [`DenseLBDNParams`](@ref) (see the [Package Overview](@ref) for more detail). The output is converted to a probability distribution using a [`softmax`](https://fluxml.ai/Flux.jl/stable/models/nnlib/#NNlib.softmax) layer. Note that all [`AbstractLBDN`](@ref) models can be combined with traditional neural network layers using [`Flux.Chain`](https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.Chain).
+
+
+## 3. Define a loss function
+
+A typical loss function for training on datasets with discrete labels is the cross entropy loss. We can use the [`crossentropy`](https://fluxml.ai/Flux.jl/stable/models/losses/#Flux.Losses.crossentropy) loss function shipped with `Flux.jl`.
+
+```@example mnist
+# Loss function
+loss(model,x,y) = Flux.crossentropy(model(x), y)
+```
+
+
+## 4. Train the model
+
+Before training the model to minimise the cross entropy loss, let's set up a callback function to evaluate the model performance during training.
+
+```@example mnist
+using Statistics
+
+# Check test accuracy during training
+compare(y::OneHotMatrix, ŷ) = maximum(ŷ, dims=1) .== maximum(y.*ŷ, dims=1)
+accuracy(model, x, y::OneHotMatrix) = mean(compare(y, model(x)))
+
+# Callback function to show results while training
+function progress(model, iter)
+    train_loss = round(loss(model, x_train, y_train), digits=4)
+    test_acc = round(accuracy(model, x_test, y_test), digits=4)
+    @show iter train_loss test_acc
+    println()
+end
+```
+
+Let's train the model over 600 epochs using two learning rates: `1e-3` for the first 300, and `1e-4` for the last 300. In both cases, we'll use the [`Adam`](https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.Adam) optimiser and the default [`Flux.train!`](https://fluxml.ai/Flux.jl/stable/training/reference/#Flux.Optimise.train!-NTuple{4,%20Any}) method. Once the model has been trained, we can save it for later with the [`BSON`](https://github.com/JuliaIO/BSON.jl) package.
+
+```julia
+using BSON
+
+# Define hyperparameters and zip up data
+num_epochs = 300
+lrs = [1e-3, 1e-4]
+data = [(x_train, y_train)]
+
+# Train with the Adam optimiser, and display progress every 50 steps
+for k in eachindex(lrs)
+    opt_state = Flux.setup(Adam(lrs[k]), model)
+    for i in 1:num_epochs
+        Flux.train!(loss, model, data, opt_state)
+        (i % 50 == 0) && progress(model, i)
+    end
+end
+
+# Save the model for later
+bson("lbdn_mnist.bson", Dict("model" => model))
+```
+
+Running the training loop can take a few minutes, so here's one we prepared earlier. The model was trained on the full MNIST dataset (60,000 training images, 10,000 test images).
+
+```@example mnist
+model = BSON.load("../../src/assets/lbdn-mnist/lbdn_mnist.bson")["model"]
+println(typeof(model))
+```
+
+## 5. Evaluate the trained model
+
+Our final model has a test accuracy of about 99% on this small subset of the MNIST dataset. For those interested, it achieves 97.5% accuracy on the full 10,000-image test set. We could improve this further by (for example) using a larger model, training the model for longer, or fine-tuning the learning rate. 
+
+```@example mnist
+# Print final results
+train_acc = accuracy(model, x_train, y_train)*100
+test_acc  = accuracy(model, x_test,  y_test)*100
+println("Training accuracy: $(round(train_acc,digits=2))%")
+println("Test accuracy:     $(round(test_acc,digits=2))%")
+```
+
+Let's have a look at some examples too.
+```@example mnist
+using CairoMakie
+
+# Make a couple of example plots
+indx = rand(rng, 1:100, 3)
+f1 = Figure(resolution = (800, 300))
+for i in eachindex(indx)
+
+    # Get data and do prediction
+    x = x_test[:,indx[i]]
+    y = y_test[:,indx[i]]
+    ŷ = model(x)
+
+    # Reshape data for plotting
+    xmat = reshape(x, 28, 28)
+    yval = (0:9)[y][1]
+    ŷval = (0:9)[ŷ .== maximum(ŷ)][1]
+
+    # Plot results
+    ax, _ = image(
+        f1[1,i], xmat, axis=(
+            yreversed = true, 
+            aspect = DataAspect(), 
+            title = "True class: $(yval), Prediction: $(ŷval)"
+        )
+    )
+
+    # Format the plot
+    ax.xticksvisible = false
+    ax.yticksvisible = false
+    ax.xticklabelsvisible = false
+    ax.yticklabelsvisible = false
+
+end
+display(f1)
+save("lbdn_mnist.svg", f1)
+```
+![](lbdn_mnist.svg)
diff --git a/docs/src/examples/nonlinear_ctrl.md → docs/src/examples/ren_ctrl.md b/docs/src/examples/nonlinear_ctrl.md → docs/src/examples/ren_ctrl.md
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -25,7 +25,7 @@ Depth = 1
 ## Examples
 
 ```@contents
-Pages = ["examples/lbdn.md", "examples/rl.md", "examples/nonlinear_ctrl.md", "examples/pde_obsv.md"]
+Pages = ["examples/lbdn_curvefit.md", "examples/lbdn_mnist.md", "examples/rl.md", "examples/pde_obsv.md", "examples/ren_ctrl.md"]
 Depth = 1
 ```
 

diff --git a/docs/src/introduction/getting_started.md b/docs/src/introduction/getting_started.md
@@ -19,7 +19,7 @@ using RobustNeuralNetworks
 # Setup
 rng = MersenneTwister(42)
 batches = 10
-nu, nx, nv, ny = 4, 10, 20, 2
+nu, nx, nv, ny = 4, 10, 20, 1
 γ = 1
 
 # Construct a REN
@@ -34,14 +34,11 @@ u0 = randn(rng, ren.nu, batches)
 x1, y1 = ren(x0, u0)
 
 # Print results for testing
-yout = round.(y1; digits=2)
-println(yout[1,:])
-println(yout[2,:])
+println(round.(y1; digits=2))
 
 # output
 
-[0.73, 0.72, -0.53, 0.25, 0.84, 0.97, 0.96, 1.13, 0.87, 1.07]
-[1.13, 1.07, 1.44, 0.83, 0.94, 1.26, 0.86, 0.8, 0.96, 0.86]
+[0.23 -0.01 -0.06 0.15 -0.03 -0.11 0.0 0.42 0.24 0.22]
 ```
 
-See [Package Overview](@ref) for a detailed walkthrough of this example.
+See [Package Overview](@ref) for a detailed walkthrough of this example. For a detailed example of training models from `RobustNeuralNetworks.jl`, we recommend starting with [Fitting a Curve with LBDN](@ref).
diff --git a/docs/src/lib/model_params.md b/docs/src/lib/model_params.md
@@ -1,5 +1,5 @@
 ```@index
-Pages = ["ren_params.md"]
+Pages = ["model_params.md"]
 ```
 
 # Model Parameterisations