diff --git a/docs/src/assets/lbdn-mnist/lbdn_mnist.svg b/docs/src/assets/lbdn-mnist/lbdn_mnist.svg
index 8383238..71f4c61 100644
--- a/docs/src/assets/lbdn-mnist/lbdn_mnist.svg
+++ b/docs/src/assets/lbdn-mnist/lbdn_mnist.svg
@@ -2,531 +2,411 @@
[Regenerated SVG figure for the LBDN MNIST example; raw SVG markup changes omitted]
diff --git a/docs/src/assets/lbdn-mnist/lbdn_mnist_robust.svg b/docs/src/assets/lbdn-mnist/lbdn_mnist_robust.svg
index f93f9d1..5d4d568 100644
--- a/docs/src/assets/lbdn-mnist/lbdn_mnist_robust.svg
+++ b/docs/src/assets/lbdn-mnist/lbdn_mnist_robust.svg
@@ -2,213 +2,219 @@
[Regenerated SVG figure for the robustness comparison plot; raw SVG markup changes omitted]
diff --git a/docs/src/examples/lbdn_mnist.md b/docs/src/examples/lbdn_mnist.md
index ad08061..ecc66fe 100644
--- a/docs/src/examples/lbdn_mnist.md
+++ b/docs/src/examples/lbdn_mnist.md
@@ -16,13 +16,23 @@ For details on how Lipschitz bounds increase classification robustness and relia
 Let's start by loading the training and test data. [`MLDatasets.jl`](https://juliaml.github.io/MLDatasets.jl/stable/) contains a number of common machine-learning datasets, including the [MNIST dataset](https://juliaml.github.io/MLDatasets.jl/stable/datasets/vision/#MLDatasets.MNIST). The following code loads the full dataset of 60,000 training images and 10,000 test images.
 
+!!! info "Working on the GPU"
+    Since we're dealing with images, we will load our data and models onto the GPU to speed up training. We'll be using [`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl).
+
+    If you don't have a GPU on your machine, just switch to `dev = cpu`. If you have a GPU that is not an NVIDIA GPU, replace `CUDA.jl` with whichever GPU backend supports your device. For more information on training models on a GPU, see [here](https://fluxml.ai/Flux.jl/stable/gpu/).
+
 ```julia
+using CUDA
 using MLDatasets: MNIST
 
+# Choose device
+dev = gpu
+# dev = cpu
+
 # Get MNIST training and test data
 T = Float32
-x_train, y_train = MNIST(T, split=:train)[:]
-x_test, y_test = MNIST(T, split=:test)[:]
+x_train, y_train = MNIST(T, split=:train)[:] |> dev
+x_test, y_test = MNIST(T, split=:test)[:] |> dev
 ```
 
 The feature matrices `x_train` and `x_test` are three-dimensional arrays where each 28x28 layer contains pixel data for a single handwritten number from 0 to 9 (see below for an example). The labels `y_train` and `y_test` are vectors containing the classification of each image as a number from 0 to 9. We can convert each of these to an input/output format better suited to training with [`Flux.jl`](https://fluxml.ai/).
@@ -63,10 +73,10 @@ nh = fill(64,2) # 2 hidden layers, each with 64 neurons
 
 # Set up model: define parameters, then create model
 model_ps = DenseLBDNParams{T}(nu, nh, ny, γ; rng)
-model = Chain(DiffLBDN(model_ps), Flux.softmax)
+model = Chain(DiffLBDN(model_ps), Flux.softmax) |> dev
 ```
 
-The `model` consisnts of two parts. The first is a callable [`DiffLBDN`](@ref) model constructed from its direct parameterisation, which is defined by an instance of [`DenseLBDNParams`](@ref) (see the [Package Overview](@ref) for more detail). The output is then converted to a probability distribution using a [`softmax`](https://fluxml.ai/Flux.jl/stable/models/nnlib/#NNlib.softmax) layer. Note that all [`AbstractLBDN`](@ref) models can be combined with traditional neural network layers using [`Flux.Chain`](https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.Chain). We could also have used [`SandwichFC`](@ref) layers to build the network, as outlined in [Fitting a Curve with LBDN](@ref).
+The `model` consists of two parts. The first is a callable [`DiffLBDN`](@ref) model constructed from its direct parameterisation, which is defined by an instance of [`DenseLBDNParams`](@ref) (see the [Package Overview](@ref) for more detail). The output is then converted to a probability distribution using a [`softmax`](https://fluxml.ai/Flux.jl/stable/models/nnlib/#NNlib.softmax) layer. Note that all [`AbstractLBDN`](@ref) models can be combined with traditional neural network layers using [`Flux.Chain`](https://fluxml.ai/Flux.jl/stable/models/layers/#Flux.Chain). We could also have used [`SandwichFC`](@ref) layers to build the network, as outlined in [Fitting a Curve with LBDN](@ref). The final model is loaded onto whichever device `dev` you chose in [1. Load the data](@ref).
 
 ## 3. Define a loss function
@@ -118,9 +128,10 @@ end
 
 # Train and save the model for later
 train_mnist!(model, train_data)
-bson("lbdn_mnist.bson", Dict("model" => model))
+bson("lbdn_mnist.bson", Dict("model" => model |> cpu))
 ```
 
+Note that we move the model back to the `cpu` before saving it!
 
 ## 5. Evaluate the trained model
@@ -152,6 +163,11 @@ for i in eachindex(indx)
     y = y_test[:,indx[i]]
     ŷ = model(x)
 
+    # Make sure data is on CPU for plotting
+    x = x |> cpu
+    y = y |> cpu
+    ŷ = ŷ |> cpu
+
     # Reshape data for plotting
     xmat = reshape(x, 28, 28)
     yval = (0:9)[y][1]
@@ -188,11 +204,11 @@ dense = Chain(
     Dense(nh[1], nh[2], Flux.relu; init, bias=initb(nh[2])),
     Dense(nh[2], ny; init, bias=initb(ny)),
     Flux.softmax
-)
+) |> dev
 
 # Train it and save for later
 train_mnist!(dense, train_data)
-bson("dense_mnist.bson", Dict("model" => dense))
+bson("dense_mnist.bson", Dict("model" => dense |> cpu))
 ```
 
 The trained model performs similarly to the LBDN on the original test dataset.
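The hunks above save both models with `model |> cpu`. As a quick illustration of why, here is a minimal sketch, not part of the files changed in this PR, of reloading the saved LBDN later and moving it back onto whichever device is available. It assumes the `lbdn_mnist.bson` file written above and standard `BSON.jl`, `Flux.jl`, and `CUDA.jl` behaviour.

```julia
using BSON, CUDA, Flux, RobustNeuralNetworks

# Reload the CPU copy of the model saved with `bson` above
BSON.@load "lbdn_mnist.bson" model

# Move it back onto the GPU (if one is available) before evaluating
dev = CUDA.functional() ? gpu : cpu
model = model |> dev
```

Saving the CPU copy keeps the file loadable on machines without a GPU.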
@@ -204,14 +220,14 @@ println("Training accuracy: $(round(train_acc,digits=2))%")
 println("Test accuracy: $(round(test_acc,digits=2))%")
 ```
 ```@example mnist
-println("Training accuracy:"," 97.64%") #hide
-println("Test accuracy:"," 96.60%") #hide
+println("Training accuracy:"," 97.65%") #hide
+println("Test accuracy:"," 96.61%") #hide
 ```
 
 As a simple test of robustness, we'll add uniformly-sampled random noise in the range ``[-\epsilon, \epsilon]`` to the pixel data in the test dataset for a range of noise magnitudes ``\epsilon \in [0, 200/255].`` We can record the test accuracy for each perturbation size and store it for plotting.
 ```julia
 # Get test accuracy as we add noise
-uniform(x) = 2*rand(rng, T, size(x)...) .- 1
+uniform(x) = 2*rand(rng, T, size(x)...) .- 1 |> dev
 function noisy_test_error(model, ϵ=0)
     noisy_xtest = x_test .+ ϵ*uniform(x_test)
     accuracy(model, noisy_xtest, y_test)*100
diff --git a/docs/src/introduction/package_overview.md b/docs/src/introduction/package_overview.md
index 9017e5f..1bff379 100644
--- a/docs/src/introduction/package_overview.md
+++ b/docs/src/introduction/package_overview.md
@@ -181,10 +181,28 @@ In some applications (eg: reinforcement learning), a model is called many times
 
 !!! info "Which wrapper should I use?"
     The model wrappers [`DiffREN`](@ref), [`DiffLBDN`](@ref), and [`SandwichFC`](@ref) re-compute the explicit parameters every time the model is called. In applications where the learnable parameters are updated after one model call (eg: image classification), it is often more convenient and equally fast to use these wrappers.
 
-    In applications where the model is called many times before updating it (eg: reinforcement learning), use \verb|REN| or \verb|LBDN|. They compute the explicit model when constructed and store it for later use, making them more efficient.
+    In applications where the model is called many times before updating it (eg: reinforcement learning), use [`REN`](@ref) or [`LBDN`](@ref). They compute the explicit model when constructed and store it for later use, making them more efficient.
 
     See [Can't I just use `DiffLBDN`?](@ref) in [Reinforcement Learning with LBDN](@ref) for a demonstration of this trade-off.
 
+## Onto the GPU
+
+If you have a GPU on your machine, then you're in luck. All models in `RobustNeuralNetworks.jl` can be loaded onto the GPU for training and evaluation in exactly the same way as any other `Flux.jl` model. To adapt our example from [Explicit model wrappers](@ref) to run on the GPU, we would do the following.
+
+```julia
+using CUDA
+
+model_params = model_params |> gpu
+data = data |> gpu
+
+opt_state = Flux.setup(Adam(0.01), model_params)
+for _ in 1:50
+    Flux.train!(loss, model_params, data, opt_state)
+end
+```
+
+An example of training a [`DiffLBDN`](@ref) on the GPU is provided in [Image Classification with LBDN](@ref). See [`Flux.jl`'s GPU support page](https://fluxml.ai/Flux.jl/stable/gpu/) for more information on training models with different GPU backends.
+
 ## Robustness metrics and IQCs
 
 All neural network models in `RobustNeuralNetworks.jl` are designed to satisfy a set of user-defined robustness constraints. There are a number of different robustness criteria which our RENs can satisfy. Some relate to the internal dynamics of the model, others relate to the input-output map. LBDNs are less general, and are specifically constructed to satisfy Lipschitz bounds. See the section on [Lipschitz bounds (smoothness)](@ref) below.
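The `## Onto the GPU` section added above assumes `model_params`, `data`, and `loss` are already defined by the earlier `Explicit model wrappers` example. For readers skimming the diff, here is a self-contained sketch of the same device-moving pattern; the `DiffLBDN` model, layer sizes, dummy data, and runtime device check are illustrative assumptions, not taken from the package docs.

```julia
using CUDA, Flux, RobustNeuralNetworks

# Fall back to the CPU automatically when no functional GPU is present
dev = CUDA.functional() ? gpu : cpu

# Illustrative 1-input, 1-output LBDN with a Lipschitz bound of 1
T = Float32
model_params = DenseLBDNParams{T}(1, [20], 1, T(1))
model = DiffLBDN(model_params) |> dev

# Dummy regression data, moved to the same device as the model
xs, ys = rand(T, 1, 200) |> dev, rand(T, 1, 200) |> dev
data = [(xs, ys)]

loss(m, x, y) = Flux.mse(m(x), y)

opt_state = Flux.setup(Adam(0.01), model)
for _ in 1:50
    Flux.train!(loss, model, data, opt_state)
end
```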
diff --git a/examples/GPU/README.md b/examples/GPU/README.md
index 03db227..ddd5264 100644
--- a/examples/GPU/README.md
+++ b/examples/GPU/README.md
@@ -1,6 +1,6 @@
 ## Testing `RobustNeuralNetworks.jl` on a GPU
 
-There is currently full support for using models from `RobustNeuralNetworks.jl` on a GPU. However, the speed could definitely be improved with some code optimisations, and we don't have any CI testing on the GPU.
+There is currently basic support for using models from `RobustNeuralNetworks.jl` on a GPU. However, the speed could definitely be improved with some code optimisations, and we don't have any CI testing on the GPU.
 
 The scripts in this directory serve two purposes:
 - They provide a means of benchmarking model performance on a GPU
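The benchmarking scripts referred to in this README are not part of the diff. As a rough sketch of the kind of CPU-versus-GPU comparison they enable, one might time a forward pass of a small LBDN on each device; the layer sizes and batch size below are placeholders, not values from the actual scripts.

```julia
using CUDA, Flux, RobustNeuralNetworks

T = Float32
ps = DenseLBDNParams{T}(784, [64, 64], 10, T(5))
model_cpu = DiffLBDN(ps)
model_gpu = model_cpu |> gpu

x_cpu = rand(T, 784, 256)
x_gpu = x_cpu |> gpu

# Warm up both models first so compilation time is not included in the timings
model_cpu(x_cpu); model_gpu(x_gpu)

@time model_cpu(x_cpu)
CUDA.@time model_gpu(x_gpu)
```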