Commit

typos and other stuff
pasq-cat committed Sep 2, 2024
1 parent 41512af commit e212365
Showing 41 changed files with 22,308 additions and 17,404 deletions.
4 changes: 2 additions & 2 deletions _freeze/docs/src/tutorials/logit/execute-results/md.json
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"hash": "c85939461717821d1f1c6056fbbf0569",
"hash": "64cc61b7b60f8aef12841a8bd09bc8bb",
"result": {
"engine": "jupyter",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# set seed\nseed= 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=seed)\nX = hcat(xs...) # bring into tabular format\n```\n:::\n\n\nsplit in a training and test set\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom = 0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](logit_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nNow we can test the level of calibration of the neural network.\nFirst we collect the predicted results over the test dataset\n\n::: {.cell 
execution_count=8}\n``` {.julia .cell-code}\n predicted_distributions= predict(la, X_test,ret_distr=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n1×20 Matrix{Distributions.Bernoulli{Float64}}:\n Distributions.Bernoulli{Float64}(p=0.13122) … Distributions.Bernoulli{Float64}(p=0.109559)\n```\n:::\n:::\n\n\nthen we plot the calibration plot\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](logit_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\nas we can see from the plot, although extremely accurate, the neural network doesn't seem to be calibrated well. This is however an effect of the extreme accuracy reached by the neural network which causes the lack of predictions with high uncertainty (low certainty). We can see this by looking at the level of sharpness for the two classes which are extremely close to 1, indicating the high level of trust that the neural network has in the predictions.\n\n::: {.cell execution_count=10}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```\n(0.9131870336577175, 0.8865055827351365)\n```\n:::\n:::\n\n\n",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian Logistic Regression\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nWe will use synthetic data with linearly separable samples:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n# set seed\nseed= 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=seed)\nX = hcat(xs...) # bring into tabular format\n```\n:::\n\n\nsplit in a training and test set\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\n\nLogistic regression with weight decay can be implemented in Flux.jl as a single dense (linear) layer with binary logit crossentropy loss:\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nnn = Chain(Dense(2,1))\nλ = 0.5\nsqnorm(x) = sum(abs2, x)\nweight_regularization(λ=λ) = 1/2 * λ^2 * sum(sqnorm, Flux.params(nn))\nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) + weight_regularization()\n```\n:::\n\n\nThe code below simply trains the model. After about 50 training epochs training loss stagnates.\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam()\nepochs = 50\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace approximation\n\nLaplace approximation for the posterior predictive can be implemented as follows:\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, λ=λ, subset_of_weights=:last_layer)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom = 0\np_plugin = plot(la, X, ys; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X, ys; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X, ys; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](logit_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nNow we can test the level of calibration of the neural network.\nFirst we collect the predicted results over the test dataset\n\n::: {.cell 
execution_count=8}\n``` {.julia .cell-code}\n predicted_distributions= predict(la, X_test,ret_distr=true)\n```\n\n::: {.cell-output .cell-output-display execution_count=9}\n```\n1×20 Matrix{Distributions.Bernoulli{Float64}}:\n Distributions.Bernoulli{Float64}(p=0.13122) … Distributions.Bernoulli{Float64}(p=0.109559)\n```\n:::\n:::\n\n\nthen we plot the calibration plot\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](logit_files/figure-commonmark/cell-10-output-1.svg){}\n:::\n:::\n\n\nas we can see from the plot, although extremely accurate, the neural network does not seem to be calibrated well. This is, however, an effect of the extreme accuracy reached by the neural network which causes the lack of predictions with high uncertainty (low certainty). We can see this by looking at the level of sharpness for the two classes which are extremely close to 1, indicating the high level of trust that the neural network has in the predictions.\n\n::: {.cell execution_count=10}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=11}\n```\n(0.9131870336577175, 0.8865055827351365)\n```\n:::\n:::\n\n\n",
"supporting": [
"logit_files"
],
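The change to this file is a wording fix in the closing calibration discussion of the embedded logit tutorial ("doesn't" becomes "does not", plus commas around "however"). For context, the workflow that tutorial exercises — last-layer Laplace approximation followed by calibration and sharpness checks — reduces to the sketch below. It reuses only calls that appear verbatim in the tutorial code above (`Laplace`, `fit!`, `optimize_prior!`, `predict`, `sharpness_classification`); the training loop and train/test split are omitted, so treat it as an outline rather than a drop-in replacement.

```julia
# Sketch of the logit tutorial's Laplace + calibration workflow (training loop omitted).
using Flux, Random, LaplaceRedux

Random.seed!(1234)
xs, ys = LaplaceRedux.Data.toy_data_linear(100; seed=1234)
X = hcat(xs...)                      # tabular format, one column per sample

nn = Chain(Dense(2, 1))              # single linear layer = logistic regression
# ... train `nn` with logitbinarycrossentropy as in the tutorial ...

la = Laplace(nn; likelihood=:classification, subset_of_weights=:last_layer)
fit!(la, zip(xs, ys))                # fit the Laplace approximation
optimize_prior!(la; n_steps=500)     # tune the prior precision

preds = predict(la, X; ret_distr=true)      # Bernoulli predictive distributions
sharpness_classification(ys, vec(preds))    # per-class sharpness scores
```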
1,142 changes: 571 additions & 571 deletions _freeze/docs/src/tutorials/logit/figure-commonmark/cell-8-output-1.svg
4 changes: 2 additions & 2 deletions _freeze/docs/src/tutorials/mlp/execute-results/md.json
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
{
"hash": "43f83d5e19807b86b411b03f022bf40c",
"hash": "d070805f89710f2cc874e27ecc1a1c2b",
"result": {
"engine": "jupyter",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian MLP\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nThis time we use a synthetic dataset containing samples that are not linearly separable:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n#set seed\nseed = 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_non_linear(400, seed= seed)\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\nFor the classification task we build a neural network with weight decay composed of a single hidden layer.\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nn_hidden = 10\nD = size(X_train,1)\nnn = Chain(\n Dense(D, n_hidden, σ),\n Dense(n_hidden, 1)\n) \nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) \n```\n:::\n\n\nThe model is trained until training loss stagnates.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam(1e-3)\nepochs = 100\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace Approximation\n\nLaplace approximation can be implemented as follows:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, subset_of_weights=:all)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n# Plot the posterior distribution with a contour plot.\nzoom=0\np_plugin = plot(la, X_train, ys_train; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X_train, ys_train; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X_train, ys_train; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](mlp_files/figure-commonmark/cell-7-output-1.svg){}\n:::\n:::\n\n\nZooming out we can note that the plugin estimator produces high-confidence estimates in regions scarce of any samples. 
The Laplace approximation is much more conservative about these regions.\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom=-50\np_plugin = plot(la, X_train, ys_train; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X_train, ys_train; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X_train, ys_train; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](mlp_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nWe plot now the calibration plot to assess the level of average calibration reached by the neural network.\n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\npredicted_distributions= predict(la, X_test,ret_distr=true)\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](mlp_files/figure-commonmark/cell-9-output-1.svg){}\n:::\n:::\n\n\nand the sharpness score\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```\n(0.9277189055456709, 0.9196132560599691)\n```\n:::\n:::\n\n\n",
"markdown": "```@meta\nCurrentModule = LaplaceRedux\n```\n\n# Bayesian MLP\n\n## Libraries\n\n::: {.cell execution_count=1}\n``` {.julia .cell-code}\nusing Pkg; Pkg.activate(\"docs\")\n# Import libraries\nusing Flux, Plots, TaijaPlotting, Random, Statistics, LaplaceRedux, LinearAlgebra\ntheme(:lime)\n```\n:::\n\n\n## Data\n\nThis time we use a synthetic dataset containing samples that are not linearly separable:\n\n::: {.cell execution_count=2}\n``` {.julia .cell-code}\n#set seed\nseed = 1234\nRandom.seed!(seed)\n# Number of points to generate.\nxs, ys = LaplaceRedux.Data.toy_data_non_linear(400; seed = seed)\n# Shuffle the data\nn = length(ys)\nindices = randperm(n)\n\n# Define the split ratio\nsplit_ratio = 0.8\nsplit_index = Int(floor(split_ratio * n))\n\n# Split the data into training and test sets\ntrain_indices = indices[1:split_index]\ntest_indices = indices[split_index+1:end]\n\nxs_train = xs[train_indices]\nxs_test = xs[test_indices]\nys_train = ys[train_indices]\nys_test = ys[test_indices]\n# bring into tabular format\nX_train = hcat(xs_train...) \nX_test = hcat(xs_test...) \n\ndata = zip(xs_train,ys_train)\n```\n:::\n\n\n## Model\nFor the classification task we build a neural network with weight decay composed of a single hidden layer.\n\n::: {.cell execution_count=3}\n``` {.julia .cell-code}\nn_hidden = 10\nD = size(X_train,1)\nnn = Chain(\n Dense(D, n_hidden, σ),\n Dense(n_hidden, 1)\n) \nloss(x, y) = Flux.Losses.logitbinarycrossentropy(nn(x), y) \n```\n:::\n\n\nThe model is trained until training loss stagnates.\n\n::: {.cell execution_count=4}\n``` {.julia .cell-code}\nusing Flux.Optimise: update!, Adam\nopt = Adam(1e-3)\nepochs = 100\navg_loss(data) = mean(map(d -> loss(d[1],d[2]), data))\nshow_every = epochs/10\n\nfor epoch = 1:epochs\n for d in data\n gs = gradient(Flux.params(nn)) do\n l = loss(d...)\n end\n update!(opt, Flux.params(nn), gs)\n end\n if epoch % show_every == 0\n println(\"Epoch \" * string(epoch))\n @show avg_loss(data)\n end\nend\n```\n:::\n\n\n## Laplace Approximation\n\nLaplace approximation can be implemented as follows:\n\n::: {.cell execution_count=5}\n``` {.julia .cell-code}\nla = Laplace(nn; likelihood=:classification, subset_of_weights=:all)\nfit!(la, data)\nla_untuned = deepcopy(la) # saving for plotting\noptimize_prior!(la; verbose=true, n_steps=500)\n```\n:::\n\n\nThe plot below shows the resulting posterior predictive surface for the plugin estimator (left) and the Laplace approximation (right).\n\n::: {.cell execution_count=6}\n``` {.julia .cell-code}\n# Plot the posterior distribution with a contour plot.\nzoom=0\np_plugin = plot(la, X_train, ys_train; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X_train, ys_train; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X_train, ys_train; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=7}\n![](mlp_files/figure-commonmark/cell-7-output-1.svg){}\n:::\n:::\n\n\nZooming out we can note that the plugin estimator produces high-confidence estimates in regions scarce of any samples. 
The Laplace approximation is much more conservative about these regions.\n\n::: {.cell execution_count=7}\n``` {.julia .cell-code}\nzoom=-50\np_plugin = plot(la, X_train, ys_train; title=\"Plugin\", link_approx=:plugin, clim=(0,1))\np_untuned = plot(la_untuned, X_train, ys_train; title=\"LA - raw (λ=$(unique(diag(la_untuned.prior.P₀))[1]))\", clim=(0,1), zoom=zoom)\np_laplace = plot(la, X_train, ys_train; title=\"LA - tuned (λ=$(round(unique(diag(la.prior.P₀))[1],digits=2)))\", clim=(0,1), zoom=zoom)\nplot(p_plugin, p_untuned, p_laplace, layout=(1,3), size=(1700,400))\n```\n\n::: {.cell-output .cell-output-display execution_count=8}\n![](mlp_files/figure-commonmark/cell-8-output-1.svg){}\n:::\n:::\n\n\nWe plot now the calibration plot to assess the level of average calibration reached by the neural network.\n\n::: {.cell execution_count=8}\n``` {.julia .cell-code}\npredicted_distributions= predict(la, X_test,ret_distr=true)\nCalibration_Plot(la,ys_test,vec(predicted_distributions);n_bins = 10)\n```\n\n::: {.cell-output .cell-output-display}\n![](mlp_files/figure-commonmark/cell-9-output-1.svg){}\n:::\n:::\n\n\nand the sharpness score\n\n::: {.cell execution_count=9}\n``` {.julia .cell-code}\nsharpness_classification(ys_test,vec(predicted_distributions))\n```\n\n::: {.cell-output .cell-output-display execution_count=10}\n```\n(0.9277189055456709, 0.9196132560599691)\n```\n:::\n:::\n\n\n",
"supporting": [
"mlp_files\\figure-commonmark"
],
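Here the fix is to the `toy_data_non_linear` call: the `seed` keyword argument now follows a semicolon. The embedded MLP tutorial mirrors the logit one, but with a one-hidden-layer network and the Laplace approximation fitted over all weights. A minimal sketch under the same caveats (calls taken from the tutorial above, training loop omitted):

```julia
# Sketch of the MLP tutorial's full-network Laplace approximation (training loop omitted).
using Flux, Random, LaplaceRedux

Random.seed!(1234)
xs, ys = LaplaceRedux.Data.toy_data_non_linear(400; seed=1234)
X = hcat(xs...)

nn = Chain(Dense(2, 10, σ), Dense(10, 1))   # one hidden layer with sigmoid activations
# ... train `nn` with Adam(1e-3) and logitbinarycrossentropy as in the tutorial ...

la = Laplace(nn; likelihood=:classification, subset_of_weights=:all)
fit!(la, zip(xs, ys))
optimize_prior!(la; n_steps=500)

preds = predict(la, X; ret_distr=true)      # inputs for the calibration plot and sharpness score
```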