From ae19c5abec132596cc830ada85e6c3bf30580f5a Mon Sep 17 00:00:00 2001 From: Carl Georg Biermann Date: Wed, 12 May 2021 16:26:38 +0200 Subject: [PATCH] tweaks and corrections in binning analysis tutorial, make it use the same data sets as the autocorrelation tutorial --- .../error_analysis/error_analysis_part1.ipynb | 25 +++++++++---------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/doc/tutorials/error_analysis/error_analysis_part1.ipynb b/doc/tutorials/error_analysis/error_analysis_part1.ipynb index e37f0a2e683..6feea7120f9 100644 --- a/doc/tutorials/error_analysis/error_analysis_part1.ipynb +++ b/doc/tutorials/error_analysis/error_analysis_part1.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Tutorial: Error Estimation" + "# Tutorial: Error Estimation - Part 1 (Introduction and Binning Analysis)" ] }, { @@ -43,7 +43,7 @@ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", - "np.random.seed(44)\n", + "np.random.seed(43)\n", "\n", "def ar_1_process(n_samples, y0, c, phi, eps, n_warmup):\n", " y = y0\n", @@ -57,11 +57,11 @@ "\n", "N_SAMPLES = 100000\n", "\n", - "time_series_1 = ar_1_process(N_SAMPLES, 0.0, 1.0, 0.9, 3.0, 100)\n", - "time_series_2 = ar_1_process(N_SAMPLES, 0.0, 0.05, 0.998, 1.0, 1000)\n", + "time_series_1 = ar_1_process(N_SAMPLES, 0.0, 2.0, 0.85, 2.0, 100)\n", + "time_series_2 = ar_1_process(N_SAMPLES, 0.0, 0.05, 0.995, 1.0, 1000)\n", "\n", "\n", - "plt.title(\"The first 2000 samples of both time series\")\n", + "plt.title(\"The first 1000 samples of both time series\")\n", "plt.plot(time_series_1[0:1000], label=\"time series 1\")\n", "plt.plot(time_series_2[0:1000], label=\"time series 2\")\n", "plt.legend()\n", @@ -140,6 +140,7 @@ "outputs": [], "source": [ "plt.plot(time_series_1[1000:1050],\"x\")\n", + "plt.ylim((8,19))\n", "plt.show()" ] }, @@ -149,7 +150,7 @@ "source": [ "One can clearly see that each sample lies in the vicinity of the previous one.\n", "\n", - "Below is an 
example of almost completely uncorrelated samples. The data points are taken from the same time series as in the previous example, but this time they are chosen with large gaps in between (every 800th sample is used). These samples appear to fluctuate a lot more randomly." ] }, { "cell_type": "code", "metadata": {}, "outputs": [], "source": [ - "plt.plot(time_series_1[1000:11000:200],\"x\")\n", + "plt.plot(time_series_1[2000:42000:800],\"x\")\n", + "plt.ylim((8,19))\n", "plt.show()" ] }, @@ -179,9 +181,7 @@ "\n", "Once we have computed the bin averages $\\overline{X}_i$, getting the SEM is straightforward: we can simply treat $\\overline{X}_i$ as an uncorrelated time series. In other words, we can compute the SEM by using equations (1) and (2)!\n", "\n", - "Let's implement this.\n", "\n", "In the code cell below, we load the simulation data into numpy arrays so that we can analyze them." ] }, { "cell_type": "code", "metadata": {}, "outputs": [], "source": [ - "N_SAMPLES = len(time_series_1[1000:])\n", "BIN_SIZE = 2000" ] }, @@ -352,7 +351,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You should see that the series converges to a value between 0.05 and 0.06, before transitioning into a noisy tail. The tail becomes increasingly noisy, because as the block size increases, the number of blocks decreases, thus resulting in worse statistics.\n", + "You should see that the series converges to a value between 0.02 and 0.03, before transitioning into a noisy tail. 
The tail becomes increasingly noisy, because as the block size increases, the number of blocks decreases, thus resulting in worse statistics.\n", "\n", "To extract the correct SEM from this plot, we can fit an exponential function to the first part of the data, which doesn't suffer from too much noise." ] }, { "cell_type": "code", "metadata": {}, "outputs": [], "source": [ "from scipy.optimize import curve_fit\n", "\n", "# only fit to the first couple of SEMs\n", - "CUTOFF = 300\n", + "CUTOFF = 600\n", "\n", "# sizes of the corresponding bins\n", "sizes = np.arange(3,3+CUTOFF,dtype=int)\n",
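For reviewers who want to sanity-check the binning analysis this patch retunes, here is a minimal standalone sketch of the technique the tutorial teaches: average non-overlapping bins of a correlated series, then treat the bin means as uncorrelated samples when computing the SEM. All names below are illustrative (this is not a cell from the notebook), and the AR(1) parameters are chosen only for demonstration:

```python
import numpy as np

def binning_sem(samples, bin_size):
    """SEM estimate for a correlated series: average over
    non-overlapping bins, then treat the bin means as
    uncorrelated samples."""
    n_bins = len(samples) // bin_size
    # drop the tail that does not fill a complete bin
    trimmed = np.asarray(samples[:n_bins * bin_size], dtype=float)
    bin_avgs = trimmed.reshape(n_bins, bin_size).mean(axis=1)
    return np.std(bin_avgs, ddof=1) / np.sqrt(n_bins)

# correlated AR(1) test series: y_t = c + phi * y_{t-1} + eps * xi_t
rng = np.random.default_rng(0)
c, phi, eps = 1.0, 0.9, 1.0
y = np.empty(50000)
y[0] = c / (1.0 - phi)  # start at the process mean
for t in range(1, len(y)):
    y[t] = c + phi * y[t - 1] + eps * rng.normal()

# the naive SEM ignores correlations and underestimates the error
naive_sem = np.std(y, ddof=1) / np.sqrt(len(y))
binned_sem = binning_sem(y, bin_size=1000)
print(naive_sem, binned_sem)
```

With phi = 0.9 the binned estimate should come out several times larger than the naive one, mirroring the plateau (followed by a noisy tail) that the tutorial asks the reader to look for when the bin size is scanned.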