diff --git a/.nojekyll b/.nojekyll
index a07f2a18..8abd3f16 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-53431154
\ No newline at end of file
+3b939eca
\ No newline at end of file
diff --git a/notebooks/alternative_links_binary.html b/notebooks/alternative_links_binary.html
index fd9e4d4e..6e0889a6 100644
--- a/notebooks/alternative_links_binary.html
+++ b/notebooks/alternative_links_binary.html
@@ -642,12 +642,12 @@

Logit link

Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, x]
@@ -657,8 +657,7 @@

Logit link


 
-Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 3 seconds.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 7 seconds.
@@ -670,12 +669,12 @@

Probit link

Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, x]
@@ -685,25 +684,41 @@

Probit link


 
-Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 3 seconds.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 7 seconds.
+
+model_probit
+
+        Formula: p(y, n) ~ x
+        Family: binomial
+          Link: p = probit
+  Observations: 8
+        Priors: 
+    target = p
+        Common-level effects
+            Intercept ~ Normal(mu: 0.0, sigma: 1.5)
+            x ~ Normal(mu: 0.0, sigma: 15.848)
+------
+* To see a plot of the priors call the .plot_priors() method.
+* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()
@@ -722,56 +736,56 @@

Cloglog link

Results

We can use the samples from the posteriors to see the mean estimate for the probability of dying at each concentration level. To do so, we write a small helper function that saves us from repeating code. It leverages the Model.predict() method, which can produce both in-sample and out-of-sample predictions.

-def get_predictions(model, idata, seq):
-    # Create a data frame with the new data
-    new_data = pd.DataFrame({"x": seq})
-
-    # Predict probability of dying using out of sample data
-    model.predict(idata, data=new_data)
-
-    # Get posterior mean across all chains and draws
-    mu = idata.posterior["p"].mean(("chain", "draw"))
-    return mu
-
-x_seq = np.linspace(1.6, 2, num=200)
-
-mu_logit = get_predictions(model_logit, idata_logit, x_seq)
-mu_probit = get_predictions(model_probit, idata_probit, x_seq)
-mu_cloglog = get_predictions(model_cloglog, idata_cloglog, x_seq)
-
-plt.scatter(x, y / n, c = "white", edgecolors = "black", s=100)
-plt.plot(x_seq, mu_logit, lw=2, label="Logit")
-plt.plot(x_seq, mu_probit, lw=2, label="Probit")
-plt.plot(x_seq, mu_cloglog, lw=2, label="CLogLog")
-plt.axhline(0.5, c="k", alpha=0.5, ls="--")
-plt.xlabel(r"Dose $\log_{10}CS_2mgl^{-1}$")
-plt.ylabel("Probability of death")
-plt.legend();
+def get_predictions(model, idata, seq):
+    # Create a data frame with the new data
+    new_data = pd.DataFrame({"x": seq})
+
+    # Predict probability of dying using out of sample data
+    model.predict(idata, data=new_data)
+
+    # Get posterior mean across all chains and draws
+    mu = idata.posterior["p"].mean(("chain", "draw"))
+    return mu
+
+x_seq = np.linspace(1.6, 2, num=200)
+
+mu_logit = get_predictions(model_logit, idata_logit, x_seq)
+mu_probit = get_predictions(model_probit, idata_probit, x_seq)
+mu_cloglog = get_predictions(model_cloglog, idata_cloglog, x_seq)
+
+plt.scatter(x, y / n, c = "white", edgecolors = "black", s=100)
+plt.plot(x_seq, mu_logit, lw=2, label="Logit")
+plt.plot(x_seq, mu_probit, lw=2, label="Probit")
+plt.plot(x_seq, mu_cloglog, lw=2, label="CLogLog")
+plt.axhline(0.5, c="k", alpha=0.5, ls="--")
+plt.xlabel(r"Dose $\log_{10}CS_2mgl^{-1}$")
+plt.ylabel("Probability of death")
+plt.legend();
In this example, we can see that the models using the logit and probit link functions produce very similar estimates. With these particular data, all three link functions fit the data well and the results do not differ significantly. However, there can be scenarios where the results are more sensitive to the choice of the link function.

References

Bliss, C. I. (1935). The calculation of the dose-mortality curve. Annals of Applied Biology 22, 134–167

-%load_ext watermark
-%watermark -n -u -v -iv -w
+%load_ext watermark
+%watermark -n -u -v -iv -w
-Last updated: Sat May 25 2024
+Last updated: Thu Aug 15 2024
 
 Python implementation: CPython
 Python version       : 3.11.9
 IPython version      : 8.24.0
 
-numpy     : 1.26.4
-bambi     : 0.13.1.dev37+g2a54df76.d20240525
 pandas    : 2.2.2
-arviz     : 0.18.0
 matplotlib: 3.8.4
+arviz     : 0.18.0
+numpy     : 1.26.4
+bambi     : 0.14.1.dev12+g64e57423.d20240730
 
 Watermark: 2.4.3
 
diff --git a/notebooks/alternative_links_binary_files/figure-html/cell-14-output-1.png b/notebooks/alternative_links_binary_files/figure-html/cell-14-output-1.png
new file mode 100644
index 00000000..f601d0a5
Binary files /dev/null and b/notebooks/alternative_links_binary_files/figure-html/cell-14-output-1.png differ
diff --git a/notebooks/hierarchical_binomial_bambi.html b/notebooks/hierarchical_binomial_bambi.html
index ad08fcc1..f9146ace 100644
--- a/notebooks/hierarchical_binomial_bambi.html
+++ b/notebooks/hierarchical_binomial_bambi.html
@@ -750,9 +750,8 @@

Non-hierarchical mo
 Priors: 
   target = p
     Common-level effects
-      playerID ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.], sigma: [10.0223 10.0223
-        10.0223 10.0223 10.0223 10.0223 10.0223 10.0223 10.0223
-        10.0223 10.0223 10.0223 10.0223 10.0223 10.0223])
+      playerID ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.], sigma: [1. 1. 1. 1. 1. 1.
+        1. 1. 1. 1. 1. 1. 1. 1. 1.])
@@ -760,12 +759,12 @@

Non-hierarchical mo
Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [playerID]
@@ -775,8 +774,7 @@

Non-hierarchical mo

 

-Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 4 seconds.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.

Next, we inspect the posterior of the coefficient for each player. The compact=False argument requests a separate panel for each player; the call is reproduced below.
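For reference, this is the plotting call as it appears in the notebook source:

az.plot_trace(idata_non_hierarchical, compact=False, backend_kwargs={"layout": "constrained"});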

@@ -817,7 +815,7 @@

Hierarchical model

 Priors: 
   target = p
     Common-level effects
-      Intercept ~ Normal(mu: 0.0, sigma: 2.5)
+      Intercept ~ Normal(mu: 0.0, sigma: 1.5)
     Group-level effects
       1|playerID ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 2.5))
@@ -828,12 +826,12 @@

Hierarchical model

Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]
@@ -843,10 +841,8 @@

Hierarchical model


 
-Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 5 seconds.
-There were 5 divergences after tuning. Increase `target_accept` or reparameterize.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
-The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.
+There were 6 divergences after tuning. Increase `target_accept` or reparameterize.

Sometimes, there can be several divergences when fitting a hierarchical model. What can we do in that case?
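A first step, which the notebook takes next, is to check that the priors are sensible by sampling from the prior predictive distribution. A minimal sketch, assuming the hierarchical model object is named model_hierarchical (the name is not visible in this diff):

# Sample from the prior predictive distribution of the (assumed) hierarchical model
idata_prior = model_hierarchical.prior_predictive(draws=500)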

@@ -857,7 +853,7 @@

Hierarchical model

prior = az.extract_dataset(idata_prior, group="prior_predictive")["p(H, AB)"]
Sampling: [1|playerID_offset, 1|playerID_sigma, Intercept, p(H, AB)]
-/tmp/ipykernel_95801/2686921361.py:2: FutureWarning: extract_dataset has been deprecated, please use extract
+/tmp/ipykernel_12852/2686921361.py:2: FutureWarning: extract_dataset has been deprecated, please use extract
   prior = az.extract_dataset(idata_prior, group="prior_predictive")["p(H, AB)"]
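Since az.extract_dataset is deprecated, the warning points to az.extract as the replacement; an equivalent, non-deprecated call would be the following sketch:

# Same extraction with the replacement recommended by the FutureWarning
prior = az.extract(idata_prior, group="prior_predictive")["p(H, AB)"]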
@@ -930,7 +926,7 @@

Hierarchical model

plot_prior_predictive(df, prior)
Sampling: [1|playerID_offset, 1|playerID_sigma, Intercept, p(H, AB)]
-/tmp/ipykernel_95801/1302716284.py:3: FutureWarning: extract_dataset has been deprecated, please use extract
+/tmp/ipykernel_12852/1302716284.py:3: FutureWarning: extract_dataset has been deprecated, please use extract
   prior = az.extract_dataset(idata_prior, group="prior_predictive")["p(H, AB)"]
@@ -943,12 +939,12 @@

Hierarchical model

Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]
@@ -958,9 +954,8 @@

Hierarchical model


 
-Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 6 seconds.
-There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 13 seconds.
+There were 3 divergences after tuning. Increase `target_accept` or reparameterize.

Let’s try increasing target_accept and the number of tuning samples, as sketched below.
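Judging by the sampler output that follows (2_000 tune and 2_000 draw iterations), the refit looks roughly like this sketch; the variable names and the exact target_accept value are assumptions, since they are not shown in the diff:

# Refit with more tuning steps and a higher target acceptance rate (values assumed)
idata_hierarchical = model_hierarchical.fit(
    draws=2000, tune=2000, target_accept=0.95, random_seed=random_seed
)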

@@ -969,12 +964,12 @@

Hierarchical model

Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]
@@ -984,9 +979,7 @@

Hierarchical model


 
-Sampling 2 chains for 2_000 tune and 2_000 draw iterations (4_000 + 4_000 draws total) took 13 seconds.
-There were 3 divergences after tuning. Increase `target_accept` or reparameterize.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 2_000 tune and 2_000 draw iterations (8_000 + 8_000 draws total) took 24 seconds.
@@ -1051,15 +1044,15 @@

Compare predictions
%load_ext watermark
 %watermark -n -u -v -iv -w

-Last updated: Sat May 25 2024
+Last updated: Thu Aug 15 2024
 
 Python implementation: CPython
 Python version       : 3.11.9
 IPython version      : 8.24.0
 
-matplotlib: 3.8.4
-bambi     : 0.13.1.dev37+g2a54df76.d20240525
+bambi     : 0.14.1.dev12+g64e57423.d20240730
 numpy     : 1.26.4
+matplotlib: 3.8.4
 arviz     : 0.18.0
 
 Watermark: 2.4.3
diff --git a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-14-output-1.png b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-14-output-1.png
index 1c6ec7a8..4f8b46f1 100644
Binary files a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-14-output-1.png and b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-14-output-1.png differ
diff --git a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-16-output-2.png b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-16-output-2.png
index 2efc3760..da0a4b2a 100644
Binary files a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-16-output-2.png and b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-16-output-2.png differ
diff --git a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-19-output-1.png b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-19-output-1.png
index d23b117e..9a170e6d 100644
Binary files a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-19-output-1.png and b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-19-output-1.png differ
diff --git a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-21-output-1.png b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-21-output-1.png
index 77fb500e..d811bded 100644
Binary files a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-21-output-1.png and b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-21-output-1.png differ
diff --git a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-9-output-1.png b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-9-output-1.png
index 8a1f76c5..87b8372f 100644
Binary files a/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-9-output-1.png and b/notebooks/hierarchical_binomial_bambi_files/figure-html/cell-9-output-1.png differ
diff --git a/notebooks/logistic_regression.html b/notebooks/logistic_regression.html
index d33dd2c6..60869ba3 100644
--- a/notebooks/logistic_regression.html
+++ b/notebooks/logistic_regression.html
@@ -729,12 +729,12 @@ 

Specify and
Modeling the probability that vote==clinton
 Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, party_id, party_id:age]

@@ -744,8 +744,7 @@

Specify and

 

-Sampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 11 seconds.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 21 seconds.

We can print the model object to see information about the response distribution, the link function, and the priors; a sketch of the call follows.
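Evaluating the model object in a cell, or passing it to print(), produces the summary shown in the next hunk. A sketch, where clinton_model is an assumed name for the bmb.Model defined earlier in the notebook:

print(clinton_model)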

@@ -759,9 +758,9 @@

Specify and
 Priors: 
   target = p
     Common-level effects
-      Intercept ~ Normal(mu: 0.0, sigma: 4.3846)
-      party_id ~ Normal(mu: [0. 0.], sigma: [5.4007 6.0634])
-      party_id:age ~ Normal(mu: [0. 0. 0.], sigma: [0.0938 0.1007 0.1098])
+      Intercept ~ Normal(mu: 0.0, sigma: 1.5)
+      party_id ~ Normal(mu: [0. 0.], sigma: [1. 1.])
+      party_id:age ~ Normal(mu: [0. 0. 0.], sigma: [0.0582 0.0582 0.0582])
 ------
 * To see a plot of the priors call the .plot_priors() method.
 * To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()
@@ -810,74 +809,74 @@ Specify and
 Intercept
-  1.701   0.717   0.283   2.992   0.014   0.010   2770.0   2596.0
+  1.392   0.553   0.378   2.451   0.007   0.005   6778.0   5131.0
 1.0
 party_id[independent]
-  -0.304   0.944   -2.076   1.435   0.019   0.015   2530.0   2452.0
+  -0.046   0.650   -1.221   1.173   0.008   0.008   6081.0   4477.0
 1.0
 party_id[republican]
-  -1.171   1.560   -4.144   1.650   0.037   0.026   1813.0   2106.0
+  -0.603   0.814   -2.142   0.933   0.010   0.008   6336.0   5269.0
 1.0
 party_id:age[democrat]
-  0.013   0.015   -0.016   0.040
+  0.018   0.012   -0.004   0.042
 0.000
 0.000
-  2669.0   2532.0
+  6588.0   5188.0
 1.0
 party_id:age[independent]
-  -0.033   0.012   -0.055   -0.011
+  -0.032   0.010   -0.052   -0.014
 0.000
 0.000
-  3398.0   2370.0
+  6877.0   4836.0
 1.0
 party_id:age[republican]
-  -0.080   0.036   -0.149   -0.014   0.001   0.001   1765.0   1976.0
+  -0.082   0.024   -0.127   -0.038   0.000   0.000   5677.0   5113.0
 1.0
@@ -1002,13 +1001,6 @@

Run Inference

(dem > rep).mean().item()
-0.995
+1.0

Probability that the Democrat slope is greater than the Independent slope?

(dem > ind).mean().item()
-0.99475
+1.0

Probability that the Independent slope is greater than the Republican slope?

(ind > rep).mean().item()
-0.8995
+0.9795

Probability that the Democrat slope is greater than 0?

(dem > 0).mean().item()
-0.80525
+0.93775

Probability that the Republican slope is less than 0?

(rep < 0).mean().item()
-0.993
+0.999875

Probability that the Independent slope is less than 0?

(ind < 0).mean().item()
-0.998
+0.999625

If we look at the plot of the marginal posteriors, we may suspect that, for example, the probability that the Democrat slope is greater than the Republican slope is 0.998 (almost 1!), given the overlap between the blue and green density functions. However, we can’t answer such a question using the marginal posteriors alone, as shown in the plot. Since the Democrat and Republican slopes (\(\beta_3\) and \(\beta_5\), respectively) are random variables, we need to use their joint distribution to answer probability questions that involve both of them. The fact that logical comparisons (e.g. > in dem > ind) are performed elementwise ensures we are using samples from the joint posterior, as we should. We also note that when the question involves only one of the random variables, it is fine to use the marginal distribution (e.g. (rep < 0).mean()). The toy example below illustrates the difference.
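A toy example, unrelated to the notebook's data, illustrates why the joint posterior matters: two marginal distributions can overlap almost completely while the paired, elementwise comparison is clearly one-sided, because the draws are correlated.

import numpy as np

rng = np.random.default_rng(1234)
a = rng.normal(0, 1, size=4000)          # draws of one "slope"
b = a + rng.normal(0.1, 0.1, size=4000)  # strongly correlated draws of another "slope"

# The marginal densities of a and b overlap almost entirely, yet the
# elementwise (joint) comparison is far from 50/50:
print((b > a).mean())  # roughly 0.84, i.e. P(Normal(0.1, 0.1) > 0)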

@@ -1407,7 +1406,7 @@

Spaghe
# Select a sample of posterior values for the mean probability of voting for Clinton
 vote_posterior = az.extract_dataset(clinton_fitted, num_samples=2000)["p"]
-/tmp/ipykernel_116464/1918123043.py:2: FutureWarning: extract_dataset has been deprecated, please use extract
+/tmp/ipykernel_13027/1918123043.py:2: FutureWarning: extract_dataset has been deprecated, please use extract
   vote_posterior = az.extract_dataset(clinton_fitted, num_samples=2000)["p"]
@@ -1436,17 +1435,17 @@

Spaghe
%load_ext watermark
 %watermark -n -u -v -iv -w
-Last updated: Sat May 25 2024
+Last updated: Thu Aug 15 2024
 
 Python implementation: CPython
 Python version       : 3.11.9
 IPython version      : 8.24.0
 
-pandas    : 2.2.2
 matplotlib: 3.8.4
-numpy     : 1.26.4
+bambi     : 0.14.1.dev12+g64e57423.d20240730
 arviz     : 0.18.0
-bambi     : 0.13.1.dev37+g2a54df76.d20240525
+pandas    : 2.2.2
+numpy     : 1.26.4
 
 Watermark: 2.4.3
 
diff --git a/notebooks/logistic_regression_files/figure-html/cell-12-output-2.png b/notebooks/logistic_regression_files/figure-html/cell-12-output-2.png
index 55cf91e8..3e3ae909 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-12-output-2.png and b/notebooks/logistic_regression_files/figure-html/cell-12-output-2.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-13-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-13-output-1.png
index 7bda5c77..0062e0d1 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-13-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-13-output-1.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-16-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-16-output-1.png
index 37f8aa85..6f19b62d 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-16-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-16-output-1.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-18-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-18-output-1.png
index bc4509d8..8537a874 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-18-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-18-output-1.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-19-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-19-output-1.png
index 17754d42..a189c5cd 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-19-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-19-output-1.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-21-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-21-output-1.png
index 51cd5587..6d3e040b 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-21-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-21-output-1.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-25-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-25-output-1.png
index 60462c41..ed70b66e 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-25-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-25-output-1.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-28-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-28-output-1.png
index 81a12442..b34e5ee2 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-28-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-28-output-1.png differ
diff --git a/notebooks/logistic_regression_files/figure-html/cell-38-output-1.png b/notebooks/logistic_regression_files/figure-html/cell-38-output-1.png
index a9cbd3ce..6111417d 100644
Binary files a/notebooks/logistic_regression_files/figure-html/cell-38-output-1.png and b/notebooks/logistic_regression_files/figure-html/cell-38-output-1.png differ
diff --git a/notebooks/plot_comparisons.html b/notebooks/plot_comparisons.html
index 76fbe44b..668cf47b 100644
--- a/notebooks/plot_comparisons.html
+++ b/notebooks/plot_comparisons.html
@@ -525,12 +525,12 @@

Zero Inflated Poisso
Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (4 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [psi, Intercept, livebait, camper, persons, child]
@@ -540,7 +540,7 @@

Zero Inflated Poisso

 

-Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 45 seconds.

@@ -1719,12 +1719,12 @@

Logistic Regression
 Modeling the probability that Survived==1
 Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, PClass, SexCode, PClass:SexCode, Age, PClass:Age, SexCode:Age, PClass:SexCode:Age]
@@ -1734,8 +1734,7 @@

Logistic Regression

-Sampling 2 chains for 500 tune and 500 draw iterations (1_000 + 1_000 draws total) took 23 seconds.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 127 seconds.
@@ -1762,16 +1761,16 @@

Comparison types

%load_ext watermark
 %watermark -n -u -v -iv -w
-Last updated: Sun May 26 2024
+Last updated: Thu Aug 15 2024
 
 Python implementation: CPython
 Python version       : 3.11.9
 IPython version      : 8.24.0
 
+bambi : 0.14.1.dev12+g64e57423.d20240730
 pandas: 2.2.2
-bambi : 0.13.1.dev39+gb7d6a6cb
-arviz : 0.18.0
 numpy : 1.26.4
+arviz : 0.18.0
 
 Watermark: 2.4.3
 
diff --git a/notebooks/plot_comparisons_files/figure-html/cell-21-output-1.png b/notebooks/plot_comparisons_files/figure-html/cell-21-output-1.png
index aefe9487..48881586 100644
Binary files a/notebooks/plot_comparisons_files/figure-html/cell-21-output-1.png and b/notebooks/plot_comparisons_files/figure-html/cell-21-output-1.png differ
diff --git a/notebooks/plot_predictions.html b/notebooks/plot_predictions.html
index 11b7873c..8c4d4683 100644
--- a/notebooks/plot_predictions.html
+++ b/notebooks/plot_predictions.html
@@ -523,12 +523,12 @@

Gaussian Linear Mode
Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [sigma, hp, wt, hp:wt, cyl, gear]
@@ -538,9 +538,7 @@

Gaussian Linear Mode

 

-Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 41 seconds.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
-The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 147 seconds.

We can print the Bambi model object to obtain the model components. Below, we see that the Gaussian linear model uses an identity link function, which results in no transformation of the linear predictor to the mean of the outcome variable, and that the distribution of the likelihood is Gaussian.

@@ -602,9 +600,9 @@ Default values
 high A 3.21725
-21.920836  15.675414  28.096253
+21.753967  15.879149  28.467278
 1
@@ -612,9 +610,9 @@ Default values
 high A 3.21725
-21.864299  16.083605  28.164090
+21.626621  15.401400  28.073208
 2
@@ -622,9 +620,9 @@ Default values
 high A 3.21725
-21.556234  15.924024  27.673221
+21.380708  15.193230  27.758124
 3
@@ -632,9 +630,9 @@ Default values
 high A 3.21725
-21.255795  15.181309  27.002164
+21.220707  15.203868  27.537716
 4
@@ -642,9 +640,9 @@ Default values
 high A 3.21725
-21.091406  14.821219  26.502849
+21.016281  15.097675  27.076450
 5
@@ -652,9 +650,9 @@ Default values
 high A 3.21725
-20.955432  15.520032  26.290511
+20.824757  15.074355  26.626349
 6
@@ -662,9 +660,9 @@ Default values
 high A 3.21725
-20.667079  15.119132  26.286052
+20.597090  14.910748  26.088777
 7
@@ -672,9 +670,9 @@ Default values
 high A 3.21725
-20.459366  14.927590  25.803844
+20.307775  14.679274  25.904986
 8
@@ -682,9 +680,9 @@ Default values
 high A 3.21725
-20.282865  14.978005  25.363441
+20.240255  14.765620  25.693768
 9
@@ -692,9 +690,9 @@ Default values
 high A 3.21725
-20.032851  14.842361  25.499042
+19.970688  14.348661  25.422012
@@ -780,9 +778,9 @@ User provided values
 1.5 high A
-26.517721  22.231329  31.409020
+26.566766  21.969808  30.943151
 1
@@ -790,9 +788,9 @@ User provided values
 1.5 low A
-27.154631  23.710370  30.854390
+27.329344  23.732834  31.023226
 2
@@ -800,9 +798,9 @@ User provided values
 1.5 medium A
-25.388964  20.858360  29.439134
+25.571809  21.451557  29.660434
 3
@@ -810,9 +808,9 @@ User provided values
 3.5 high A
-19.125688  16.198962  22.480499
+19.050509  15.782297  22.263920
 4
@@ -820,9 +818,9 @@ User provided values
 3.5 low A
-19.762599  16.651143  23.520939
+19.813088  16.472862  23.414660
 5
@@ -830,9 +828,9 @@ User provided values
 3.5 medium A
-17.996931  15.646609  20.272951
+18.055552  15.632819  20.474585
 6
@@ -840,9 +838,9 @@ User provided values
 1.5 high A
-25.242986  21.389207  29.835175
+25.300803  21.208498  29.449365
 7
@@ -850,9 +848,9 @@ User provided values
 1.5 low A
-25.879897  21.993742  29.826717
+26.063382  21.954223  29.802325
 8
@@ -860,9 +858,9 @@ User provided values
 1.5 medium A
-24.114229  19.499411  27.954867
+24.305846  20.088821  28.257285
 9
@@ -870,9 +868,9 @@ User provided values
 3.5 high A
-18.499053  15.805280  21.040859
+18.441151  15.569090  20.982979
 10
@@ -880,9 +878,9 @@ User provided values
 3.5 low A
-19.135964  15.476760  22.781403
+19.203729  15.518468  22.845879
 11
@@ -890,9 +888,9 @@ User provided values
 3.5 medium A
-17.370296  14.906599  19.609224
+17.446194  14.967520  19.927605
@@ -936,9 +934,9 @@ Unit level predicti
 B 110 2.620
-22.230773  20.057140  24.322536
+22.233743  20.121823  24.395500
 1
@@ -946,9 +944,9 @@ Unit level predicti
 B 110 2.875
-21.329605  19.371961  23.354494
+21.317279  19.250855  23.284851
 2
@@ -956,9 +954,9 @@ Unit level predicti
 B 93 2.320
-25.914300  24.439324  27.767257
+25.916714  24.242191  27.574153
 3
@@ -966,9 +964,9 @@ Unit level predicti
 A 110 3.215
-18.690801  16.457099  21.072167
+18.775156  16.444976  21.276683
 4
@@ -976,9 +974,9 @@ Unit level predicti
 A 175 3.440
-16.924656  15.260324  18.603531
+16.917035  15.219165  18.653843
@@ -1071,9 +1069,9 @@ Marginalizin
 0
-20.072681  17.957895  22.215034
+20.06417  17.895285  22.250253
@@ -1123,12 +1121,12 @@

Negative Binomial
Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (4 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [alpha, prog, scale(math), prog:scale(math)]
@@ -1138,7 +1136,7 @@

Negative Binomial

 

-Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 10 seconds.
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 15 seconds.

This model uses a log link function and a negative binomial distribution for the likelihood. Note that the model also contains the interaction prog:scale(math); a sketch of such a specification follows.
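A specification of this form could look like the sketch below; the data frame and the response name daysabs are assumptions (they are not visible in this diff), while the family and the formula terms follow the parameter names listed above (note the absence of a common intercept):

# Hypothetical sketch of the negative binomial model specification
model_nb = bmb.Model(
    "daysabs ~ 0 + prog + scale(math) + prog:scale(math)",  # response name assumed
    data,
    family="negativebinomial",
)
idata_nb = model_nb.fit()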

@@ -1226,12 +1224,12 @@

Logistic Regression
 Modeling the probability that certified_fresh==1
 Auto-assigning NUTS sampler...
 Initializing NUTS using adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [length, style, length:style]
@@ -1241,8 +1239,7 @@

Logistic Regression

-Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 484 seconds.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1686 seconds.

The logistic regression model uses a logit link function and a Bernoulli likelihood. Therefore, the link scale is the log-odds of a successful response and the response scale is the probability of a successful response.
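To move between the two scales, apply the link function or its inverse; a quick sketch using scipy (the numbers are only illustrative):

from scipy.special import expit, logit

print(expit(0.0))   # log-odds 0.0 -> probability 0.5
print(logit(0.75))  # probability 0.75 -> log-odds of about 1.0986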

@@ -1256,9 +1253,9 @@

Logistic Regression
@@ -1299,12 +1296,12 @@

Plotting o
Auto-assigning NUTS sampler...
 Initializing NUTS using jitter+adapt_diag...
-Multiprocess sampling (2 chains in 2 jobs)
+Multiprocess sampling (4 chains in 4 jobs)
 NUTS: [Intercept, x, alpha_Intercept, alpha_x]
@@ -1314,9 +1311,8 @@

Plotting o

 

-Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 6 seconds.
-There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
-We recommend running at least 4 chains for robust computation of convergence diagnostics
+Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 18 seconds.
+There were 8 divergences after tuning. Increase `target_accept` or reparameterize.
@@ -1364,16 +1360,16 @@

Plotting o
%load_ext watermark
 %watermark -n -u -v -iv -w
-Last updated: Sat May 25 2024
+Last updated: Fri Aug 16 2024
 
 Python implementation: CPython
 Python version       : 3.11.9
 IPython version      : 8.24.0
 
-numpy     : 1.26.4
 matplotlib: 3.8.4
+bambi     : 0.14.1.dev12+g64e57423.d20240730
+numpy     : 1.26.4
 pandas    : 2.2.2
-bambi     : 0.13.1.dev37+g2a54df76.d20240525
 
 Watermark: 2.4.3
 
diff --git a/notebooks/plot_predictions_files/figure-html/cell-10-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-10-output-1.png
index a8c3d959..9d5d83cb 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-10-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-10-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-15-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-15-output-1.png
index 12326301..a444411b 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-15-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-15-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-20-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-20-output-1.png
index cd966b47..eda0d237 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-20-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-20-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-23-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-23-output-1.png
index cdab0be2..e69ac75a 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-23-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-23-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-24-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-24-output-1.png
index 8cef9221..39147309 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-24-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-24-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-27-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-27-output-1.png
index 8acd4607..5e20019d 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-27-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-27-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-28-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-28-output-1.png
index aaaff493..d6464339 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-28-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-28-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-4-output-2.png b/notebooks/plot_predictions_files/figure-html/cell-4-output-2.png
index b3ab63ca..c7c5bfc5 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-4-output-2.png and b/notebooks/plot_predictions_files/figure-html/cell-4-output-2.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-6-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-6-output-1.png
index 44af3658..ddee3115 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-6-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-6-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-8-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-8-output-1.png
index 7a9755c2..d231e00b 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-8-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-8-output-1.png differ
diff --git a/notebooks/plot_predictions_files/figure-html/cell-9-output-1.png b/notebooks/plot_predictions_files/figure-html/cell-9-output-1.png
index e3dddbc4..607ad36e 100644
Binary files a/notebooks/plot_predictions_files/figure-html/cell-9-output-1.png and b/notebooks/plot_predictions_files/figure-html/cell-9-output-1.png differ diff --git a/search.json b/search.json index b75131be..55333c11 100644 --- a/search.json +++ b/search.json @@ -207,7 +207,7 @@ "href": "notebooks/alternative_links_binary.html", "title": "Bambi", "section": "", - "text": "In this example we use a simple dataset to fit a Generalized Linear Model for a binary response using different link functions.\n\nimport arviz as az\nimport bambi as bmb\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\n\nfrom scipy.special import expit as invlogit\nfrom scipy.stats import norm\n\n\naz.style.use(\"arviz-darkgrid\")\nnp.random.seed(1234)\n\n\n\nFirst of all, let’s review some concepts. A Generalized Linear Model (GLM) is made of three components.\n1. Random component\nA set of independent and identically distributed random variables \\(Y_i\\). Their (conditional) probability distribution belongs to the same family \\(f\\) with a mean given by \\(\\mu_i\\).\n2. Systematic component (a.k.a linear predictor)\nConstructed by a linear combination of the parameters \\(\\beta_j\\) and explanatory variables \\(x_j\\), represented by \\(\\eta_i\\)\n\\[\n\\eta_i = \\mathbf{x}_i^T\\mathbf{\\beta} = x_{i1}\\beta_1 + x_{i2}\\beta_2 + \\cdots + x_{ip}\\beta_p\n\\]\n3. Link function\nA monotone and differentiable function \\(g\\) such that\n\\[\ng(\\mu_i) = \\eta_i = \\mathbf{x}_i^T\\mathbf{\\beta}\n\\] where \\(\\mu_i = E(Y_i)\\)\nAs we can see, this function specifies the link between the random and the systematic components of the model.\nAn important feature of GLMs is that no matter we are modeling a function of \\(\\mu\\) (and not just \\(\\mu\\), unless \\(g\\) is the identity function) is that we can show predictions in terms of the mean \\(\\mu\\) by using the inverse of \\(g\\) on the linear predictor \\(\\eta_i\\)\n\\[\ng^{-1}(\\eta_i) = g^{-1}(\\mathbf{x}_i^T\\mathbf{\\beta}) = \\mu_i\n\\]\nIn Bambi, we can use family=\"bernoulli\" to tell we are modeling a binary variable that follows a Bernoulli distribution and our random component is of the form\n\\[\nY_i =\n\\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{with probability } \\pi_i \\\\\n 0 & \\textrm{with probability } 1 - \\pi_i\n \\end{array}\n\\right.\n\\]\nthat has a mean \\(\\mu_i\\) equal to the probability of success \\(\\pi_i\\).\nBy default, this family implies \\(g\\) is the logit function.\n\\[\n\\begin{array}{lcr} \n \\displaystyle \\text{logit}(\\pi_i) = \\log{\\left( \\frac{\\pi_i}{1 - \\pi_i} \\right)} = \\eta_i &\n \\text{ with } &\n \\displaystyle g^{-1}(\\eta) = \\frac{1}{1 + e^{-\\eta}} = \\pi_i\n\\end{array}\n\\]\nBut there are other options available, like the probit and the cloglog link functions.\nThe probit function is the inverse of the cumulative density function of a standard Gaussian distribution\n\\[\n\\begin{array}{lcr} \n \\displaystyle \\text{probit}(\\pi_i) = \\Phi^{-1}(\\pi_i) = \\eta_i &\n \\text{ with } &\n \\displaystyle g^{-1}(\\eta) = \\Phi(\\eta_i) = \\pi_i\n\\end{array}\n\\]\nAnd with the cloglog link function we have\n\\[\n\\begin{array}{lcr} \n \\displaystyle \\text{cloglog}(\\pi_i) = \\log(-\\log(1 - \\pi)) = \\eta_i &\n \\text{ with } &\n \\displaystyle g^{-1}(\\eta) = 1 - \\exp(-\\exp(\\eta_i)) = \\pi_i\n\\end{array}\n\\]\ncloglog stands for complementary log-log and \\(g^{-1}\\) is the cumulative density function of the extreme minimum value distribution.\nLet’s plot them to 
better understand the implications of what we’re saying.\n\ndef invcloglog(x):\n return 1 - np.exp(-np.exp(x))\n\n\nx = np.linspace(-5, 5, num=200)\n\n# inverse of the logit function\nlogit = invlogit(x)\n\n# cumulative density function of standard gaussian\nprobit = norm.cdf(x)\n\n# inverse of the cloglog function\ncloglog = invcloglog(x)\n\nplt.plot(x, logit, color=\"C0\", lw=2, label=\"Logit\")\nplt.plot(x, probit, color=\"C1\", lw=2, label=\"Probit\")\nplt.plot(x, cloglog, color=\"C2\", lw=2, label=\"CLogLog\")\nplt.axvline(0, c=\"k\", alpha=0.5, ls=\"--\")\nplt.axhline(0.5, c=\"k\", alpha=0.5, ls=\"--\")\nplt.xlabel(r\"$x$\")\nplt.ylabel(r\"$\\pi$\")\nplt.legend();\n\n\n\n\nIn the plot above we can see both the logit and the probit links are symmetric in terms of their slopes at \\(-x\\) and \\(x\\). We can say the function approaches \\(\\pi = 0.5\\) at the same rate as it moves away from it. However, these two functions differ in their tails. The probit link approaches 0 and 1 faster than the logit link as we move away from \\(x=0\\). Just see the orange line is below the blue one for \\(x < 0\\) and it is above for \\(x > 0\\). In other words, the logit function has heavier tails than the probit.\nOn the other hand, the cloglog does not present this symmetry, and we can clearly see it since the green line does not cross the point (0, 0.5). This function approaches faster the 1 than 0 as we move away from \\(x=0\\).\n\n\n\nWe use a data set consisting of the numbers of beetles dead after five hours of exposure to gaseous carbon disulphide at various concentrations. This data can be found in An Introduction to Generalized Linear Models by A. J. Dobson and A. G. Barnett, but the original source is (Bliss, 1935).\n\n\n\n\n\n\n\n\nDose, \\(x_i\\) (\\(\\log_{10}\\text{CS}_2\\text{mgl}^{-1}\\))\nNumber of beetles, \\(n_i\\)\nNumber killed, \\(y_i\\)\n\n\n\n\n1.6907\n59\n6\n\n\n1.7242\n60\n13\n\n\n1.7552\n62\n18\n\n\n1.7842\n56\n28\n\n\n1.8113\n63\n52\n\n\n1.8369\n59\n53\n\n\n1.8610\n62\n61\n\n\n1.8839\n60\n60\n\n\n\nWe create a data frame where the data is in long format (i.e. each row is an observation with a 0-1 outcome).\n\nx = np.array([1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839])\nn = np.array([59, 60, 62, 56, 63, 59, 62, 60])\ny = np.array([6, 13, 18, 28, 52, 53, 61, 60])\n\ndata = pd.DataFrame({\"x\": x, \"n\": n, \"y\": y})\n\n\n\n\nBambi has two families to model binary data: Bernoulli and Binomial. The first one can be used when each row represents a single observation with a column containing the binary outcome, while the second is used when each row represents a group of observations or realizations and there’s one column for the number of successes and another column for the number of trials.\nSince we have aggregated data, we’re going to use the Binomial family. This family requires using the function proportion(y, n) on the left side of the model formula to indicate we want to model the proportion between two variables. This function can be replaced by any of its aliases prop(y, n) or p(y, n). 
Let’s use the shortest one here.\n\nformula = \"p(y, n) ~ x\"\n\n\n\nThe logit link is the default link when we say family=\"binomial\", so there’s no need to add it.\n\nmodel_logit = bmb.Model(formula, data, family=\"binomial\")\nidata_logit = model_logit.fit(draws=2000)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, x]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 3 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\n\n\n\n\nmodel_probit = bmb.Model(formula, data, family=\"binomial\", link=\"probit\")\nidata_probit = model_probit.fit(draws=2000)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, x]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 3 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\n\n\n\n\nmodel_cloglog = bmb.Model(formula, data, family=\"binomial\", link=\"cloglog\")\nidata_cloglog = model_cloglog.fit(draws=2000)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, x]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 3 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\n\n\n\n\nWe can use the samples from the posteriors to see the mean estimate for the probability of dying at each concentration level. To do so, we use a little helper function that will help us to write less code. This function leverages the power of the new Model.predict() method that is helpful to obtain both in-sample and out-of-sample predictions.\n\ndef get_predictions(model, idata, seq):\n # Create a data frame with the new data\n new_data = pd.DataFrame({\"x\": seq})\n \n # Predict probability of dying using out of sample data\n model.predict(idata, data=new_data)\n \n # Get posterior mean across all chains and draws\n mu = idata.posterior[\"p\"].mean((\"chain\", \"draw\"))\n return mu\n\n\nx_seq = np.linspace(1.6, 2, num=200)\n\nmu_logit = get_predictions(model_logit, idata_logit, x_seq)\nmu_probit = get_predictions(model_probit, idata_probit, x_seq)\nmu_cloglog = get_predictions(model_cloglog, idata_cloglog, x_seq)\n\n\nplt.scatter(x, y / n, c = \"white\", edgecolors = \"black\", s=100)\nplt.plot(x_seq, mu_logit, lw=2, label=\"Logit\")\nplt.plot(x_seq, mu_probit, lw=2, label=\"Probit\")\nplt.plot(x_seq, mu_cloglog, lw=2, label=\"CLogLog\")\nplt.axhline(0.5, c=\"k\", alpha=0.5, ls=\"--\")\nplt.xlabel(r\"Dose $\\log_{10}CS_2mgl^{-1}$\")\nplt.ylabel(\"Probability of death\")\nplt.legend();\n\n\n\n\nIn this example, we can see the models using the logit and probit link functions present very similar estimations. With these particular data, all the three link functions fit the data well and the results do not differ significantly. However, there can be scenarios where the results are more sensitive to the choice of the link function.\nReferences\nBliss, C. I. (1935). The calculation of the dose-mortality curve. 
Annals of Applied Biology 22, 134–167\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Sat May 25 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\nnumpy : 1.26.4\nbambi : 0.13.1.dev37+g2a54df76.d20240525\npandas : 2.2.2\narviz : 0.18.0\nmatplotlib: 3.8.4\n\nWatermark: 2.4.3" + "text": "In this example we use a simple dataset to fit a Generalized Linear Model for a binary response using different link functions.\n\nimport arviz as az\nimport bambi as bmb\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\n\nfrom scipy.special import expit as invlogit\nfrom scipy.stats import norm\n\n\naz.style.use(\"arviz-darkgrid\")\nnp.random.seed(1234)\n\n\n\nFirst of all, let’s review some concepts. A Generalized Linear Model (GLM) is made of three components.\n1. Random component\nA set of independent and identically distributed random variables \\(Y_i\\). Their (conditional) probability distribution belongs to the same family \\(f\\) with a mean given by \\(\\mu_i\\).\n2. Systematic component (a.k.a linear predictor)\nConstructed by a linear combination of the parameters \\(\\beta_j\\) and explanatory variables \\(x_j\\), represented by \\(\\eta_i\\)\n\\[\n\\eta_i = \\mathbf{x}_i^T\\mathbf{\\beta} = x_{i1}\\beta_1 + x_{i2}\\beta_2 + \\cdots + x_{ip}\\beta_p\n\\]\n3. Link function\nA monotone and differentiable function \\(g\\) such that\n\\[\ng(\\mu_i) = \\eta_i = \\mathbf{x}_i^T\\mathbf{\\beta}\n\\] where \\(\\mu_i = E(Y_i)\\)\nAs we can see, this function specifies the link between the random and the systematic components of the model.\nAn important feature of GLMs is that no matter we are modeling a function of \\(\\mu\\) (and not just \\(\\mu\\), unless \\(g\\) is the identity function) is that we can show predictions in terms of the mean \\(\\mu\\) by using the inverse of \\(g\\) on the linear predictor \\(\\eta_i\\)\n\\[\ng^{-1}(\\eta_i) = g^{-1}(\\mathbf{x}_i^T\\mathbf{\\beta}) = \\mu_i\n\\]\nIn Bambi, we can use family=\"bernoulli\" to tell we are modeling a binary variable that follows a Bernoulli distribution and our random component is of the form\n\\[\nY_i =\n\\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{with probability } \\pi_i \\\\\n 0 & \\textrm{with probability } 1 - \\pi_i\n \\end{array}\n\\right.\n\\]\nthat has a mean \\(\\mu_i\\) equal to the probability of success \\(\\pi_i\\).\nBy default, this family implies \\(g\\) is the logit function.\n\\[\n\\begin{array}{lcr} \n \\displaystyle \\text{logit}(\\pi_i) = \\log{\\left( \\frac{\\pi_i}{1 - \\pi_i} \\right)} = \\eta_i &\n \\text{ with } &\n \\displaystyle g^{-1}(\\eta) = \\frac{1}{1 + e^{-\\eta}} = \\pi_i\n\\end{array}\n\\]\nBut there are other options available, like the probit and the cloglog link functions.\nThe probit function is the inverse of the cumulative density function of a standard Gaussian distribution\n\\[\n\\begin{array}{lcr} \n \\displaystyle \\text{probit}(\\pi_i) = \\Phi^{-1}(\\pi_i) = \\eta_i &\n \\text{ with } &\n \\displaystyle g^{-1}(\\eta) = \\Phi(\\eta_i) = \\pi_i\n\\end{array}\n\\]\nAnd with the cloglog link function we have\n\\[\n\\begin{array}{lcr} \n \\displaystyle \\text{cloglog}(\\pi_i) = \\log(-\\log(1 - \\pi)) = \\eta_i &\n \\text{ with } &\n \\displaystyle g^{-1}(\\eta) = 1 - \\exp(-\\exp(\\eta_i)) = \\pi_i\n\\end{array}\n\\]\ncloglog stands for complementary log-log and \\(g^{-1}\\) is the cumulative density function of the extreme minimum value distribution.\nLet’s plot them to better understand the implications 
of what we’re saying.\n\ndef invcloglog(x):\n return 1 - np.exp(-np.exp(x))\n\n\nx = np.linspace(-5, 5, num=200)\n\n# inverse of the logit function\nlogit = invlogit(x)\n\n# cumulative density function of standard gaussian\nprobit = norm.cdf(x)\n\n# inverse of the cloglog function\ncloglog = invcloglog(x)\n\nplt.plot(x, logit, color=\"C0\", lw=2, label=\"Logit\")\nplt.plot(x, probit, color=\"C1\", lw=2, label=\"Probit\")\nplt.plot(x, cloglog, color=\"C2\", lw=2, label=\"CLogLog\")\nplt.axvline(0, c=\"k\", alpha=0.5, ls=\"--\")\nplt.axhline(0.5, c=\"k\", alpha=0.5, ls=\"--\")\nplt.xlabel(r\"$x$\")\nplt.ylabel(r\"$\\pi$\")\nplt.legend();\n\n\n\n\nIn the plot above we can see both the logit and the probit links are symmetric in terms of their slopes at \\(-x\\) and \\(x\\). We can say the function approaches \\(\\pi = 0.5\\) at the same rate as it moves away from it. However, these two functions differ in their tails. The probit link approaches 0 and 1 faster than the logit link as we move away from \\(x=0\\). Just see the orange line is below the blue one for \\(x < 0\\) and it is above for \\(x > 0\\). In other words, the logit function has heavier tails than the probit.\nOn the other hand, the cloglog does not present this symmetry, and we can clearly see it since the green line does not cross the point (0, 0.5). This function approaches faster the 1 than 0 as we move away from \\(x=0\\).\n\n\n\nWe use a data set consisting of the numbers of beetles dead after five hours of exposure to gaseous carbon disulphide at various concentrations. This data can be found in An Introduction to Generalized Linear Models by A. J. Dobson and A. G. Barnett, but the original source is (Bliss, 1935).\n\n\n\n\n\n\n\n\nDose, \\(x_i\\) (\\(\\log_{10}\\text{CS}_2\\text{mgl}^{-1}\\))\nNumber of beetles, \\(n_i\\)\nNumber killed, \\(y_i\\)\n\n\n\n\n1.6907\n59\n6\n\n\n1.7242\n60\n13\n\n\n1.7552\n62\n18\n\n\n1.7842\n56\n28\n\n\n1.8113\n63\n52\n\n\n1.8369\n59\n53\n\n\n1.8610\n62\n61\n\n\n1.8839\n60\n60\n\n\n\nWe create a data frame where the data is in long format (i.e. each row is an observation with a 0-1 outcome).\n\nx = np.array([1.6907, 1.7242, 1.7552, 1.7842, 1.8113, 1.8369, 1.8610, 1.8839])\nn = np.array([59, 60, 62, 56, 63, 59, 62, 60])\ny = np.array([6, 13, 18, 28, 52, 53, 61, 60])\n\ndata = pd.DataFrame({\"x\": x, \"n\": n, \"y\": y})\n\n\n\n\nBambi has two families to model binary data: Bernoulli and Binomial. The first one can be used when each row represents a single observation with a column containing the binary outcome, while the second is used when each row represents a group of observations or realizations and there’s one column for the number of successes and another column for the number of trials.\nSince we have aggregated data, we’re going to use the Binomial family. This family requires using the function proportion(y, n) on the left side of the model formula to indicate we want to model the proportion between two variables. This function can be replaced by any of its aliases prop(y, n) or p(y, n). 
Let’s use the shortest one here.\n\nformula = \"p(y, n) ~ x\"\n\n\n\nThe logit link is the default link when we say family=\"binomial\", so there’s no need to add it.\n\nmodel_logit = bmb.Model(formula, data, family=\"binomial\")\nidata_logit = model_logit.fit(draws=2000)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, x]\n\n\n\n\n\n\n\n\n\n\n\nSampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 7 seconds.\n\n\n\n\n\n\nmodel_probit = bmb.Model(formula, data, family=\"binomial\", link=\"probit\")\nidata_probit = model_probit.fit(draws=2000)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, x]\n\n\n\n\n\n\n\n\n\n\n\nSampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 7 seconds.\n\n\n\nmodel_probit\n\n Formula: p(y, n) ~ x\n Family: binomial\n Link: p = probit\n Observations: 8\n Priors: \n target = p\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 1.5)\n x ~ Normal(mu: 0.0, sigma: 15.848)\n------\n* To see a plot of the priors call the .plot_priors() method.\n* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()\n\n\n\n\n\n\nmodel_cloglog = bmb.Model(formula, data, family=\"binomial\", link=\"cloglog\")\nidata_cloglog = model_cloglog.fit(draws=2000)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, x]\n\n\n\n\n\n\n\n\n\n\n\nSampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 5 seconds.\n\n\n\n\n\n\nWe can use the samples from the posteriors to see the mean estimate for the probability of dying at each concentration level. To do so, we use a little helper function that will help us to write less code. This function leverages the power of the new Model.predict() method that is helpful to obtain both in-sample and out-of-sample predictions.\n\ndef get_predictions(model, idata, seq):\n # Create a data frame with the new data\n new_data = pd.DataFrame({\"x\": seq})\n \n # Predict probability of dying using out of sample data\n model.predict(idata, data=new_data)\n \n # Get posterior mean across all chains and draws\n mu = idata.posterior[\"p\"].mean((\"chain\", \"draw\"))\n return mu\n\n\nx_seq = np.linspace(1.6, 2, num=200)\n\nmu_logit = get_predictions(model_logit, idata_logit, x_seq)\nmu_probit = get_predictions(model_probit, idata_probit, x_seq)\nmu_cloglog = get_predictions(model_cloglog, idata_cloglog, x_seq)\n\n\nplt.scatter(x, y / n, c = \"white\", edgecolors = \"black\", s=100)\nplt.plot(x_seq, mu_logit, lw=2, label=\"Logit\")\nplt.plot(x_seq, mu_probit, lw=2, label=\"Probit\")\nplt.plot(x_seq, mu_cloglog, lw=2, label=\"CLogLog\")\nplt.axhline(0.5, c=\"k\", alpha=0.5, ls=\"--\")\nplt.xlabel(r\"Dose $\\log_{10}CS_2mgl^{-1}$\")\nplt.ylabel(\"Probability of death\")\nplt.legend();\n\n\n\n\nIn this example, we can see the models using the logit and probit link functions present very similar estimations. With these particular data, all the three link functions fit the data well and the results do not differ significantly. However, there can be scenarios where the results are more sensitive to the choice of the link function.\nReferences\nBliss, C. I. (1935). The calculation of the dose-mortality curve. 
Annals of Applied Biology 22, 134–167\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Thu Aug 15 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\npandas : 2.2.2\nmatplotlib: 3.8.4\narviz : 0.18.0\nnumpy : 1.26.4\nbambi : 0.14.1.dev12+g64e57423.d20240730\n\nWatermark: 2.4.3" }, { "objectID": "notebooks/alternative_samplers.html", @@ -256,7 +256,7 @@ "href": "notebooks/hierarchical_binomial_bambi.html", "title": "Bambi", "section": "", - "text": "This notebook shows how to build a hierarchical logistic regression model with the Binomial family in Bambi.\nThis example is based on the Hierarchical baseball article in Bayesian Analysis Recipes, a collection of articles on how to do Bayesian data analysis with PyMC3 made by Eric Ma.\n\n\nExtracted from the original work:\n\nBaseball players have many metrics measured for them. Let’s say we are on a baseball team, and would like to quantify player performance, one metric being their batting average (defined by how many times a batter hit a pitched ball, divided by the number of times they were up for batting (“at bat”)). How would you go about this task?\n\n\n\n\n\nimport arviz as az\nimport bambi as bmb\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nfrom matplotlib.lines import Line2D\nfrom matplotlib.patches import Patch\n\n\nrandom_seed = 1234\n\nWe first need some measurements of batting data. Today we’re going to use data from the Baseball Databank. It is a compilation of historical baseball data in a convenient, tidy format, distributed under Open Data terms.\nThis repository contains several datasets in the form of .csv files. This example is going to use the Batting.csv file, which can be loaded directly with Bambi in a convenient way.\n\ndf = bmb.load_data(\"batting\")\n\n# Then clean some of the data\ndf[\"AB\"] = df[\"AB\"].replace(0, np.nan)\ndf = df.dropna()\ndf[\"batting_avg\"] = df[\"H\"] / df[\"AB\"]\ndf = df[df[\"yearID\"] >= 2016]\ndf = df.iloc[0:15] \ndf.head(5)\n\n\n\n\n\n \n \n \n playerID\n yearID\n stint\n teamID\n lgID\n G\n AB\n R\n H\n 2B\n ...\n SB\n CS\n BB\n SO\n IBB\n HBP\n SH\n SF\n GIDP\n batting_avg\n \n \n \n \n 101348\n abadfe01\n 2016\n 1\n MIN\n AL\n 39\n 1.0\n 0\n 0\n 0\n ...\n 0.0\n 0.0\n 0\n 1.0\n 0.0\n 0.0\n 0.0\n 0.0\n 0.0\n 0.000000\n \n \n 101350\n abreujo02\n 2016\n 1\n CHA\n AL\n 159\n 624.0\n 67\n 183\n 32\n ...\n 0.0\n 2.0\n 47\n 125.0\n 7.0\n 15.0\n 0.0\n 9.0\n 21.0\n 0.293269\n \n \n 101352\n ackledu01\n 2016\n 1\n NYA\n AL\n 28\n 61.0\n 6\n 9\n 0\n ...\n 0.0\n 0.0\n 8\n 9.0\n 0.0\n 0.0\n 0.0\n 1.0\n 0.0\n 0.147541\n \n \n 101353\n adamecr01\n 2016\n 1\n COL\n NL\n 121\n 225.0\n 25\n 49\n 7\n ...\n 2.0\n 3.0\n 24\n 47.0\n 0.0\n 4.0\n 3.0\n 0.0\n 5.0\n 0.217778\n \n \n 101355\n adamsma01\n 2016\n 1\n SLN\n NL\n 118\n 297.0\n 37\n 74\n 18\n ...\n 0.0\n 1.0\n 25\n 81.0\n 1.0\n 2.0\n 0.0\n 3.0\n 5.0\n 0.249158\n \n \n\n5 rows × 23 columns\n\n\n\nFrom all the columns above, we’re going to use the following:\n\nplayerID: Unique identification for the player.\nAB: Number of times the player was up for batting.\nH: Number of times the player hit the ball while batting.\nbatting_avg: Simply ratio between H and AB.\n\n\n\n\nIt’s always good to explore the data before starting to write down our models. 
This is very useful to gain a good understanding of the distribution of the variables and their relationships, and even anticipate some problems that may occur during the sampling process.\nThe following graph summarizes the percentage of hits, as well as the number of times the players were up for batting and the number of times they hit the ball.\n\nBLUE = \"#2a5674\"\nRED = \"#b13f64\"\n\n\n_, ax = plt.subplots(figsize=(10, 6))\n\n# Customize x limits. \n# This adds space on the left side to indicate percentage of hits.\nax.set_xlim(-120, 320)\n\n# Add dots for the the number of hits and the times at bat\nax.scatter(df[\"H\"], list(range(15)), s=140, color=RED, zorder=10)\nax.scatter(df[\"AB\"], list(range(15)), s=140, color=BLUE, zorder=10)\n\n# Also a line connecting them\nax.hlines(list(range(15)), df[\"H\"], df[\"AB\"], color=\"#b3b3b3\", lw=4)\n\nax.axvline(ls=\"--\", lw=1.4, color=\"#a3a3a3\")\nax.hlines(list(range(15)), -110, -50, lw=6, color=\"#b3b3b3\", capstyle=\"round\")\nax.scatter(60 * df[\"batting_avg\"] - 110, list(range(15)), s=28, color=RED, zorder=10)\n\n# Add the percentage of hits\nfor j in range(15): \n text = f\"{round(df['batting_avg'].iloc[j] * 100)}%\"\n ax.text(-12, j, text, ha=\"right\", va=\"center\", fontsize=14, color=\"#333\")\n\n# Customize tick positions and labels\nax.yaxis.set_ticks(list(range(15)))\nax.yaxis.set_ticklabels(df[\"playerID\"])\nax.xaxis.set_ticks(range(0, 400, 100))\n\n# Create handles for the legend (just dots and labels)\nhandles = [\n Line2D(\n [0], [0], label=\"Hits\", marker=\"o\", color=\"None\", markeredgewidth=0, \n markerfacecolor=RED, markersize=13\n ),\n Line2D(\n [0], [0], label=\"At Bat\", marker=\"o\", color=\"None\", markeredgewidth=0,\n markerfacecolor=BLUE, markersize=12\n )\n]\n\n# Add legend on top-right corner\nlegend = ax.legend(\n handles=handles, \n loc=1, \n fontsize=14, \n handletextpad=0.4,\n frameon=True\n)\n\n# Finally add labels and a title\nax.set_xlabel(\"Count\", fontsize=14)\nax.set_ylabel(\"Player\", fontsize=14)\nax.set_title(\"How often do batters hit the ball?\", fontsize=20);\n\n\n\n\nThe first thing one can see is that the number of times players were up for batting varies quite a lot. Some players have been there for very few times, while there are others who have been there hundreds of times. We can also note the percentage of hits is usually a number between 12% and 29%.\nThere are two players, alberma01 and abadfe01, who had only one chance to bat. The first one hit the ball, while the latter missed. That’s why alberma01 as a 100% hit percentage, while abadfe01 has 0%. There’s another player, aguilje01, who has a success record of 0% because he missed all the few opportunities he had to bat. These extreme situations, where the empirical estimation lives in the boundary of the parameter space, are associated with estimation problems when using a maximum-likelihood estimation approach. 
Nonetheless, they can also impact the sampling process, especially when using wide priors.\nAs a final note, abreujo02, has been there for batting 624 times, and thus the grey dot representing this number does not appear in the plot.\n\n\n\nLet’s get started with a simple cell-means logistic regression for \\(p_i\\), the probability of hitting the ball for the player \\(i\\)\n\\[\n\\begin{array}{lr}\n \\displaystyle \\text{logit}(p_i) = \\beta_i & \\text{with } i = 0, \\cdots, 14\n\\end{array} \n\\]\nWhere\n\\[\n\\beta_i \\sim \\text{Normal}(0, \\ \\sigma_{\\beta}),\n\\]\n\\(\\sigma_{\\beta}\\) is a common constant for all the players, and \\(\\text{logit}(p_i) = \\log\\left(\\frac{p_i}{1 - p_i}\\right)\\).\nSpecifying this model is quite simple in Bambi thanks to its formula interface.\nFirst of all, note this is a Binomial family and the response involves both the number of hits (H) and the number of times at bat (AB). We use the p(x, n) function for the response term. This just tells Bambi we want to model the proportion resulting from dividing x over n.\nThe right-hand side of the formula is \"0 + playerID\". This means the model includes a coefficient for each player ID, but does not include a global intercept.\nFinally, using the Binomial family is as easy as passing family=\"binomial\". By default, the link function for this family is link=\"logit\", so there’s nothing to change there.\n\nmodel_non_hierarchical = bmb.Model(\"p(H, AB) ~ 0 + playerID\", df, family=\"binomial\")\nmodel_non_hierarchical\n\n Formula: p(H, AB) ~ 0 + playerID\n Family: binomial\n Link: p = logit\n Observations: 15\n Priors: \n target = p\n Common-level effects\n playerID ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.], sigma: [10.0223 10.0223\n 10.0223 10.0223 10.0223 10.0223 10.0223 10.0223 10.0223\n 10.0223 10.0223 10.0223 10.0223 10.0223 10.0223])\n\n\n\nidata_non_hierarchical = model_non_hierarchical.fit(random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [playerID]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 4 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\nNext we observe the posterior of the coefficient for each player. The compact=False argument means we want separated panels for each player.\n\naz.plot_trace(idata_non_hierarchical, compact=False, backend_kwargs={\"layout\": \"constrained\"});\n\n\n\n\nSo far so good! The traceplots indicate the sampler worked well.\nNow, let’s keep this posterior aside for later use and let’s fit the hierarchical version.\n\n\n\nThis model incorporates a group-specific intercept for each player:\n\\[\n\\begin{array}{lr}\n \\displaystyle \\text{logit}(p_i) = \\alpha + \\gamma_i & \\text{with } i = 0, \\cdots, 14\n\\end{array} \n\\]\nwhere\n\\[\n\\begin{array}{c}\n \\alpha \\sim \\text{Normal}(0, \\ \\sigma_{\\alpha}) \\\\\n \\gamma_i \\sim \\text{Normal}(0, \\ \\sigma_{\\gamma}) \\\\\n \\sigma_{\\gamma} \\sim \\text{HalfNormal}(\\tau_{\\gamma})\n\\end{array}\n\\]\nThe group-specific terms are indicated with the | operator in the formula. 
In this case, since there is an intercept for each player, we write 1|playerID.\n\nmodel_hierarchical = bmb.Model(\"p(H, AB) ~ 1 + (1|playerID)\", df, family=\"binomial\")\nmodel_hierarchical\n\n Formula: p(H, AB) ~ 1 + (1|playerID)\n Family: binomial\n Link: p = logit\n Observations: 15\n Priors: \n target = p\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 2.5)\n \n Group-level effects\n 1|playerID ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 2.5))\n\n\n\nidata_hierarchical = model_hierarchical.fit(random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 5 seconds.\nThere were 5 divergences after tuning. Increase `target_accept` or reparameterize.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\nThe rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n\n\nSometimes, there can be several divergences when fitting a hierarchical model. What can we do in that case?\nOne thing we could try is increasing target_accept. But if there are many divergences, that suggests a problem with the underlying model. Let’s take a look at the prior predictive distribution to see whether our priors are too informative or too wide.\nThe Model instance has a method called prior_predictive() that generates samples from the prior predictive distribution. It returns an InferenceData object that contains the values of the prior predictive distribution.\n\nidata_prior = model_hierarchical.prior_predictive()\nprior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\n\nSampling: [1|playerID_offset, 1|playerID_sigma, Intercept, p(H, AB)]\n/tmp/ipykernel_95801/2686921361.py:2: FutureWarning: extract_dataset has been deprecated, please use extract\n prior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\n\n\nIf we inspect the DataArray, we see there are 500 draws (sample) for each of the 15 players (__obs__)\nLet’s plot these distributions together with the observed proportion of hits for every player here.\n\n# We define this function because this plot is going to be repeated below.\ndef plot_prior_predictive(df, prior):\n AB = df[\"AB\"].values\n H = df[\"H\"].values\n\n fig, axes = plt.subplots(5, 3, figsize=(10, 6), sharex=\"col\")\n\n for idx, ax in enumerate(axes.ravel()):\n pps = prior.sel({\"__obs__\":idx})\n ab = AB[idx]\n h = H[idx]\n hist = ax.hist(pps / ab, bins=25, color=\"#a3a3a3\")\n ax.axvline(h / ab, color=RED, lw=2)\n ax.set_yticks([])\n ax.tick_params(labelsize=12)\n \n fig.subplots_adjust(left=0.025, right=0.975, hspace=0.05, wspace=0.05, bottom=0.125)\n fig.legend(\n handles=[Line2D([0], [0], label=\"Observed proportion\", color=RED, linewidth=2)],\n handlelength=1.5,\n handletextpad=0.8,\n borderaxespad=0,\n frameon=True,\n fontsize=11, \n bbox_to_anchor=(0.975, 0.92),\n loc=\"right\"\n \n )\n fig.text(0.5, 0.05, \"Prior probability of hitting\", fontsize=15, ha=\"center\", va=\"baseline\")\n\n\nplot_prior_predictive(df, prior)\n\n\n\n\nIndeed, priors are too wide! 
Let’s use tighter priors and see what’s the result\n\npriors = {\n \"Intercept\": bmb.Prior(\"Normal\", mu=0, sigma=1),\n \"1|playerID\": bmb.Prior(\"Normal\", mu=0, sigma=bmb.Prior(\"HalfNormal\", sigma=1))\n}\nmodel_hierarchical = bmb.Model(\"p(H, AB) ~ 1 + (1|playerID)\", df, family=\"binomial\", priors=priors)\nmodel_hierarchical\n\n Formula: p(H, AB) ~ 1 + (1|playerID)\n Family: binomial\n Link: p = logit\n Observations: 15\n Priors: \n target = p\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 1.0)\n \n Group-level effects\n 1|playerID ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0))\n\n\nNow let’s check the prior predictive distribution for these new priors.\n\nmodel_hierarchical.build()\nidata_prior = model_hierarchical.prior_predictive()\nprior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\nplot_prior_predictive(df, prior)\n\nSampling: [1|playerID_offset, 1|playerID_sigma, Intercept, p(H, AB)]\n/tmp/ipykernel_95801/1302716284.py:3: FutureWarning: extract_dataset has been deprecated, please use extract\n prior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\n\n\n\n\n\nDefinetely it looks much better. Now the priors tend to have a symmetric shape with a mode at 0.5, with substantial probability on the whole domain.\n\nidata_hierarchical = model_hierarchical.fit(random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 6 seconds.\nThere were 1 divergences after tuning. Increase `target_accept` or reparameterize.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\nLet’s try with increasing target_accept and the number of tune samples.\n\nidata_hierarchical = model_hierarchical.fit(tune=2000, draws=2000, target_accept=0.95, random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 2_000 tune and 2_000 draw iterations (4_000 + 4_000 draws total) took 13 seconds.\nThere were 3 divergences after tuning. Increase `target_accept` or reparameterize.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\n\nvar_names = [\"Intercept\", \"1|playerID\", \"1|playerID_sigma\"]\naz.plot_trace(idata_hierarchical, var_names=var_names, compact=False, backend_kwargs={\"layout\": \"constrained\"});\n\n\n\n\nLet’s jump onto the next section where we plot and compare the probability of hit for the players using both models.\n\n\n\nNow we’re going to plot the distribution of the probability of hit for each player, using both models.\nBut before doing that, we need to obtain the posterior in that scale. We could manually take the posterior of the coefficients, compute the linear predictor, and transform that to the probability scale. But that’s a lot of work!\nFortunately, Bambi models have a method called .predict() that we can use to predict in the probability scale. By default, it modifies in-place the InferenceData object we pass to it. 
Then, the posterior samples can be found in the variable p.\n\nmodel_non_hierarchical.predict(idata_non_hierarchical)\nmodel_hierarchical.predict(idata_hierarchical)\n\nLet’s create a forestplot using the posteriors obtained with both models so we can compare them very easily .\n\n_, ax = plt.subplots(figsize = (8, 8))\n\n# Add vertical line for the global probability of hitting\nax.axvline(x=(df[\"H\"] / df[\"AB\"]).mean(), ls=\"--\", color=\"black\", alpha=0.5)\n\n# Create forestplot with ArviZ, only for the mean.\naz.plot_forest(\n [idata_non_hierarchical, idata_hierarchical], \n var_names=\"p\", \n combined=True, \n colors=[\"#666666\", RED], \n linewidth=2.6, \n markersize=8,\n ax=ax\n)\n\n# Create custom y axis tick labels\nylabels = [f\"H: {round(h)}, AB: {round(ab)}\" for h, ab in zip(df[\"H\"].values, df[\"AB\"].values)]\nylabels = list(reversed(ylabels))\n\n# Put the labels for the y axis in the mid of the original location of the tick marks.\nax.set_yticklabels(ylabels, ha=\"right\")\n\n# Create legend\nhandles = [\n Patch(label=\"Non-hierarchical\", facecolor=\"#666666\"),\n Patch(label=\"Hierarchical\", facecolor=RED),\n Line2D([0], [0], label=\"Mean probability\", ls=\"--\", color=\"black\", alpha=0.5)\n]\n\nlegend = ax.legend(handles=handles, loc=4, fontsize=14, frameon=True, framealpha=0.8);\n\n\n\n\nOne of the first things one can see is that not only the center of the distributions varies but also their dispersion. Those posteriors that are very wide are associated with players who have batted only once or few times, while tighter posteriors correspond to players who batted several times.\nPlayers who have extreme empirical proportions have similar extreme posteriors under the non-hierarchical model. However, under the hierarchical model, these distributions are now shrunk towards the global mean. Extreme values are very unlikely under the hierarchical model.\nAnd finally, paraphrasing Eric, there’s nothing ineherently right or wrong about shrinkage and hierarchical models. Whether this is reasonable or not depends on our prior knowledge about the problem. And to me, after having seen the hit rates of the other players, it is much more reasonable to shrink extreme posteriors based on very few data points towards the global mean rather than just let them concentrate around 0 or 1.\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Sat May 25 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\nmatplotlib: 3.8.4\nbambi : 0.13.1.dev37+g2a54df76.d20240525\nnumpy : 1.26.4\narviz : 0.18.0\n\nWatermark: 2.4.3\n\n\n\n\n\n\n\n By default, the .predict() method obtains the posterior for the mean of the likelihood distribution. This mean would be \\(np\\) for the Binomial family. However, since \\(n\\) varies from observation to observation, it returns the value of \\(p\\), as if it was a Bernoulli family." + "text": "This notebook shows how to build a hierarchical logistic regression model with the Binomial family in Bambi.\nThis example is based on the Hierarchical baseball article in Bayesian Analysis Recipes, a collection of articles on how to do Bayesian data analysis with PyMC3 made by Eric Ma.\n\n\nExtracted from the original work:\n\nBaseball players have many metrics measured for them. 
Let’s say we are on a baseball team, and would like to quantify player performance, one metric being their batting average (defined by how many times a batter hit a pitched ball, divided by the number of times they were up for batting (“at bat”)). How would you go about this task?\n\n\n\n\n\nimport arviz as az\nimport bambi as bmb\nimport matplotlib.pyplot as plt\nimport numpy as np\n\nfrom matplotlib.lines import Line2D\nfrom matplotlib.patches import Patch\n\n\nrandom_seed = 1234\n\nWe first need some measurements of batting data. Today we’re going to use data from the Baseball Databank. It is a compilation of historical baseball data in a convenient, tidy format, distributed under Open Data terms.\nThis repository contains several datasets in the form of .csv files. This example is going to use the Batting.csv file, which can be loaded directly with Bambi in a convenient way.\n\ndf = bmb.load_data(\"batting\")\n\n# Then clean some of the data\ndf[\"AB\"] = df[\"AB\"].replace(0, np.nan)\ndf = df.dropna()\ndf[\"batting_avg\"] = df[\"H\"] / df[\"AB\"]\ndf = df[df[\"yearID\"] >= 2016]\ndf = df.iloc[0:15] \ndf.head(5)\n\n\n\n\n\n \n \n \n playerID\n yearID\n stint\n teamID\n lgID\n G\n AB\n R\n H\n 2B\n ...\n SB\n CS\n BB\n SO\n IBB\n HBP\n SH\n SF\n GIDP\n batting_avg\n \n \n \n \n 101348\n abadfe01\n 2016\n 1\n MIN\n AL\n 39\n 1.0\n 0\n 0\n 0\n ...\n 0.0\n 0.0\n 0\n 1.0\n 0.0\n 0.0\n 0.0\n 0.0\n 0.0\n 0.000000\n \n \n 101350\n abreujo02\n 2016\n 1\n CHA\n AL\n 159\n 624.0\n 67\n 183\n 32\n ...\n 0.0\n 2.0\n 47\n 125.0\n 7.0\n 15.0\n 0.0\n 9.0\n 21.0\n 0.293269\n \n \n 101352\n ackledu01\n 2016\n 1\n NYA\n AL\n 28\n 61.0\n 6\n 9\n 0\n ...\n 0.0\n 0.0\n 8\n 9.0\n 0.0\n 0.0\n 0.0\n 1.0\n 0.0\n 0.147541\n \n \n 101353\n adamecr01\n 2016\n 1\n COL\n NL\n 121\n 225.0\n 25\n 49\n 7\n ...\n 2.0\n 3.0\n 24\n 47.0\n 0.0\n 4.0\n 3.0\n 0.0\n 5.0\n 0.217778\n \n \n 101355\n adamsma01\n 2016\n 1\n SLN\n NL\n 118\n 297.0\n 37\n 74\n 18\n ...\n 0.0\n 1.0\n 25\n 81.0\n 1.0\n 2.0\n 0.0\n 3.0\n 5.0\n 0.249158\n \n \n\n5 rows × 23 columns\n\n\n\nFrom all the columns above, we’re going to use the following:\n\nplayerID: Unique identification for the player.\nAB: Number of times the player was up for batting.\nH: Number of times the player hit the ball while batting.\nbatting_avg: Simply ratio between H and AB.\n\n\n\n\nIt’s always good to explore the data before starting to write down our models. This is very useful to gain a good understanding of the distribution of the variables and their relationships, and even anticipate some problems that may occur during the sampling process.\nThe following graph summarizes the percentage of hits, as well as the number of times the players were up for batting and the number of times they hit the ball.\n\nBLUE = \"#2a5674\"\nRED = \"#b13f64\"\n\n\n_, ax = plt.subplots(figsize=(10, 6))\n\n# Customize x limits. 
\n# This adds space on the left side to indicate the percentage of hits.\nax.set_xlim(-120, 320)\n\n# Add dots for the number of hits and the times at bat\nax.scatter(df[\"H\"], list(range(15)), s=140, color=RED, zorder=10)\nax.scatter(df[\"AB\"], list(range(15)), s=140, color=BLUE, zorder=10)\n\n# Also a line connecting them\nax.hlines(list(range(15)), df[\"H\"], df[\"AB\"], color=\"#b3b3b3\", lw=4)\n\nax.axvline(ls=\"--\", lw=1.4, color=\"#a3a3a3\")\nax.hlines(list(range(15)), -110, -50, lw=6, color=\"#b3b3b3\", capstyle=\"round\")\nax.scatter(60 * df[\"batting_avg\"] - 110, list(range(15)), s=28, color=RED, zorder=10)\n\n# Add the percentage of hits\nfor j in range(15):\n    text = f\"{round(df['batting_avg'].iloc[j] * 100)}%\"\n    ax.text(-12, j, text, ha=\"right\", va=\"center\", fontsize=14, color=\"#333\")\n\n# Customize tick positions and labels\nax.yaxis.set_ticks(list(range(15)))\nax.yaxis.set_ticklabels(df[\"playerID\"])\nax.xaxis.set_ticks(range(0, 400, 100))\n\n# Create handles for the legend (just dots and labels)\nhandles = [\n    Line2D(\n        [0], [0], label=\"Hits\", marker=\"o\", color=\"None\", markeredgewidth=0,\n        markerfacecolor=RED, markersize=13\n    ),\n    Line2D(\n        [0], [0], label=\"At Bat\", marker=\"o\", color=\"None\", markeredgewidth=0,\n        markerfacecolor=BLUE, markersize=12\n    )\n]\n\n# Add legend on top-right corner\nlegend = ax.legend(\n    handles=handles,\n    loc=1,\n    fontsize=14,\n    handletextpad=0.4,\n    frameon=True\n)\n\n# Finally add labels and a title\nax.set_xlabel(\"Count\", fontsize=14)\nax.set_ylabel(\"Player\", fontsize=14)\nax.set_title(\"How often do batters hit the ball?\", fontsize=20);\n\nThe first thing one can see is that the number of times players were up for batting varies quite a lot. Some players have batted very few times, while others have batted hundreds of times. We can also note the percentage of hits is usually a number between 12% and 29%.\nThere are two players, alberma01 and abadfe01, who had only one chance to bat. The first one hit the ball, while the latter missed. That’s why alberma01 has a 100% hit percentage, while abadfe01 has 0%. There’s another player, aguilje01, who has a success record of 0% because he missed all the few opportunities he had to bat. These extreme situations, where the empirical estimate lies on the boundary of the parameter space, are associated with estimation problems when using a maximum-likelihood estimation approach. Nonetheless, they can also impact the sampling process, especially when using wide priors.\nAs a final note, abreujo02 has been up for batting 624 times, and thus the grey dot representing this number falls outside the plot.\n\nLet’s get started with a simple cell-means logistic regression for \\(p_i\\), the probability of hitting the ball for player \\(i\\)\n\\[\n\\begin{array}{lr}\n \\displaystyle \\text{logit}(p_i) = \\beta_i & \\text{with } i = 0, \\cdots, 14\n\\end{array} \n\\]\nwhere\n\\[\n\\beta_i \\sim \\text{Normal}(0, \\ \\sigma_{\\beta}),\n\\]\n\\(\\sigma_{\\beta}\\) is a common constant for all the players, and \\(\\text{logit}(p_i) = \\log\\left(\\frac{p_i}{1 - p_i}\\right)\\).\nSpecifying this model is quite simple in Bambi thanks to its formula interface.\nFirst of all, note this is a Binomial family and the response involves both the number of hits (H) and the number of times at bat (AB). We use the p(x, n) function for the response term. 
This just tells Bambi we want to model the proportion resulting from dividing x over n.\nThe right-hand side of the formula is \"0 + playerID\". This means the model includes a coefficient for each player ID, but does not include a global intercept.\nFinally, using the Binomial family is as easy as passing family=\"binomial\". By default, the link function for this family is link=\"logit\", so there’s nothing to change there.\n\nmodel_non_hierarchical = bmb.Model(\"p(H, AB) ~ 0 + playerID\", df, family=\"binomial\")\nmodel_non_hierarchical\n\n Formula: p(H, AB) ~ 0 + playerID\n Family: binomial\n Link: p = logit\n Observations: 15\n Priors: \n target = p\n Common-level effects\n playerID ~ Normal(mu: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.], sigma: [1. 1. 1. 1. 1. 1.\n 1. 1. 1. 1. 1. 1. 1. 1. 1.])\n\n\n\nidata_non_hierarchical = model_non_hierarchical.fit(random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [playerID]\n\n\n\n\n\n\n\n\n\n\n\nSampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.\n\n\nNext we observe the posterior of the coefficient for each player. The compact=False argument means we want separated panels for each player.\n\naz.plot_trace(idata_non_hierarchical, compact=False, backend_kwargs={\"layout\": \"constrained\"});\n\n\n\n\nSo far so good! The traceplots indicate the sampler worked well.\nNow, let’s keep this posterior aside for later use and let’s fit the hierarchical version.\n\n\n\nThis model incorporates a group-specific intercept for each player:\n\\[\n\\begin{array}{lr}\n \\displaystyle \\text{logit}(p_i) = \\alpha + \\gamma_i & \\text{with } i = 0, \\cdots, 14\n\\end{array} \n\\]\nwhere\n\\[\n\\begin{array}{c}\n \\alpha \\sim \\text{Normal}(0, \\ \\sigma_{\\alpha}) \\\\\n \\gamma_i \\sim \\text{Normal}(0, \\ \\sigma_{\\gamma}) \\\\\n \\sigma_{\\gamma} \\sim \\text{HalfNormal}(\\tau_{\\gamma})\n\\end{array}\n\\]\nThe group-specific terms are indicated with the | operator in the formula. In this case, since there is an intercept for each player, we write 1|playerID.\n\nmodel_hierarchical = bmb.Model(\"p(H, AB) ~ 1 + (1|playerID)\", df, family=\"binomial\")\nmodel_hierarchical\n\n Formula: p(H, AB) ~ 1 + (1|playerID)\n Family: binomial\n Link: p = logit\n Observations: 15\n Priors: \n target = p\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 1.5)\n \n Group-level effects\n 1|playerID ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 2.5))\n\n\n\nidata_hierarchical = model_hierarchical.fit(random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]\n\n\n\n\n\n\n\n\n\n\n\nSampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.\nThere were 6 divergences after tuning. Increase `target_accept` or reparameterize.\n\n\nSometimes, there can be several divergences when fitting a hierarchical model. What can we do in that case?\nOne thing we could try is increasing target_accept. But if there are many divergences, that suggests a problem with the underlying model. Let’s take a look at the prior predictive distribution to see whether our priors are too informative or too wide.\nThe Model instance has a method called prior_predictive() that generates samples from the prior predictive distribution. 
It returns an InferenceData object that contains the values of the prior predictive distribution.\n\nidata_prior = model_hierarchical.prior_predictive()\nprior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\n\nSampling: [1|playerID_offset, 1|playerID_sigma, Intercept, p(H, AB)]\n/tmp/ipykernel_12852/2686921361.py:2: FutureWarning: extract_dataset has been deprecated, please use extract\n prior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\n\nIf we inspect the DataArray, we see there are 500 draws (sample) for each of the 15 players (__obs__).\nLet’s plot these distributions together with the observed proportion of hits for every player.\n\n# We define this function because this plot is going to be repeated below.\ndef plot_prior_predictive(df, prior):\n    AB = df[\"AB\"].values\n    H = df[\"H\"].values\n\n    fig, axes = plt.subplots(5, 3, figsize=(10, 6), sharex=\"col\")\n\n    for idx, ax in enumerate(axes.ravel()):\n        pps = prior.sel({\"__obs__\": idx})\n        ab = AB[idx]\n        h = H[idx]\n        hist = ax.hist(pps / ab, bins=25, color=\"#a3a3a3\")\n        ax.axvline(h / ab, color=RED, lw=2)\n        ax.set_yticks([])\n        ax.tick_params(labelsize=12)\n\n    fig.subplots_adjust(left=0.025, right=0.975, hspace=0.05, wspace=0.05, bottom=0.125)\n    fig.legend(\n        handles=[Line2D([0], [0], label=\"Observed proportion\", color=RED, linewidth=2)],\n        handlelength=1.5,\n        handletextpad=0.8,\n        borderaxespad=0,\n        frameon=True,\n        fontsize=11,\n        bbox_to_anchor=(0.975, 0.92),\n        loc=\"right\"\n    )\n    fig.text(0.5, 0.05, \"Prior probability of hitting\", fontsize=15, ha=\"center\", va=\"baseline\")\n\nplot_prior_predictive(df, prior)\n\nIndeed, priors are too wide! Let’s use tighter priors and see what the result looks like.\n\npriors = {\n    \"Intercept\": bmb.Prior(\"Normal\", mu=0, sigma=1),\n    \"1|playerID\": bmb.Prior(\"Normal\", mu=0, sigma=bmb.Prior(\"HalfNormal\", sigma=1))\n}\nmodel_hierarchical = bmb.Model(\"p(H, AB) ~ 1 + (1|playerID)\", df, family=\"binomial\", priors=priors)\nmodel_hierarchical\n\n Formula: p(H, AB) ~ 1 + (1|playerID)\n Family: binomial\n Link: p = logit\n Observations: 15\n Priors: \n target = p\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 1.0)\n \n Group-level effects\n 1|playerID ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 1.0))\n\nNow let’s check the prior predictive distribution for these new priors.\n\nmodel_hierarchical.build()\nidata_prior = model_hierarchical.prior_predictive()\nprior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\nplot_prior_predictive(df, prior)\n\nSampling: [1|playerID_offset, 1|playerID_sigma, Intercept, p(H, AB)]\n/tmp/ipykernel_12852/1302716284.py:3: FutureWarning: extract_dataset has been deprecated, please use extract\n prior = az.extract_dataset(idata_prior, group=\"prior_predictive\")[\"p(H, AB)\"]\n\nThis definitely looks much better. Now the priors tend to have a symmetric shape with a mode at 0.5 and substantial probability over the whole domain.\n\nidata_hierarchical = model_hierarchical.fit(random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]\n\nSampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 13 seconds.\nThere were 3 divergences after tuning. 
Increase `target_accept` or reparameterize.\n\nLet’s try increasing target_accept and the number of tune samples.\n\nidata_hierarchical = model_hierarchical.fit(tune=2000, draws=2000, target_accept=0.95, random_seed=random_seed)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, 1|playerID_sigma, 1|playerID_offset]\n\nSampling 4 chains for 2_000 tune and 2_000 draw iterations (8_000 + 8_000 draws total) took 24 seconds.\n\nvar_names = [\"Intercept\", \"1|playerID\", \"1|playerID_sigma\"]\naz.plot_trace(idata_hierarchical, var_names=var_names, compact=False, backend_kwargs={\"layout\": \"constrained\"});\n\nLet’s jump to the next section, where we plot and compare the probability of a hit for the players using both models.\n\nNow we’re going to plot the distribution of the probability of a hit for each player, using both models.\nBut before doing that, we need to obtain the posterior on that scale. We could manually take the posterior of the coefficients, compute the linear predictor, and transform that to the probability scale. But that’s a lot of work!\nFortunately, Bambi models have a method called .predict() that we can use to predict on the probability scale. By default, it modifies the InferenceData object we pass to it in place. Then, the posterior samples can be found in the variable p.\n\nmodel_non_hierarchical.predict(idata_non_hierarchical)\nmodel_hierarchical.predict(idata_hierarchical)\n\nLet’s create a forest plot using the posteriors obtained with both models so we can compare them very easily.\n\n_, ax = plt.subplots(figsize=(8, 8))\n\n# Add vertical line for the global probability of hitting\nax.axvline(x=(df[\"H\"] / df[\"AB\"]).mean(), ls=\"--\", color=\"black\", alpha=0.5)\n\n# Create forestplot with ArviZ, only for the mean.\naz.plot_forest(\n    [idata_non_hierarchical, idata_hierarchical],\n    var_names=\"p\",\n    combined=True,\n    colors=[\"#666666\", RED],\n    linewidth=2.6,\n    markersize=8,\n    ax=ax\n)\n\n# Create custom y axis tick labels\nylabels = [f\"H: {round(h)}, AB: {round(ab)}\" for h, ab in zip(df[\"H\"].values, df[\"AB\"].values)]\nylabels = list(reversed(ylabels))\n\n# Put the labels for the y axis in the middle of the original location of the tick marks.\nax.set_yticklabels(ylabels, ha=\"right\")\n\n# Create legend\nhandles = [\n    Patch(label=\"Non-hierarchical\", facecolor=\"#666666\"),\n    Patch(label=\"Hierarchical\", facecolor=RED),\n    Line2D([0], [0], label=\"Mean probability\", ls=\"--\", color=\"black\", alpha=0.5)\n]\n\nlegend = ax.legend(handles=handles, loc=4, fontsize=14, frameon=True, framealpha=0.8);\n\nOne of the first things one can see is that not only the center of the distributions varies, but also their dispersion. Very wide posteriors are associated with players who have batted only once or a few times, while tighter posteriors correspond to players who batted several times.\nPlayers who have extreme empirical proportions have similarly extreme posteriors under the non-hierarchical model. However, under the hierarchical model, these distributions are shrunk towards the global mean. Extreme values are very unlikely under the hierarchical model.\nAnd finally, paraphrasing Eric, there’s nothing inherently right or wrong about shrinkage and hierarchical models. Whether this is reasonable or not depends on our prior knowledge about the problem. 
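A quick numeric view of this shrinkage (a sketch, assuming the two .predict() calls above have been run; pandas is not among this notebook’s imports, so we import it here):\n\nimport pandas as pd\n\nshrinkage = pd.DataFrame({\n    \"AB\": df[\"AB\"].values,\n    \"raw\": df[\"batting_avg\"].values,\n    \"non_hierarchical\": idata_non_hierarchical.posterior[\"p\"].mean((\"chain\", \"draw\")).values,\n    \"hierarchical\": idata_hierarchical.posterior[\"p\"].mean((\"chain\", \"draw\")).values,\n}, index=df[\"playerID\"].values)\nprint(shrinkage.round(3))\n\nThe players with the fewest at-bats move the most between the last two columns, mirroring what the forest plot shows.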
And to me, after having seen the hit rates of the other players, it is much more reasonable to shrink extreme posteriors based on very few data points towards the global mean rather than just let them concentrate around 0 or 1.\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Thu Aug 15 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\nbambi : 0.14.1.dev12+g64e57423.d20240730\nnumpy : 1.26.4\nmatplotlib: 3.8.4\narviz : 0.18.0\n\nWatermark: 2.4.3\n\n\n\n\n\n\n\n By default, the .predict() method obtains the posterior for the mean of the likelihood distribution. This mean would be \\(np\\) for the Binomial family. However, since \\(n\\) varies from observation to observation, it returns the value of \\(p\\), as if it was a Bernoulli family." }, { "objectID": "notebooks/how_bambi_works.html", @@ -298,7 +298,7 @@ "href": "notebooks/logistic_regression.html", "title": "Bambi", "section": "", - "text": "import arviz as az\nimport bambi as bmb\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\n\n\naz.style.use(\"arviz-darkgrid\")\nSEED = 7355608\n\n\n\nThese data are from the 2016 pilot study. The full study consisted of 1200 people, but here we’ve selected the subset of 487 people who responded to a question about whether they would vote for Hillary Clinton or Donald Trump.\n\ndata = bmb.load_data(\"ANES\")\ndata.head()\n\n\n\n\n\n \n \n \n vote\n age\n party_id\n \n \n \n \n 0\n clinton\n 56\n democrat\n \n \n 1\n trump\n 65\n republican\n \n \n 2\n clinton\n 80\n democrat\n \n \n 3\n trump\n 38\n republican\n \n \n 4\n trump\n 60\n republican\n \n \n\n\n\n\nOur outcome variable is vote, which gives peoples’ responses to the following question prompt:\n“If the 2016 presidential election were between Hillary Clinton for the Democrats and Donald Trump for the Republicans, would you vote for Hillary Clinton, Donald Trump, someone else, or probably not vote?”\n\ndata[\"vote\"].value_counts()\n\nvote\nclinton 215\ntrump 158\nsomeone_else 48\nName: count, dtype: int64\n\n\nThe two predictors we’ll examine are a respondent’s age and their political party affiliation, party_id, which is their response to the following question prompt:\n“Generally speaking, do you usually think of yourself as a Republican, a Democrat, an independent, or what?”\n\ndata[\"party_id\"].value_counts()\n\nparty_id\ndemocrat 186\nindependent 138\nrepublican 97\nName: count, dtype: int64\n\n\nThese two predictors are somewhat correlated, but not all that much:\n\nfig, ax = plt.subplots(1, 3, figsize=(10, 4), sharey=True, constrained_layout=True)\nkey = dict(zip(data[\"party_id\"].unique(), range(3)))\nfor label, df in data.groupby(\"party_id\"):\n ax[key[label]].hist(df[\"age\"])\n ax[key[label]].set_xlim([18, 90])\n ax[key[label]].set_xlabel(\"Age\")\n ax[key[label]].set_ylabel(\"Frequency\")\n ax[key[label]].set_title(label)\n ax[key[label]].axvline(df[\"age\"].mean(), color=\"C1\")\n\n\n\n\nWe can get a pretty clear idea of how party identification is related to voting intentions by just looking at a contingency table for these two variables:\n\npd.crosstab(data[\"vote\"], data[\"party_id\"])\n\n\n\n\n\n \n \n party_id\n democrat\n independent\n republican\n \n \n vote\n \n \n \n \n \n \n \n clinton\n 159\n 51\n 5\n \n \n someone_else\n 10\n 22\n 16\n \n \n trump\n 17\n 65\n 76\n \n \n\n\n\n\nBut our main question here will be: How is respondent age related to voting intentions, and is this relationship different for different party 
affiliations? For this we will use a logistic regression.\n\n\n\nTo keep this simple, let’s look at only the data from people who indicated that they would vote for either Clinton or Trump, and we’ll model the probability of voting for Clinton.\n\nclinton_data = data.loc[data[\"vote\"].isin([\"clinton\", \"trump\"]), :]\nclinton_data.head()\n\n\n\n\n\n \n \n \n vote\n age\n party_id\n \n \n \n \n 0\n clinton\n 56\n democrat\n \n \n 1\n trump\n 65\n republican\n \n \n 2\n clinton\n 80\n democrat\n \n \n 3\n trump\n 38\n republican\n \n \n 4\n trump\n 60\n republican\n \n \n\n\n\n\n\n\nWe’ll use a logistic regression model to estimate the probability of voting for Clinton as a function of age and party affiliation. We can think we have a response variable \\(Y\\) defined as\n\\[\nY =\n\\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if the person votes for Clinton} \\\\\n 0 & \\textrm{if the person votes for Trump}\n \\end{array}\n\\right.\n\\]\nand we are interested in modelling \\(\\pi = P(Y = 1)\\) (a.k.a. probability of success) based on two explanatory variables, age and party affiliation.\nA logistic regression is a model that links the \\(\\text{logit}(\\pi)\\) to a linear combination of the predictors. In our example, we’re going to include a main effect for party affiliation and the interaction effect between party affiliation and age (i.e. we’ll have a different age slope for each affiliation). The mathematical equation for our model is\n$$\n\\[\\begin{aligned}\n \\log{\\left(\\frac{\\pi}{1 - \\pi}\\right)} &=\n \\beta_0 + \\beta_1 X_1 + \\beta_2 X_2 + \\beta_3 X_3 X_4 + \\beta_4 X_1 X_4 + \\beta_5 X_2 X_4 \\\\\n\n X_1 &= \\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if party affiliation is Independent} \\\\\n 0 & \\textrm{in other case}\n \\end{array}\n \\right. \\\\\n\n X_2 &= \\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if party affiliation is Republican} \\\\\n 0 & \\textrm{in other case}\n \\end{array}\n \\right. \\\\\n\n X_3 &=\n \\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if party affiliation is Democrat} \\\\\n 0 & \\textrm{in other case}\n \\end{array}\n \\right. \\\\\n\n X_4 &= \\text{Age}\n\\end{aligned}\\]\n$$\nNotice we don’t have a main effect for \\(X_3\\). This happens because Democrat party affiliation is being taken as baseline in the encoding of the categorical variable party_id and \\(\\beta_1\\) and \\(\\beta_2\\) represent deviations from that baseline. Thus, we see the main effect of Democrat affiliation is being represented by the Intercept, \\(\\beta_0\\).\nIf we represent the right hand side of the model equation with \\(\\eta\\), the expression can be re-arranged to express our probability of interest, \\(\\pi\\), as a function of the linear predictor \\(\\eta\\).\n\\[\\pi = \\frac{e^\\eta}{1 + e^\\eta}= \\frac{1}{1 + e^{-\\eta}}\\]\nSince we’re Bayesian folks who draw samples from posteriors, we need to specify a prior for the parameters as well as a likelihood function before accomplishing our task. In this occasion, we’re going to use the default priors in Bambi and just note the likelihood is the product of \\(n\\) Bernoulli trials, \\(\\prod_{i=1}^{n}{p_i^y(1-p_i)^{1-y_i}}\\) where \\(p_i = P(Y=1)\\) and \\(y_i = 1\\) if the vote intention is for Clinton and \\(y_i = 0\\) if Trump.\n\n\n\nSpecifying and fitting the model is simple. Bambi is good and doesn’t ask us to translate all the math to code. We just need to specify our model using the formula syntax and pass the correct family argument. 
Notice the (optional) syntax that we use on the left-hand-side of the formula: We say vote[clinton] to instruct Bambi that we wish the model the probability that vote=='clinton', rather than the probability that vote=='trump'. If we leave this unspecified, Bambi will just pick one of the events to model, but will inform you which one it picked when you build the model (and again when you look at model summaries).\nOn the right-hand-side of the formula we use party_id + party_id:age to instruct Bambi that we want to use party_id and the interaction between party_id and age as the explanatory variables in the model.\n\n\nclinton_model = bmb.Model(\"vote['clinton'] ~ party_id + party_id:age\", clinton_data, family=\"bernoulli\")\nclinton_fitted = clinton_model.fit(\n draws=2000, target_accept=0.85, random_seed=SEED, idata_kwargs={\"log_likelihood\": True}\n)\n\nModeling the probability that vote==clinton\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, party_id, party_id:age]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 2_000 draw iterations (2_000 + 4_000 draws total) took 11 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\nWe can print the model object to see information about the response distribution, the link function and the priors.\n\nclinton_model\n\n Formula: vote['clinton'] ~ party_id + party_id:age\n Family: bernoulli\n Link: p = logit\n Observations: 373\n Priors: \n target = p\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 4.3846)\n party_id ~ Normal(mu: [0. 0.], sigma: [5.4007 6.0634])\n party_id:age ~ Normal(mu: [0. 0. 0.], sigma: [0.0938 0.1007 0.1098])\n------\n* To see a plot of the priors call the .plot_priors() method.\n* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()\n\n\nUnder the hood, Bambi selected Gaussian priors for all the parameters in the model. By construction, all the priors, except the one for Intercept, are centered around 0, which is consistent with the desired weakly informative behavior. The standard deviation is specific to each parameter.\nSome more info about these default priors can be found in this technical paper.\nWe can also call clinton_model.plot_priors() to visualize the sensitive default priors Bambi has chosen for us.\n\nclinton_model.plot_priors();\n\nSampling: [Intercept, party_id, party_id:age]\n\n\n\n\n\nNow let’s check out the results! 
We get traceplots and density estimates for the posteriors with az.plot_trace() and a summary of the posteriors with az.summary().\n\naz.plot_trace(clinton_fitted, compact=False);\n\n\n\n\n\naz.summary(clinton_fitted)\n\n\n\n\n\n \n \n \n mean\n sd\n hdi_3%\n hdi_97%\n mcse_mean\n mcse_sd\n ess_bulk\n ess_tail\n r_hat\n \n \n \n \n Intercept\n 1.701\n 0.717\n 0.283\n 2.992\n 0.014\n 0.010\n 2770.0\n 2596.0\n 1.0\n \n \n party_id[independent]\n -0.304\n 0.944\n -2.076\n 1.435\n 0.019\n 0.015\n 2530.0\n 2452.0\n 1.0\n \n \n party_id[republican]\n -1.171\n 1.560\n -4.144\n 1.650\n 0.037\n 0.026\n 1813.0\n 2106.0\n 1.0\n \n \n party_id:age[democrat]\n 0.013\n 0.015\n -0.016\n 0.040\n 0.000\n 0.000\n 2669.0\n 2532.0\n 1.0\n \n \n party_id:age[independent]\n -0.033\n 0.012\n -0.055\n -0.011\n 0.000\n 0.000\n 3398.0\n 2370.0\n 1.0\n \n \n party_id:age[republican]\n -0.080\n 0.036\n -0.149\n -0.014\n 0.001\n 0.001\n 1765.0\n 1976.0\n 1.0\n \n \n\n\n\n\n\n\n\n\nBefore moving forward to inference, we can evaluate the quality of the model’s fit. We will take a look at two different ways of assessing how good is the model’s fit using its predictions.\n\n\nThere is a way of assessing the performance of a model with binary outcomes (such as logistic regression) in a visual way called separation plot. In a separation plot, the model’s predictions are averaged, ordered and represented as consecutive vertical lines. These vertical lines are colored according to the class indicated by their corresponding observed value, in this case light blue indicates class 0 (vote == 'Trump') and blue represents class 1 (vote =='Clinton'). We can use the ArviZ’ implementation of the separation plot, but first we have to obtain the model’s predictions.\n\nclinton_model.predict(clinton_fitted, kind=\"response\")\n\n\nax = az.plot_separation(clinton_fitted, y=\"vote\", figsize=(9,0.5));\n\n\n\n\nIn this separation plot we can see that some observations are misspredicted, specially in the right hand side of the plot where the model predicts Trump votes when there were really Clinton ones. We can further investigate this using another of ArviZ model evaluation tool.\n\n\n\n\nWe can also use ArviZ to compute LOO and find influential observations using the estimated \\(\\hat \\kappa\\) parameter value.\n\n# compute pointwise LOO\nloo = az.loo(clinton_fitted, pointwise=True)\n\n\n# plot kappa values\naz.plot_khat(loo.pareto_k);\n\n\n\n\nA first look at the khat plot shows that most observations’ \\(\\hat \\kappa\\) values are grouped together in a range that goes up to roughly 0.2. Above that value, we observe some dispersion and a few points that stand out by having the highest \\(\\hat \\kappa\\) values.\nAn observation is influential in the sense that if we refit the data by first removing that observation from the data set, the fitted result will be more different than if we do the same for a non influential observation. Clearly the level of influence of observations can vary continuously. An observation can be influential either because it is an outlier (a measurement error, a data entry error, etc) or because the model is not flexible enough to capture the observation. 
The approximations used to compute LOO are no longer reliable for \\(\\hat \\kappa > 0.7\\).\nLet us first take a look at the observation with the highest \\(\\hat \\kappa\\).\n\nax = az.plot_khat(loo.pareto_k.values.ravel())\nsorted_kappas = np.sort(loo.pareto_k.values.ravel())\n\n# find observation where the kappa value exceeds the threshold\nthreshold = sorted_kappas[-1:]\nax.axhline(threshold, ls=\"--\", color=\"orange\")\ninfluential_observations = clinton_data.reset_index()[loo.pareto_k.values >= threshold].index\n\nfor x in influential_observations:\n y = loo.pareto_k.values[x]\n ax.text(x, y + 0.01, str(x), ha=\"center\", va=\"baseline\")\n\n\n\n\n\nclinton_data.reset_index()[loo.pareto_k.values >= threshold]\n\n\n\n\n\n \n \n \n index\n vote\n age\n party_id\n \n \n \n \n 191\n 215\n trump\n 95\n republican\n \n \n\n\n\n\nThis observation corresponds to a 95 year old Republican party member that voted for Trump.\n\nLet us take a look at six observations with the highest \\(\\hat \\kappa\\) values.\n\nax = az.plot_khat(loo.pareto_k)\n\n# find observation where the kappa value exceeds the threshold\nthreshold = sorted_kappas[-6:].min()\nax.axhline(threshold, ls=\"--\", color=\"orange\")\ninfluential_observations = clinton_data.reset_index()[loo.pareto_k.values >= threshold].index\n\nfor x in influential_observations:\n y = loo.pareto_k.values[x]\n ax.text(x, y + 0.01, str(x), ha=\"center\", va=\"baseline\")\n\n\n\n\n\nclinton_data.reset_index()[loo.pareto_k.values>=threshold]\n\n\n\n\n\n \n \n \n index\n vote\n age\n party_id\n \n \n \n \n 17\n 17\n trump\n 76\n republican\n \n \n 34\n 34\n trump\n 83\n republican\n \n \n 58\n 64\n trump\n 84\n republican\n \n \n 62\n 68\n trump\n 91\n republican\n \n \n 191\n 215\n trump\n 95\n republican\n \n \n 309\n 347\n trump\n 79\n republican\n \n \n\n\n\n\nObservations number 34, 58, 62, and 191 correspond to individuals in under represented age groups in the data set. The rest correspond to Republican party members that voted for Clinton. Let us check how many observations we have of individuals older than 80 years old.\n\nclinton_data[clinton_data.age > 80]\n\n\n\n\n\n \n \n \n vote\n age\n party_id\n \n \n \n \n 34\n trump\n 83\n republican\n \n \n 64\n trump\n 84\n republican\n \n \n 68\n trump\n 91\n republican\n \n \n 97\n clinton\n 83\n democrat\n \n \n 215\n trump\n 95\n republican\n \n \n 246\n clinton\n 82\n democrat\n \n \n 403\n clinton\n 81\n democrat\n \n \n\n\n\n\nLet us check how many observations there are of Republicans who voted for Clinton\n\nclinton_data[(clinton_data.vote =='clinton') & (clinton_data.party_id == 'republican')]\n\n\n\n\n\n \n \n \n vote\n age\n party_id\n \n \n \n \n 170\n clinton\n 27\n republican\n \n \n 248\n clinton\n 36\n republican\n \n \n 359\n clinton\n 22\n republican\n \n \n 361\n clinton\n 37\n republican\n \n \n 410\n clinton\n 55\n republican\n \n \n\n\n\n\nThere are only two observations for individuals older than 80 years old and five observations for individuals of the Republican party that vote for Clinton. The fact that the model finds it difficult to predict for these observations is related to model uncertainty, due to a scarce number of observations that exhibit these characteristics.\nLet us repeat the separation plot, this time marking the observations we have analyzed. 
This plot will show us how the model predicted these particular observations.\n\nimport matplotlib.patheffects as pe\n\nax = az.plot_separation(clinton_fitted, y=\"vote\", figsize=(9, 0.5))\n\ny = np.random.uniform(0.1, 0.5, size=len(influential_observations))\n\nfor x, y in zip(influential_observations, y):\n text = str(x)\n x = x / len(clinton_data)\n ax.scatter(x, y, marker=\"+\", s=50, color=\"red\", zorder=3)\n ax.text(\n x, y + 0.1, text, color=\"white\", ha=\"center\", va=\"bottom\",\n path_effects=[pe.withStroke(linewidth=2, foreground=\"black\")]\n )\n\n\n\n\n\nclinton_data.reset_index()[loo.pareto_k.values>=threshold]\n\n\n\n\n\n \n \n \n index\n vote\n age\n party_id\n \n \n \n \n 17\n 17\n trump\n 76\n republican\n \n \n 34\n 34\n trump\n 83\n republican\n \n \n 58\n 64\n trump\n 84\n republican\n \n \n 62\n 68\n trump\n 91\n republican\n \n \n 191\n 215\n trump\n 95\n republican\n \n \n 309\n 347\n trump\n 79\n republican\n \n \n\n\n\n\nThis assessment helped us to further understand the model and quality of the fit. It also illustrates the intuition that we should be cautious when predicting for under represented age groups and voting behaviours.\n\n\n\nGrab the posteriors samples of the age slopes for the three party_id categories.\n\nparties = [\"democrat\", \"independent\", \"republican\"]\ndem, ind, rep = [clinton_fitted.posterior[\"party_id:age\"].sel({\"party_id:age_dim\":party}) for party in parties]\n\nPlot the marginal posteriors for the age slopes for the three political affiliations.\n\n_, ax = plt.subplots()\nfor idx, x in enumerate([dem, ind, rep]):\n az.plot_dist(x, label=x[\"party_id:age_dim\"].item(), plot_kwargs={\"color\": f\"C{idx}\"}, ax=ax)\nax.legend(loc=\"upper left\");\n\n\n\n\nNow, using the joint posterior, we can answer our questions in terms of probabilities.\nWhat is the probability that the Democrat slope is greater than the Republican slope?\n\n(dem > rep).mean().item()\n\n0.995\n\n\nProbability that the Democrat slope is greater than the Independent slope?\n\n(dem > ind).mean().item()\n\n0.99475\n\n\nProbability that the Independent slope is greater than the Republican slope?\n\n(ind > rep).mean().item()\n\n0.8995\n\n\nProbability that the Democrat slope is greater than 0?\n\n(dem > 0).mean().item()\n\n0.80525\n\n\nProbability that the Republican slope is less than 0?\n\n(rep < 0).mean().item()\n\n0.993\n\n\nProbability that the Independent slope is less than 0?\n\n(ind < 0).mean().item()\n\n0.998\n\n\nIf we look at the plot of the marginal posteriors, we may be suspicious that, for example, the probability that Democrat slope is greater than the Republican slope is 0.998 (almost 1!), given the overlap between the blue and green density functions. However, we can’t answer such a question using the marginal posteriors only, as shown in the plot. Since Democrat and Republican slopes (\\(\\beta_3\\) and \\(\\beta_5\\), respectively) are random variables, we need to use their joint distribution to answer probability questions that involve both of them. The fact that logical comparisons (e.g. > in dem > ind) are performed elementwise ensures we’re using samples from the joint posterior as we should. We also note that when the question involves only one of the random variables, it is fine to use the marginal distribution (e.g. 
(rep < 0).mean()).\nFinally, all these comments may have not been necessary since we didn’t need to mention anything about marginal or joint distributions when performing the calculations, we’ve just grabbed the samples and applied some basic math. But that’s an advantage of Bambi and the Bayesian approach. Things that are not so simple, became simpler :)\n\n\n\nHere we make use of the Model.predict() method to predict the probability of voting for Clinton for an out-of-sample dataset that we create.\n\nage = np.arange(18, 91)\nnew_data = pd.DataFrame({\n \"age\": np.tile(age, 3),\n \"party_id\": np.repeat([\"democrat\", \"republican\", \"independent\"], len(age))\n})\nnew_data\n\n\n\n\n\n \n \n \n age\n party_id\n \n \n \n \n 0\n 18\n democrat\n \n \n 1\n 19\n democrat\n \n \n 2\n 20\n democrat\n \n \n 3\n 21\n democrat\n \n \n 4\n 22\n democrat\n \n \n ...\n ...\n ...\n \n \n 214\n 86\n independent\n \n \n 215\n 87\n independent\n \n \n 216\n 88\n independent\n \n \n 217\n 89\n independent\n \n \n 218\n 90\n independent\n \n \n\n219 rows × 2 columns\n\n\n\nObtain predictions for the new dataset. By default, Bambi is going to obtain a posterior distribution for the mean probability of voting for Clinton. These values are stored as the \"p\" variable in clinton_fitted.posterior.\n\nclinton_model.predict(clinton_fitted, data=new_data)\n\n\n# Select a sample of posterior values for the mean probability of voting for Clinton\nvote_posterior = az.extract_dataset(clinton_fitted, num_samples=2000)[\"p\"]\n\n/tmp/ipykernel_116464/1918123043.py:2: FutureWarning: extract_dataset has been deprecated, please use extract\n vote_posterior = az.extract_dataset(clinton_fitted, num_samples=2000)[\"p\"]\n\n\nMake the plot!\n\n_, ax = plt.subplots(figsize=(7, 5))\n\nfor i, party in enumerate([\"democrat\", \"republican\", \"independent\"]):\n # Which rows in new_data correspond to party?\n idx = new_data.index[new_data[\"party_id\"] == party].tolist()\n ax.plot(age, vote_posterior[idx], alpha=0.04, color=f\"C{i}\")\n\nax.set_ylabel(\"P(vote='clinton' | age)\")\nax.set_xlabel(\"Age\", fontsize=15)\nax.set_ylim(0, 1)\nax.set_xlim(18, 90);\n\n\n\n\nThe following is a rough interpretation of the information contained in the plot we’ve just created.\nAccording to our logistic model, the mean probability of voting for Clinton is almost always 0.8 or greater for Democrats no matter the age (blue line). Also, the older the person, the closer the mean probability of voting Clinton to 1.\nOn the other hand, Republicans have a non-zero probability of voting for Clinton when they are young, but it tends to zero for older persons (green line). We can also note the high variability of P(vote = ‘Clinton’) for young Republicans. This reflects our high uncertainty when estimating this probability and it is due to the small amount of Republicans in that age range plus there are only 5 Republicans out of 97 voting for Clinton in the dataset.\nFinally, the mean probability of voting Clinton for the independents is around 0.7 for the youngest and decreases towards 0.2 as they get older (orange line). 
Since the spread of the lines is similar along all the ages, we can conclude our uncertainty in this estimate is similar for all the age groups.\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Sat May 25 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\npandas : 2.2.2\nmatplotlib: 3.8.4\nnumpy : 1.26.4\narviz : 0.18.0\nbambi : 0.13.1.dev37+g2a54df76.d20240525\n\nWatermark: 2.4.3" + "text": "import arviz as az\nimport bambi as bmb\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\n\n\naz.style.use(\"arviz-darkgrid\")\nSEED = 7355608\n\n\n\nThese data are from the 2016 pilot study. The full study consisted of 1200 people, but here we’ve selected the subset of 487 people who responded to a question about whether they would vote for Hillary Clinton or Donald Trump.\n\ndata = bmb.load_data(\"ANES\")\ndata.head()\n\n\n\n\n\n \n \n \n vote\n age\n party_id\n \n \n \n \n 0\n clinton\n 56\n democrat\n \n \n 1\n trump\n 65\n republican\n \n \n 2\n clinton\n 80\n democrat\n \n \n 3\n trump\n 38\n republican\n \n \n 4\n trump\n 60\n republican\n \n \n\n\n\n\nOur outcome variable is vote, which gives peoples’ responses to the following question prompt:\n“If the 2016 presidential election were between Hillary Clinton for the Democrats and Donald Trump for the Republicans, would you vote for Hillary Clinton, Donald Trump, someone else, or probably not vote?”\n\ndata[\"vote\"].value_counts()\n\nvote\nclinton 215\ntrump 158\nsomeone_else 48\nName: count, dtype: int64\n\n\nThe two predictors we’ll examine are a respondent’s age and their political party affiliation, party_id, which is their response to the following question prompt:\n“Generally speaking, do you usually think of yourself as a Republican, a Democrat, an independent, or what?”\n\ndata[\"party_id\"].value_counts()\n\nparty_id\ndemocrat 186\nindependent 138\nrepublican 97\nName: count, dtype: int64\n\n\nThese two predictors are somewhat correlated, but not all that much:\n\nfig, ax = plt.subplots(1, 3, figsize=(10, 4), sharey=True, constrained_layout=True)\nkey = dict(zip(data[\"party_id\"].unique(), range(3)))\nfor label, df in data.groupby(\"party_id\"):\n ax[key[label]].hist(df[\"age\"])\n ax[key[label]].set_xlim([18, 90])\n ax[key[label]].set_xlabel(\"Age\")\n ax[key[label]].set_ylabel(\"Frequency\")\n ax[key[label]].set_title(label)\n ax[key[label]].axvline(df[\"age\"].mean(), color=\"C1\")\n\n\n\n\nWe can get a pretty clear idea of how party identification is related to voting intentions by just looking at a contingency table for these two variables:\n\npd.crosstab(data[\"vote\"], data[\"party_id\"])\n\n\n\n\n\n \n \n party_id\n democrat\n independent\n republican\n \n \n vote\n \n \n \n \n \n \n \n clinton\n 159\n 51\n 5\n \n \n someone_else\n 10\n 22\n 16\n \n \n trump\n 17\n 65\n 76\n \n \n\n\n\n\nBut our main question here will be: How is respondent age related to voting intentions, and is this relationship different for different party affiliations? 
For this we will use a logistic regression.\n\n\n\nTo keep this simple, let’s look only at the data from people who indicated that they would vote for either Clinton or Trump, and we’ll model the probability of voting for Clinton.\n\nclinton_data = data.loc[data[\"vote\"].isin([\"clinton\", \"trump\"]), :]\nclinton_data.head()\n\n   vote     age  party_id\n0  clinton  56   democrat\n1  trump    65   republican\n2  clinton  80   democrat\n3  trump    38   republican\n4  trump    60   republican\n\n\nWe’ll use a logistic regression model to estimate the probability of voting for Clinton as a function of age and party affiliation. We can think of this as having a response variable \\(Y\\) defined as\n\\[\nY =\n\\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if the person votes for Clinton} \\\\\n 0 & \\textrm{if the person votes for Trump}\n \\end{array}\n\\right.\n\\]\nand we are interested in modelling \\(\\pi = P(Y = 1)\\) (a.k.a. the probability of success) based on two explanatory variables, age and party affiliation.\nA logistic regression is a model that links \\(\\text{logit}(\\pi)\\) to a linear combination of the predictors. In our example, we’re going to include a main effect for party affiliation and an interaction effect between party affiliation and age (i.e. we’ll have a different age slope for each affiliation). The mathematical equation for our model is\n\\[\\begin{aligned}\n \\log{\\left(\\frac{\\pi}{1 - \\pi}\\right)} &=\n \\beta_0 + \\beta_1 X_1 + \\beta_2 X_2 + \\beta_3 X_3 X_4 + \\beta_4 X_1 X_4 + \\beta_5 X_2 X_4 \\\\\n X_1 &= \\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if party affiliation is Independent} \\\\\n 0 & \\textrm{otherwise}\n \\end{array}\n \\right. \\\\\n X_2 &= \\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if party affiliation is Republican} \\\\\n 0 & \\textrm{otherwise}\n \\end{array}\n \\right. \\\\\n X_3 &= \\left\\{\n \\begin{array}{ll}\n 1 & \\textrm{if party affiliation is Democrat} \\\\\n 0 & \\textrm{otherwise}\n \\end{array}\n \\right. \\\\\n X_4 &= \\text{Age}\n\\end{aligned}\\]\nNotice we don’t have a main effect for \\(X_3\\). This happens because Democrat party affiliation is taken as the baseline in the encoding of the categorical variable party_id, and \\(\\beta_1\\) and \\(\\beta_2\\) represent deviations from that baseline. Thus, the main effect of Democrat affiliation is represented by the Intercept, \\(\\beta_0\\).\nIf we represent the right-hand side of the model equation with \\(\\eta\\), the expression can be rearranged to express our probability of interest, \\(\\pi\\), as a function of the linear predictor \\(\\eta\\).\n\\[\\pi = \\frac{e^\\eta}{1 + e^\\eta} = \\frac{1}{1 + e^{-\\eta}}\\]\nSince we’re Bayesian folks who draw samples from posteriors, we need to specify a prior for the parameters as well as a likelihood function before accomplishing our task. On this occasion, we’re going to use the default priors in Bambi and just note the likelihood is the product of \\(n\\) Bernoulli trials, \\(\\prod_{i=1}^{n}{p_i^{y_i}(1-p_i)^{1-y_i}}\\), where \\(p_i = P(Y_i = 1)\\) and \\(y_i = 1\\) if the vote intention is for Clinton and \\(y_i = 0\\) if Trump.\n\n\n\nSpecifying and fitting the model is simple. Bambi is good and doesn’t ask us to translate all the math to code. We just need to specify our model using the formula syntax and pass the correct family argument.
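Before moving on, a two-line numeric sketch of the inverse link can make that last formula concrete (plain NumPy, nothing Bambi-specific):\n\neta = np.array([-3.0, 0.0, 3.0]) # example values of the linear predictor\npi = 1 / (1 + np.exp(-eta)) # inverse logit maps the real line onto (0, 1)\nprint(pi.round(3)) # [0.047 0.5 0.953]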
Notice the (optional) syntax that we use on the left-hand side of the formula: We say vote[clinton] to instruct Bambi that we wish to model the probability that vote=='clinton', rather than the probability that vote=='trump'. If we leave this unspecified, Bambi will just pick one of the events to model, but will inform you which one it picked when you build the model (and again when you look at model summaries).\nOn the right-hand side of the formula we use party_id + party_id:age to instruct Bambi that we want to use party_id and the interaction between party_id and age as the explanatory variables in the model.\n\n\nclinton_model = bmb.Model(\"vote['clinton'] ~ party_id + party_id:age\", clinton_data, family=\"bernoulli\")\nclinton_fitted = clinton_model.fit(\n draws=2000, target_accept=0.85, random_seed=SEED, idata_kwargs={\"log_likelihood\": True}\n)\n\nModeling the probability that vote==clinton\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, party_id, party_id:age]\n\n\nSampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 21 seconds.\n\n\nWe can print the model object to see information about the response distribution, the link function and the priors.\n\nclinton_model\n\n Formula: vote['clinton'] ~ party_id + party_id:age\n Family: bernoulli\n Link: p = logit\n Observations: 373\n Priors: \n target = p\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 1.5)\n party_id ~ Normal(mu: [0. 0.], sigma: [1. 1.])\n party_id:age ~ Normal(mu: [0. 0. 0.], sigma: [0.0582 0.0582 0.0582])\n------\n* To see a plot of the priors call the .plot_priors() method.\n* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()\n\n\nUnder the hood, Bambi selected Gaussian priors for all the parameters in the model. By construction, all the priors are centered around 0, which is consistent with the desired weakly informative behavior, and the standard deviation is specific to each parameter.\nSome more info about these default priors can be found in this technical paper.\nWe can also call clinton_model.plot_priors() to visualize the sensible default priors Bambi has chosen for us.\n\nclinton_model.plot_priors();\n\nSampling: [Intercept, party_id, party_id:age]\n\n\n\n\n\nNow let’s check out the results! We get traceplots and density estimates for the posteriors with az.plot_trace() and a summary of the posteriors with az.summary().\n\naz.plot_trace(clinton_fitted, compact=False);\n\n\n\n\n\naz.summary(clinton_fitted)\n\n                            mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat\nIntercept                  1.392  0.553   0.378    2.451      0.007    0.005    6778.0    5131.0    1.0\nparty_id[independent]     -0.046  0.650  -1.221    1.173      0.008    0.008    6081.0    4477.0    1.0\nparty_id[republican]      -0.603  0.814  -2.142    0.933      0.010    0.008    6336.0    5269.0    1.0\nparty_id:age[democrat]     0.018  0.012  -0.004    0.042      0.000    0.000    6588.0    5188.0    1.0\nparty_id:age[independent] -0.032  0.010  -0.052   -0.014      0.000    0.000    6877.0    4836.0    1.0\nparty_id:age[republican]  -0.082  0.024  -0.127   -0.038      0.000    0.000    5677.0    5113.0    1.0\n\n\n\n\nBefore moving forward to inference, we can evaluate the quality of the model’s fit.
We will look at two different ways of assessing it, both based on the model’s predictions.\n\n\nThe separation plot is a visual way of assessing the performance of a model with binary outcomes (such as logistic regression). In a separation plot, the model’s predictions are averaged, ordered and represented as consecutive vertical lines. These vertical lines are colored according to the class indicated by their corresponding observed value; in this case, light blue indicates class 0 (vote == 'Trump') and blue indicates class 1 (vote == 'Clinton'). We can use the ArviZ implementation of the separation plot, but first we have to obtain the model’s predictions.\n\nclinton_model.predict(clinton_fitted, kind=\"response\")\n\n\nax = az.plot_separation(clinton_fitted, y=\"vote\", figsize=(9,0.5));\n\n\n\n\nIn this separation plot we can see that some observations are mispredicted, especially on the right-hand side of the plot, where the model predicts Trump votes that were really Clinton ones. We can investigate this further using another ArviZ model evaluation tool.\n\n\n\n\nWe can also use ArviZ to compute LOO and find influential observations using the estimated \\(\\hat \\kappa\\) parameter value.\n\n# compute pointwise LOO\nloo = az.loo(clinton_fitted, pointwise=True)\n\n\n# plot kappa values\naz.plot_khat(loo.pareto_k);\n\n\n\n\nA first look at the khat plot shows that most observations’ \\(\\hat \\kappa\\) values are grouped together in a range that goes up to roughly 0.2. Above that value, we observe some dispersion and a few points that stand out by having the highest \\(\\hat \\kappa\\) values.\nAn observation is influential in the sense that, if we refit the model after removing it from the data set, the result changes more than it would after removing a non-influential observation. The level of influence clearly varies continuously. An observation can be influential either because it is an outlier (a measurement error, a data entry error, etc.) or because the model is not flexible enough to capture it.
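As a quick numeric companion to the khat plot, we can count how many \\(\\hat \\kappa\\) values fall in the usual PSIS-LOO diagnostic bands (a small sketch reusing the loo object from above):\n\nk = loo.pareto_k.values.ravel()\nprint(\"k < 0.5 (good): \", int((k < 0.5).sum()))\nprint(\"0.5 <= k < 0.7 (ok): \", int(((k >= 0.5) & (k < 0.7)).sum()))\nprint(\"k >= 0.7 (unreliable):\", int((k >= 0.7).sum()))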
The approximations used to compute LOO are no longer reliable for \\(\\hat \\kappa > 0.7\\).\nLet us first take a look at the observation with the highest \\(\\hat \\kappa\\).\n\nax = az.plot_khat(loo.pareto_k.values.ravel())\nsorted_kappas = np.sort(loo.pareto_k.values.ravel())\n\n# use the highest kappa value as the threshold\nthreshold = sorted_kappas[-1]\nax.axhline(threshold, ls=\"--\", color=\"orange\")\ninfluential_observations = clinton_data.reset_index()[loo.pareto_k.values >= threshold].index\n\n# annotate the observations at or above the threshold\nfor x in influential_observations:\n y = loo.pareto_k.values[x]\n ax.text(x, y + 0.01, str(x), ha=\"center\", va=\"baseline\")\n\n\n\n\n\nclinton_data.reset_index()[loo.pareto_k.values >= threshold]\n\n     index  vote   age  party_id\n191  215    trump  95   republican\n\n\nThis observation corresponds to a 95-year-old Republican who voted for Trump.\n\nLet us now take a look at the six observations with the highest \\(\\hat \\kappa\\) values.\n\nax = az.plot_khat(loo.pareto_k)\n\n# use the sixth-highest kappa value as the threshold\nthreshold = sorted_kappas[-6:].min()\nax.axhline(threshold, ls=\"--\", color=\"orange\")\ninfluential_observations = clinton_data.reset_index()[loo.pareto_k.values >= threshold].index\n\nfor x in influential_observations:\n y = loo.pareto_k.values[x]\n ax.text(x, y + 0.01, str(x), ha=\"center\", va=\"baseline\")\n\n\n\n\n\nclinton_data.reset_index()[loo.pareto_k.values >= threshold]\n\n     index  vote     age  party_id\n34   34     trump    83   republican\n45   46     trump    75   democrat\n58   64     trump    84   republican\n62   68     trump    91   republican\n191  215    trump    95   republican\n365  410    clinton  55   republican\n\n\nObservations 34, 58, 62, and 191 correspond to individuals in underrepresented age groups in the data set. The remaining two are unusual given their party affiliation: a Democrat who voted for Trump and a Republican who voted for Clinton. Let us check how many observations we have of individuals older than 80 years.\n\nclinton_data[clinton_data.age > 80]\n\n     vote     age  party_id\n34   trump    83   republican\n64   trump    84   republican\n68   trump    91   republican\n97   clinton  83   democrat\n215  trump    95   republican\n246  clinton  82   democrat\n403  clinton  81   democrat\n\n\nLet us also check how many observations there are of Republicans who voted for Clinton.\n\nclinton_data[(clinton_data.vote =='clinton') & (clinton_data.party_id == 'republican')]\n\n     vote     age  party_id\n170  clinton  27   republican\n248  clinton  36   republican\n359  clinton  22   republican\n361  clinton  37   republican\n410  clinton  55   republican\n\n\nThere are only seven observations for individuals older than 80 (just two of them older than 90), and five observations of Republicans who voted for Clinton. The fact that the model finds it difficult to predict these observations reflects model uncertainty, driven by the scarcity of observations with these characteristics.\nLet us repeat the separation plot, this time marking the observations we have analyzed.
This plot will show us how the model predicted these particular observations.\n\nimport matplotlib.patheffects as pe\n\nax = az.plot_separation(clinton_fitted, y=\"vote\", figsize=(9, 0.5))\n\n# jitter the vertical positions so nearby labels don't overlap\nys = np.random.uniform(0.1, 0.5, size=len(influential_observations))\n\nfor x, y in zip(influential_observations, ys):\n text = str(x)\n x = x / len(clinton_data) # map the row position onto the [0, 1] axis of the separation plot\n ax.scatter(x, y, marker=\"+\", s=50, color=\"red\", zorder=3)\n ax.text(\n x, y + 0.1, text, color=\"white\", ha=\"center\", va=\"bottom\",\n path_effects=[pe.withStroke(linewidth=2, foreground=\"black\")]\n )\n\n\n\n\n\nclinton_data.reset_index()[loo.pareto_k.values >= threshold]\n\n     index  vote     age  party_id\n34   34     trump    83   republican\n45   46     trump    75   democrat\n58   64     trump    84   republican\n62   68     trump    91   republican\n191  215    trump    95   republican\n365  410    clinton  55   republican\n\n\nThis assessment helped us to further understand the model and the quality of the fit. It also illustrates the intuition that we should be cautious when predicting for underrepresented age groups and voting behaviours.\n\n\n\nGrab the posterior samples of the age slopes for the three party_id categories.\n\nparties = [\"democrat\", \"independent\", \"republican\"]\ndem, ind, rep = [clinton_fitted.posterior[\"party_id:age\"].sel({\"party_id:age_dim\":party}) for party in parties]\n\nPlot the marginal posteriors for the age slopes for the three political affiliations.\n\n_, ax = plt.subplots()\nfor idx, x in enumerate([dem, ind, rep]):\n az.plot_dist(x, label=x[\"party_id:age_dim\"].item(), plot_kwargs={\"color\": f\"C{idx}\"}, ax=ax)\nax.legend(loc=\"upper left\");\n\n\n\n\nNow, using the joint posterior, we can answer our questions in terms of probabilities.\nWhat is the probability that the Democrat slope is greater than the Republican slope?\n\n(dem > rep).mean().item()\n\n1.0\n\n\nProbability that the Democrat slope is greater than the Independent slope?\n\n(dem > ind).mean().item()\n\n1.0\n\n\nProbability that the Independent slope is greater than the Republican slope?\n\n(ind > rep).mean().item()\n\n0.9795\n\n\nProbability that the Democrat slope is greater than 0?\n\n(dem > 0).mean().item()\n\n0.93775\n\n\nProbability that the Republican slope is less than 0?\n\n(rep < 0).mean().item()\n\n0.999875\n\n\nProbability that the Independent slope is less than 0?\n\n(ind < 0).mean().item()\n\n0.999625\n\n\nLooking only at the plot of the marginal posteriors, we might doubt that the probability of the Democrat slope being greater than the Republican slope is 1.0, i.e. that the ordering holds in essentially every posterior sample, given the apparent overlap between the blue and green density functions. But we can’t answer such a question using the marginal posteriors alone. Since the Democrat and Republican slopes (\\(\\beta_3\\) and \\(\\beta_5\\), respectively) are random variables, we need their joint distribution to answer probability questions that involve both of them. The fact that logical comparisons (e.g. > in dem > ind) are performed elementwise ensures we’re using samples from the joint posterior, as we should. We also note that when the question involves only one of the random variables, it is fine to use the marginal distribution (e.g.
(rep < 0).mean()).\nFinally, all these comments may not have been necessary, since we didn’t need to say anything about marginal or joint distributions when performing the calculations; we just grabbed the samples and applied some basic math. But that’s an advantage of Bambi and the Bayesian approach: things that are not so simple become simpler :)\n\n\n\nHere we make use of the Model.predict() method to predict the probability of voting for Clinton for an out-of-sample dataset that we create.\n\nage = np.arange(18, 91)\nnew_data = pd.DataFrame({\n \"age\": np.tile(age, 3),\n \"party_id\": np.repeat([\"democrat\", \"republican\", \"independent\"], len(age))\n})\nnew_data\n\n     age  party_id\n0    18   democrat\n1    19   democrat\n2    20   democrat\n3    21   democrat\n4    22   democrat\n..   ..   ...\n214  86   independent\n215  87   independent\n216  88   independent\n217  89   independent\n218  90   independent\n\n219 rows × 2 columns\n\n\nObtain predictions for the new dataset. By default, Bambi is going to obtain a posterior distribution for the mean probability of voting for Clinton. These values are stored as the \"p\" variable in clinton_fitted.posterior.\n\nclinton_model.predict(clinton_fitted, data=new_data)\n\n\n# Select a sample of posterior values for the mean probability of voting for Clinton\nvote_posterior = az.extract_dataset(clinton_fitted, num_samples=2000)[\"p\"]\n\n/tmp/ipykernel_13027/1918123043.py:2: FutureWarning: extract_dataset has been deprecated, please use extract\n vote_posterior = az.extract_dataset(clinton_fitted, num_samples=2000)[\"p\"]\n\n\nMake the plot!\n\n_, ax = plt.subplots(figsize=(7, 5))\n\nfor i, party in enumerate([\"democrat\", \"republican\", \"independent\"]):\n # Which rows in new_data correspond to party?\n idx = new_data.index[new_data[\"party_id\"] == party].tolist()\n ax.plot(age, vote_posterior[idx], alpha=0.04, color=f\"C{i}\")\n\nax.set_ylabel(\"P(vote='clinton' | age)\")\nax.set_xlabel(\"Age\", fontsize=15)\nax.set_ylim(0, 1)\nax.set_xlim(18, 90);\n\n\n\n\nThe following is a rough interpretation of the information contained in the plot we’ve just created.\nAccording to our logistic model, the mean probability of voting for Clinton is almost always 0.8 or greater for Democrats, no matter the age (blue lines). Also, the older the person, the closer the mean probability of voting for Clinton is to 1.\nOn the other hand, Republicans have a non-zero probability of voting for Clinton when they are young, but it tends to zero for older persons (orange lines). We can also note the high variability of P(vote = ‘Clinton’) for young Republicans. This reflects our high uncertainty when estimating this probability, and it is due to the small number of young Republicans in the dataset, together with the fact that only 5 of the 97 Republicans voted for Clinton.\nFinally, the mean probability of voting for Clinton for independents is around 0.7 for the youngest and decreases towards 0.2 as they get older (green lines).
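We can also put a rough number on the visual spread of those lines; a quick sketch that measures the width of the central 94% interval of \"p\" at each age (using quantiles rather than the HDI for simplicity):\n\np = clinton_fitted.posterior[\"p\"]\n# interval width per out-of-sample row\nwidth = p.quantile(0.97, dim=(\"chain\", \"draw\")) - p.quantile(0.03, dim=(\"chain\", \"draw\"))\nfor party in [\"democrat\", \"republican\", \"independent\"]:\n idx = new_data.index[new_data[\"party_id\"] == party].to_numpy()\n w = width.values[idx]\n print(party, w.min().round(2), \"to\", w.max().round(2))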
Since the spread of the lines is similar along all the ages, we can conclude our uncertainty in this estimate is similar for all the age groups.\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Thu Aug 15 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\nmatplotlib: 3.8.4\nbambi : 0.14.1.dev12+g64e57423.d20240730\narviz : 0.18.0\npandas : 2.2.2\nnumpy : 1.26.4\n\nWatermark: 2.4.3" }, { "objectID": "notebooks/mister_p.html", @@ -368,14 +368,14 @@ "href": "notebooks/plot_comparisons.html", "title": "Bambi", "section": "", - "text": "comparisons and plot_comparisons are a part of Bambi’s sub-package plots that feature a set of functions used to interpret complex regression models. This sub-package is inspired by the R package marginaleffects. These two functions allow the modeler to compare the predictions made by a model for different contrast and covariate values. Below, it is described why comparing predictions is useful in interpreting generalized linear models (GLMs), how this methodology is implemented in Bambi, and how to use comparisons and plot_comparisons. It is assumed that the reader is familiar with the basics of GLMs. If not, refer to the Bambi Basic Building Blocks example.\nDue to the link function in a GLM, there are typically three quantities of interest to interpret:\n\nthe linear predictor \\(\\eta = X\\beta\\) where \\(X\\) is an \\(n\\) x \\(p\\) matrix of explanatory variables.\nthe mean \\(\\mu = g^{-1}(\\eta)\\) where the link function \\(g(\\cdot)\\) relates the linear predictor to the mean of the outcome variable \\(\\mu = g^{-1}(\\eta) = g^{-1}(X\\beta)\\)\nthe response variable \\(Y \\sim \\mathcal{D}(\\mu, \\theta)\\) where \\(\\mu\\) is the mean parameter and \\(\\theta\\) is (possibly) a vector that contains all the other “auxillary” parameters of the distribution.\n\nOften, with GLMs, \\(\\eta\\) is linear in the parameters, but nonlinear in relation of inputs to the outcome \\(Y\\) due to the link function \\(g\\). Thus, as modelers, we are usually more interested in interpreting (2) and (3). For example, in logistic regression, the linear predictor is on the log-odds scale, but the quantity of interest is on the probability scale. In Poisson regression, the linear predictor is on the log-scale, but the response variable is on the count scale. Referring back to logistic regression, a specified difference in one of the \\(x\\) variables does not correspond to a constant difference in the the probability of the outcome.\nIt is often helpful with GLMs, for the modeler and audience, to have a summary that gives the expected difference in the outcome corresponding to a unit difference in each of the input variables. Thus, the goal of comparisons and plot_comparisons is to provide the modeler with a summary and visualization of the average predicted difference.\n\n\nHere, we adopt the notation from Chapter 14.4 of Regression and Other Stories to describe average predictive differences. Assume we have fit a Bambi model predicting an outcome \\(Y\\) based on inputs \\(X\\) and parameters \\(\\theta\\). Consider the following scalar inputs:\n\\[w: \\text{the input of interest}\\] \\[c: \\text{all the other inputs}\\] \\[X = (w, c)\\]\nSuppose for the input of interest, we are interested in comparing \\(w^{\\text{high}}\\) to \\(w^{\\text{low}}\\) (perhaps age = \\(60\\) and \\(40\\) respectively) with all other inputs \\(c\\) held constant. 
The predictive difference in the outcome changing only \\(w\\) is:\n\\[\\text{average predictive difference} = \\mathbb{E}(y|w^{\\text{high}}, c, \\theta) - \\mathbb{E}(y|w^{\\text{low}}, c, \\theta)\\]\nSelecting the maximum and minimum values of \\(w\\) and averaging over all other inputs \\(c\\) in the data gives you a new “hypothetical” dataset and corresponds to counting all pairs of transitions of \\((w^\\text{low})\\) to \\((w^\\text{high})\\), i.e., differences in \\(w\\) with \\(c\\) held constant. The difference between these two terms is the average predictive difference.\n\n\nThe objective of comparisons and plot_comparisons is to compute the expected difference in the outcome corresponding to three different scenarios for \\(w\\) and \\(c\\) where \\(w\\) is either provided by the user, else a default value is computed by Bambi (described in the default values section). The three scenarios are:\n\nuser provided values for \\(c\\).\na grid of equally spaced and central values for \\(c\\).\nempirical distribution (original data used to fit the model) for \\(c\\).\n\nIn the case of (1) and (2) above, Bambi assembles all pairwise combinations (transitions) of \\(w\\) and \\(c\\) into a new “hypothetical” dataset. In (3), Bambi uses the original \\(c\\), but replaces \\(w\\) with the user provided value or the default value computed by Bambi. In each scenario, predictions are made on the data using the fitted model. Once the predictions are made, comparisons are computed using the posterior samples by taking the difference in the predicted outcome for each pair of transitions. The average of these differences is the average predictive difference.\nThus, the goal of comparisons and plot_comparisons is to provide the modeler with a summary and visualization of the average predictive difference. Below, we demonstrate how to compute and plot average predictive differences with comparisons and plot_comparions using several examples.\n\nimport arviz as az\nimport numpy as np\nimport pandas as pd\nimport warnings\n\nimport bambi as bmb\n\nwarnings.simplefilter(action=\"ignore\", category=FutureWarning)\n\n\n\n\n\nWe model and predict how many fish are caught by visitors at a state park using survey data. Many visitors catch zero fish, either because they did not fish at all, or because they were unlucky. We would like to explicitly model this bimodal behavior (zero versus non-zero) using a Zero Inflated Poisson model, and to compare how different inputs of interest \\(w\\) and other covariate values \\(c\\) are associated with the number of fish caught. The dataset contains data on 250 groups that went to a state park to fish. 
Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), if they used a live bait and whether or not they brought a camper to the park (camper).\n\nfish_data = pd.read_csv(\"https://stats.idre.ucla.edu/stat/data/fish.csv\")\ncols = [\"count\", \"livebait\", \"camper\", \"persons\", \"child\"]\nfish_data = fish_data[cols]\nfish_data[\"child\"] = fish_data[\"child\"].astype(int)\nfish_data[\"livebait\"] = pd.Categorical(fish_data[\"livebait\"])\nfish_data[\"camper\"] = pd.Categorical(fish_data[\"camper\"])\n\n\nfish_model = bmb.Model(\n \"count ~ livebait + camper + persons + child\", \n fish_data, \n family=\"zero_inflated_poisson\"\n)\n\nfish_idata = fish_model.fit(\n draws=1000, \n target_accept=0.95, \n random_seed=1234, \n chains=4\n)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 2 jobs)\nNUTS: [psi, Intercept, livebait, camper, persons, child]\n\n\n\n\n\n\n\n\n\n\n\nSampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 25 seconds.\n\n\n\n\nFirst, an example of scenario 1 (user provided values) is given below. In both plot_comparisons and comparisons, \\(w\\) and \\(c\\) are represented by contrast and conditional, respectively. The modeler has the ability to pass their own values for contrast and conditional by using a dictionary where the key-value pairs are the covariate and value(s) of interest. For example, if we wanted to compare the number of fish caught for \\(4\\) versus \\(1\\) persons conditional on a range of child and livebait values, we would pass the following dictionary in the code block below. By default, for \\(w\\), Bambi compares \\(w^\\text{high}\\) to \\(w^\\text{low}\\). Thus, in this example, \\(w^\\text{high}\\) = 4 and \\(w^\\text{low}\\) = 1. The user is not limited to passing a list for the values. A np.array can also be used. Furthermore, Bambi by default, maps the order of the dict keys to the main, group, and panel of the matplotlib figure. Below, since child is the first key, this is used for the x-axis, and livebait is used for the group (color). If a third key was passed, it would be used for the panel (facet).\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast={\"persons\": [1, 4]},\n conditional={\"child\": [0, 1, 2], \"livebait\": [0, 1]},\n) \nfig.set_size_inches(7, 3)\n\nDefault computed for unspecified variable: camper\n\n\n\n\n\nThe plot above shows that, comparing \\(4\\) to \\(1\\) persons given \\(0\\) children and using livebait, the expected difference is about \\(26\\) fish. When not using livebait, the expected difference decreases substantially to about \\(5\\) fish. 
Using livebait with a group of people is associated with a much larger expected difference in the number of fish caught.\ncomparisons can be called to view a summary dataframe that includes the term \\(w\\) and its contrast, the specified conditional covariate, and the expected difference in the outcome with the uncertainty interval (by default the 94% highest density interval is computed).\n\nbmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast={\"persons\": [1, 4]},\n conditional={\"child\": [0, 1, 2], \"livebait\": [0, 1]},\n) \n\nDefault computed for unspecified variable: camper\n\n\n\n\n\n\n \n \n \n term\n estimate_type\n value\n child\n livebait\n camper\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n persons\n diff\n (1, 4)\n 0\n 0\n 1\n 4.834472\n 2.563472\n 7.037150\n \n \n 1\n persons\n diff\n (1, 4)\n 0\n 1\n 1\n 26.423188\n 23.739729\n 29.072748\n \n \n 2\n persons\n diff\n (1, 4)\n 1\n 0\n 1\n 1.202003\n 0.631629\n 1.780965\n \n \n 3\n persons\n diff\n (1, 4)\n 1\n 1\n 1\n 6.571943\n 5.469275\n 7.642248\n \n \n 4\n persons\n diff\n (1, 4)\n 2\n 0\n 1\n 0.301384\n 0.143676\n 0.467608\n \n \n 5\n persons\n diff\n (1, 4)\n 2\n 1\n 1\n 1.648417\n 1.140415\n 2.187190\n \n \n\n\n\n\nBut why is camper also in the summary dataframe? This is because in order to peform predictions, Bambi is expecting a value for each covariate used to fit the model. Additionally, with GLM models, average predictive comparisons are conditional in the sense that the estimate depends on the values of all the covariates in the model. Thus, for unspecified covariates, comparisons and plot_comparisons computes a default value (mean or mode based on the data type of the covariate). Thus, \\(c\\) = child, livebait, camper. Each row in the summary dataframe is read as “comparing \\(4\\) to \\(1\\) persons conditional on \\(c\\), the expected difference in the outcome is \\(y\\).”\n\n\n\nUsers can also perform comparisons on multiple contrast values. 
For example, if we wanted to compare the number of fish caught between \\((1, 2)\\), \\((1, 4)\\), and \\((2, 4)\\) persons conditional on a range of values for child and livebait.\n\nmultiple_values = bmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast={\"persons\": [1, 2, 4]},\n conditional={\"child\": [0, 1, 2], \"livebait\": [0, 1]}\n)\n\nmultiple_values\n\nDefault computed for unspecified variable: camper\n\n\n\n\n\n\n \n \n \n term\n estimate_type\n value\n child\n livebait\n camper\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n persons\n diff\n (1, 2)\n 0\n 0\n 1\n 0.527627\n 0.295451\n 0.775465\n \n \n 1\n persons\n diff\n (1, 2)\n 0\n 1\n 1\n 2.883694\n 2.605690\n 3.177685\n \n \n 2\n persons\n diff\n (1, 2)\n 1\n 0\n 1\n 0.131319\n 0.067339\n 0.195132\n \n \n 3\n persons\n diff\n (1, 2)\n 1\n 1\n 1\n 0.717965\n 0.592968\n 0.857893\n \n \n 4\n persons\n diff\n (1, 2)\n 2\n 0\n 1\n 0.032960\n 0.015212\n 0.052075\n \n \n 5\n persons\n diff\n (1, 2)\n 2\n 1\n 1\n 0.180270\n 0.123173\n 0.244695\n \n \n 6\n persons\n diff\n (1, 4)\n 0\n 0\n 1\n 4.834472\n 2.563472\n 7.037150\n \n \n 7\n persons\n diff\n (1, 4)\n 0\n 1\n 1\n 26.423188\n 23.739729\n 29.072748\n \n \n 8\n persons\n diff\n (1, 4)\n 1\n 0\n 1\n 1.202003\n 0.631629\n 1.780965\n \n \n 9\n persons\n diff\n (1, 4)\n 1\n 1\n 1\n 6.571943\n 5.469275\n 7.642248\n \n \n 10\n persons\n diff\n (1, 4)\n 2\n 0\n 1\n 0.301384\n 0.143676\n 0.467608\n \n \n 11\n persons\n diff\n (1, 4)\n 2\n 1\n 1\n 1.648417\n 1.140415\n 2.187190\n \n \n 12\n persons\n diff\n (2, 4)\n 0\n 0\n 1\n 4.306845\n 2.267097\n 6.280005\n \n \n 13\n persons\n diff\n (2, 4)\n 0\n 1\n 1\n 23.539494\n 20.990931\n 26.240169\n \n \n 14\n persons\n diff\n (2, 4)\n 1\n 0\n 1\n 1.070683\n 0.565931\n 1.585718\n \n \n 15\n persons\n diff\n (2, 4)\n 1\n 1\n 1\n 5.853978\n 4.858957\n 6.848519\n \n \n 16\n persons\n diff\n (2, 4)\n 2\n 0\n 1\n 0.268423\n 0.124033\n 0.412274\n \n \n 17\n persons\n diff\n (2, 4)\n 2\n 1\n 1\n 1.468147\n 1.024800\n 1.960934\n \n \n\n\n\n\nNotice how the contrast \\(w\\) varies while the covariates \\(c\\) are held constant. Currently, however, plotting multiple contrast values can be difficult to interpret since the contrast is “abstracted” away onto the y-axis. Thus, it would be difficult to interpret which portion of the plot corresponds to which contrast value. Therefore, it is currently recommended that if you want to plot multiple contrast values, call comparisons directly to obtain the summary dataframe and plot the results yourself.\n\n\n\nNow, we move onto scenario 2 described above (grid of equally spaced and central values) in computing average predictive comparisons. You are not required to pass values for contrast and conditional. If you do not pass values, Bambi will compute default values for you. Below, it is described how these default values are computed.\nThe default value for contrast is a centered difference at the mean for a contrast variable with a numeric dtype, and unique levels for a contrast varaible with a categorical dtype. For example, if the modeler is interested in the comparison of a \\(5\\) unit increase in \\(w\\) where \\(w\\) is a numeric variable, Bambi computes the mean and then subtracts and adds \\(2.5\\) units to the mean to obtain a centered difference. By default, if no value is passed for the contrast covariate, Bambi computes a one unit centered difference at the mean. 
For example, if only contrast=\"persons\" is passed, then \\(\\pm\\) \\(0.5\\) is applied to the mean of persons. If \\(w\\) is a categorical variable, Bambi computes and returns the unique levels. For example, if \\(w\\) has levels [“high scool”, “vocational”, “university”], Bambi computes and returns the unique values of this variable.\nThe default values for conditional are more involved. Currently, by default, if a dict or list is passed to conditional, Bambi uses the ordering (keys if dict and elements if list) to determine which covariate to use as the main, group (color), and panel (facet) variable. This is the same logic used in plot_comparisons described above. Subsequently, the default values used for the conditional covariates depend on their ordering and dtype. Below, the psuedocode used for computing default values covariates passed to conditional is outlined:\nif v == \"main\":\n \n if v == numeric:\n return np.linspace(v.min(), v.max(), 50)\n elif v == categorical:\n return np.unique(v)\n\nelif v == \"group\":\n \n if v == numeric:\n return np.quantile(v, np.linspace(0, 1, 5))\n elif v == categorical:\n return np.unique(v)\n\nelif v == \"panel\":\n \n if v == numeric:\n return np.quantile(v, np.linspace(0, 1, 5))\n elif v == categorical:\n return np.unique(v)\nThus, letting Bambi compute default values for conditional is equivalent to creating a hypothetical “data grid” of new values. Lets say we are interested in comparing the number of fish caught for the contrast livebait conditional on persons and child. This time, lets call comparisons first to gain an understanding of the data generating the plot.\n\ncontrast_df = bmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=[\"persons\", \"child\"],\n)\n\ncontrast_df.head(10)\n\nDefault computed for contrast variable: livebait\nDefault computed for conditional variable: persons, child\nDefault computed for unspecified variable: camper\n\n\n\n\n\n\n \n \n \n term\n estimate_type\n value\n persons\n child\n camper\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n livebait\n diff\n (0, 1)\n 1\n 0\n 1\n 1.694646\n 1.252803\n 2.081207\n \n \n 1\n livebait\n diff\n (0, 1)\n 1\n 1\n 1\n 0.422448\n 0.299052\n 0.551766\n \n \n 2\n livebait\n diff\n (0, 1)\n 1\n 2\n 1\n 0.106202\n 0.063174\n 0.152961\n \n \n 3\n livebait\n diff\n (0, 1)\n 1\n 3\n 1\n 0.026923\n 0.012752\n 0.043035\n \n \n 4\n livebait\n diff\n (0, 1)\n 2\n 0\n 1\n 4.050713\n 3.299138\n 4.782635\n \n \n 5\n livebait\n diff\n (0, 1)\n 2\n 1\n 1\n 1.009094\n 0.755449\n 1.249551\n \n \n 6\n livebait\n diff\n (0, 1)\n 2\n 2\n 1\n 0.253511\n 0.163468\n 0.355214\n \n \n 7\n livebait\n diff\n (0, 1)\n 2\n 3\n 1\n 0.064224\n 0.032733\n 0.100539\n \n \n 8\n livebait\n diff\n (0, 1)\n 3\n 0\n 1\n 9.701813\n 8.391122\n 11.084168\n \n \n 9\n livebait\n diff\n (0, 1)\n 3\n 1\n 1\n 2.415233\n 1.922802\n 2.927737\n \n \n\n\n\n\nBefore we talk about the summary output, you should have noticed that messages are being logged to the console. By default interpret is verbose and logs a message to the console if a default value is computed for covariates in conditional and contrast. This is useful because unless the documentation is read, it can be difficult to tell which covariates are having default values computed for. Thus, Bambi has a config file bmb.config[\"INTERPRET_VERBOSE\"] where we can specify whether or not to log messages. By default, this is set to true. 
To turn off logging, set bmb.config[\"INTERPRET_VERBOSE\"] = False. From here on, we will turn off logging.\nAs livebait was encoded as a categorical dtype, Bambi returned the unique levels \\([0, 1]\\) for the contrast. persons and child were passed as the first and second element and thus act as the main and group variables, respectively. It can be seen from the output above that an equally spaced grid was used to compute the values for persons, whereas a quantile-based grid was used for child. Furthermore, as camper was unspecified, the mode was used as the default value. Let’s go ahead and plot the comparisons.\n\nbmb.config[\"INTERPRET_VERBOSE\"] = False\n\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=[\"persons\", \"child\"],\n) \nfig.set_size_inches(7, 3)\n\n\n\n\nThe plot shows us that the expected difference in fish caught between groups that use livebait and groups that do not depends not only on the number of persons but also on the number of children. However, the plotted comparisons for child = \\(3\\) are difficult to interpret on a single plot. Thus, it can be useful to pass specific group and panel arguments to aid interpretation. For this, subplot_kwargs allows the user to manipulate the plotting by passing a dictionary where the keys are {\"main\": ..., \"group\": ..., \"panel\": ...} and the values are the names of the covariates to be plotted. Below, we plot the same comparisons as above, but this time we specify group and panel to both be child.\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=[\"persons\", \"child\"],\n subplot_kwargs={\"main\": \"persons\", \"group\": \"child\", \"panel\": \"child\"},\n fig_kwargs={\"figsize\":(12, 3), \"sharey\": True},\n legend=False\n) \n\n\n\n\n\n\n\nEvaluating average predictive comparisons at central values for the conditional covariates \\(c\\) can be problematic when the inputs have a large variance, since no single central value (mean, median, etc.) is representative of the covariate. This is especially true when \\(c\\) exhibits bi- or multimodality. Thus, it may be desirable to use the empirical distribution of \\(c\\) to compute the predictive comparisons, and then average over a specific covariate or set of covariates to obtain the average predictive comparisons.
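A quick look at the conditional covariates shows how dispersed they are, and hence how unrepresentative a single central value can be here (a plain pandas check):\n\nfish_data[[\"persons\", \"child\"]].agg([\"mean\", \"median\", \"std\", \"min\", \"max\"])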
To achieve unit-level contrasts, do not pass anything to conditional, or explicitly set it to None.\n\nunit_level = bmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n)\n\n# empirical distribution\nprint(unit_level.shape[0] == fish_model.data.shape[0])\nunit_level.head(10)\n\nTrue\n\n   term      estimate_type  value   camper  child  persons  estimate  lower_3.0%  upper_97.0%\n0  livebait  diff           (0, 1)  0       0      1        0.864408  0.627063    1.116105\n1  livebait  diff           (0, 1)  1       0      1        1.694646  1.252803    2.081207\n2  livebait  diff           (0, 1)  0       0      1        0.864408  0.627063    1.116105\n3  livebait  diff           (0, 1)  1       1      2        1.009094  0.755449    1.249551\n4  livebait  diff           (0, 1)  0       0      1        0.864408  0.627063    1.116105\n5  livebait  diff           (0, 1)  1       2      4        1.453235  0.964674    1.956434\n6  livebait  diff           (0, 1)  0       1      3        1.233247  0.900295    1.569891\n7  livebait  diff           (0, 1)  0       3      4        0.188019  0.090328    0.289560\n8  livebait  diff           (0, 1)  1       2      3        0.606361  0.390571    0.818549\n9  livebait  diff           (0, 1)  1       0      1        1.694646  1.252803    2.081207\n\n\n# empirical (observed) data used to fit the model\nfish_model.data.head(10)\n\n   count  livebait  camper  persons  child\n0  0      0         0       1        0\n1  0      1         1       1        0\n2  0      1         0       1        0\n3  0      1         1       2        1\n4  1      1         0       1        0\n5  0      1         1       4        2\n6  0      1         0       3        1\n7  0      1         0       4        3\n8  0      0         1       3        2\n9  1      1         1       1        0\n\n\nAbove, unit_level is the comparisons summary dataframe and fish_model.data is the empirical data. Notice how the values for \\(c\\) are identical in both dataframes, while the values for \\(w\\) differ. These unit-level contrasts are difficult to interpret on their own, however, as each row corresponds to a single unit’s contrast. Therefore, it is useful to average over (marginalize) the estimates to summarize the unit-level predictive comparisons.\n\n\nSince the empirical distribution is used for computing the average predictive comparisons, the same number of rows (250) is returned as in the data used to fit the model. To average over a covariate, use the average_by argument. If True is passed, then comparisons averages over all covariates. Else, if a single covariate or a list of covariates is passed, then comparisons averages by those covariates.\n\n# marginalize over all covariates\nbmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n average_by=True\n)\n\n   term      estimate_type  value   estimate  lower_3.0%  upper_97.0%\n0  livebait  diff           (0, 1)  3.649691  2.956185    4.333621\n\n\nPassing True to average_by averages over all covariates and is equivalent to taking the mean of the estimate and uncertainty columns. For example:\n\nunit_level = bmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n)\n\nunit_level[[\"estimate\", \"lower_3.0%\", \"upper_97.0%\"]].mean()\n\nestimate 3.649691\nlower_3.0% 2.956185\nupper_97.0% 4.333621\ndtype: float64\n\n\n\n\nAveraging over all covariates may not be desired; sometimes you would rather average by a group or a specific covariate.
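Conceptually, averaging by a covariate is just a group-wise mean over the unit-level rows. A pandas sketch of what average_by=\"persons\" computes, assuming the interval columns are aggregated with a plain mean like the estimate:\n\n# group-wise mean of the unit-level contrasts, one row per value of persons\nunit_level.groupby(\"persons\")[[\"estimate\", \"lower_3.0%\", \"upper_97.0%\"]].mean()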
To perform averaging by subgroups, users can pass a single or list of covariates to average_by to average over specific covariates. For example, if we wanted to average by persons:\n\n# average by number of persons\nbmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n average_by=\"persons\"\n)\n\n\n\n\n\n \n \n \n term\n estimate_type\n value\n persons\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n livebait\n diff\n (0, 1)\n 1\n 1.374203\n 1.011290\n 1.708711\n \n \n 1\n livebait\n diff\n (0, 1)\n 2\n 1.963362\n 1.543330\n 2.376636\n \n \n 2\n livebait\n diff\n (0, 1)\n 3\n 3.701510\n 3.056586\n 4.357385\n \n \n 3\n livebait\n diff\n (0, 1)\n 4\n 7.358662\n 6.047642\n 8.655654\n \n \n\n\n\n\n\n# average by number of persons and camper by passing a list\nbmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n average_by=[\"persons\", \"camper\"]\n)\n\n\n\n\n\n \n \n \n term\n estimate_type\n value\n persons\n camper\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n livebait\n diff\n (0, 1)\n 1\n 0\n 0.864408\n 0.627063\n 1.116105\n \n \n 1\n livebait\n diff\n (0, 1)\n 1\n 1\n 1.694646\n 1.252803\n 2.081207\n \n \n 2\n livebait\n diff\n (0, 1)\n 2\n 0\n 1.424598\n 1.078389\n 1.777154\n \n \n 3\n livebait\n diff\n (0, 1)\n 2\n 1\n 2.344439\n 1.872191\n 2.800661\n \n \n 4\n livebait\n diff\n (0, 1)\n 3\n 0\n 2.429459\n 1.871578\n 2.964242\n \n \n 5\n livebait\n diff\n (0, 1)\n 3\n 1\n 4.443540\n 3.747840\n 5.170052\n \n \n 6\n livebait\n diff\n (0, 1)\n 4\n 0\n 3.541921\n 2.686445\n 4.391176\n \n \n 7\n livebait\n diff\n (0, 1)\n 4\n 1\n 10.739204\n 9.024702\n 12.432764\n \n \n\n\n\n\nIt is still possible to use plot_comparisons when passing an argument to average_by. In the plot below, the empirical distribution is used to compute unit level contrasts for livebait and then averaged over persons to obtain the average predictive comparisons. The plot below is similar to the second plot in this notebook. The differences being that: (1) a pairwise transition grid is defined for the second plot above, whereas the empirical distribution is used in the plot below, and (2) in the plot below, we marginalized over the other covariates in the model (thus the reason for not having a camper or child group and panel, and a reduction in the uncertainty interval).\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n average_by=\"persons\"\n)\nfig.set_size_inches(7, 3)\n\n\n\n\n\n\n\n\n\nTo showcase an additional functionality of comparisons and plot_comparisons, we fit a logistic regression model to the titanic dataset with interaction terms to model the probability of survival. The titanic dataset gives the values of four categorical attributes for each of the 2201 people on board the Titanic when it struck an iceberg and sank. 
The attributes are social class (first class, second class, third class, crewmember), age, sex (0 = female, 1 = male), and whether or not the person survived (0 = deceased, 1 = survived).\n\ndat = pd.read_csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv\", index_col=0)\n\ndat[\"PClass\"] = dat[\"PClass\"].str.replace(\"[st, nd, rd]\", \"\", regex=True)\ndat[\"PClass\"] = dat[\"PClass\"].str.replace(\"*\", \"0\").astype(int)\ndat[\"PClass\"] = dat[\"PClass\"].replace(0, np.nan)\ndat[\"PClass\"] = pd.Categorical(dat[\"PClass\"], ordered=True)\ndat[\"SexCode\"] = pd.Categorical(dat[\"SexCode\"], ordered=True)\n\ndat = dat.dropna(axis=0, how=\"any\")\n\n\ntitanic_model = bmb.Model(\n \"Survived ~ PClass * SexCode * Age\", \n data=dat, \n family=\"bernoulli\"\n)\ntitanic_idata = titanic_model.fit(\n draws=500, \n tune=500, \n target_accept=0.95, \n random_seed=1234\n)\n\nModeling the probability that Survived==1\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, PClass, SexCode, PClass:SexCode, Age, PClass:Age, SexCode:Age, PClass:SexCode:Age]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 500 tune and 500 draw iterations (1_000 + 1_000 draws total) took 23 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\n\n\ncomparisons and plot_comparisons also allow you to specify the type of comparison to be computed. By default, a difference is used. However, it is also possible to take the ratio where comparisons would then become average predictive ratios. To achieve this, pass \"ratio\" into the argument comparison_type. Using different comparison types offers a way to produce alternative insights; especially when there are interaction terms as the value of one covariate depends on the value of the other covariate.\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=titanic_model,\n idata=titanic_idata,\n contrast={\"PClass\": [1, 3]},\n conditional=[\"Age\", \"SexCode\"],\n comparison_type=\"ratio\",\n subplot_kwargs={\"main\": \"Age\", \"group\": \"SexCode\", \"panel\": \"SexCode\"},\n fig_kwargs={\"figsize\":(12, 3), \"sharey\": True},\n legend=False\n\n)\n\n\n\n\nThe left panel shows that the ratio of the probability of survival comparing PClass \\(3\\) to \\(1\\) conditional on Age is non-constant. Whereas the right panel shows an approximately constant ratio in the probability of survival comparing PClass \\(3\\) to \\(1\\) conditional on Age.\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Sun May 26 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\npandas: 2.2.2\nbambi : 0.13.1.dev39+gb7d6a6cb\narviz : 0.18.0\nnumpy : 1.26.4\n\nWatermark: 2.4.3" + "text": "comparisons and plot_comparisons are a part of Bambi’s sub-package plots that feature a set of functions used to interpret complex regression models. This sub-package is inspired by the R package marginaleffects. These two functions allow the modeler to compare the predictions made by a model for different contrast and covariate values. Below, it is described why comparing predictions is useful in interpreting generalized linear models (GLMs), how this methodology is implemented in Bambi, and how to use comparisons and plot_comparisons. It is assumed that the reader is familiar with the basics of GLMs. 
If not, refer to the Bambi Basic Building Blocks example.\nDue to the link function in a GLM, there are typically three quantities of interest to interpret:\n\nthe linear predictor \\(\\eta = X\\beta\\), where \\(X\\) is an \\(n\\) x \\(p\\) matrix of explanatory variables.\nthe mean \\(\\mu = g^{-1}(\\eta)\\), where the link function \\(g(\\cdot)\\) relates the linear predictor to the mean of the outcome variable, \\(\\mu = g^{-1}(\\eta) = g^{-1}(X\\beta)\\).\nthe response variable \\(Y \\sim \\mathcal{D}(\\mu, \\theta)\\), where \\(\\mu\\) is the mean parameter and \\(\\theta\\) is (possibly) a vector that contains all the other “auxiliary” parameters of the distribution.\n\nOften, with GLMs, \\(\\eta\\) is linear in the parameters, but nonlinear in the relationship between the inputs and the outcome \\(Y\\), due to the link function \\(g\\). Thus, as modelers, we are usually more interested in interpreting (2) and (3). For example, in logistic regression, the linear predictor is on the log-odds scale, but the quantity of interest is on the probability scale. In Poisson regression, the linear predictor is on the log scale, but the response variable is on the count scale. Referring back to logistic regression, a specified difference in one of the \\(x\\) variables does not correspond to a constant difference in the probability of the outcome.\nIt is often helpful with GLMs, for the modeler and audience, to have a summary that gives the expected difference in the outcome corresponding to a unit difference in each of the input variables. Thus, the goal of comparisons and plot_comparisons is to provide the modeler with a summary and visualization of the average predictive difference.\n\n\nHere, we adopt the notation from Chapter 14.4 of Regression and Other Stories to describe average predictive differences. Assume we have fit a Bambi model predicting an outcome \\(Y\\) based on inputs \\(X\\) and parameters \\(\\theta\\). Consider the following scalar inputs:\n\\[w: \\text{the input of interest}\\] \\[c: \\text{all the other inputs}\\] \\[X = (w, c)\\]\nSuppose for the input of interest, we are interested in comparing \\(w^{\\text{high}}\\) to \\(w^{\\text{low}}\\) (perhaps age = \\(60\\) and \\(40\\), respectively) with all other inputs \\(c\\) held constant. The predictive difference in the outcome changing only \\(w\\) is:\n\\[\\text{average predictive difference} = \\mathbb{E}(y|w^{\\text{high}}, c, \\theta) - \\mathbb{E}(y|w^{\\text{low}}, c, \\theta)\\]\nSelecting the maximum and minimum values of \\(w\\) and averaging over all other inputs \\(c\\) in the data gives you a new “hypothetical” dataset and corresponds to counting all pairs of transitions of \\((w^\\text{low})\\) to \\((w^\\text{high})\\), i.e., differences in \\(w\\) with \\(c\\) held constant. The average of the differences between these two terms is the average predictive difference.\n\n\nThe objective of comparisons and plot_comparisons is to compute the expected difference in the outcome corresponding to three different scenarios for \\(w\\) and \\(c\\), where \\(w\\) is either provided by the user or a default value is computed by Bambi (described in the default values section). The three scenarios are:\n\nuser provided values for \\(c\\).\na grid of equally spaced and central values for \\(c\\).\nthe empirical distribution (the original data used to fit the model) for \\(c\\).\n\nIn the case of (1) and (2) above, Bambi assembles all pairwise combinations (transitions) of \\(w\\) and \\(c\\) into a new “hypothetical” dataset, as sketched below.
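For intuition, a minimal sketch of such a hypothetical dataset for scenario (1), built with a plain list comprehension (the values mirror the fish example below; nothing here is Bambi API):\n\nimport pandas as pd\n\n# every contrast value paired with every combination of the conditional values\nrows = [{\"persons\": w, \"child\": ch, \"livebait\": lb}\n for w in [1, 4] for ch in [0, 1, 2] for lb in [0, 1]]\ngrid = pd.DataFrame(rows) # 2 x 3 x 2 = 12 rows; each (child, livebait) pair appears under both contrast values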
In (3), Bambi uses the original \\(c\\), but replaces \\(w\\) with the user-provided value or the default value computed by Bambi. In each scenario, predictions are made on the data using the fitted model. Once the predictions are made, comparisons are computed using the posterior samples by taking the difference in the predicted outcome for each pair of transitions. The average of these differences is the average predictive difference.\nThus, the goal of comparisons and plot_comparisons is to provide the modeler with a summary and visualization of the average predictive difference. Below, we demonstrate how to compute and plot average predictive differences with comparisons and plot_comparisons using several examples.\n\nimport arviz as az\nimport numpy as np\nimport pandas as pd\nimport warnings\n\nimport bambi as bmb\n\nwarnings.simplefilter(action=\"ignore\", category=FutureWarning)\n\n\n\n\n\nWe model and predict how many fish are caught by visitors at a state park using survey data. Many visitors catch zero fish, either because they did not fish at all, or because they were unlucky. We would like to explicitly model this bimodal behavior (zero versus non-zero) using a Zero Inflated Poisson model, and to compare how different inputs of interest \\(w\\) and other covariate values \\(c\\) are associated with the number of fish caught. The dataset contains data on 250 groups that went to a state park to fish. Each group was questioned about how many fish they caught (count), how many children were in the group (child), how many people were in the group (persons), whether they used live bait (livebait) and whether or not they brought a camper to the park (camper).\n\nfish_data = pd.read_csv(\"https://stats.idre.ucla.edu/stat/data/fish.csv\")\ncols = [\"count\", \"livebait\", \"camper\", \"persons\", \"child\"]\nfish_data = fish_data[cols]\nfish_data[\"child\"] = fish_data[\"child\"].astype(int)\nfish_data[\"livebait\"] = pd.Categorical(fish_data[\"livebait\"])\nfish_data[\"camper\"] = pd.Categorical(fish_data[\"camper\"])\n\n\nfish_model = bmb.Model(\n \"count ~ livebait + camper + persons + child\", \n fish_data, \n family=\"zero_inflated_poisson\"\n)\n\nfish_idata = fish_model.fit(\n draws=1000, \n target_accept=0.95, \n random_seed=1234, \n chains=4\n)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [psi, Intercept, livebait, camper, persons, child]\n\n\nSampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 45 seconds.\n\n\n\n\nFirst, an example of scenario 1 (user-provided values) is given below. In both plot_comparisons and comparisons, \\(w\\) and \\(c\\) are represented by contrast and conditional, respectively. The modeler has the ability to pass their own values for contrast and conditional by using a dictionary where the key-value pairs are the covariate and value(s) of interest. For example, if we wanted to compare the number of fish caught for \\(4\\) versus \\(1\\) persons conditional on a range of child and livebait values, we would pass the following dictionary in the code block below. By default, for \\(w\\), Bambi compares \\(w^\\text{high}\\) to \\(w^\\text{low}\\). Thus, in this example, \\(w^\\text{high}\\) = 4 and \\(w^\\text{low}\\) = 1. The user is not limited to passing a list for the values; a np.array can also be used. Furthermore, by default, Bambi maps the order of the dict keys to the main, group, and panel of the matplotlib figure.
Below, since child is the first key, it is used for the x-axis, and livebait is used for the group (color). If a third key were passed, it would be used for the panel (facet).\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast={\"persons\": [1, 4]},\n conditional={\"child\": [0, 1, 2], \"livebait\": [0, 1]},\n) \nfig.set_size_inches(7, 3)\n\nDefault computed for unspecified variable: camper\n\n\n\n\n\nThe plot above shows that, comparing \\(4\\) to \\(1\\) persons given \\(0\\) children and using livebait, the expected difference is about \\(26\\) fish. When not using livebait, the expected difference decreases substantially to about \\(5\\) fish. Using livebait with a group of people is associated with a much larger expected difference in the number of fish caught.\ncomparisons can be called to view a summary dataframe that includes the term \\(w\\) and its contrast, the specified conditional covariates, and the expected difference in the outcome with the uncertainty interval (by default the 94% highest density interval is computed).\n\nbmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast={\"persons\": [1, 4]},\n conditional={\"child\": [0, 1, 2], \"livebait\": [0, 1]},\n) \n\nDefault computed for unspecified variable: camper\n\n   term     estimate_type  value   child  livebait  camper  estimate   lower_3.0%  upper_97.0%\n0  persons  diff           (1, 4)  0      0         1        4.834472   2.563472    7.037150\n1  persons  diff           (1, 4)  0      1         1       26.423188  23.739729   29.072748\n2  persons  diff           (1, 4)  1      0         1        1.202003   0.631629    1.780965\n3  persons  diff           (1, 4)  1      1         1        6.571943   5.469275    7.642248\n4  persons  diff           (1, 4)  2      0         1        0.301384   0.143676    0.467608\n5  persons  diff           (1, 4)  2      1         1        1.648417   1.140415    2.187190\n\n\nBut why is camper also in the summary dataframe? This is because, in order to perform predictions, Bambi expects a value for each covariate used to fit the model. Additionally, with GLMs, average predictive comparisons are conditional in the sense that the estimate depends on the values of all the covariates in the model. Thus, for unspecified covariates, comparisons and plot_comparisons compute a default value (the mean or the mode, based on the data type of the covariate). Hence, \\(c\\) = child, livebait, camper. Each row in the summary dataframe is read as “comparing \\(4\\) to \\(1\\) persons conditional on \\(c\\), the expected difference in the outcome is \\(y\\).”\n\n\n\nUsers can also perform comparisons on multiple contrast values.
For example, if we wanted to compare the number of fish caught between \\((1, 2)\\), \\((1, 4)\\), and \\((2, 4)\\) persons conditional on a range of values for child and livebait, we can pass all three values to contrast:\n\nmultiple_values = bmb.interpret.comparisons(\n    model=fish_model,\n    idata=fish_idata,\n    contrast={\"persons\": [1, 2, 4]},\n    conditional={\"child\": [0, 1, 2], \"livebait\": [0, 1]}\n)\n\nmultiple_values\n\nDefault computed for unspecified variable: camper\n\n    term  estimate_type  value  child  livebait  camper  estimate  lower_3.0%  upper_97.0%\n0   persons  diff  (1, 2)  0  0  1  0.527627  0.295451  0.775465\n1   persons  diff  (1, 2)  0  1  1  2.883694  2.605690  3.177685\n2   persons  diff  (1, 2)  1  0  1  0.131319  0.067339  0.195132\n3   persons  diff  (1, 2)  1  1  1  0.717965  0.592968  0.857893\n4   persons  diff  (1, 2)  2  0  1  0.032960  0.015212  0.052075\n5   persons  diff  (1, 2)  2  1  1  0.180270  0.123173  0.244695\n6   persons  diff  (1, 4)  0  0  1  4.834472  2.563472  7.037150\n7   persons  diff  (1, 4)  0  1  1  26.423188  23.739729  29.072748\n8   persons  diff  (1, 4)  1  0  1  1.202003  0.631629  1.780965\n9   persons  diff  (1, 4)  1  1  1  6.571943  5.469275  7.642248\n10  persons  diff  (1, 4)  2  0  1  0.301384  0.143676  0.467608\n11  persons  diff  (1, 4)  2  1  1  1.648417  1.140415  2.187190\n12  persons  diff  (2, 4)  0  0  1  4.306845  2.267097  6.280005\n13  persons  diff  (2, 4)  0  1  1  23.539494  20.990931  26.240169\n14  persons  diff  (2, 4)  1  0  1  1.070683  0.565931  1.585718\n15  persons  diff  (2, 4)  1  1  1  5.853978  4.858957  6.848519\n16  persons  diff  (2, 4)  2  0  1  0.268423  0.124033  0.412274\n17  persons  diff  (2, 4)  2  1  1  1.468147  1.024800  1.960934\n\nNotice how the contrast \\(w\\) varies while the covariates \\(c\\) are held constant. Currently, however, plotting multiple contrast values can be difficult to interpret since the contrast is “abstracted” away onto the y-axis, making it hard to tell which portion of the plot corresponds to which contrast value. Therefore, if you want to plot multiple contrast values, it is currently recommended to call comparisons directly to obtain the summary dataframe and plot the results yourself.\n\n\nNow, we move on to scenario 2 described above (grid of equally spaced and central values) in computing average predictive comparisons. You are not required to pass values for contrast and conditional; if you do not, Bambi will compute default values for you. Below, we describe how these default values are computed.\nThe default value for contrast is a centered difference at the mean for a contrast variable with a numeric dtype, and the unique levels for a contrast variable with a categorical dtype. For example, if the modeler is interested in the comparison of a \\(5\\) unit increase in \\(w\\) where \\(w\\) is a numeric variable, Bambi computes the mean and then subtracts and adds \\(2.5\\) units to the mean to obtain a centered difference. By default, if no value is passed for the contrast covariate, Bambi computes a one unit centered difference at the mean, as the short sketch below illustrates.
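A minimal sketch of that default, using the fish data (illustrative only):\n\n# The one unit centered difference compared when only contrast=\"persons\" is given\ncenter = fish_data[\"persons\"].mean()\nw_low, w_high = center - 0.5, center + 0.5\n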
For example, if only contrast=\"persons\" is passed, then \\(\\pm\\) \\(0.5\\) is applied to the mean of persons. If \\(w\\) is a categorical variable, Bambi returns its unique levels. For example, if \\(w\\) has levels [“high school”, “vocational”, “university”], these three values are used.\nThe default values for conditional are more involved. Currently, by default, if a dict or list is passed to conditional, Bambi uses the ordering (keys if dict and elements if list) to determine which covariate to use as the main, group (color), and panel (facet) variable. This is the same logic used in plot_comparisons described above. Subsequently, the default values used for the conditional covariates depend on their ordering and dtype. Below, the pseudocode used for computing default values for covariates passed to conditional is outlined:\ndef default_values(v, role):\n    # role is one of \"main\", \"group\", \"panel\"; v is the covariate's data\n    # (pseudocode: is_numeric stands for a dtype check)\n    if role == \"main\":\n        if is_numeric(v):\n            return np.linspace(v.min(), v.max(), 50)\n        return np.unique(v)\n    # \"group\" and \"panel\" both use a quantile-based grid for numeric covariates\n    if is_numeric(v):\n        return np.quantile(v, np.linspace(0, 1, 5))\n    return np.unique(v)\nThus, letting Bambi compute default values for conditional is equivalent to creating a hypothetical “data grid” of new values. Let's say we are interested in comparing the number of fish caught for the contrast livebait conditional on persons and child. This time, let's call comparisons first to gain an understanding of the data generating the plot.\n\ncontrast_df = bmb.interpret.comparisons(\n    model=fish_model,\n    idata=fish_idata,\n    contrast=\"livebait\",\n    conditional=[\"persons\", \"child\"],\n)\n\ncontrast_df.head(10)\n\nDefault computed for contrast variable: livebait\nDefault computed for conditional variable: persons, child\nDefault computed for unspecified variable: camper\n\n   term  estimate_type  value  persons  child  camper  estimate  lower_3.0%  upper_97.0%\n0  livebait  diff  (0, 1)  1  0  1  1.694646  1.252803  2.081207\n1  livebait  diff  (0, 1)  1  1  1  0.422448  0.299052  0.551766\n2  livebait  diff  (0, 1)  1  2  1  0.106202  0.063174  0.152961\n3  livebait  diff  (0, 1)  1  3  1  0.026923  0.012752  0.043035\n4  livebait  diff  (0, 1)  2  0  1  4.050713  3.299138  4.782635\n5  livebait  diff  (0, 1)  2  1  1  1.009094  0.755449  1.249551\n6  livebait  diff  (0, 1)  2  2  1  0.253511  0.163468  0.355214\n7  livebait  diff  (0, 1)  2  3  1  0.064224  0.032733  0.100539\n8  livebait  diff  (0, 1)  3  0  1  9.701813  8.391122  11.084168\n9  livebait  diff  (0, 1)  3  1  1  2.415233  1.922802  2.927737\n\nBefore we talk about the summary output, you should have noticed that messages are being logged to the console. By default interpret is verbose and logs a message to the console if a default value is computed for covariates in conditional and contrast. This is useful because, unless the documentation is read, it can be difficult to tell which covariates have default values computed for them. Thus, Bambi has a config option bmb.config[\"INTERPRET_VERBOSE\"] where we can specify whether or not to log messages. By default, this is set to True.
To turn off logging, set bmb.config[\"INTERPRET_VERBOSE\"] = False. From here on, we will turn off logging.\nAs livebait was encoded as a categorical dtype, Bambi returned the unique levels of \\([0, 1]\\) for the contrast. persons and child were passed as the first and second elements and thus act as the main and group variables, respectively. It can be seen from the output above that an equally spaced grid was used to compute the values for persons, whereas a quantile-based grid was used for child. Furthermore, as camper was unspecified, the mode was used as the default value. Let's go ahead and plot the comparisons.\n\nbmb.config[\"INTERPRET_VERBOSE\"] = False\n\n\nfig, ax = bmb.interpret.plot_comparisons(\n    model=fish_model,\n    idata=fish_idata,\n    contrast=\"livebait\",\n    conditional=[\"persons\", \"child\"],\n) \nfig.set_size_inches(7, 3)\n\n\nThe plot shows us that the expected difference in fish caught between groups who use livebait and those who do not is conditional not only on the number of persons, but also on the number of children. However, the plotted comparisons for child = \\(3\\) are difficult to interpret on a single plot. Thus, it can be useful to pass specific group and panel arguments to aid in the interpretation of the plot. Therefore, subplot_kwargs allows the user to manipulate the plotting by passing a dictionary where the keys are {\"main\": ..., \"group\": ..., \"panel\": ...} and the values are the names of the covariates to be plotted. Below, we plot the same comparisons as above, but this time we specify group and panel to both be child.\n\nfig, ax = bmb.interpret.plot_comparisons(\n    model=fish_model,\n    idata=fish_idata,\n    contrast=\"livebait\",\n    conditional=[\"persons\", \"child\"],\n    subplot_kwargs={\"main\": \"persons\", \"group\": \"child\", \"panel\": \"child\"},\n    fig_kwargs={\"figsize\": (12, 3), \"sharey\": True},\n    legend=False\n) \n\n\nEvaluating average predictive comparisons at central values for the conditional covariates \\(c\\) can be problematic when the inputs have a large variance, since no single central value (mean, median, etc.) is representative of the covariate. This is especially true when \\(c\\) exhibits bi- or multimodality. Thus, it may be desirable to use the empirical distribution of \\(c\\) to compute the predictive comparisons, and then average over a specific covariate or set of covariates to obtain the average predictive comparisons.
To achieve unit level contrasts, do not pass a value to conditional, or explicitly specify conditional=None.\n\nunit_level = bmb.interpret.comparisons(\n    model=fish_model,\n    idata=fish_idata,\n    contrast=\"livebait\",\n    conditional=None,\n)\n\n# empirical distribution\nprint(unit_level.shape[0] == fish_model.data.shape[0])\nunit_level.head(10)\n\nTrue\n\n   term  estimate_type  value  camper  child  persons  estimate  lower_3.0%  upper_97.0%\n0  livebait  diff  (0, 1)  0  0  1  0.864408  0.627063  1.116105\n1  livebait  diff  (0, 1)  1  0  1  1.694646  1.252803  2.081207\n2  livebait  diff  (0, 1)  0  0  1  0.864408  0.627063  1.116105\n3  livebait  diff  (0, 1)  1  1  2  1.009094  0.755449  1.249551\n4  livebait  diff  (0, 1)  0  0  1  0.864408  0.627063  1.116105\n5  livebait  diff  (0, 1)  1  2  4  1.453235  0.964674  1.956434\n6  livebait  diff  (0, 1)  0  1  3  1.233247  0.900295  1.569891\n7  livebait  diff  (0, 1)  0  3  4  0.188019  0.090328  0.289560\n8  livebait  diff  (0, 1)  1  2  3  0.606361  0.390571  0.818549\n9  livebait  diff  (0, 1)  1  0  1  1.694646  1.252803  2.081207\n\n\n# empirical (observed) data used to fit the model\nfish_model.data.head(10)\n\n   count  livebait  camper  persons  child\n0  0  0  0  1  0\n1  0  1  1  1  0\n2  0  1  0  1  0\n3  0  1  1  2  1\n4  1  1  0  1  0\n5  0  1  1  4  2\n6  0  1  0  3  1\n7  0  1  0  4  3\n8  0  0  1  3  2\n9  1  1  1  1  0\n\nAbove, unit_level is the comparisons summary dataframe and fish_model.data is the empirical data. Notice how the values for \\(c\\) are identical in both dataframes, whereas the values for \\(w\\) differ. These unit level contrasts are difficult to interpret, however, as each row corresponds to that unit’s contrast. Therefore, it is useful to average over (marginalize) the estimates to summarize the unit level predictive comparisons.\n\n\nSince the empirical distribution is used for computing the average predictive comparisons, the same number of rows (250) is returned as the data used to fit the model. To average over a covariate, use the average_by argument. If True is passed, then comparisons averages over all covariates. Otherwise, if a single covariate or a list of covariates is passed, then comparisons averages by the covariates passed.\n\n# marginalize over all covariates\nbmb.interpret.comparisons(\n    model=fish_model,\n    idata=fish_idata,\n    contrast=\"livebait\",\n    conditional=None,\n    average_by=True\n)\n\n   term  estimate_type  value  estimate  lower_3.0%  upper_97.0%\n0  livebait  diff  (0, 1)  3.649691  2.956185  4.333621\n\nPassing True to average_by averages over all covariates and is equivalent to taking the mean of the estimate and uncertainty columns. For example:\n\nunit_level = bmb.interpret.comparisons(\n    model=fish_model,\n    idata=fish_idata,\n    contrast=\"livebait\",\n    conditional=None,\n)\n\nunit_level[[\"estimate\", \"lower_3.0%\", \"upper_97.0%\"]].mean()\n\nestimate       3.649691\nlower_3.0%     2.956185\nupper_97.0%    4.333621\ndtype: float64\n\n\nAveraging over all covariates may not be desired, and you would rather average by a group or specific covariate.
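By the same equivalence, subgroup averages are just a groupby-mean over the unit level dataframe. A sketch (using the column names from the summaries above; the real interface is the average_by argument shown next):\n\nby_persons = (\n    unit_level.groupby(\"persons\")[[\"estimate\", \"lower_3.0%\", \"upper_97.0%\"]]\n    .mean()\n    .reset_index()\n)\n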
To perform averaging by subgroups, users can pass a single or list of covariates to average_by to average over specific covariates. For example, if we wanted to average by persons:\n\n# average by number of persons\nbmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n average_by=\"persons\"\n)\n\n\n\n\n\n \n \n \n term\n estimate_type\n value\n persons\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n livebait\n diff\n (0, 1)\n 1\n 1.374203\n 1.011290\n 1.708711\n \n \n 1\n livebait\n diff\n (0, 1)\n 2\n 1.963362\n 1.543330\n 2.376636\n \n \n 2\n livebait\n diff\n (0, 1)\n 3\n 3.701510\n 3.056586\n 4.357385\n \n \n 3\n livebait\n diff\n (0, 1)\n 4\n 7.358662\n 6.047642\n 8.655654\n \n \n\n\n\n\n\n# average by number of persons and camper by passing a list\nbmb.interpret.comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n average_by=[\"persons\", \"camper\"]\n)\n\n\n\n\n\n \n \n \n term\n estimate_type\n value\n persons\n camper\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n livebait\n diff\n (0, 1)\n 1\n 0\n 0.864408\n 0.627063\n 1.116105\n \n \n 1\n livebait\n diff\n (0, 1)\n 1\n 1\n 1.694646\n 1.252803\n 2.081207\n \n \n 2\n livebait\n diff\n (0, 1)\n 2\n 0\n 1.424598\n 1.078389\n 1.777154\n \n \n 3\n livebait\n diff\n (0, 1)\n 2\n 1\n 2.344439\n 1.872191\n 2.800661\n \n \n 4\n livebait\n diff\n (0, 1)\n 3\n 0\n 2.429459\n 1.871578\n 2.964242\n \n \n 5\n livebait\n diff\n (0, 1)\n 3\n 1\n 4.443540\n 3.747840\n 5.170052\n \n \n 6\n livebait\n diff\n (0, 1)\n 4\n 0\n 3.541921\n 2.686445\n 4.391176\n \n \n 7\n livebait\n diff\n (0, 1)\n 4\n 1\n 10.739204\n 9.024702\n 12.432764\n \n \n\n\n\n\nIt is still possible to use plot_comparisons when passing an argument to average_by. In the plot below, the empirical distribution is used to compute unit level contrasts for livebait and then averaged over persons to obtain the average predictive comparisons. The plot below is similar to the second plot in this notebook. The differences being that: (1) a pairwise transition grid is defined for the second plot above, whereas the empirical distribution is used in the plot below, and (2) in the plot below, we marginalized over the other covariates in the model (thus the reason for not having a camper or child group and panel, and a reduction in the uncertainty interval).\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=fish_model,\n idata=fish_idata,\n contrast=\"livebait\",\n conditional=None,\n average_by=\"persons\"\n)\nfig.set_size_inches(7, 3)\n\n\n\n\n\n\n\n\n\nTo showcase an additional functionality of comparisons and plot_comparisons, we fit a logistic regression model to the titanic dataset with interaction terms to model the probability of survival. The titanic dataset gives the values of four categorical attributes for each of the 2201 people on board the Titanic when it struck an iceberg and sank. 
The attributes are social class (first class, second class, third class, crewmember), age, sex (0 = female, 1 = male), and whether or not the person survived (0 = deceased, 1 = survived).\n\ndat = pd.read_csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/Stat2Data/Titanic.csv\", index_col=0)\n\ndat[\"PClass\"] = dat[\"PClass\"].str.replace(\"[st, nd, rd]\", \"\", regex=True)\ndat[\"PClass\"] = dat[\"PClass\"].str.replace(\"*\", \"0\").astype(int)\ndat[\"PClass\"] = dat[\"PClass\"].replace(0, np.nan)\ndat[\"PClass\"] = pd.Categorical(dat[\"PClass\"], ordered=True)\ndat[\"SexCode\"] = pd.Categorical(dat[\"SexCode\"], ordered=True)\n\ndat = dat.dropna(axis=0, how=\"any\")\n\n\ntitanic_model = bmb.Model(\n \"Survived ~ PClass * SexCode * Age\", \n data=dat, \n family=\"bernoulli\"\n)\ntitanic_idata = titanic_model.fit(\n draws=500, \n tune=500, \n target_accept=0.95, \n random_seed=1234\n)\n\nModeling the probability that Survived==1\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [Intercept, PClass, SexCode, PClass:SexCode, Age, PClass:Age, SexCode:Age, PClass:SexCode:Age]\n\n\n\n\n\n\n\n\n\n\n\nSampling 4 chains for 500 tune and 500 draw iterations (2_000 + 2_000 draws total) took 127 seconds.\n\n\n\n\ncomparisons and plot_comparisons also allow you to specify the type of comparison to be computed. By default, a difference is used. However, it is also possible to take the ratio where comparisons would then become average predictive ratios. To achieve this, pass \"ratio\" into the argument comparison_type. Using different comparison types offers a way to produce alternative insights; especially when there are interaction terms as the value of one covariate depends on the value of the other covariate.\n\nfig, ax = bmb.interpret.plot_comparisons(\n model=titanic_model,\n idata=titanic_idata,\n contrast={\"PClass\": [1, 3]},\n conditional=[\"Age\", \"SexCode\"],\n comparison_type=\"ratio\",\n subplot_kwargs={\"main\": \"Age\", \"group\": \"SexCode\", \"panel\": \"SexCode\"},\n fig_kwargs={\"figsize\":(12, 3), \"sharey\": True},\n legend=False\n\n)\n\n\n\n\nThe left panel shows that the ratio of the probability of survival comparing PClass \\(3\\) to \\(1\\) conditional on Age is non-constant. Whereas the right panel shows an approximately constant ratio in the probability of survival comparing PClass \\(3\\) to \\(1\\) conditional on Age.\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Thu Aug 15 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\nbambi : 0.14.1.dev12+g64e57423.d20240730\npandas: 2.2.2\nnumpy : 1.26.4\narviz : 0.18.0\n\nWatermark: 2.4.3" }, { "objectID": "notebooks/plot_predictions.html", "href": "notebooks/plot_predictions.html", "title": "Bambi", "section": "", - "text": "This notebook shows how to use, and the capabilities, of the plot_predictions function. The plot_predictions function is a part of Bambi’s sub-package interpret that features a set of tools used to interpret complex regression models that is inspired by the R package marginaleffects.\n\n\nThe purpose of the generalized linear model (GLM) is to unify the approaches needed to analyze data for which either: (1) the assumption of a linear relation between \\(x\\) and \\(y\\), or (2) the assumption of normal variation is not appropriate. GLMs are typically specified in three stages: 1. 
the linear predictor \\(\\eta = X\\beta\\) where \\(X\\) is an \\(n\\) x \\(p\\) matrix of explanatory variables. 2. the link function \\(g(\\cdot)\\) that relates the linear predictor to the mean of the outcome variable \\(\\mu = g^{-1}(\\eta) = g^{-1}(X\\beta)\\) 3. the random component specifying the distribution of the outcome variable \\(y\\) with mean \\(\\mathbb{E}(y|X) = \\mu\\).\nBased on these three specifications, the mean of the distribution of \\(y\\), given \\(X\\), is determined by \\(X\\beta: \\mathbb{E}(y|X) = g^{-1}(X\\beta)\\).\nGLMs are a broad family of models where the output \\(y\\) is typically assumed to follow an exponential family distribution, e.g., Binomial, Poisson, Gamma, Exponential, and Normal. The job of the link function is to map the linear space of the model \\(X\\beta\\) onto the non-linear space of a parameter like \\(\\mu\\). Commonly used link function are the logit and log link. Also known as the canonical link functions. This brief introduction to GLMs is not meant to be exhuastive, and another good starting point is the Bambi Basic Building Blocks example.\nDue to the link function, there are typically three quantities of interest to interpret in a GLM: 1. the linear predictor \\(\\eta\\) 2. the mean \\(\\mu = g^{-1}(\\eta)\\) 3. the response variable \\(Y \\sim \\mathcal{D}(\\mu, \\theta)\\) where \\(\\mu\\) is the mean parameter and \\(\\theta\\) is (possibly) a vector that contains all the other “nuissance” parameters of the distribution.\nAs modelers, we are usually more interested in interpreting (2) and (3). However, \\(\\mu\\) is not always on the same scale of the response variable and can be more difficult to interpret. Rather, the response scale is a more interpretable scale. Additionally, it is often the case that modelers would like to analyze how a model parameter varies across a range of explanatory variable values. To achieve such an analysis, Bambi has taken inspiration from the R package marginaleffects, and implemented a plot_predictions function that plots the conditional adjusted predictions to aid in the interpretation of GLMs. Below, it is briefly discussed what are conditionally adjusted predictions, how they are computed, and ultimately how to use the plot_predictions function.\n\n\n\nAdjusted predictions refers to the outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or some user specified grid of values. The specification of the scale to make the predictions, the link or response scale, refers to the scale used to estimate the model. In normal linear regression, the link scale and the response scale are identical, and therefore, the adjusted prediction is expressed as the mean value of the response variable at the given values of the predictor variables. On the other hand, a logistic regression’s link and response scale are not identical. An adjusted prediction on the link scale will be represented as the log-odds of a successful response given values of the predictor variables. Whereas an adjusted prediction on the response scale gives the probability that the response variable equals 1. 
The conditional part of conditionally adjusted predictions represents the specific predictor(s) and its values we would like to condition on when plotting predictions.\n\n\nThe objective of plotting conditional adjusted predictions is to visualize how a parameter of the (conditional) response distribution varies as a function of (some) explanatory variables. In predictions, there are three scenarios to compute conditional adjusted predictions:\n\nuser provided values\na grid of equally spaced and central values\nempirical distribution (original data used to fit the model)\n\nIn the case of (1) above, a dictionary is passed with the explanatory variables as keys, and the values to condition on are the values. With this dictionary, Bambi assembles all pairwise combinations (transitions) of the specified explanatory variables into a new “hypothetical” dataset. Covariates not existient in the dictionary are held at their mean or mode.\nIn (2), a string or list is passed with the name(s) of the explanatory variable(s) to create a grid of equally spaced values. This is done by holding all other explanatory variables constant at some specified value, a reference grid, that may or may not correspond to actual observations in the dataset used to fit the model. By default, the plot_predictions function uses a grid of 200 equally spaced values between the minimum and maximum values of the specified explanatory variable as the reference grid.\nLastly, in (3), the original data used to fit the model is used to compute predictions. This is known as unit level predictions.\nUsing the data, from scenario 1, 2, or 3, the plot_predictions function uses the fitted model to then compute the predictions. The plot_predictions function then uses these predictions to plot the model parameter as a function of (some) explanatory variable.\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\nimport warnings\n\nimport bambi as bmb\n\nwarnings.simplefilter(action=\"ignore\", category=FutureWarning)\n\n\n\n\n\nFor the first demonstration, we will use a Gaussian linear regression model with the mtcars dataset to better understand the plot_predictions function and its arguments. The mtcars dataset was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). 
The following is a brief description of the variables in the dataset:\n\nmpg: Miles/(US) gallon\ncyl: Number of cylinders\ndisp: Displacement (cu.in.)\nhp: Gross horsepower\ndrat: Rear axle ratio\nwt: Weight (1000 lbs)\nqsec: 1/4 mile time\nvs: Engine (0 = V-shaped, 1 = straight)\nam: Transmission (0 = automatic, 1 = manual)\ngear: Number of forward gear\n\n\n# Load data\ndata = bmb.load_data('mtcars')\ndata[\"cyl\"] = data[\"cyl\"].replace({4: \"low\", 6: \"medium\", 8: \"high\"})\ndata[\"gear\"] = data[\"gear\"].replace({3: \"A\", 4: \"B\", 5: \"C\"})\ndata[\"cyl\"] = pd.Categorical(data[\"cyl\"], categories=[\"low\", \"medium\", \"high\"], ordered=True)\n\n# Define and fit the Bambi model\nmodel = bmb.Model(\"mpg ~ 0 + hp * wt + cyl + gear\", data)\nidata = model.fit(draws=1000, target_accept=0.95, random_seed=1234)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [sigma, hp, wt, hp:wt, cyl, gear]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 41 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\nThe rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n\n\nWe can print the Bambi model object to obtain the model components. Below, we see that the Gaussian linear model uses an identity link function that results in no transformation of the linear predictor to the mean of the outcome variable, and the distrbution of the likelihood is Gaussian.\n\n\nNow that we have fitted the model, we can visualize how a model parameter varies as a function of (some) interpolated covariate. For this example, we will visualize how the mean response mpg varies as a function of the covariate hp.\nThe Bambi model, ArviZ inference data object (containing the posterior samples and the data used to fit the model), and a list or dictionary of covariates, in this example only hp, are passed to the plot_predictions function. The plot_predictions function then computes the conditional adjusted predictions for each covariate in the list or dictionary using the method described above. The plot_predictions function returns a matplotlib figure object that can be further customized.\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"hp\", ax=ax);\n\nDefault computed for conditional variable: hp\nDefault computed for unspecified variable: cyl, gear, wt\n\n\n\n\n\nBefore we talk about the plot, you will notice that some messages have been logged to the console. By default interpret is verbose and logs a message to the console if a default value is computed for covariates in conditional. This is useful because unless the documentation is read, it can be difficult to tell which covariates are having default values computed for. Thus, Bambi has a config file bmb.config[\"INTERPRET_VERBOSE\"] where we can specify whether or not to log messages. By default, this is set to true. To turn off logging, set bmb.config[\"INTERPRET_VERBOSE\"] = False. From here on, we will turn off logging.\nThe plot above shows that as hp increases, the mean mpg decreases. 
As stated above, this insight was obtained by creating the reference grid and then using the fitted model to compute the predicted values of the model parameter, in this example mpg, at each value of the reference grid.\nBy default, plot_predictions uses the highest density interval (HDI) of the posterior distribution to compute the credible interval of the conditional adjusted predictions. The HDI is a Bayesian analog to the frequentist confidence interval. The HDI is the shortest interval that contains a specified probability of the posterior distribution. By default, plot_predictions uses the 94% HDI.\nplot_predictions uses the posterior distribution by default to visualize some mean outcome parameter . However, the posterior predictive distribution can also be plotted by specifying pps=True where pps stands for posterior predictive samples of the response variable.\n\nbmb.config[\"INTERPRET_VERBOSE\"] = False\n\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"hp\", pps=True, ax=ax);\n\n\n\n\nHere, we notice that the uncertainty in the conditional adjusted predictions is much larger than the uncertainty when pps=False. This is because the posterior predictive distribution accounts for the uncertainty in the model parameters and the uncertainty in the data. Whereas, the posterior distribution only accounts for the uncertainty in the model parameters.\nAdditionally, predictions can be called to obtain a summary dataframe of the data, mean predictions (estimate), and uncertainty interval. The covariate columns in this dataframe is used to create the plot.\n\nsummary_df = bmb.interpret.predictions(model, idata, \"hp\", pps=True)\nsummary_df.head(10)\n\n\n\n\n\n \n \n \n hp\n cyl\n gear\n wt\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n 52\n high\n A\n 3.21725\n 21.920836\n 15.675414\n 28.096253\n \n \n 1\n 57\n high\n A\n 3.21725\n 21.864299\n 16.083605\n 28.164090\n \n \n 2\n 63\n high\n A\n 3.21725\n 21.556234\n 15.924024\n 27.673221\n \n \n 3\n 69\n high\n A\n 3.21725\n 21.255795\n 15.181309\n 27.002164\n \n \n 4\n 75\n high\n A\n 3.21725\n 21.091406\n 14.821219\n 26.502849\n \n \n 5\n 80\n high\n A\n 3.21725\n 20.955432\n 15.520032\n 26.290511\n \n \n 6\n 86\n high\n A\n 3.21725\n 20.667079\n 15.119132\n 26.286052\n \n \n 7\n 92\n high\n A\n 3.21725\n 20.459366\n 14.927590\n 25.803844\n \n \n 8\n 98\n high\n A\n 3.21725\n 20.282865\n 14.978005\n 25.363441\n \n \n 9\n 103\n high\n A\n 3.21725\n 20.032851\n 14.842361\n 25.499042\n \n \n\n\n\n\nplot_predictions allows up to three covariates to be plotted simultaneously where the first element in the list represents the main (x-axis) covariate, the second element the group (hue / color), and the third element the facet (panel). However, when plotting more than one covariate, it can be useful to pass specific group and panel arguments to aid in the interpretation of the plot. Therefore, subplot_kwargs allows the user to manipulate the plotting by passing a dictionary where the keys are {\"main\": ..., \"group\": ..., \"panel\": ...} and the values are the names of the covariates to be plotted. 
For example, passing two covariates hp and wt and specifying subplot_kwargs={\"main\": \"hp\", \"group\": \"wt\", \"panel\": \"wt\"}.\n\nbmb.interpret.plot_predictions(\n model=model, \n idata=idata, \n conditional={\"hp\": np.linspace(50, 350, 50), \"wt\": np.linspace(1, 6, 5)},\n legend=False,\n subplot_kwargs={\"main\": \"hp\", \"group\": \"wt\", \"panel\": \"wt\"},\n fig_kwargs={\"figsize\": (20, 8), \"sharey\": True}\n)\nplt.tight_layout();\n\n\n\n\nFurthermore, categorical covariates can also be plotted. We plot the the mean mpg as a function of the two categorical covariates gear and cyl below. The plot_predictions function automatically plots the conditional adjusted predictions for each level of the categorical covariate. Furthermore, when passing a list of covariates into the plot_predictions function, the list will be converted into a dictionary object where the key is taken from (“horizontal”, “color”, “panel”) and the values are the names of the variables. By default, the first element of the list is specified as the “horizontal” covariate, the second element of the list is specified as the “color” covariate, and the third element of the list is mapped to different plot panels.\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, [\"gear\", \"cyl\"], ax=ax);\n\n\n\n\n\n\n\nIn the previous example, default values were computed to construct a reference grid to compute the conditional adjusted predictions. We can also pass our own values for the covariates into conditional using a dictionary where the key-value pairs are the covariate and value(s) of interest. For example, if we wanted to compute the predictions for hp=100, wt=[1.5, 3.5], and cyl=[\"low\", \"medium\", \"high\"] we would pass the following dictionary in the code block below. As can be seen, several data types can be passed such as: np.ndarray, list, int, float, and str.\nFurthermore, Bambi by default, maps the order of the dict keys to the main, group, and panel of the matplotlib figure. 
Below, since hp is the first key, this is used for the x-axis, wt for the group (color), and cyl for the panel (facet).\n\nbmb.interpret.plot_predictions(\n model,\n idata,\n conditional={\n \"hp\": [100, 120],\n \"wt\": np.array([1.5, 3.5]),\n \"cyl\": [\"low\", \"medium\", \"high\"]\n },\n fig_kwargs={\"figsize\": (10, 4), \"sharey\": True}\n);\n\n\n\n\nBefore the plot is described, lets see how the dictionary passed to conditional was used to create the dataset in order to compute predictions.\n\nsummary_df = bmb.interpret.predictions(\n model,\n idata,\n conditional={\n \"hp\": [100, 120],\n \"wt\": np.array([1.5, 3.5]),\n \"cyl\": [\"low\", \"medium\", \"high\"]\n },\n)\nsummary_df\n\n\n\n\n\n \n \n \n hp\n wt\n cyl\n gear\n estimate\n lower_3.0%\n upper_97.0%\n \n \n \n \n 0\n 100\n 1.5\n high\n A\n 26.517721\n 22.231329\n 31.409020\n \n \n 1\n 100\n 1.5\n low\n A\n 27.154631\n 23.710370\n 30.854390\n \n \n 2\n 100\n 1.5\n medium\n A\n 25.388964\n 20.858360\n 29.439134\n \n \n 3\n 100\n 3.5\n high\n A\n 19.125688\n 16.198962\n 22.480499\n \n \n 4\n 100\n 3.5\n low\n A\n 19.762599\n 16.651143\n 23.520939\n \n \n 5\n 100\n 3.5\n medium\n A\n 17.996931\n 15.646609\n 20.272951\n \n \n 6\n 120\n 1.5\n high\n A\n 25.242986\n 21.389207\n 29.835175\n \n \n 7\n 120\n 1.5\n low\n A\n 25.879897\n 21.993742\n 29.826717\n \n \n 8\n 120\n 1.5\n medium\n A\n 24.114229\n 19.499411\n 27.954867\n \n \n 9\n 120\n 3.5\n high\n A\n 18.499053\n 15.805280\n 21.040859\n \n \n 10\n 120\n 3.5\n low\n A\n 19.135964\n 15.476760\n 22.781403\n \n \n 11\n 120\n 3.5\n medium\n A\n 17.370296\n 14.906599\n 19.609224\n \n \n\n\n\n\nWhen a dictionary is passed, that informs Bambi that the user wants to compute predictions on user provided values. Thus, a pairwise grid is constructed using the dictionary values. Otherwise, a dataframe of unequal array lengths cannot be constructed. Furthermore, since gear was not passed as a key, but was a term in the model, the default value of A was computed for it.\nGiven we now know that a pairwise grid was computed usiong the conditional dict, One interpretation of the plot above is that across all cylinder groups, a larger wt results in a lower mean mpg.\n\n\n\nIn the previous example, user provided values were computed to construct a pairwise grid to compute the conditional adjusted predictions. It is also possible to compute predictions using the observed (empirical) data used to fit the model and then average over a specific or set of covariates to obtain average adjusted predictions. This is known as unit level predictions. To compute unit level predictions, do not pass any values to the conditional arg. 
or explicitly pass None (the default).\n\nsummary_df = bmb.interpret.predictions(\n    model,\n    idata,\n    conditional=None\n)\nsummary_df.head()\n\n   cyl  gear  hp  wt  estimate  lower_3.0%  upper_97.0%\n0  medium  B  110  2.620  22.230773  20.057140  24.322536\n1  medium  B  110  2.875  21.329605  19.371961  23.354494\n2  low  B  93  2.320  25.914300  24.439324  27.767257\n3  medium  A  110  3.215  18.690801  16.457099  21.072167\n4  high  A  175  3.440  16.924656  15.260324  18.603531\n\n\n# data used to fit the model\nmodel.data[[\"cyl\", \"gear\", \"hp\", \"wt\"]].head()\n\n   cyl  gear  hp  wt\n0  medium  B  110  2.620\n1  medium  B  110  2.875\n2  low  B  93  2.320\n3  medium  A  110  3.215\n4  high  A  175  3.440\n\nNotice how the data in the summary dataframe and the model data are the same.\n\n\nSince the empirical distribution is used for computing predictions, the same number of rows (\\(32\\)) is returned as the data used to fit the model. To average over a covariate, use the average_by argument. If True is passed, then predictions averages over all covariates and a single estimate is returned. Otherwise, if a single covariate or a list of covariates is passed, then predictions averages by the covariates passed.\n\nsummary_df = bmb.interpret.predictions(\n    model,\n    idata,\n    conditional=None,\n    average_by=True\n)\nsummary_df\n\n   estimate  lower_3.0%  upper_97.0%\n0  20.072681  17.957895  22.215034\n\n\nIt is still possible to plot predictions when computing unit level predictions. However, a covariate(s) must now be passed to average_by to obtain average adjusted predictions by group. In the plot below, we obtain average predictions grouped by gear and cyl.\n\nbmb.interpret.plot_predictions(\n    model,\n    idata,\n    conditional=None,\n    average_by=[\"gear\", \"cyl\"],\n    fig_kwargs={\"figsize\": (7, 3)},\n);\n\n\nLet's move on to a model that uses a distribution that is a member of the exponential family and utilizes a link function. For this, we will implement the Negative binomial model from the students absences example. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include the type of program in which the student is enrolled and a standardized test in math. We have attendance data on 314 high school juniors. The variables of interest in the dataset are the following:\n\ndaysabs: The number of days of absence. It is our response variable.\nprog: The type of program.
Can be one of ‘General’, ‘Academic’, or ‘Vocational’.\nmath: Score in a standardized math test.\n\n# Load data, define and fit Bambi model\ndata = pd.read_stata(\"https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta\")\ndata[\"prog\"] = data[\"prog\"].map({1: \"General\", 2: \"Academic\", 3: \"Vocational\"})\n\nmodel_interaction = bmb.Model(\n    \"daysabs ~ 0 + prog + scale(math) + prog:scale(math)\",\n    data,\n    family=\"negativebinomial\"\n)\nidata_interaction = model_interaction.fit(\n    draws=1000, target_accept=0.95, random_seed=1234, chains=4\n)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 2 jobs)\nNUTS: [alpha, prog, scale(math), prog:scale(math)]\n\nSampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 10 seconds.\n\n\nThis model utilizes a log link function and a negative binomial distribution for the likelihood. Also note that this model contains an interaction, prog:scale(math).\n\nmodel_interaction\n\n       Formula: daysabs ~ 0 + prog + scale(math) + prog:scale(math)\n        Family: negativebinomial\n          Link: mu = log\n  Observations: 314\n        Priors: \n    target = mu\n        Common-level effects\n            prog ~ Normal(mu: [0. 0. 0.], sigma: [5.0102 7.4983 5.2746])\n            scale(math) ~ Normal(mu: 0.0, sigma: 2.5)\n            prog:scale(math) ~ Normal(mu: [0. 0.], sigma: [6.1735 4.847 ])\n        \n        Auxiliary parameters\n            alpha ~ HalfCauchy(beta: 1.0)\n------\n* To see a plot of the priors call the .plot_priors() method.\n* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()\n\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(\n    model_interaction, \n    idata_interaction, \n    \"math\", \n    ax=ax, \n    pps=False\n);\n\n\nThe plot above shows that as math increases, the mean daysabs decreases. However, as the model contains an interaction term, the effect of math on daysabs depends on the value of prog. Therefore, we will use plot_predictions to plot the conditional adjusted predictions for each level of prog.\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(\n    model_interaction, \n    idata_interaction, \n    [\"math\", \"prog\"], \n    ax=ax, \n    pps=False\n);\n\n\nPassing specific subplot_kwargs can allow for a more interpretable plot, especially when the posterior predictive distribution plot results in overlapping credible intervals.\n\nbmb.interpret.plot_predictions(\n    model_interaction, \n    idata_interaction, \n    conditional=[\"math\", \"prog\"],\n    pps=True,\n    subplot_kwargs={\"main\": \"math\", \"group\": \"prog\", \"panel\": \"prog\"},\n    legend=False,\n    fig_kwargs={\"figsize\": (16, 5), \"sharey\": True}\n);\n\n\nTo further demonstrate the plot_predictions function, we will implement a logistic regression model. This example is taken from the marginaleffects plot_predictions documentation. The internet movie database, http://imdb.com/, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon. The movies in this dataset were selected for inclusion if they had a known length and had been rated by at least one imdb user. The dataset below contains 28,819 rows and 24 columns. The variables of interest in the dataset are the following:
title: Title of the movie.\nyear: Year of release.\nbudget: Total budget (if known) in US dollars.\nlength: Length in minutes.\nrating: Average IMDB user rating.\nvotes: Number of IMDB users who rated this movie.\nr1-10: Multiplying by ten gives the percentile (to nearest 10%) of users who rated this movie a 1.\nmpaa: MPAA rating.\naction, animation, comedy, drama, documentary, romance, short: Binary variables representing whether the movie was classified as belonging to that genre.\n\ndata = pd.read_csv(\"https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2movies/movies.csv\")\n\ndata[\"style\"] = \"Other\"\ndata.loc[data[\"Action\"] == 1, \"style\"] = \"Action\"\ndata.loc[data[\"Comedy\"] == 1, \"style\"] = \"Comedy\"\ndata.loc[data[\"Drama\"] == 1, \"style\"] = \"Drama\"\ndata[\"certified_fresh\"] = (data[\"rating\"] >= 8) * 1\ndata = data[data[\"length\"] < 240]\n\npriors = {\"style\": bmb.Prior(\"Normal\", mu=0, sigma=2)}\nmodel = bmb.Model(\"certified_fresh ~ 0 + length * style\", data=data, priors=priors, family=\"bernoulli\")\nidata = model.fit(random_seed=1234, target_accept=0.9, init=\"adapt_diag\")\n\nModeling the probability that certified_fresh==1\nAuto-assigning NUTS sampler...\nInitializing NUTS using adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [length, style, length:style]\n\nSampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 484 seconds.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\nThe logistic regression model uses a logit link function and a Bernoulli likelihood. Therefore, the link scale is the log-odds of a successful response and the response scale is the probability of a successful response.\n\nmodel\n\n       Formula: certified_fresh ~ 0 + length * style\n        Family: bernoulli\n          Link: p = logit\n  Observations: 58662\n        Priors: \n    target = p\n        Common-level effects\n            length ~ Normal(mu: 0.0, sigma: 0.0708)\n            style ~ Normal(mu: 0.0, sigma: 2.0)\n            length:style ~ Normal(mu: [0. 0. 0.], sigma: [0.0702 0.0509 0.0611])\n------\n* To see a plot of the priors call the .plot_priors() method.\n* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()\n\n\nAgain, by default, the plot_predictions function plots the mean outcome on the response scale. Therefore, the plot below shows the probability of a successful response certified_fresh as a function of length.\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"length\", ax=ax);\n\n\nAdditionally, we can see how the probability of certified_fresh varies as a function of categorical covariates.\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"style\", ax=ax);\n\n\nplot_predictions also has the argument target, which determines what parameter of the response distribution is plotted as a function of the explanatory variables. This argument is useful in distributional models, i.e., when the response distribution contains a parameter for location, scale, and/or shape. The default of this argument is mean, and passing a parameter into target only works when pps=False, because when pps=True the posterior predictive distribution is plotted and thus can only refer to the outcome variable (instead of any of the parameters of the response distribution).
For this example, we will simulate our own dataset.\n\nrng = np.random.default_rng(121195)\nN = 200\na, b = 0.5, 1.1\nx = rng.uniform(-1.5, 1.5, N)\nshape = np.exp(0.3 + x * 0.5 + rng.normal(scale=0.1, size=N))\ny = rng.gamma(shape, np.exp(a + b * x) / shape, N)\ndata_gamma = pd.DataFrame({\"x\": x, \"y\": y})\n\nformula = bmb.Formula(\"y ~ x\", \"alpha ~ x\")\nmodel = bmb.Model(formula, data_gamma, family=\"gamma\")\nidata = model.fit(random_seed=1234)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (2 chains in 2 jobs)\nNUTS: [Intercept, x, alpha_Intercept, alpha_x]\n\n\n\n\n\n\n\n\n\n\n\nSampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 6 seconds.\nThere were 1 divergences after tuning. Increase `target_accept` or reparameterize.\nWe recommend running at least 4 chains for robust computation of convergence diagnostics\n\n\n\nmodel\n\n Formula: y ~ x\n alpha ~ x\n Family: gamma\n Link: mu = inverse\n alpha = log\n Observations: 200\n Priors: \n target = mu\n Common-level effects\n Intercept ~ Normal(mu: 0.0, sigma: 2.5037)\n x ~ Normal(mu: 0.0, sigma: 2.8025)\n target = alpha\n Common-level effects\n alpha_Intercept ~ Normal(mu: 0.0, sigma: 1.0)\n alpha_x ~ Normal(mu: 0.0, sigma: 1.0)\n------\n* To see a plot of the priors call the .plot_priors() method.\n* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()\n\n\nThe model we defined uses a gamma distribution parameterized by alpha and mu where alpha utilizes a log link and mu goes through an inverse link. Therefore, we can plot either: (1) the mu of the response distribution (which is the default), or (2) alpha of the response distribution as a function of the explanatory variable \\(x\\).\n\n# First, the mean of the response (default)\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"x\", ax=ax);\n\n\n\n\nBelow, instead of plotting the default target, target=mean, we set target=alpha to visualize how the model parameter alpha varies as a function of the x predictor.\n\n# Second, another param. of the distribution: alpha\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"x\", target=\"alpha\", ax=ax);\n\n\n\n\n\n%load_ext watermark\n%watermark -n -u -v -iv -w\n\nLast updated: Sat May 25 2024\n\nPython implementation: CPython\nPython version : 3.11.9\nIPython version : 8.24.0\n\nnumpy : 1.26.4\nmatplotlib: 3.8.4\npandas : 2.2.2\nbambi : 0.13.1.dev37+g2a54df76.d20240525\n\nWatermark: 2.4.3" + "text": "This notebook shows how to use, and the capabilities, of the plot_predictions function. The plot_predictions function is a part of Bambi’s sub-package interpret that features a set of tools used to interpret complex regression models that is inspired by the R package marginaleffects.\n\n\nThe purpose of the generalized linear model (GLM) is to unify the approaches needed to analyze data for which either: (1) the assumption of a linear relation between \\(x\\) and \\(y\\), or (2) the assumption of normal variation is not appropriate. GLMs are typically specified in three stages: 1. the linear predictor \\(\\eta = X\\beta\\) where \\(X\\) is an \\(n\\) x \\(p\\) matrix of explanatory variables. 2. the link function \\(g(\\cdot)\\) that relates the linear predictor to the mean of the outcome variable \\(\\mu = g^{-1}(\\eta) = g^{-1}(X\\beta)\\) 3. 
the random component specifying the distribution of the outcome variable \\(y\\) with mean \\(\\mathbb{E}(y|X) = \\mu\\).\nBased on these three specifications, the mean of the distribution of \\(y\\), given \\(X\\), is determined by \\(X\\beta: \\mathbb{E}(y|X) = g^{-1}(X\\beta)\\).\nGLMs are a broad family of models where the output \\(y\\) is typically assumed to follow an exponential family distribution, e.g., Binomial, Poisson, Gamma, Exponential, and Normal. The job of the link function is to map the linear space of the model \\(X\\beta\\) onto the non-linear space of a parameter like \\(\\mu\\). Commonly used link functions are the logit and log links, also known as the canonical link functions. This brief introduction to GLMs is not meant to be exhaustive, and another good starting point is the Bambi Basic Building Blocks example.\nDue to the link function, there are typically three quantities of interest to interpret in a GLM: 1. the linear predictor \\(\\eta\\) 2. the mean \\(\\mu = g^{-1}(\\eta)\\) 3. the response variable \\(Y \\sim \\mathcal{D}(\\mu, \\theta)\\) where \\(\\mu\\) is the mean parameter and \\(\\theta\\) is (possibly) a vector that contains all the other “nuisance” parameters of the distribution.\nAs modelers, we are usually more interested in interpreting (2) and (3). However, \\(\\mu\\) is not always on the same scale as the response variable and can be more difficult to interpret; the response scale is usually the more interpretable one. Additionally, it is often the case that modelers would like to analyze how a model parameter varies across a range of explanatory variable values. To achieve such an analysis, Bambi has taken inspiration from the R package marginaleffects, and implemented a plot_predictions function that plots the conditional adjusted predictions to aid in the interpretation of GLMs. Below, we briefly discuss what conditionally adjusted predictions are, how they are computed, and ultimately how to use the plot_predictions function.\n\n\nAdjusted predictions refer to the outcome predicted by a fitted model on a specified scale for a given combination of values of the predictor variables, such as their observed values, their means, or some user-specified grid of values. The specification of the scale to make the predictions, the link or response scale, refers to the scale used to estimate the model. In normal linear regression, the link scale and the response scale are identical, and therefore, the adjusted prediction is expressed as the mean value of the response variable at the given values of the predictor variables. On the other hand, a logistic regression’s link and response scale are not identical. An adjusted prediction on the link scale is represented as the log-odds of a successful response given values of the predictor variables, whereas an adjusted prediction on the response scale gives the probability that the response variable equals 1. The conditional part of conditionally adjusted predictions represents the specific predictor(s) and its values we would like to condition on when plotting predictions.\n\n\nThe objective of plotting conditional adjusted predictions is to visualize how a parameter of the (conditional) response distribution varies as a function of (some) explanatory variables.
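To make these three quantities concrete, here is a minimal numpy/scipy sketch for a Bernoulli model with a logit link (the design matrix and coefficients below are made up purely for illustration):\n\nimport numpy as np\nfrom scipy.special import expit  # inverse of the logit link\n\nrng = np.random.default_rng(0)\nX = np.column_stack([np.ones(5), rng.normal(size=5)])  # intercept + one covariate\nbeta = np.array([-0.5, 1.2])  # hypothetical coefficients\n\neta = X @ beta           # 1. linear predictor\nmu = expit(eta)          # 2. mean: g^{-1}(eta)\ny = rng.binomial(1, mu)  # 3. response drawn from Bernoulli(mu)\n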
In predictions, there are three scenarios to compute conditional adjusted predictions:\n\nuser-provided values\na grid of equally spaced and central values\nempirical distribution (original data used to fit the model)\n\nIn the case of (1) above, a dictionary is passed with the explanatory variables as keys and the values to condition on as the dictionary values. With this dictionary, Bambi assembles all pairwise combinations (transitions) of the specified explanatory variables into a new “hypothetical” dataset. Covariates not present in the dictionary are held at their mean or mode.\nIn (2), a string or list is passed with the name(s) of the explanatory variable(s) to create a grid of equally spaced values. This is done by holding all other explanatory variables constant at some specified value, a reference grid, that may or may not correspond to actual observations in the dataset used to fit the model. By default, the plot_predictions function uses a grid of 200 equally spaced values between the minimum and maximum values of the specified explanatory variable as the reference grid.\nLastly, in (3), the original data used to fit the model is used to compute predictions. This is known as unit level predictions.\nUsing the data from scenario 1, 2, or 3, the plot_predictions function uses the fitted model to compute the predictions, and then uses these predictions to plot the model parameter as a function of (some) explanatory variable.\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\nimport warnings\n\nimport bambi as bmb\n\nwarnings.simplefilter(action=\"ignore\", category=FutureWarning)\n\n\nFor the first demonstration, we will use a Gaussian linear regression model with the mtcars dataset to better understand the plot_predictions function and its arguments. The mtcars dataset was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). The following is a brief description of the variables in the dataset:\n\nmpg: Miles/(US) gallon\ncyl: Number of cylinders\ndisp: Displacement (cu.in.)\nhp: Gross horsepower\ndrat: Rear axle ratio\nwt: Weight (1000 lbs)\nqsec: 1/4 mile time\nvs: Engine (0 = V-shaped, 1 = straight)\nam: Transmission (0 = automatic, 1 = manual)\ngear: Number of forward gears\n\n\n# Load data\ndata = bmb.load_data('mtcars')\ndata[\"cyl\"] = data[\"cyl\"].replace({4: \"low\", 6: \"medium\", 8: \"high\"})\ndata[\"gear\"] = data[\"gear\"].replace({3: \"A\", 4: \"B\", 5: \"C\"})\ndata[\"cyl\"] = pd.Categorical(data[\"cyl\"], categories=[\"low\", \"medium\", \"high\"], ordered=True)\n\n# Define and fit the Bambi model\nmodel = bmb.Model(\"mpg ~ 0 + hp * wt + cyl + gear\", data)\nidata = model.fit(draws=1000, target_accept=0.95, random_seed=1234)\n\nAuto-assigning NUTS sampler...\nInitializing NUTS using jitter+adapt_diag...\nMultiprocess sampling (4 chains in 4 jobs)\nNUTS: [sigma, hp, wt, hp:wt, cyl, gear]\n\nSampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 147 seconds.\n\n\nWe can print the Bambi model object to obtain the model components.
Below, we see that the Gaussian linear model uses an identity link function, which results in no transformation of the linear predictor to the mean of the outcome variable, and that the distribution of the likelihood is Gaussian.\n\n\nNow that we have fitted the model, we can visualize how a model parameter varies as a function of (some) interpolated covariate. For this example, we will visualize how the mean response mpg varies as a function of the covariate hp.\nThe Bambi model, ArviZ inference data object (containing the posterior samples and the data used to fit the model), and a list or dictionary of covariates, in this example only hp, are passed to the plot_predictions function. The plot_predictions function then computes the conditional adjusted predictions for each covariate in the list or dictionary using the method described above. The plot_predictions function returns a matplotlib figure object that can be further customized.\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"hp\", ax=ax);\n\nDefault computed for conditional variable: hp\nDefault computed for unspecified variable: cyl, gear, wt\n\n\nBefore we talk about the plot, you will notice that some messages have been logged to the console. By default interpret is verbose and logs a message to the console if a default value is computed for covariates in conditional. This is useful because, unless the documentation is read, it can be difficult to tell which covariates have default values computed for them. Thus, Bambi has a config option bmb.config[\"INTERPRET_VERBOSE\"] where we can specify whether or not to log messages. By default, this is set to True. To turn off logging, set bmb.config[\"INTERPRET_VERBOSE\"] = False. From here on, we will turn off logging.\nThe plot above shows that as hp increases, the mean mpg decreases. As stated above, this insight was obtained by creating the reference grid and then using the fitted model to compute the predicted values of the model parameter, in this example mpg, at each value of the reference grid.\nBy default, plot_predictions uses the highest density interval (HDI) of the posterior distribution to compute the credible interval of the conditional adjusted predictions. The HDI is a Bayesian analog to the frequentist confidence interval: it is the shortest interval that contains a specified probability of the posterior distribution. By default, plot_predictions uses the 94% HDI.\nplot_predictions uses the posterior distribution by default to visualize some mean outcome parameter. However, the posterior predictive distribution can also be plotted by specifying pps=True, where pps stands for posterior predictive samples of the response variable.\n\nbmb.config[\"INTERPRET_VERBOSE\"] = False\n\n\nfig, ax = plt.subplots(figsize=(7, 3), dpi=120)\nbmb.interpret.plot_predictions(model, idata, \"hp\", pps=True, ax=ax);\n\n\nHere, we notice that the uncertainty in the conditional adjusted predictions is much larger than the uncertainty when pps=False. This is because the posterior predictive distribution accounts for the uncertainty in the model parameters and the uncertainty in the data, whereas the posterior distribution only accounts for the uncertainty in the model parameters.\nAdditionally, predictions can be called to obtain a summary dataframe of the data, mean predictions (estimate), and uncertainty interval.
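The uncertainty interval in that summary is the 94% HDI discussed above. As a sketch of how such an interval could be computed directly from the posterior draws (this assumes model.predict(idata) has been run so the posterior holds the predicted parent parameter, here assumed to be named \"mu\"):\n\nimport arviz as az\n\nmodel.predict(idata)  # adds the predicted parent parameter to idata.posterior\nhdi = az.hdi(idata.posterior[\"mu\"], hdi_prob=0.94)  # shortest 94% interval per point\n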
Additionally, predictions can be called to obtain a summary dataframe of the data, the mean predictions (estimate), and the uncertainty interval. The covariate columns in this dataframe are used to create the plot.

summary_df = bmb.interpret.predictions(model, idata, "hp", pps=True)
summary_df.head(10)

    hp   cyl gear       wt   estimate  lower_3.0%  upper_97.0%
0   52  high    A  3.21725  21.753967   15.879149    28.467278
1   57  high    A  3.21725  21.626621   15.401400    28.073208
2   63  high    A  3.21725  21.380708   15.193230    27.758124
3   69  high    A  3.21725  21.220707   15.203868    27.537716
4   75  high    A  3.21725  21.016281   15.097675    27.076450
5   80  high    A  3.21725  20.824757   15.074355    26.626349
6   86  high    A  3.21725  20.597090   14.910748    26.088777
7   92  high    A  3.21725  20.307775   14.679274    25.904986
8   98  high    A  3.21725  20.240255   14.765620    25.693768
9  103  high    A  3.21725  19.970688   14.348661    25.422012

plot_predictions allows up to three covariates to be plotted simultaneously, where the first element in the list represents the main (x-axis) covariate, the second element the group (hue / color), and the third element the facet (panel). However, when plotting more than one covariate, it can be useful to pass specific group and panel arguments to aid in the interpretation of the plot. Therefore, subplot_kwargs allows the user to control the plotting by passing a dictionary where the keys are {"main": ..., "group": ..., "panel": ...} and the values are the names of the covariates to be plotted. For example, below we pass two covariates hp and wt and specify subplot_kwargs={"main": "hp", "group": "wt", "panel": "wt"}.

bmb.interpret.plot_predictions(
    model=model, 
    idata=idata, 
    conditional={"hp": np.linspace(50, 350, 50), "wt": np.linspace(1, 6, 5)},
    legend=False,
    subplot_kwargs={"main": "hp", "group": "wt", "panel": "wt"},
    fig_kwargs={"figsize": (20, 8), "sharey": True}
)
plt.tight_layout();

Furthermore, categorical covariates can also be plotted. Below, we plot the mean mpg as a function of the two categorical covariates gear and cyl. The plot_predictions function automatically plots the conditional adjusted predictions for each level of the categorical covariate. When a list of covariates is passed to plot_predictions, the list is converted into a dictionary whose keys are taken from ("horizontal", "color", "panel") and whose values are the names of the variables. By default, the first element of the list is specified as the "horizontal" covariate, the second element as the "color" covariate, and the third element is mapped to different plot panels.

fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(model, idata, ["gear", "cyl"], ax=ax);
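Since the list order drives this mapping, reordering the list changes which covariate lands on the x-axis. As a quick (hypothetical) variation on the call above, swapping the elements puts cyl on the horizontal axis and colors by gear:

# Swapping the list order: cyl becomes the x-axis covariate, gear the color
fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(model, idata, ["cyl", "gear"], ax=ax);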
In the previous example, default values were computed to construct a reference grid used to compute the conditional adjusted predictions. We can also pass our own values for the covariates into conditional using a dictionary where the key-value pairs are the covariates and the value(s) of interest. For example, if we wanted to compute the predictions for hp=[100, 120], wt=[1.5, 3.5], and cyl=["low", "medium", "high"], we would pass the dictionary shown in the code block below. As can be seen, several data types can be passed, such as np.ndarray, list, int, float, and str.

Furthermore, by default Bambi maps the order of the dict keys to the main, group, and panel of the matplotlib figure. Below, since hp is the first key, it is used for the x-axis, wt for the group (color), and cyl for the panel (facet).

bmb.interpret.plot_predictions(
    model,
    idata,
    conditional={
        "hp": [100, 120],
        "wt": np.array([1.5, 3.5]),
        "cyl": ["low", "medium", "high"]
    },
    fig_kwargs={"figsize": (10, 4), "sharey": True}
);

Before the plot is described, let's see how the dictionary passed to conditional was used to create the dataset on which predictions are computed.

summary_df = bmb.interpret.predictions(
    model,
    idata,
    conditional={
        "hp": [100, 120],
        "wt": np.array([1.5, 3.5]),
        "cyl": ["low", "medium", "high"]
    },
)
summary_df

     hp   wt     cyl gear   estimate  lower_3.0%  upper_97.0%
0   100  1.5    high    A  26.566766   21.969808    30.943151
1   100  1.5     low    A  27.329344   23.732834    31.023226
2   100  1.5  medium    A  25.571809   21.451557    29.660434
3   100  3.5    high    A  19.050509   15.782297    22.263920
4   100  3.5     low    A  19.813088   16.472862    23.414660
5   100  3.5  medium    A  18.055552   15.632819    20.474585
6   120  1.5    high    A  25.300803   21.208498    29.449365
7   120  1.5     low    A  26.063382   21.954223    29.802325
8   120  1.5  medium    A  24.305846   20.088821    28.257285
9   120  3.5    high    A  18.441151   15.569090    20.982979
10  120  3.5     low    A  19.203729   15.518468    22.845879
11  120  3.5  medium    A  17.446194   14.967520    19.927605

Passing a dictionary informs Bambi that the user wants to compute predictions on user provided values. Thus, a pairwise grid is constructed from the dictionary values; otherwise, a dataframe could not be constructed from arrays of unequal length. Furthermore, since gear was not passed as a key but is a term in the model, the default value A was computed for it.

Given we now know that a pairwise grid was computed using the conditional dict, one interpretation of the plot above is that across all cylinder groups, a larger wt results in a lower mean mpg.
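To make the grid construction concrete, here is a minimal sketch of the cross product that such a pairwise grid amounts to. This is illustrative helper code under stated assumptions, not Bambi's internal implementation:

import itertools

# Hypothetical illustration of the pairwise grid built from the conditional dict
values = {
    "hp": [100, 120],
    "wt": [1.5, 3.5],
    "cyl": ["low", "medium", "high"],
}
grid = pd.DataFrame(list(itertools.product(*values.values())), columns=list(values))
grid["gear"] = "A"  # unspecified covariates are held at their mean or mode (here "A")
print(len(grid))    # 2 * 2 * 3 = 12 rows, matching the summary dataframe above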
In the previous example, user provided values were used to construct a pairwise grid on which to compute the conditional adjusted predictions. It is also possible to compute predictions using the observed (empirical) data used to fit the model, and then average over a specific covariate or set of covariates to obtain average adjusted predictions. This is known as unit level predictions. To compute unit level predictions, do not pass any values to the conditional argument, or specify conditional=None (the default).

summary_df = bmb.interpret.predictions(
    model,
    idata,
    conditional=None
)
summary_df.head()

      cyl gear   hp     wt   estimate  lower_3.0%  upper_97.0%
0  medium    B  110  2.620  22.233743   20.121823    24.395500
1  medium    B  110  2.875  21.317279   19.250855    23.284851
2     low    B   93  2.320  25.916714   24.242191    27.574153
3  medium    A  110  3.215  18.775156   16.444976    21.276683
4    high    A  175  3.440  16.917035   15.219165    18.653843

# data used to fit the model
model.data[["cyl", "gear", "hp", "wt"]].head()

      cyl gear   hp     wt
0  medium    B  110  2.620
1  medium    B  110  2.875
2     low    B   93  2.320
3  medium    A  110  3.215
4    high    A  175  3.440

Notice how the covariate values in the summary dataframe and the model data are the same.

Since the empirical distribution is used for computing predictions, the same number of rows (32) is returned as in the data used to fit the model. To average over a covariate, use the average_by argument. If True is passed, then predictions averages over all covariates and a single estimate is returned. Otherwise, if a single covariate or a list of covariates is passed, then predictions averages by the covariates passed.

summary_df = bmb.interpret.predictions(
    model,
    idata,
    conditional=None,
    average_by=True
)
summary_df

   estimate  lower_3.0%  upper_97.0%
0  20.06417   17.895285    22.250253

It is still possible to plot predictions when computing unit level predictions. However, a covariate (or covariates) must now be passed to average_by to obtain average adjusted predictions by group. In the plot below, we obtain average predictions grouped by gear and cyl.

bmb.interpret.plot_predictions(
    model,
    idata,
    conditional=None,
    average_by=["gear", "cyl"],
    fig_kwargs={"figsize": (7, 3)},
);
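Roughly speaking, average_by groups the unit level predictions and averages them. A rough pandas approximation of the idea, for the point estimates only (Bambi averages the underlying posterior draws, so the interval bounds are not simply column means of the unit level summary):

# Approximate the grouped point estimates by averaging the unit-level estimates
unit_df = bmb.interpret.predictions(model, idata, conditional=None)
unit_df.groupby(["gear", "cyl"], observed=True)["estimate"].mean()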
Let's move on to a model that uses a distribution that is a member of the exponential family and utilizes a link function. For this, we will implement the negative binomial model from the students absences example. School administrators study the attendance behavior of high school juniors at two schools. Predictors of the number of days of absence include the type of program in which the student is enrolled and a standardized test in math. We have attendance data on 314 high school juniors. The variables of interest in the dataset are the following:

daysabs: The number of days of absence. It is our response variable.
prog: The type of program. Can be one of 'General', 'Academic', or 'Vocational'.
math: Score in a standardized math test.

# Load data, define and fit Bambi model
data = pd.read_stata("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
data["prog"] = data["prog"].map({1: "General", 2: "Academic", 3: "Vocational"})

model_interaction = bmb.Model(
    "daysabs ~ 0 + prog + scale(math) + prog:scale(math)",
    data,
    family="negativebinomial"
)
idata_interaction = model_interaction.fit(
    draws=1000, target_accept=0.95, random_seed=1234, chains=4
)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [alpha, prog, scale(math), prog:scale(math)]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 15 seconds.

This model utilizes a log link function and a negative binomial distribution for the likelihood. Also note that the model contains an interaction, prog:scale(math).

model_interaction

       Formula: daysabs ~ 0 + prog + scale(math) + prog:scale(math)
        Family: negativebinomial
          Link: mu = log
  Observations: 314
        Priors: 
    target = mu
        Common-level effects
            prog ~ Normal(mu: [0. 0. 0.], sigma: [5.0102 7.4983 5.2746])
            scale(math) ~ Normal(mu: 0.0, sigma: 2.5)
            prog:scale(math) ~ Normal(mu: [0. 0.], sigma: [6.1735 4.847 ])
        
        Auxiliary parameters
            alpha ~ HalfCauchy(beta: 1.0)
------
* To see a plot of the priors call the .plot_priors() method.
* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()

fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(
    model_interaction, 
    idata_interaction, 
    "math", 
    ax=ax, 
    pps=False
);

The plot above shows that as math increases, the mean daysabs decreases. However, as the model contains an interaction term, the effect of math on daysabs depends on the value of prog. Therefore, we will use plot_predictions to plot the conditional adjusted predictions for each level of prog.

fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(
    model_interaction, 
    idata_interaction, 
    ["math", "prog"], 
    ax=ax, 
    pps=False
);

Passing specific subplot_kwargs can allow for a more interpretable plot, especially when the posterior predictive distribution plot results in overlapping credible intervals.

bmb.interpret.plot_predictions(
    model_interaction, 
    idata_interaction, 
    conditional=["math", "prog"],
    pps=True,
    subplot_kwargs={"main": "math", "group": "prog", "panel": "prog"},
    legend=False,
    fig_kwargs={"figsize": (16, 5), "sharey": True}
);
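As a quick numeric check on the interaction, the posterior of the interaction coefficients can be summarized with ArviZ (installed alongside Bambi); this sketch is not part of the original workflow:

import arviz as az

# Summarize the interaction coefficients; posterior mass away from zero
# suggests the slope of scale(math) differs across program types.
az.summary(idata_interaction, var_names=["prog:scale(math)"])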
To further demonstrate the plot_predictions function, we will implement a logistic regression model. This example is taken from the marginaleffects plot_predictions documentation. The internet movie database, http://imdb.com/, is a website devoted to collecting movie data supplied by studios and fans. It claims to be the biggest movie database on the web and is run by Amazon. The movies in this dataset were selected for inclusion if they had a known length and had been rated by at least one IMDB user. The dataset below contains 28,819 rows and 24 columns. The variables of interest in the dataset are the following:

title: Title of the movie.
year: Year of release.
budget: Total budget (if known) in US dollars.
length: Length in minutes.
rating: Average IMDB user rating.
votes: Number of IMDB users who rated this movie.
r1-10: Multiplying by ten gives percentile (to nearest 10%) of users who rated this movie a 1.
mpaa: MPAA rating.
action, animation, comedy, drama, documentary, romance, short: Binary variables representing if the movie was classified as belonging to that genre.

data = pd.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/ggplot2movies/movies.csv")

data["style"] = "Other"
data.loc[data["Action"] == 1, "style"] = "Action"
data.loc[data["Comedy"] == 1, "style"] = "Comedy"
data.loc[data["Drama"] == 1, "style"] = "Drama"
data["certified_fresh"] = (data["rating"] >= 8) * 1
data = data[data["length"] < 240]

priors = {"style": bmb.Prior("Normal", mu=0, sigma=2)}
model = bmb.Model("certified_fresh ~ 0 + length * style", data=data, priors=priors, family="bernoulli")
idata = model.fit(random_seed=1234, target_accept=0.9, init="adapt_diag")

Modeling the probability that certified_fresh==1
Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [length, style, length:style]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 1686 seconds.

The logistic regression model uses a logit link function and a Bernoulli likelihood. Therefore, the link scale is the log-odds of a successful response and the response scale is the probability of a successful response.

model

       Formula: certified_fresh ~ 0 + length * style
        Family: bernoulli
          Link: p = logit
  Observations: 58662
        Priors: 
    target = p
        Common-level effects
            length ~ Normal(mu: 0.0, sigma: 0.0283)
            style ~ Normal(mu: 0.0, sigma: 2.0)
            length:style ~ Normal(mu: [0. 0. 0.], sigma: [0.0263 0.0263 0.0263])
------
* To see a plot of the priors call the .plot_priors() method.
* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()

Again, by default, the plot_predictions function plots the mean outcome on the response scale. Therefore, the plot below shows the probability of a successful response certified_fresh as a function of length.

fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(model, idata, "length", ax=ax);

Additionally, we can see how the probability of certified_fresh varies as a function of categorical covariates.

fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(model, idata, "style", ax=ax);
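To make the link/response distinction concrete, the inverse of the logit link maps log-odds back to probabilities. A quick sketch using scipy (a dependency of PyMC, so available wherever Bambi runs):

from scipy.special import expit  # inverse logit: log-odds -> probability

expit(0.0)   # a log-odds of 0 corresponds to a probability of 0.5
expit(-2.0)  # negative log-odds give probabilities below 0.5 (about 0.12 here)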
plot_predictions also has the argument target, which determines what parameter of the response distribution is plotted as a function of the explanatory variables. This argument is useful in distributional models, i.e., when the response distribution contains a parameter for location, scale, and/or shape. The default of this argument is mean. Passing a parameter into target only works when pps=False, because when pps=True the posterior predictive distribution is plotted, which can only refer to the outcome variable (instead of any of the parameters of the response distribution). For this example, we will simulate our own dataset.

rng = np.random.default_rng(121195)
N = 200
a, b = 0.5, 1.1
x = rng.uniform(-1.5, 1.5, N)
shape = np.exp(0.3 + x * 0.5 + rng.normal(scale=0.1, size=N))
y = rng.gamma(shape, np.exp(a + b * x) / shape, N)
data_gamma = pd.DataFrame({"x": x, "y": y})

formula = bmb.Formula("y ~ x", "alpha ~ x")
model = bmb.Model(formula, data_gamma, family="gamma")
idata = model.fit(random_seed=1234)

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [Intercept, x, alpha_Intercept, alpha_x]

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 18 seconds.
There were 8 divergences after tuning. Increase `target_accept` or reparameterize.

model

       Formula: y ~ x
                alpha ~ x
        Family: gamma
          Link: mu = inverse
                alpha = log
  Observations: 200
        Priors: 
    target = mu
        Common-level effects
            Intercept ~ Normal(mu: 0.0, sigma: 2.5037)
            x ~ Normal(mu: 0.0, sigma: 2.8025)
    target = alpha
        Common-level effects
            alpha_Intercept ~ Normal(mu: 0.0, sigma: 1.0)
            alpha_x ~ Normal(mu: 0.0, sigma: 1.0)
------
* To see a plot of the priors call the .plot_priors() method.
* To see a summary or plot of the posterior pass the object returned by .fit() to az.summary() or az.plot_trace()

The model we defined uses a gamma distribution parameterized by alpha and mu, where alpha utilizes a log link and mu goes through an inverse link. Therefore, we can plot either (1) the mu of the response distribution (the default), or (2) the alpha of the response distribution, as a function of the explanatory variable x.

# First, the mean of the response (default)
fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(model, idata, "x", ax=ax);

Below, instead of plotting the default target, target=mean, we set target=alpha to visualize how the model parameter alpha varies as a function of the x predictor.

# Second, another parameter of the distribution: alpha
fig, ax = plt.subplots(figsize=(7, 3), dpi=120)
bmb.interpret.plot_predictions(model, idata, "x", target="alpha", ax=ax);

%load_ext watermark
%watermark -n -u -v -iv -w

Last updated: Fri Aug 16 2024

Python implementation: CPython
Python version : 3.11.9
IPython version : 8.24.0

matplotlib: 3.8.4
bambi : 0.14.1.dev12+g64e57423.d20240730
numpy : 1.26.4
pandas : 2.2.2

Watermark: 2.4.3