-
Notifications
You must be signed in to change notification settings - Fork 0
/
lesson2-tasks.Rmd
70 lines (44 loc) · 3.82 KB
/
lesson2-tasks.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
---
title: "Lesson 2 - Tasks"
output: html_document
---
## Task 1: Prior predictive check
- Let us take a vague $N(0, 10)$ prior on the logarithm of the mean of a Poisson distribution. Run a prior predictive check, is there anything suspicious?
- Find a prior on the mean of Poisson distribution (or its logarithm - you pick) that matches this rough prior
knowledge:
- Zeroes are quite unlikely
- Most values should be between 2 and 15
- Values above 18 are quite unlikely
## Task 2: Fitting a Poisson
- Build a model that takes an array of integers
- Accept arguments for prior for the logarithm of the mean as data
- Simulate 20 values from a Poisson distribution with known mean. Do you recover the values?
## Task 3: Detecting overdispersion with a posterior predictive check
- Simulate 30 values with neg. binomial with $\phi = \frac{1}{2}$ (`size` in R's `rnbinom`), fit with the model from Task 2. Do you recover the simulated mean of the neg. binomial?
- Try to detect the model-data mismatch with a posterior predictive check.
R:
- Use `fit$draws(format = "draws_matrix")` to get a matrix of draws, then use `rpois`` to generate the predictions as the Stan model assumes
- use either `bayesplot::ppc_stat` with a stat measuring the dispersion (e.g. variance, sd, ...) or `bayesplot::ppc_dens_overlay`
Python: use either `arviz.plot_bpv()` with `kind="t_stat"` and a stat measuring the dispersion (e.g. variance, sd, ...) or `arviz.plot_ppc`
- What is the smallest number of observations we need to reliably detect the problem?
- Implement a negative binomial model and see how the check behaves now.
- Use [`neg_binomial_2_lpmf`](https://mc-stan.org/docs/functions-reference/unbounded_discrete_distributions.html#nbalt))
- Put an `exponential(1)` prior on the overdispersion parameter.
## Task 4: A bit more open-ended exploration
Given the following data:
```
y : 2, 0, 11, 25, 9, 4, 17, 11, 8, 4, 5, 2, 6, 4, 8, 24, 0, 3, 6, 4, 12, 9, 5, 6, 2, 10, 4, 15, 0, 1, 87, 19, 2, 1, 38, 16, 5, 7, 18, 11, 1, 0, 7, 15, 5, 0, 6, 1, 0, 6, 34, 29, 7, 11, 5, 10, 8, 3, 12, 18, 7, 1, 18, 18, 6, 3, 37, 5, 1, 0, 22, 13, 1, 0, 26, 19, 6, 7, 21, 45
group : "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B", "A", "B"
type : "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y", "X", "X", "Y", "Y"
```
Try to fit the neg. binomial model from task 3 with a shared mean and overdispersion parameter to the `y` column of the data.
The model does not represent the data well. Try to figure out what it is and fix it!
Hint: many PPCs can be performed per group, in R this is the `ppc_xxx_grouped` functions.
## Task 5: Open-ended exploration 2
Given the following data:
```
y : 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1
group : "B", "C", "D", "B", "C", "B", "D", "C", "A", "D", "C", "C", "C", "D", "C", "C", "B", "B", "D", "C", "D", "C", "C", "A", "B", "D", "C", "D", "D", "D", "C", "D", "A", "A", "C", "D", "D", "C", "C", "B", "B", "C", "D", "D", "A", "D", "A", "B", "C", "A"
```
Once again use the model from Task 3. Use posterior predictive check to determine what is wrong.
Can you fix it?