
Roadmap for Hierarchical Models #21

Open
1 of 9 tasks
storopoli opened this issue Jan 2, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@storopoli
Member

storopoli commented Jan 2, 2022

  • Group-level effects (Hierarchical Models):
    • Varying-Slope: `(x_1 | group)`
    • Varying-Intercept-Slope: `(1 + x_1 | group)`
    • Correlated Group-level effects
  • Case Studies showcasing:
    • random effects: random-slope, random-intercept-slope
@storopoli storopoli added the enhancement New feature or request label Jan 2, 2022
@storopoli storopoli added this to the 0.2.0 milestone Jan 2, 2022
@storopoli storopoli self-assigned this Jan 2, 2022
@storopoli storopoli changed the title Roadmap 0.2.0 Roadmap 2.0.0 Feb 11, 2022
@storopoli storopoli removed their assignment Feb 11, 2022
@storopoli storopoli changed the title Roadmap 2.0.0 Roadmap Mar 19, 2022
@storopoli storopoli removed this from the 0.2.0 milestone Mar 19, 2022
@storopoli storopoli changed the title Roadmap Roadmap for Hierarchical Models Sep 5, 2022
@jfhawkin

jfhawkin commented Apr 5, 2023

While trying to improve a Turing hierarchical-intercept logistic model by reviewing `turing_model.jl`, I noticed that the function `_model(μ_X, σ_X, prior, intercept_ranef, idx, ::Type{Bernoulli})` includes a normalization of the dependent variable, which here is 0/1. It gives me an error because `mad(y)` in my case is 0, which breaks the hyperparameter $\tau$ for the standard deviation. I thought I'd bring it to the developers' attention.

`mad_y = mad(y; normalize=true)` (ln 266)
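To illustrate the failure mode, here is a minimal sketch using only the Statistics stdlib. `mad_manual` is a hypothetical stand-in for StatsBase's `mad` with `normalize=true`, and the 58/42 split is illustrative:

```julia
using Statistics

# Hypothetical stand-in for StatsBase.mad(y; normalize=true):
# MAD = c * median(|yᵢ - median(y)|), with c ≈ 1.4826 for consistency
# with the standard deviation under normality.
mad_manual(y; c = 1.4826) = c * median(abs.(y .- median(y)))

# A 0/1 response where ones are the minority class (42 of 100).
y = vcat(zeros(58), ones(42))

mad_manual(y)  # 0.0 — the majority class pins both medians at zero
```

Because the median of `y` is 0 and more than half of the absolute deviations are also 0, the MAD collapses to exactly zero, which then propagates into the scale of the hyperprior.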

@storopoli
Member Author

Thanks for pointing this out!

For brms, the default prior for the standard deviation of group-level parameters is:

> "restricted to be non-negative and, by default, have a half Student-t prior with 3 degrees of freedom and a scale parameter that depends on the standard deviation of the response after applying the link function. Minimally, the scale parameter is 2.5"

whereas for rstanarm it is much more complicated, with LKJ priors and products of simplexes.

I don't think we have a problem with how the prior is currently defined (based on MAD). This is the formula for MAD:

$$\operatorname{MAD} = \operatorname{median} (|X_{i}-{\tilde {X}}|)$$

Now if your $y$ has MAD 0, then it has no variability.
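As a quick check of the definition above, a pure-Statistics sketch (`mad_raw` is an illustrative name, equivalent to StatsBase's `mad` with `normalize=false`):

```julia
using Statistics

# Unnormalized MAD, straight from the formula above.
mad_raw(x) = median(abs.(x .- median(x)))

x = [1, 2, 4, 8]          # median(x) = 3.0
abs.(x .- median(x))      # [2.0, 1.0, 1.0, 5.0]
mad_raw(x)                # 1.5 — median of the absolute deviations
```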

@jfhawkin

jfhawkin commented Apr 5, 2023

Thanks! I must be having a different issue then. I'm porting an rstanarm example for multilevel regression with poststratification into Turing (posted to Discourse here).

I run

```julia
fm = @formula(abortion ~ (1 | state) + male)
model = turing_model(fm, cces_df; model=Bernoulli);
chn = sample(model, NUTS(), 2_000);
```

and get the error `DomainError with 0.0: AffineDistribution: the condition !(iszero(σ)) is not satisfied`.

After getting this error, I looked at the TuringGLM code and saw the `mad(y)` call. I calculated it for my dataset using `mad` from StatsBase: it gives 0 to 5 decimal places. I also calculated it manually to confirm and got the same result. The data do have variation, since about 42% of observations are 1.

Given that MAD is the median of the absolute differences from the median, this will be the case for many datasets with 0/1 binary data. I ran a very simple experiment. If your data are [0, 0, 0, 1], the median is 0 and the median deviation is 0. If your data are [0, 1, 1, 1], the same thing is true, because the deviations are 1 - 1 = 0. If you have [0, 0, 1, 1], then you get MAD = 0.5 (i.e., non-zero). I think this holds unless you have a 50/50 split (I did a quick test in Excel, repeatedly calculating it on vectors of 0/1, and never got a non-zero value otherwise). The mean varies, but the median deviation does not, because the deviations are always 0 or 1 and you are taking the middle value.
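The experiment above can be reproduced directly with the Statistics stdlib (unnormalized MAD, matching the values quoted):

```julia
using Statistics

mad0(x) = median(abs.(x .- median(x)))  # unnormalized MAD

mad0([0, 0, 0, 1])  # 0.0 — median is 0, so most deviations are 0
mad0([0, 1, 1, 1])  # 0.0 — median is 1, same collapse from the other side
mad0([0, 0, 1, 1])  # 0.5 — 50/50 split: median 0.5, every deviation 0.5
```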

@storopoli
Member Author

Oh, I see. Might it be more interesting to use the standard deviation (`std`) instead of MAD?
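A quick sketch of why `std` behaves better here for a binary response (Statistics stdlib; `std` is the corrected sample standard deviation):

```julia
using Statistics

y = [0, 0, 0, 1]

median(abs.(y .- median(y)))  # 0.0 — MAD collapses on an unbalanced 0/1 split
std(y)                        # 0.5 — the standard deviation stays positive
```

Unlike the MAD, `std` is zero only when the response is literally constant, so it would still give a usable scale for the $\tau$ hyperprior on unbalanced binary data.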

@jfhawkin

jfhawkin commented Apr 6, 2023

Makes sense to me.
