Rescaling causes huge differences in fixed effect coefficients #638
Adding @yuexiao-marketingattribution to this thread |
Thanks for including the example and the data. We'll take a look. |
I wrote up some comments on the different fits as a Quarto <https://quarto.org/> document. It can be transformed into a Jupyter notebook (.ipynb) or a PDF document or an HTML document. Which format would you prefer, @liamlundy and @yuexiao-marketingattribution? I enclose the PDF, but that doesn't work particularly well with some of the tables. issue638.pdf |
Hi, Dr. Bates,
Thanks a lot for looking into this. Am I right to say that the wacky fixed-effects coefficients seem to be related to the initial parameters? And that scaling could make them very small in some cases? Would we be able to get around this in the model estimation algorithm?
Thanks,
Yue
|
The peculiar fixed-effects coefficients, and the estimates of the random effects standard deviations and the extreme negative value of the objective, are all related to the fact that the response is mean centered and that the weights are over such a wide range (6 or 7 orders of magnitude). It may be possible to use the estimates from the model without random effects for the intercept to create starting estimates for the model you wish to fit. But that still doesn't get around the fact that mean-centering the response leaves you chasing rounding error in the random effects for the intercept. Is there a reason not to just stay with the model that has random effects for the covariates only? |
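As an aside illustrating the weight-scale problem described above, one quick diagnostic (a hypothetical sketch, not part of MixedModels' API) is to check how many orders of magnitude the weights span before fitting:

```julia
# Hypothetical diagnostic sketch: `wts` is a stand-in for the real weight
# vector (dat.wts in the issue); the values here are invented.
wts = [1e-4, 0.02, 3.0, 150.0, 2.0e3]

lo, hi = extrema(wts)
span = log10(hi / lo)  # orders of magnitude covered by the weights

# Rows whose weights differ by ~1e6-1e7 contribute on wildly different
# scales, so rounding error from the heavy rows can swamp the light ones.
span > 6 && @warn "weights span more than 6 orders of magnitude; expect numerical trouble"
```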
I tried manually setting the initial values for the parameters in the model you are trying to fit and still got the message about rescaling and convergence to nonsensical estimates.

julia> m01a = let
f = @formula(volume ~ 1 + v2 + v1 + zerocorr(1+v2|pl5) + zerocorr(1+v1|pl3))
m = LinearMixedModel(f, dat; contrasts, wts=dat.wts)
copyto!(m.optsum.initial, [0.0001, 0.65, 0.0001, 0.59])
fit!(m; thin=1)
end
[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│ that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/1FpDN/src/linearmixedmodel.jl:472
Minimizing 114 Time: 0:00:00 ( 0.95 ms/it)
objective: -347053.39586158074
Linear mixed model fit by maximum likelihood
volume ~ 1 + v2 + v1 + MixedModels.ZeroCorr((1 + v2 | pl5)) + MixedModels.ZeroCorr((1 + v1 | pl3))
logLik -2 logLik AIC AICc BIC
214818.8698 -429637.7397 -429621.7397 -429621.7386 -429543.3044
Variance components:
Column Variance Std.Dev. Corr.
pl5 (Intercept) 0.00000000 0.00000001
v2 0.00174499 0.04177307 .
pl3 (Intercept) 0.00000000 0.00001041
v1 0.00071134 0.02667089 .
Residual 0.01985613 0.14091177
Number of obs: 133841; levels of grouping factors: 467, 79
Fixed-effects parameters:
──────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
──────────────────────────────────────────────────────────
(Intercept) 0.0139727 5.75968e-5 242.59 <1e-99
v2 1142.51 0.191435 5968.11 <1e-99
v1 -1900.27 0.317759 -5980.22 <1e-99
──────────────────────────────────────────────────────────
julia> m01a.optsum.fitlog
113-element Vector{Tuple{Vector{Float64}, Float64}}:
([5.755952196370287e-9, 3.741368927640687e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.2440061639)
([1.0072916343648002e-8, 3.741368927640687e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.24400616286)
([5.755952196370287e-9, 6.547395623371202e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.23154658405)
([5.755952196370287e-9, 3.741368927640687e-5, 1.0072916343648002e-8, 3.396011795858469e-5], 326704.24400616414)
([5.755952196370287e-9, 3.741368927640687e-5, 5.755952196370287e-9, 5.9430206427523206e-5], 326704.2346277882)
([1.4389880490925721e-9, 3.741368927640687e-5, 5.755952196370287e-9, 3.396011795858469e-5], 326704.2440061644)
([5.755952196370287e-9, 9.35342231910172e-6, 5.755952196370287e-9, 3.396011795858469e-5], 326704.2496696343)
([5.755952196370287e-9, 3.741368927640687e-5, 1.4389880490925721e-9, 3.396011795858469e-5], 326704.2440061638)
([5.755952196370287e-9, 3.741368927640687e-5, 5.755952196370287e-9, 8.490029489646172e-6], 326704.248269125)
⋮
([8.259573820488997e-8, 0.29644805996849344, 7.387684360910232e-5, 0.18927318856618783], -136810.10648119918)
([8.248653458552367e-8, 0.29644688624629945, 7.387678952481503e-5, 0.18927412081947173], -271214.24989013287)
([8.248525785966926e-8, 0.29644836729405705, 7.387687418856134e-5, 0.1892736573373612], -392612.4826040373)
([8.240938253121144e-8, 0.2964487496311698, 7.387684624338898e-5, 0.18927366584797933], 326704.2440061639)
([8.243358028139956e-8, 0.2964482077859099, 7.387689565407254e-5, 0.18927319028752818], -149085.5704245197)
([8.252860617632192e-8, 0.2964484292586508, 7.387696379877048e-5, 0.18927365594744391], -429637.73967741965)
([8.258728390571775e-8, 0.2964487421416846, 7.387696715184453e-5, 0.18927403961173647], 326704.2440061639)
([8.248869647385056e-8, 0.2964483572018691, 7.387703325836202e-5, 0.18927330891463165], -192835.30979473024)

@palday may have some ideas of how to handle this. |
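The fitlog above shows the objective flipping between roughly +3.3e5 and large negative values from one evaluation to the next. A small sketch (a hypothetical check, using a few objective values copied from the fitlog above) of how one might detect that instability:

```julia
# A few objective values copied from the tail of the fitlog above.
objectives = [326704.24, -136810.11, -271214.25, 326704.24, -429637.74]

# A well-behaved objective trace should not flip sign between successive
# evaluations; here it does, which signals numerical breakdown.
flips = count(i -> sign(objectives[i]) != sign(objectives[i + 1]),
              1:length(objectives) - 1)
flips > 0 && @warn "objective changes sign between evaluations; fit is numerically unstable"
```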
Yes, we are planning on removing the random effects for the intercepts, but we have to have the weights in the model as they do matter. The reason we are including the intercepts in this test is that we used to have them in the model spec, using the older version of the package, and we want to get a sense of how big the difference would be in moving to the new version. The example we gave you has the minimal set of variables that shows the problem. Liam will strip the random intercepts and see if we can come up with a simple case that gives us similar errors.
One potentially related piece of info: the scaling code that seems to be causing the problem was recently added by @palday <https://github.com/palday> to help us pass some models that otherwise would fail.
Thanks,
Yue
|
I have a few ideas about how to nudge the optimizer here, but probably won't have the time to test them / write them up for a few days. As @dmbates mentioned, the huge span of the weights is going to create problems no matter what, but we might be able to provide better tools for users to diagnose such problems. |
Thanks for looking into this. We understand the data is not ideal. But
this is the nature of some of our data unfortunately. We appreciate your
attention to this matter.
|
Description

Thank you both for looking into this and for the helpful suggestions regarding best code practice in MixedModels. I reworked the test script a little bit to hopefully better encapsulate our issues and provide some context for what we are hoping to accomplish. As Yue mentioned, we are aware that our models do not always have the best data support, but we still need to fit them as best we can while bearing in mind the limitations of the data.

Right now, we are working on upgrading our very old versions of julia and MixedModels and noticed that we are getting different model results in some cases. We need to either get the results to match or at least understand / explain the differences. This was the initial impetus to test some of these cases. We are primarily testing julia (v1.6.0) / MixedModels (v4.7.1) against julia (v1.1.0) / MixedModels (v1.1.6). We were previously using intercepts per

Script / Data

using MixedModels, DataFrames, CSV, CategoricalArrays
model_form = @formula(
y ~
v1 +
v2 +
v3 +
v4 +
v5 +
(1 | pl3) +
((0 + v1) | pl3) +
(1 | pl5) +
((0 + v2) | pl5) +
((0 + v3) | pl5) +
((0 + v4) | pl5) +
((0 + v5) | pl5)
)
data = CSV.read("data.csv", DataFrame)
weight_vector = convert(Vector{Float64}, data[:, :w]);
contrasts = Dict{Symbol,Any}(:pl3 => Grouping(), :pl5 => Grouping());
m03 = let
fit(MixedModel, model_form, data; contrasts, wts=weight_vector, thin=1)
end
print(m03)

Comparison with old julia / MM

I ran the same data / formula against MM 1.1 / julia 1.1. I wanted to compare this model against
Because of that I tried to compare the new and old versions with intercepts just to have a consistent
New julia / MM

When I ran against the new versions, I hit the rescaling both times and ended up
I'm not sure if this is helpful or not, but I wanted to at least give a better picture of what we are trying to accomplish and why we are confused about the coefficients that we are getting with the latest models. As always, we really appreciate your help. Please let me or Yue know if there is anything we can do to help debug etc. |
@liamlundy The method error you ran into is the result of a bugfix in Julia's LinearAlgebra standard library that led to a return type changing in Julia 1.6 compared to older Julia versions (cf #431). Once 1.6 became LTS, we dropped the duplicated methods necessary to support pre-1.6 behavior.

@dmbates I first thought that some of the numerical issues might be arising from the Xy storage we introduced in 4.0, but I can't fit these models on Julia 1.5 and MixedModels 2.4. The 2.x series is about my limit for time travel, partly because going back to 1.0 would require me to deal with Julia from the 0.6-1.0 transition period, which is just a little bit too painful.

I was able to fit this model in lme4:

# Linear mixed model fit by maximum likelihood ['lmerMod']
# Formula: y ~ v1 + v2 + v3 + v4 + v5 + (1 | pl3) + ((0 + v1) | pl3) + (1 |
# pl5) + ((0 + v2) | pl5) + ((0 + v3) | pl5) + ((0 + v4) |
# pl5) + ((0 + v5) | pl5)
# Data: data
# Weights: data$w
# AIC BIC logLik deviance df.resid
# 221640.9 221778.1 -110806.4 221612.9 133827
# Scaled residuals:
# Min 1Q Median 3Q Max
# -13.8622 -0.4886 -0.1377 0.1888 27.0129
# Random effects:
# Groups Name Variance Std.Dev.
# pl5 v5 1.615e-01 0.401829
# pl5.1 v4 2.353e-01 0.485084
# pl5.2 v3 4.693e-02 0.216635
# pl5.3 v2 2.331e+00 1.526889
# pl5.4 (Intercept) 1.651e-05 0.004064
# pl3 v1 2.206e+00 1.485228
# pl3.1 (Intercept) 0.000e+00 0.000000
# Residual 2.545e+00 1.595419
# Number of obs: 133841, groups: pl5, 467; pl3, 79
# Fixed effects:
# Estimate Std. Error t value
# (Intercept) -0.0007564 0.0008610 -0.878
# v1 -1.5349996 0.1815460 -8.455
# v2 -1.2912605 0.0923754 -13.978
# v3 0.2111613 0.0161330 13.089
# v4 0.9269805 0.0664061 13.959
# v5 0.4399864 0.0391905 11.227

This gives me similar results to a non-mixed / OLS regression:

# y ~ 1 + v1 + v2 + v3 + v4 + v5
# Coefficients:
# ─────────────────────────────────────────────────────────────────────────────────────
# Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
# ─────────────────────────────────────────────────────────────────────────────────────
# (Intercept) -0.000575762 0.000112393 -5.12 <1e-06 -0.000796048 -0.000355476
# v1 -0.934877 0.00206077 -453.65 <1e-99 -0.938916 -0.930838
# v2 -1.81368 0.00188045 -964.49 <1e-99 -1.81736 -1.80999
# v3 0.160488 0.000510854 314.16 <1e-99 0.159487 0.16149
# v4 1.5533 0.00112932 1375.43 <1e-99 1.55108 1.55551
# v5 1.16306 0.000691772 1681.28 <1e-99 1.16171 1.16442
# ─────────────────────────────────────────────────────────────────────────────────────

Notably, both the MM 1.x fits provided by the OP and the lme4 fits shrink the random intercept for

Extracting the parameter vector from lme4 and using it in MixedModels requires permuting a few things (and I'm not sure JellyMe4 would handle this correctly because of the zeroed correlations):

lme4:
Julia:

Taking this permutation into account, I tried inserting the parameter vector from lme4 into a MixedModels fit (directly updating a model object, not using JellyMe4) and I also got a positive definite error. There's apparently something different in the way the numerical error accumulates there. I'm vaguely curious about the BLAS differences leading to this, but that's a much deeper rabbit hole than I have time for. It might also be related to the sorting of the random effects and how error accumulates and propagates.

I also tried out a few different optimizers to see if exploring the parameter space in a different way would solve the problem, but it didn't seem to help. |
Any updates on this? My understanding is that lme4 gives us the same results as the older version of MixedModels and we should align the two. |
I want to reiterate and expand upon a few points made by @dmbates.
Let's take a look at how these things culminate in (5). First some setup:

using CSV
using DataFrames
using Downloads
using MixedModels
const CSV_URL = "https://github.com/JuliaStats/MixedModels.jl/files/9649005/data.csv"
data = CSV.read(Downloads.download(CSV_URL), DataFrame)
wts = data[!, :w]
contrasts = Dict(:pl3 => Grouping(), :pl5 => Grouping());

We can force correlations to zero in Julia with zerocorr:

julia> m_zc_pl3 = let f = @formula(y ~ v1 + v2 + v3 + v4 + v5 +
zerocorr(1 + v1 | pl3) +
(1 + v2 + v3 + v4 + v5 | pl5))
fit(MixedModel, f, data; wts, contrasts)
end
[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│ that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/jPOIo/src/linearmixedmodel.jl:481
Minimizing 685 Time: 0:00:01 ( 1.58 ms/it)
objective: 225921.55359220033
Linear mixed model fit by maximum likelihood
y ~ 1 + v1 + v2 + v3 + v4 + v5 + MixedModels.ZeroCorr((1 + v1 | pl3)) + (1 + v2 + v3 + v4 + v5 | pl5)
logLik -2 logLik AIC AICc BIC
-112960.7761 225921.5522 225969.5522 225969.5612 226204.8580
Variance components:
Column Variance Std.Dev. Corr.
pl5 (Intercept) 0.00000001 0.00008912
v2 2.25735611 1.50245003 -1.00
v3 0.04660433 0.21588037 +0.17 -0.17
v4 0.24307148 0.49302280 +0.32 -0.32 +0.50
v5 0.17010131 0.41243341 +0.33 -0.33 +0.23 +0.84
pl3 (Intercept) 0.00000001 0.00012036
v1 0.00000000 0.00001888 .
Residual 2.63645886 1.62371760
Number of obs: 133841; levels of grouping factors: 467, 79
Fixed-effects parameters:
────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────────────────────
(Intercept) -0.000738517 0.000663529 -1.11 0.2657
v1 -0.849216 0.0124344 -68.30 <1e-99
v2 -1.20574 0.0910417 -13.24 <1e-39
v3 0.208185 0.0160558 12.97 <1e-37
v4 0.615806 0.0524811 11.73 <1e-31
v5 0.397616 0.038537 10.32 <1e-24
────────────────────────────────────────────────────────

We can also consider a model without the random intercept under pl3:

julia> m_no_int_pl3 = let f = @formula(y ~ v1 + v2 + v3 + v4 + v5 +
(0 + v1 | pl3) +
(1 + v2 + v3 + v4 + v5 | pl5))
fit(MixedModel, f, data; wts, contrasts)
end
[ Info: Initial objective evaluation failed, rescaling initial guess and trying again.
┌ Warning: Failure of the initial evaluation is often indicative of a model specification
│ that is not well supported by the data and/or a poorly scaled model.
└ @ MixedModels ~/.julia/packages/MixedModels/jPOIo/src/linearmixedmodel.jl:481
Minimizing 1047 Time: 0:00:01 ( 1.39 ms/it)
objective: 225921.55369228724
Linear mixed model fit by maximum likelihood
y ~ 1 + v1 + v2 + v3 + v4 + v5 + (0 + v1 | pl3) + (1 + v2 + v3 + v4 + v5 | pl5)
logLik -2 logLik AIC AICc BIC
-112960.7658 225921.5316 225967.5316 225967.5398 226193.0329
Variance components:
Column Variance Std.Dev. Corr.
pl5 (Intercept) 0.00000001 0.00009248
v2 2.25715366 1.50238266 -1.00
v3 0.04662557 0.21592954 +0.18 -0.17
v4 0.24327977 0.49323399 +0.32 -0.32 +0.50
v5 0.17011576 0.41245092 +0.33 -0.33 +0.23 +0.84
pl3 v1 0.00000000 0.00003390
Residual 2.63645836 1.62371745
Number of obs: 133841; levels of grouping factors: 467, 79
Fixed-effects parameters:
───────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
───────────────────────────────────────────────────────
(Intercept) -0.00073551 0.000662218 -1.11 0.2667
v1 -0.849216 0.0124344 -68.30 <1e-99
v2 -1.20574 0.0910375 -13.24 <1e-39
v3 0.20818 0.0160583 12.96 <1e-37
v4 0.615521 0.0524845 11.73 <1e-31
v5 0.39759 0.0385371 10.32 <1e-24
───────────────────────────────────────────────────────

Technically, these are nested models, so we can compute a likelihood ratio test, but the rounding error and boundary effects also show up here:

julia> MixedModels.likelihoodratiotest(m_zc_pl3, m_no_int_pl3)
Model Formulae
1: y ~ 1 + v1 + v2 + v3 + v4 + v5 + (0 + v1 | pl3) + (1 + v2 + v3 + v4 + v5 | pl5)
2: y ~ 1 + v1 + v2 + v3 + v4 + v5 + MixedModels.ZeroCorr((1 + v1 | pl3)) + (1 + v2 + v3 + v4 + v5 | pl5)
────────────────────────────────────────────────────
model-dof -2 logLik χ² χ²-dof P(>χ²)
────────────────────────────────────────────────────
[1] 23 225921.5316
[2] 24 225921.5522 -0.0206 1 NaN
────────────────────────────────────────────────────

The more complex model is supposedly a worse fit, so we get a negative χ² statistic and a NaN p-value. Interestingly, I was unable to fit these models with

All of this is variations on a common theme: a lot of things that are well defined in theory are not well defined in numerical practice, and part of data analysis is learning the limits of computation.

Now, you may be asking why lme4 and earlier versions of MixedModels did not run into these problems. I have a few speculations:
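The sign problem in the test statistic can be seen directly from the deviances in the likelihood ratio test table above (a minimal arithmetic sketch):

```julia
# -2 logLik values copied from the likelihood ratio test table above.
dev_simple  = 225921.5316  # model [1]: 23 parameters
dev_complex = 225921.5522  # model [2]: 24 parameters

# In exact arithmetic the model with more parameters can never fit worse,
# so the χ² statistic would be ≥ 0. Here it is negative: pure rounding
# and boundary noise, which is why the p-value comes out as NaN.
χ² = dev_simple - dev_complex
```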
As I was writing this up, (3) occurred to me and I decided to check out what happens if we force MixedModels.jl to do things like lme4 does. This isn't exactly trivial because MixedModels.jl really wants to combine random effects with the same grouping variable into a single term with an appropriately zeroed correlation matrix. But we can do this by creating new grouping variables that are actually just copies of the original:

select!(data, :,
:pl3 => :pl3a,
:pl3 => :pl3b,
:pl5 => :pl5a,
:pl5 => :pl5b,
:pl5 => :pl5c,
:pl5 => :pl5d,
:pl5 => :pl5e)

We also need to add in the relevant contrasts:

contrasts = merge(contrasts, Dict(:pl3a => Grouping(),
:pl3b => Grouping(),
:pl5a => Grouping(),
:pl5b => Grouping(),
:pl5c => Grouping(),
:pl5d => Grouping(),
:pl5e => Grouping()))

Finally, we need to define a few extra matrix multiplication methods. It turns out that we've forced the internal representation to be a little weird, and so we don't have some of the relevant specializations:

using LinearAlgebra
MixedModels.rmulΛ!(A::Diagonal{T}, B::ReMat{T,1}) where {T} = rmul!(A, only(B.λ))
function MixedModels.rankUpdate!(C::Hermitian{T, Diagonal{T, Vector{T}}}, A::Diagonal{T, Vector{T}}, α, β) where {T}
size(C) == size(A) || throw(DimensionMismatch("Diagonal matrices unequal size"))
C.data.diag .*= β
C.data.diag .+= α .* abs2.(A.diag)
return C
end

Now we fit the model:

julia> m_form_split = let f = @formula(y ~ v1 + v2 + v3 + v4 + v5 +
(1 | pl3a) + ((0 + v1) | pl3b) +
(1 | pl5a) + ((0 + v2) | pl5b) +
((0 + v3) | pl5c) + ((0 + v4) | pl5d) +
((0 + v5) | pl5e))
fit(MixedModel, f, data; wts, contrasts)
end
Minimizing 1402 Time: 0:00:37 (26.52 ms/it)
Linear mixed model fit by maximum likelihood
y ~ 1 + v1 + v2 + v3 + v4 + v5 + (1 | pl3a) + (0 + v1 | pl3b) + (1 | pl5a) + (0 + v2 | pl5b) + (0 + v3 | pl5c) + (0 + v4 | pl5d) + (0 + v5 | pl5e)
logLik -2 logLik AIC AICc BIC
-110806.4290 221612.8581 221640.8581 221640.8612 221778.1198
Variance components:
Column Variance Std.Dev.
pl5a (Intercept) 0.00001651 0.00406315
pl5b v2 2.33149835 1.52692447
pl5c v3 0.04692059 0.21661162
pl5d v4 0.23529473 0.48507189
pl5e v5 0.16156129 0.40194688
pl3a (Intercept) 0.00000000 0.00000000
pl3b v1 2.20171276 1.48381696
Residual 2.54536460 1.59541988
Number of obs: 133841; levels of grouping factors: 467, 467, 467, 467, 467, 79, 79
Fixed-effects parameters:
────────────────────────────────────────────────────────
Coef. Std. Error z Pr(>|z|)
────────────────────────────────────────────────────────
(Intercept) -0.000756406 0.000861013 -0.88 0.3797
v1 -1.5349 0.181384 -8.46 <1e-16
v2 -1.29126 0.0923772 -13.98 <1e-43
v3 0.211162 0.0161316 13.09 <1e-38
v4 0.926982 0.0664047 13.96 <1e-43
v5 0.439968 0.0392001 11.22 <1e-28
────────────────────────────────────────────────────────

which gives an identical fit to lme4. Note that this took several hundred more iterations to fit than the more complex model above (with a correlation structure for pl5) and took nearly 20x as long per iteration.

The more compact representation for suppressed correlations was introduced early in the 2.x release series (September 2019), so this also explains why things worked under 1.x but not under any recent release.

@dmbates this is perhaps interesting both because I found a model where we hit our fallback |
This is fantastic. |
This starts to make a lot of sense. Our experience testing the code suggests that (1) the resort to rescaling happens a lot more frequently when we have more than one interaction level; (2) keeping the random effect for the intercept contributes to the issues; (3) newer computer chips also contribute to coefficient discrepancies occasionally. I understand the performance implications of a fix to this issue and wonder: (1) would it be possible to detect the extreme likelihood values and only invoke the fix in those cases? (2) Would it be helpful to avoid the problem by specifying the model similarly to your test before we pass the data to fit the model? Or is that something that can be embedded in the package? Thanks a lot for your continued effort on this issue. It is greatly appreciated. |
So we did a bunch of tests using the same data. Here is a summary of everything we observed -
It looks like for our purposes, MixedModels 2.x and later are not stable enough. We will hold off on the transition until this can be resolved. Thanks a lot for your help so far. |
@yuexiao-marketingattribution I'm a little bit surprised by the performance you've found -- 11-30 minutes is an order of magnitude slower than what I found on any of my personal devices. Can you provide the output of versioninfo()? |
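For reference, the machine details can be produced with Julia's standard versioninfo() call (shown here as a sketch):

```julia
using InteractiveUtils  # stdlib that provides versioninfo outside the REPL

# Prints the Julia version, OS, CPU, word size, and BLAS/LAPACK details --
# exactly the kind of information needed to compare performance across machines.
versioninfo()
```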
Below are the specs of my machine. @liamlundy 's machine uses newer chips and tends to use less time than mine. Julia Version 1.6.5
Just curious: Why are you not using the current Julia Version 1.8.2 (and all the updates coming with it)? |
Not sure how much difference this will make, but we could try. We started this effort to upgrade julia when 1.6.5 was the latest version. The general issue with MixedModels, however, seems to start when MixedModels went from 1.1.7 to 2.0.0. |
Also, my interpretation of Phil's reconstruction of the lme4 solution by putting MixedModels.jl in a straitjacket was that the lme4 solution is, for various reasons, spurious and should not be used as a platform for your work. I am happy to stand corrected if I misunderstood. |
@kliegl I don't know if MixedModels 1.x will run on Julia 1.6+. There was a change in the LinearAlgebra stdlib in Julia 1.6 that changed the type of a particular matrix, so we had to change the methods we used. We never backported the changed methods to older releases. |
@palday (Why) would it make sense in general to go back to MixedModels.jl 1.x? |
A few notes from my end:
|
This is pretty much correct. Our stance is that we're hitting the boundaries of the software because the modelling problem here is numerically ill-conditioned, which may reflect an underlying mismatch between the desired inference and the data. We want to support as many models as possible and provide better hints about when and how things may fail, but there is a fundamental limit to automated checks.

There is also a very real tradeoff for my limited development time -- this is a hobby project for me. Even if implementing a fix for a particular situation is relatively straightforward, it increases the maintenance burden, both by introducing more code and by increasing the number of corner cases we have to check. I have actually mocked up what it would take to force separate terms to remain separate. It turns out that this violates assumptions elsewhere in the code and requires changing a surprising number of lines: kwarg for optional amalgmate. Because those lines also involve output behavior, it also means that there would be a lot of tests to write. I suspect there are better ways to solve this by changing more fundamental aspects of the algorithm when weights are present, but those also require time to research and think about. One alternative would be using higher precision numbers (so something besides double precision / Float64). |
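As a toy illustration of the extended-precision idea mentioned above (MixedModels.jl does not currently support this; the snippet only shows the underlying numerical effect), a small term that vanishes in Float64 survives in BigFloat:

```julia
# In double precision the small term is absorbed and lost entirely:
small, large = 1e-17, 1.0
lost = ((large + small) - large) == 0.0   # the 1e-17 contribution vanishes

# The same computation in extended precision (256-bit BigFloat) keeps it:
b_small, b_large = big"1e-17", big"1.0"
kept = ((b_large + b_small) - b_large) > 0   # BigFloat retains the small term
```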
I only became heavily involved with MixedModels development shortly after the 2.0 release, so I am not really familiar with the internals from back then. However, I did fit a number of models using the 1.x releases for my own research. To the best of my knowledge, the linear mixed model code there is correct, although generally much more feature limited than later releases. (There are also compatibility issues with other packages in the Julia ecosystem.) So if you are able to fit the linear mixed model you need with 1.x, go for it. However, I would not fully trust generalized linear mixed models from the 1.x series -- finding a few issues in that code is actually how I became involved with MixedModels.jl. |
We have been using the latest version of MixedModels and have some models that seem to have hit the rescaling code added in #615 before they can be fit. The addition of the rescaling algorithm is great because it allows those models that were previously hitting the PosDef error during the fitting process to pass.
We do have some concerns about the results that the rescaled models are giving us, however. We recently noticed that the coefficients are drastically different in models that are rescaled compared to very similar models that did not get the same treatment. I ran a few variations of a model where the rescaling algorithm only kicked in for some specific variation. The fixed effects remained fairly stable when the model is not rescaled, but change drastically when rescaling occurs. Changing random coefficients makes sense, but it seems unlikely that the fixed coefficients would be this different with only small modifications to the model form.
Script
This is the base script. The supporting data and OS info are attached at the bottom of the ticket. I attached the results from this, as well as some similar models, below with descriptions of exactly what was run.
MixedModels v4.7 - Two Random Effects (With Rescaling)
This is the script run as is. As you can see, the rescaling algorithm kicks in here, producing some unexpected coefficients.
MixedModels v4.7 - One Random Effect (No Rescaling)
This is the script run with the geo_level_3_product_level_3 terms commented out.

MixedModels v4.7 - Two Random Effects - No Intercepts

This is the script run with the intercepts-per-interaction removed. The data is already mean-centered for each geography x product, so the intercepts shouldn't technically be necessary.
MixedModels v4.7 - Two Random Effects - No Weights
This is the script run with weights removed.
MixedModels v1.1 - Two Random Effects
This is the original script run using julia v1.1 and MixedModels v1.1. Here the script passes without any rescaling and comes back with coefficients in a similar range to the models run without rescaling.
Supporting Data / Info
Julia version info:
View Project.toml
View Manifest.toml
data.csv