
attempt rescaling of poorly scaled problems #615

Merged: 16 commits into main on May 20, 2022
Conversation

@palday (Member) commented May 18, 2022

Closes #611

This should help address some problematic models. The basic idea is to rescale the initial identity scaling by dividing by a large number, where "large" is the product of the largest weight by the largest response value. In other words, for poorly constrained problems, we change the initial guess to already have a good amount of shrinkage and see if that helps.
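A rough sketch of that idea, for illustration only (the field and helper names below are assumptions based on MixedModels.jl internals, not the exact code in this PR):

using MixedModels: LinearMixedModel, response

function rescale_initial!(m::LinearMixedModel)
    # "large" = (largest prior weight) * (largest absolute response value);
    # m.sqrtwts stores square roots of the prior weights and is empty for
    # unweighted models, which are treated as having a maximum weight of 1
    maxwt = isempty(m.sqrtwts) ? 1.0 : maximum(m.sqrtwts)^2
    maxresp = maximum(abs, response(m))
    scale = maxwt * maxresp
    # divide the default identity scaling so the optimizer starts with heavy shrinkage
    scale > 1 && (m.optsum.initial ./= scale)
    return m
end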

I tried more naive ways of rescaling the weights or even standardizing all variables in the model, but that didn't help. @dmbates if you're happy with this solution, I can add tests and update NEWS, etc., but I have rather mixed feelings about it myself.

  • add tests
  • add entry in NEWS.md
  • after opening this PR, add a reference and run docs/NEWS-update.jl to update the cross-references.
  • I've bumped the version appropriately

codecov bot commented May 18, 2022

Codecov Report

Merging #615 (042a621) into main (621f88b) will decrease coverage by 0.24%.
The diff coverage is 100.00%.

❗ Current head 042a621 differs from pull request most recent head 70d2537. Consider uploading reports for the commit 70d2537 to get more accurate results

@@            Coverage Diff             @@
##             main     #615      +/-   ##
==========================================
- Coverage   94.19%   93.94%   -0.25%     
==========================================
  Files          29       29              
  Lines        2757     2594     -163     
==========================================
- Hits         2597     2437     -160     
+ Misses        160      157       -3     
Impacted Files Coverage Δ
src/linearmixedmodel.jl 98.14% <100.00%> (-0.11%) ⬇️
src/linalg.jl 98.03% <0.00%> (-1.97%) ⬇️
src/serialization.jl 93.33% <0.00%> (-0.42%) ⬇️
src/arraytypes.jl 94.33% <0.00%> (-0.40%) ⬇️
src/generalizedlinearmixedmodel.jl 90.15% <0.00%> (-0.37%) ⬇️
src/linalg/rankUpdate.jl 97.18% <0.00%> (-0.32%) ⬇️
src/remat.jl 96.70% <0.00%> (-0.28%) ⬇️
src/Xymat.jl 90.00% <0.00%> (-0.25%) ⬇️
src/predict.jl 98.11% <0.00%> (-0.20%) ⬇️
... and 13 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 621f88b...70d2537.

@palday requested a review from @dmbates on May 18, 2022 01:15
@dmbates (Collaborator) commented May 18, 2022

Is it enough to change the initial field or does the initial_step field of the OptSummary object also need to be adjusted?

@dmbates (Collaborator) commented May 18, 2022

@palday Have you used the symbolic debugger on the example in #611 to see exactly where the PosDefException is occurring? I was thinking of doing that today, if you hadn't already embarked on it. I'm interested in exactly which part of the Cholesky factorization is failing.

@palday (Member, Author) commented May 18, 2022

@dmbates The block it fails at is:

4×4 Matrix{Float64}:
     -2.09724e7  34383.4    29916.1    -78929.7
 -34369.8        -1334.4   -10224.6        -1.22374e5
 -29893.4        10223.1    -3655.35  -163610.0
  78803.2        -5724.27   -3884.24        2.10496e6

But I hadn't pursued it beyond that.

@palday (Member, Author) commented May 18, 2022

> Is it enough to change the initial field or does the initial_step field of the OptSummary object also need to be adjusted?

The initial step is just an empty vector -- we let NLopt do the hard work of figuring that out. I have given some thought to adjusting the initial step for GLMMs when we're initializing from a prior LMM/GLM, but I haven't really had a chance to experiment with what would be useful.
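For illustration, a minimal sketch of how an empty initial step can be handled when constructing the optimizer (the NLopt.jl calls below are an assumption about typical usage, not a quote of the MixedModels code):

using NLopt

opt = Opt(:LN_BOBYQA, 2)         # e.g. a two-parameter problem
initial_step = Float64[]         # empty vector: no explicit initial step supplied
# only pass an explicit step to NLopt when one was given;
# otherwise NLopt picks its own default from the bounds and initial values
isempty(initial_step) || NLopt.initial_step!(opt, initial_step)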

@dmbates (Collaborator) left a comment

Looks good. Thanks for doing this.

@dmbates (Collaborator) commented May 19, 2022

Is it possible to construct a test that will reliably trigger this condition?

@palday (Member, Author) commented May 19, 2022

@dmbates I'm thinking about taking an example from #611 and simulating something based on that, although ideally I would want to see this behavior on a completely different model.

(Two resolved review threads on src/linearmixedmodel.jl, now outdated.)
palday and others added 3 commits May 19, 2022 18:12
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@palday (Member, Author) commented May 19, 2022

@dmbates I came up with a test based on an error we occasionally get in shrinkage plots -- there's a limit to how much you can re-inflate data that requires some amount of shrinkage to fit. 😄

I also added a secondary check that returns finitial when the optimizer drifts into an area where there isn't enough shrinkage. Why finitial? Generally, it will be the (near) worst-case value, so the optimizer won't view it as an optimum, and using Inf messes up the quadratic approximation in BOBYQA. I couldn't add a reliable test for this because it's particularly sensitive to the exact parameter values the optimizer examines in a given step, but it did actually recover from an error I got when doing refit!(simulate!(MersenneTwister(42), model)) for the model in #611. Are you still okay with all this recovery code?
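A minimal sketch of that fallback, assuming MixedModels.jl internals such as setθ!, updateL!, and objective (the actual PR code may differ):

using LinearAlgebra: PosDefException
using MixedModels: LinearMixedModel, objective, setθ!, updateL!

# Evaluate the objective at θ; if the factorization fails with a PosDefException,
# return the initial objective value rather than Inf, so BOBYQA's quadratic
# approximation is not corrupted and the point is simply treated as very bad.
function guarded_objective(m::LinearMixedModel, θ::AbstractVector)
    try
        return objective(updateL!(setθ!(m, θ)))
    catch e
        e isa PosDefException || rethrow()
        return m.optsum.finitial
    end
end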

@palday (Member, Author) commented May 20, 2022

@dmbates Well, having tested this with OpenBLAS and MKL on x86-64 and with OpenBLAS on Apple silicon, I've found one example that works for now (using fulldummy so that shrinkage is absolutely required to fit). We may have to ditch this test in the future.
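For context, a hypothetical example of the kind of model meant here (the dataset and formula are illustrative, in the style of the MixedModels.jl rank-deficiency docs, not necessarily the test added in this PR). fulldummy forces a redundant, overparameterized set of indicator columns into the random effects, so the model can only be fit with substantial shrinkage:

using MixedModels

kb07 = MixedModels.dataset(:kb07)
# the full dummy coding of prec inside the random-effects term is redundant with the
# intercept, so fitting this model requires shrinkage of the covariance parameters
m = fit(MixedModel,
        @formula(rt_trunc ~ 1 + prec + (1 + fulldummy(prec) | subj)),
        kb07)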

@dmbates (Collaborator) commented May 20, 2022

@palday This looks ready to go from my point of view.

@dmbates (Collaborator) commented May 20, 2022

@palday We probably should document in the code the reason for setting the value to m.optsum.finitial when evaluation of the objective throws an error.

@dmbates (Collaborator) commented May 20, 2022

While we are looking at this section of code, do we still need what is now line 472:

    push!(fitlog, (copy(optsum.initial), optsum.finitial))

This is the line that results in the initial parameter value and objective being recorded twice when thin=1. I am unclear about why this value should be recorded unconditionally. For the default value of thin the final parameter vector and objective are not recorded explicitly in fitlog - they just exist in the optsum component as explicit fields. Is there a reason to record the initial parameter vector and objective value in both the explicit fields of optsum and in the fitlog?

Could this line just be removed or is it there to avoid fitlog being an empty vector?

@palday (Member, Author) commented May 20, 2022

@dmbates At first I thought that the value shouldn't be recorded twice because the line before that is empty!(fitlog). But now I wonder if the doubled recording is the result of NLopt evaluating the first step after we've already evaluated it explicitly. I'll drop the line.

The long-term idea was to unconditionally store the first and last steps in fitlog and have the final and initial fields in optsum point to the first and last entries of the fitlog.

@palday (Member, Author) commented May 20, 2022

@dmbates just double-checked and the first entry isn't repeated for me with the current code.

@dmbates (Collaborator) commented May 20, 2022

@palday I think NLopt has to evaluate the objective at the initial parameter values because we only pass it the parameters; we don't pass it the objective value.

If we want to have fitlog contain the initial and final parameter and objective values, and have the .initial and .final fields just point to those values, we should expand the schema for fitlog to include the evaluation number (and increment it at the beginning of the obj function, not in the middle as is done now). Either that, or abandon the idea of using thin and just incorporate a Boolean logevaluations or some name like that. I doubt very much that many users will choose values of thin other than the default, typemax(Int), or 1. It is a symptom of my tendency to over-complicate matters that we use thin instead of a Boolean.

@palday merged commit 1b3f701 into main on May 20, 2022
@palday deleted the pa/rescale branch on May 20, 2022 16:05
@dmbates (Collaborator) commented May 20, 2022

> @dmbates just double-checked and the first entry isn't repeated for me with the current code.

If you set thin = 1, it is:

julia> ds = MixedModels.dataset(:dyestuff)
Arrow.Table with 30 rows, 2 columns, and schema:
 :batch  String
 :yield  Int16

julia> contrasts = Dict{Symbol, AbstractContrasts}(:batch => Grouping())
Dict{Symbol, AbstractContrasts} with 1 entry:
  :batch => Grouping()

julia> thin = 1
1

julia> dsm01 = let
           form = @formula yield ~ 1 + (1|batch)
           fit(MixedModel, form, ds; contrasts, thin)
       end
Linear mixed model fit by maximum likelihood
 yield ~ 1 + (1 | batch)
   logLik   -2 logLik     AIC       AICc        BIC    
  -163.6635   327.3271   333.3271   334.2501   337.5307

Variance components:
            Column    Variance Std.Dev.
batch    (Intercept)  1388.3332 37.2603
Residual              2451.2501 49.5101
 Number of obs: 30; levels of grouping factors: 6

  Fixed-effects parameters:
────────────────────────────────────────────────
              Coef.  Std. Error      z  Pr(>|z|)
────────────────────────────────────────────────
(Intercept)  1527.5     17.6946  86.33    <1e-99
────────────────────────────────────────────────

julia> first(dsm01.optsum.fitlog, 3)
3-element Vector{Tuple{Vector{Float64}, Float64}}:
 ([1.0], 327.76702162461663)
 ([1.0], 327.76702162461663)
 ([1.75], 331.03619322245146)

@palday mentioned this pull request on May 20, 2022
@palday (Member, Author) commented May 20, 2022

@dmbates Huh, that's weird. I think our tests don't catch that because they're doing refit! instead of fit, but ideally the results should be the same.

Merging this pull request closes the linked issue: "LoadError: PosDefException: matrix is not positive definite; Cholesky factorization failed".