[G]VIF #548

Merged
palday merged 4 commits into master from pa/vif on Sep 14, 2023

@palday (Member) commented Sep 13, 2023

closes #428

@palday palday requested a review from bkamins September 13, 2023 09:28
codecov bot commented Sep 13, 2023

Codecov Report

Patch coverage is 100.00% of modified lines.

Files Changed     Coverage
src/GLM.jl        ø
src/linpred.jl    100.00%


@@ -362,7 +362,7 @@ fitted(m::LinPredModel) = m.rr.mu
 predict(mm::LinPredModel) = fitted(mm)
 residuals(obj::LinPredModel) = residuals(obj.rr)
 
-function formula(obj::LinPredModel)
+function StatsModels.formula(obj::LinPredModel)
Contributor

While we are at it: when is this method called? When I do:

julia> formula(lm(x, y))
ERROR: type LinearModel has no field fr

julia> formula(glm(x, y, Normal()))
ERROR: type GeneralizedLinearModel has no field fr

other methods are called.

Do we have tests for the different cases where a formula is not present?

Member Author

Hmmm, will investigate. I thought we caught this when Milan removed TableRegressionModel.

Member Author

On current master:

julia> formula(lm(ones(10, 1),  randn(10)))
ERROR: ArgumentError: model was fitted without a formula
Stacktrace:
 [1] formula(obj::LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
   @ GLM ~/Code/GLM.jl/src/linpred.jl:366
 [2] top-level scope
   @ REPL[13]:1

julia> formula(glm(ones(10, 1),  randn(10), Normal()))
ERROR: ArgumentError: model was fitted without a formula
Stacktrace:
 [1] formula(obj::GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Normal{Float64}, IdentityLink}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}})
   @ GLM ~/Code/GLM.jl/src/linpred.jl:366
 [2] top-level scope
   @ REPL[14]:1

(will have to keep this in mind for the backport to 1.x where we still have TableRegressionModel)

@testset "[G]VIF" begin
duncan = RDatasets.dataset("car", "Duncan")
lm1 = lm(@formula(Prestige ~ 1 + Income + Education), duncan)
@test termnames(lm1)[2] == coefnames(lm1)
Contributor

  1. Do we have tests for when coefnames and termnames differ?
  2. Have we decided what should be done in the case of lm(X, y) (i.e. a model fitted without a formula, which still prints variable names as x1 etc.)?

Member Author

  1. This falls back to StatsModels; the test there just makes sure we've successfully imported and exported the symbol.
  2. On master, termnames will error because there is no formula (formula will return nothing).

Member Author

I think termnames should not be defined if there is no formula -- there are only Terms when there is a formula.

@bkamins (Contributor), Sep 14, 2023

So what should a user do to perform VIF analysis for the model = lm(X, y) case?

Member Author

vif works, but not gvif, so I think they can still do vif. If they're able to construct a model matrix directly for something with nontrivial contrast coding, then they could probably also adapt the gvif source to extract the correct columns.
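For the lm(X, y) case, the "adapt the gvif source" suggestion could look roughly like the following. This is a minimal sketch under stated assumptions, not the StatsModels implementation: manual_gvif and its groups argument are hypothetical names, X is assumed to hold the predictor columns without the intercept column, and the quantity computed is the Fox and Monette generalized VIF, det(R11) * det(R22) / det(R), on the predictor correlation matrix R. The caller has to say which columns belong to which term, since there is no formula to recover that from.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper, NOT part of GLM.jl or StatsModels.jl.
# X      : predictor columns, assumed to exclude the intercept column
# groups : one vector of column indices per term (e.g. all dummy columns of a factor)
# scale  : if true, return GVIF^(1 / (2 * df)), where df is the number of columns in the term
function manual_gvif(X::AbstractMatrix, groups; scale::Bool=false)
    R = cor(X)                       # correlation matrix of the predictors
    detR = det(R)
    allcols = collect(1:size(X, 2))
    return map(groups) do cols
        others = setdiff(allcols, cols)
        # With a single term there are no other columns; use 1 for the empty determinant.
        det_others = isempty(others) ? 1.0 : det(R[others, others])
        g = det(R[cols, cols]) * det_others / detR
        scale ? g^(1 / (2 * length(cols))) : g
    end
end

# Usage with made-up data: column 3 is deliberately correlated with column 1.
X = randn(50, 3)
X[:, 3] .+= X[:, 1]
manual_gvif(X, [[1], [2], [3]])              # one column per term: reduces to the ordinary VIF
manual_gvif(X, [[1, 2], [3]]; scale=true)    # columns 1 and 2 treated as one two-column term
```

When every term has exactly one column, the result should agree with diag(inv(cor(X))), which is the ordinary VIF, and that gives a quick sanity check for the column grouping.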

@test termnames(lm1)[2] == coefnames(lm1)
@test vif(lm1) ≈ gvif(lm1)
lm2 = lm(@formula(Prestige ~ 1 + Income + Education + Type), duncan)
@test gvif(lm2; scale=true) ≈ [1.486330, 2.301648, 1.502666] atol=1e-4
Contributor

  1. Can you please add a comment on where these values are taken from?
  2. Do we also have tests for vif/gvif for glm?
  3. Do we have tests for vif/gvif for models without a formula?
  4. Do we have tests for vif/gvif for models that have complex formulas, e.g. something like @formula(y~(1+a*(b+log(c)))&(1+d))? (Of course this is artificial, but I hope it is clear what I mean.)

Member Author

These are just the StatsModels tests carried forward to models actually fitted here. 😄 But I can add a cross reference.

@palday palday requested a review from bkamins September 14, 2023 07:54
@@ -21,7 +22,7 @@ module GLM
 export coef, coeftable, confint, deviance, nulldeviance, dof, dof_residual,
 loglikelihood, nullloglikelihood, nobs, stderror, vcov, residuals, predict,
 fitted, fit, fit!, model_response, response, modelmatrix, r2, r², adjr2, adjr²,
-cooksdistance, hasintercept, dispersion
+cooksdistance, hasintercept, dispersion, vif, gvif, termnames
Member

Maybe we should just reexport StatsModels? That sounds natural.

Member Author

The only "problem" is that breaking changes in StatsModels necessarily become breaking changes in GLM.

Member

Yeah but there shouldn't be breaking changes in StatsModels minor releases, and anyway users who need these functions will do using StatsModels.

@palday palday merged commit b1ba4c5 into master Sep 14, 2023
12 checks passed
@palday palday deleted the pa/vif branch September 14, 2023 09:46
palday added a commit that referenced this pull request Sep 14, 2023
* [G]VIF

* add reference value source

* more tests

* glm tests

(cherry picked from commit b1ba4c5)
palday added a commit that referenced this pull request Sep 14, 2023
* [G]VIF (#548)

* [G]VIF

* add reference value source

* more tests

* glm tests

(cherry picked from commit b1ba4c5)

* fix formula implementation

* version bump
Development

Successfully merging this pull request may close these issues.

Add VIF?
3 participants