-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Generalised Linear Models to FSharpStats #334
Conversation
Hey @bvenn, would you mind checking this out? Please squish all the commits if possible to avoid confusion. If there is anything I should add or change please let me know |
} | ||
) | ||
|
||
module QR = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the correct location for a function like this? Are these functions so specialized that nobody ever needs to use them in another context? if not, i'd suggest putting them in the managed LinearAlgebra module, otherwise you can also set the whole QR module here to internal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While some of the functions in QR are highly specialised, some, qrAlternative and solveLinearQR, aren't. The module is needed for future development when I will add a solver based on singular value decomposition, but at the moment it is redundant.
I will move the the specified functions to LinearAlgebra but keep the rest there if that´s ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allrighty. You can then mark the whole specialized module as internal to avoid leaking any of that stuff in the public API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing that would also remove it from the testing I on the stepwise function. You can have a look and tell me if it is ok now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can expose internals to other projects in the repo by adding this attribute in the project file:
<ItemGroup>
<InternalsVisibleTo Include="TestProject1" />
</ItemGroup>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kMutagene hey, I changed the code accordingly. Could you have another look if its better now?
for i=0 to testingArray.Length-1 do | ||
let expected = linkExpected.[i] | ||
let actual = linkFActual.[i] | ||
if isInf actual then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this test setup. I think you should base the first conditional on the expected value and not on the actual value, e.g. "If the expected value is infinity then actual should be infinity", same for the tests below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM from a maintenance standpoint - i cannot assess the implementation itself or the Linear Algebra though, so as long as your tests pass i think this is fine, test setup is a little weird but maybe i just do not understand the specifics
Hey @bvenn, anything you would change before the merge? |
Sorry, I was busy during the last weeks. I'll check out your PR this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GLM + documentation looks good, but I have a general thought: Why is the GLM module assigned to the Fitting-namespace rather than Testing? I know that GLMs are directly linked to regression analysis, but I personally would search at Testing when looking for GLMs. But maybe I'm wrong here, lets discuss.
let qN = setqOfA i | ||
let qLength = getVectorLength qN | ||
let rValue = sqrt(qLength) | ||
r[i,i] <- rValue | ||
qLengths[i] <- qLength |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I get whats happening here:
- qLengths is a zero-vector that gets filled iteratively for each
i
in line 262 - in line 256 setqOfA is called with the current i. However in l 247 qLength is called at position
i
which always should be 0, because it was initialized with 0. Just after the determination of qLength the value at this specific index is replaced in line 262.
Obviously I miss a step, but is there a possibility to create the qLength by Vector.init (...) rather than keeping it mutable and accessing/mutating it at multiple positions? If not, I'm happy to merge it, but I got confused during my attempt to understand whats happening here..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you additionally set into -brackets a reference to this implementation and a short description what the benefit, drawback is when using this implementation over others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not have a direct reference since I had a look at the mathematical explanation for QR via Gram Schmidt and tried implementing it myself. The only real upside using this method instead of the in FSharp.Stats established one is the difference in the output Dimensions in R. An actual reference would be the numpy implantation of the reduced qr https://numpy.org/doc/stable/reference/generated/numpy.linalg.qr.html .
I also think that this implemtation needs to be optimised in the future to perfect the GLM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I get whats happening here:
- qLengths is a zero-vector that gets filled iteratively for each
i
in line 262- in line 256 setqOfA is called with the current i. However in l 247 qLength is called at position
i
which always should be 0, because it was initialized with 0. Just after the determination of qLength the value at this specific index is replaced in line 262.Obviously I miss a step, but is there a possibility to create the qLength by Vector.init (...) rather than keeping it mutable and accessing/mutating it at multiple positions? If not, I'm happy to merge it, but I got confused during my attempt to understand whats happening here..
The mutation was done to simplify the function to match the mathematical explanation better. This is one of the things that could be updated when I get to it.
// Computes the link function value for a given parameter. | ||
getLink = fun b -> Math.Pow(b,-2.)//1.0 / b | ||
// Computes the inverse link function value for a given parameter. | ||
getInvLink = fun a -> Math.Pow(a,(1./ -2.))//1.0 / a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the 1./2.
and later -1.-1.
are left on purpose to make the implementation easier to understand 👍
let internal signFunction x = | ||
if x > 0. then 1. | ||
elif x = 0. then 0. | ||
else -1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use the build in sign
function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
let b = System.Math.Log(a) | ||
let c = endV - muV | ||
let d = c / muV | ||
let e = -b + d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e
isn't used, is it correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping
When watching the docs |
The issue seems to be older than this PR or it is due to my setup. Could you check the docs in your machine? |
Try using the build target ( |
I still get the following error during "System.Xml.XmlException: The ',' character, hexadecimal value 0x2C, cannot be included in a name. Line 15278, position 99" @LibraChris your branch is 42 commits behind, if you could rebase, the issue may be gone |
Implement iterative reweighted least squares (IRLS) solver for generalised linear models
add qr based GLM and Statistics for it
- Update glm QR Solver by substituting a Cholesky decomposition with another QR decomposition. - Update InverseLinkFunction to be identical to python/r linker function
- Rename Poisson -> LogLinkFunction - Rename Gamme -> InverseLinkFunction
<Compile Include="GeneralisedLinearModels.fs" /> | ||
<Compile Include="Quantile.fs" /> | ||
<Compile Include="Rank.fs" /> | ||
<Compile Include="Main.fs" /> | ||
</ItemGroup> | ||
<ItemGroup> | ||
<EmbeddedResource Include="data/glm_test_lungcap.csv" CopyToOutputDirectory="PreserveNewest" /> | ||
<EmbeddedResource Include="data/glm_test_energy.csv" CopyToOutputDirectory="PreserveNewest" /> | ||
<EmbeddedResource Include="data/glm_test_cheese.csv" CopyToOutputDirectory="PreserveNewest" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please fix these indentations
/// Solves a linear system of equations using QR decomposition. | ||
/// | ||
/// Parameters: | ||
/// - A: The coefficient matrix of the linear system. | ||
/// - t: The target vector of the linear system. | ||
/// | ||
/// Returns: | ||
/// - mX: The solution vector of the linear system. | ||
/// - r: The upper triangular matrix obtained from QR decomposition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates!
I'm going to merge this PR if you just could add proper XML docs like seen in:
FSharp.Stats/src/FSharp.Stats/Fitting/LinearRegression.fs
Lines 128 to 145 in c5444cc
/// <summary> | |
/// Calculates the slope of a regression line through the origin | |
/// </summary> | |
/// <param name="xData">vector of x values</param> | |
/// <param name="yData">vector of y values</param> | |
/// <returns>Slope of the regression line through the origin</returns> | |
/// <example> | |
/// <code> | |
/// // e.g. days since a certain event | |
/// let xData = vector [|0.;1.;2.;3.;4.;5.;|] | |
/// // some measured feature, that theoretically is zero at day 0 | |
/// let yData = vector [|1.;5.;9.;13.;17.;18.;|] | |
/// | |
/// // Estimate the slope of the regression line. | |
/// let coefficients = | |
/// LinearRegression.OLS.Linear.RTO.fit xData yData | |
/// </code> | |
/// </example> |
The same for line 222 f.
Updated XML documentation for qrAlternative and solveLinearQR
Changes introduced in this PR
Description
This PR introduces the GLM (Generalized Linear Models) module in FSharp.Stats. The key features include the implementation of an Iteratively Reweighted Least Squares (IRLS) solver on a QR-based solver for GLMs, enhancing the accuracy and performance of statistical modeling. Extensive tests have been added and updated to ensure robustness, covering various link functions and variance calculations. Documentation has been thoroughly updated to assist users in leveraging these new features effectively.
[Required] please make sure you checked that
[Optional]