Defining Group size programmatically #289

Closed
DarioSarra opened this issue Jun 14, 2024 · 4 comments

Comments

@DarioSarra

I am using Group to calculate the mean of each column of a Matrix. However, the number of columns in that Matrix is variable. I can't find a way to build the Group so that each Mean() is calculated separately unless I hardcode the number of columns. In the example below, the hardcoded Group g1 works correctly, while the Group g2, built from a collection, clumps all the data together.

using OnlineStats, LinearAlgebra
mat = rand(100,4)
g1 = 4Mean()
fit!(g1, LinearAlgebra.eachrow(mat))

Group
├─ Mean: n=100 | value=0.47693
├─ Mean: n=100 | value=0.517347
├─ Mean: n=100 | value=0.445965
└─ Mean: n=100 | value=0.515843
g2 = Group(fill(Mean(), 4))
fit!(g2, LinearAlgebra.eachrow(mat))

Group
├─ Mean: n=400 | value=0.489021
├─ Mean: n=400 | value=0.489021
├─ Mean: n=400 | value=0.489021
└─ Mean: n=400 | value=0.489021

Subquestion: Eventually, I would like to be able to calculate the means along the 3rd dimension of a 3D tensor. I'd be grateful if someone could help with that, too.
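One possible reading of the subquestion, sketched under assumptions: keep one Mean per (i, j) position and feed it the values found along the 3rd dimension, one slice at a time. The array A and its sizes are made up for illustration, and the sketch relies on Group accepting a collection of distinct stats and treating a plain vector of numbers as a single observation.

```julia
using OnlineStats

A = rand(3, 4, 50)                      # hypothetical 3D tensor, sizes chosen for illustration
nr, nc = size(A, 1), size(A, 2)

# One independent Mean per (i, j) cell; the comprehension calls Mean() once per cell
g = Group([Mean() for _ in 1:(nr * nc)])

# Each slice along dimension 3 is flattened (column-major) and passed as one observation
for k in axes(A, 3)
    fit!(g, vec(A[:, :, k]))
end

# Put the running means back into matrix shape (reshape is column-major too)
means = reshape([value(s) for s in g.stats], nr, nc)
```

If this reading is right, means[i, j] should track the average of A[i, j, :], and new slices can keep being fitted as they arrive.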

@DarioSarra
Author

DarioSarra commented Jun 14, 2024

After some investigating (my confusion probably came from my unfamiliarity with Julia's syntax), I understood that 4Mean() is shorthand for the multiplication 4 * Mean(). The pooled result in g2 seems to come from fill(Mean(), 4), which calls Mean() once and puts that same object in all four slots, so every value was passed to the one shared stat. So, the solution is:

mat = rand(100,4);
g1 = 4Mean();
g2 = Group(fill(Mean(), 4));
n = 4;
g3 = n * Mean();

fit!(g1, LinearAlgebra.eachrow(mat));
Group
├─ Mean: n=100 | value=0.533386
├─ Mean: n=100 | value=0.503356
├─ Mean: n=100 | value=0.469497
└─ Mean: n=100 | value=0.435189

fit!(g2, LinearAlgebra.eachrow(mat));
Group
├─ Mean: n=400 | value=0.485357
├─ Mean: n=400 | value=0.485357
├─ Mean: n=400 | value=0.485357
└─ Mean: n=400 | value=0.485357

fit!(g3, LinearAlgebra.eachrow(mat))
Group
├─ Mean: n=100 | value=0.533386
├─ Mean: n=100 | value=0.503356
├─ Mean: n=100 | value=0.469497
└─ Mean: n=100 | value=0.435189

g1 == g3  # true

It might be worth adding an example of this construction method to the docs of Group.

@joshday
Owner

joshday commented Jun 14, 2024

I actually thought I had removed the n * stat syntax to create a Group. I know that I meant to, which is why it isn't in the docstring.

I will, however, add an example of passing a collection to Group.
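A minimal sketch of what passing a collection to Group might look like when the size is only known at runtime. It assumes Group keeps the stats it is given without copying them, so the collection has to hold distinct objects: a comprehension calls Mean() once per element, whereas fill repeats a single object. The variable names are illustrative only.

```julia
using OnlineStats

n = 4                                   # number of columns, known only at runtime
mat = rand(100, n)

shared = fill(Mean(), n)                # Mean() is evaluated once, then repeated n times
shared[1] === shared[2]                 # true: every slot is the same object

separate = [Mean() for _ in 1:n]        # Mean() is evaluated n times
separate[1] === separate[2]             # false: independent objects

g = Group(separate)
fit!(g, eachrow(mat))                   # eachrow is exported from Base

# Number of observations seen by each stat in the group
counts = [nobs(s) for s in g.stats]
```

With distinct objects, each Mean should see one value per row, which would match the n=100 output of the 4Mean() form shown earlier in the thread.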

@joshday joshday closed this as completed Jun 14, 2024
@DarioSarra
Author

If you are planning to remove this method, note that the example that led me to use it is in the docs under Details of Updating (fit!).

This might not be the right place to ask a question, but I think it's related. My final goal was to compute N separate Means after vectorizing a CircularBuffer matrix, and to update the means after a certain number of steps of the circular buffer. However, this doesn't seem to work either:

using OnlineStats, LinearAlgebra, DataStructures

mat = rand(10,5)
resh = reshape(mat, (size(mat,1) * size(mat,2), 1))
g = length(resh) * Mean()
fit!(g, LinearAlgebra.eachrow(resh))

In this example, each Mean ends up taking 50 inputs instead of 1.
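If the intent is for each of the 50 elements to feed its own Mean once per update, one option might be to pass the flattened matrix as a single observation rather than iterating over its rows. This is only a guess at the goal: it assumes Group treats a plain vector of numbers as one observation, and it leaves the CircularBuffer out.

```julia
using OnlineStats

mat = rand(10, 5)

# One independent Mean per matrix element
g = Group([Mean() for _ in 1:length(mat)])

# vec(mat) is a single length-50 observation, so each Mean receives exactly one value
fit!(g, vec(mat))

# Number of observations seen by each stat in the group
counts = [nobs(s) for s in g.stats]
```

Calling fit!(g, vec(mat)) again after the buffer has advanced would then add one more value to each Mean.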

@joshday
Owner

joshday commented Jun 14, 2024

Ah, thanks for the pointer to that example.


I would expect the fit! in your example to be an error. I'm not entirely sure what you're trying to do there, but if there's a bug, let's put it in a new issue.
