Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GForce optimize weighted.mean #3977

Closed
renkun-ken opened this issue Oct 18, 2019 · 4 comments · Fixed by #5246
Closed

GForce optimize weighted.mean #3977

renkun-ken opened this issue Oct 18, 2019 · 4 comments · Fixed by #5246
Assignees
Labels
GForce issues relating to optimized grouping calculations (GForce)
Milestone

Comments

@renkun-ken
Copy link
Member

I'm not sure if it is possible to GForce optimize weighted.mean? It's quite different from existing GForce functions because it takes two variables.

@MichaelChirico
Copy link
Member

I think it should be doable. Several other g functions are taking several arguments (though none taking several vector arguments...)

@renkun-ken
Copy link
Member Author

Yes, that's exactly what I was trying to say. It'd be great if it can be supported since I use a lot, one weighted.mean in the call makes everything broken, sometimes I have to single out this and merge later.

@jangorecki jangorecki added the GForce issues relating to optimized grouping calculations (GForce) label Oct 18, 2019
@myoung3
Copy link
Contributor

myoung3 commented Jan 18, 2020

+1 for this. The current workaround is to first take a product and assign it a new column, then doing the sum in a subsequent by-group call.

@ben-schwen
Copy link
Member

ben-schwen commented Oct 29, 2021

I took a look into this and we have basically two options.

  1. Write everything as a new gforce function
  2. Implement it like this gweighted.mean = function(x, w, ..., na.rm=FALSE) { if (is.null(w)) gmean(x, na.rm) else gsum(x*w, na.rm) / gsum(w, na.rm)}

Advantages / Disadvantages

Option 2 is easy maintenance since it dispatches everything onto gsum/gmean but is less verbose and might return confusing error messages/warnings.

Benchmarking both, it turns out that option 2 is also faster than option 1 (at least with a naive implementation similar to gvar/gsd) since gsum is heavily optimized. However, option 1 is numerically more stable.

@mattdowle mattdowle added this to the 1.14.3 milestone Dec 3, 2021
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GForce issues relating to optimized grouping calculations (GForce)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants