Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.) Add sum and reciprocalsum to merging tdigest. (Needs tests) #568

Closed
wants to merge 2 commits into from

Conversation

clin-stripe
Copy link
Contributor

@clin-stripe clin-stripe commented Oct 22, 2018

Summary

We'll need these summaries to report global histograms since these are the result of merging local digests together. Previously, we tracked these summaries manually.

Note that you cannot derive the harmonic mean from arithmetic means alone (the algebra requires geometric means as well). This is why we track reciprocal sums separately.

Motivation

Test plan

Rollout/monitoring/revert plan

r? @sjung-stripe
cc @stripe/observability

@stripe-ci
Copy link

Gerald Rule: Copy Observability on Veneur, Unilog, Falconer pull requests

cc @stripe/observability
cc @stripe/observability-stripe

@sjung-stripe
Copy link
Contributor

sjung-stripe commented Oct 22, 2018

hmm, wonder what's going on with travis tip - looks unrelated to your thing though so i'd just ignore it for now. implementation looks good to me. adding tests should be pretty straightforward (similar to https://github.com/stripe/veneur/blob/master/tdigest/histo_test.go#L22) - basically build up a histogram of random samples, ask the histogram for its sum, compare that to the real sum, and make sure they're within a reasonable error bound. remember to also test that the values are correct after merging two tdigests together

}
func (td *MergingDigest) Sum() float64 {
var s float64
for _, cent := range td.Centroids() {
Copy link
Contributor

@sjung-stripe sjung-stripe Oct 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you want to iterate over td.mainCentroids and td.tempCentroids here. the function is only included for debugging/analysis purposes

(you'll need to iterate over both of the arrays, because they don't get combined together until you call td.mergeAllTemps())


td.reciprocalSum = oldReciprocalSum + other.reciprocalSum
td.min = math.Min(other.min, td.min)
td.max = math.Max(other.max, td.max)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these lines are unnecessary, they'll get updated by td.Add as the other histogram gets merged in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, but isn't this slightly more accurate, since we only use the Mean of the smallest centroid? In practice, what kind of error does the algorithm produce?

Copy link
Contributor

@sjung-stripe sjung-stripe Oct 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tdigest approximation is designed to preserve accuracy at the extreme quantiles (ie the smallest and largest samples will almost never be combined into a larger bucket), so the error between explicitly updating the min/max and recomputing it by adding centroids "should" be small. i can show you how to run the analysis, but i'm too lazy to do it myself so you can treat this comment as nonblocking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah. I'll drop these lines for succinctness.

One case worth considering is returning a "0.001" value instead of 0. It can be potentially confusing when a value that should have a 0 min has something close to it. But most people will probably dismiss it as floating point error.

@@ -408,6 +430,9 @@ func (td *MergingDigest) GobDecode(b []byte) error {
if err := dec.Decode(&td.max); err != nil {
return err
}
if err := dec.Decode(&td.reciprocalSum); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might need to special case for io.EOF here, so that clients who aren't sending reciprocalSum can fail gracefully 😬 (in my defense, gob seemed fine back then... i have regrets now though)

@sjung-stripe
Copy link
Contributor

i assume this is redundant now that we have #569 ?

@clin-stripe
Copy link
Contributor Author

clin-stripe commented Oct 23, 2018 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants