-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1.) Add sum and reciprocalsum to merging tdigest. (Needs tests) #568
Conversation
Gerald Rule: Copy Observability on Veneur, Unilog, Falconer pull requestscc @stripe/observability |
hmm, wonder what's going on with travis tip - looks unrelated to your thing though so i'd just ignore it for now. implementation looks good to me. adding tests should be pretty straightforward (similar to https://github.com/stripe/veneur/blob/master/tdigest/histo_test.go#L22) - basically build up a histogram of random samples, ask the histogram for its sum, compare that to the real sum, and make sure they're within a reasonable error bound. remember to also test that the values are correct after merging two tdigests together |
tdigest/merging_digest.go
Outdated
} | ||
func (td *MergingDigest) Sum() float64 { | ||
var s float64 | ||
for _, cent := range td.Centroids() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, you want to iterate over td.mainCentroids and td.tempCentroids here. the function is only included for debugging/analysis purposes
(you'll need to iterate over both of the arrays, because they don't get combined together until you call td.mergeAllTemps()
)
|
||
td.reciprocalSum = oldReciprocalSum + other.reciprocalSum | ||
td.min = math.Min(other.min, td.min) | ||
td.max = math.Max(other.max, td.max) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these lines are unnecessary, they'll get updated by td.Add
as the other histogram gets merged in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair, but isn't this slightly more accurate, since we only use the Mean of the smallest centroid? In practice, what kind of error does the algorithm produce?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tdigest approximation is designed to preserve accuracy at the extreme quantiles (ie the smallest and largest samples will almost never be combined into a larger bucket), so the error between explicitly updating the min/max and recomputing it by adding centroids "should" be small. i can show you how to run the analysis, but i'm too lazy to do it myself so you can treat this comment as nonblocking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah. I'll drop these lines for succinctness.
One case worth considering is returning a "0.001" value instead of 0. It can be potentially confusing when a value that should have a 0 min has something close to it. But most people will probably dismiss it as floating point error.
@@ -408,6 +430,9 @@ func (td *MergingDigest) GobDecode(b []byte) error { | |||
if err := dec.Decode(&td.max); err != nil { | |||
return err | |||
} | |||
if err := dec.Decode(&td.reciprocalSum); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might need to special case for io.EOF
here, so that clients who aren't sending reciprocalSum
can fail gracefully 😬 (in my defense, gob seemed fine back then... i have regrets now though)
i assume this is redundant now that we have #569 ? |
Correct.
…On Tue, Oct 23, 2018 at 12:18 PM Stephen Jung ***@***.***> wrote:
i assume this is redundant now that we have #569
<#569> ?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#568 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/Ap1ZuNdRwBta6cLX8czbmf-X9eLY-vpAks5un2uQgaJpZM4X0XFU>
.
|
Summary
We'll need these summaries to report global histograms since these are the result of merging local digests together. Previously, we tracked these summaries manually.
Note that you cannot derive the harmonic mean from arithmetic means alone (the algebra requires geometric means as well). This is why we track reciprocal sums separately.
Motivation
Test plan
Rollout/monitoring/revert plan
r? @sjung-stripe
cc @stripe/observability