
[sampling] sampling on counts to be disabled when aggregation is enabled. #127

Merged
merged 2 commits into master from jaime/sampling
Nov 17, 2020

Conversation

truthbk
Member

@truthbk truthbk commented Nov 16, 2020

For data consistency, should aggregation be enabled, we will disable sampling for counts. This is because we cannot really guarantee a constant sample rate for any given count, so keeping track of samples could introduce statistical inconsistencies.

The main goal of sampling was to reduce traffic on the wire anyway, and we already achieve that by enabling aggregation, so this change should have no ill effects for the user.
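To make the intent concrete, here is a minimal sketch of the gating logic. The class, method, and enum names below are hypothetical, not the client's actual API; the point is only that the up-front sampling decision is bypassed for counts whenever client-side aggregation is enabled, so every count reaches the aggregator and the flushed totals stay consistent.

```java
// Illustrative sketch only -- these names are invented, not the client's real internals.
import java.util.concurrent.ThreadLocalRandom;

final class SamplingGateSketch {

    enum MetricType { COUNT, GAUGE, SET, HISTOGRAM, DISTRIBUTION }

    static boolean shouldSend(MetricType type, double sampleRate,
                              boolean aggregationEnabled) {
        // With aggregation on, counts are never dropped client-side.
        if (aggregationEnabled && type == MetricType.COUNT) {
            return true;
        }
        // Otherwise the usual up-front sampling decision applies.
        return sampleRate >= 1.0
                || ThreadLocalRandom.current().nextDouble() < sampleRate;
    }
}
```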

Member

@olivielpeau olivielpeau left a comment


This makes sense 👍

A couple of questions (that are not blockers for this PR):

  1. any reason to filter specifically on the COUNT metric type in this logic? (COUNT is definitely the metric type that's most affected, but I feel this logic could apply to all metric types, unless I'm missing something such as performance considerations)
  2. have you considered the approach taken in the .NET client as well? (Handle sample rate in CountAggregator, dogstatsd-csharp-client#143) cc @ogaca-dd for visibility

@truthbk
Member Author

truthbk commented Nov 16, 2020

About (1), I thought about that. I ended up doing counts only because they're the most affected statistically. Sampling on gauges is straight-up dangerous, no? How do you upscale a value with such potential variance? But indeed, users could still be using it, so maybe I should just do this for all types. Sets would definitely be affected too, so let me address that.

Implementing the approach in (2) would require a larger change, because in .NET the sampling is taken care of in the aggregator, whereas in Java it happens up-front. We could certainly change that, but it would be a bigger change that I don't think is necessarily worth it.
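For illustration, a rough sketch of what the aggregator-side placement could look like; the class and method names here are invented, and the real CountAggregator in dogstatsd-csharp-client#143 may differ in detail. The idea is that the sample still reaches the aggregator, which upscales it by 1/rate so the flushed total is an unbiased estimate of the true count.

```java
// Hypothetical sketch of aggregator-side sample-rate handling; not the .NET client's actual code.
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.DoubleAdder;

final class CountAggregatorSketch {
    private final DoubleAdder total = new DoubleAdder();

    // Keep the sampling decision local to the aggregator and fold the
    // upscaled value into the running total.
    void addSampled(double value, double sampleRate) {
        if (sampleRate >= 1.0
                || ThreadLocalRandom.current().nextDouble() < sampleRate) {
            total.add(value / sampleRate); // upscale so the flushed total is unbiased
        }
    }

    double flush() {
        return total.sumThenReset();
    }
}
```

Moving the Java client to this model would mean relocating the sampling decision from the send path into the aggregator, which is exactly the restructuring described above as not worth it right now.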

@olivielpeau
Member

> Sampling on gauges is straight-up dangerous, no? How do you upscale a value with such potential variance?

Yeah, I can't really think of use cases for sample rates on gauges. FWIW, on the dogstatsd server (agent) side, the sample rate is used only for count, histogram and distribution metrics (for histograms, to upscale the .count).
So for metric types such as gauge, client-side performance is all that matters, and it's probably less costly to take the sample rate into account (i.e. generate a random number and drop the sample depending on its value) than to send the sample for aggregation. So it would make more sense to keep this PR focused on specific metric types.
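As a concrete illustration of that server-side upscaling (a hypothetical snippet, not the agent's actual code): a count received with sample rate r contributes value / r to the flushed total, which is why the server only needs the rate for counts, histograms and distributions.

```java
// Hypothetical illustration of server-side upscaling of sampled counts.
final class ServerUpscaleSketch {

    // e.g. "page.views:1|c|@0.5" contributes 1 / 0.5 = 2 to the bucket
    static double contribution(double value, double sampleRate) {
        return value / sampleRate;
    }

    public static void main(String[] args) {
        System.out.println(contribution(1.0, 0.5)); // prints 2.0
    }
}
```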

> Implementing the approach in (2) would require a larger change, because in .NET the sampling is taken care of in the aggregator, whereas in Java it happens up-front. We could certainly change that, but it would be a bigger change that I don't think is necessarily worth it.

ok, thanks for the explanation, makes sense!

@truthbk
Member Author

truthbk commented Nov 17, 2020

OK, so sets do not currently admit sampling on the client, so I will scope this solely to counts for now. When aggregation of complex types is introduced, we will do the same for histograms and distributions.

I will thus merge this as-is.

@truthbk truthbk merged commit 73818a8 into master Nov 17, 2020
@truthbk truthbk deleted the jaime/sampling branch November 17, 2020 13:34