Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support global scoped histograms that only report aggregate values at global tier. #569

Merged
merged 12 commits into from
Oct 25, 2018

Conversation

clin-stripe
Copy link
Contributor

@clin-stripe clin-stripe commented Oct 23, 2018

Currently, histograms only support local and mixedscope. Local scope sends all histogram values from local, forwarding veneurs.

Mixed scopes are more complicated: aggregate values like min, max, count, etc are sent by local veneurs but percentiles are aggregated globally.

To reduce metric cardinality and volume, we want to support globally scoped histograms for which all values are aggregated and reported by the global veneur.

This diff accomplishes that by adding a concept of global histograms inside veneur and the metric protobuf so the scope can be threaded to the global veneur.

Of note, it is necessary to add a hacky flag to the Histogram's Flush method indicating whether we want to flush the locally accumulated aggregate values or those from the globally aggregated histogram. We also had to add support to the tdigest data structure to support global merging.

Motivation

Test plan

  1. Additional test coverage for the changes in forwarding and histogram flushing behavior.
  2. Set up a mock global veneur tier configuration using the two additional yaml files being checked in.
  3. QA staging environment.
  4. Canarying in prod.

Copy link
Contributor

@sjung-stripe sjung-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementation looks great to me, nice work. failing test:

05:31:10 --- FAIL: TestLocalServerMixedMetrics (0.01s)
05:31:10     <autogenerated>:1: 
                          
	Error Trace:	server_test.go:406
05:31:10         
	Error:      	Not equal: 
05:31:10         
	            	expected: &tdigest.MergingDigest{compression:100, mainCentroids:[]tdigest.Centroid{tdigest.Centroid{Mean:1, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:2, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:7, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:8, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:100, Weight:1, Samples:[]float64(nil)}}, mainWeight:5, tempCentroids:[]tdigest.Centroid{}, tempWeight:0, min:1, max:100, reciprocalSum:0, debug:false}
05:31:10         
	            	actual  : &tdigest.MergingDigest{compression:100, mainCentroids:[]tdigest.Centroid{tdigest.Centroid{Mean:1, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:2, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:7, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:8, Weight:1, Samples:[]float64(nil)}, tdigest.Centroid{Mean:100, Weight:1, Samples:[]float64(nil)}}, mainWeight:5, tempCentroids:[]tdigest.Centroid{}, tempWeight:0, min:1, max:100, reciprocalSum:1.7778571428571428, debug:false}

this test contains a hardcoded gob, so it'll have to be adjusted. as for other tests:

  • e2e tests to make sure forwarding works with the new behavior (i can show you what these look like if you need a hand)
  • test to make sure old gobs can still be decoded without error

you'll also need to rebase because this repo is configured for squashes only

@clin-stripe clin-stripe changed the title WIP Support global scoped histograms that only report aggregate values at global tier. Oct 25, 2018
@stripe-ci
Copy link

Gerald Rule: Copy Observability on Veneur, Unilog, Falconer pull requests

cc @stripe/observability
cc @stripe/observability-stripe


// ensure histogram with no local samples flush aggregates for global flushes
// but no aggregates for non-global (mixed scope) flushes.
func TestGlobalHistoFlushBehavior(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of writing new E2E tests, I wrote these unit tests to validate this behavior change. The other behavior changes (forwarding, mainly) were covered by adding new assertions.

Copy link
Contributor

@sjung-stripe sjung-stripe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GOOD SHIT

@sjung-stripe
Copy link
Contributor

ah, looks like it needs gofmt

}

// ScopeFromPB creates an internal MetricScope type from the protobuf Scope type.
func ScopeFromPB(scope metricpb.Scope) MetricScope {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ two spaces after MetricScope here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants