Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pprof grouping for samples with span_id #3450

Merged
merged 4 commits into from
Jul 31, 2024

Conversation

korniltsev
Copy link
Collaborator

@korniltsev korniltsev commented Jul 30, 2024

The following code:

pprof.Do(context.Background(), pprof.Labels("function", "slow", "qwe", "asd", "asdasd", "zxczxc"), func(c context.Context) {
   work(40000)
   pprof.Do(c, pprof.Labels("span_id", "239"), func(c context.Context) {
     work(40000)
   })
})

may produce sampels with the folowing labels

(1,2) (3,4) (5,6)
(1,2) (3,4) (5,6) (7, 8)

After applying LabelsWithout they will look like:

(5,6) (3,4) (1,2) 
(1,2) (5,6) (3,4) | (7, 8)  // | denotes slice length and to the right is the filtered label

Next CompareSampleLabels will fail to recognize that these two label sets are equals
because first one is in reverse sorted order and the other is not.

In this PR we change FilterLabelsInPlace function to maintain sorted order of non filtered labels,
maintain reverse sorted order of filtered labels and put the filtered labels on the right side in place
so that slices.Reverse is not needed.

@korniltsev korniltsev requested a review from a team as a code owner July 30, 2024 15:42
@korniltsev
Copy link
Collaborator Author

Hmm, something is flaky in our integration test

 helper.go:302: 
        	Error Trace:	/home/korniltsev/p/pyroscope/pkg/test/integration/helper.go:302
        	            				/home/korniltsev/p/pyroscope/pkg/test/integration/ingest_pprof_test.go:264
        	Error:      	"0" is not greater than "1"
        	Test:       	TestIngest/../../../pkg/og/convert/pprof/testdata/req_2.pprof
        	Messages:   	{"code":"unknown","message":"canceled: stream error: stream ID 7; CANCEL"}%!(EXTRA *http.Response=&{500 Internal Server Error 500 HTTP/1.1 1 1 map[Content-Length:[74] Content-Type:[application/json] Date:[Tue, 30 Jul 2024 16:13:35 GMT] Vary:[Accept-Encoding]] 0xc0125a8ec0 74 [] false false map[] 0xc00301d680 <nil>})
    helper.go:303: 
        	Error Trace:	/home/korniltsev/p/pyroscope/pkg/test/integration/helper.go:303
        	            				/home/korniltsev/p/pyroscope/pkg/test/integration/ingest_pprof_test.go:264
        	Error:      	"0" is not greater than "1"
        	Test:       	TestIngest/../../../pkg/og/convert/pprof/testdata/req_2.pprof
        	Messages:   	{"code":"unknown","message":"canceled: stream error: stream ID 7; CANCEL"}

Not sure if It was like this before this PR.

Copy link
Collaborator

@kolesnikovae kolesnikovae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for fixing this!

Yeah, the flaky integration test is worth investigating. Did you test the fix with "real" data? Our Go simple example should produce the data we need


Update:

I'm wondering if you discovered this issue in the wild – just trying to understand if this affected anyone.

IIRC, the logic here was that a situation when we have series that would match after the removal of span_id should not be possible:

(1,2) (3,4) (5,6)
(1,2) (3,4) (5,6) (7, 8)

When we set span_id, we also set span_name, which inevitably results in a unique combination of labels, but never exists alone.

Thus, we can have:

[ label set ] span_name span_id

Or:

[ label set ]

But never:

[ label set ] span_name

So, when we then remove span_id, no collisions are expected. For grouping to work, the order of the labels should be deterministic, but not necessarily defined as descending or ascending.

Now I understand that it may break in the case when a trace has been sampled – I'm not super sure how that's handled in all the SDKs.

The described logic is tailored to our use case. I really like the way it is handled in the PR – we should definitely make use of it

@korniltsev
Copy link
Collaborator Author

I have not seen this in the wild, just found it while reading the code.
I am not aware of any use cases other than the one you shared (sampled spans)
I guess "advanced" / API users can specify whatever they want and we better be handling it correctly.

@korniltsev
Copy link
Collaborator Author

I observed test flakiness in main branch #3454
I assume it was flaky before #3450 so merging.

@korniltsev korniltsev merged commit 96cad4a into main Jul 31, 2024
18 checks passed
@korniltsev korniltsev deleted the korniltsev/fix_pprof_split branch July 31, 2024 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants