Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression and memory leak in SampleBuilder@v4 #2778

Closed
at-wat opened this issue May 30, 2024 · 11 comments · Fixed by pion/interceptor#264
Closed

Performance regression and memory leak in SampleBuilder@v4 #2778

at-wat opened this issue May 30, 2024 · 11 comments · Fixed by pion/interceptor#264
Milestone

Comments

@at-wat
Copy link
Member

at-wat commented May 30, 2024

Your environment.

  • Version: v4.0.0-beta.14 and later

What did you do?

Updated pion/webrtc from v4.0.0-beta.13 to later version.

What did you expect?

SampleBuilder CPU/memory usage doesn't significantly increased.

What happened?

CPU usage is doubled in the existing benchmark, and input RTP packets aren't unreferenced after the corresponding samples are popped.

CPU

Bench name v4.0.0-beta.13 CPU v4.0.0-beta.14 CPU CPU usage increase
BenchmarkSampleBuilderSequential-32 311.4 ns/op 609.6 ns/op x1.96
BenchmarkSampleBuilderLoss-32 288.7 ns/op 720.5 ns/op x2.50
BenchmarkSampleBuilderReordered-32 320.9 ns/op 621.9 ns/op x1.93
BenchmarkSampleBuilderFragmented-32 272.8 ns/op 540.5 ns/op x1.98
BenchmarkSampleBuilderFragmentedLoss-32 246.8 ns/op 668.2 ns/op x2.71
Full benchmark outputs

v4.0.0-beta.13

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 3831741	       311.4 ns/op	     388 B/op	       5 allocs/op
BenchmarkSampleBuilderLoss-32              	 3969078	       288.7 ns/op	     344 B/op	       4 allocs/op
BenchmarkSampleBuilderReordered-32         	 3697982	       320.9 ns/op	     388 B/op	       4 allocs/op
BenchmarkSampleBuilderFragmented-32        	 4363172	       272.8 ns/op	     362 B/op	       4 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 4807190	       246.8 ns/op	     317 B/op	       3 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	7.556s

v4.0.0-beta.14

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 1979661	       609.6 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1647154	       720.5 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1881794	       621.9 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2221912	       540.5 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1816394	       668.2 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.402s

v4.0.0-beta.19

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 1978306	       602.4 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1609996	       740.5 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1897911	       628.1 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2179366	       538.9 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1744249	       684.3 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.426s

Memory

I created a test to check that the input RTP packets are unreferenced after all samples are popped.

On v4.0.0-beta.13,

=== RUN   TestSampleBuilderPacketUnreference
--- PASS: TestSampleBuilderPacketUnreference (0.05s)

all popped packets are unreferenced.

On v4.0.0-beta.14,

=== RUN   TestSampleBuilderPacketUnreference
    samplebuilder_test.go:570: 
        	Error Trace:	/home/at-wat/go/src/github.com/pion/webrtc/pkg/media/samplebuilder/samplebuilder_test.go:570
        	Error:      	Not equal: 
        	            	expected: 1
        	            	actual  : 65536
        	Test:       	TestSampleBuilderPacketUnreference
--- FAIL: TestSampleBuilderPacketUnreference (0.08s)

65536 packets aren't unreferenced.

Note

CPU and memory usage of my product software using SampleBuilder significantly increased after updating v4.0.0-beta.13 to v4.0.0-beta.19.
I'll dig into the memory usage increase which doesn't appeared in the benchmark. Found the problem on input data unreference and updated above sections.

Refs

@at-wat at-wat added this to the v4 milestone May 30, 2024
@at-wat
Copy link
Member Author

at-wat commented May 30, 2024

@thatsnotright @Sean-Der FYI

@at-wat
Copy link
Member Author

at-wat commented May 31, 2024

Also, the new SampleBuilder doesn't unreference the input RTP packets and causes kind of memory leak.
Created a PR to add a test to check that the input RTP packets are unreferenced after all samples are popped.
#2781

@at-wat at-wat changed the title Performance regression in SampleBuilder@v4 Performance regression and memory leak in SampleBuilder@v4 May 31, 2024
@at-wat
Copy link
Member Author

at-wat commented May 31, 2024

@thatsnotright could you take a deeper look?

@at-wat
Copy link
Member Author

at-wat commented Jul 1, 2024

kindly ping @thatsnotright @Sean-Der

@edaniels
Copy link
Member

@at-wat gave a shot at investigating this one and I think I found at least one issue! See pion/interceptor#264

@at-wat
Copy link
Member Author

at-wat commented Aug 7, 2024

Memory unreference problem seems be fixed, but CPU usage is still twice expensive then v4.0.0-beta.13

goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 2007697	       599.3 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1627015	       730.4 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1930000	       626.0 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2244157	       527.6 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1774668	       680.3 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.425s

@at-wat at-wat reopened this Aug 7, 2024
@edaniels
Copy link
Member

edaniels commented Aug 7, 2024

Memory unreference problem seems be fixed, but CPU usage is still twice expensive then v4.0.0-beta.13

goos: linux
goarch: amd64
pkg: github.com/pion/webrtc/v4/pkg/media/samplebuilder
cpu: AMD Ryzen 9 3950X 16-Core Processor            
BenchmarkSampleBuilderSequential-32        	 2007697	       599.3 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderLoss-32              	 1627015	       730.4 ns/op	     373 B/op	       5 allocs/op
BenchmarkSampleBuilderReordered-32         	 1930000	       626.0 ns/op	     420 B/op	       6 allocs/op
BenchmarkSampleBuilderFragmented-32        	 2244157	       527.6 ns/op	     394 B/op	       5 allocs/op
BenchmarkSampleBuilderFragmentedLoss-32    	 1774668	       680.3 ns/op	     346 B/op	       4 allocs/op
PASS
ok  	github.com/pion/webrtc/v4/pkg/media/samplebuilder	9.425s

Whoops sorry! Referencing the issue and merging auto closed this

@at-wat
Copy link
Member Author

at-wat commented Aug 7, 2024

I tested v4.0.0-beta.27 in my product app again.
Memory usage is fixed, but CPU usage is bumped from 10% to 100% (as same as I tried v4.0.0-beta.14), so it seems still unusable for our case.

@edaniels
Copy link
Member

edaniels commented Aug 7, 2024

Should we revert that code for now?

@Sean-Der
Copy link
Member

Sean-Der commented Aug 7, 2024

Can you give me a week I can look into perf regression.

I don’t think we will ever fix it otherwise. No one is actively maintaining/improving this part

@edaniels
Copy link
Member

edaniels commented Aug 7, 2024

Sure!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants