
Shared xxhash instance in the appender causes a data race and crashes the ingester #1384

Closed
bikashmishra100 opened this issue Apr 14, 2022 · 1 comment · Fixed by #1387

@bikashmishra100
Contributor

Describe the bug
When running load tests at a relatively small scale, we saw ingester crashes on v1.3.2 with the stack trace below. The example panic is from a find call on the completing blocks; the same panic also occurs when calling find on the head block.

We propose making the hasher instance a local variable, which fixed the panic for us; a sketch of the change is shown below the diff link. Please let us know if we can open a pull request with this fix.
main...bikashmishra100:main
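
For illustration, here is a minimal sketch of the shared-hasher problem and the proposed local-hasher change (the type and function names are hypothetical, not the exact Tempo code):

```go
// Sketch only: hash trace IDs with a hasher local to the call (or the
// stateless package-level helper) instead of a hasher field shared by
// all goroutines.
package main

import (
	"fmt"
	"hash"

	"github.com/cespare/xxhash"
)

// Before (racy): one hasher reused by every caller. Concurrent
// Reset/Write/Sum64 calls corrupt its internal buffer and can panic
// exactly as in the stack trace below.
type sharedHasherAppender struct {
	h hash.Hash64 // e.g. xxhash.New(), shared across goroutines
}

func (a *sharedHasherAppender) tokenForID(id []byte) uint64 {
	a.h.Reset()
	_, _ = a.h.Write(id)
	return a.h.Sum64()
}

// After (safe): nothing is shared, so there is nothing to race on.
func tokenForID(id []byte) uint64 {
	return xxhash.Sum64(id) // stateless; a per-call xxhash.New() works too
}

func main() {
	id := []byte("0123456789abcdef")
	fmt.Println(tokenForID(id))
}
```

Because the stateless helper (or a per-call hasher) keeps all hashing state within the call, concurrent find requests no longer share mutable state.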

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (v1.3.2)
  2. Perform operations (read/write/others), e.g. call /api/traces/xxx

Expected behavior
Crashes are not expected

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: [e.g., helm, jsonnet]

Additional Context
Panic at find on the completing blocks:

panic: runtime error: slice bounds out of range [:40] with length 32
goroutine 40879742 [running]:
github.com/cespare/xxhash.(*xxh).Sum64(0xc05abcd9f0, 0x0)
	/go/src/vendor/github.com/cespare/xxhash/xxhash.go:129 +0x575
github.com/grafana/tempo/tempodb/encoding.(*appender).RecordsForID(0xc0635d8f00, 0xc0b7f1b900, 0x10, 0x10, 0x0, 0x0, 0x0)
	/go/src/tempodb/encoding/appender.go:92 +0xc7
github.com/grafana/tempo/tempodb/wal.(*AppendBlock).Find(0xc0e472e780, 0xc0b7f1b900, 0x10, 0x10, 0x34ced20, 0x45858c8, 0x0, 0x0, 0x0, 0x0, ...)
	/go/src/tempodb/wal/append_block.go:158 +0x99
github.com/grafana/tempo/modules/ingester.(*instance).FindTraceByID(0xc042c45560, 0x35080f8, 0xc091c70720, 0xc0b7f1b900, 0x10, 0x10, 0x0, 0x0, 0x0)
	/go/src/modules/ingester/instance.go:437 +0xbe7
github.com/grafana/tempo/modules/ingester.(*Ingester).FindTraceByID(0xc041522c00, 0x35080f8, 0xc091c70720, 0xc073c86c80, 0x0, 0x0, 0x0)
	/go/src/modules/ingester/ingester.go:214 +0x491
github.com/grafana/tempo/pkg/tempopb._Querier_FindTraceByID_Handler.func1(0x35080f8, 0xc091c706f0, 0x302f360, 0xc073c86c80, 0x0, 0x0, 0x0, 0x0)
	/go/src/pkg/tempopb/tempo.pb.go:1205 +0x105
github.com/grafana/tempo/cmd/tempo/app.glob..func2(0x35080f8, 0xc091c706f0, 0x302f360, 0xc073c86c80, 0xc0b7f252e0, 0xc0ba0f87b0, 0x0, 0x0, 0x0, 0x0)
	/go/src/cmd/tempo/app/fake_auth.go:22 +0xe2
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x35080f8, 0xc091c70690, 0x302f360, 0xc073c86c80, 0x0, 0x0, 0x0, 0x0)
	/go/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0xc4
github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1(0x35080f8, 0xc091c70690, 0x302f360, 0xc073c86c80, 0xc0b7f252e0, 0xc0b7f25300, 0x0, 0x0, 0x0, 0x0)
	/go/src/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:33 +0xcf
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x35080f8, 0xc091c70690, 0x302f360, 0xc073c86c80, 0x0, 0x0, 0x0, 0x0)
	/go/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0xc4
github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1(0x35080f8, 0xc091c70690, 0x302f360, 0xc073c86c80, 0xc0b7f252e0, 0xc0b7f25320, 0x0, 0x0, 0x0, 0x0)
	/go/src/vendor/github.com/opentracing-contrib/go-grpc/server.go:57 +0x6f4
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x35080f8, 0xc091c705d0, 0x302f360, 0xc073c86c80, 0x0, 0x0, 0x0, 0x0)
	/go/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0xc4
github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor(0x3520b38, 0xc0004a3fe0, 0x7f862ba15500, 0x35080f8, 0xc091c705d0, 0x302f360, 0xc073c86c80, 0xc0b7f252e0, 0xc0b7f25340, 0x0, ...)
	/go/src/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29 +0xc9
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x35080f8, 0xc091c705d0, 0x302f360, 0xc073c86c80, 0x0, 0x0, 0x0, 0x0)
	/go/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25 +0xc4
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1(0x35080f8, 0xc091c705d0, 0x302f360, 0xc073c86c80, 0xc0b7f252e0, 0xc0ba0f87b0, 0x0, 0x0, 0x0, 0x0)
	/go/src/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34 +0x16a
github.com/grafana/tempo/pkg/tempopb._Querier_FindTraceByID_Handler(0x30b0520, 0xc041522c00, 0x35080f8, 0xc091c705d0, 0xc0e1a8bc20, 0xc00117c660, 0x0, 0x0, 0x0, 0x0)
	/go/src/pkg/tempopb/tempo.pb.go:1207 +0x24d
google.golang.org/grpc.(*Server).processUnaryRPC(0xc001102fc0, 0x351f678, 0xc078889e00, 0xc0af11d7a0, 0xc0416a4120, 0x4526c60, 0x0, 0x0, 0x0)
	/go/src/vendor/google.golang.org/grpc/server.go:1286 +0x105d
google.golang.org/grpc.(*Server).handleStream(0xc001102fc0, 0x351f678, 0xc078889e00, 0xc0af11d7a0, 0x0)
	/go/src/vendor/google.golang.org/grpc/server.go:1609 +0x9f9
google.golang.org/grpc.(*Server).serveStreams.func1.2(0xc05ae28620, 0xc001102fc0, 0x351f678, 0xc078889e00, 0xc0af11d7a0)
	/go/src/vendor/google.golang.org/grpc/server.go:934 +0xf4
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/src/vendor/google.golang.org/grpc/server.go:932 +0x2c5
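
For context, here is a minimal reproduction sketch of the underlying race (hypothetical code, not taken from Tempo): two goroutines hashing through one shared xxhash instance. Running it with `go run -race` reports the data race; without the race detector, the corrupted digest state can eventually panic inside Sum64, matching the stack trace above.

```go
package main

import (
	"sync"

	"github.com/cespare/xxhash"
)

func main() {
	h := xxhash.New() // one digest shared by both goroutines, like the appender's hasher field
	id := make([]byte, 16)

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000000; j++ {
				h.Reset()
				h.Write(id) // unsynchronized writes to shared internal state
				h.Sum64()
			}
		}()
	}
	wg.Wait()
}
```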

cc: @sagarwala

@joe-elliott
Member

Thanks for filing this issue. Yes, we would definitely take the PR in your diff. I agree that we should not be using a shared object there for hashing.

The only thing that has me scratching my head is why we have not seen this.
