
Ingester panics when no messages is processed in 1 minute #1125

Closed
marqc opened this issue Oct 17, 2018 · 3 comments
@marqc
Contributor

marqc commented Oct 17, 2018

Requirement - what kind of business use case are you trying to solve?

Forward traces from Kafka storage to Elasticsearch storage with jaeger-ingester deployed to a Kubernetes cluster. There are periods when the traced system is not in use and no traces are being recorded.

Problem - what in Jaeger blocks you from solving the requirement?

When the ingester does not process any message for one minute (time.Minute), the process dies with:

{"level":"panic","ts":1539766162.6593273,"caller":"consumer/deadlock_detector.go:69","msg":"No messages processed in the last check interval"

That stops the Docker container. Kubernetes brings it back, but with exponential backoff, so each restart makes the pause longer.
Without Kubernetes or a systemd-style manager that keeps restarting the ingester, the process would simply die and stay down.

Proposal - what do you suggest to solve the problem or improve the existing situation?

  • Trust the Kafka client's (Sarama's) built-in failure detection mechanism, possibly in combination with exposing Kafka consumer options (like read message timeouts) as configurable.
  • Restart the consumer within the running process (build a new consumer and re-bootstrap the app without exiting).
  • Expose configuration for the deadlock_detector tick duration; this does not solve the problem, but helps manage its impact on the business.

Any open questions to address

Which of the proposed solutions is best?

@yurishkuro
Member

This was done to address a number of issues with the sarama lib (#1052) until they are fixed upstream.

I think exposing the timeout setting via CLI/config is a good idea, and setting it to 0 should turn off the self-termination behavior.

@yurishkuro
Member

@marqc are you interested in creating a pull request to fix that?

Also, consider adding yourself to #207.

@marqc
Contributor Author

marqc commented Oct 19, 2018

@yurishkuro Yes, I will take a look next week.
