Fix panic when max_span_count is reached, add counter metric #104

Merged
merged 5 commits into from
Jan 25, 2022

Commits on Jan 25, 2022

  1. Fix panic when max_span_count is reached, add counter metric

    Panic seen in `ghcr.io/jaegertracing/jaeger-clickhouse:0.8.0` with `log-level=debug`:
    ```
    panic: undefined type *clickhousespanstore.WriteWorker return from workerHeap
    
    goroutine 20 [running]:
    github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*WriteWorkerPool).CleanWorkers(0xc00020c300, 0xc00008eefc)
    	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/pool.go:95 +0x199
    github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*WriteWorkerPool).Work(0xc00020c300)
    	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/pool.go:50 +0x15e
    created by github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore.(*SpanWriter).backgroundWriter
    	github.com/jaegertracing/jaeger-clickhouse/storage/clickhousespanstore/writer.go:89 +0x226
    ```
    
    Also adds a counter metric and logging to surface when writes are hitting backpressure.
    
    Signed-off-by: Nick Parker <[email protected]>
    Nick Parker committed Jan 25, 2022
    2d27c11
  2. Potential fix for deadlock: Avoid holding mutex while waiting on close

    Signed-off-by: Nick Parker <[email protected]>
    Nick Parker committed Jan 25, 2022
    030ef89
  3. Discard new batches instead of waiting for old batches to finish

    The current limit logic can result in a stall where `worker.Close()` never returns because ClickHouse keeps returning errors.
    This switches to a simpler system that discards new work once the limit is reached, ensuring that we don't get backed up indefinitely during a long outage.
    
    Also moves the count of pending spans to the parent pool:
    - Avoids race conditions where new work can be started before it's added to the count
    - Mutexing around the count is no longer needed
    
    Signed-off-by: Nick Parker <[email protected]>
    Nick Parker committed Jan 25, 2022
    b7294e7
  4. Include arbitrary worker_id in logs to differentiate between retry loops

    Signed-off-by: Nick Parker <[email protected]>
    Nick Parker committed Jan 25, 2022
    2ffa2cb
  5. Fix lint

    Signed-off-by: Nick Parker <[email protected]>
    Nick Parker committed Jan 25, 2022
    237e63c