
[BUG] Broker pods die if tracing configmap is malformed on start #1417

Closed
grantr opened this issue Jun 17, 2019 · 8 comments
Labels
area/observability kind/bug Categorizes issue or PR as related to a bug. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Comments

@grantr
Contributor

grantr commented Jun 17, 2019

Describe the bug
If the tracing configmap is updated to include the key enabled: "true" but no zipkin-endpoint key, the Broker pods log an error but keep operating. They will fail to start upon the next restart.

Expected behavior
The pod should use the default value, or disable tracing and continue. Tracing is an observability feature and its misconfiguration shouldn't cause the Broker to stop working. The fact that the bug is triggered by a restart means the failure will be difficult to associate with the root cause.

To Reproduce

  1. Create a config-tracing configmap in the Broker's namespace with the key enabled: "true".
  2. The Broker pods log an error but proceed.
  3. Restart one or both of the Broker pods.
  4. The Broker pods fail to start.
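The ConfigMap in step 1 might look like the following (a sketch; the `data` keys `enabled` and `zipkin-endpoint` are the ones named in this report, while the namespace is a placeholder for the Broker's namespace):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-tracing
  namespace: default      # substitute the Broker's namespace
data:
  enabled: "true"         # tracing turned on...
  # ...but no zipkin-endpoint key, which is the malformed state
  # that crashes the pods on their next restart.
```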

Knative release version
Eventing HEAD, Serving 0.6

Additional context

@grantr grantr added the kind/bug Categorizes issue or PR as related to a bug. label Jun 17, 2019
@grantr grantr changed the title [BUG] Broker pods die if tracing configmap is malformed [BUG] Broker pods die if tracing configmap is malformed on start Jun 17, 2019
@akashrv
Contributor

akashrv commented Jun 20, 2019

/priority important-soon

@knative-prow-robot knative-prow-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jun 20, 2019
@akashrv
Contributor

akashrv commented Jun 20, 2019

/milestone v0.8.0

@daisy-ycguo
Member

in-memory-channel-dispatcher uses the same logic to manage tracing and has the same problem. Without zipkin-endpoint configured, the in-memory-channel-dispatcher pod errors on startup.

$ kubectl get pods -n knative-eventing
NAME                                            READY     STATUS    RESTARTS   AGE
eventing-controller-bd99fd4c6-h7hbd             1/1       Running   0          6h38m
eventing-webhook-589b7cbf6-m5f9q                1/1       Running   0          6h38m
in-memory-channel-controller-67fb687b6d-5q9b2   1/1       Running   0          30s
in-memory-channel-dispatcher-755fc5dd9c-h546d   0/1       Error     2          26s
sources-controller-ffc8988b-kb4wh               1/1       Running   0          6h38m

So the logic that manages tracing with Zipkin should be reviewed and updated.

@daisy-ycguo
Member

daisy-ycguo commented Jul 3, 2019

Several pods with dependency on Zipkin are affected by this issue, not only broker and in-memory-channel-dispatcher. They are:

  • broker filter
  • broker ingress
  • in-memory channel dispatcher
  • gcppubsub channel dispatcher
  • kafka channel dispatcher
  • natss channel dispatcher

The root cause is NewTracingConfigFromMap in vendor/knative.dev/pkg/tracing/config/trace.go, which returns nil when the tracing configmap is malformed. The activator pod in Serving is affected by this issue too.

So we'd better create an issue in Serving and fix it there.
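The fix the expected-behavior section asks for can be sketched as a wrapper that falls back to disabled tracing instead of propagating the parse failure to startup. This is an illustrative sketch, not the actual knative.dev/pkg API: the type and function names (`TracingConfig`, `newTracingConfigFromMap`, `tracingConfigOrDefault`) are assumptions made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// TracingConfig holds the fields read from the config-tracing ConfigMap.
// (Illustrative; not the real knative.dev/pkg/tracing/config types.)
type TracingConfig struct {
	Enabled        bool
	ZipkinEndpoint string
}

// newTracingConfigFromMap parses the ConfigMap data and fails when tracing
// is enabled without an endpoint -- the condition described in this issue.
func newTracingConfigFromMap(data map[string]string) (*TracingConfig, error) {
	tc := &TracingConfig{
		Enabled:        data["enabled"] == "true",
		ZipkinEndpoint: data["zipkin-endpoint"],
	}
	if tc.Enabled && tc.ZipkinEndpoint == "" {
		return nil, errors.New("tracing enabled but no zipkin-endpoint specified")
	}
	return tc, nil
}

// tracingConfigOrDefault is the proposed behavior: on a malformed map, log
// the error and continue with tracing disabled rather than failing startup.
func tracingConfigOrDefault(data map[string]string) *TracingConfig {
	tc, err := newTracingConfigFromMap(data)
	if err != nil {
		fmt.Printf("malformed tracing config, disabling tracing: %v\n", err)
		return &TracingConfig{Enabled: false}
	}
	return tc
}

func main() {
	// Reproduces the bug's trigger: enabled with no endpoint.
	cfg := tracingConfigOrDefault(map[string]string{"enabled": "true"})
	fmt.Println(cfg.Enabled) // false: the pod keeps running, tracing off
}
```

With this shape, a misconfigured observability feature degrades to a logged warning instead of a crash loop on restart, which also keeps the failure visibly associated with its cause.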

@akashrv
Contributor

akashrv commented Aug 8, 2019

/milestone v0.9.0

@knative-prow-robot knative-prow-robot added this to the v0.9.0 milestone Aug 8, 2019
@Harwayne Harwayne mentioned this issue Aug 28, 2019
@Harwayne
Contributor

/project Observability To do

@grantr
Contributor Author

grantr commented Sep 17, 2019

/milestone v0.10.0

@knative-prow-robot knative-prow-robot modified the milestones: v0.9.0, v0.10.0 Sep 17, 2019
@paulrossman paulrossman modified the milestones: v0.10.0, Sprint 4 Sep 30, 2019
@akashrv akashrv modified the milestones: v0.10.0-M1, v0.10.0 Oct 2, 2019
@paulrossman paulrossman removed the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 10, 2019
@paulrossman paulrossman modified the milestones: v0.10.0, v0.11.0-M1 Oct 30, 2019
@paulrossman paulrossman modified the milestones: v0.11.0-M1, v0.11.0-M2 Nov 21, 2019
@grantr grantr modified the milestones: v0.11.0-M2, v0.11.0 Nov 26, 2019
@akashrv akashrv modified the milestones: v0.11.0, v0.12.0 Dec 10, 2019
@akashrv akashrv modified the milestones: v0.12.0, v0.13.0 Feb 12, 2020
@paulrossman paulrossman removed this from the v0.13.0 milestone Mar 3, 2020
@grantr grantr added this to the Backlog milestone Aug 24, 2020
@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 26, 2020
Projects
None yet
Development

No branches or pull requests

6 participants