panic starting shared informer #15035
Possible this is just a test flake (setup issue), but I think it should be initially triaged as something more significant.
@mfojtik this could be a memory race (use before the initialization goroutine completes); not sure how else this could happen.
Yeah, seems like the pdbLister is … I'm off for vacation tomorrow, so probably won't have time to investigate closer.
A quick scan of the controller code shows the controller accepts a shared informer (a PodInformer) and adds an event handler which relies on state that isn't initialized until after the event handler is added. Because the informer lifecycle isn't managed by the controller, it seems the handler could be invoked before the handler's dependent state is initialized. I wouldn't be surprised if the race exists for other handlers attached to PodInformer or PodDisruptionBudgetInformer. Will try to get a simple reproducer going today before I'm out on PTO tomorrow.
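To make the suspected hazard concrete, here's a minimal sketch (not the actual controller code; `Controller`, `podLister`, and `handleAdd` are illustrative names). The handler is effectively usable before the field it depends on is assigned, so an early event delivery hits a nil dependency:

```go
package main

import (
	"fmt"
	"sync"
)

// Controller models the suspected pattern: an event handler depends
// on podLister, but podLister is only assigned after the handler is
// already registered. If the informer were running, the handler could
// fire first and dereference the nil field.
type Controller struct {
	mu        sync.Mutex
	podLister func(key string) string // stand-in for the lister dependency
}

// handleAdd is the "event handler". It recovers the panic so the
// sketch can demonstrate the failure mode instead of crashing.
func (c *Controller) handleAdd(key string) (out string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("handler panicked: %v", r)
		}
	}()
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.podLister(key), nil // nil-func call panics if init hasn't run yet
}

func main() {
	c := &Controller{}
	// Event delivered before initialization completes: panics.
	if _, err := c.handleAdd("ns/pod"); err != nil {
		fmt.Println("early delivery:", err)
	}
	// Late initialization, as the real controller would do after registration.
	c.podLister = func(k string) string { return "found " + k }
	out, _ := c.handleAdd("ns/pod")
	fmt.Println(out)
}
```

The fix direction implied by the discussion is to guarantee no events are delivered until initialization completes, either by ordering startup or by gating the handler on an initialized flag.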
Looks like most/all of the controllers are started and run in goroutines prior to shared informer startup (see the disruption controller specifically). Since NewDisruptionController is executed in a goroutine, it seems at least possible that …
I think there's a subtle but critical flaw in my assessment: the async executions are like … Pretty sure that invalidates my theory.
The next line of inquiry: is one of the controllers itself starting a SharedInformer?
The error seems to imply that a SharedInformer for Pod was started before the call to NewDisruptionController, but so far I can't reason through how that could be possible in either the upstream or origin controller init code. Shifting focus to reproducing. |
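For anyone following the reasoning above, the panic in the title arises because (at the time) a SharedInformer rejected handler registration after it had started. A toy model (assumed for illustration; the real client-go SharedInformer API is richer) shows why ordering matters:

```go
package main

import (
	"fmt"
	"sync"
)

// toyInformer mimics the relevant SharedInformer behavior: handlers
// must be registered before Run; registering afterwards fails, which
// models the "panic starting shared informer" reported in this issue.
type toyInformer struct {
	mu       sync.Mutex
	started  bool
	handlers []func(event string)
}

func (i *toyInformer) AddEventHandler(h func(string)) error {
	i.mu.Lock()
	defer i.mu.Unlock()
	if i.started {
		return fmt.Errorf("informer already started")
	}
	i.handlers = append(i.handlers, h)
	return nil
}

// Run marks the informer started and delivers events to all handlers.
func (i *toyInformer) Run(events []string) {
	i.mu.Lock()
	i.started = true
	i.mu.Unlock()
	for _, e := range events {
		for _, h := range i.handlers {
			h(e)
		}
	}
}

func main() {
	inf := &toyInformer{}
	var seen []string
	// Correct order: register handlers, then start the informer.
	if err := inf.AddEventHandler(func(e string) { seen = append(seen, e) }); err != nil {
		panic(err)
	}
	inf.Run([]string{"pod-added"})
	fmt.Println(seen)
	// Registering after start fails, matching the reported failure path.
	if err := inf.AddEventHandler(func(string) {}); err != nil {
		fmt.Println("late registration:", err)
	}
}
```

If a controller constructor like NewDisruptionController were reached after some other code had already started the Pod SharedInformer, its AddEventHandler call would hit exactly this path.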
It just dawned on me that the original issue was reported in the context of an integration test, which can do all sorts of potentially hazardous things with informers during custom controller bootstrapping logic. The original linked job is long gone, but it seems entirely possible the issue is some custom code in a test's setup, as @bparees originally suggested.
One more note for anybody else picking this up later: based on the reported stack trace, the controller init occurred via start_master.go, which in the context of an integration test could be invoked via StartConfiguredMasterWithOptions, which doesn't provide the caller with any apparent means to influence/inject the Informers created internally within …
@ironcladlou thanks, I can pick this up and review the integration bootstrapping; it seems like we are just lucky that the tests are running... Also, I would not consider this a blocker if it's just an integration test bootstrapping issue. @smarterclayton agree?
Agree
Thanks. Wish I had more log context for this one: if anybody notices it again before we solve it, please try to capture the full logs here.
The fix for kubernetes/kubernetes#51013 will probably solve this. |
Pretty sure this is resolved by kubernetes/kubernetes@e63fcf7. Please re-open if it happens again.
https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_integration/4149/