
ansible event catcher - mark event_monitor_running when there are no events at startup #13903

Merged · 1 commit · Feb 28, 2017

Conversation

durandom
Member

@durandom durandom commented Feb 14, 2017

The reason the EventCatcher would not start was that it did not send event_monitor_running when there are no events coming in.

So I moved that up, before the loop.
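The shape of the fix can be sketched as follows. This is a hypothetical, minimal model, not the actual ManageIQ code: the class, `notifications` array, and `monitor_events` signature are illustrative. The point is that signalling `event_monitor_running` before blocking on the event stream means startup no longer depends on an event arriving.

```ruby
# Illustrative sketch only: signal that the monitor is running *before*
# entering the event loop, so the signal fires even with zero events.
class EventCatcherSketch
  attr_reader :notifications

  def initialize
    @notifications = []
  end

  def event_monitor_running
    notifications << :event_monitor_running
  end

  def monitor_events(events)
    event_monitor_running               # moved up, before the loop
    events.each { |e| notifications << e }
  end
end

catcher = EventCatcherSketch.new
catcher.monitor_events([])              # no events at startup
# notifications still contains :event_monitor_running
```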

@miq-bot add_labels providers/ansible_tower, bug
@miq-bot assign @jrafanie

@jrafanie please review and merge - because you are the worker tamer 🦁

@@ -22,6 +23,7 @@ def poll
catch(:stop_polling) do
begin
loop do
@before_poll.call if @before_poll && @before_poll.respond_to?(:call)

nil.respond_to?(:call) is false, so code can be simplified to:

@before_poll.call if @before_poll.respond_to?(:call)


I usually don't prefer try but this time I actually like @before_poll.try(:call)

Note, @before_poll doesn't scream "proc/lambda" in my head. I wonder if it makes sense to make this interface more clear by calling it @before_poll_proc or @pre_poll_proc
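Both suggestions rest on the same plain-Ruby fact: `nil.respond_to?(:call)` is false, so the extra `nil` guard is redundant. (ActiveSupport's `try(:call)` relies on the same behavior; the snippet below uses the guard form so it runs without Rails. The `before_poll` local is a stand-in for the `@before_poll` ivar.)

```ruby
# When the hook is nil, the guarded call is a safe no-op:
before_poll = nil
called = false
before_poll.call if before_poll.respond_to?(:call)   # nothing happens, no NoMethodError

# When the hook is a proc/lambda, it responds to :call and runs:
before_poll = -> { called = true }
before_poll.call if before_poll.respond_to?(:call)
```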

@durandom durandom force-pushed the heartbeat_ansible_worker branch 2 times, most recently from 4124a67 to 2946ab9 on February 14, 2017 at 17:56
@durandom
Member Author

@chessbyte @jrafanie thanks for the review. Added your suggestions.

Hopefully this kind of micro setup is not needed anymore once the event catcher is refactored

@Fryguy
Member

Fryguy commented Feb 14, 2017

Why is there a need to have all of this proc stuff? Can you not just call heartbeat directly? All event catchers should have a heartbeat method implemented, and I can't see why it wouldn't be necessary for every provider.

@durandom
Member Author

Can you not just call heartbeat directly?

Usually EventCatcher::Runner does the heartbeat here - but that's only one thread, the one that does the event processing.

If there is a steady stream of events, this gets called every once in a while. But if no events are coming in, the Runner thread would not heartbeat.

why it wouldn't be necessary for every provider.

Looks like at least google, hawkular and vmware have the same problem :)

@agrare am I missing something here? What happens when the vmware event catcher has no events coming in? Will it still heartbeat?

@agrare
Member

agrare commented Feb 14, 2017

@durandom I don't think we do anything special for heartbeat, it's possible vmware is just verbose enough with events that it isn't an issue :)

We do have a timeout on WaitForUpdatesEx set so what I'd like to do is return an empty set of events when it times out so it will still heartbeat.

@durandom
Member Author

return an empty set of events when it times out so it will still heartbeat

That won't do, I guess, because an empty set will not reach the heartbeat here:

https://github.com/ManageIQ/manageiq/blob/master/app/models/manageiq/providers/base_manager/event_catcher/runner.rb#L190

  def process_events(events)
    events.to_miq_a.each do |event|
      heartbeat
      process_event(event)
      Thread.pass
    end
  end
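A minimal illustration of the point above: with an empty event set, the `each` block body never runs, so `heartbeat` is never reached. (This models the quoted `process_events`; `Array()` stands in for ManageIQ's `to_miq_a` coercion helper, and the counter is only for demonstration.)

```ruby
heartbeats = 0
heartbeat = -> { heartbeats += 1 }

# Mirrors the shape of the quoted process_events: heartbeat happens
# *inside* the per-event loop body.
process_events = lambda do |events|
  Array(events).each do |_event|
    heartbeat.call
    # process_event(_event) would go here
  end
end

process_events.call([])        # empty set: loop body never runs, no heartbeat
process_events.call(%i[a b])   # two events: heartbeat called twice
```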

@jrafanie
Member

jrafanie commented Feb 14, 2017

Can you not just call heartbeat directly?
Usually EventCatcher::Runner does the heartbeat here - but that's only one thread, the one that does the event processing.

I wonder if it makes sense to have all workers use a dedicated thread just for heartbeating. We used to do this years ago, before rails/activerecord was threadsafe. I think it's something we could finally do.

Although, we've had issues with the main thread getting "stuck" on a really long work item, such as processing a huge report, and I don't know that heartbeating in a dedicated thread would help there, since we'd still be reporting ourselves alive while unable to do new work or respond to requests. 😕

So, yeah, I'm torn. I don't know what a dedicated heartbeat thread buys us, since we'd still need a way to get the status of, or interrupt, the busy thread from the heartbeat thread. I don't know.

@agrare
Member

agrare commented Feb 14, 2017

because an empty set will not reach the heartbeat here

We could always heartbeat before entering the loop if we go that direction (i.e. not a dedicated heartbeat thread)

@durandom
Member Author

re "dedicated heartbeat thread"

If the consuming thread (the one that fetches events from the provider) gets stale, because of network issues, the worker should be restarted.

If the dispatching thread (the one that puts events on our miq queue) gets stale, because of who knows what craziness, the worker should also be restarted.

So, actually, my solution here could lead to a false positive, where one thread still heartbeats while the other has gone stale. 😭

Threading is wicked.

I don't know if a dedicated thread makes sense at all; doesn't it defeat the idea of heartbeating? Then you'd have one thread that just heartbeats while the other two are stale?! (just what @jrafanie said in his second paragraph)

I guess what we have now is probably best. Restart the worker if no events are coming in...

...and in the future I'd like to revisit the whole threading approach here, because I actually think it brings more complexity than the problem it solves (green threads)

close this?

@blomquisg
Member

Maybe I'm missing something very fundamental here, but we heartbeat in the do_work_loop method in the base MiqWorker::Runner class.

If I'm reading this correctly, each time the do_work method is called, the worker will heartbeat. There's an additional safeguard to heartbeat every time events are processed in case there are TONS of events and it gets stuck in the process_events method for a while.

If the worker doesn't get any events for a long time, then based on what I'm reading, the heartbeat method will be called by the do_work_loop just before it calls do_work again.
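The mechanism described above can be sketched roughly like this. This is a simplified model, not the real `MiqWorker::Runner` (the actual loop also handles exit requests, queue messages, and sleep intervals; the bounded iteration count here is only so the sketch terminates): `heartbeat` runs on every iteration, before `do_work`, so a worker that finds no events still heartbeats.

```ruby
# Simplified sketch of a worker run loop that heartbeats every iteration,
# whether or not do_work finds anything to do.
class RunnerSketch
  attr_reader :heartbeats

  def initialize
    @heartbeats = 0
  end

  def heartbeat
    @heartbeats += 1
  end

  def do_work
    # fetch/process events here; may find none at all
  end

  def do_work_loop(iterations)
    iterations.times do
      heartbeat   # runs just before do_work, every time around the loop
      do_work
    end
  end
end

runner = RunnerSketch.new
runner.do_work_loop(3)   # 3 idle iterations still produce 3 heartbeats
```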

@durandom
Member Author

@blomquisg you are absolutely right - thanks for joining in and clarifying.

This made me re-visit this whole situation.

The reason the EventCatcher would not start was that it did not send event_monitor_running when there are no events coming in.
So I moved that up, before the loop.

Nevertheless, this is an indicator for how fragile this whole setup is - not just looking for cheap excuses 😄

@durandom
Member Author

@jrafanie could you re-review in the light of @blomquisg and my latest comment?
This boils down to moving one line up now.

If you want I can also close this PR and open a new one with just my last comment as description

@jrafanie
Member

@durandom So... this statement in the description of the PR:

In case there are no new events from ansible, the worker would not heartbeat.

And this from @blomquisg seem to contradict each other:

If the worker doesn't get any events for a long time, then based on what I'm reading, the heartbeat method will be called by the do_work_loop just before it calls do_work again.

This latter statement is true. Each iteration of the do_work_loop will heartbeat, as long as we haven't overridden this method. So, what are we solving in this PR again? Could you clarify the description? It appears we are going from heartbeating with each event to heartbeating with each call to monitor_events. What is this solving?

@durandom durandom changed the title from "heartbeat ansible event catcher" to "ansible event catcher - mark event_monitor_running when there are no events at startup" on Feb 17, 2017
@durandom
Member Author

@jrafanie sure. Rewrote the description and title. Sorry for the confusion, this was a long time coming. And thanks to @blomquisg for the clarifying comment

@jrafanie
Member

@jrafanie sure. Rewrote the description and title. Sorry for the confusion, this was a long time coming. And thanks to @blomquisg for the clarifying comment

No worries, it's easy to have PR comments/descriptions go stale so I try to ask dumb questions to see if we can help future "us" avoid confusion.

@miq-bot
Member

miq-bot commented Feb 27, 2017

This pull request is not mergeable. Please rebase and repush.

in case there are no events when the event catcher starts up
it would not send event_monitor_running
@miq-bot
Member

miq-bot commented Feb 27, 2017

Checked commit durandom@e7a344e with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
1 file checked, 0 offenses detected
Everything looks good. 👍

@durandom
Member Author

@jrafanie merge?

@jrafanie
Member

Thanks @durandom, I didn't know the conflict was resolved. 👍

@jrafanie
Member

@durandom can you mark euwe/darga? I believe this is euwe/no, darga/no...

@jrafanie jrafanie merged commit 93a8fd5 into ManageIQ:master Feb 28, 2017
@jrafanie jrafanie added this to the Sprint 56 Ending Mar 13, 2017 milestone Feb 28, 2017
@durandom
Member Author

durandom commented Mar 1, 2017

@miq-bot add_label darga/no

@durandom durandom deleted the heartbeat_ansible_worker branch March 17, 2017 18:14
7 participants