Add event driven workflows runner #102

agrare · 2024-07-01T18:00:35Z

Add a Workflows::Runner class which keeps a persistent thread waiting for events from Floe. Once an event is raised we try to link the event up to a workflow and we queue a step action if we find one.

If a workflow isn't found (for example, if it has already completed and been removed from the list), we simply continue without queueing anything.

It is common to get multiple events for a single container run, so this isn't a perfect "queue step when ready" and could be more efficient, but for now it errs on the side of caution and checks step_ready on every event.

Also, if the worker goes down and is restarted for any reason, the typical timer-based step is still on the queue, so the running workflow will be added to the runner again on the next timed check.
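
For readers skimming the thread, the flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: EventRunnerSketch, its methods, and the event shape are hypothetical, a Ruby Queue stands in for the Floe event stream, and an Array stands in for MiqQueue.

```ruby
# Hypothetical sketch: none of these names come from the PR or from Floe.
class EventRunnerSketch
  attr_reader :queued

  def initialize(events)
    @events    = events   # Thread::Queue standing in for the Floe event stream
    @workflows = {}       # workflow id => queue_args
    @lock      = Mutex.new
    @queued    = []       # stands in for MiqQueue
  end

  def add_workflow(id, queue_args)
    @lock.synchronize { @workflows[id] ||= queue_args }
  end

  def delete_workflow(id)
    @lock.synchronize { @workflows.delete(id) }
  end

  def start
    @thread = Thread.new do
      # Queue#pop blocks until an event arrives; it returns nil once the
      # queue has been closed and drained, which ends the loop.
      while (event = @events.pop)
        queue_args = @lock.synchronize { @workflows[event[:workflow_id]] }
        next if queue_args.nil? # unknown or already-finished workflow

        @queued << queue_args # queue a step for every matching event
      end
    end
    self
  end

  def stop
    @events.close
    @thread&.join
  end
end

events = Queue.new
runner = EventRunnerSketch.new(events).start
runner.add_workflow(1, {:instance_id => 1})

events << {:workflow_id => 1} # first event for the container run
events << {:workflow_id => 1} # second event for the same run: queued again
events << {:workflow_id => 2} # no such workflow: skipped
runner.stop
```

Two events for the same workflow produce two queued steps here, which mirrors the "check on every event" caution mentioned above; the event for the unknown workflow is dropped.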

Depends on:

Fixes #109

agrare · 2024-08-15T16:19:11Z

lib/manageiq/providers/workflows/runner.rb

+          def runner
+            @runner ||= new.tap(&:start)
+          end


NOTE: this ensures the runner works from the AutomationWorker as well as in development using simulate_queue_worker.
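
A small sketch of the memoized accessor pattern in the diff above, to show why the first caller in any process gets a started runner. MemoizedRunnerSketch and start_count are hypothetical names used only for illustration; the sketch also does not guard against two threads calling runner at the same time.

```ruby
# Hypothetical sketch of the memoized class-level accessor pattern.
class MemoizedRunnerSketch
  class << self
    def runner
      # new.tap(&:start) calls #start but still returns the new instance
      @runner ||= new.tap(&:start)
    end
  end

  attr_reader :start_count

  def initialize
    @start_count = 0
  end

  def start
    @start_count += 1 # the real runner would spawn its thread here
  end
end

first  = MemoizedRunnerSketch.runner
second = MemoizedRunnerSketch.runner
```

Both calls return the same object, and start only runs on the first call.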

kbrock · 2024-08-15T21:53:51Z

lib/manageiq/providers/workflows/runner.rb

+          return if workflows.key?(workflow.id)
+
+          workflows[workflow.id] = [workflow, queue_args]


Does workflows[] ||= do the same thing with concurrent hashes?

Yeah, ||= works. I can switch to that.
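
To make the comparison concrete, here is a sketch of the two forms using a plain Hash. The helper names are hypothetical, and the thread does not say which concurrent hash class is in use, so whether either form is atomic under concurrency is not shown here.

```ruby
# Hypothetical helpers; a plain Hash stands in for the workflows collection.
def add_with_guard(workflows, id, entry)
  return if workflows.key?(id)

  workflows[id] = entry
end

def add_with_or_assign(workflows, id, entry)
  workflows[id] ||= entry
end

guarded  = {}
assigned = {}
[guarded, assigned].each { |h| h[1] = [:first] }

add_with_guard(guarded, 1, [:second])      # key present: left alone
add_with_or_assign(assigned, 1, [:second]) # value truthy: left alone

# The two differ only when a key is present with a nil or false value.
with_nil_a = {1 => nil}
with_nil_b = {1 => nil}
add_with_guard(with_nil_a, 1, [:second])     # key? is true, so nil is kept
add_with_or_assign(with_nil_b, 1, [:second]) # nil is falsy, so it is replaced
```

Since the stored value in the diff is always a two-element array, the nil case should not come up, and the two forms behave the same for a single caller.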

app/models/manageiq/providers/workflows/automation_manager/workflow_instance.rb

miq-bot · 2024-08-16T14:40:45Z

Checked commit agrare@7a0e8d8 with ruby 3.1.5, rubocop 1.56.3, haml-lint 0.51.0, and yamllint
4 files checked, 0 offenses detected
Everything looks fine. 🏆

Fryguy · 2024-08-16T14:56:30Z

lib/manageiq/providers/workflows/runner.rb

I was expecting this file to live on the floe side (or at least everything except the runner method itself - that part I expected to be passed in).

That is, I expected some sort of generic queueing / pulling from a queue / checking strategy in floe, and then the specifics about how to queue something would come from manageiq-providers-workflows via callbacks or lambdas or something.

So aside from the boilerplate thread management, almost all of what is here is dealing with ConfigurationScript records and MiqQueue. We could move more into Floe but we'd have to have callbacks for almost everything in the docker_wait method (currently it is basically just docker_runner.wait, find the ConfigurationScript record, miq_queue.put).
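
One way to picture the split being discussed is a generic loop that takes its specifics as callables. This is only a sketch of the idea, not a proposal for Floe's API: GenericRunnerSketch, the wait and on_event parameters, and the event shape are all hypothetical, with a Hash standing in for the ConfigurationScript lookup and an Array for MiqQueue.

```ruby
# Hypothetical sketch: a generic runner that knows nothing about ManageIQ.
class GenericRunnerSketch
  def initialize(wait:, on_event:)
    @wait     = wait     # callable that blocks and returns the next event, or nil to stop
    @on_event = on_event # callable supplied by the host application
  end

  def start
    @thread = Thread.new do
      while (event = @wait.call)
        @on_event.call(event)
      end
    end
    self
  end

  def join
    @thread.join
  end
end

# Host-side specifics, all stand-ins: a Hash for the ConfigurationScript
# lookup and an Array for MiqQueue.
events    = Queue.new
records   = {"wf-1" => {:id => 1}}
miq_queue = []

wait     = -> { events.pop }
on_event = lambda do |event|
  record = records[event[:workflow_ref]]
  miq_queue << {:instance_id => record[:id]} if record
end

runner = GenericRunnerSketch.new(:wait => wait, :on_event => on_event).start
events << {:workflow_ref => "wf-1"}
events << {:workflow_ref => "unknown"}
events.close
runner.join
```

The generic half ends up being little more than the thread loop, which matches the point above that most of the real work is the record lookup and the queue put.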

app/models/manageiq/providers/workflows/automation_manager/workflow_instance.rb

kbrock · 2024-08-19T17:52:44Z

app/models/manageiq/providers/workflows/automation_manager/workflow_instance.rb

+    if wf.end?
+      ManageIQ::Providers::Workflows::Runner.runner.delete_workflow(self)
+    else
+      deliver_on = wf.wait_until || 1.minute.from_now.utc


lib/manageiq/providers/workflows/runner.rb

kbrock · 2024-08-19T21:32:00Z

The whole multiple events and keying off of workflow has me a little.

I feel the purpose of this is to react to an event from Docker.
So every time you throw in a task, you should be waiting for the event to end and you will trigger your own event accordingly. It may be "put this on the Queue" or "put this on the MiqQueue". And at that time, you delete the entry in the local reaction workflows cache, since we already reacted and have no need to react further.

If you run 3 tasks for a workflow, then you get 3 "I am done" events out and kick the same workflow accordingly.

I feel the entry should go into this cache when we kick off a task. Maybe this is where my idea of keying off of a container_ref breaks down. The container ref and the fact that we're calling a Task will be very well known to the caller. Feels like we're losing some encapsulation.
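
A rough sketch of the one-shot idea in this comment: register an entry when a task starts, react once when its container finishes, then forget it. OneShotReactionsSketch and its method names are hypothetical, and keying by container_ref is the variant floated in this comment, not what the PR does.

```ruby
# Hypothetical sketch of a react-once cache keyed by container_ref.
class OneShotReactionsSketch
  attr_reader :kicked

  def initialize
    @pending = {} # container_ref => workflow id
    @kicked  = [] # stands in for "put this on the MiqQueue"
  end

  def task_started(container_ref, workflow_id)
    @pending[container_ref] = workflow_id
  end

  def container_finished(container_ref)
    # Hash#delete returns the removed value, or nil if we already reacted
    workflow_id = @pending.delete(container_ref)
    @kicked << workflow_id if workflow_id
  end
end

reactions = OneShotReactionsSketch.new
%w[c1 c2 c3].each { |ref| reactions.task_started(ref, 7) } # three tasks, one workflow
%w[c1 c2 c3].each { |ref| reactions.container_finished(ref) }
reactions.container_finished("c1") # repeated event: nothing left to react to
```

Three tasks produce three kicks of the same workflow, and a repeated event for a finished container is ignored.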

lib/manageiq/providers/workflows/runner.rb

kbrock · 2024-08-20T14:27:13Z

offline: agrare doesn't want to store container_ref in the cache because it is too volatile, and would prefer to store a workflow.id or something else that is more stable

Fryguy · 2024-08-22T13:44:27Z

Backported to radjabov in commit ad9bc0b.

commit ad9bc0b2a3b19bb6c8ad9d5352efc3fe772a5235
Author: Keenan Brock <[email protected]>
Date:   Tue Aug 20 10:27:18 2024 -0400

    Merge pull request #102 from agrare/add_event_driven_workflows_runner
    
    Add event driven workflows runner
    
    (cherry picked from commit 572047c0e74ce12269dffbac0520134804c8f1c4)

miq-bot added the wip label Jul 1, 2024

agrare force-pushed the add_event_driven_workflows_runner branch 12 times, most recently from 904527f to 36f4e3d Compare August 15, 2024 16:01

agrare mentioned this pull request Aug 15, 2024

Start all automation runners in AutomationWorker ManageIQ/manageiq#23151

Merged

agrare changed the title ~~[WIP] Add event driven workflows runner~~ Add event driven workflows runner Aug 15, 2024

miq-bot removed the wip label Aug 15, 2024

agrare commented Aug 15, 2024

agrare force-pushed the add_event_driven_workflows_runner branch 3 times, most recently from d02ea7d to 720e212 Compare August 15, 2024 18:05

kbrock reviewed Aug 15, 2024

Add Workflows::Runner which uses docker events

7a0e8d8

agrare force-pushed the add_event_driven_workflows_runner branch from 720e212 to 7a0e8d8 Compare August 16, 2024 14:38

Fryguy reviewed Aug 16, 2024