Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow users to add workflow states after a built-in executing action #3014

Closed
reubenmiller opened this issue Jul 24, 2024 · 6 comments
Closed
Assignees
Milestone

Comments

@reubenmiller
Copy link
Contributor

Is your feature improvement request related to a problem? Please describe.

When overriding the config_update workflow which re-uses the existing thin-edge.io config_update functionality and only adds a pre and post state. The postprocess state is not currently not possible to add as the built-in executing state transitions automatically to successful ignoring whatever on_success is defined.

An example workflow is show below, where preprocess and postprocess have been added to perform some custom actions before letting the tedge-agent execute the built-in functionality.

#:schema https://gist.githubusercontent.com/reubenmiller/4e28e8403fe0c54b7461ac7d1d6838c2/raw/4ad7e3bc3ce2e3a7542a5b4873a0196cbe91f35d/tedge.workflow.json

operation = "config_update"
on_error = "failed"

[init]
  action = "proceed"
  on_success = "preprocess"

[preprocess]
  script = "/usr/bin/config_update.sh preprocess ${.payload.type}"
  on_success = "scheduled"
  # Proceed on any unexpected errors to ensure the operations status changed are handled
  on_error = "scheduled"

[scheduled]
  action = "builtin"
  on_success = "executing"

[executing]
  action = "builtin"
  on_success = "postprocess"

[postprocess]
  script = "/usr/bin/config_update.sh postprocess ${.payload.type}"
  on_success = "successful"

[successful]
 action = "cleanup"

[failed]
  action = "cleanup"

Below shows the workflow log after an execution, notice how the postprocess state is not present in the log (but the preprocess state is).

==================================================================
Triggered config_update workflow
==================================================================

topic:     te/device/main///cmd/config_update/c8y-mapper-38494494
operation: config_update
cmd_id:    c8y-mapper-38494494
time:      2024-07-24T13:43:33.22652579Z

==================================================================

----------------------[ config_update @ init | time=2024-07-24T13:43:33.228707959Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-config_update-c8y-mapper-38494494.log","remoteUrl":"http://127.0.0.1:8001/c8y/inventory/binaries/38462323","status":"init","type":"custom1"}

Action:   move to preprocess state

=> moving to config_update @ preprocess

----------------------[ config_update @ preprocess | time=2024-07-24T13:43:33.243623851Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-config_update-c8y-mapper-38494494.log","remoteUrl":"http://127.0.0.1:8001/c8y/inventory/binaries/38462323","status":"preprocess","type":"custom1"}

Action:   /usr/bin/config_update.sh preprocess custom1

Exit status: 0 (OK)

stderr <<EOF
Backing up file: 
EOF

stdout (EMPTY)
=> moving to config_update @ scheduled

----------------------[ config_update @ scheduled | time=2024-07-24T13:43:33.255045364Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-config_update-c8y-mapper-38494494.log","remoteUrl":"http://127.0.0.1:8001/c8y/inventory/binaries/38462323","status":"scheduled","type":"custom1"}

Action:   builtin


----------------------[ config_update @ executing | time=2024-07-24T13:43:33.265693542Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-config_update-c8y-mapper-38494494.log","remoteUrl":"http://127.0.0.1:8001/c8y/inventory/binaries/38462323","status":"executing","tedgeUrl":"http://127.0.0.1:8001/c8y/inventory/binaries/38462323","type":"custom1"}

Action:   builtin


----------------------[ config_update @ successful | time=2024-07-24T13:43:34.163506312Z ]----------------------

State:    {"logPath":"/var/log/tedge/agent/workflow-config_update-c8y-mapper-38494494.log","path":"/etc/custom1.toml","remoteUrl":"http://127.0.0.1:8001/c8y/inventory/binaries/38462323","status":"successful","tedgeUrl":"http://127.0.0.1:8001/c8y/inventory/binaries/38462323","type":"custom1"}

Action:   wait for the requester to finalize the command

Describe the solution you'd like

Built-in actions should honour the on_success fields defined in a given workflow.

Describe alternatives you've considered

Additional context

@didier-wenzek
Copy link
Contributor

The original proposal is to have the built-in actions honoring the on_success and on_error fields:

[scheduled]
action = "builtin"
on_success = "executing"

[executing]
action = "builtin"
on_success = "postprocess"
on_error = "rollback"

However, to be more consistent with the other actions - notably background scripts,
it would be better to distinguish the two scheduled and executing cases as for sub-operations actions.
When scheduled, the operation is only triggered and the final outcome is awaited in the executing state.
Hence, I propose that the built-in actions honor not only the on_success and on_error fields, but also the on_exec field.

[scheduled]
  action = "builtin"
  on_exec = "executing"

[executing]
  action = "builtin"
  on_success = "postprocess"
  on_error = "rollback"

I also propose to deprecate the confusing builtin keyword.
And, instead of implicitly triggering the action named after the operation,
let the user be explicit about the action:

[scheduled]
  action = "software_update"
  on_exec = "executing"

[executing]
  action = "await_software_update"
  on_success = "postprocess"
  on_error = "rollback"

Then, as a syntactic sugar, it will be convenient to squash the trigger and the await actions. As in:

[scheduled]
  action = "software_update"
  on_exec = "executing"
  on_success = "postprocess"
  on_error = "rollback"

@didier-wenzek
Copy link
Contributor

Design

Three points have to be addressed:

  • Determining the builtin command and state
    • Currently, using action = builtin, these are implied.
      The command is given by the operation of the workflow
      and the status is the current status of the command.
    • With action = <operation-name> the operation is explicit
      and the status is derived from the workflow definition.
      • action = <operation-name> -> scheduled
      • action = <await-operation-name> -> executing
  • Forwarding the command to the appropriate actor.
    • This is done by the CommandDispatcher::send() method which uses the command name as the dispatch key.
    • With the current setting (i.e. action = builtin), the status can be sent unchanged;
      but with explicit operation-name (as in action = software_update or action = await_software_update)
      the status as to be set appropriately (either to scheduled or executing).
      Indeed, the user might use such an action in a custom state.
  • Determining the next state using the status returned by the actor to proceed with the appropriate exit handler
    • status == executing -> on_exec handler
    • status == successful -> on_success handler
    • status == failed -> on_error handler

@didier-wenzek
Copy link
Contributor

didier-wenzek commented Aug 29, 2024

After discussion with @reubenmiller , we agreed on another more consistent approach to deprecate the confusing builtin action.

The key idea is to see the issue as a namespace issue:
we would like to distinguish user-defined workflows from built-in ones,
and, doing so, to allow users to invoke builtin workflows from their own workflow.

So the proposal is to distinguish builtin:operation-xyz> from operation-xyz
the later being a customized version of the former
(or being the same if no customized version has been provided by the user).

Triggering the builtin behavior will then be done using the operation keyword as for triggering a sub operation.

[scheduled]
  operation = "builtin:software_update"
  on_exec = "executing"

[executing]
  action = "await-operation-completion"
  on_success = "postprocess"
  on_error = "rollback"

Triggering the builtin version of an operation is done following the same rules as for sub operations:

  • In the workflow definition, the user can:
    • set the input of the builtin operation, possibly using an input_script,
      the default being to pass the current state unchanged to the builtin workflow.
    • inject the output of builtin operation into the current command state,
      the default being to merge the builtin outcome to the current command state.
    • specify the next steps on_exec, on_success and on_error.
  • During execution:
    • All the steps of the builtin workflow are logged in the main operation log file

For backward compatibility, any action = "builtin" definition
will be translated into the equivalent operation = "builtin:<operation-name>",
using the same current implicit resolution rules and raising an error if they don't apply.

@albinsuresh
Copy link
Contributor

The last proposal definitely covers the stated requirement of adding post-processing states after a built-in operation execution. But we should also consider a future enhancement where users would want to override the builtin:software_update or builtin:config_update operations into more granular actions. For example, in the case of a config_update operation, the execution involves the following steps:

  1. Send the executing status update
  2. Download the config file
  3. Apply the updated config file

In most cases, the users would want to reuse all those steps but sometimes add an additional state in between step 2 and step 3 to do some validation on the downloaded artefact before it is applied. In such cases, instead of a all-in-one builtin:config_update action, they'd want to break the workflow in granular states with granular actions like builtin:download, builtin:file-copy etc so that they can add additional logic between these actions.

The current proposal doesn't prevent the possibility to add more actions like that in the future, which is why I like this over the previous one tying actions to their corresponding states. builtin:config_update is just a named action that does everything for now, but we are free to add builtin:download and builtin:file-copy in future.

My only concern with this proposal is the decision to overload the same operation keyword for the same, as that might be a bit confusing to the user to differentiate between operation = "config_update" vs operation = "builtin:config_update" when they define their own composite workflows. Since the externally visible behaviour is also different for both (e.g: visibility over MQTT) that just adds to the confusion. So, I'd be in favour of a new keyword with input parameters similar to operation.

@didier-wenzek
Copy link
Contributor

My only concern with this proposal is the decision to overload the same operation keyword for the same, as that might be a bit confusing to the user to differentiate between operation = "config_update" vs operation = "builtin:config_update" when they define their own composite workflows. Since the externally visible behaviour is also different for both (e.g: visibility over MQTT) that just adds to the confusion. So, I'd be in favour of a new keyword with input parameters similar to operation.

I see a difference between steps like download and file-copy compare to builtin:config_update.

  • It makes sense to trigger a download from any workflow.
  • Technically, a builtin:config_update could be triggered from any workflow, say device profile. But allowing this would be error prone: any customization for configuration updates will be ignored and this is not what the user might expect.

@gligorisaev
Copy link
Contributor

Review of the Test Suite and Documentation done, tested also for flakiness, working as expected.

@reubenmiller reubenmiller added this to the 1.3.0 milestone Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants