Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: improve workflow builtin actions #3105

Conversation

didier-wenzek
Copy link
Contributor

@didier-wenzek didier-wenzek commented Sep 4, 2024

This is an alternative PR to improve the way builtin operations are customized.

Proposed changes

Improve the definition of operation workflows, so one can trigger a builtin operation the same way as for a sub operation. The aim is to make things easier to understand with less concepts, more consistent (while a single way to trigger an operation) and more flexible (ability to define specific state on success and on error).

[scheduled]
  operation = "builtin:software_update"
  on_exec = "executing"

[executing]
  action = "await_operation_completion"
  on_success = "postprocess"
  on_error = "rollback"
  • Introduce OperationAction::BuiltinOperation
  • Equip the BuiltIn action with on_success, on_error and on_exec handlers.
  • Let the user provides its own steps for the executing, successful and failed state.
  • Update the documentation - introducing builtin:<operation>
  • Make sure the deprecated builtin keyword can still be used for backward compatibility
  • Ensure a builtin:operation can only be invoked from the same operation
  • Let the user execute any builtin operation from any workflow
  • Ability to build the builtin operation input from the calling operation as for sub operations
  • Ability to inject excerpts of the builtin operation output into the calling operation as for sub operations

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Improvement (general improvements like code refactoring that doesn't explicitly fix a bug or add any new functionality)
  • Documentation Update (if none of the other choices apply)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Paste Link to the issue

#3014

Checklist

  • I have read the CONTRIBUTING doc
  • I have signed the CLA (in all commits with git commit -s)
  • I ran cargo fmt as mentioned in CODING_GUIDELINES
  • I used cargo clippy as mentioned in CODING_GUIDELINES
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

Further comments

Copy link

codecov bot commented Sep 4, 2024

Comment on lines -352 to -353
} else {
log_file.log_info("=> sub-operation not yet launched").await;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This log entry has been removed, because confusing when the sub-operation is actually a builtin operation.

Comment on lines +412 to +415
// As not finalized, the builtin state is sent back
// to the builtin operation actor for further processing.
let builtin_state = new_state.clone();
Ok(self.builtin_command_dispatcher.send(builtin_state).await?)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be remove in the future. Indeed, all the builtin-operation actors are currently dealing with 2 steps while things are actually done in a single step (the schedule state for restart and software update; the executing state for log_upload, config_snapshot and config_update).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config actor still does it in two steps, where the tedgeUrl generation happens in the scheduled state and the rest in executing, although I can't remember why we chose to do it that way.

Comment on lines -207 to +213
.with_status(CommandStatus::Executing);
.with_status(CommandStatus::Successful);
software_box.send(software_list_response.into()).await?;

mqtt_box
.assert_received([MqttMessage::new(
&Topic::new_unchecked("te/device/main///cmd/software_list/1234"),
r#"{"status":"executing"}"#,
r#"{"status":"successful"}"#,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 tests has been updated, watching for a successful status and no more an executing status.

Indeed, publishing the executing state is done by the workflow engine according to the workflow definition and no more by the internal actor.

Comment on lines +114 to +120
/// Trigger a built-in operation
///
/// ```toml
/// operation = "<builtin:operation-name>"
/// on_exec = "<state>"
/// ```
BuiltInOperation(OperationName, ExecHandlers),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a first step, I introduced a specific BuiltInOperation case along the existing Operation case.

I will consider to the merge the two.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to go against this idea: a builtin operation can only be invoked from the same operation.

Copy link
Contributor

github-actions bot commented Sep 4, 2024

Robot Results

✅ Passed ❌ Failed ⏭️ Skipped Total Pass % ⏱️ Duration
505 0 2 505 100 1h33m19.753409999s

@didier-wenzek didier-wenzek marked this pull request as ready for review September 4, 2024 15:46
Copy link
Contributor

@reubenmiller reubenmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@didier-wenzek I've played around with the proposal and it works well. I've been able to create a custom log_upload handler with a conditional pre-processing step and everything worked as expected.

So I'm happy with the direction of the PR.

i.e. there is no `te/+/+/+/+/cmd/builtin:<operation-name>/+` command topic.
- The builtin versions are only internal triggered by the agent from a customized workflow.
- On a fresh install of %%te%%, the `<operation-name>` is simply a copy of `builtin:<operation-name>`.
- This copy is not materialized on the file system, but created in memory by the agent when no customized version is found.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Materializing them as done for devie_profile would be a good step for users to discover and extend those built-in operation workflows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I will push a follow up PR for that.

Comment on lines +606 to +607
- By convention, this file is named after the operation name, as in `/etc/tedge/operations/software_update.toml`.
However, this is not mandatory: the operation is determined by the `operation` property, e.g. `operation = "software_update"`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but I've always wondered why we didn't tie it to the workflow name, as that would have easily prevented conflicting workflow definitions as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not. In practice, this what we do. I will push a follow up PR to discuss this.

Comment on lines 612 to 614
- A user-provided workflow can invoke any builtin operations as well as the custom versions.
- `operation = "<operation-name>"` triggers the custom-version of the operation workflow.
- `operation = "builtin:<operation-name>"` triggers the operation ignoring any customization.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user might wonder when/why they should use one over the other. So, we might wanna highlight that only the former can be used from workflows of other operations and the latter is limited to the workflow of the same type only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now clearer with 6b37ae3

Copy link
Contributor

@albinsuresh albinsuresh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The feature is working as expected. I just have one remark regarding the decision to deprecate the old builtin action syntax, which might still be simpler for the end-user.

crates/core/tedge_api/src/workflow/mod.rs Outdated Show resolved Hide resolved
pub fn adapt_builtin_request(&self, command_state: GenericCommandState) -> GenericCommandState {
match self {
OperationAction::BuiltInOperation(_, _) => {
command_state.update(GenericStateUpdate::scheduled())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One might wonder why not GenericStateUpdate::init_payload() here, although at the impl level it makes no difference (since the builtin actors react to both init and scheduled messages the same way).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The status of the builtin request could be set to init, yes.

However, splitting hairs:

  • GenericStateUpdate::init_payload() cannot be used here as this is an empty payload that still needs to be populated with the actual user request.
  • All the builtin commands do no nothing on the init state, simply proceeding to executing.

To simplify this, we can refactor all the operation actors to have a single do-it action.

Comment on lines +412 to +415
// As not finalized, the builtin state is sent back
// to the builtin operation actor for further processing.
let builtin_state = new_state.clone();
Ok(self.builtin_command_dispatcher.send(builtin_state).await?)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config actor still does it in two steps, where the tedgeUrl generation happens in the scheduled state and the rest in executing, although I can't remember why we chose to do it that way.

Comment on lines +23 to +26
on_success = "postprocess" # on_success & on_error can be customized for builtin actions

[postprocess]
script = "/usr/bin/log_upload.sh ${.payload.status}"
Copy link
Contributor

@albinsuresh albinsuresh Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I understand that we want to deprecate action = "builtin" in favour of the newly proposed operation = "builtin:<operation_name>" style, seeing the simplicity of this syntax:

[executing]
  action = "builtin"
  on_success = "postprocess"      # on_success & on_error can be customized for builtin actions

[postprocess]
  script = "/usr/bin/log_upload.sh ${.payload.status}"
  on_success = "successful"

...makes me wonder if this is still a better choice, during this API exploration phase, as the new version contains the additional boilerplate (the await_operation state), as follows:

[executing]
operation = "builtin:software_update"    # trigger the built-in behavior for software update
on_exec = "await_operation"

[await_operation]
action = "await-operation-completion"    # awaiting the builtin operation to complete
on_success = "postprocess"

[postprocess]
  script = "/usr/bin/log_upload.sh ${.payload.status}"
  on_success = "successful"

From the user's perspective, the await_operation state addition doesn't add much value. And since we have plans to get rid of the await states in the future anyway, I'm wondering if it's better not to deprecate the old syntax to keep things simpler for now.

Since even the decision to overload the operation keyword for builtin operations might also get revisited while we introduce more granular builtin actions like download, file-copy etc, I'm wondering if we can keep the simpler action = "builtin" syntax itself for another release, and introduce the operation = "builtin:<operation_name>" style only once we're fully confident that it is the way to go?

For this PR, nothing changes except hiding the newly added operation syntax from the toml_config level and keeping the newly added await handlers for the old builtin action. With that, the user still gets the new capability to add post-processing steps to a builtin action with minimal config changes as well. WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully follow you.

  • action = "builtin" is not deprecated yet: this is still working, but discouraged in the documentation.
  • operation = "builtin:<operation_name> is really nicer from a user perspective. This is a personal point of view, but with some evidence: the documentation has been far simpler to write and there is less traps in using it.
  • We are now in a position to make implicit the action = "await-operation-completion", removing some burden to the user.
  • I see smaller-grain actions like download, file-copy ... as a nice idea we really have to work on. But, I don't see any those as full-fledged operations (with an MQTT topic, capabilities, retain messages, states ...). Hence, no risk of confusion.

Copy link
Contributor

@albinsuresh albinsuresh Sep 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was, since action = "builtin" is still supported (although to-be deprecated), there are currently two ways for the user to attach post-processing logic to builtin operation workflows:

  1. The recommended approach with operation = "builtin:<operation> as follows:
[executing]
operation = "builtin:log_upload"
on_exec = "await_operation"

[await_operation]
action = "await-operation-completion"
on_success = "postprocess"

[postprocess]
script = "/usr/bin/log_upload.sh ${.payload.status}"
on_success = "successful"
  1. The discouraged/legacy approach of action = "builtin", with the newly supported await handlers:
[executing]
action = "builtin"
on_success = "postprocess"

[postprocess]
script = "/usr/bin/log_upload.sh ${.payload.status}"
on_success = "successful"

From the external user's perspective, the second one with just 2 states looks much simpler, as the additional await_operation state in the first approach is just boilerplate which adds no value to the user (unless I'm missing something here). The only additional thing that they get by using operation is the input and input_script support, which is also less useful here IMO, as there is little scope for input manipulation when the main operation and the builtin operation are of the same type (since the same input can be passed as-is).

Since both syntaxes are providing the same functionality and the latter being minimal, I was just wondering why even expose the new operation = "builtin:<operation> API, when the existing action = "builtin" API (with the newly added support for await actions) does the same job with less boilerplate. Or is there some additional benefit with the operation approach that I'm missing?

PS: Talking purely about the external facing TOML level API/syntax here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the external user's perspective, the second one with just 2 states looks much simpler, as the additional await_operation state in the first approach is just boilerplate which adds no value to the user (unless I'm missing something here).

Yes, you are missing something. 3 steps are already required in the second case. Indeed, the scheduled step must be invoked as well as the executing step.

The only additional thing that they get by using operation is the input and input_script support, which is also less useful here IMO, as there is little scope for input manipulation when the main operation and the builtin operation are of the same type (since the same input can be passed as-is).

This was the plan when starting this PR, but has been abandoned meantime. As a builtin:<operation> can only apply to the current operation there is no point to tune the input.

Or is there some additional benefit with the operation approach that I'm missing?

Using action = "builtin" is really error prone unless you stick to a strict template.

@didier-wenzek
Copy link
Contributor Author

Oops, I have to rename this commit f27bf17

Copy link
Contributor

@Bravo555 Bravo555 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about the tests and small nits, overall LGTM.


```toml
operation = "software_update" # an operation for which tedge-agent provides an implementation
["init"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary quotes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

TomlOperationAction::Operation(operation) => match operation.strip_prefix("builtin:") {
None => {
let handlers = TryInto::<ExecHandlers>::try_into((input.handlers, defaults))?;
let input_script = input.input_script;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: reads shorter + jump to definition in IDE actually jumps to our try_from definition, instead of a stub try_into impl in stdlib

Suggested change
let input_script = input.input_script;
let handlers = ExecHandlers::try_from((input.handlers, defaults))?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed clearer!

Fixed by 7ad7db4

}
Some(builtin_operation_name) => {
let handlers = TryInto::<ExecHandlers>::try_into((input.handlers, defaults))?;
Ok(OperationAction::BuiltInOperation(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Ok(OperationAction::BuiltInOperation(
let handlers = ExecHandlers::try_from((input.handlers, defaults))?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 7ad7db4

@@ -173,6 +181,17 @@ impl TryFrom<(TomlOperationState, DefaultHandlers)> for OperationAction {
handlers, cmd_output,
))
}
"builtin" => {
let exec_handlers = TryInto::<ExecHandlers>::try_into((
input.handlers.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
input.handlers.clone(),
let exec_handlers = ExecHandlers::try_from((

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 7ad7db4

ExecHandlers::builtin_default(),
))?;
let await_handlers = TryInto::<AwaitHandlers>::try_into((
input.handlers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
input.handlers,
let await_handlers = AwaitHandlers::try_from((

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed by 7ad7db4

Should Contain ${software_list[0]} "tedge"
Should Contain ${software_list[0]} "postprocess"
Should Contain ${software_list[0]} "done"
Execute Command tedge mqtt pub --retain te/device/main///cmd/software_list/robot-456 ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: The agent is supposed to clean-up the command, shouldn't we check that it does instead of manually cleaning in the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the agent do nothing once the command is successful or failed. This is the requester who is supposed to clean-up the command, once processed the final outcome.

This commit introduces no changes to the user.
This is only a preparation step to replace builtin actions,
which usage was very restricted and specific,
with builtin operations that can be used as any other operations.

Signed-off-by: Didier Wenzek <[email protected]>
The new name is more appropriate, now that these handlers are used not
only for background scripts but also to trigger a sub operation or a
builtin action.

Signed-off-by: Didier Wenzek <[email protected]>
Signed-off-by: Didier Wenzek <[email protected]>
@didier-wenzek didier-wenzek force-pushed the feat/improve-workflow-builtin-actions-v2 branch from 79cf19b to 7ad7db4 Compare September 11, 2024 07:52
@didier-wenzek didier-wenzek added this pull request to the merge queue Sep 11, 2024
Merged via the queue into thin-edge:main with commit a7e4d9f Sep 11, 2024
33 checks passed
@didier-wenzek didier-wenzek deleted the feat/improve-workflow-builtin-actions-v2 branch September 11, 2024 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants