Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds automatic population of unset retrypolicy field when starting Workflows #654

Merged
merged 14 commits into from
Aug 4, 2020

Conversation

mastermanu
Copy link
Member

@mastermanu mastermanu commented Aug 4, 2020

Change summary:

  • Adds a separate dynamic configuration to dictate "default" retry policy field values for Workflows vs. Activities
  • Adds logic to auto-populate unset retry policy fields for starting a workflow, signaling-to-starting a workflow, and starting a child workflow. If the specified retry policy is nil, no auto-population happens. If it is non-nil, unset fields are assigned defaults
  • Fixes bug where WorkflowExpirationTime was always set to 0 for Child Workflows. This meant that Child Workflows would assume that max attempts = 0 implies no retry policy is set.
  • Addresses flakiness in matching test (TestMatcherSuite/TestMustOfferRemoteMatch is flaky #243)

Validation:

  • Added unit tests to ensure RetryPolicy is properly set for calls via FrontEnd
  • Updated unit tests to validate ChildWorkflow is started with the correct WorkflowExpirationTime
  • Validated using local application against one box environment

Backwards compatibility:

  • For cases where the Retry Policy is set, but maximum attempts or other values were set to zero, we would not retry before. We will start retrying now. This is an edge case and broken behavior, so there is minimal risk here.

Future work:

  • Remove setting of default Backoff Coefficient from GO SDK as the server now takes care of this completely.

// if a policy is explicitly set on a workflow
type DefaultRetrySettings struct {
InitialIntervalInSeconds int32
MaximumIntervalCoefficient float64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it called MaximumIntervalCoefficient? I think we should just call it MaximumInterval and use Duration as type.

Copy link
Member Author

@mastermanu mastermanu Aug 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We needed to initialize maximum interval in the case maximum interval was not set by the user, but a default maximum value might be less than the initial interval, which would then fail validation.

Btw, this change is already part of code-complete (this is just moving code around) and Maxim and I walked through this specific choice to use Coefficient vs. Interval before we landed this (it was initially an Interval).

Happy to discuss offline if we want to make further changes here though.

common/util.go Outdated Show resolved Hide resolved
common/util.go Outdated Show resolved Hide resolved
common/util.go Outdated Show resolved Hide resolved
Copy link
Contributor

@samarabbas samarabbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. You can land after addressing few minor comments.
It would be great if you can also add an integration test which covers default retry policies for activity and child workflows.

@mastermanu mastermanu merged commit 0ae59f6 into temporalio:master Aug 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants