Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: find correct retry node when using templateRef. Fixes: #12633 #12683

Merged
merged 2 commits into from
Feb 22, 2024

Conversation

shuangkun
Copy link
Member

@shuangkun shuangkun commented Feb 20, 2024

The root cause is pod have retryStrategy, the boundaryNode didn't have template (use templateRef), so when find retryNode, probabilistically found others and cause "failed to get a template".

Some info help review:
Failed workflow(find the wrong retry node):
image
Succeed worklfow:
image

Fixes: #12633

Motivation

Modifications

Verification

@shuangkun shuangkun marked this pull request as draft February 20, 2024 08:58
@shuangkun shuangkun marked this pull request as ready for review February 20, 2024 09:50
@Joibel Joibel self-assigned this Feb 20, 2024
@agilgur5 agilgur5 added the area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries label Feb 20, 2024
Copy link
Member

@Joibel Joibel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for digging down to this.

@shuangkun
Copy link
Member Author

Thanks!

@agilgur5 agilgur5 changed the title fix: find correct retry node. Fixes: #12633 fix: find correct retry node when using templateRef. Fixes: #12633 Feb 21, 2024
BoundaryID: "A1",
Children: []string{},
TemplateRef: &wfv1.TemplateRef{
Name: "tmpl1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this have a template field as well? the test is passing with only name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok,I have added it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test is passing with only name?

Oh I guess the test passed bc nil == nil. The code checks whether the templateRef struct isn't nil, but not the individual fields.
They should never be nil in a real use-case, so I suppose that is a test edge-case

@agilgur5 agilgur5 self-assigned this Feb 21, 2024
Signed-off-by: shuangkun <[email protected]>
Copy link
Member

@agilgur5 agilgur5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for root causing and fixing this!

@agilgur5 agilgur5 merged commit a927379 into argoproj:main Feb 22, 2024
27 checks passed
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request Feb 27, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request Feb 28, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request Mar 12, 2024
@agilgur5 agilgur5 added area/retryStrategy Template-level retryStrategy and removed area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Apr 25, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 6, 2024
isubasinghe pushed a commit to isubasinghe/argo-workflows that referenced this pull request May 7, 2024
@agilgur5 agilgur5 added this to the v3.4.x patches milestone May 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

inline task in DAG makes workflow fail: failed to get a template
3 participants