Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle nil HistoryURI during visibility archival #4880

Merged
merged 1 commit into from
Oct 5, 2023

Conversation

ytaben
Copy link
Contributor

@ytaben ytaben commented Sep 18, 2023

Handle cases of nil HistoryArchivalURI when performing visibility archival

See #4879 for details

This PR is a suggested fix for #4879
We are not fully familiar with the codebase so it's ok to close and make a better fix (or point us in the right direction; we are happy to make the patch)

Added a test that sets the URI to nil during visibility archival. It panics before the change and passes after

The risk is lower than before this fix. This is not, however, a comprehensive check on when similar panics can occur

This error has resulted in a P0 incident for us but it is also somewhat unlikely given it has not happened in such a long time. This may bring down cloud clusters as well in theory.

historyURI, err := carchiver.NewURI("test:///history/archival")
var historyURI carchiver.URI
var err error
if !c.NilHistoryUri {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit crude because if we just did c.HistoryURI - we wouldn't know whether to use default or if it's unset

@@ -260,7 +265,7 @@ func (a *archiver) archiveVisibility(ctx context.Context, request *Request, logg
HistoryLength: request.HistoryLength,
Memo: request.Memo,
SearchAttributes: searchAttributes,
HistoryArchivalUri: request.HistoryURI.String(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value is not initialized here when history archival is disabled

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the visibility archiver need to have the value for the historyuri?

An alternative fix would be to always fill out both URIs in the request generation method so that it never reaches here nil. Then it could presumably have the correct value instead of an empty string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tristan-instacart

fill out both URIs

Fill them out how?
The branching for filling this out is currently set on history archival being enabled.
URI values also cannot be empty (parsing will fail).
How can we fill out the value if it were to not be available?

My understanding of this record is to record the current values for the fields and the error is that this crashes when history URI is nil (but should be empty)

@MichaelSnowden
Copy link
Contributor

Thanks for the fix! I'm sorry that this caused an incident for y'all, but we appreciate you fixing it to prevent it from happening to others. One thing I'd like to point out:

This may bring down cloud clusters as well in theory.

This won't happen because we don't use archival in Cloud.

@ytaben
Copy link
Contributor Author

ytaben commented Oct 3, 2023

Thank you for reviewing!

I realized after opening the PR the feature isn't offered in cloud but hopefully this still helps the community (or Temporal should this feature be used in cloud in the future)

@ytaben
Copy link
Contributor Author

ytaben commented Oct 5, 2023

@MichaelSnowden Just to make sure, do we need to do anything with this PR for merging or do we just leave it in your hands?

@yiminc yiminc merged commit c3820bb into temporalio:main Oct 5, 2023
10 checks passed
@ytaben ytaben deleted the ytaben/fix-nil-uri branch October 6, 2023 15:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants