Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Transport Stopped Exception #48930

Merged

Conversation

henningandersen
Copy link
Contributor

@henningandersen henningandersen commented Nov 11, 2019

When a node shuts down, TransportService moves to stopped state and
then closes connections. If a request is done in between, an exception
was thrown that was not retried in replication actions. Now throw a
wrapped NodeClosedException exception instead, which is correctly
handled in replication action. Fixed other usages too.

Relates #42612

When a node shutdowns, `TransportService` moves to stopped state and
then closes connections. If a request is done in between, an exception
was thrown that was not retried in replication actions. Now throw a
wrapped `NodeClosedException` exception instead, which is correctly
handled in replication action. Fixed other usages too.

Relates elastic#42612
@henningandersen henningandersen added >bug :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v8.0.0 v7.6.0 labels Nov 11, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/CRUD)

Turns out new exception was unnecessary.

The `Transport is stopped` messages were hardcoded in a few more places,
fixed those too.
Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few random comments :)

final boolean nodeIsClosing =
cause instanceof NodeClosedException
|| ExceptionsHelper.isTransportStoppedForAction(cause, "internal:cluster/shard/failure");
final boolean nodeIsClosing = cause instanceof NodeClosedException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use the ExceptionsHelper.unwrap(e, NodeClosedException.class) != null pattern here too to be a little more resilient to future changes (and present unexpected paths that get us here ...?)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this too, but ended up keeping this as is, since the unwrap's are different in that unwrapCause only unwraps ElasticsearchWrapperException, whereas unwrap unwraps all causes. It could be important in case we end up unwrapping a deep exception from another host? Do you think we should try to follow your suggestion anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah let's not do that. I def. doesn't look impossible for that exception from another node making it here (since it's not trivial to tell it the suggestion is pointless in the first place :)).

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

final boolean nodeIsClosing =
cause instanceof NodeClosedException
|| ExceptionsHelper.isTransportStoppedForAction(cause, "internal:cluster/shard/failure");
final boolean nodeIsClosing = cause instanceof NodeClosedException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah let's not do that. I def. doesn't look impossible for that exception from another node making it here (since it's not trivial to tell it the suggestion is pointless in the first place :)).

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the simplification of exception handling in this PR a lot. I've left one comment, looking good o.w.
AFAICS, this exception is only bubbled up locally on a node, so we don't need to care about BWC (i.e. be able to detect the old TransportException bubbling up from other nodes).

Copy link
Contributor

@Tim-Brooks Tim-Brooks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@henningandersen
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/default-distro

@henningandersen
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/packaging-sample-matrix

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@henningandersen henningandersen merged commit fbaf8c4 into elastic:master Nov 13, 2019
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Nov 13, 2019
When a node shuts down, `TransportService` moves to stopped state and
then closes connections. If a request is done in between, an exception
was thrown that was not retried in replication actions. Now throw a
wrapped `NodeClosedException` exception instead, which is correctly
handled in replication action. Fixed other usages too.

Relates elastic#42612
henningandersen added a commit that referenced this pull request Nov 13, 2019
When a node shuts down, `TransportService` moves to stopped state and
then closes connections. If a request is done in between, an exception
was thrown that was not retried in replication actions. Now throw a
wrapped `NodeClosedException` exception instead, which is correctly
handled in replication action. Fixed other usages too.

Relates #42612
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v7.6.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants