-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: auto retry once for connection closed #3426
Conversation
a864c30
to
ae83f28
Compare
a6498aa
to
e8741c7
Compare
e8741c7
to
3cce538
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than that, looks good to me
crates/router/src/services/api.rs
Outdated
Err(error) => { | ||
if error.current_context() == &errors::ApiClientError::ConnectionClosed { | ||
metrics::AUTO_RETRY_CONNECTION_CLOSED.add(&metrics::CONTEXT, 1, &[]); | ||
match cloned_send_request { | ||
Some(cloned_request) => { | ||
metrics_request::record_operation_time( | ||
cloned_request, | ||
&metrics::EXTERNAL_REQUEST_TIME, | ||
&[metrics_tag], | ||
) | ||
.await | ||
} | ||
None => Err(error), | ||
} | ||
} else { | ||
Err(error) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err(error) => { | |
if error.current_context() == &errors::ApiClientError::ConnectionClosed { | |
metrics::AUTO_RETRY_CONNECTION_CLOSED.add(&metrics::CONTEXT, 1, &[]); | |
match cloned_send_request { | |
Some(cloned_request) => { | |
metrics_request::record_operation_time( | |
cloned_request, | |
&metrics::EXTERNAL_REQUEST_TIME, | |
&[metrics_tag], | |
) | |
.await | |
} | |
None => Err(error), | |
} | |
} else { | |
Err(error) | |
} | |
} | |
Err(error) if error.current_context() == &errors::ApiClientError::ConnectionClosed => { | |
metrics::AUTO_RETRY_CONNECTION_CLOSED.add(&metrics::CONTEXT, 1, &[]); | |
match cloned_send_request { | |
Some(cloned_request) => { | |
metrics_request::record_operation_time( | |
cloned_request, | |
&metrics::EXTERNAL_REQUEST_TIME, | |
&[metrics_tag], | |
) | |
.await | |
} | |
None => Err(error), | |
} | |
}, | |
err @ Err(_) => err |
@jarnura , this |
crates/router/src/services/api.rs
Outdated
metrics::AUTO_RETRY_CONNECTION_CLOSED.add(&metrics::CONTEXT, 1, &[]); | ||
match cloned_send_request { | ||
Some(cloned_request) => { | ||
metrics_request::record_operation_time( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are retrying for all connection closed cases, but for non-idempotent request this will cause business related errors and audit for this is complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
324bb11
@inventvenkat instead of doing this, what if we do not have any |
pool_idle_timeout disabling timeout didn't help this case. |
Type of Change
Description
Hyper crate has a bug which allows few calls to connect with a dead connections and fails to send request to the server. This bug of hyper can be tracked with below link.
Error: IncompleteMessage: connection closed before message completed
- LinkTemporary fix: Retry connection with the server by sending the request again when call fails with this issue.
Permanent fix would be, whenever reqwest crate release a stable version with hyper's 1.0 version
Another Link
This is just due to the racy nature of networking. hyper has a connection pool of idle connections, and it selected one to send your request. Most of the time, hyper will receive the server’s FIN and drop the dead connection from its pool. But occasionally, a connection will be selected from the pool and written to at the same time the server is deciding to close the connection. Since hyper already wrote some of the request, it can’t really retry it automatically on a new connection, since the server may have acted already
Additional Changes
Motivation and Context
Few external api calls fails due to this connection close
How did you test it?
Running the test case collections
Checklist
cargo +nightly fmt --all
cargo clippy