Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid unconditional retries in replicator's http client #1177

Merged
merged 1 commit into from
Feb 22, 2018

Conversation

nickva
Copy link
Contributor

@nickva nickva commented Feb 21, 2018

In some cases the higher level code from couch_replicator_api_wrap needs to
handle retries explicitly and cannot cope with retries happening in the lower
level http client. In such cases it sets retries = 0.

For example:

https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_api_wrap.erl#L271-L275

The http client then should avoid unconditional retries and instead consult
retries value. If retries = 0, it shouldn't retry and instead bubble the
exception up to the caller.

This bug was discovered when attachments were replicated to a target cluster
and the target cluster's resources were constrainted. Since attachment PUT
requests were made from the context of an open_revs GET request, PUT
request timed out, and they would retry. However, because the retry didn't
bubble up to the open_revs code, the second PUT request would die with a
noproc error, since the old parser had exited by then. See issue #745 for
more.

Testing recommendations

See issue #745 comments on how to set up testing. The code was tested locally with a Vagrant VM running Debian 8, Erlang 17.5. Hardware resources were 1 CPU, throttled to about 30%, disk throughput also throttled to about 10Mb/s. stress running in the background as stress --timeout 900m --cpu 1 --io 4.

Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Am not 100% concerned about removing the clause vs not. Whatever you think is best there.

% ibrowse worker terminated because remote peer closed the socket
% -> not an error
throw({retry, HttpDb, Params});
maybe_retry({error, req_timedout}, Worker, HttpDb, Params);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we just remove this clause entirely and let it fall through to the catch-all at the bottom?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh totally, let's remove it. Good call.

In some cases the higher level code from `couch_replicator_api_wrap` needs to
handle retries explicitly and cannot cope with retries happening in the lower
level http client. In such cases it sets `retries = 0`.

For example:

https://github.com/apache/couchdb/blob/master/src/couch_replicator/src/couch_replicator_api_wrap.erl#L271-L275

The http client then should avoid unconditional retries and instead consult
`retries` value. If `retries = 0`, it shouldn't retry and instead bubble the
exception up to the caller.

This bug was discovered when attachments were replicated to a target cluster
and the target cluster's resources were constrainted. Since attachment `PUT`
requests were made from the context of an open_revs `GET` request, `PUT`
request timed out, and they would retry. However, because the retry didn't
bubble up to the `open_revs` code, the second `PUT` request would die with a
`noproc` error, since the old parser had exited by then. See issue #745 for
more.
@nickva nickva force-pushed the fix-noproc-in-replicator-put branch from 399c07b to 16699e5 Compare February 22, 2018 00:57
@nickva nickva merged commit 6d959a7 into master Feb 22, 2018
@nickva nickva deleted the fix-noproc-in-replicator-put branch February 22, 2018 02:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants