fix(config_etcd): skip resync_delay while etcd watch timeout #6259

Merged

Member

@nic-6443 nic-6443 commented Feb 8, 2022

What this PR does / why we need it:

Discussion in mailing list: https://lists.apache.org/thread/pfkf88h7v515t29xh6csxhhfbhcbt77j

I have run into a problem with APISIX that I would like to discuss with you.
APISIX has a configuration item, etcd.resync_delay, whose effect is to
pause for a while before launching the next watch request whenever a watch
call to etcd returns an error.
As I understand it, this logic protects the etcd server from being
overloaded by a client that retries without pause after an unexpected
failure.
I think this protection mechanism is reasonable, but one kind of error is
the timeout error, which only means that no event was generated for the
watched key within the watch timeout (30s by default). This kind of error
is expected, because the configuration of a gateway usually does not change
frequently. At present we have no special handling for timeout errors, so
the next watch call is also delayed by etcd.resync_delay seconds. This is
very dangerous.
For example, in the default configuration, if the user's upstream
configuration does not change within 30s, APISIX suspends configuration
synchronization for about 6-7 seconds (5s + jitter), and APISIX cannot
react to any change made to the upstream during this period.
So I think a timeout error should bypass the resync delay logic and the
next watch should start immediately. This is in line with the
millisecond-level configuration synchronization claimed in the APISIX
documentation.
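
The proposed behavior can be sketched in a few lines. This is only an illustration in Python, not the APISIX code (which is written in Lua): `WatchTimeout`, `pause_before_next_watch`, `run_watch_loop` and the `max_jitter` value are hypothetical names and numbers chosen for the example.

```python
import random


class WatchTimeout(Exception):
    """Hypothetical error type: the watch expired without seeing any event."""


def pause_before_next_watch(err, resync_delay=5.0, max_jitter=2.0, rng=random.random):
    """Return how many seconds to wait before issuing the next watch."""
    # No error, or an expected timeout: start the next watch immediately.
    if err is None or isinstance(err, WatchTimeout):
        return 0.0
    # Any other error: back off so a failing etcd is not hit with rapid retries.
    # The jitter range is an assumption made for this sketch.
    return resync_delay + rng() * max_jitter


def run_watch_loop(watch_once, sleep, rounds):
    """Call watch_once() `rounds` times, pausing between calls only when needed."""
    for _ in range(rounds):
        try:
            watch_once()
            err = None
        except Exception as exc:  # broad on purpose: classified just below
            err = exc
        pause = pause_before_next_watch(err)
        if pause > 0:
            sleep(pause)
```

Passing `sleep` in as a parameter keeps the sketch easy to exercise without real waiting; in a real client it would be the runtime's sleep function.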
The impact of doing so: removing the resync delay after a timeout error
means APISIX holds more concurrent etcd connections on average. For
example, in the default configuration (etcd.timeout=30,
etcd.resync_delay=5), delaying the resync after a timeout reduces the
number of concurrent connections by about 1/6 (6/(6+30)). I think this
impact is negligible compared to configuration changes not taking effect
in time.
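
The 1/6 figure above can be checked with a short calculation. The sketch below assumes the simplified model implied by the text: each cycle is one 30s watch followed by a pause of about 6s (5s plus roughly 1s of jitter), and a connection is open only while the watch is in flight. The function name is made up for this example.

```python
def watch_duty_cycle(timeout, pause):
    """Fraction of each cycle during which a watch connection is open."""
    return timeout / (timeout + pause)


timeout = 30.0  # etcd.timeout default quoted in the text
pause = 6.0     # etcd.resync_delay (5s) plus about 1s of jitter, as in 6/(6+30)

with_delay = watch_duty_cycle(timeout, pause)   # 30 / 36
without_delay = watch_duty_cycle(timeout, 0.0)  # 30 / 30

# Share of connection time saved by pausing after every timeout.
reduction = without_delay - with_delay
print(f"connection time saved by the delay: {reduction:.3f}")
```

Under the same model, `pause / (timeout + pause)` is also the fraction of time in which no watch is active, so configuration changes go unnoticed for about 1/6 of the time when nothing else changes.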

Pre-submission checklist:

  • Did you explain what problem does this PR solve? Or what new features have been added?
  • Have you added corresponding test cases?
  • Have you modified the corresponding document?
  • Is this PR backward compatible? If it is not backward compatible, please discuss on the mailing list first

@spacewander spacewander merged commit 03324ee into apache:master Feb 9, 2022
@nic-6443 nic-6443 deleted the skip-resync-delay-while-timeout branch February 10, 2022 01:44
4 participants