Cross-datacenter DNS service lookups. #675

Closed
wants to merge 1 commit into from

@depeele depeele commented Feb 6, 2015

When performing a DNS lookup for Consul services ([%tag%.]*%service%.service.[%datacenter%.]consul or _%service%._tcp.service.[%datacenter%.]consul) without an explicit datacenter, perform "optional" lookups against all datacenters, starting with the one from which the request originated. "Optional" means that if the lookup fails, the failure is ignored.

This could be the base of a solution to #208 and generally allows service-based lookups to locate all service providers across an entire cluster. One extension might be the addition of timeouts to any optional lookups to reduce the time required to satisfy the query.

To allow for services that may be local-only services for the containing datacenter, we're using a special tag ('availability.local').

Update dispatch():

  • consolidate the initial call to serviceLookup() on the requested (implicit or explicit) datacenter passing false for 'isOptional';
  • if the datacenter was implicitly selected, call serviceLookup() for each datacenter in the cluster passing true for 'isOptional'.
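
The dispatch() flow above can be sketched roughly as follows. This is a minimal illustration rather than the code in this pull request: lookupFunc, resolveService, and errLookupFailed are hypothetical names, and skipping the primary datacenter during the optional pass is an assumption.

```go
package main

import "errors"

// errLookupFailed is a hypothetical sentinel error for a failed lookup.
var errLookupFailed = errors.New("lookup failed")

// lookupFunc is a hypothetical stand-in for serviceLookup(): it returns the
// addresses found in one datacenter, or an error if the lookup failed.
type lookupFunc func(datacenter, service string) ([]string, error)

// resolveService mirrors the described flow. The lookup in the requested
// datacenter is required. When no datacenter was given explicitly, every
// other datacenter is queried as an optional lookup whose failure is ignored.
func resolveService(lookup lookupFunc, localDC string, allDCs []string, service, explicitDC string) ([]string, error) {
	primary := localDC
	if explicitDC != "" {
		primary = explicitDC
	}

	// Required lookup (isOptional == false): a failure is reported.
	answers, err := lookup(primary, service)
	if err != nil {
		return nil, err
	}
	if explicitDC != "" {
		return answers, nil
	}

	// Optional lookups (isOptional == true): failures are skipped.
	for _, dc := range allDCs {
		if dc == primary {
			continue // assumption: do not query the primary twice
		}
		extra, err := lookup(dc, service)
		if err != nil {
			continue
		}
		answers = append(answers, extra...)
	}
	return answers, nil
}
```

With three datacenters where the second one is unreachable, an implicit query returns the answers from the first and third, while an explicit query against the second still fails.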

Update serviceLookup():

  • to take a final 'isOptional' parameter to indicate whether the lookup is optional;
  • if 'isOptional' is true and the lookup fails, do not set a DNS error code (ignore the failure);
  • update the call to filterServiceNodes() to pass along the current datacenter.

Update filterServiceNodes():

  • to take a final 'datacenter' parameter to indicate the datacenter that contains the incoming nodes;
  • if hasLocalTag(), called with the tags of a candidate node's service, returns true (i.e. there is an 'availability.local' tag) and the node is not part of the local datacenter, drop it.

Add a hasLocalTag() helper to check an array of tag strings for one that matches 'availability.local' indicating that the service should be restricted to the containing datacenter.
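
As a rough sketch of the tag check and the filtering rule, assuming an exact string match on the tag: serviceNode and filterLocalOnly are hypothetical, simplified names and not the types or signatures used in Consul's filterServiceNodes().

```go
package main

// localTag marks a service as restricted to its own datacenter.
const localTag = "availability.local"

// hasLocalTag reports whether any tag equals 'availability.local'.
// Exact matching is an assumption; the description only says "matches".
func hasLocalTag(tags []string) bool {
	for _, tag := range tags {
		if tag == localTag {
			return true
		}
	}
	return false
}

// serviceNode is a hypothetical, minimal view of one service instance.
type serviceNode struct {
	Address string
	Tags    []string
}

// filterLocalOnly drops local-only services when the nodes come from a
// datacenter other than the local one. Nodes from the local datacenter are
// returned unchanged.
func filterLocalOnly(nodes []serviceNode, nodesDC, localDC string) []serviceNode {
	if nodesDC == localDC {
		return nodes
	}
	kept := make([]serviceNode, 0, len(nodes))
	for _, n := range nodes {
		if hasLocalTag(n.Tags) {
			continue // local-only service living in a remote datacenter
		}
		kept = append(kept, n)
	}
	return kept
}
```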

@armon
Member

armon commented Feb 6, 2015

I think that this form of global lookup should only be enabled explicitly. For #208, I still think we need some type of selector like "_global" to trigger this behavior. Otherwise it's very difficult to reason about performance.

@depeele
Author

depeele commented Feb 6, 2015

To me it seems that a service lookup within a cluster should generally return all matching services within the cluster--barring timeouts and any local-only "flag".

Adding timeouts to any optional lookups could address the difficulty reasoning about performance.

Optionally, you could maintain a queue of pending lookups and once the primary lookup completes, wait for a short bit and cancel any that haven't completed.
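
One way the "wait a short bit, then cancel" idea could look, as a sketch only: the optional lookups run concurrently, and once the primary lookup completes they get a grace period before being cancelled. ctxLookup, lookupWithGrace, and stubLookup are hypothetical names, and the lookups are assumed to honor context cancellation.

```go
package main

import (
	"context"
	"sync"
	"time"
)

// ctxLookup is a hypothetical cancellable lookup against one datacenter.
type ctxLookup func(ctx context.Context, datacenter string) ([]string, error)

// lookupWithGrace starts the optional lookups in the background, runs the
// required primary lookup, then waits at most grace for the stragglers
// before cancelling them. Failed or cancelled optional lookups are ignored.
func lookupWithGrace(lookup ctxLookup, primary string, others []string, grace time.Duration) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var (
		mu    sync.Mutex
		extra []string
		wg    sync.WaitGroup
	)
	for _, dc := range others {
		wg.Add(1)
		go func(dc string) {
			defer wg.Done()
			res, err := lookup(ctx, dc)
			if err != nil {
				return // optional: ignore the failure
			}
			mu.Lock()
			extra = append(extra, res...)
			mu.Unlock()
		}(dc)
	}

	answers, err := lookup(ctx, primary)
	if err != nil {
		return nil, err // the deferred cancel stops the optional lookups
	}

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done: // every optional lookup finished in time
	case <-time.After(grace):
		cancel() // give up on whatever is still pending
	}

	mu.Lock()
	defer mu.Unlock()
	return append(answers, extra...), nil
}

// stubLookup builds a ctxLookup for experimentation: the datacenter named
// slowDC answers only after delay, every other datacenter answers at once.
func stubLookup(slowDC string, delay time.Duration) ctxLookup {
	return func(ctx context.Context, dc string) ([]string, error) {
		if dc == slowDC {
			select {
			case <-ctx.Done():
				return nil, ctx.Err()
			case <-time.After(delay):
			}
		}
		return []string{dc + ".example"}, nil
	}
}
```

For example, lookupWithGrace(stubLookup("dc3", 3*time.Second), "dc1", []string{"dc2", "dc3"}, 50*time.Millisecond) should return the answers for dc1 and dc2 and drop dc3 once the grace period expires. The worst-case latency is then bounded by the primary lookup plus the grace period.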

@armon
Member

armon commented Feb 6, 2015

Both of those will be necessary, yes. But the default behavior will always be a local-DC-only query. That makes it much simpler to reason about as a user. It is also our legacy behavior, and any change would break backwards compatibility.

@depeele
Author

depeele commented Feb 6, 2015

Ahh, backwards compatibility. That makes sense.

I guess if you're setting up a multi-datacenter cluster and want to ensure service lookups include all datacenters, the query could easily include a special '_global' datacenter.
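
A sketch of how such a selector might be recognized in a query name, assuming '_global' takes the place of the datacenter label. Nothing here exists in Consul: globalSelector, datacenterLabel, and isGlobalLookup are hypothetical, and the parsing handles only the two name shapes from the description.

```go
package main

import "strings"

// globalSelector is the hypothetical pseudo-datacenter from this discussion.
const globalSelector = "_global"

// datacenterLabel extracts the optional datacenter label from names shaped
// like "<...>.service.consul" or "<...>.service.<datacenter>.consul".
// ok is false when the name is not a service lookup in the consul domain.
func datacenterLabel(name string) (dc string, ok bool) {
	name = strings.TrimSuffix(strings.ToLower(name), ".")
	labels := strings.Split(name, ".")
	n := len(labels)
	if n < 3 || labels[n-1] != "consul" {
		return "", false
	}
	switch {
	case labels[n-2] == "service":
		return "", true // no datacenter given: local-only by default
	case n >= 4 && labels[n-3] == "service":
		return labels[n-2], true
	}
	return "", false
}

// isGlobalLookup reports whether the query explicitly opts in to a
// cross-datacenter lookup through the '_global' selector.
func isGlobalLookup(name string) bool {
	dc, ok := datacenterLabel(name)
	return ok && dc == globalSelector
}
```

Under this scheme "web.service.consul" stays a local-datacenter query, "web.service.dc2.consul" targets dc2, and only "web.service._global.consul" fans out to every datacenter, which keeps the legacy default intact.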

@depeele depeele closed this Feb 11, 2015
duckhan pushed a commit to duckhan/consul that referenced this pull request Oct 24, 2021
When a user sets connectInject.envoyExtraArgs value, they can send
arguments to the injected envoy sidecar binary. For example, in a
development environment, we could consider enabling debug logs in all
sidecars.

Usage:
```
connectInject:
  enabled: true
  envoyExtraArgs: "--log-level debug --disable-hot-restart"
```