Allow getting all catalog services for list of DataCenters #526

Closed
hopperd opened this issue Jan 28, 2016 · 11 comments

Comments

@hopperd

hopperd commented Jan 28, 2016

The use case here is that we want to be able to query all services in the Consul registry, meaning all services in all data centers. This would allow us to configure our load balancers with all possible service routes no matter what data center they are in.

I was thinking we could accomplish this by adding a function that takes an array of data centers, or defaults to all of them, and retrieves the catalog services for each data center. It would first query the data centers and then iterate over them to get all the catalog services for each datacenter.

This is mainly to get around the fact that the Consul API doesn't currently support this kind of _all_ searching. I am happy to take the time to contribute this if it is considered valuable.
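The proposed helper can be sketched roughly as follows. This is a minimal Python illustration, not consul-template code: `list_datacenters` and `list_services` are hypothetical stand-ins for the datacenter listing and the per-datacenter catalog services query, and `FAKE_CATALOG` is made-up data.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical in-memory catalog: datacenter -> {service name: tags}.
FAKE_CATALOG: Dict[str, Dict[str, List[str]]] = {
    "dc1": {"foo": ["lb"], "bar": []},
    "dc2": {"foo": ["lb"], "baz": ["lb"]},
}


def list_datacenters() -> List[str]:
    """Stand-in for the query that lists all known datacenters."""
    return sorted(FAKE_CATALOG)


def list_services(dc: str) -> Dict[str, List[str]]:
    """Stand-in for the catalog services query against one datacenter."""
    return FAKE_CATALOG[dc]


def all_catalog_services(
    datacenters: Optional[Iterable[str]] = None,
) -> Dict[str, List[str]]:
    """Map each service name to the datacenters that register it.

    When no datacenters are given, every known datacenter is queried.
    """
    dcs = list(datacenters) if datacenters is not None else list_datacenters()
    found: Dict[str, List[str]] = {}
    for dc in dcs:
        for name in list_services(dc):
            found.setdefault(name, []).append(dc)
    return found
```

Calling `all_catalog_services()` walks every datacenter, while `all_catalog_services(["dc2"])` restricts the lookup to the listed ones. Note that each datacenter costs one extra query.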

@hopperd
Author

hopperd commented Feb 1, 2016

@sethvargo Any thoughts on the value of this? I'm happy to write it up if it would be valuable.

@sethvargo
Contributor

Hi @Split3

We have not had a lot of requests for this functionality, so I do not think it's something we plan to add to Consul Template ourselves. If this is something that the community finds valuable, we would likely look to a community contribution for this.

That being said, it does sound somewhat anti-Consul to ignore the per-datacenter constraints for a higher-level load balancer. You are also potentially introducing a very large number of watches: DCs x Catalog Services x Health Services is at least an n^2 operation, if not n^3 for large numbers of data centers. I worry about the load on the Consul cluster and the performance implications of something like this.
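To make the scaling concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes one watch for the datacenter list, one catalog watch per datacenter, and one health watch per service per datacenter, which is how a nested loop over datacenters and services would behave. The function name and the example numbers are illustrative only, not measurements.

```python
from typing import Dict


def estimate_watches(services_per_dc: Dict[str, int]) -> int:
    """Estimate the watches held by ONE template instance.

    services_per_dc maps a datacenter name to its number of services.
    """
    datacenter_list_watch = 1                       # the list of datacenters
    catalog_watches = len(services_per_dc)          # one catalog per datacenter
    health_watches = sum(services_per_dc.values())  # one per service per DC
    return datacenter_list_watch + catalog_watches + health_watches


# Three datacenters with 50 services each: 1 + 3 + 150 watches.
example = estimate_watches({"dc1": 50, "dc2": 50, "dc3": 50})
```

Under these assumptions the total grows with the product of datacenters and services, and it is multiplied again by the number of nodes running the template.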

@hopperd
Author

hopperd commented Feb 1, 2016

Our use case is as follows:

We have multiple physical data centers for our services. What we want to do is configure the load balancers in each datacenter with all appropriate services in all datacenters, where the near datacenter (the same one the service exists in) is the primary backend for a load-balanced endpoint, while the far datacenters are the backups.

This is why we would need to know about ALL services available to Consul regardless of the datacenter they exist in. We use tags to limit the services that are used by the load balancers, but if we had a service in DC1 only, we would want the DC2 load balancers to also support routing traffic to those nodes.

Hopefully that makes some sense. If not, I can try to explain better, and if there is a better approach to handling this particular situation, I'm all ears.

@slackpad
Contributor

slackpad commented Feb 1, 2016

Hi @Split3 this is a pretty interesting use case. If you weren't using load balancers, you could configure prepared queries to do this for you and it could figure out where the backup DC should be based on network round-trip time (or you could configure it). There's also an open issue to allow for a "parent" DC to be specified - hashicorp/consul#1159. Would either of these be an option for your infrastructure?

@hopperd
Author

hopperd commented Feb 1, 2016

I looked at leveraging prepared queries, but they require you to specify the DC that is searched, which wouldn't work in this particular use case. The reason behind this is to ensure a high level of availability even when a physical data center is unavailable.

We also want this to be zero touch from a configuration standpoint, so the only way that would be somewhat possible would be to get all the available services in the desired data centers and filter them by the associated tags we use, then add the backends to the load balancer with the current data center being the primary (if they exist) and the other data centers being the backups (if they exist).

Our load balancers for each DC are fronted by a GTM that balances between those physical end points based on speed, latency, and performance. This is why we want all services configured in both locations to simplify the overall routing from the GTM to each data center.
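The tag-filtering step described above could look something like this minimal Python sketch. It assumes each datacenter's catalog is available as a mapping of service name to tags. The tag `lb` and the sample data are hypothetical.

```python
from typing import Dict, List


def services_with_tag(catalog: Dict[str, List[str]], tag: str) -> List[str]:
    """Return the sorted names of services that carry the given tag."""
    return sorted(name for name, tags in catalog.items() if tag in tags)


def tagged_services_by_dc(
    catalogs: Dict[str, Dict[str, List[str]]], tag: str
) -> Dict[str, List[str]]:
    """Apply the tag filter to every datacenter's catalog."""
    return {dc: services_with_tag(cat, tag) for dc, cat in catalogs.items()}


# Hypothetical catalogs: datacenter -> {service name: tags}.
catalogs = {
    "dc1": {"foo": ["lb", "v2"], "bar": ["internal"]},
    "dc2": {"foo": ["lb"], "baz": ["lb"]},
}
selected = tagged_services_by_dc(catalogs, "lb")
```

Only the services that survive this filter would need health queries, but the catalog of every datacenter still has to be read to find them.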

@slackpad
Contributor

slackpad commented Feb 1, 2016

I see. There's definitely concern about the large number of watches this will do, but how about something like this:

{{ range $dc := datacenters }}
   {{ range $svc := services (printf "@%s" $dc)}}
      {{ range $inst := service (printf "%s@%s" $svc.Name $dc) "any" }}
         {{ $dc }} {{ $svc.Name }} {{ $inst.Address }}:{{ $inst.Port }}
      {{ end }}
   {{ end }}
{{ end }}

This'll loop through all services in all datacenters.

@hopperd
Author

hopperd commented Feb 1, 2016

Yea the problem isn't looping through, the problem is more having the data grouped together properly. For instance we need to produce something like

In DC1

service foo {
   backend foo-dc1-instance;
   backend foo-dc2-instance backup;
}

In DC2

service foo {
   backend foo-dc2-instance;
   backend foo-dc1-instance backup;
}

So at a high level I need to know all available services and then when looping through the Health Services for the associated Data Centers I can associate the backup instance appropriately based on which data center I'm querying them from.

What raises questions for me now is the load that @sethvargo mentioned. Are there any published numbers on how high the load can get and what Consul tolerates?
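The grouping described above can be illustrated with a small Python sketch. It is not consul-template syntax. It assumes the instances have already been collected into a nested mapping of service name to datacenter to backend names, and `render_service_blocks` is a hypothetical helper.

```python
from typing import Dict, List


def render_service_blocks(
    instances: Dict[str, Dict[str, List[str]]], local_dc: str
) -> str:
    """Render one block per service: local backends first, remote as backup.

    instances maps service name -> datacenter -> list of backend names.
    """
    lines: List[str] = []
    for service in sorted(instances):
        by_dc = instances[service]
        lines.append(f"service {service} {{")
        # Backends in the local datacenter are the primaries.
        for backend in by_dc.get(local_dc, []):
            lines.append(f"   backend {backend};")
        # Backends in every other datacenter are marked as backups.
        for dc in sorted(by_dc):
            if dc == local_dc:
                continue
            for backend in by_dc[dc]:
                lines.append(f"   backend {backend} backup;")
        lines.append("}")
    return "\n".join(lines)


instances = {"foo": {"dc1": ["foo-dc1-instance"], "dc2": ["foo-dc2-instance"]}}
dc1_config = render_service_blocks(instances, "dc1")
dc2_config = render_service_blocks(instances, "dc2")
```

Because the outer loop runs over the union of service names rather than over one datacenter's catalog, a service that exists only in a remote datacenter still gets a block, with backup backends only.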

@sethvargo
Contributor

@Split3 it depends on the size of your cluster and the type of servers you're running honestly. You're essentially creating a watch on every non-key-value item in Consul, which means you have significant churn and data flow. It is very likely that you will constantly be restarting your load balancers because any change in any of the data in Consul will result in the template changing.

As far as actual load, it depends how many instances of this template you're running.

@hopperd
Author

hopperd commented Feb 2, 2016

In terms of the template load, that could be mitigated by deduplication, could it not? We would also only have health watchers on services that have the associated tags used by the load balancer. But to determine that, we'd still need to know each service from each datacenter and check for the associated tags to see if we should add that service to the load balancer.

Would a better approach be to use the KV store to link to the associated services to query, so that the watchers are only on the services that are explicitly going to be used? The downside is that it won't be zero touch (the key would have to be added), unlike leveraging tags to accomplish this.

@slackpad
Contributor

slackpad commented Feb 2, 2016

@Split3 you are correct that -deduplicate would cut the load if you are running consul-template with the same configuration on a large number of nodes. If you use that, the load will probably be reasonable; you'd just have to benchmark it. This would require one template per datacenter, but I think it would do what you need:

{{ $this_dc := "sfo1" }}
{{ range $svc := services (printf "@%s" $this_dc) }}
   service {{$svc.Name}} {
   {{ range $inst := service (printf "%s@%s" $svc.Name $this_dc) "any" }}
      backend {{$inst.Node}}
   {{ end }}
   {{ range $dc := datacenters }}
      {{ if ne $dc $this_dc }}
         {{ range $inst := service (printf "%s@%s" $svc.Name $dc) "any" }}
            backend {{$inst.Node}} backup
         {{ end }}
      {{ end }}
   {{ end }}
   }
{{ end }}

@hopperd
Author

hopperd commented Feb 8, 2016

@slackpad Thanks for the attempt there, but due to the way the template must be generated and the grouping needed, this still wouldn't work, as we need to create service blocks for all services in all datacenters. I've instead decided to go with a different approach and use the KV store to declaratively define the services that should be consumed from the registry. Though it now requires an initial "setup" step, this will reduce the amount of load on the Consul servers, as we won't have watches configured on all the pieces mentioned by @sethvargo. Thanks for the help!
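As a rough illustration of the KV-driven approach, the Python sketch below derives the health queries from an explicit list of keys instead of from every catalog entry. The key prefix `lb/services/` and the helper name are assumptions made for this example; the thread does not say how the keys were actually laid out.

```python
from typing import Iterable, List


def health_queries(
    kv_keys: Iterable[str],
    datacenters: Iterable[str],
    prefix: str = "lb/services/",  # hypothetical key layout
) -> List[str]:
    """Build "name@dc" queries only for services declared under the prefix."""
    names = sorted(
        {
            key[len(prefix):]
            for key in kv_keys
            if key.startswith(prefix) and len(key) > len(prefix)
        }
    )
    dcs = list(datacenters)
    return [f"{name}@{dc}" for name in names for dc in dcs]


queries = health_queries(
    ["lb/services/foo", "lb/services/bar", "other/key"],
    ["dc1", "dc2"],
)
```

With this layout, the number of health watches depends on the number of declared services rather than on everything registered in every catalog, at the cost of adding a key for each new service.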

hopperd closed this as completed Feb 8, 2016