This repository has been archived by the owner on Apr 4, 2018. It is now read-only.

aws/gce: Standardise LB health check configs
Standardise the config for all load balancer health checks on AWS and GCE,
using variables so that the values are always the same. Most of these values
have been reduced because it was taking a very long time for new instances to
come into service.

Change the following:

- interval to 5s which is the minimum supported by AWS. This has reduced AWS
  from 30s and increased GCE from 1s.
- timeout to 2s which is the minimum supported by AWS. This has reduced AWS
  from 5s and increased GCE from 1s.
- healthy threshold to 2 requests. This has not changed AWS or GCE.
- unhealthy threshold to 2 requests. This has changed AWS from 10 and not
  changed GCE.

I'm not 100% confident about the values. They weren't thoroughly tested when
we first introduced them for GCE and I suspect we might want to experiment
with them in the future, but this is a good start.

I've changed the target for API on AWS from the default of `TCP:8080` to
`HTTP:8080/info` in order to match GCE and give a more accurate check. Other
targets remain as their defaults but we have to pass them because they're
mandatory. They don't match GCE because GCE can't do TCP health checks.

This will *not* apply cleanly to existing GCE environments due to a bug in
Terraform. This should be fixed in the future by hashicorp/terraform#1894.
But for the time being I think it's important enough that we should delete
existing forwarding rules, target pools, and health checks, then let
Terraform recreate them with the correct config.
dcarley committed May 12, 2015
1 parent 91fbf4e commit dcb7de6
Showing 6 changed files with 63 additions and 4 deletions.
14 changes: 14 additions & 0 deletions aws/api-servers.tf
@@ -19,6 +19,13 @@ resource "aws_elb" "api-ext" {
   security_groups = ["${aws_security_group.default.id}", "${aws_security_group.web.id}"]
   instances = ["${aws_instance.api.*.id}"]
 
+  health_check {
+    target = "HTTP:8080/info"
+    interval = "${var.health_check_interval}"
+    timeout = "${var.health_check_timeout}"
+    healthy_threshold = "${var.health_check_healthy}"
+    unhealthy_threshold = "${var.health_check_unhealthy}"
+  }
   listener {
     instance_port = 8080
     instance_protocol = "http"
@@ -42,6 +49,13 @@ resource "aws_elb" "api-int" {
   security_groups = ["${aws_security_group.default.id}", "${aws_security_group.web.id}"]
   instances = ["${aws_instance.api.*.id}"]
 
+  health_check {
+    target = "HTTP:8080/info"
+    interval = "${var.health_check_interval}"
+    timeout = "${var.health_check_timeout}"
+    healthy_threshold = "${var.health_check_healthy}"
+    unhealthy_threshold = "${var.health_check_unhealthy}"
+  }
   listener {
     instance_port = 8080
     instance_protocol = "http"
14 changes: 14 additions & 0 deletions aws/routers.tf
@@ -19,6 +19,13 @@ resource "aws_elb" "router" {
   security_groups = ["${aws_security_group.default.id}", "${aws_security_group.web.id}"]
   instances = ["${aws_instance.router.*.id}"]
 
+  health_check {
+    target = "TCP:80"
+    interval = "${var.health_check_interval}"
+    timeout = "${var.health_check_timeout}"
+    healthy_threshold = "${var.health_check_healthy}"
+    unhealthy_threshold = "${var.health_check_unhealthy}"
+  }
   listener {
     instance_port = 80
     instance_protocol = "http"
@@ -35,6 +42,13 @@ resource "aws_elb" "router-int" {
   security_groups = ["${aws_security_group.default.id}", "${aws_security_group.web-int.id}"]
   instances = ["${aws_instance.router.*.id}"]
 
+  health_check {
+    target = "TCP:80"
+    interval = "${var.health_check_interval}"
+    timeout = "${var.health_check_timeout}"
+    healthy_threshold = "${var.health_check_healthy}"
+    unhealthy_threshold = "${var.health_check_unhealthy}"
+  }
   listener {
     instance_port = 80
     instance_protocol = "http"
7 changes: 7 additions & 0 deletions aws/ssl-proxies.tf
@@ -19,6 +19,13 @@ resource "aws_elb" "tsuru-sslproxy-elb" {
   security_groups = ["${aws_security_group.default.id}", "${aws_security_group.sslproxy.id}"]
   instances = ["${aws_instance.tsuru-sslproxy.*.id}"]
 
+  health_check {
+    target = "TCP:443"
+    interval = "${var.health_check_interval}"
+    timeout = "${var.health_check_timeout}"
+    healthy_threshold = "${var.health_check_healthy}"
+    unhealthy_threshold = "${var.health_check_unhealthy}"
+  }
   listener {
     instance_port = 443
     instance_protocol = "tcp"
6 changes: 4 additions & 2 deletions gce/api-servers.tf
@@ -24,8 +24,10 @@ resource "google_compute_http_health_check" "api" {
   name = "${var.env}-tsuru-api"
   port = 8080
   request_path = "/info"
-  check_interval_sec = 1
-  timeout_sec = 1
+  check_interval_sec = "${var.health_check_interval}"
+  timeout_sec = "${var.health_check_timeout}"
+  healthy_threshold = "${var.health_check_healthy}"
+  unhealthy_threshold = "${var.health_check_unhealthy}"
 }
 resource "google_compute_target_pool" "api" {
   name = "${var.env}-tsuru-api-lb"
6 changes: 4 additions & 2 deletions gce/http-health-check.tf
@@ -1,6 +1,8 @@
 resource "google_compute_http_health_check" "http-check" {
   name = "${var.env}-http-check"
   request_path = "/"
-  check_interval_sec = 1
-  timeout_sec = 1
+  check_interval_sec = "${var.health_check_interval}"
+  timeout_sec = "${var.health_check_timeout}"
+  healthy_threshold = "${var.health_check_healthy}"
+  unhealthy_threshold = "${var.health_check_unhealthy}"
 }
20 changes: 20 additions & 0 deletions globals.tf
@@ -2,3 +2,23 @@ variable "office_cidrs" {
   description = "CSV of CIDR addresses for our office which will be trusted"
   default = "80.194.77.90/32,80.194.77.100/32"
 }
+
+variable "health_check_interval" {
+  description = "Interval between requests for load balancer health checks"
+  default = 5
+}
+
+variable "health_check_timeout" {
+  description = "Timeout of requests for load balancer health checks"
+  default = 2
+}
+
+variable "health_check_healthy" {
+  description = "Threshold to consider load balancer healthy"
+  default = 2
+}
+
+variable "health_check_unhealthy" {
+  description = "Threshold to consider load balancer unhealthy"
+  default = 2
+}
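
As a rough illustration of what these defaults mean: with a 5s interval and a healthy threshold of 2, a new instance should come into service after about 2 x 5s = 10s of passing checks, compared with about 2 x 30s = 60s under the previous AWS interval. Because the values are ordinary Terraform variables, an environment could probably override them without editing `globals.tf`. The sketch below assumes a `terraform.tfvars` file in whichever directory Terraform is run from; the file and its values are hypothetical and have not been tested against this repository.

```hcl
# terraform.tfvars (hypothetical override; values are illustrative, not tested)

# Check every 10s instead of the default 5s.
health_check_interval = 10

# Keep the timeout below the interval; AWS ELB requires the timeout
# to be less than the interval.
health_check_timeout = 5

# Rough time for a new instance to enter service:
# interval * healthy threshold = 10s * 2 = 20s.
health_check_healthy = 2

# Rough time for a failing instance to be marked unhealthy:
# interval * unhealthy threshold = 10s * 3 = 30s.
health_check_unhealthy = 3
```

A single value can also be overridden on the command line, for example `terraform plan -var 'health_check_interval=10'`.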
