This repository has been archived by the owner on Sep 4, 2021. It is now read-only.

[WIP] HA control plane #596

Closed · wants to merge 2 commits

Conversation

colhom (Contributor) commented Aug 2, 2016

Controller(s) are in a cross-zone autoscaling group and behind an ELB. The ControllerIP parameter is gone, as we're going to rely on DNS for all of this.

Depends on #544

colhom (Contributor, Author) commented Aug 2, 2016

I'm struggling with what to do about the createRecordSet option, which controls whether a Route53 DNS record is created for the API server endpoint.

The nodes now rely on externalDNSName to talk to the API server ELB, so, tl;dr: the nodes can't join the cluster until externalDNSName is CNAME'd to point at the apiserver ELB.

In the createRecordSet=true case, this all happens automagically and there's nothing to notice.

If createRecordSet=false, this could be kind of weird from the operator's perspective, in that the nodes will only appear as Ready "some amount of time" after the DNS entry is manually created.
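For illustration, a quick way to watch for that cutover from the operator's side (kubernetes.example.com is a placeholder externalDNSName, not a value from this PR):

# Wait until the manually created CNAME actually resolves to the API server ELB.
dig +short kubernetes.example.com CNAME

# Once it does, the nodes should start registering and eventually report Ready.
kubectl get nodes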

I'm debating between:

  • Deprecate the createRecordSet option and make the Route53 integration mandatory.
  • Make the Route53 integration default to on and print a warning if it's turned off (like traction control).

\cc @mumoshu @pieterlange @whereisaaron

mumoshu (Contributor) commented Aug 2, 2016

@colhom Personally, I'm happy with deprecating createRecordSet.

However, I guess there's a fairly common situation where the hosted zone is managed in an AWS account other than the one kube-aws launches the CloudFormation stack in.

I believe CloudFormation doesn't allow creating record sets under a hosted zone managed in another account (See e.g. https://forums.aws.amazon.com/thread.jspa?messageID=537944)

So, IMHO, your latter option (defaulting the Route53 integration to true and warning when it's false) would be better, as it provides both:

  • a nice default behavior for most users, and
  • a workaround (manually creating a record set under the hosted zone in the other AWS account, e.g. with the AWS CLI as sketched below) for users who hit the limitation.
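For reference, a rough sketch of that manual workaround with the AWS CLI, run from the account that owns the hosted zone. The zone ID, record name, and ELB hostname are all placeholders, not values from this PR:

# All identifiers below are placeholders; substitute your own zone, record name, and ELB.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "kubernetes.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "apiserver-elb-1234567890.us-east-1.elb.amazonaws.com"}]
      }
    }]
  }'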

Btw, I'm very excited to see this PR! I'm definitely going to test this out.

pieterlange commented

Awesome work @colhom! Haven't got time to review it right now, but will try to test this asap.

Agreed with @mumoshu's evaluation of your internal debate (don't debate, let users figure out things for themselves but provide sane defaults)

colhom force-pushed the ha-control-plane branch 2 times, most recently from c052b43 to bce3c11 on August 2, 2016 at 20:25
colhom (Contributor, Author) commented Aug 2, 2016

Alrighty, as of bce3c11 I have successfully created a very large HA cluster in us-east-1{a,c,d} with controllers, workers, and etcd instances distributed across the three zones.

Kube-aws can now officially max out the AWS regional SLA!

If you want to see which AZs your account can deploy to in a given region (sometimes it's fewer than 3 👎):

aws --region=$REGION ec2 describe-availability-zones

and use any zone where State=available. AZ availability in a region differs from account to account; yay, resource abstraction.
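For example (just a sketch, not something from this PR), the available zones can be pulled out directly with a JMESPath query:

# List only the zones this account can actually deploy into.
aws --region=$REGION ec2 describe-availability-zones \
  --query 'AvailabilityZones[?State==`available`].ZoneName' \
  --output text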

As of now, controllers, workers and etcd instances are all scheduled as close to evenly as possible across the AZs. Naturally an even multiple of your AZ count makes sense for workers and controllers.

There's really very little reason to double or triple up etcd instances per AZ, so I'd recommend etcd count === AZ count.
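To sanity-check the spread afterwards, something like the following works (a sketch; the *etcd* Name tag filter is my assumption about how the instances are named, not something specified in this PR, so adjust it to whatever tags your stack actually applies):

# Count running instances per AZ; the *etcd* name filter is an assumption.
aws --region=$REGION ec2 describe-instances \
  --filters "Name=tag:Name,Values=*etcd*" "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].Placement.AvailabilityZone' \
  --output text | tr '\t' '\n' | sort | uniq -c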

@colhom
Copy link
Contributor Author

colhom commented Oct 28, 2016

Hello Kubernetes Community,

Future work on kube-aws will be moved to a new dedicated repository. @mumoshu will be running point on maintaining that repository; please move all issues and PRs over there as soon as you can. We will be halting active development on the AWS portion of this repository in the near future. We will continue to maintain the Vagrant single-node and multi-node distributions in this repository, along with our hyperkube container image.

A community announcement to end users will be made once the transition is complete. We at CoreOS ask that those reading this message avoid publicizing/blogging about the transition until the official announcement has been made to the community in the next week.

The new dedicated kube-aws repository already has the following features merged in:

  • Discrete etcd cluster
  • HA control plane
  • Cluster upgrades
  • Node draining/cordoning

If anyone in the Kubernetes community would like to be involved with maintaining this new repository, find @chom and/or @mumoshu on the Kubernetes slack in the #sig-aws channel or via direct message.

~CoreOS Infra Team
