Feature: Spot Fleet support for worker nodes #112
Edited several times to cover more TODOs and concerns.
Addressed the integration tests in f46c711#diff-17c3b4ff0a8d67faed426a76a03f8430R1
I'm going to add support for node labels and taints in another pull request (or requests). The context is that if we don't need to mix various types of nodes but just need to use Spot Fleets, labels and taints are not required. So I believe I can cut #113 now and deliver it so that we can start supporting some use-cases. For another use-case like "I want to mix various types of nodes for blah-blah-blah", I can author another pull request addressing labels and taints.
See the updated nodepool/config/templates/cluster.yaml for a detailed configuration guide. This is the initial implementation for kubernetes-retired#112. Beware that this feature may change in backward-incompatible ways.
The initial implementation for this is now merged into master.
Btw: I don't mean to show off, but my personal project https://github.com/mumoshu/kube-spot-termination-notice-handler would be useful for anyone who wants to gracefully stop pods running on spot instances when a spot fleet loses its bids. Deploying it to your spot instances allows you to automatically run …
Though I thought it would be nice to add initially, do we really need the feature to add user-provided labels to worker nodes?
I think this is useful; operators might want to restrict pod scheduling to certain node pools because of node capabilities or security domains. Example use-case: I have the majority of nodes in private subnets, but I start a few in public subnets because they need to directly expose some service to the internet. With node labels I restrict those pods to the public nodes.
Do we know the difference between a node selector on a pod and a taint toleration on a pod? It seems like they could achieve similar things as far as I can see in the taint design docs.
I'm now taking a look into an issue where worker nodes brought up from a spot-fleet-enabled node pool often fail to register themselves. More specifically, if you've created 2 or more nodes backed by a spot fleet, only one of them is registered. Making TargetCapacity larger, hence adding nodes, seems to consistently result in spot instances getting launched successfully but their corresponding nodes remaining unregistered.

kubelet does report that it successfully registered the node. However, immediately after that, kubelet starts complaining that the node it just registered can not be found. I suspect that a missing …

Edit: Bingo! Putting a tag named …

I'll shortly submit a pull request to address this. Maybe I'll utilize the quay.io/coreos/awscli docker image to run a command almost the same as what @innovia described in his comment on the upstream issue.
Btw, just noticed that …
…leet couldn't be registered thus unable to run any pods ref kubernetes-retired#112 (comment)
@pieterlange In that case, wouldn't you like to use taints rather than labels? IMHO taints are more failproof than labels. If you've used taints to implement dedicated nodes, pods missing tolerations won't be scheduled anywhere, so you can ensure that only the desired pods are scheduled to the desired nodes. On the other hand, if you've used labels, pods missing node selectors will end up as a completely useless deployment - pods get distributed over both private and public nodes.
@c-knowles Both taints and labels can be used to select a subset of nodes to schedule pods onto.
IMHO it is perfectly fine to use node labels for purposes other than dedicated nodes (= reserved for specific pods). An example use-case for node labels could be running administrative tasks on a subset of nodes.
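To make the distinction concrete: a node selector pulls a pod toward labelled nodes, while a taint keeps every pod off a node unless the pod carries a matching toleration. Below is a minimal, hypothetical pod sketch combining both; the label key `role` and the taint `dedicated=public:NoSchedule` are made-up examples, and note that older Kubernetes releases expressed tolerations via alpha annotations rather than the `tolerations` field shown here.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: public-facing-pod
spec:
  # Only schedule onto nodes carrying the (illustrative) label role=public
  nodeSelector:
    role: public
  # Also tolerate the (illustrative) taint that keeps ordinary pods off those nodes
  tolerations:
  - key: dedicated
    operator: Equal
    value: public
    effect: NoSchedule
  containers:
  - name: nginx
    image: nginx
```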
This complements Node Pools (kubernetes-retired#46) and Spot Fleet support (kubernetes-retired#112). The former `experimental.nodeLabel` configuration key is renamed to `experimental.awsNodeLabels` to avoid collision with the newly added `experimental.nodeLabels` and for consistency with `experimental.awsEnvironment`.
All the remaining TODOs are going to be addressed in v0.9.3-rc.1
… spot fleet to conform these nodes to the ones powered by an autoscaling group. ref kubernetes-retired#112. See also http://docs.aws.amazon.com/cli/latest/reference/ec2/create-tags.html and http://stackoverflow.com/a/1250279 for implementation details.
All the TODOs have been addressed.
Closing this issue as the initial iterations to bring in the feature have finished.
Quite self-explanatory, but I'd like to add this to kube-aws.
Upstream issue: kubernetes/kubernetes#24472
Initial Implementation in this project: #113
Documentation: https://github.com/coreos/kube-aws/blob/master/Documentation/kubernetes-on-aws-node-pool.md#deploying-a-node-pool-powered-by-spot-fleet
Spot-fleet-backed worker nodes are supported since v0.9.2-rc.3:
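As a rough illustration only (the authoritative configuration reference is the node pool documentation linked above; key names such as `targetCapacity` and `launchSpecifications` are assumptions based on the keys discussed in this issue), a node pool `cluster.yaml` excerpt might look like:

```yaml
# Hypothetical node pool cluster.yaml excerpt - check the linked docs
# for the exact keys supported by your kube-aws release.
worker:
  spotFleet:
    # Number of capacity units the fleet tries to keep running
    targetCapacity: 10
    launchSpecifications:
    - weightedCapacity: 1
      instanceType: m3.medium
    - weightedCapacity: 2
      instanceType: m3.large
```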
An experimental feature to automatically taint nodes with user-provided taints is supported since v0.9.2-rc.4 (not yet released), so we can ensure that only pods tolerant of frequent node terminations are scheduled to spot instances / spot-fleet-powered nodes:
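Again as a hedged sketch (the exact key names are assumptions; the `key=value:effect` format comes from the TODOs below), the taint configuration might look like:

```yaml
# Hypothetical excerpt - roughly equivalent to the taint dedicated=spot:NoSchedule
experimental:
  taints:
  - key: dedicated
    value: spot
    effect: NoSchedule
```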
Utilizing Spot Fleet gives us a chance to dramatically reduce the cost spent on EC2 instances powering Kubernetes worker nodes.
AWS says the cost reduction is up to 90%. I can confirm that in my daily-used region `ap-northeast-1` it is up to 89% right now, with slightly varying cost for each instance type. I believe that on top of the recent work on Node Pools #46, it is easier than ever to implement a POC of Spot Fleet support.
I'll send a pull request to show it shortly.
I'd appreciate your feedback!
Several concerns I've come up with so far:
- Users need to have the `aws-ec2-spot-fleet-role` IAM role in their AWS accounts; it is created automatically by accessing Spot Fleets in the AWS Console at least once. If it is missing, `kube-aws nodepool up` will fail while copy-pasting an error message from CloudFormation like `IAM role aws-ec2-spot-fleet-role doesn't exist`, which may be useless to the user as it doesn't tell them that they needed to visit Spot Fleet in the AWS console at least once.

TODOs:
- Compute root volume size and IOPS for each launch specification as `worker.spotFleet.unitRootVolumeSize * weightedCapacity` and `worker.spotFleet.unitRootVolumeIOPS * weightedCapacity` respectively (see the sketch after this list)
- Support user-provided taints in the `key=value:effect` format #132
  - Running kubelet with `--register-node=true --register-schedulable=false` followed by `kubectl taint` and `kubectl uncordon` would avoid any race condition which can result in undesired pods getting scheduled to undesired nodes while adding a new node/kubelet to a cluster
- `Experimental.LoadBalancer.Names` is not taken into account (Conform worker nodes powered by spot fleets to the ones powered by autoscaling groups #167)
- `Name` tags on a spot instance with the same value as workers in a main cluster (Conform worker nodes powered by spot fleets to the ones powered by autoscaling groups #167)
- … `cluster.yaml`s)
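To spell out the arithmetic behind the first TODO with illustrative numbers only: a launch specification with `weightedCapacity: 2` would get twice the per-unit root volume size and IOPS.

```yaml
# Illustrative values; key names follow the TODO above.
worker:
  spotFleet:
    unitRootVolumeSize: 30    # GiB per capacity unit
    unitRootVolumeIOPS: 100   # IOPS per capacity unit
    launchSpecifications:
    - instanceType: m3.medium
      weightedCapacity: 1     # root volume: 30 GiB, 100 IOPS
    - instanceType: m3.large
      weightedCapacity: 2     # root volume: 30 * 2 = 60 GiB, 100 * 2 = 200 IOPS
```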