Prefix Delegation: Merge from prefix delegation preview branch to master #1516

Merged
merged 19 commits into from
Jun 24, 2021
2 changes: 1 addition & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -12,4 +12,4 @@ grpc-health-probe
cni-metrics-helper
coverage.txt
build/
vendor
vendor
10 changes: 5 additions & 5 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ endif
# LDFLAGS is the set of flags used when building golang executables.
LDFLAGS = -X main.version=$(VERSION) -X pkg/awsutils/awssession.version=$(VERSION)
# ALLPKGS is the set of packages provided in source.
ALLPKGS = $(shell go list ./... | grep -v cmd/packet-verifier)
ALLPKGS = $(shell go list $(VENDOR_OVERRIDE_FLAG) ./... | grep -v cmd/packet-verifier)
# BINS is the set of built command executables.
BINS = aws-k8s-agent aws-cni grpc-health-probe cni-metrics-helper
# Plugin binaries
Expand Down Expand Up @@ -144,7 +144,7 @@ docker-func-test: docker ## Run the built CNI container image to use in func
# Run unit tests
unit-test: export AWS_VPC_K8S_CNI_LOG_FILE=stdout
unit-test: ## Run unit tests
go test -v -coverprofile=coverage.txt -covermode=atomic $(ALLPKGS)
go test -v $(VENDOR_OVERRIDE_FLAG) -coverprofile=coverage.txt -covermode=atomic ./pkg/...

# Run unit tests with race detection (can only be run natively)
unit-test-race: export AWS_VPC_K8S_CNI_LOG_FILE=stdout
Expand Down Expand Up @@ -207,7 +207,7 @@ generate:
# Generate eni-max-pods.txt file for EKS AMI
generate-limits: GOOS=
generate-limits: ## Generate limit file go code
go run scripts/gen_vpc_ip_limits.go
go run $(VENDOR_OVERRIDE_FLAG) scripts/gen_vpc_ip_limits.go

# Fetch the CNI plugins
plugins: FETCH_VERSION=0.9.0
Expand Down Expand Up @@ -253,8 +253,8 @@ helm-lint:
@${MAKEFILE_PATH}test/helm/helm-lint.sh

# Run go vet on source code.
vet: ## Run go vet on source code.
go vet $(ALLPKGS)
vet: setup-ec2-sdk-override ## Run go vet on source code.
go vet $(VENDOR_OVERRIDE_FLAG) $(ALLPKGS)


docker-vet: build-docker-test ## Run go vet inside of a container.
Expand Down
38 changes: 34 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -236,10 +236,17 @@ Type: Integer

Default: None

Specifies the number of free IP addresses that the `ipamd` daemon should attempt to keep available for pod assignment on the node.
For example, if `WARM_IP_TARGET` is set to 5, then `ipamd` attempts to keep 5 free IP addresses available at all times. If the
Specifies the number of free IP addresses that the `ipamd` daemon should attempt to keep available for pod assignment on the node.
With `ENABLE_PREFIX_DELEGATION` set to `true`, the `ipamd` daemon checks whether the existing (/28) prefixes are enough to maintain the
`WARM_IP_TARGET`; if they are not sufficient, more prefixes are attached.

For example,

1. If `WARM_IP_TARGET` is set to 5, then `ipamd` attempts to keep 5 free IP addresses available at all times. If the
elastic network interfaces on the node are unable to provide these free addresses, `ipamd` attempts to allocate more interfaces
until `WARM_IP_TARGET` free IP addresses are available.
until `WARM_IP_TARGET` free IP addresses are available.
2. If `ENABLE_PREFIX_DELEGATION` is set to `true` and `WARM_IP_TARGET` is 16, then initially 1 (/28) prefix is sufficient. Once a single pod is assigned an IP, only
15 free IPs remain, so `ipamd` allocates 1 more prefix to satisfy the `WARM_IP_TARGET` of 16.
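
The arithmetic in example 2 can be sketched in Go as follows. This is a minimal illustration, not the `ipamd` implementation: `extraPrefixesForWarmTarget` is a hypothetical helper, and it assumes every prefix is a /28 (16 addresses) and ignores ENI and instance limits.

```go
package main

import "fmt"

// ipsPerPrefix is the number of IPv4 addresses in one /28 prefix.
const ipsPerPrefix = 16

// extraPrefixesForWarmTarget is a hypothetical helper (not part of ipamd).
// It returns how many additional /28 prefixes are needed so that at least
// warmIPTarget addresses are free, given the prefixes already attached and
// the number of IPs already assigned to pods.
func extraPrefixesForWarmTarget(attachedPrefixes, assignedIPs, warmIPTarget int) int {
	free := attachedPrefixes*ipsPerPrefix - assignedIPs
	short := warmIPTarget - free
	if short <= 0 {
		return 0
	}
	// Round the shortfall up to whole prefixes.
	return (short + ipsPerPrefix - 1) / ipsPerPrefix
}

func main() {
	// WARM_IP_TARGET=16 with one prefix attached and no pods: nothing to add.
	fmt.Println(extraPrefixesForWarmTarget(1, 0, 16)) // 0
	// One pod takes an IP, 15 remain free, so one more prefix is needed.
	fmt.Println(extraPrefixesForWarmTarget(1, 1, 16)) // 1
}
```

Because prefixes are attached whole, a shortfall of a single IP still costs 16 addresses from the subnet.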

**NOTE!** Avoid this setting for large clusters, or if the cluster has high pod churn. Setting it will cause additional calls to the
EC2 API and that might cause throttling of the requests. It is strongly suggested to set `MINIMUM_IP_TARGET` when using `WARM_IP_TARGET`.
Expand All @@ -248,7 +255,8 @@ If both `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` are set, `ipamd` will attempt t
This environment variable overrides `WARM_ENI_TARGET` behavior. For a detailed explanation, see
[`WARM_ENI_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET`](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/eni-and-ip-target.md).


When `ENABLE_PREFIX_DELEGATION` is set to `true`, this environment variable overrides `WARM_PREFIX_TARGET` behavior. For a detailed explanation, see
[`WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET`](https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/prefix-and-ip-target.md).
---

`MINIMUM_IP_TARGET` (Since v1.6.0)
Expand Down Expand Up @@ -450,6 +458,28 @@ You can use the below command to enable `DISABLE_TCP_EARLY_DEMUX` to `true` -
```
kubectl patch daemonset aws-node -n kube-system -p '{"spec": {"template": {"spec": {"initContainers": [{"env":[{"name":"DISABLE_TCP_EARLY_DEMUX","value":"true"}],"name":"aws-vpc-cni-init"}]}}}}'
```
---

`ENABLE_PREFIX_DELEGATION` (Since v1.9)

Type: Boolean as a String

Default: `false`

Enables IPv4 prefix delegation on Nitro instances. When `ENABLE_PREFIX_DELEGATION` is set to `true`, `ipamd` allocates a /28 prefix
instead of a secondary IP in the ENI's subnet. The total number of prefixes and private IP addresses will be less than the
limit on private IPs allowed by your instance. Changing `ENABLE_PREFIX_DELEGATION` while pods are running or while ENIs are attached is supported; newly allocated pods get IPs according to the current mode of `ipamd`. The kubelet max pods value should also be updated, which requires either a kubelet restart or a node recycle.
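
To make the size of a /28 prefix concrete, the sketch below counts the addresses in a CIDR block using Go's standard `net` package. The prefix `10.0.1.16/28` is a made-up example and `addressesInPrefix` is a hypothetical helper; neither comes from the CNI code.

```go
package main

import (
	"fmt"
	"net"
)

// addressesInPrefix is a hypothetical helper that returns the number of
// IPv4 addresses covered by a CIDR block such as "10.0.1.16/28".
func addressesInPrefix(cidr string) (int, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return 0, err
	}
	// ones is the prefix length, bits is 32 for IPv4.
	ones, bits := ipnet.Mask.Size()
	return 1 << uint(bits-ones), nil
}

func main() {
	n, err := addressesInPrefix("10.0.1.16/28") // example prefix, assumed
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // 16
}
```

Each delegated prefix therefore supplies 16 pod IPs in place of the single address that a secondary IP would provide.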

---

`WARM_PREFIX_TARGET`

Type: Integer

Default: None

Specifies the number of free IPv4 (/28) prefixes that the `ipamd` daemon should attempt to keep available for pod assignment on the node.
This environment variable takes effect only when `ENABLE_PREFIX_DELEGATION` is set to `true`, and it is overridden when `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` are configured.

### ENI tags related to Allocation

Expand Down
2 changes: 2 additions & 0 deletions config/master/aws-k8s-cni.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,8 @@
"value": "false"
- "name": "ENABLE_POD_ENI"
"value": "false"
- "name": "ENABLE_PREFIX_DELEGATION"
"value": "false"
- "name": "MY_NODE_NAME"
"valueFrom":
"fieldRef":
Expand Down
1 change: 1 addition & 0 deletions config/master/manifests.jsonnet
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,7 @@ local awsnode = {
DISABLE_INTROSPECTION: "false",
DISABLE_METRICS: "false",
ENABLE_POD_ENI: "false",
ENABLE_PREFIX_DELEGATION: "false",
MY_NODE_NAME: {
valueFrom: {
fieldRef: {fieldPath: "spec.nodeName"},
Expand Down
34 changes: 34 additions & 0 deletions docs/prefix-and-ip-target.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
## `WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET`

When `ENABLE_PREFIX_DELEGATION` is set to `true`, IPAMD allocates (/28) prefixes to the ENIs. By default IPAMD allocates 1 prefix for the allocated ENI, but the number of prefixes held in the warm pool can be controlled by setting the `WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` environment variables.

If set, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` override `WARM_PREFIX_TARGET`. With `WARM_PREFIX_TARGET`, IPAMD allocates one more full (/28) prefix as soon as a single IP is consumed from the existing prefix. If the ENI has no room for another prefix, a new ENI is created. Use this setting only when needed, i.e., when pod density is high, since each prefix is carved out of the ENI's subnet. `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` give more fine-grained control over the number of IPs, but if the existing prefixes are not sufficient to maintain the warm pool, IPAMD allocates more prefixes to the existing ENI, or creates a new ENI if the existing ENIs are running out of prefixes.

When a new ENI is allocated, IPAMD allocates either 1 prefix or the number of prefixes needed to maintain the `WARM_PREFIX_TARGET`, `WARM_IP_TARGET` and `MINIMUM_IP_TARGET` settings. This avoids extra EC2 calls to allocate more prefixes or free extra prefixes when the ENI is brought up.


Some example cases:

| Instance type | `WARM_PREFIX_TARGET`| `WARM_IP_TARGET`| `MINIMUM_IP_TARGET` | Pods | ENIs | Pods per ENI | Attached Prefixes | Unused Prefixes | Prefixes per ENI | Unused IPs|
|---------------|:-------------------:|:---------------:|:-------------------:|:----:|:----:|:------------:|:-----------------:|:---------------:|:----------------:|:---------:|
| t3.small | 1 | - | - | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| t3.small | 1 | - | - | 5 | 3 | 1,2,2 | 4 | 1 | 2,1,1 | 59 |
| t3.small | 1 | - | - | 17 | 1 | 17 | 3 | 1 | 3 | 31 |
| | | | | | | | | | | |
| t3.small | - | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| t3.small | - | 1 | 1 | 5 | 3 | 1,2,2 | 3 | 0 | 1,1,1 | 43 |
| t3.small | - | 1 | 1 | 17 | 1 | 17 | 2 | 0 | 2 | 15 |
| | | | | | | | | | | |
| t3.small | - | 2 | 10 | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| t3.small | - | 2 | 10 | 5 | 3 | 1,2,2 | 3 | 0 | 1,1,1 | 43 |
| t3.small | - | 2 | 10 | 17 | 1 | 17 | 2 | 0 | 2 | 15 |
| | | | | | | | | | | |
| p3dn.24xlarge | 1 | - | - | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| p3dn.24xlarge | 1 | - | - | 3 | 2 | 3,0 | 2 | 1 | 2,0 | 29 |
| p3dn.24xlarge | 1 | - | - | 95 | 3 | 95,0,0 | 7 | 1 | 7,0,0 | 17 |
| | | | | | | | | | | |
| p3dn.24xlarge | - | 5 | 10 | 0 | 1 | 0 | 1 | 1 | 1 | 16 |
| p3dn.24xlarge | - | 5 | 10 | 7 | 1 | 7 | 1 | 0 | 1 | 9 |
| p3dn.24xlarge | - | 5 | 10 | 15 | 1 | 15 | 2 | 1 | 2 | 17 |
| p3dn.24xlarge | - | 5 | 10 | 45 | 2 | 45,0 | 4 | 1 | 4,0 | 19 |
| | | | | | | | | | | |
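
The single-ENI rows of the table can be reproduced with the arithmetic sketched below. The formulas are inferred from the table values, not taken from the IPAMD source, and every function name is hypothetical. The sketch assumes all pods sit on one ENI, so it does not reproduce the rows where 5 pods are spread across 3 ENIs.

```go
package main

import "fmt"

// ipsPerPrefix is the number of IPv4 addresses in one /28 prefix.
const ipsPerPrefix = 16

// ceilDiv returns a/b rounded up, for non-negative a and positive b.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// prefixesWithWarmPrefixTarget (hypothetical) returns the prefixes partly or
// fully used by pods plus warmPrefixTarget completely free prefixes.
func prefixesWithWarmPrefixTarget(pods, warmPrefixTarget int) int {
	return ceilDiv(pods, ipsPerPrefix) + warmPrefixTarget
}

// prefixesWithIPTargets (hypothetical) returns enough prefixes to cover
// max(pods+warmIPTarget, minimumIPTarget) addresses.
func prefixesWithIPTargets(pods, warmIPTarget, minimumIPTarget int) int {
	want := pods + warmIPTarget
	if want < minimumIPTarget {
		want = minimumIPTarget
	}
	return ceilDiv(want, ipsPerPrefix)
}

// unusedIPs returns the addresses left over after the pods are assigned.
func unusedIPs(prefixes, pods int) int { return prefixes*ipsPerPrefix - pods }

func main() {
	// t3.small, WARM_PREFIX_TARGET=1, 17 pods.
	p := prefixesWithWarmPrefixTarget(17, 1)
	fmt.Println(p, unusedIPs(p, 17)) // 3 31

	// p3dn.24xlarge, WARM_IP_TARGET=5, MINIMUM_IP_TARGET=10, 45 pods.
	p = prefixesWithIPTargets(45, 5, 10)
	fmt.Println(p, unusedIPs(p, 45)) // 4 19
}
```

In both modes the unused IP count is simply the attached prefixes times 16 minus the pod count, which is why `WARM_PREFIX_TARGET` always leaves at least 16 spare addresses.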
11 changes: 4 additions & 7 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,6 @@ require (
github.com/golang/mock v1.4.1
github.com/golang/protobuf v1.4.2
github.com/google/go-jsonnet v0.16.0
github.com/google/gopacket v1.1.18
github.com/gregjones/httpcache v0.0.0-20190212212710-3befbb6ad0cc // indirect
github.com/pkg/errors v0.9.1
github.com/prometheus/client_golang v1.0.0
github.com/prometheus/client_model v0.2.0
Expand All @@ -20,11 +18,10 @@ require (
github.com/stretchr/testify v1.5.1
github.com/vishvananda/netlink v1.1.1-0.20201029203352-d40f9887b852
go.uber.org/zap v1.15.0
golang.org/x/lint v0.0.0-20201208152925-83fdc39ff7b5 // indirect
golang.org/x/mod v0.4.0 // indirect
golang.org/x/net v0.0.0-20201110031124-69a78807bb2b
golang.org/x/sys v0.0.0-20201117170446-d9b008d0a637
golang.org/x/tools v0.0.0-20210113180300-f96436850f18 // indirect
golang.org/x/lint v0.0.0-20210508222113-6edffad5e616 // indirect
golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4
golang.org/x/sys v0.0.0-20210616094352-59db8d763f22
golang.org/x/tools v0.1.3 // indirect
google.golang.org/grpc v1.29.0
gopkg.in/natefinch/lumberjack.v2 v2.0.0
k8s.io/api v0.18.6
Expand Down