
Restrict traffic to/from EKS Cluster #52

Draft: wants to merge 29 commits into base: main
a2e2914
new: start of network policy configuration
nickumia-reisys Oct 14, 2021
3aa8e16
fix: update variable names to be consistent; assign appropriate defau…
nickumia-reisys Oct 14, 2021
b230625
new: add the inputs to provision as parameters for the manifest; need…
nickumia-reisys Oct 14, 2021
fff97ff
fix: tcp is case-sensitive
nickumia-reisys Oct 14, 2021
f8b1601
new: create outline for calico helm release; update default-deny and …
nickumia-reisys Oct 14, 2021
22cbc1b
new: add missing variable back in; set appropriate default value for …
nickumia-reisys Oct 14, 2021
ab10c6d
new: local test of applying network policies with calico
nickumia-reisys Oct 14, 2021
213494f
new: switch gears from network policy to vpc security groups
nickumia-reisys Oct 14, 2021
9a3f9bf
update: parameters to the terraform from brokerpak
nickumia-reisys Oct 14, 2021
7dc7b71
fix: optional parameters need to be set to null to allow terraform to…
nickumia-reisys Oct 18, 2021
f208249
new: successfully denied all egress traffic via network_acl_rule
nickumia-reisys Oct 18, 2021
ff16b5e
new/format: add options for user to allow specific egress and ingress…
nickumia-reisys Oct 18, 2021
5472d6d
docs: add example of egress/ingress specifications
nickumia-reisys Oct 19, 2021
8f54262
fix: update manifest.yml so that brokerpak doesn't fail; update netwo…
nickumia-reisys Oct 19, 2021
a196655
dynamically get aws prefix list
nickumia-reisys Oct 20, 2021
7e9eb74
update: change brokerpak arguments for ingress/egress from environmen…
nickumia-reisys Oct 20, 2021
16ac8bf
new: dynamically get subdomain IP from dns lookup
nickumia-reisys Oct 20, 2021
b024a6e
update: update kubernetes exec plugins based on new docs (https://reg…
nickumia-reisys Oct 23, 2021
e59f957
new: add more resources to support private/public subnets vpc for eks…
nickumia-reisys Oct 23, 2021
b6548c1
Docs: networking development
nickumia-reisys Oct 23, 2021
9769129
fix: move networking docs to docs/
nickumia-reisys Oct 23, 2021
84900d6
fast forward to main branch
nickumia-reisys Nov 2, 2021
87950cd
cleanup: remove restrictions on vpc, so that EKS can start normally
nickumia-reisys Nov 4, 2021
856312b
new: two different methods for trying to enable cni; neither work rig…
nickumia-reisys Nov 8, 2021
73af958
new: add node groups as part of the EKS module itself; pass off to Br…
nickumia-reisys Nov 9, 2021
cb6545d
update: aws observability syntax
nickumia-reisys Nov 16, 2021
d22be0a
lint: terraform fmt
nickumia-reisys Nov 16, 2021
3a785f0
new: finally got managed nodes working with eks alongside fargate
nickumia-reisys Nov 16, 2021
61c6878
Merge branch 'main' into restrict-eks-traffic
FuhuXia Nov 17, 2021
81 changes: 81 additions & 0 deletions docs/eks-networking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
# Understanding VPC configuration for EKS in conjunction with Fargate (WIP*)

*\*All of this is subject to change while this message is here.*

Through work on [a security compliance issue](https://github.com/GSA/datagov-deploy/issues/3355), a thorough inspection of the networking design of this repo was completed. By default, EKS clusters are fully publicly available. The desire was to allow tighter configuration to prevent hacks/data leaks. A combination of public+private networking is considered the [best practice for common Kubernetes workloads on AWS](https://aws.amazon.com/blogs/containers/de-mystifying-cluster-networking-for-amazon-eks-worker-nodes/) as it provides the flexibility of public availability alongside the security of private resources.

Note: This repo utilizes Terraform to configure multiple intertwining parts from the AWS world to the Kubernetes world and wraps it up nicely with a Brokerpak bow. Most of the concepts and commands are discussed in terms of Terraform, but there are AWS/K8S CLI equivalents.

Here's a brief but exhaustive list of modules/resources used:
- Module [vpc](https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/3.7.0)
- Module [eks](https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/14.0.0)
- Module [aws_load_balancer_controller](https://github.com/GSA/terraform-kubernetes-aws-load-balancer-controller)
- Resource [aws_vpc_endpoint](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/vpc_endpoint)
- Resource [aws_route53_resolver_endpoint](https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/route53_resolver_endpoint)

## Desired Configuration/Design

The user would like to provision a functional K8S cluster with the ability to host publicly-available application deployments. The cluster will live in AWS Fargate to reduce the compliance burden of managing the security of node machines.

### Deployment Stack:
- Computing Levels of Abstraction:
- Fargate Nodes > EKS > Application
- Networking Levels of Abstraction (the order is still being learned):
- Internal CIDRs (Private + Public) > Network ACLs > Security Groups > Ingress Controller > NAT Gateway > Application Load Balancer > Elastic IP (EIP) > Domain
- VPC > NAT Gateway > Load Balancer > Elastic IP (EIP)

When a user accesses an application through the [domain](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/ingress.tf#L248-L254), it gets resolved to an EIP that gets routed to the [application load balancer](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/ingress.tf#L19-L31). This then passes through the internal [ingress controller](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/ingress.tf#L36-L95) to the cluster nodes based on the [vpc configuration](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/vpc.tf#L30-L186). That last step is intricate because the ingress controller lives within the VPC, so it only works if the VPC configuration permits it.

### Networking Design

- The entire EKS cluster lives within the VPC (10.20.0.0/16).
- There is a public subnet (10.20.101.0/24).
- There is a private subnet (10.20.1.0/24).
- The EKS control plane has a public endpoint (x.x.x.x/x).
- The EKS control plane has a private endpoint (10.20.x.x/x).
- Worker nodes, by default, communicate entirely on the private subnet.
- The ingress controllers connect external traffic to worker nodes through the public subnet.
- Security Groups and Network ACLs are used to control traffic.

### Setting up Clusters in a Private Subnet

In order for EKS to isolate workers in a private subnet, the following [VPC considerations](https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html) are necessary:

- The VPC needs to configure private subnets:
  - Define `private_subnets` CIDRs.
  - Set `enable_nat_gateway` (`true`) to allow ingresses to connect public and private subnets.
  - Set `map_public_ip_on_launch` (`false`) to prevent public IPs from being assigned on private subnets.
  - Set `enable_dns_hostnames` and `enable_dns_support` to support DNS hostnames in the VPC (necessary for the [API Server](https://docs.aws.amazon.com/eks/latest/userguide/cluster-endpoint.html)).
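
The settings above can be sketched against the `vpc` module. This is a minimal, illustrative example, not the repo's actual config: the module name, AZ list, and labels are assumptions, while the CIDRs match the design described earlier.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "3.7.0"

  name = "eks-vpc" # illustrative name
  cidr = "10.20.0.0/16"

  azs             = ["us-west-2a"] # assumed region/AZ
  private_subnets = ["10.20.1.0/24"]
  public_subnets  = ["10.20.101.0/24"]

  # A NAT gateway lets ingresses connect the public and private subnets
  enable_nat_gateway = true

  # Keep instances launched in subnets from getting public IPs
  map_public_ip_on_launch = false

  # DNS support is required for the private EKS API server endpoint
  enable_dns_hostnames = true
  enable_dns_support   = true
}
```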

- EKS needs to know about the VPC config:
  - [Example](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/eks.tf#L15-L28)
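
A sketch of how the VPC wiring might look in the `eks` module (attribute names follow the module's 14.x series; the cluster name and endpoint choices here are assumptions, not the repo's exact values):

```hcl
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "14.0.0"

  cluster_name = "my-cluster" # illustrative
  vpc_id       = module.vpc.vpc_id

  # Workers are placed in the private subnets only
  subnets = module.vpc.private_subnets

  # Keep the private endpoint on; the public endpoint stays available
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access  = true
}
```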

- VPC Endpoints are necessary for private cluster nodes to talk to other AWS services.
  - [These are the ones](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/vpc.tf#L196-L265) I've identified as necessary for us:
    - `com.amazonaws.<region>.ec2`
    - `com.amazonaws.<region>.ecr.api`
    - `com.amazonaws.<region>.ecr.dkr`
    - `com.amazonaws.<region>.s3` _– For pulling container images_
    - `com.amazonaws.<region>.logs` _– For CloudWatch Logs_
    - `com.amazonaws.<region>.sts` _– If using Cluster Autoscaler or IAM roles for service accounts_
    - `com.amazonaws.<region>.elasticloadbalancing` _– If using Application Load Balancers_
  - These are additional ones that may be necessary in the future:
    - `com.amazonaws.<region>.autoscaling` _– If using Cluster Autoscaler_
    - `com.amazonaws.<region>.appmesh-envoy-management` _– If using App Mesh_
    - `com.amazonaws.<region>.xray` _– If using AWS X-Ray_
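
As an illustration, one interface endpoint (here `ecr.api`) could be declared like this; the security group reference and `var.region` are assumptions for the sketch:

```hcl
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type = "Interface"

  # Interface endpoints live in the private subnets and need private DNS
  subnet_ids          = module.vpc.private_subnets
  private_dns_enabled = true
  security_group_ids  = [aws_security_group.endpoints.id] # hypothetical SG
}
```

Note that `s3` differs from the others: it is a Gateway-type endpoint, attached to route tables rather than subnets.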

- [Security Group (SG) Considerations](https://docs.aws.amazon.com/eks/latest/userguide/sec-group-reqs.html#cluster-sg)
  - Control Plane
    - Minimum inbound traffic: 443/TCP from all node SGs
    - Minimum outbound traffic: 10250/TCP to all node SGs
  - Nodes
    - Minimum inbound traffic: 10250/TCP from control plane SGs
    - Minimum outbound traffic: 443/TCP to control plane SGs
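
Those minimums could be expressed as standalone rules along these lines; both security group references are hypothetical placeholders:

```hcl
# Control plane <- nodes on 443/TCP (API traffic from kubelets)
resource "aws_security_group_rule" "cluster_ingress_nodes_https" {
  type                     = "ingress"
  from_port                = 443
  to_port                  = 443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.control_plane.id # hypothetical
  source_security_group_id = aws_security_group.nodes.id         # hypothetical
}

# Control plane -> nodes on 10250/TCP (kubelet API)
resource "aws_security_group_rule" "cluster_egress_nodes_kubelet" {
  type                     = "egress"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  security_group_id        = aws_security_group.control_plane.id
  source_security_group_id = aws_security_group.nodes.id
}
```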

- The [IAM Role](https://github.com/aws/amazon-vpc-cni-k8s/issues/30) needs to allow nodes to pull images.
  - Docs: https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-policy-examples.html
  - [Terraform implementation](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/eks.tf#L150-L166) of creating the role policy
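
One common way to grant pull access is attaching the AWS-managed ECR read-only policy to the node role; the role reference here is a hypothetical stand-in for whatever role the cluster's nodes assume:

```hcl
resource "aws_iam_role_policy_attachment" "node_ecr_pull" {
  role       = aws_iam_role.eks_node.name # hypothetical node role
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
}
```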

- An [`aws_route53_resolver_endpoint`](https://github.com/GSA/eks-brokerpak/blob/restrict-eks-traffic/terraform/provision/vpc.tf#L14-L28) needs to be made available to the private subnet.
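
A sketch of an inbound resolver endpoint reachable from the private subnet; the name, security group, and subnet choices are assumptions (the resource requires at least two IP addresses in different subnets):

```hcl
resource "aws_route53_resolver_endpoint" "inbound" {
  name      = "eks-resolver" # illustrative
  direction = "INBOUND"

  security_group_ids = [aws_security_group.dns.id] # hypothetical SG

  ip_address {
    subnet_id = module.vpc.private_subnets[0]
  }
  ip_address {
    subnet_id = module.vpc.public_subnets[0]
  }
}
```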

- Careful consideration needs to be given to the [user/roles](https://stackoverflow.com/questions/66996306/aws-eks-fargate-coredns-imagepullbackoff) for Fargate cluster creations.
13 changes: 13 additions & 0 deletions eks-service-definition.yml
@@ -30,6 +30,18 @@ provision:
pattern: ^[a-z0-9][a-z0-9-]*[a-z0-9]$
default: ${str.truncate(64, "${request.instance_id}")}
details: A subdomain to use for the cluster instance. Default is the instance ID.
- field_name: egress_allowed
required: false
type: array
details: "A list of IP ranges to allow egress traffic to (ex. [\"x.x.x.x/x\", ...])"
overwrite: true
default: null
- field_name: ingress_allowed
required: false
type: array
details: "A list of IP ranges to allow ingress traffic from (ex. [\"x.x.x.x/x\", ...])"
overwrite: true
default: null
computed_inputs:
- name: instance_name
required: true
@@ -51,6 +63,7 @@
default: ${config("aws.default_region")}
- name: write_kubeconfig
type: boolean
required: false
overwrite: true
default: false
outputs:
53 changes: 53 additions & 0 deletions network_policy/2048_fixture.yml
@@ -0,0 +1,53 @@
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: deployment-2048
spec:
selector:
matchLabels:
app.kubernetes.io/name: app-2048
replicas: 2
template:
metadata:
labels:
app.kubernetes.io/name: app-2048
spec:
containers:
- image: alexwhen/docker-2048
imagePullPolicy: Always
name: app-2048
ports:
- containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
name: service-2048
spec:
ports:
- port: 80
targetPort: 80
protocol: TCP
type: ClusterIP
selector:
app.kubernetes.io/name: app-2048
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: ingress-2048
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/rewrite-target: /
spec:
rules:
- http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: service-2048
port:
number: 80
39 changes: 39 additions & 0 deletions network_policy/README.md
@@ -0,0 +1,39 @@
Context:

This sub-directory provides a test case for creating and applying network
policies. It:
- Sets up a basic kind cluster (specifying the pod-network-cidr)
- Installs an nginx ingress to make services available outside of the cluster
- Installs calico as the network plugin to manage network policies

The test case uses the 2048 service as a baseline for observing network
restrictions. Different network policies are then applied to allow/restrict
network traffic.

Instructions:

To set up the environment:

`./startup.sh`

To tear down the environment:

`./shutdown.sh`

To create the 2048 game:

`kubectl apply -f 2048_fixture.yml`

To apply the network policy:

`kubectl apply -f test_deny.yml`

To test egress traffic:

`kubectl exec -it pod/<2048-pod> -- sh -c "ping -c 4 8.8.8.8"`

To test ingress traffic:

Visit the 2048 game; the default URL is http://default-http-backend/.
Note: make sure to add the host-to-IP translation in `/etc/hosts` or similar:
`127.0.0.1 default-http-backend`
29 changes: 29 additions & 0 deletions network_policy/kind-config.yaml
@@ -0,0 +1,29 @@
# four node (three workers) cluster config
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
podSubnet: "10.0.0.0/8"
nodes:
- role: control-plane
image: kindest/node:v1.19.11@sha256:07db187ae84b4b7de440a73886f008cf903fcf5764ba8106a9fd5243d6f32729
# Mapping an ingress controller to host ports
# See docs at https://kind.sigs.k8s.io/docs/user/ingress/#create-cluster
kubeadmConfigPatches:
- |
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
node-labels: "ingress-ready=true"
extraPortMappings:
- containerPort: 80
hostPort: 80
protocol: TCP
- containerPort: 443
hostPort: 443
protocol: TCP
- role: worker
image: kindest/node:v1.19.11@sha256:07db187ae84b4b7de440a73886f008cf903fcf5764ba8106a9fd5243d6f32729
- role: worker
image: kindest/node:v1.19.11@sha256:07db187ae84b4b7de440a73886f008cf903fcf5764ba8106a9fd5243d6f32729
- role: worker
image: kindest/node:v1.19.11@sha256:07db187ae84b4b7de440a73886f008cf903fcf5764ba8106a9fd5243d6f32729
1 change: 1 addition & 0 deletions network_policy/shutdown.sh
@@ -0,0 +1 @@
kind delete cluster --name datagov-broker-test
12 changes: 12 additions & 0 deletions network_policy/startup.sh
@@ -0,0 +1,12 @@
# Creating a temporary Kubernetes cluster to test against with KinD
kind create cluster --config kind-config.yaml --name datagov-broker-test

# Install a KinD-flavored ingress controller (to make services in the cluster visible to the host).
# See https://kind.sigs.k8s.io/docs/user/ingress/#ingress-nginx for details.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.0.1/deploy/static/provider/kind/deploy.yaml
kubectl wait --namespace ingress-nginx \
--for=condition=ready pod \
--selector=app.kubernetes.io/component=controller \
--timeout=270s

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
9 changes: 9 additions & 0 deletions network_policy/test_deny.yml
@@ -0,0 +1,9 @@
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: ingress-default-deny
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
2 changes: 1 addition & 1 deletion terraform/provision/Dockerfile
@@ -5,6 +5,6 @@ FROM alpine/k8s:1.20.7
COPY --from=terraform /bin/terraform /bin/terraform

RUN apk update
RUN apk add --update git
RUN apk add --update git bind-tools

ENTRYPOINT ["/bin/sh"]
10 changes: 5 additions & 5 deletions terraform/provision/crds.tf
@@ -7,11 +7,11 @@
# solr-operator do it so that it will register and unregister its CRDs as part
# of the helm install process.
resource "helm_release" "zookeeper-operator" {
name = "zookeeper"
chart = "zookeeper-operator"
repository = "https://charts.pravega.io/"
version = "0.2.12"
namespace = "kube-system"
name = "zookeeper"
chart = "zookeeper-operator"
repository = "https://charts.pravega.io/"
version = "0.2.12"
namespace = "kube-system"
set {
# See https://github.com/pravega/zookeeper-operator/issues/324#issuecomment-829267141
name = "hooks.delete"