KEP-753: add PRR answers for beta

Signed-off-by: Matthias Bertschy <[email protected]>
matthyx committed Oct 5, 2023
1 parent 8669330 commit 7fe8891
Showing 2 changed files with 164 additions and 29 deletions.
2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-node/753.yaml
@@ -1,3 +1,5 @@
kep-number: 753
alpha:
approver: "@wojtek-t"
beta:
approver: "@wojtek-t"
191 changes: 162 additions & 29 deletions keps/sig-node/753-sidecar-containers/README.md
@@ -130,6 +131,7 @@ tags, and then generate with `hack/update-toc.sh`.
- [Graduation Criteria](#graduation-criteria)
- [Alpha](#alpha)
- [Beta](#beta)
- [GA](#ga)
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
- [Upgrade strategy](#upgrade-strategy)
- [Downgrade strategy](#downgrade-strategy)
@@ -1102,7 +1103,7 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
https://storage.googleapis.com/k8s-triage/index.html
-->

- <test>: <link to test coverage> FIXME to fill
No integration tests are planned. We'll cover this with e2e_node tests.

##### e2e tests

@@ -1116,7 +1117,8 @@ https://storage.googleapis.com/k8s-triage/index.html
We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

- <test>: <link to test coverage> FIXME to fill
- Test failures: https://storage.googleapis.com/k8s-triage/index.html?pr=1&test=SidecarContainers
- All related tests can be filtered with the `SidecarContainers` keyword

##### Ready state of a sidecar container is properly used to create/delete endpoints

@@ -1128,16 +1130,20 @@ The sidecar container can expose ports and can be used to handle external traffic

##### Pod lifecycle scenarios without sidecar containers

TBD: describe test cases that can be affected by introducing sidecar containers.
- Init containers should start after the previous init container has completed
- Regular containers should start in parallel after all init containers have completed
- Restart behavior of the init containers

##### Pod lifecycle scenarios with sidecar containers

TBD: describe test cases to test with sidecar containers.
- Init containers should start after the previous restartable init container has started
- Regular containers should start in parallel after all regular init containers have completed and all restartable init containers have started
- Restartable init containers should always restart (see the sketch below)
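
The ordering rules above can be exercised with a Pod like this minimal sketch (names and images are illustrative; the only new API field used is `restartPolicy: Always` on an init container):

```
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: sidecar-ordering-demo
spec:
  initContainers:
  - name: init-config            # regular init container: must complete first
    image: busybox:1.36
    command: ["sh", "-c", "echo init done"]
  - name: log-proxy              # restartable init container (sidecar):
    image: busybox:1.36          # only needs to be started, not completed,
    command: ["sh", "-c", "while true; do sleep 3600; done"]
    restartPolicy: Always        # and is always restarted if it exits
  containers:
  - name: main                   # starts after init-config has completed
    image: busybox:1.36          # and log-proxy has started
    command: ["sh", "-c", "sleep 3600"]
EOF
```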

##### Kubelet restart test cases

TBD: describe test cases on how Pod with sidecar containers behaves when kubelet
was restarted
- It should restart the containers in the right order after a node reboot
- It should not restart any completed init containers after a kubelet restart

##### API server is down: failure to update containers status during initialization

@@ -1254,13 +1260,14 @@ in back-to-back releases.
#### Beta

- Implement proper termination ordering.
- Provide defaults for the `restartPolicy` field on init containers; `nil` is not
  an ideal long-term default.
- Allow applying security policies to all containers in the `initContainers`
  collection. An example may be disabling `kubectl exec` on containers in the
  `initContainers` collection.
- Allow sidecar containers to restart during the shutdown of the Pod.

#### GA

- Enable `livenessProbe` during the shutdown of the Pod for sidecar containers that restart.
- Allow applying security policies to all containers in the `initContainers`
  collection. An example may be disabling `kubectl exec` on containers in the
  `initContainers` collection.

### Upgrade / Downgrade Strategy

@@ -1298,13 +1305,14 @@ Pods that have already been created will stay scheduled after the downgrade,
and will not be rejected by the control
plane nor by the kubelet. Both will treat the sidecar container as an init container.
This may render the Pod unusable, as it will be stuck in initialization forever -
sidecar container are never exiting. We will document this behavior for Alpha release.
Promoting feature to Beta we will revisit the situation. If we will see this as
a major issue, we will need to wait for 3 releases so kubelet will have the logic
sidecar containers never exit.
This behavior has been documented for the Alpha release, but we don't see it as a
major issue that would require waiting 3 releases for the kubelet to gain the logic
to reject such Pods when the feature gate is disabled, keeping downgrades safe.

**Note**, For the control plane and kubelet we will implement logic to reject Pods
with sidecar containers when feature gate got turned off.
**Note**: We have implemented logic for the kubelet to reject Pods
with sidecar containers when the feature gate is turned off. In the control plane,
we simply ignore the new field to keep Pods schedulable.
See [Upgrade/downgrade testing](#upgradedowngrade-testing) section.

Workloads will have to be deleted and recreated with the old way of handling
@@ -1472,13 +1480,45 @@ rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->

Rollout could fail for multiple reasons:

- webhooks that are not rebuilt against the new API will strip the new field out
- a bug in the resource calculation or CPU reservation logic could render the Pod unschedulable
- a bug in the kubelet affecting the pod lifecycle could cause the Pod to be stuck in initialization

However, we have maintained a high coverage of unit tests to ensure we catch these issues.

Rollback can fail if a Pod with sidecars is scheduled on a node where the feature
is disabled.
In that case, the Pod will be rejected by the kubelet and will be stuck in the Pending state.
Therefore, we advise disabling the feature gate on the control plane first and then
proceeding with the nodes (see the sketch below).

Running workloads are not impacted.
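
A sketch of the advised order, assuming feature gates are passed as command-line
flags (the flag itself is real; how it is set depends on how the cluster is managed):

```
# 1. Disable the gate on the control plane first:
kube-apiserver --feature-gates=SidecarContainers=false ...

# 2. Then disable it on each node (the kubelet requires a restart):
kubelet --feature-gates=SidecarContainers=false ...
```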

Pods with sidecars might take a long time to exit and exceed the
terminationGracePeriodSeconds (TGPS); a new event should be added in Beta to help
administrators diagnose this issue.
Rather than rolling back the feature, they should work on the graceful termination
of their main containers to ensure sidecars have enough time to be notified
and exit on their own.

###### What specific metrics should inform a rollback?

<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

- [X] Metrics
  - Metric name: kubelet_started_containers_errors_total
  - Type: Counter
  - Labels: code, container_type (should be `init_container`)
  - Components exposing the metric: `kubelet-metrics`
  - Symptoms: a high number of errors indicates that the kubelet is unable to start the sidecar containers
- [X] Events
  - Event name: TBD
  - Symptoms: a high number of events indicates that the TGPS has been exceeded and sidecars have not been terminated gracefully

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

<!--
@@ -1487,12 +1527,41 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

Upgrade->downgrade->upgrade testing was done manually using the following steps:

Kubelet specific:
1. Deploy k8s 1.29-alpha
2. Enable the `SidecarContainers` feature gate on the control plane and kubelet
3. Deploy a Pod with sidecar containers using a Deployment
4. Disable the `SidecarContainers` feature gate on the kubelet (requires a restart)
5. Drain the node
6. Pod is rejected by kubelet
7. Enable the `SidecarContainers` feature gate on the kubelet (requires a restart)
8. Pod is scheduled and works as expected

Control plane specific:
1. Deploy k8s 1.29-alpha
2. Enable the `SidecarContainers` feature gate on the control plane and kubelet
3. Deploy a Pod with sidecar containers using a Deployment
4. Disable the `SidecarContainers` feature gate on the control plane
5. Delete the Pod
6. Pod is created without the new field - init containers are not recognized as sidecars and block the Pod in initialization
7. Modify the Deployment by moving the sidecar containers to the regular containers section
8. Pod is scheduled and works (without the sidecar support)
9. Enable the `SidecarContainers` feature gate on the control plane
10. Delete the Pod
11. Pod is scheduled and works (without the sidecar support)
12. Modify the Deployment by moving the sidecar containers to the init containers section
13. Pod is scheduled and works (with the sidecar support)

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

<!--
Even if applying deprecation policies, they may still surprise some users.
-->

No.

### Monitoring Requirements

<!--
@@ -1510,6 +1579,8 @@ checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

By checking if `.spec.initContainers[i].restartPolicy` is set (`Always` is the only value supported for now).
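
For example, a sketch of such a check, assuming `jq` is available:

```
$ kubectl get pods --all-namespaces -o json | jq -r '
    .items[]
    | select(any(.spec.initContainers[]?; .restartPolicy == "Always"))
    | "\(.metadata.namespace)/\(.metadata.name)"'
```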

###### How can someone using this feature know that it is working for their instance?

<!--
@@ -1521,13 +1592,17 @@ and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->

- [ ] Events
- Event Reason:
- [ ] API .status
- Condition name:
- Other field:
- [ ] Other (treat as last resort)
- Details:
End users can check components that use the new feature, such as Istio, to verify that istio-proxy runs as a sidecar container:

```
$ kubectl get pod -o "custom-columns="\
"NAME:.metadata.name,"\
"INIT:.spec.initContainers[*].name,"\
"CONTAINERS:.spec.containers[*].name"
NAME INIT CONTAINERS
sleep-7656cf8794-8fhdk istio-init,istio-proxy sleep
```

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -1546,18 +1621,33 @@ These goals will help you determine what you need to measure (SLIs) in the next
question.
-->

- number of running containers should not change by more than 10% throughout the day,
as measured by the number of running containers at the beginning and end of the day
- error rate for containers of type init_container should be less than 1%,
as measured by the number of errors divided by the total number of init_container containers
- number of events indicating that TGPS has been exceeded should be less than 10 per day,
as measured by the number of events logged in the kubelet log
- 99% of jobs with sidecars should complete successfully,
as measured by the number of jobs that complete successfully divided by the total number of jobs with sidecars

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one or more of these and delete the rest.
-->

- [ ] Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- [ ] Other (treat as last resort)
- Details:
- [X] Metrics
  - Metric name: kubelet_running_containers
  - Type: Gauge
  - Labels: container_state
  - Components exposing the metric: `kubelet-metrics`
  - Metric name: kubelet_started_containers_errors_total
  - Type: Counter
  - Labels: code, container_type (should be `init_container`)
  - Components exposing the metric: `kubelet-metrics`
- [X] Events
  - Event name: TBD
  - Should not appear, unless TGPS is exceeded and sidecars are terminated
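
The kubelet metrics can be inspected through the API server's node proxy
(`<node-name>` is a placeholder):

```
$ kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" \
    | grep -E 'kubelet_running_containers|kubelet_started_containers_errors_total'
```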

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

@@ -1566,6 +1656,8 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->

No.

### Dependencies

<!--
@@ -1589,6 +1681,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
- Impact of its degraded performance or high-error rates on the feature:
-->

No.

### Scalability

<!--
@@ -1616,6 +1710,8 @@ Focusing mostly on:
heartbeats, leader election, etc.)
-->

No.

###### Will enabling / using this feature result in introducing new API types?

<!--
@@ -1625,6 +1721,8 @@ Describe them, providing:
- Supported number of objects per namespace (for namespace-scoped objects)
-->

No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

<!--
@@ -1633,6 +1731,8 @@ Describe them, providing:
- Estimated increase:
-->

No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

<!--
@@ -1642,6 +1742,8 @@ Describe them, providing:
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
-->

No.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

<!--
@@ -1653,6 +1755,10 @@ Think about adding additional work or introducing new steps in between
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
-->

Graceful Pod termination might take longer with sidecars since their exit sequence starts after the
last main container has stopped.
The impact should be negligible because the TGPS is enforced in all cases.
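
Operators can bound the whole termination sequence by adjusting the grace period
on the workload; an illustrative sketch (`my-app` is a placeholder):

```
$ kubectl patch deployment my-app --type merge \
    -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds":60}}}}'
```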

###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

<!--
@@ -1665,6 +1771,8 @@ Think through this both in small and large cases, again with respect to the
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
-->

No.

###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

<!--
@@ -1677,6 +1785,10 @@ Are there any tests that were run/should be run to understand performance characteristics
and validate the declared limits?
-->

No, since the KEP only enables a new way to run containers as sidecars instead of
running them as regular containers.
Resource consumption can even be lower, since various tricks that use emptyDir volumes
to perform synchronization (as with istio-proxy) are no longer needed.

### Troubleshooting

<!--
@@ -1692,6 +1804,8 @@ details). For now, we leave it here.

###### How does this feature react if the API server and/or etcd is unavailable?

Nothing changes compared to the current kubelet behavior.

###### What are other known failure modes?

<!--
@@ -1707,8 +1821,27 @@ For each of them, fill in the following information by copying the below template:
- Testing: Are there any tests for failure mode? If not, describe why.
-->

- Main containers don't exit within the TGPS, leading to sidecars being terminated
  - Detection: high number of events indicating the TGPS has been exceeded
  - Mitigations: ensure timely termination of main containers
  - Diagnostics: Events
  - Testing: TBD
- A main container or sidecar uses a preStop hook that consumes the TGPS, leading to the remaining sidecars being terminated
  - Detection: high number of events indicating the TGPS has been exceeded
  - Mitigations: ensure preStop hooks are not delaying termination
  - Diagnostics: Events
  - Testing: TBD
- A sidecar container uses a preStop hook that makes the container exit during Pod shutdown; the sidecar is restarted, leading
  to a CrashLoopBackOff (see the sketch below)
  - Detection: sidecar in CrashLoopBackOff during termination
  - Mitigations: ensure preStop hooks do not make the container exit; document best practices
  - Diagnostics: Events
  - Testing: TBD
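
A minimal sketch of that last failure mode (names and images are illustrative;
the preStop hook makes the sidecar's own process exit, so the kubelet restarts it
during shutdown):

```
$ kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: bad-prestop-demo
spec:
  initContainers:
  - name: sidecar
    image: busybox:1.36
    command: ["sh", "-c", "until [ -f /tmp/die ]; do sleep 1; done"]
    restartPolicy: Always
    lifecycle:
      preStop:
        exec:
          # makes the sidecar exit on its own -> restarted -> CrashLoopBackOff
          command: ["touch", "/tmp/die"]
  containers:
  - name: main
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
EOF
```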

###### What steps should be taken if SLOs are not being met to determine the problem?

None.

## Implementation History

<!--
