Merge branch 'release-1.8' into patch-51
lcfang authored Sep 13, 2017
2 parents 4ebd0e3 + 6dcb9d4 commit 3dc7521
Showing 12 changed files with 207 additions and 65 deletions.
2 changes: 1 addition & 1 deletion _data/tasks.yml
@@ -124,6 +124,7 @@ toc:
- docs/tasks/administer-cluster/quota-pod-namespace.md
- docs/tasks/administer-cluster/quota-api-object.md
- docs/tasks/administer-cluster/opaque-integer-resource-node.md
- docs/tasks/administer-cluster/cpu-management-policies.md
- docs/tasks/administer-cluster/access-cluster-api.md
- docs/tasks/administer-cluster/access-cluster-services.md
- docs/tasks/administer-cluster/securing-a-cluster.md
@@ -140,7 +141,6 @@ toc:
- docs/tasks/administer-cluster/cpu-memory-limit.md
- docs/tasks/administer-cluster/out-of-resource.md
- docs/tasks/administer-cluster/reserve-compute-resources.md
- docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods.md
- docs/tasks/administer-cluster/declare-network-policy.md
- title: Install Network Policy Provider
25 changes: 24 additions & 1 deletion docs/concepts/architecture/nodes.md
@@ -65,7 +65,27 @@ The node condition is represented as a JSON object. For example, the following r

If the Status of the Ready condition is "Unknown" or "False" for longer than the `pod-eviction-timeout` (an argument passed to the [kube-controller-manager](/docs/admin/kube-controller-manager)), all of the Pods on the node are scheduled for deletion by the Node Controller. The default eviction timeout duration is **five minutes**. In some cases when the node is unreachable, the apiserver is unable to communicate with the kubelet on it. The decision to delete the pods cannot be communicated to the kubelet until it re-establishes communication with the apiserver. In the meantime, the pods that are scheduled for deletion may continue to run on the partitioned node.

In versions of Kubernetes prior to 1.5, the node controller would [force delete](/docs/concepts/workloads/pods/pod/#force-deletion-of-pods)
these unreachable pods from the apiserver. However, in 1.5 and higher, the node controller does not force delete pods until it is
confirmed that they have stopped running in the cluster. Pods that may still be running on an unreachable node are shown in
the "Terminating" or "Unknown" state. In cases where Kubernetes cannot deduce from the underlying infrastructure whether a node has
permanently left a cluster, the cluster administrator may need to delete the node object by hand. Deleting the node object from
Kubernetes causes all the Pod objects running on it to be deleted from the apiserver, freeing up their names.

Version 1.8 introduces an alpha feature that automatically creates
[taints](/docs/concepts/configuration/taint-and-toleration) that represent Node conditions.
To enable this behavior, pass an additional feature gate flag `--feature-gates=...,TaintNodesByCondition=true`
to the API server, controller manager, and scheduler.
When `TaintNodesByCondition` is enabled, the scheduler ignores conditions when considering a Node; instead
it looks at the Node's taints and a Pod's tolerations.

Users can now choose between the old scheduling model and a new, more flexible scheduling model.
A Pod that does not have any tolerations is scheduled according to the old model, but a Pod that
tolerates the taints of a particular Node can be scheduled on that Node.

Note that because of a small delay, usually less than one second, between the time a condition is observed and the time the
corresponding taint is created, enabling this feature may slightly increase the number of Pods that are successfully
scheduled but then rejected by the kubelet.

### Capacity

@@ -174,6 +194,9 @@ NodeController is responsible for adding taints corresponding to node problems l
node unreachable or not ready. See [this documentation](/docs/concepts/configuration/taint-and-toleration)
for details about `NoExecute` taints and the alpha feature.

Starting in version 1.8, the node controller can be made responsible for creating taints that represent
Node conditions. This is an alpha feature of version 1.8.

### Self-Registration of Nodes

When the kubelet flag `--register-node` is true (the default), the kubelet will attempt to
116 changes: 97 additions & 19 deletions docs/concepts/configuration/manage-compute-resources-container.md
@@ -305,6 +305,8 @@ where `OOM` stands for Out Of Memory.

## Opaque integer resources (Alpha feature)

{% include feature-state-deprecated.md %}

Kubernetes version 1.5 introduces Opaque integer resources. Opaque
integer resources allow cluster operators to advertise new node-level
resources that would be otherwise unknown to the system.
@@ -313,9 +315,12 @@ Users can consume these resources in Pod specs just like CPU and memory.
The scheduler takes care of the resource accounting so that no more than the
available amount is simultaneously allocated to Pods.

**Note:** Opaque integer resources are Alpha in Kubernetes version 1.5.
Only resource accounting is implemented; node-level isolation is still
under active development.
**Note:** Opaque Integer Resources will be removed in version 1.9.
[Extended Resources](#extended-resources) are a replacement for Opaque Integer
Resources. Users can use any domain name prefix outside of the `kubernetes.io/`
domain instead of the previous `pod.alpha.kubernetes.io/opaque-int-resource-`
prefix.
{: .note}

Opaque integer resources are resources that begin with the prefix
`pod.alpha.kubernetes.io/opaque-int-resource-`. The API server
@@ -339,22 +344,9 @@ first pod that requests the resource to be scheduled on that node.

**Example:**

Here is an HTTP request that advertises five "foo" resources on node `k8s-node-1` whose master is `k8s-master`.

```http
PATCH /api/v1/nodes/k8s-node-1/status HTTP/1.1
Accept: application/json
Content-Type: application/json-patch+json
Host: k8s-master:8080

[
  {
    "op": "add",
    "path": "/status/capacity/pod.alpha.kubernetes.io~1opaque-int-resource-foo",
    "value": "5"
  }
]
```
Here is an example showing how to use `curl` to form an HTTP request that
advertises five "foo" resources on node `k8s-node-1` whose master is
`k8s-master`.

```shell
curl --header "Content-Type: application/json-patch+json" \
@@ -395,6 +387,92 @@ spec:
pod.alpha.kubernetes.io/opaque-int-resource-foo: 1
```

## Extended Resources

Kubernetes version 1.8 introduces Extended Resources. Extended Resources are
fully-qualified resource names outside the `kubernetes.io` domain. Extended
Resources allow cluster operators to advertise new node-level resources that
would be otherwise unknown to the system. Extended Resource quantities must be
integers and cannot be overcommitted.
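As a rough illustration of the naming rule, the shell function below is a minimal sketch that classifies a resource name. The helper `is_extended_resource_name` is hypothetical and is not the API server's validation logic; it assumes a name qualifies when it has the form `domain/name` and the domain is neither `kubernetes.io` nor a subdomain of it.

```shell
# Hypothetical helper, not part of Kubernetes: a simplified check for names of
# the form "domain/name" whose domain lies outside kubernetes.io.
is_extended_resource_name() {
  case "$1" in
    */*) ;;           # fully qualified: contains a "/"
    *) return 1 ;;    # unqualified names such as "cpu" or "memory"
  esac
  domain=${1%%/*}     # text before the first "/"
  name=${1#*/}        # text after the first "/"
  [ -n "$domain" ] && [ -n "$name" ] || return 1
  case "$domain" in
    kubernetes.io|*.kubernetes.io) return 1 ;;  # reserved for Kubernetes
  esac
  return 0
}
```

With this sketch, `example.com/foo` is accepted, while `cpu` and `pod.alpha.kubernetes.io/opaque-int-resource-foo` are rejected.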

Users can consume Extended Resources in Pod specs just like CPU and memory.
The scheduler takes care of the resource accounting so that no more than the
available amount is simultaneously allocated to Pods.

The API server restricts quantities of Extended Resources to whole numbers.
Examples of _valid_ quantities are `3`, `3000m` and `3Ki`. Examples of
_invalid_ quantities are `0.5` and `1500m`.
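The whole-number rule can be sketched the same way. The hypothetical `is_whole_quantity` function below accepts a plain integer, an integer followed by a decimal (`k`, `M`, `G`, `T`, `P`, `E`) or binary (`Ki`, `Mi`, `Gi`, `Ti`, `Pi`, `Ei`) suffix, or a milli-unit (`m`) value that is a multiple of 1000. It is a simplification under those assumptions: it rejects every fractional mantissa, so a value such as `1.5Ki` (which equals 1536) is rejected here even though it denotes a whole number.

```shell
# Hypothetical, simplified check that a quantity string denotes a whole number.
is_whole_quantity() {
  q=$1
  case "$q" in
    *m)
      n=${q%m}
      case "$n" in ''|*[!0-9]*) return 1 ;; esac
      # Milli-units are whole only when divisible by 1000 (3000m = 3).
      [ $((n % 1000)) -eq 0 ]
      return
      ;;
    *Ki|*Mi|*Gi|*Ti|*Pi|*Ei) n=${q%??} ;;  # strip a binary suffix
    *k|*M|*G|*T|*P|*E) n=${q%?} ;;         # strip a decimal suffix
    *) n=$q ;;
  esac
  # The remaining mantissa must be a plain non-negative integer.
  case "$n" in ''|*[!0-9]*) return 1 ;; esac
  return 0
}
```

With this sketch, `3`, `3000m` and `3Ki` are accepted, and `0.5` and `1500m` are rejected, matching the examples above.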

**Note:** Extended Resources replace [Opaque Integer
Resources](#opaque-integer-resources-alpha-feature). Users can use any domain
name prefix outside of the `kubernetes.io/` domain instead of the previous
`pod.alpha.kubernetes.io/opaque-int-resource-` prefix.
{: .note}

There are two steps required to use Extended Resources. First, the
cluster operator must advertise a per-node Extended Resource on one or more
nodes. Second, users must request the Extended Resource in Pods.

To advertise a new Extended Resource, the cluster operator should
submit a `PATCH` HTTP request to the API server to specify the available
quantity in the `status.capacity` for a node in the cluster. After this
operation, the node's `status.capacity` will include a new resource. The
kubelet then updates the `status.allocatable` field with the new resource
asynchronously. Note that because the scheduler uses the
node `status.allocatable` value when evaluating Pod fitness, there may
be a short delay between patching the node capacity with a new resource and
scheduling the first Pod that requests the resource on that node.

**Example:**

Here is an example showing how to use `curl` to form an HTTP request that
advertises five "example.com/foo" resources on node `k8s-node-1` whose master
is `k8s-master`.

```shell
curl --header "Content-Type: application/json-patch+json" \
--request PATCH \
--data '[{"op": "add", "path": "/status/capacity/example.com~1foo", "value": "5"}]' \
http://k8s-master:8080/api/v1/nodes/k8s-node-1/status
```

**Note:** In the preceding request, `~1` is the encoding for the character `/`
in the patch path. The operation path value in JSON-Patch is interpreted as a
JSON-Pointer. For more details, see
[IETF RFC 6901, section 3](https://tools.ietf.org/html/rfc6901#section-3).
{: .note}
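The escaping rule applies to any resource name. The sketch below uses two hypothetical helpers, `json_pointer_escape` and `capacity_patch`, which are not part of `kubectl` or the Kubernetes API. The first applies the RFC 6901 encoding (`~` becomes `~0`, then `/` becomes `~1`); the second prints a JSON-Patch body in the same shape as the `curl` example above.

```shell
# Hypothetical helper: escape a resource name for use as a JSON-Pointer
# segment. "~" is escaped first so the "~" introduced by "~1" is not
# escaped a second time.
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Hypothetical helper: print a JSON-Patch body that adds quantity $2 of
# resource $1 to a node's status.capacity.
capacity_patch() {
  printf '[{"op": "add", "path": "/status/capacity/%s", "value": "%s"}]\n' \
    "$(json_pointer_escape "$1")" "$2"
}
```

For example, `capacity_patch example.com/foo 5` prints the same body as the `--data` argument in the request above, so it could be passed to `curl` as `--data "$(capacity_patch example.com/foo 5)"`.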

To consume an Extended Resource in a Pod, include the resource name as a key
in the `spec.containers[].resources.requests` map.

**Note:** Extended resources cannot be overcommitted, so request and limit
must be equal if both are present in a container spec.
{: .note}

The Pod is scheduled only if all of its resource requests are
satisfied, including CPU, memory and any Extended Resources. The Pod
remains in the `Pending` state as long as no node can satisfy its
resource requests.
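For a single Extended Resource on a single node, this accounting can be modeled as an integer comparison. The hypothetical `fits_extended_resource` function below is a simplified model, not the scheduler's code: it assumes the node's `status.allocatable` amount, the amount already requested by Pods bound to that node, and the new Pod's request are all whole numbers, and it ignores CPU, memory and every other scheduling check.

```shell
# Hypothetical model of the fit check for one Extended Resource.
# $1: node allocatable amount
# $2: amount already requested by Pods bound to the node
# $3: amount requested by the new Pod
# Succeeds (exit status 0) only if the new Pod fits without overcommitting.
fits_extended_resource() {
  allocatable=$1
  already_requested=$2
  new_request=$3
  [ $((already_requested + new_request)) -le "$allocatable" ]
}
```

On a node advertising five `example.com/foo`, `fits_extended_resource 5 3 2` succeeds while `fits_extended_resource 5 4 2` fails; in the second case the Pod would stay `Pending` unless another node had enough room.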

**Example:**

The Pod below requests 2 CPUs and 1 "example.com/foo" (an Extended Resource).

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    image: myimage
    resources:
      requests:
        cpu: 2
        example.com/foo: 1
```

## Planned Improvements

Kubernetes version 1.5 only allows resource quantities to be specified on a
19 changes: 15 additions & 4 deletions docs/concepts/configuration/taint-and-toleration.md
@@ -188,7 +188,7 @@ running on the node as follows

The above behavior is a beta feature. In addition, Kubernetes 1.6 has alpha
support for representing node problems. In other words, the node controller
automatically taints a node when certain conditions are true. The built-in taints
currently include:

* `node.alpha.kubernetes.io/notReady`: Node is not ready. This corresponds to
@@ -249,9 +249,20 @@ admission controller](https://git.k8s.io/kubernetes/plugin/pkg/admission/default

* `node.alpha.kubernetes.io/unreachable`
* `node.alpha.kubernetes.io/notReady`
* `node.kubernetes.io/memoryPressure`
* `node.kubernetes.io/diskPressure`
* `node.kubernetes.io/outOfDisk` (*only for critical pods*)

This ensures that DaemonSet pods are never evicted due to these problems,
which matches the behavior when this feature is disabled.

## Taint Nodes by Condition

Version 1.8 introduces an alpha feature that causes the node controller to create taints corresponding to
Node conditions. When this feature is enabled, the scheduler does not check conditions; instead, the scheduler checks taints. This ensures that conditions don't directly affect what's scheduled onto the Node. The user can choose to ignore some of the Node's problems (represented as conditions) by adding appropriate Pod tolerations.

To make sure that turning on this feature doesn't break DaemonSets, starting in version 1.8, the DaemonSet controller automatically adds the following `NoSchedule` tolerations to all daemon Pods:

* `node.kubernetes.io/memory-pressure`
* `node.kubernetes.io/disk-pressure`
* `node.kubernetes.io/out-of-disk` (*only for critical pods*)

The above settings ensure backward compatibility, but we understand they may not fit all users' needs, which is why
a cluster admin may choose to add arbitrary tolerations to DaemonSets.
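To make the correspondence between conditions and taints concrete, the sketch below maps each Node condition to the taint key that the `NoSchedule` tolerations listed above refer to. The `taint_key_for_condition` function is hypothetical and is not taken from the node controller; it assumes the conditions `MemoryPressure`, `DiskPressure` and `OutOfDisk` map one-to-one onto the `node.kubernetes.io/memory-pressure`, `node.kubernetes.io/disk-pressure` and `node.kubernetes.io/out-of-disk` keys.

```shell
# Hypothetical lookup table, not the node controller's code: the NoSchedule
# taint key assumed to correspond to each Node condition when the
# TaintNodesByCondition alpha feature is enabled.
taint_key_for_condition() {
  case "$1" in
    MemoryPressure) echo "node.kubernetes.io/memory-pressure" ;;
    DiskPressure)   echo "node.kubernetes.io/disk-pressure" ;;
    OutOfDisk)      echo "node.kubernetes.io/out-of-disk" ;;
    *) return 1 ;;  # conditions this sketch does not cover
  esac
}
```

Under this assumption, a Pod that should still be scheduled onto a Node under memory pressure would need a `NoSchedule` toleration for the key printed by `taint_key_for_condition MemoryPressure`.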
17 changes: 10 additions & 7 deletions docs/concepts/workloads/controllers/daemonset.md
@@ -103,19 +103,22 @@ but they are created with `NoExecute` tolerations for the following taints with

- `node.alpha.kubernetes.io/notReady`
- `node.alpha.kubernetes.io/unreachable`
- `node.alpha.kubernetes.io/memoryPressure`
- `node.alpha.kubernetes.io/diskPressure`

When support for critical pods is enabled and the pods in a DaemonSet are
labelled as critical, the Daemon pods are created with an additional
`NoExecute` toleration for the `node.alpha.kubernetes.io/outOfDisk` taint with
no `tolerationSeconds`.

This ensures that when the `TaintBasedEvictions` alpha feature is enabled,
they will not be evicted when there are node problems such as a network partition. (When the
`TaintBasedEvictions` feature is not enabled, they are also not evicted in these scenarios, but
due to hard-coded behavior of the NodeController rather than due to tolerations).

They also tolerate the following `NoSchedule` taints:

- `node.kubernetes.io/memory-pressure`
- `node.kubernetes.io/disk-pressure`

When support for critical pods is enabled and the pods in a DaemonSet are
labelled as critical, the Daemon pods are created with an additional
`NoSchedule` toleration for the `node.kubernetes.io/out-of-disk` taint.

Note that all of the above `NoSchedule` taints are created only in version 1.8 or later, and only if the alpha feature `TaintNodesByCondition` is enabled.

## Communicating with Daemon Pods

11 changes: 8 additions & 3 deletions docs/concepts/workloads/pods/init-containers.md
@@ -14,7 +14,8 @@ scripts not present in an app image.

This feature has exited beta in 1.6. Init Containers can be specified in the PodSpec
alongside the app `containers` array. The beta annotation value will still be respected
and overrides the PodSpec field value; however, the annotations are deprecated in 1.6 and 1.7.
In 1.8, the annotations are no longer supported and must be converted to the PodSpec field.

{% capture body %}
## Understanding Init Containers
@@ -123,7 +124,7 @@ spec:
command: ['sh', '-c', 'echo The app is running! && sleep 3600']
```
There is a new syntax in Kubernetes 1.6, although the old annotation syntax still works for 1.6 and 1.7. The new syntax must be used for 1.8 or greater. We have moved the declaration of init containers to `spec`:

```yaml
apiVersion: v1
@@ -146,7 +147,7 @@ spec:
command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
```

1.5 syntax still works on 1.6, but we recommend using 1.6 syntax. In Kubernetes 1.6, Init Containers were made a field in the API. The beta annotation is still respected in 1.6 and 1.7, but is not supported in 1.8 or greater.

The YAML file below outlines the `mydb` and `myservice` services:

@@ -311,6 +312,10 @@ into alpha and beta annotations so that Kubelets version 1.3.0 or greater can ex
Init Containers, and so that a version 1.6 apiserver can safely be rolled back to version
1.5.x without losing Init Container functionality for existing created pods.

In Apiserver and Kubelet versions 1.8.0 or greater, support for the alpha and beta annotations
is removed, requiring a conversion from the deprecated annotations to the
`spec.initContainers` field.

{% endcapture %}

