diff --git a/docs/usage/install/underlay/get-started-sriov-zh_CN.md b/docs/usage/install/underlay/get-started-sriov-zh_CN.md
index 67a9190e7a..93410a0463 100644
--- a/docs/usage/install/underlay/get-started-sriov-zh_CN.md
+++ b/docs/usage/install/underlay/get-started-sriov-zh_CN.md
@@ -149,6 +149,18 @@ Spiderpool 可用作 underlay 网络场景下提供固定 IP 的一种解决方
 }
 ```
 
+ > The sriov-network-config-daemon pod configures VFs on the nodes, working through the nodes one at a time. To configure VFs on a node, the sriov-network-config-daemon evicts all pods on that node, configures the VFs, and may reboot the node. If the sriov-network-config-daemon fails to evict a pod, the whole process stalls and the node's VF count stays at 0. In that case, the sriov-network-config-daemon pod logs messages similar to the following:
+ >
+ > `error when evicting pods/calico-kube-controllers-865d498fd9-245c4 -n kube-system (will retry after 5s) ...`
+ >
+ > For a similar report, see this sriov-network-operator community [issue](https://github.com/k8snetworkplumbingwg/sriov-network-operator/issues/463).
+ >
+ > Investigate why the pod in question cannot be evicted. Possible causes include:
+ >
+ > (1) The pod that failed to evict may be covered by a PodDisruptionBudget, so evicting it would leave too few available replicas. Adjust the PodDisruptionBudget.
+ >
+ > (2) The cluster has too few available nodes, so there is no node on which the evicted pod can be rescheduled.
+
 4. Create a SpiderIPPool instance.
 
 Pods obtain IPs from this subnet for underlay network communication, so the subnet must correspond to the underlay subnet being accessed.
diff --git a/docs/usage/install/underlay/get-started-sriov.md b/docs/usage/install/underlay/get-started-sriov.md
index 59aec7f947..5bc27655c8 100644
--- a/docs/usage/install/underlay/get-started-sriov.md
+++ b/docs/usage/install/underlay/get-started-sriov.md
@@ -142,6 +142,20 @@ SriovNetwork helps us install sriov-cni and sriov-device-plugin components, maki
 "memory": "16247944Ki",
 "pods": "110"
 }
+ ```
+
+ > The sriov-network-config-daemon pod configures VFs on the nodes, working through the nodes one at a time. To configure VFs on a node, the sriov-network-config-daemon evicts all pods on that node, configures the VFs, and may reboot the node. If the sriov-network-config-daemon fails to evict a pod, the whole process stalls and the node's VF count stays at 0.
In this case, the sriov-network-config-daemon pod logs messages similar to the following:
+ >
+ > `error when evicting pods/calico-kube-controllers-865d498fd9-245c4 -n kube-system (will retry after 5s) ...`
+ >
+ > For a similar report, see this sriov-network-operator community [issue](https://github.com/k8snetworkplumbingwg/sriov-network-operator/issues/463).
+ >
+ > Investigate why the pod in question cannot be evicted. Possible causes include:
+ >
+ > (1) The pod that failed to evict may be covered by a PodDisruptionBudget, so evicting it would leave
+ > too few available replicas. Adjust the PodDisruptionBudget.
+ >
+ > (2) The cluster has too few available nodes, so there is no node on which the evicted pod can be rescheduled.
 
 ## Install Spiderpool
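
Reviewer note on cause (1) in the added text: a PodDisruptionBudget blocks an eviction when the number of disruptions it currently allows is 0. The sketch below models that arithmetic for the `minAvailable` form only. It is a simplified illustration, not code from Spiderpool or sriov-network-operator. The function names are hypothetical, and it assumes the budget allows `current_healthy - desired_healthy` disruptions (floored at 0), with percentage values rounded up.

```python
def desired_healthy(min_available, expected_pods: int) -> int:
    """Resolve minAvailable (an int, or a percentage string such as "50%")
    into an absolute pod count. Percentages are rounded up (assumption
    stated in the lead-in)."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        # Integer ceiling of expected_pods * percent / 100.
        return (expected_pods * percent + 99) // 100
    return int(min_available)


def disruptions_allowed(current_healthy: int, desired: int) -> int:
    """Number of evictions the budget still permits; never negative."""
    return max(0, current_healthy - desired)


def eviction_blocked(min_available, expected_pods: int, current_healthy: int) -> bool:
    """True when the budget permits no further eviction, which is the
    situation in which a node drain keeps retrying."""
    desired = desired_healthy(min_available, expected_pods)
    return disruptions_allowed(current_healthy, desired) == 0


if __name__ == "__main__":
    # Single healthy replica with minAvailable: 1 -> eviction is blocked.
    print(eviction_blocked(1, expected_pods=1, current_healthy=1))
    # Two healthy replicas with minAvailable: 1 -> one eviction is allowed.
    print(eviction_blocked(1, expected_pods=2, current_healthy=2))
```

Under this model, a single-replica workload with `minAvailable: 1` can never be evicted, which matches the stalled drain described in the note; adding a healthy replica or relaxing the budget removes the block.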