Merge pull request #2424 from spidernet-io/pr/welan/sriovissue
doc: sriov issue
weizhoublue authored Oct 14, 2023
2 parents 39eb7b4 + 8acefba commit 92225e7
Showing 2 changed files with 26 additions and 0 deletions.
12 changes: 12 additions & 0 deletions docs/usage/install/underlay/get-started-sriov-zh_CN.md
@@ -149,6 +149,18 @@ Spiderpool can serve as a solution for providing fixed IPs in underlay network scenarios
}
```
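> The sriov-network-config-daemon pod configures VFs on the nodes, working through the nodes one at a time. On each node, it evicts all pods on the node, configures the VFs, and may reboot the node. If sriov-network-config-daemon fails to evict a pod, the whole process stalls and the node's VF count stays at 0. In that case, the sriov-network-config-daemon pod's log contains entries similar to the following: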
>
> `error when evicting pods/calico-kube-controllers-865d498fd9-245c4 -n kube-system (will retry after 5s) ...`
>
> For a similar report, see this [issue](https://github.com/k8snetworkplumbingwg/sriov-network-operator/issues/463) in the sriov-network-operator community.
>
> To resolve this, investigate why the pod in question cannot be evicted. Possible causes include:
>
> (1) The pod that failed eviction may be covered by a PodDisruptionBudget, and evicting it would leave too few available replicas. Adjust the PodDisruptionBudget.
>
> (2) The cluster does not have enough available nodes, so there is no node onto which the evicted pod can be scheduled.
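
When the daemon log is long, it can help to extract which pods are blocking the drain. The following is a minimal sketch, assuming the log lines look like the sample above (some versions may wrap the pod name and namespace in double quotes, which the pattern tolerates). The names `EVICT_ERROR` and `blocked_pod` are hypothetical helpers, not part of sriov-network-operator.

```python
import re
from typing import Optional, Tuple

# Pattern for the eviction error shown above. Quotes around the pod name
# and namespace are treated as optional (an assumption about log variants).
EVICT_ERROR = re.compile(
    r'error when evicting pods?/"?(?P<pod>[^"\s]+)"?\s+-n\s+"?(?P<ns>[^"\s]+)"?'
)


def blocked_pod(line: str) -> Optional[Tuple[str, str]]:
    """Return (namespace, pod) if the line reports a failed eviction, else None."""
    match = EVICT_ERROR.search(line)
    if match is None:
        return None
    return match.group("ns"), match.group("pod")


sample = ("error when evicting pods/calico-kube-controllers-865d498fd9-245c4 "
          "-n kube-system (will retry after 5s) ...")
print(blocked_pod(sample))
```

Feeding the daemon pod's log through `blocked_pod` line by line yields the namespace and name of each pod whose PodDisruptionBudget or scheduling constraints are worth checking.
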
4. Create a SpiderIPPool instance.
Pods obtain their IPs from this subnet for underlay network communication, so the subnet must correspond to the underlay subnet being accessed.
14 changes: 14 additions & 0 deletions docs/usage/install/underlay/get-started-sriov.md
@@ -142,6 +142,20 @@ SriovNetwork helps us install sriov-cni and sriov-device-plugin components, maki
"memory": "16247944Ki",
"pods": "110"
}
```
> The sriov-network-config-daemon pod configures VFs on the nodes, working through the nodes one at a time. On each node, it evicts all pods on the node, configures the VFs, and may reboot the node. If sriov-network-config-daemon fails to evict a pod, the whole process stalls and the node's VF count stays at 0. In that case, the sriov-network-config-daemon pod's log contains entries similar to the following:
>
> `error when evicting pods/calico-kube-controllers-865d498fd9-245c4 -n kube-system (will retry after 5s) ...`
>
> For a similar report, see this [issue](https://github.com/k8snetworkplumbingwg/sriov-network-operator/issues/463) in the sriov-network-operator community.
>
> To resolve this, investigate why the pod in question cannot be evicted. Possible causes include:
>
> (1) The pod that failed eviction may be covered by a PodDisruptionBudget, and evicting it would leave too few available replicas. Adjust the PodDisruptionBudget.
>
> (2) The cluster does not have enough available nodes, so there is no node onto which the evicted pod can be scheduled.
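
To see why a PodDisruptionBudget can block eviction indefinitely, the arithmetic can be sketched as follows. This is a simplified model, assuming integer `minAvailable` / `maxUnavailable` values (percentages are not handled); the function name `disruptions_allowed` and the replica counts are hypothetical and chosen only for illustration.

```python
from typing import Optional


def disruptions_allowed(current_healthy: int,
                        expected_pods: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Approximate how many pods a PodDisruptionBudget permits evicting right now.

    Simplified model: integer budgets only, exactly one of the two fields set.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = max(0, expected_pods - max_unavailable)
    # Eviction is refused whenever this value is 0.
    return max(0, current_healthy - desired_healthy)


# A single-replica workload with minAvailable=1 can never be evicted.
print(disruptions_allowed(current_healthy=1, expected_pods=1, min_available=1))  # 0
# With three healthy replicas and minAvailable=2, one eviction is permitted.
print(disruptions_allowed(current_healthy=3, expected_pods=3, min_available=2))  # 1
```

In the first case the drain retries forever, which matches the stalled state described above. Raising the replica count or relaxing the budget makes the result positive and lets the eviction proceed.
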
## Install Spiderpool