Commit

remove watermark of MetaX topo diagrams (#581)
Signed-off-by: Bo Han <[email protected]>
obnah authored Oct 30, 2024
1 parent 233184e commit 3245d26
Showing 8 changed files with 6 additions and 6 deletions.
6 changes: 3 additions & 3 deletions docs/metax-support.md
@@ -5,7 +5,7 @@
When multiple GPUs are installed on a single server, each pair of cards has a near-far (higher or lower bandwidth) relationship depending on whether they are attached to the same PCIe Switch or MetaXLink. This forms a topology among all the cards on the server, as shown in the following figure:

![img](../imgs/metax_topo.jpg)
![img](../imgs/metax_topo.png)

A user job requests a certain number of metax-tech.com/gpu resources, and Kubernetes schedules the pod to a node whose remaining resources satisfy the request. gpu-device then handles the allocation of the remaining resources on that node, following the criteria below (a pod-spec sketch follows this list):
1. MetaXLink takes precedence over PCIe Switch in two ways:
@@ -15,11 +15,11 @@ Equipped with MetaXLink interconnected resources.

2. When using `node-scheduler-policy=spread`, MetaX resources are allocated under the same MetaXLink or PCIe Switch as much as possible, as the following figure shows:

![img](../imgs/metax_spread.jpg)
![img](../imgs/metax_spread.png)

3. When using `node-scheduler-policy=binpack`, GPU resources are assigned so as to minimize damage to the MetaXLink topology, as the following figure shows:

![img](../imgs/metax_binpack.jpg)
![img](../imgs/metax_binpack.png)
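
As a minimal sketch of how a job would request these resources, the pod spec below asks for one metax-tech.com/gpu under the `spread` policy. The `hami.io/node-scheduler-policy` annotation key and the container image are illustrative assumptions, not confirmed by this commit; check which annotation your scheduler deployment actually reads.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-spread
  annotations:
    # Assumed annotation key for node-scheduler-policy; verify for your deployment.
    hami.io/node-scheduler-policy: "spread"
spec:
  containers:
    - name: main
      image: ubuntu:22.04        # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          metax-tech.com/gpu: 1  # request one MetaX GPU
```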

## Important Notes

6 changes: 3 additions & 3 deletions docs/metax-support_cn.md
@@ -5,7 +5,7 @@
When multiple GPUs are installed on a single server, the cards have a near-far (higher or lower bandwidth) relationship depending on whether they are attached to the same PCIe Switch or MetaXLink. All the cards on the server thus form a topology, as shown in the figure below.

![img](../imgs/metax_topo.jpg)
![img](../imgs/metax_topo.png)

A user job requests a certain number of metax-tech.com/gpu resources; Kubernetes selects a node whose remaining resources meet the request and schedules the Pod to it. gpu-device then handles the allocation of the remaining resources on that node, following the criteria below (see the sketch after this list):
@@ -17,11 +17,11 @@

2. When a job uses `node-scheduler-policy=spread`, GPU resources are allocated under the same MetaXLink or PCIe Switch as much as possible, as shown in the figure below:

![img](../imgs/metax_spread.jpg)
![img](../imgs/metax_spread.png)

3. When using `node-scheduler-policy=binpack`, GPU resources are allocated so that the remaining resources stay as intact as possible, as shown in the figure below:

![img](../imgs/metax_binpack.jpg)
![img](../imgs/metax_binpack.png)
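
Complementing the `spread` example above, here is a minimal sketch of a `binpack` request; the `hami.io/node-scheduler-policy` annotation key and the image are again illustrative assumptions rather than confirmed API.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod-binpack
  annotations:
    # Assumed annotation key for node-scheduler-policy; verify for your deployment.
    hami.io/node-scheduler-policy: "binpack"
spec:
  containers:
    - name: main
      image: ubuntu:22.04        # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          metax-tech.com/gpu: 2  # pack two MetaX GPUs so leftover resources stay intact
```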

## Important Notes

Binary file removed imgs/metax_binpack.jpg
Binary file not shown.
Binary file added imgs/metax_binpack.png
Binary file removed imgs/metax_spread.jpg
Binary file not shown.
Binary file added imgs/metax_spread.png
Binary file removed imgs/metax_topo.jpg
Binary file not shown.
Binary file added imgs/metax_topo.png
