diff --git a/docs/en/faq.md b/docs/en/faq.md
index 6ebc7d64106..e656ecd8dc4 100644
--- a/docs/en/faq.md
+++ b/docs/en/faq.md
@@ -75,7 +75,7 @@ We list some common troubles faced by many users and their corresponding solutions
   3. Extend the warmup iterations: some models are sensitive to the learning rate at the start of training. You can extend the warmup iterations, e.g., change `warmup_iters` from 500 to 1000 or 2000.
   4. Add gradient clipping: some models require gradient clipping to stabilize the training process. The default `grad_clip` is `None`; you can add gradient clipping to avoid gradients that are too large, i.e., set `optimizer_config=dict(_delete_=True, grad_clip=dict(max_norm=35, norm_type=2))` in your config file. If your config does not inherit from any base config that contains `optimizer_config=dict(grad_clip=None)`, you can simply add `optimizer_config=dict(grad_clip=dict(max_norm=35, norm_type=2))`.
-- ’GPU out of memory"
+- "GPU out of memory"
   1. In some scenarios there are a large number of ground truth boxes, which may cause OOM during target assignment. You can set `gpu_assign_thr=N` in the config of the assigner so that the assigner computes box overlaps on CPU when there are more than N GT boxes.
   2. Set `with_cp=True` in the backbone. This uses the sublinear strategy in PyTorch to reduce GPU memory cost in the backbone.
   3. Try mixed precision training following the examples in `config/fp16`. The `loss_scale` might need further tuning for different models.
diff --git a/docs/zh_cn/faq.md b/docs/zh_cn/faq.md
index 1e5bcd9ee67..3376ce74830 100644
--- a/docs/zh_cn/faq.md
+++ b/docs/zh_cn/faq.md
@@ -77,7 +77,7 @@
   3. 延长 warm up 的时间:一些模型在训练初始时对学习率很敏感,您可以把 `warmup_iters` 从 500 更改为 1000 或 2000。
   4. 添加 gradient clipping:一些模型需要梯度裁剪来稳定训练过程。默认的 `grad_clip` 是 `None`,你可以在 config 中设置 `optimizer_config=dict(_delete_=True, grad_clip=dict(max_norm=35, norm_type=2))`。如果你的 config 没有继承任何包含 `optimizer_config=dict(grad_clip=None)` 的基础配置,你可以直接设置 `optimizer_config=dict(grad_clip=dict(max_norm=35, norm_type=2))`。
-- ’GPU out of memory"
+- "GPU out of memory"
   1. 存在大量 ground truth boxes 或者大量 anchor 的场景,可能在 assigner 中会 OOM。你可以在 assigner 的配置中设置 `gpu_assign_thr=N`,这样当超过 N 个 GT boxes 时,assigner 会通过 CPU 计算 IOU。
   2. 在 backbone 中设置 `with_cp=True`。这使用 PyTorch 中的 `sublinear strategy` 来降低 backbone 占用的 GPU 显存。
   3. 使用 `config/fp16` 中的示例尝试混合精度训练。`loss_scale` 可能需要针对不同模型进行调整。
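
For readers applying the FAQ entries touched by this patch, below is a minimal sketch of how the mentioned options could be combined in an MMDetection-style Python config. The base config path and the value `gpu_assign_thr=10` are illustrative assumptions, not part of the patch above.

```python
# Minimal sketch, assuming an MMDetection-style Python config.
# The base config path is hypothetical; substitute your own.
_base_ = './faster_rcnn_r50_fpn_1x_coco.py'

# Extend the warmup iterations (item 3), e.g. from 500 to 1000.
lr_config = dict(warmup_iters=1000)

# Add gradient clipping (item 4); `_delete_=True` overrides an
# inherited `optimizer_config=dict(grad_clip=None)`.
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))

model = dict(
    # "GPU out of memory" item 1: assign targets on CPU when an image
    # has more than N GT boxes (N=10 here is an illustrative value).
    train_cfg=dict(
        rpn=dict(assigner=dict(gpu_assign_thr=10))),
    # "GPU out of memory" item 2: checkpoint the backbone with the
    # sublinear strategy, trading compute for memory.
    backbone=dict(with_cp=True))

# "GPU out of memory" item 3: mixed precision training; the
# `loss_scale` value may need tuning per model.
fp16 = dict(loss_scale=512.)
```

Nested dicts merge with the inherited base config, so only the keys being changed need to appear; `_delete_=True` is needed only where an inherited key such as `grad_clip=None` must be replaced rather than merged.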