diff --git a/README.md b/README.md
index 0f882eed..47420909 100644
--- a/README.md
+++ b/README.md
@@ -23,22 +23,22 @@ Please note: `torchtitan` is a proof-of-concept for Large-scale LLM training usi
Key features available:
1 - [FSDP2 (per param sharding)](docs/fsdp.md)
-2 - Tensor Parallel (FSDP + Tensor Parallel)
-3 - Selective layer and op activation checkpointing
-4 - Distributed checkpointing (asynch pending)
+2 - [Tensor Parallel](https://pytorch.org/docs/stable/distributed.tensor.parallel.html) (FSDP + Tensor Parallel)
+3 - Selective layer and operator activation checkpointing
+4 - Distributed checkpointing (async checkpointing pending)
5 - 3 datasets pre-configured (47K - 144M)
6 - GPU usage, MFU, tokens per second, and other metrics are all reported and displayed via TensorBoard.
7 - Fused RMSNorm (optional), learning rate scheduler, meta init, and more.
-8 - All options easily configured via toml files.
+8 - All options easily configured via [toml files](train_configs/).
9 - [Performance](docs/performance.md) verified on 64 A100 GPUs.
## Coming soon features:
-1 - Asynch checkpointing
+1 - Async checkpointing
2 - FP8 support
3 - Context Parallel
4 - 3D (Pipeline Parallel)
-5 - Torch Compile support
+5 - `torch.compile` support
6 - Scalable data loading solution
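
For readers new to the stack, here is a minimal sketch of how features 1 and 2 in the list above compose: Tensor Parallel is applied first across one dimension of a 2D DeviceMesh, then FSDP2 shards the resulting parameters across the other. The `FeedForward` module, the 4x2 mesh shape, and the `"dp"`/`"tp"` dimension names are illustrative assumptions rather than torchtitan's actual training code, and the `fully_shard` import path has moved between PyTorch releases.

```python
# Minimal sketch (not torchtitan's actual code): composing Tensor Parallel
# with FSDP2 per-parameter sharding on a 2D device mesh.
# Assumes 8 GPUs launched via `torchrun --nproc_per_node=8 sketch.py`.
import os

import torch
import torch.nn as nn
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)

# FSDP2's `fully_shard`; this import path is version-dependent.
from torch.distributed._composable.fsdp import fully_shard


class FeedForward(nn.Module):
    """Toy two-layer MLP standing in for a transformer FFN block."""

    def __init__(self, dim: int = 256, hidden: int = 1024) -> None:
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)
        self.w2 = nn.Linear(hidden, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(torch.relu(self.w1(x)))


# Bind each process to its GPU; torchrun sets LOCAL_RANK.
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# 2D mesh: the outer "dp" dimension is sharded by FSDP, the inner "tp"
# dimension by Tensor Parallel. The 4x2 shape is an assumption for 8 GPUs.
mesh = init_device_mesh("cuda", (4, 2), mesh_dim_names=("dp", "tp"))

model = FeedForward().cuda()

# Shard w1 column-wise and w2 row-wise across the "tp" sub-mesh, so the
# intermediate activation stays sharded and the pair needs one all-reduce.
parallelize_module(
    model,
    mesh["tp"],
    {"w1": ColwiseParallel(), "w2": RowwiseParallel()},
)

# Then shard each (already tensor-parallel) parameter with FSDP2 across
# the "dp" sub-mesh: the "FSDP + Tensor Parallel" composition named above.
fully_shard(model, mesh=mesh["dp"])

# Identical seed so every rank feeds the same (replicated) input batch.
torch.manual_seed(0)
out = model(torch.randn(8, 256, device="cuda"))
```

The ordering matters: applying Tensor Parallel first leaves FSDP2 to shard the resulting DTensor parameters along the remaining mesh dimension, which is what lets the two schemes compose on a single mesh.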