Add torchtune checkpoint link, modify product position statement location (#241)

This PR:
1 - adds a feature note and a link to the checkpoint doc covering saving torchtitan weights and loading them into torchtune for fine-tuning.
2 - moves the product position info from the top of the page to the bottom.
lessw2020 authored Apr 16, 2024
1 parent 04f5b82 commit 4f6ed9a
Showing 1 changed file with 5 additions and 4 deletions.
README.md: 9 changes (5 additions & 4 deletions)
@@ -15,11 +15,9 @@
* Modular components instead of monolithic codebase.
* Get started in minutes, not hours!

Please note: `torchtitan` is a proof-of-concept for Large-scale LLM training using native PyTorch. It is (and will continue to be) a repo to showcase PyTorch's latest distributed training features in a clean, minimal codebase. torchtitan is complementary to and not a replacement for any of the great large-scale LLM training codebases such as Megatron, Megablocks, LLM Foundry, Deepspeed, etc. Instead, we hope that the features showcased in torchtitan will be adopted by these codebases quickly. torchtitan is unlikely to ever grow a large community around it.


## Pre-Release Updates:
#### (4/16/2024): `torchtitan` is now public but in a pre-release state and under development. Currently we showcase pre-training Llama2 models (LLMs) of various sizes from scratch. `torchtitan` is tested and verified with the PyTorch nightly version `torch-2.4.0.dev20240412`.
#### (4/16/2024): `torchtitan` is now public but in a pre-release state and under development.
Currently we showcase pre-training Llama2 models (LLMs) of various sizes from scratch. `torchtitan` is tested and verified with the PyTorch nightly version `torch-2.4.0.dev20240412`. (We recommend latest PyTorch nightly).

Key features available:</br>
1 - [FSDP2 (per param sharding)](docs/fsdp.md) </br>
@@ -31,6 +29,7 @@ Key features available:</br>
7 - Fused RMSNorm (optional), learning rate scheduler, meta init, and more.</br>
8 - All options easily configured via [toml files](train_configs/).</br>
9 - [Performance](docs/performance.md) verified on 64 A100 GPUs.</br>
10 - [Save pre-trained torchtitan model weights](docs/checkpoint.md) and load directly into [`torchtune`](https://github.com/pytorch/torchtune) for fine tuning. </br>
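As a rough illustration of the workflow behind item 10, a sharded torchtitan checkpoint saved with PyTorch's distributed checkpointing (DCP) can be consolidated into a single `torch.save` file that downstream fine-tuning tools such as `torchtune` can consume. This is a minimal sketch, not the procedure documented in docs/checkpoint.md; it assumes a recent PyTorch build that ships `torch.distributed.checkpoint.format_utils.dcp_to_torch_save`, and the paths are hypothetical placeholders.

```python
# Minimal sketch: consolidate a sharded DCP checkpoint into one torch.save file.
# Assumes a recent PyTorch nightly that provides dcp_to_torch_save.
# Paths below are hypothetical placeholders, not torchtitan defaults.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save

dcp_dir = "outputs/checkpoint/step-1000"          # sharded DCP checkpoint directory (hypothetical)
consolidated = "outputs/checkpoint/step-1000.pt"  # single-file checkpoint for fine-tuning tools

# Convert the sharded checkpoint directory into a single torch.save file.
dcp_to_torch_save(dcp_dir, consolidated)

# Sanity check: the consolidated state dict should load on CPU.
state_dict = torch.load(consolidated, map_location="cpu")
print(f"loaded {len(state_dict)} top-level entries")
```

Refer to [docs/checkpoint.md](docs/checkpoint.md) and the `torchtune` documentation for the steps the projects actually document.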


## Coming soon features:
@@ -108,6 +107,8 @@ If your gpu count per node is not 8, adjust:

in the SBATCH command section.

## Product Position:
Please note: `torchtitan` is a proof-of-concept for Large-scale LLM training using native PyTorch. It is (and will continue to be) a repo to showcase PyTorch's latest distributed training features in a clean, minimal codebase. torchtitan is complementary to and not a replacement for any of the great large-scale LLM training codebases such as Megatron, Megablocks, LLM Foundry, Deepspeed, etc. Instead, we hope that the features showcased in torchtitan will be adopted by these codebases quickly. torchtitan is unlikely to ever grow a large community around it.

## License

