v0.5.0
What's new
- Fixed conversion to HuggingFace model for DDP-trained models.
- Added support for remote source and destination for HuggingFace model conversion.
Added 🎉
- Added support for document masking via flash-attn during training with `--data.generate_doc_lengths`.
- Added config options for `model.norm_after`, `model.scale_emb_init`, and `auxiliary_loss_multiplier` (used with zloss).
- Added scripts for running experiments on qk_norm, norm reordering, and zloss.
- Added `model.rope_theta` configuration option.
- Added `model.embedding_layer_norm` configuration option for adding a LN to the embeddings.
- Added `model.emb_init_std` configuration option to override the standard deviation used to initialize the embeddings.
- Added downstream eval task for requests dumped from oe-eval tasks.
- Added `CosLinearEnvelope` scheduler, which is a pointwise product of a cosine schedule and a linear decay.
- Added ability to save outputs of submodules for debugging purposes.
- Versioned the dolma flan change in `named_data_mix.py`.
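
To illustrate what "a pointwise product of a cosine schedule and a linear decay" means for the `CosLinearEnvelope` scheduler, here is a minimal sketch. The function name, its parameters (`t_max`, `min_lr`), and the handling of the floor value are assumptions made for illustration and are not taken from the OLMo source.

```python
import math


def cos_linear_envelope(step: int, t_max: int, initial_lr: float, min_lr: float = 0.0) -> float:
    """Hypothetical sketch: cosine decay multiplied pointwise by a linear decay.

    All names here are illustrative assumptions, not the library's API.
    """
    if step >= t_max:
        return min_lr
    frac = step / t_max
    cosine = (1 + math.cos(math.pi * frac)) / 2  # goes from 1 down to 0
    linear = 1 - frac                            # goes from 1 down to 0
    # The product decays faster than either factor on its own.
    return min_lr + cosine * linear * (initial_lr - min_lr)


if __name__ == "__main__":
    for s in (0, 25, 50, 75, 100):
        print(s, cos_linear_envelope(s, t_max=100, initial_lr=1.0))
```

Because both factors start at 1 and end at 0, the learning rate starts at `initial_lr` and reaches `min_lr` at `t_max`, and in between it sits below a plain cosine schedule.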
Changed ⚠️
- Changed default distributed training strategy from single-GPU to FSDP
- Fixed behavior of `effective_memmap_dtype` to prevent unrecognized dtypes from being parsed as `uint16`.
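
The dtype fix above amounts to rejecting unknown names instead of silently falling back to a default. A minimal sketch of that idea follows. The `SUPPORTED_DTYPES` table and the `parse_memmap_dtype` helper are hypothetical and do not reproduce the actual `effective_memmap_dtype` implementation.

```python
# Hypothetical table of accepted token dtypes and their width in bytes.
SUPPORTED_DTYPES = {"uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8}


def parse_memmap_dtype(name: str) -> str:
    """Return the dtype name if it is recognized, otherwise raise.

    Illustrative only: a lenient parser would return "uint16" for any
    unknown name, which hides typos such as "unit32".
    """
    if name not in SUPPORTED_DTYPES:
        raise ValueError(
            f"Unrecognized memmap dtype {name!r}; expected one of {sorted(SUPPORTED_DTYPES)}"
        )
    return name
```

Failing loudly matters here because reading token data with the wrong width produces garbage token IDs rather than an obvious error.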
Fixed ✅
- Fixed restarting a training run in later epochs so that we no longer need to set the flag `--epoch=INT`.
- Swapped in correct flan data mix.
- Fixed a bug where the attention norm, when applied before the attention block, was modifying the residual stream.
- Fixed `OLMo.from_checkpoint()` so that it correctly loads `olmo_core` and `torch_new` style checkpoints.
- Fixed `preserve_rng_state` being incorrectly set to False when doing gradient checkpointing with dropout.
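
The attention-norm fix is easier to see with a toy example. The sketch below uses scalar stand-ins for the norm and attention functions. The names `block_buggy` and `block_fixed` are hypothetical, and the code shows the general pre-norm residual pattern rather than OLMo's actual block.

```python
from typing import Callable

Fn = Callable[[float], float]


def block_buggy(x: float, norm: Fn, attn: Fn) -> float:
    # Bug pattern: the normalized value overwrites the residual stream.
    x = norm(x)
    return x + attn(x)


def block_fixed(x: float, norm: Fn, attn: Fn) -> float:
    # Pre-norm pattern: only the attention input is normalized;
    # the residual stream x is carried through unchanged.
    return x + attn(norm(x))


if __name__ == "__main__":
    def norm(v: float) -> float:  # toy stand-in for a norm layer
        return v / 2

    def attn(v: float) -> float:  # toy stand-in for attention
        return v * 10

    print(block_buggy(4.0, norm, attn))  # residual term is norm(x) = 2.0
    print(block_fixed(4.0, norm, attn))  # residual term is x = 4.0
```

In the fixed version the skip connection adds the original `x`, so the norm affects only what the attention function sees.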
Commits
cee1a5d Merge pull request #710 from allenai/version-dolma-flan-change
213a639 Merge pull request #711 from allenai/epwalsh/fix-unbound-qkv
4575d40 Fix Conversion Issues + add support for remote upload. (#694)
78d79a5 Merge pull request #709 from allenai/shanea/debugging-docs
9147889 Merge pull request #685 from allenai/ot-oe-eval-requests
6cdc4cc Merge pull request #698 from allenai/shanea/compare-model-state
e5217cf Merge pull request #705 from allenai/dave/checkpoint_style_naming
f4b386e Merge pull request #704 from allenai/shanea/fix-olmo-1.7-batch-size
1e71ce3 Merge pull request #547 from allenai/shanea/add-olmo-1.7-7b-to-readme
6c4d53f Merge pull request #702 from chrisc36/main
0bc7f6c Merge pull request #690 from allenai/shanea/trace-model-outputs-2
4332c32 Merge pull request #691 from allenai/dave/cosine_linear_envelope
6587ddb Merge pull request #674 from allenai/dave/flan_data_mix
7d63fe0 Merge pull request #671 from allenai/s3_unshard_to_hf
c322b9a Merge pull request #686 from allenai/fix-from-checkpoint
c482df7 Merge pull request #680 from allenai/shanea/fix-incorrect-attn-norm
3e30710 Merge pull request #629 from allenai/epwalsh/amberish
4e00460 Add support for document masking during training (#661)
b45002e make epoch logging less confusing
1b7d275 Fix restarts in later epochs (#670)
345edc6 Merge branch 'main' of https://github.com/allenai/LLM
66d2be7 Revert "Update Beaker image"
0757223 Merge pull request #649 from allenai/ModelLadder
90b3889 Merge pull request #660 from allenai/fix_convert_olmo_to_hf
dfb7212 Merge pull request #616 from allenai/chameleon
d627c94 Merge pull request #665 from allenai/ddp-ckpt-fix
ab63296 Improving memmap type parser (#663)
b55fb5f Merge pull request #662 from allenai/tiny-olmo-config-fix
56d1fe0 Merge pull request #657 from allenai/shanea/lumi-torch2.3-3
26c2d53 Merge pull request #648 from allenai/shanea/default-fsdp-strategy
65f1fff Merge pull request #656 from jeqcho/patch-1
20b82f8 Merge pull request #653 from allenai/shanea/olmo-v0.4.0