
v0.5.0

github-actions released this 27 Aug 02:00 · 170 commits to main since this release

What's new

  • Fixed conversion to HuggingFace model for DDP-trained models.
  • Added support for remote source and destination for HuggingFace model conversion.

Added 🎉

  • Added support for document masking via flash-attn during training with --data.generate_doc_lengths.
  • Added config options for model.norm_after, model.scale_emb_init, and auxiliary_loss_multiplier (used with zloss); an illustrative config sketch follows this list.
  • Added scripts for running experiments on qk_norm, norm reordering, and zloss.
  • Added model.rope_theta configuration option.
  • Added model.embedding_layer_norm configuration option for adding a LN to the embeddings.
  • Added model.emb_init_std configuration option to override the standard deviation used to initialize the embeddings.
  • Added a downstream eval task for requests dumped from oe-eval tasks.
  • Added the CosLinearEnvelope scheduler, which is a pointwise product of a cosine schedule and a linear decay (see the sketch after this list).
  • Added ability to save outputs of submodules for debugging purposes.
  • Versioned the dolma flan data mix change in named_data_mix.py.
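
Several of the options above are plain config fields. The snippet below is a minimal, illustrative sketch of how they might be set, written as a Python dict mirroring the shape of a YAML training config; the key placement and values are assumptions, not taken from the repo's schema.

```python
# Illustrative only: a dict mirroring the shape of a YAML training config.
# Key placement and values are assumptions; consult the repo's config
# classes for the authoritative schema.
config_overrides = {
    "model": {
        "norm_after": True,            # apply the norm after (rather than before) each block
        "scale_emb_init": False,       # toggle scaling of the embedding initialization
        "rope_theta": 500_000,         # RoPE base frequency (hypothetical value)
        "embedding_layer_norm": True,  # add a LayerNorm over the embeddings
        "emb_init_std": 0.02,          # override the embedding init standard deviation
    },
    "auxiliary_loss_multiplier": 1e-4, # weight of the z-loss auxiliary term (placement assumed)
    "data": {
        "generate_doc_lengths": True,  # emit per-document lengths so flash-attn can mask across documents
    },
}
```

On the command line, the document-masking override corresponds to the --data.generate_doc_lengths flag called out above.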
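
The CosLinearEnvelope entry describes the schedule as a pointwise product of a cosine schedule and a linear decay. The function below is a minimal sketch of that idea; the name, signature, and lr_min handling are illustrative, not the repo's implementation.

```python
import math

def cos_linear_envelope(step: int, max_steps: int, lr_max: float, lr_min: float = 0.0) -> float:
    """Sketch of a cosine-times-linear learning-rate envelope."""
    t = min(step, max_steps) / max_steps           # training progress in [0, 1]
    cosine = 0.5 * (1.0 + math.cos(math.pi * t))   # cosine schedule: 1 -> 0
    linear = 1.0 - t                               # linear decay:    1 -> 0
    return lr_min + (lr_max - lr_min) * cosine * linear
```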

Changed ⚠️

  • Changed the default distributed training strategy from single-GPU to FSDP.
  • Fixed the behavior of effective_memmap_dtype to prevent unrecognized dtypes from being parsed as uint16.

Fixed ✅

  • Fixed restarting a training run in later epochs so that we no longer need to set the flag --epoch=INT.
  • Swapped in the correct flan data mix.
  • Fixed a bug where the attention norm, when applied before the attention block, was modifying the residual stream.
  • Fixed OLMo.from_checkpoint() so that it correctly loads olmo_core and torch_new style checkpoints.
  • Fixed preserve_rng_state being incorrectly set to False when doing gradient checkpointing with dropout (see the sketch after this list).
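
For the gradient-checkpointing fix: when dropout sits inside a checkpointed block, PyTorch's preserve_rng_state flag controls whether the recomputed forward pass reuses the original RNG state, so the dropout mask matches between the two passes. A minimal illustration of the setting (not the repo's training code):

```python
import torch
from torch.utils.checkpoint import checkpoint

# A block containing dropout; without preserving RNG state, the recomputed
# forward pass during backward could sample a different dropout mask.
block = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Dropout(p=0.1))

x = torch.randn(4, 16, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False, preserve_rng_state=True)
y.sum().backward()
```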

Commits

cee1a5d Merge pull request #710 from allenai/version-dolma-flan-change
213a639 Merge pull request #711 from allenai/epwalsh/fix-unbound-qkv
4575d40 Fix Conversion Issues + add support for remote upload. (#694)
78d79a5 Merge pull request #709 from allenai/shanea/debugging-docs
9147889 Merge pull request #685 from allenai/ot-oe-eval-requests
6cdc4cc Merge pull request #698 from allenai/shanea/compare-model-state
e5217cf Merge pull request #705 from allenai/dave/checkpoint_style_naming
f4b386e Merge pull request #704 from allenai/shanea/fix-olmo-1.7-batch-size
1e71ce3 Merge pull request #547 from allenai/shanea/add-olmo-1.7-7b-to-readme
6c4d53f Merge pull request #702 from chrisc36/main
0bc7f6c Merge pull request #690 from allenai/shanea/trace-model-outputs-2
4332c32 Merge pull request #691 from allenai/dave/cosine_linear_envelope
6587ddb Merge pull request #674 from allenai/dave/flan_data_mix
7d63fe0 Merge pull request #671 from allenai/s3_unshard_to_hf
c322b9a Merge pull request #686 from allenai/fix-from-checkpoint
c482df7 Merge pull request #680 from allenai/shanea/fix-incorrect-attn-norm
3e30710 Merge pull request #629 from allenai/epwalsh/amberish
4e00460 Add support for document masking during training (#661)
b45002e make epoch logging less confusing
1b7d275 Fix restarts in later epochs (#670)
345edc6 Merge branch 'main' of https://github.com/allenai/LLM
66d2be7 Revert "Update Beaker image"
0757223 Merge pull request #649 from allenai/ModelLadder
90b3889 Merge pull request #660 from allenai/fix_convert_olmo_to_hf
dfb7212 Merge pull request #616 from allenai/chameleon
d627c94 Merge pull request #665 from allenai/ddp-ckpt-fix
ab63296 Improving memmap type parser (#663)
b55fb5f Merge pull request #662 from allenai/tiny-olmo-config-fix
56d1fe0 Merge pull request #657 from allenai/shanea/lumi-torch2.3-3
26c2d53 Merge pull request #648 from allenai/shanea/default-fsdp-strategy
65f1fff Merge pull request #656 from jeqcho/patch-1
20b82f8 Merge pull request #653 from allenai/shanea/olmo-v0.4.0