v0.4.0
What's new
Added 🎉
- Added clipping fix to `Optimizer` class to make it work with FSDP `no_shard` and DDP.
- Added tests to compare grad norm differences between torch optimizer and clipping and OLMo optimizer and clipping on both CPU and GPU.
- Expose memmap dtype in data config
- Added support for DDP training.
- Added caching to disk of HF datasets used in downstream evals
- Added FLOPs logging
- Added configs for OLMo tiny set of models
- Added configuration field `optimizer.record_update_metrics`, which defaults to `False`, but when set to `True` will trigger AdamW to collect the step size norm and absolute max for each parameter.
- Added configuration field `optimizer.selective_updates`, which defaults to `False`, but when set to `True` will tell the optimizer to skip updating the parameter and state when the corresponding gradient is 0.
- Added `olmo_data`, a package holding data files like tokenizers.
- Added ability to load tokenizers from `olmo_data` package data.
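To illustrate what `optimizer.selective_updates` does, here is a minimal pure-Python sketch of an AdamW-style step (no bias correction, scalar loop instead of tensors — not the actual OLMo implementation) that leaves a parameter entry and its optimizer state untouched wherever the gradient is exactly zero:

```python
import math

def adamw_step_selective(param, grad, exp_avg, exp_avg_sq,
                         lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Toy AdamW-style step that skips parameter entries, and their
    moment estimates, wherever the gradient is exactly zero -- the idea
    behind `optimizer.selective_updates`. Bias correction is omitted
    for brevity."""
    for i, g in enumerate(grad):
        if g == 0:
            continue  # zero grad: leave param and optimizer state untouched
        exp_avg[i] = beta1 * exp_avg[i] + (1 - beta1) * g
        exp_avg_sq[i] = beta2 * exp_avg_sq[i] + (1 - beta2) * g * g
        param[i] -= lr * exp_avg[i] / (math.sqrt(exp_avg_sq[i]) + eps)
    return param

p = adamw_step_selective([1.0, 1.0, 1.0, 1.0],
                         [0.0, 1.0, 0.0, -1.0],
                         [0.0] * 4, [0.0] * 4)
# entries 0 and 2 had zero gradient, so they (and their state) are unchanged
```

This matters for models with sparsely used parameters (e.g. rarely seen embedding rows), where a plain AdamW step would still decay and perturb entries that received no gradient.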
Changed ⚠️
- Added original legacy unsharding implementation back, as the default. The new shared memory implementation can be used by passing `use_legacy_shared_mem_impl` to `unshard.py`.
- Refactor weight initialization. IMPORTANT: this does not maintain backwards-compatibility with older configs; the jobs will still run, but may produce different outputs.
- Changed the behavior of the Lion optimizer to only record the update cosine similarity when `optimizer.record_update_metrics` is `True`, in order to be consistent with the API.
- Added HF datasets into `olmo_data`, and changed downstream eval to load from the package.
Fixed ✅
- Changed from `ignored_index` to `ignore_index` for `cross_entropy_loss` when `flash-attn>=2.5.8`.
- Make `hf_olmo` support `AutoModelForCausalLM` and similar HF methods again.
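For context on the `ignore_index` fix: `ignore_index` is the standard keyword (as in `torch.nn.functional.cross_entropy`) for marking target positions, such as padding tokens, that should be excluded from the loss. A pure-Python sketch of the semantics (an illustration, not the flash-attn implementation):

```python
import math

def cross_entropy_mean(logprobs, targets, ignore_index=-100):
    """Mean negative log-likelihood over a batch of positions, skipping
    any position whose target label equals `ignore_index`."""
    losses = [-row[t] for row, t in zip(logprobs, targets)
              if t != ignore_index]
    return sum(losses) / len(losses)

# Two positions; the second is padding and is labeled with -100.
logprobs = [[math.log(0.5), math.log(0.5)],
            [math.log(0.25), math.log(0.75)]]
loss = cross_entropy_mean(logprobs, [0, -100])
# only the first position contributes: loss == -log(0.5) == log(2)
```

The bug was only in the keyword name (`ignored_index` vs. `ignore_index`), which newer flash-attn versions rejected; the semantics above are unchanged.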
Commits
d423c11 Merge pull request #652 from allenai/shanea/update-to-torch2.3
b10ab4b Merge pull request #651 from allenai/shanea/lumi-torch2.3-2
a101b31 Merge pull request #646 from allenai/shanea/hf-datasets-from-package
429a752 Merge pull request #647 from allenai/shanea/fix-tokenizer-break
bc60b8a Add option to skip optim steps for 0 grad params (#636)
cbc7c25 Merge pull request #645 from allenai/shanea/tokenizer-package-data
1b2658b Add option to record step size metrics from AdamW (#605)
a3e2ea7 multiple epoch fix
a1f118a Merge pull request #628 from allenai/olmo-tiny
d7994c8 Fix Z-loss calculation (#634)
a5539f4 Merge pull request #631 from allenai/shanea/hf-olmo-auto-model
d72a262 Merge pull request #626 from allenai/shanea/inspect-train-data-improvements
2417b11 Make olmo-core checkpointer more robust on weka (#624)
ddc8847 Merge pull request #612 from allenai/ddp
41ed20a Merge pull request #623 from allenai/shanea/hf-save-to-disk-2
a33caa9 Merge pull request #604 from allenai/WandbDiff
e5d63a3 Merge pull request #619 from allenai/shanea/add-olmo-1.7-7b-checkpoints
e207df7 Officially add OLMo-core as a dependency (#615)
72159ae Merge pull request #614 from allenai/shanea/pass-include-instance-metadata
c2cedbc Merge pull request #607 from allenai/rewrite-init
578234d Merge pull request #611 from allenai/shanea/hf-get-tokenizer-from-config-2
de43ee8 Merge pull request #610 from allenai/shanea/hf-get-tokenizer-from-config
2639279 Merge pull request #594 from NeuralFabricAI/lx/expose-data-dtype
9e89408 Create sensible filenames
02a8a58 Merge pull request #603 from allenai/shanea/unshard-without-passing-type
ae84d47 Merge pull request #602 from allenai/no_shard_ddp_clip
40210bb Merge pull request #599 from allenai/train-olmo-large
55c1e2f Merge pull request #601 from allenai/no_shard_ddp_clip
5789cfe Merge pull request #593 from allenai/shanea/inspect-train-data-no-indices
eafd154 Merge pull request #579 from MLgdg/main
652c745 Merge pull request #590 from allenai/shanea/update-readme-to-olmo-1.7
8ec2809 Merge pull request #589 from allenai/shanea/update-main-readme-hf
6e714b8 Merge pull request #588 from allenai/shanea/hf-olmo-docs-auto-methods
65d5575 Merge pull request #587 from allenai/shanea/storage-cleaner-improvemnts
0bddfe0 Merge pull request #585 from allenai/shanea/add-hf-docs
e6430a0 Merge pull request #582 from allenai/shanea/hybrid-shard-as-no-shard
c29787a Merge pull request #569 from allenai/Muennighoff/fix-torchv
7a462c5 Merge pull request #580 from allenai/shanea/update-ignore-index-kwarg
4f917fb Merge pull request #575 from allenai/shanea/add-weka
5c721cc Fix GPU tests CI (#574)
467adcc Merge remote-tracking branch 'origin/train-olmo-large'
4b2d12e Merge pull request #565 from allenai/readme
ccc49fd Merge pull request #564 from allenai/shanea/add-new-hf-converter
b17abd0 Merge pull request #512 from liaoleo/main
295d309 Merge pull request #561 from allenai/shanea/delay-device-mesh-import
4e8746d Merge pull request #562 from allenai/shanea/re-add-easy-legacy-unshard-impl
f38de95 Merge pull request #558 from allenai/shanea/release-v0.3.0
829f1d6 Merge pull request #520 from allenai/add-ce-loss-metric