Version dolma flan change #710

IanMagnusson · 2024-08-20T19:28:51Z

I'm working on sorting out which model ladder runs are comparable to one another. A prerequisite for this is versioning our data mixes. We should lock in a name for the version of Dolma 1.7 that uses preprocessed/tulu_flan/v1-decontaminated-60M-shots_all-upweight_1-dialog_false-sep_rulebased instead of preprocessed/tulu_flan/v2-decontaminated-60M-shots_all-upweight_1-dialog_false-sep_newline/ (introduced to the model ladder in this PR). I think we should not keep calling this dolma17, as we currently do, unless there is a plan to update the HF-hosted version of Dolma 1.7 to include this change as well. At a minimum, I'd like two different named_data_mixes for Dolma 1.7, one for each of these flans, so that the exact dataset a run uses can be determined from the data mix name alone, without checking out the training code to see which overloaded version of a named mix was used.

The implementation here is a hotfix that distinguishes the different flans in dolma17 for the model ladder. Later we can clean up the data mix definition system more thoroughly, but right now we just need to make sure that new runs do not have a mislabeled data mix.
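To make the goal concrete, here is a minimal sketch of what "one name per flan variant" could look like. It is not the code in this PR: the mix names, the `FLAN_BY_MIX_NAME` registry, and the `flan_source_for` helper are all hypothetical and chosen only for illustration. Only the two flan paths come from the description above.

```python
# Illustrative only: the mix names and registry layout are assumptions,
# not the identifiers chosen in this PR.
FLAN_RULEBASED = (
    "preprocessed/tulu_flan/"
    "v1-decontaminated-60M-shots_all-upweight_1-dialog_false-sep_rulebased"
)
FLAN_NEWLINE = (
    "preprocessed/tulu_flan/"
    "v2-decontaminated-60M-shots_all-upweight_1-dialog_false-sep_newline"
)

# One explicitly versioned name per flan variant (hypothetical names).
FLAN_BY_MIX_NAME = {
    "dolma17-flan-v1-rulebased": FLAN_RULEBASED,
    "dolma17-flan-v2-newline": FLAN_NEWLINE,
}


def flan_source_for(mix_name: str) -> str:
    """Return the flan path implied by a data mix name.

    An unversioned name such as "dolma17" is rejected in this sketch so that
    a run can never be labeled with an ambiguous mix.
    """
    if mix_name not in FLAN_BY_MIX_NAME:
        known = ", ".join(sorted(FLAN_BY_MIX_NAME))
        raise ValueError(
            f"Unknown or unversioned data mix {mix_name!r}; expected one of: {known}"
        )
    return FLAN_BY_MIX_NAME[mix_name]
```

With a mapping like this, the flan variant can be read off the mix name recorded for a run, which is the property the description asks for. Whether the bare dolma17 name should be rejected or kept as an alias is a design choice this sketch does not settle.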

IanMagnusson added 2 commits August 20, 2024 11:55

version dolma flan change

d8b4f10

Merge branch 'main' into version-dolma-flan-change

6959be9

IanMagnusson requested a review from dirkgr August 20, 2024 19:28

IanMagnusson and others added 4 commits August 21, 2024 09:59

figuring out undocumented style stuff

60a1e4e

Merge branch 'main' into version-dolma-flan-change

b189ca6

changelog updated

bd29493

Merge branch 'version-dolma-flan-change' of github.com:allenai/OLMo into version-dolma-flan-change

bd319be

IanMagnusson marked this pull request as ready for review August 22, 2024 20:33

soldni requested a review from epwalsh August 22, 2024 20:57

epwalsh approved these changes Aug 23, 2024

Merge branch 'main' into version-dolma-flan-change

9c82609

IanMagnusson merged commit cee1a5d into main Aug 26, 2024
11 of 12 checks passed

IanMagnusson deleted the version-dolma-flan-change branch August 26, 2024 18:44
