Changes for 65B and 70B runs #414
Conversation
The GQA looks good to me
olmo/config.py
Outdated
@@ -243,6 +242,14 @@ class ModelConfig(BaseConfig):
    The number of self-attention heads.
    """

    n_kv_heads: Optional[int] = None
    """
    The number of heads to use for keys and values.
Suggested change:
-    The number of heads to use for keys and values.
+    The number of heads to use for keys and values. Defaults to `n_heads`.
Done, I just can't click "commit" here for some reason.
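(For context on the GQA setting discussed above: a minimal sketch of how an `n_kv_heads` smaller than `n_heads` is typically used in grouped-query attention, where each group of query heads shares one key/value head. The `repeat_kv` helper, shapes, and head counts below are illustrative assumptions, not code from this PR.)

import torch

def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Illustrative helper (assumed, not from this PR): expand key/value heads
    # so that each group of n_rep query heads shares one KV head.
    # x: (batch, n_kv_heads, seq_len, head_dim)
    # returns: (batch, n_kv_heads * n_rep, seq_len, head_dim)
    if n_rep == 1:
        return x
    bsz, n_kv_heads, seq_len, head_dim = x.shape
    return (
        x[:, :, None, :, :]
        .expand(bsz, n_kv_heads, n_rep, seq_len, head_dim)
        .reshape(bsz, n_kv_heads * n_rep, seq_len, head_dim)
    )

# e.g. n_heads=64, n_kv_heads=8: every 8 query heads attend to the same KV head.
k = torch.randn(1, 8, 128, 64)
assert repeat_kv(k, n_rep=64 // 8).shape == (1, 64, 128, 64)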
olmo/config.py
Outdated
if hasattr(new_config, "optimizer"):
    new_config.optimizer = OptimizerConfig.update_legacy_settings(new_config.optimizer)
I don't think we need this here
I am learning that update_legacy_settings doesn't work anyways with settings you specify on the command line.
Here's an alternative approach that doesn't involve implementing update_legacy_settings().
olmo/config.py
Outdated
@@ -309,8 +317,7 @@ class ModelConfig(BaseConfig):

    multi_query_attention: bool = False
Make this Optional[bool], defaulting to None.
olmo/config.py
Outdated
if self.n_kv_heads is None:
    self.n_kv_heads = self.n_heads
Then here we could do this:
if self.multi_query_attention:
    self.n_kv_heads = 1
elif self.n_kv_heads is None:
    self.n_kv_heads = self.n_heads
olmo/config.py
Outdated
        self.n_kv_heads = self.n_heads

    @classmethod
    def update_legacy_settings(cls, config: D) -> D:
Then this won't be needed.
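(Putting the three suggestions in this thread together, a rough sketch of how the config could resolve the legacy flag up front, with no update_legacy_settings() override needed. Whether the real ModelConfig is a dataclass with a __post_init__ hook is an assumption made here for illustration; the field names mirror the diff above.)

from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelConfig:  # abbreviated stand-in for olmo.config.ModelConfig
    n_heads: int = 16
    n_kv_heads: Optional[int] = None
    multi_query_attention: Optional[bool] = None  # legacy flag, now Optional per the suggestion

    def __post_init__(self):
        # Resolve the legacy flag eagerly, so old configs keep working
        # without a separate update_legacy_settings() pass.
        if self.multi_query_attention:
            self.n_kv_heads = 1
        elif self.n_kv_heads is None:
            self.n_kv_heads = self.n_heads

assert ModelConfig(n_heads=64, multi_query_attention=True).n_kv_heads == 1
assert ModelConfig(n_heads=64).n_kv_heads == 64
assert ModelConfig(n_heads=64, n_kv_heads=8).n_kv_heads == 8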
…o mitchish65-2-gqa
Updates to 70B config and checkpointing
GQA into Mitchich65
I think there is some leftover debug commenting? Other than that, looks good.
# Save metadata.
self._save_metadata(checkpoint_dir, upload_to=upload_to)

# Save config.
self._save_config(checkpoint_dir, upload_to=upload_to)
Maybe add a comment explaining why this is now last?
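(One way such a comment could read is sketched below; the stated rationale, that the config file's presence marks a complete checkpoint, is an assumption and is not confirmed anywhere in this thread.)

# Save metadata.
self._save_metadata(checkpoint_dir, upload_to=upload_to)

# Save config last: if it is present, the rest of the checkpoint directory can be
# assumed to have finished writing. (Assumed rationale, not confirmed in this PR.)
self._save_config(checkpoint_dir, upload_to=upload_to)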
olmo/model.py
Outdated
@@ -245,7 +260,7 @@ def __init__(self, config: ModelConfig, cache: BufferCache):
        self.config = config
        self.__cache = cache
        # Warm up cache.
-       self.get_rotary_embedding(config.max_sequence_length, _non_meta_init_device(config))
+       # self.get_rotary_embedding(config.max_sequence_length, _non_meta_init_device(config))
We don't need this anymore?
Reverted.
scripts/beaker/mitchish70.sh
Outdated
SEED=3423
INIT=fan_in
RUN_NAME="fan-in-init-${SEED}"
ARGS="--run_name=${RUN_NAME} --data.seed=6198 --seed=${SEED} --model.init_fn=${INIT} --model.init_std=0.006 --model.init_cutoff_factor=3 --device_train_microbatch_size=4 --model.flash_attention=true --fused_loss=true --evaluators=[] --stop_at=500 --wandb.group=mitchish70-ablate-init --save_interval_ephemeral=100"
Looks like this isn't the final config anyways.
Cleaned up in 92d2a08.
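(On the earlier point that update_legacy_settings() doesn't see command-line settings: one plausible reading is that dotted overrides like those in ARGS are merged into the config after the YAML file has been loaded, so a rewrite that only runs on the file contents never sees them. A rough sketch of that merge with OmegaConf follows, assuming that is the mechanism used here; the values are illustrative.)

from omegaconf import OmegaConf

# Base config, as it might be loaded from the training YAML (illustrative values).
base = OmegaConf.create({"model": {"n_heads": 64, "n_kv_heads": None, "flash_attention": False}})

# Dotted command-line overrides, in the style of the ARGS string above.
overrides = OmegaConf.from_dotlist(["model.flash_attention=true", "model.n_kv_heads=8"])

# The overrides only take effect at merge time, after any legacy-settings rewrite of `base`.
cfg = OmegaConf.merge(base, overrides)
assert cfg.model.flash_attention is True and cfg.model.n_kv_heads == 8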
- Update cluster
- Up-sample wikipedia
No description provided.