fix: learning rate scheduler fix for bert squad example [DET-2897] #711

Merged
merged 1 commit into from
Jun 15, 2020

@davetroiano davetroiano commented Jun 15, 2020

Description

Test Plan

Commentary (optional)

@cla-bot cla-bot bot added the cla-signed label Jun 15, 2020
max_seq_length: 384
doc_stride: 128
max_query_length: 64
n_best_size: 20
max_answer_length: 30
null_score_diff_threshold: 0.0
max_grad_norm: 1.0
num_training_steps: 15000 # This is the number of optimizer steps. Set it

This is the main fix to achieve parity. If this value is too low, the LR decays to zero pretty quickly, and that was causing learning to basically stop (validation F1 flatlined at 60-something).
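To illustrate why a too-small `num_training_steps` stalls training, here is a minimal sketch of a linear warmup/decay multiplier. It assumes the example uses a linear schedule of the kind found in the huggingface repo; the function name and the step counts below are hypothetical and chosen only for illustration.

```python
def linear_lr_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step.

    Assumed shape: linear ramp up during warmup, then linear decay that
    reaches zero at num_training_steps and stays there.
    """
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


# Hypothetical comparison: the same optimizer step under two configured totals.
step = 5000
too_low = linear_lr_multiplier(step, num_warmup_steps=0, num_training_steps=1000)
fixed = linear_lr_multiplier(step, num_warmup_steps=0, num_training_steps=15000)
print(too_low, fixed)
```

Under these assumptions, once the step count passes the configured total the multiplier is clamped at zero, so every later update is a no-op regardless of the base learning rate.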

In the huggingface repo, they set this similarly, to the number of batches times the number of epochs (assuming gradient accumulation is 1); see here and here.
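As a rough sketch of that calculation: the helper name and the dataset figures below are hypothetical, picked only so the arithmetic is easy to follow, and are not taken from this PR.

```python
def total_optimizer_steps(batches_per_epoch, num_epochs, gradient_accumulation_steps=1):
    """Number of optimizer steps over a full training run.

    Each optimizer step consumes gradient_accumulation_steps batches, so the
    per-epoch count is divided (with floor) before multiplying by epochs.
    """
    return (batches_per_epoch // gradient_accumulation_steps) * num_epochs


# Hypothetical figures: 7500 batches per epoch for 2 epochs.
print(total_optimizer_steps(7500, 2))
# Same run with gradient accumulation of 4 batches per optimizer step.
print(total_optimizer_steps(7500, 2, gradient_accumulation_steps=4))
```

The value passed to the scheduler should cover the whole run; if the batch size, epoch count, or gradient accumulation changes, it needs to be recomputed.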

With this fix we get closer to the expected validation F1: 89 after 150 steps, compared to 88.52.

@davetroiano

output from a const run:

image

@davetroiano davetroiano merged commit 60ced66 into determined-ai:master Jun 15, 2020
tayritenour pushed a commit to tayritenour/determined that referenced this pull request Apr 25, 2023
If the Slrum hpcPartitionDetails contains accelerator type set it for the resource pool.
@dannysauer dannysauer added this to the 0.12.10 milestone Feb 6, 2024