How to continue pretraining from the released checkpoint? #11

Open
saroyehun opened this issue Feb 16, 2021 · 3 comments

Comments

saroyehun commented Feb 16, 2021

Hello,
Thank you for releasing the code for pretraining MPNet!
I am trying to continue pretraining on the language modeling task with a custom dataset, starting from the released checkpoint via the --restore-file argument. However, I am not able to load the checkpoint successfully; it fails with the following error:

File "MPNet/pretraining/fairseq/checkpoint_utils.py", line 307, in _upgrade_state_dict
    registry.set_defaults(state['args'], tasks.TASK_REGISTRY[state['args'].task])
KeyError: 'mixed_position_lm'

In case it helps, here are the details of the training command:

TOTAL_UPDATES=125000    # Total number of training updates (assumed value)
WARMUP_UPDATES=50000    # Warmup the learning rate over this many updates
PEAK_LR=0.0005          # Peak learning rate, adjust as needed
TOKENS_PER_SAMPLE=512   # Max sequence length
MAX_POSITIONS=512       # Num. positional embeddings (usually same as above)
MAX_SENTENCES=35        # Number of sequences per batch (batch size)
UPDATE_FREQ=16          # Increase the batch size 16x

DATA_DIR=data-bin

fairseq-train --fp16 $DATA_DIR \
  --task masked_permutation_lm --criterion masked_permutation_cross_entropy \
  --arch mpnet_base --sample-break-mode none --tokens-per-sample $TOKENS_PER_SAMPLE \
  --optimizer adam --adam-betas '(0.9,0.98)' --adam-eps 1e-6 --clip-norm 0.0 \
  --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES \
  --total-num-update $TOTAL_UPDATES --dropout 0.1 --attention-dropout 0.1 --weight-decay 0.01 \
  --max-sentences $MAX_SENTENCES --update-freq $UPDATE_FREQ --skip-invalid-size-inputs-valid-test \
  --max-update $TOTAL_UPDATES --log-format simple --log-interval 1 --input-mode 'mpnet' \
  --restore-file mpnet.base/mpnet.pt --save-interval-updates 10 --ddp-backend no_c10d
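
For reference, the task and criterion names recorded in the released checkpoint can be inspected directly; here is a minimal sketch, assuming the standard fairseq checkpoint layout where the saved state carries an 'args' namespace:

import torch

# Load the released checkpoint on CPU and show which task/criterion names
# fairseq stored at save time (the path matches --restore-file above).
state = torch.load('mpnet.base/mpnet.pt', map_location='cpu')
print(state['args'].task)        # prints 'mixed_position_lm' for this checkpoint
print(state['args'].criterion)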

I would appreciate any insights on how to resolve this error. Thank you!


GeorgeSanchezTR commented Aug 30, 2023

@ast123 Where did you get mpnet.base/mpnet.pt? The link in the README.md does not work:
https://modelrelease.blob.core.windows.net/pre-training/MPNet/mpnet.base.tar.gz



GeorgeSanchezTR commented Sep 15, 2023

@ast123 I got hold of the checkpoint file. Give this a try if you are still trying to pretrain.
The error occurs because the checkpoint state stores the name of the registered task, which in this checkpoint is 'mixed_position_lm'.
As a quick fix, I popped both the task and criterion from the state['args'] loaded from this checkpoint, just before line 307 (the line that throws the error), so that a different task and criterion can be passed in through the command-line arguments:

# Remove the stale names so they can be overridden from the command line
state['args'].__dict__.pop('task')
state['args'].__dict__.pop('criterion')
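
In context, the change sits just above the registry lookup in _upgrade_state_dict. A rough sketch follows (adapted, not the exact upstream code; the lookup is guarded so it is skipped once the stale name has been dropped):

# MPNet/pretraining/fairseq/checkpoint_utils.py, inside _upgrade_state_dict(state)

# Drop the task/criterion names recorded when the released checkpoint was
# saved ('mixed_position_lm'), so the ones passed on the fairseq-train
# command line are used instead.
saved_task = state['args'].__dict__.pop('task', None)
state['args'].__dict__.pop('criterion', None)

# Around the original line 307: only resolve the saved task if it is still registered.
if saved_task in tasks.TASK_REGISTRY:
    registry.set_defaults(state['args'], tasks.TASK_REGISTRY[saved_task])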


gyhhe commented Jun 8, 2024

@GeorgeSanchezTR Did you use this checkpoint for pre-training? Was it successful?
