Pretrain then finetune on multiple GPUs error #1613

Open
ZeguanXiao opened this issue Jul 23, 2024 · 0 comments
Labels
bug Something isn't working


Bug description

Finetuning a TinyLlama model that I pretrained with litgpt fails on multiple GPUs: loading the checkpoint raises a state_dict key mismatch (missing and unexpected keys, see the traceback below). With only 1 GPU, finetuning works.
A related issue is #1430

Traceback (most recent call last):
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/bin/litgpt", line 8, in <module>
    sys.exit(main())
  File "/nfsshare/home/fuzhouquan/whith-box-defense/litgpt/litgpt/__main__.py", line 57, in main
    CLI(parser_data)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
  File "/nfsshare/home/fuzhouquan/whith-box-defense/litgpt/litgpt/finetune/full.py", line 105, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 845, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 930, in _wrap_and_launch
    return launcher.launch(to_run, *args, **kwargs)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/strategies/launchers/subprocess_script.py", line 107, in launch
    return function(*args, **kwargs)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 936, in _wrap_with_setup
    return to_run(*args, **kwargs)
  File "/nfsshare/home/fuzhouquan/whith-box-defense/litgpt/litgpt/finetune/full.py", line 151, in main
    load_checkpoint(fabric, state["model"], checkpoint_path)
  File "/nfsshare/home/fuzhouquan/whith-box-defense/litgpt/litgpt/utils.py", line 347, in load_checkpoint
    fabric.load_raw(checkpoint_path, model, strict=strict)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/fabric.py", line 802, in load_raw
    self._strategy.load_checkpoint(path=path, state=obj, strict=strict)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/strategies/fsdp.py", line 525, in load_checkpoint
    _load_raw_module_state_from_path(path, module=state, world_size=self.world_size, strict=strict)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/strategies/fsdp.py", line 859, in _load_raw_module_state_from_path
    _load_raw_module_state(state_dict=_lazy_load(path), module=module, world_size=world_size, strict=strict)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/lightning/fabric/strategies/fsdp.py", line 870, in _load_raw_module_state
    module.load_state_dict(state_dict, strict=strict)
  File "/nfsshare/home/fuzhouquan/miniconda3/envs/litgpt/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FullyShardedDataParallel:
Missing key(s) in state_dict: "_fsdp_wrapped_module.lm_head.weight", "_fsdp_wrapped_module.transformer.wte.weight", "_fsdp_wrapped_module.transformer.h.0._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.0._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.0._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.0._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.0._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.0._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.0._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight",
"_fsdp_wrapped_module.transformer.h.1._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.1._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.1._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.1._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.1._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.1._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.1._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.2._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.2._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.2._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.2._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.2._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.2._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.2._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.3._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.3._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.3._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.3._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.3._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", 
"_fsdp_wrapped_module.transformer.h.3._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.3._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.4._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.4._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.4._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.4._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.4._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.4._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.4._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.5._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.5._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.5._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.5._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.5._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.5._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.5._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.6._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.6._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.6._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", 
"_fsdp_wrapped_module.transformer.h.6._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.6._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.6._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.6._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.7._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.7._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.7._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.7._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.7._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.7._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.7._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.8._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.8._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.8._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.8._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.8._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.8._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.8._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.9._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", 
"_fsdp_wrapped_module.transformer.h.9._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.9._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.9._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.9._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.9._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.9._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.10._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.10._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.10._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.10._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.10._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.10._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.10._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.11._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.11._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.11._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.11._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.11._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.11._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", 
"_fsdp_wrapped_module.transformer.h.11._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.12._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.12._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.12._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.12._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.12._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.12._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.12._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.13._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.13._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.13._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.13._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.13._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.13._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.13._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.14._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.14._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.14._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.14._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", 
"_fsdp_wrapped_module.transformer.h.14._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.14._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.14._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.15._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.15._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.15._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.15._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.15._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.15._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.15._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.16._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.16._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.16._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.16._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.16._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.16._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.16._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.17._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.17._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", 
"_fsdp_wrapped_module.transformer.h.17._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.17._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.17._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.17._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.17._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.18._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.18._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.18._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.18._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.18._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.18._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.18._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.19._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.19._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.19._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.19._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.19._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.19._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.19._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", 
"_fsdp_wrapped_module.transformer.h.20._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.20._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.20._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.20._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.20._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.20._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.20._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.h.21._fsdp_wrapped_module._checkpoint_wrapped_module.norm_1.weight", "_fsdp_wrapped_module.transformer.h.21._fsdp_wrapped_module._checkpoint_wrapped_module.attn.attn.weight", "_fsdp_wrapped_module.transformer.h.21._fsdp_wrapped_module._checkpoint_wrapped_module.attn.proj.weight", "_fsdp_wrapped_module.transformer.h.21._fsdp_wrapped_module._checkpoint_wrapped_module.norm_2.weight", "_fsdp_wrapped_module.transformer.h.21._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_1.weight", "_fsdp_wrapped_module.transformer.h.21._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.fc_2.weight", "_fsdp_wrapped_module.transformer.h.21._fsdp_wrapped_module._checkpoint_wrapped_module.mlp.proj.weight", "_fsdp_wrapped_module.transformer.ln_f.weight". Unexpected key(s) in state_dict: "_fsdp_wrapped_module.model", "_fsdp_wrapped_module.optimizer", "_fsdp_wrapped_module.train_dataloader", "_fsdp_wrapped_module.iter_num", "_fsdp_wrapped_module.step_count".
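The unexpected keys at the end of the error (`model`, `optimizer`, `train_dataloader`, `iter_num`, `step_count`) suggest that the file being loaded is a full pretraining state rather than a plain weights dict, with the actual weights nested under `model`. A minimal sketch of a possible workaround, assuming that layout: pull out the nested `model` entry and save it as a weights-only checkpoint before finetuning. The helper names (`strip_wrappers`, `extract_model_state`, `convert`) are hypothetical and not part of litgpt or Lightning.

```python
from collections.abc import Mapping
from typing import Any

# Wrapper segments that appear in the parameter names of the traceback.
WRAPPER_PREFIXES = ("_fsdp_wrapped_module.", "_checkpoint_wrapped_module.")


def strip_wrappers(key: str) -> str:
    """Remove FSDP / activation-checkpoint wrapper segments from a key name."""
    for prefix in WRAPPER_PREFIXES:
        key = key.replace(prefix, "")
    return key


def extract_model_state(checkpoint: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the nested weights if `checkpoint` looks like a training state.

    Assumption: a training-state checkpoint stores the weights as a mapping
    under the key "model"; anything else is treated as raw weights already.
    """
    nested = checkpoint.get("model")
    if isinstance(nested, Mapping):
        return nested
    return checkpoint


def convert(src: str, dst: str) -> None:
    """Hypothetical helper: rewrite a training-state file as weights only."""
    import torch  # imported lazily so the helpers above work without PyTorch

    # Depending on the PyTorch version, checkpoints that hold non-tensor
    # objects may need torch.load(..., weights_only=False).
    checkpoint = torch.load(src, map_location="cpu")
    torch.save(dict(extract_model_state(checkpoint)), dst)
```

Running `strip_wrappers` over the missing keys gives the plain parameter names (for example `transformer.h.0.norm_1.weight`), which can then be compared against the keys of the extracted weights to confirm that the mismatch comes only from the extra nesting level.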

What operating system are you using?

Linux

LitGPT Version

Version: 0.4.3.dev0
ZeguanXiao added the bug label Jul 23, 2024