Reminder
I have read the README and searched the existing issues.
System Info
KTO fine-tuning with Gemma 2 base models fails with a "srcIndex < srcSelectDimSize" CUDA device-side assertion.
Other models fine-tune without issue.
llamafactory version is 0.9.1.dev0
Transformers version is 4.45.2
Please see:
```
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [1044,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Traceback (most recent call last):
  File "/home/ubuntu/factory/LLaMA-Factory/venv/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/factory/LLaMA-Factory/src/llamafactory/cli.py", line 111, in main
    run_exp()
  File "/home/ubuntu/factory/LLaMA-Factory/src/llamafactory/train/tuner.py", line 58, in run_exp
    run_kto(model_args, data_args, training_args, finetuning_args, callbacks)
  File "/home/ubuntu/factory/LLaMA-Factory/src/llamafactory/train/kto/workflow.py", line 78, in run_kto
    train_result = trainer.train(resume_from_checkpoint=training_args.resume_from_checkpoint)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/transformers/trainer.py", line 2052, in train
    return inner_training_loop(
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/transformers/trainer.py", line 2388, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/transformers/trainer.py", line 3485, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/trl/trainer/kto_trainer.py", line 1382, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs)
  File "/home/ubuntu/factory/LLaMA-Factory/src/llamafactory/train/kto/trainer.py", line 196, in get_batch_loss_metrics
    self.concatenated_forward(model, batch)
  File "/home/ubuntu/factory/LLaMA-Factory/src/llamafactory/train/kto/trainer.py", line 152, in concatenated_forward
    target_logps, target_logps_avg = self.forward(model, batch)
  File "/home/ubuntu/factory/LLaMA-Factory/src/llamafactory/train/kto/trainer.py", line 144, in forward
    logits = model(**model_inputs, return_dict=True, use_cache=False).logits.to(torch.float32)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/peft/peft_model.py", line 1577, in forward
    return self.base_model(
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 1047, in forward
    outputs = self.model(
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/factory/LLaMA-Factory/venv/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 850, in forward
    cache_position = torch.arange(
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
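For context (not part of the log): this `indexSelectLargeIndex` assertion usually fires when an input token id is greater than or equal to the size of the table being indexed, e.g. a tokenizer/model vocabulary mismatch or special tokens outside the embedding range. A quick way to see the same failure readably is to run the lookup on CPU, where it raises an `IndexError` instead of a device-side assert. This is a toy sketch with illustrative sizes, not Gemma 2's actual vocabulary:

```python
import torch

# Toy embedding table: 10 rows. Any token id >= 10 triggers the same
# out-of-range condition that the CUDA kernel asserts on.
vocab_size = 10
embedding = torch.nn.Embedding(vocab_size, 4)

input_ids = torch.tensor([[1, 5, 12]])  # 12 is out of range for vocab_size=10

# Check indices before the lookup; this is the diagnostic step.
bad = input_ids >= vocab_size
if bad.any():
    print(f"out-of-range token ids: {input_ids[bad].tolist()}")
else:
    # On GPU, an out-of-range id here would trip the device-side assert;
    # on CPU it raises a plain IndexError instead.
    embedding(input_ids)
```

Running the same check on the actual batch (comparing `input_ids.max()` against `model.get_input_embeddings().num_embeddings`) would confirm or rule out this cause.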
```
../aten/src/ATen/native/cuda/Indexing.cu:1308: indexSelectLargeIndex: block: [1045,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
(the same assertion repeats for threads [1,0,0] through [31,0,0])
  0%|          | 0/6033 [00:00<?, ?it/s]
```
Reproduction
Run a KTO fine-tuning task with a Gemma 2 base model.
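For reference, a config along these lines should reproduce it. The key names below follow LLaMA-Factory's example configs from memory and are a sketch, not the exact config used; verify them against the `examples/` directory of your checkout:

```yaml
# Hypothetical LLaMA-Factory KTO config (run with: llamafactory-cli train <this file>).
# All values are illustrative; check key names against your LLaMA-Factory version.
model_name_or_path: google/gemma-2-9b   # any Gemma 2 base checkpoint
stage: kto
do_train: true
finetuning_type: lora
dataset: kto_en_demo                    # or your own KTO-format dataset
template: gemma
output_dir: saves/gemma2-kto
per_device_train_batch_size: 1
learning_rate: 5.0e-6
num_train_epochs: 1.0
```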
Expected behavior
Training should run without errors, as it does for other models.
Others
No response