I'm attempting to evaluate an OpenLlama model on a test dataset. Single-element inference is considerably slow, so I'm trying to use batching for efficiency. However, during batch inference, I'm encountering a CUDA error.
Error Message
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [277,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [277,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1146: indexSelectLargeIndex: block: [277,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
It was because I have some out-of-range token ids (the padding id 32000 falls outside the embedding table), so I have to remove them:
```python
def truncate_batch(batch, max_length=None):
    """
    Remove the trailing padding columns (pad token id 32000), which fall
    outside the embedding table and trigger
    Assertion `srcIndex < srcSelectDimSize` failed.
    """
    # Real (unpadded) length of every sequence in the batch
    lengths = batch['attention_mask'].sum(dim=1)
    # If max_length is not provided, truncate to the shortest sequence in the
    # batch so that no padding column survives (assumes right padding; note
    # this also drops the tail tokens of longer sequences)
    if max_length is None:
        max_length = lengths.min().item()
    # Slice both tensors to the same width
    batch['input_ids'] = batch['input_ids'][:, :max_length]
    batch['attention_mask'] = batch['attention_mask'][:, :max_length]
    return batch
```
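To confirm that out-of-range ids are really the trigger, a quick check along these lines can run before the batch is moved to the GPU (a sketch, assuming `model` and a tokenized `batch` as elsewhere in this report):

```python
# Any id >= num_embeddings makes the embedding lookup read past the end of
# the table, which is exactly the `srcIndex < srcSelectDimSize` assertion.
vocab_size = model.get_input_embeddings().num_embeddings
bad = batch['input_ids'] >= vocab_size
if bad.any():
    print('out-of-range ids:', batch['input_ids'][bad].unique())
```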
Code for Batch Inference
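The loop is roughly the sketch below (the batch size, the `prompts` list, and the plain forward pass are assumptions, not the exact code from this report):

```python
import torch

tokenizer.pad_token_id = 32000  # the pad id referenced in truncate_batch above

model.eval()
batch_size = 8  # arbitrary choice
for i in range(0, len(prompts), batch_size):
    # Tokenize a slice of the test prompts with padding to a common width
    batch = tokenizer(prompts[i:i + batch_size], return_tensors='pt', padding=True)
    # Drop the padding columns before they reach the embedding lookup
    batch = truncate_batch(batch)
    batch = {k: v.to(model.device) for k, v in batch.items()}
    with torch.no_grad():
        logits = model(**batch).logits
```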
Additional Information
The base model is "openlm-research/open_llama_7b_v2", but I fine-tuned it using PEFT, so I load the model using:
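A minimal sketch of that loading step, assuming a LoRA adapter saved to a local directory (the path here is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    'openlm-research/open_llama_7b_v2',
    torch_dtype=torch.float16,
    device_map='auto',
)
# './open_llama_7b_v2_lora' stands in for the actual adapter directory
model = PeftModel.from_pretrained(base, './open_llama_7b_v2_lora')
tokenizer = AutoTokenizer.from_pretrained('openlm-research/open_llama_7b_v2')
```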
Any assistance on this issue would be greatly appreciated. Thank you in advance!