
RuntimeError: stack expects each tensor to be equal size (when using lhotse shar data sets) #10382

Closed
riqiang-dp opened this issue Sep 6, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@riqiang-dp

Describe the bug

Hi @pzelasko, I got the following error while using lhotse shar dataset settings to train a hybrid CTC-Transducer model. What does this error suggest, and what are the entries being stacked in this case? I'm not sure whether this is a bug or a problem with my dataset.

Could this be caused by audio with two channels? I don't think my data contains any stereo audio, though.

Thank you!

  File "***lib/python3.11/site-packages/torch/utils/data/_utils/worker.py", line 308, in _wor
ker_loop                                                                                                                                                       
    data = fetcher.fetch(index)                                                                                                                                
           ^^^^^^^^^^^^^^^^^^^^                                                                                                                                
  File "/***lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 41, in fetch 
    data = next(self.dataset_iter)                                                                                                                             
           ^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                             
  File "***lib/python3.11/site-packages/lhotse/dataset/iterable_dataset.py", line 100, in __n
ext__                                                                                                                                                          
    return self.dataset[sampled]                                                                                                                               
           ~~~~~~~~~~~~^^^^^^^^^                                                                                                                               
  File "***lib/python3.11/site-packages/nemo/collections/asr/data/audio_to_text_lhotse.py", l
ine 52, in __getitem__                                                                                                                                         
    audio, audio_lens, cuts = self.load_audio(cuts)                                                                                                            
                              ^^^^^^^^^^^^^^^^^^^^^                                                                                                            
  File "***lib/python3.11/site-packages/lhotse/dataset/input_strategies.py", line 224, in __c
all__                                                                                                                                                          
    return collate_audio(                                                                                                                                      
           ^^^^^^^^^^^^^^                                                                                                                                      
  File "***lib/python3.11/site-packages/lhotse/dataset/collation.py", line 202, in collate_au
dio                                                                                                                                                            
    audios = torch.stack(audios)                                                                                                                               
             ^^^^^^^^^^^^^^^^^^^                                                                                                                               
RuntimeError: stack expects each tensor to be equal size, but got [203680] at entry 0 and [2, 203680] at entry 28 
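
The failing call is torch.stack over the per-cut waveforms in the batch, and stack requires every tensor to have the same shape, so a single two-channel waveform breaks collation. A minimal illustration of the mismatch (shapes taken from the error message, the values are dummies):

import torch

mono = torch.zeros(203680)         # 1-D waveform, shape [203680]
stereo = torch.zeros(2, 203680)    # 2-D waveform, shape [2, 203680]
torch.stack([mono, stereo])        # RuntimeError: stack expects each tensor to be equal size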

Steps/Code to reproduce bug
As described above.

Expected behavior

All audio tensors in the batch should have a single dimension (mono waveforms), shouldn't they?

Environment overview (please complete the following information)

N/A

Environment details

N/A

Additional context

N/A

@riqiang-dp riqiang-dp added the bug Something isn't working label Sep 6, 2024
@riqiang-dp
Author

Fixed: found stereo audio that was marked as single-channel.
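
In case it helps anyone else, here is a minimal sanity-check sketch, assuming a Lhotse Shar export under a directory like data/shar (the path and the CutSet.from_shar(in_dir=...) usage are illustrative, not taken from this issue): it compares each cut's declared channel count with the shape of the audio actually loaded from disk and reports anything that is not mono.

from lhotse import CutSet

# Hypothetical shar directory; point this at your own export.
cuts = CutSet.from_shar(in_dir="data/shar")

for cut in cuts:
    audio = cut.load_audio()  # usually shaped (num_channels, num_samples)
    actual = audio.shape[0] if audio.ndim > 1 else 1
    declared = cut.recording.num_channels
    if actual != 1 or actual != declared:
        print(f"{cut.id}: manifest says {declared} channel(s), file has {actual}")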
