
Saving speech separation results to disk throws IndexError: size of diarization.labels() and shape of sources.data do not match #1735

Open
yinyao opened this issue Jul 1, 2024 · 5 comments

Comments

@yinyao

yinyao commented Jul 1, 2024

Tested versions

3.3.0

System information

win10

Issue description

I am trying to use the code below to separate an audio file. diarization.labels() returns 3 labels, but when s = 1, sources.data[:, s] throws IndexError: index 1 is out of bounds for axis 1 with size 1. How can I fix it? I want to save the separated audio to disk.

# instantiate the pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
  "pyannote/speech-separation-ami-1.0",
  use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# run the pipeline on an audio file
diarization, sources = pipeline("audio.wav")

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)

# dump sources to disk as SPEAKER_XX.wav files
import scipy.io.wavfile
for s, speaker in enumerate(diarization.labels()):
    scipy.io.wavfile.write(f'{speaker}.wav', 16000, sources.data[:,s])
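Until the underlying mismatch is fixed, one defensive way to write the files is to pair labels with source columns only where both exist, and to report the labels left over. This is a minimal sketch, not a fix in pyannote itself. It assumes `sources.data` is a NumPy array shaped `(num_samples, num_sources)`, as the indexing `sources.data[:, s]` above implies, and that the i-th label corresponds to the i-th column, as the original loop assumes. It uses the standard-library `wave` module (16-bit PCM) instead of `scipy.io.wavfile`. The helper name `save_separated_sources` is hypothetical.

```python
import wave
from pathlib import Path

import numpy as np


def save_separated_sources(labels, data, sample_rate=16000, out_dir="."):
    """Write one 16-bit PCM WAV per label that has a matching source column.

    Assumes `data` has shape (num_samples, num_sources) and that the i-th
    label corresponds to the i-th column (hypothetical helper).
    Returns (written_paths, skipped_labels).
    """
    data = np.asarray(data)
    if data.ndim == 1:  # treat a 1-D array as a single source
        data = data[:, np.newaxis]
    num_sources = data.shape[1]
    written, skipped = [], []
    for s, speaker in enumerate(labels):
        if s >= num_sources:
            # No separated source exists for this diarization label.
            skipped.append(speaker)
            continue
        # Clip float samples to [-1, 1] and convert to little-endian int16 PCM.
        pcm = (np.clip(data[:, s], -1.0, 1.0) * 32767).astype("<i2")
        path = Path(out_dir) / f"{speaker}.wav"
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)  # 2 bytes per sample = 16 bits
            wav.setframerate(sample_rate)
            wav.writeframes(pcm.tobytes())
        written.append(path)
    return written, skipped
```

With the outputs from the snippet above, this would be called as `written, skipped = save_separated_sources(diarization.labels(), sources.data)`. A non-empty `skipped` list indicates the mismatch described in this issue: those labels have no separated audio to save.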

Minimal reproduction example (MRE)

https://github.com/yinyao/yinyao.github.io/blob/master/4.4-Chad-Zannah.wav

@yinyao yinyao changed the title speech-separation results,size of diarization.labels() and shape of sources.data is not same save speech separation results to disk throw IndexError,size of diarization.labels() and shape of sources.data is not same Jul 1, 2024
@yinyao
Author

yinyao commented Jul 2, 2024

I uploaded the example. When I run the code on it, it throws the exception below:

File "D:\Software\Python3.10\lib\site-packages\pyannote\audio\pipelines\speech_separation.py", line 631, in apply
    if non_silent[0] > asr_collar_frames:
IndexError: index 0 is out of bounds for axis 0 with size 0
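This traceback is the usual symptom of taking the first element of an empty array. If `non_silent` holds the indices of a speaker's active frames and that speaker is never active, `non_silent[0]` raises exactly this `IndexError`. The sketch below reproduces the failure and shows a size check that avoids it. How `non_silent` is actually computed in `speech_separation.py` is an assumption here (modeled with `np.flatnonzero`), and `leading_silence_exceeds` is a hypothetical helper, not pyannote code.

```python
import numpy as np


def leading_silence_exceeds(activity, collar_frames):
    """Return True/False if the first active frame comes after `collar_frames`,
    or None when the track has no active frame at all (hypothetical helper)."""
    # Indices of frames where the speaker is active (assumed model of `non_silent`).
    non_silent = np.flatnonzero(activity)
    if non_silent.size == 0:
        # Without this guard, non_silent[0] raises the IndexError from the traceback.
        return None
    return bool(non_silent[0] > collar_frames)


# Reproduce the failure on a silent track.
try:
    np.flatnonzero(np.zeros(5))[0]
except IndexError as err:
    print(err)  # index 0 is out of bounds for axis 0 with size 0

print(leading_silence_exceeds(np.array([0, 0, 0, 1, 1]), 2))  # True
print(leading_silence_exceeds(np.zeros(5), 2))  # None
```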

@hbredin
Member

hbredin commented Jul 2, 2024

@joonaskalda, any chance you can have a look, now that @yinyao has shared the audio file above?

@eschmidbauer

I am also running into this issue, using the following code:

import numpy as np

for s, speaker in enumerate(diarization.labels()):
    audio = np.float32(sources.data[:, s] / np.max(np.abs(sources.data[:, s])))  # noqa
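Besides the out-of-range index, the peak normalization in this snippet divides by `np.max(np.abs(...))`, which is zero for a silent source column and would then yield NaN samples. Below is a guarded variant. It uses the same assumption that `sources.data` is shaped `(num_samples, num_sources)`. `normalize_column` is a hypothetical name.

```python
import numpy as np


def normalize_column(data, s):
    """Peak-normalize column `s` of a (num_samples, num_sources) array to float32.

    Returns None if column `s` does not exist, and returns a silent
    (all-zero or empty) column unscaled instead of dividing by zero.
    """
    data = np.asarray(data)
    if data.ndim != 2 or s >= data.shape[1]:
        return None  # no separated source for this index
    column = data[:, s].astype(np.float32)
    peak = np.max(np.abs(column)) if column.size else 0.0
    if peak == 0:
        return column  # silent track: nothing to scale
    return column / peak
```

In the loop above, `audio = normalize_column(sources.data, s)` would then return `None` for the labels that have no matching source column, rather than raising.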

@RioLLee

RioLLee commented Aug 9, 2024

Hello, I also encountered this problem, and I think it may be caused by the following:
https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speech_separation.py#L591C9-L591C30

In the SpeechSeparation class, once the global clustering result is obtained, both the diarization result and the separation result call self.reconstruct(), which maps local speaker probabilities to global speaker probabilities using the mapping stored in hard_clusters. After that, the diarization result also goes through self.to_diarization(), which selects the k most probable speakers in each frame, with k taken from count. The separation result has no such step. So if the number of speakers after clustering, np.max(hard_clusters) + 1, is less than the maximum value in count, np.max(count.data), the diarization result will contain more speakers than the separation result. See the code:
https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/utils/diarization.py#L224C9-L224C50
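The mechanism described above can be reproduced with a toy model. The sketch below is a simplified stand-in and does not reproduce pyannote's code. It starts from per-frame activations already mapped to global clusters (the role of `reconstruct()`). It then assumes, consistent with the explanation above, that a `to_diarization()`-style step widens the activations to `max(count)` columns and keeps the top-k speakers per frame, while the separation output keeps one column per cluster. The function and variable names are hypothetical.

```python
import numpy as np


def toy_speaker_counts(activations, count):
    """Return (num_separated_sources, num_diarization_labels) for a toy example.

    activations: (num_frames, num_clusters) global speaker scores,
                 one column per cluster, as after reconstruction.
    count:       (num_frames,) estimated number of simultaneous speakers.
    """
    num_frames, num_clusters = activations.shape
    max_speakers = int(np.max(count))
    # Separation output: one source per cluster, no further selection.
    num_sources = num_clusters
    # Diarization-style step: pad with empty speakers up to max(count) ...
    if num_clusters < max_speakers:
        activations = np.pad(activations, ((0, 0), (0, max_speakers - num_clusters)))
    # ... then mark the top-k speakers active in each frame, with k = count[t].
    binary = np.zeros_like(activations)
    for t in range(num_frames):
        top_k = np.argsort(-activations[t])[: int(count[t])]
        binary[t, top_k] = 1.0
    # A label appears in the output only if it is active in at least one frame.
    num_labels = int(np.sum(binary.any(axis=0)))
    return num_sources, num_labels


# Two clusters survive clustering, but one frame is estimated to have 3 speakers.
activations = np.array([[0.9, 0.1], [0.8, 0.7], [0.6, 0.5]])
count = np.array([1, 2, 3])
print(toy_speaker_counts(activations, count))  # (2, 3)
```

In this toy case, the third diarization label comes only from the padded column, so there is no separated source at index 2. This matches the IndexError reported in this issue.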

@hbredin
Member

hbredin commented Aug 20, 2024

Hey @joonaskalda, I believe @RioLLee correctly pinpointed the reason why this happens.

Do you think you'll find time to have a look?
