Saving speech separation results to disk throws IndexError: size of diarization.labels() and shape of sources.data do not match #1735

yinyao · 2024-07-01T15:48:14Z

Tested versions

3.3.0

System information

win10

Issue description

I am trying to use the code below to separate an audio file. diarization.labels() contains 3 labels, but when s = 1, sources.data[:,s] throws IndexError: index 1 is out of bounds for axis 1 with size 1. How can I fix it? I want to save the separated audio to disk.

# instantiate the pipeline
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
  "pyannote/speech-separation-ami-1.0",
  use_auth_token="HUGGINGFACE_ACCESS_TOKEN_GOES_HERE")

# run the pipeline on an audio file
diarization, sources = pipeline("audio.wav")

# dump the diarization output to disk using RTTM format
with open("audio.rttm", "w") as rttm:
    diarization.write_rttm(rttm)

# dump sources to disk as SPEAKER_XX.wav files
import scipy.io.wavfile
for s, speaker in enumerate(diarization.labels()):
    scipy.io.wavfile.write(f'{speaker}.wav', 16000, sources.data[:,s])

Minimal reproduction example (MRE)

https://github.com/yinyao/yinyao.github.io/blob/master/4.4-Chad-Zannah.wav


yinyao · 2024-07-02T01:18:37Z

I uploaded the example. When I run the code, it throws the exception below:

File "D:\Software\Python3.10\lib\site-packages\pyannote\audio\pipelines\speech_separation.py", line 631, in apply
    if non_silent[0] > asr_collar_frames:
IndexError: index 0 is out of bounds for axis 0 with size 0
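The traceback shows an empty array being indexed at position 0. As a minimal sketch of that failure mode and a guard against it, assuming `non_silent` holds the indices of non-silent frames (the name comes from the traceback; how the library actually computes it is an assumption here, and `first_active_frame` is a hypothetical helper, not part of pyannote.audio):

```python
import numpy as np


def first_active_frame(activity, threshold=0.0):
    """Return the index of the first non-silent frame, or None if all frames are silent."""
    # Indices of frames whose absolute value exceeds the threshold.
    non_silent = np.flatnonzero(np.abs(np.asarray(activity)) > threshold)
    if non_silent.size == 0:
        # Indexing non_silent[0] here would raise the IndexError seen above.
        return None
    return int(non_silent[0])


silent = first_active_frame(np.zeros(10))
active = first_active_frame(np.array([0.0, 0.0, 0.3, 0.1]))
```

A source that is silent for the whole file would produce the empty case, which is consistent with the error above but not confirmed by it.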

hbredin · 2024-07-02T14:08:47Z

@joonaskalda, any chance you can have a look, now that @yinyao has shared the audio file above?

eschmidbauer · 2024-07-17T18:39:13Z

I am also running into this issue, using the following code:

for s, speaker in enumerate(diarization.labels()):
    audio = np.float32(sources.data[:, s] / np.max(np.abs(sources.data[:, s])))  # noqa
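Independently of the IndexError, the peak normalization above divides by zero when a separated source is completely silent, which yields NaN samples. A minimal sketch of a guarded version, where `peak_normalize` is a hypothetical helper name introduced for illustration:

```python
import numpy as np


def peak_normalize(waveform):
    """Scale a waveform so its peak absolute value is 1.0.

    Empty or all-zero input is returned unchanged to avoid dividing by zero.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    if waveform.size == 0:
        return waveform
    peak = np.max(np.abs(waveform))
    if peak == 0:
        return waveform
    return waveform / peak


loud = peak_normalize(np.array([0.5, -0.25]))
quiet = peak_normalize(np.zeros(4))
```

This only protects the normalization step; it does not address a missing column in `sources.data`.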

RioLLee · 2024-08-09T07:56:38Z

Hello, I also ran into this problem, and I think it is caused by the following line:
https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/speech_separation.py#L591C9-L591C30

In the SpeechSeparation class, once the global clustering result is available, both the diarization result and the separation result call self.reconstruct(), which maps local speaker probabilities to global speaker probabilities using the mapping stored in hard_clusters. The diarization result then also goes through self.to_diarization(), which selects the k most probable speakers in each frame, where k comes from count. The separation result does not go through this step. So if the number of speakers after clustering, np.max(hard_clusters) + 1, is smaller than the maximum value in count, np.max(count.data), the diarization result contains more speakers than the separation result. See the code:
https://github.com/pyannote/pyannote-audio/blob/develop/pyannote/audio/pipelines/utils/diarization.py#L224C9-L224C50
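Until the mismatch is fixed in the pipeline, a caller-side guard can at least avoid the IndexError. The sketch below assumes, as the snippet in the issue description does, that column s of `sources.data` corresponds to the s-th entry of `diarization.labels()`. The helper `iter_available_sources` is hypothetical and not part of pyannote.audio, and the arrays are synthetic stand-ins for the pipeline output.

```python
import numpy as np


def iter_available_sources(labels, source_data):
    """Yield (label, waveform) pairs, skipping labels with no matching column."""
    num_sources = source_data.shape[1]
    for s, label in enumerate(labels):
        if s >= num_sources:
            # The diarization reports more speakers than there are separated sources.
            continue
        yield label, source_data[:, s]


# Synthetic stand-ins: 3 diarization labels but only 1 separated source,
# matching the shapes reported in this issue.
labels = ["SPEAKER_00", "SPEAKER_01", "SPEAKER_02"]
source_data = np.zeros((16000, 1), dtype=np.float32)

pairs = list(iter_available_sources(labels, source_data))
saved_labels = [label for label, _ in pairs]
```

With real pipeline output, `labels` would be `diarization.labels()` and `source_data` would be `sources.data`, and each yielded waveform could be passed to `scipy.io.wavfile.write` as in the original snippet. This skips speakers that have no separated source rather than recovering their audio.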

hbredin · 2024-08-20T07:51:17Z

Hey @joonaskalda, I believe @RioLLee correctly pin-pointed the reason why this happens.

Do you think you'll find time to have a look?

yinyao changed the title ~~speech-separation results，size of diarization.labels() and shape of sources.data is not same~~ save speech separation results to disk throw IndexError，size of diarization.labels() and shape of sources.data is not same Jul 1, 2024

hbredin added the cannot_reproduce label Jul 1, 2024
