What is the purpose of the Resegmentation and AdaptiveVoiceActivityDetection Pipeline? #1700

asusdisciple · 2024-04-30T14:24:49Z

Tested versions

Reproduced in 3.1.0

System information

Ubuntu 22.04, Lenovo P1 Gen 5 Workstation A4500

Issue description

I wanted to improve my segmentation with pyannote, since most of my segments are very long when the same person keeps talking. Since min_duration_off is already set to 0.0, I looked through the code and found the classes Resegmentation and AdaptiveVoiceActivityDetection.

I thought that applying one of those pipelines would give me shorter segments, but the code does not seem to work. Is this legacy code, or should it work?
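
As a stopgap that is independent of both pipelines, long segments could also be cut in post-processing. The following is only a minimal sketch under stated assumptions: segments are plain (start, end) tuples in seconds, and `split_long_segments` and `max_duration` are hypothetical names that are not part of pyannote. It cuts blindly at fixed intervals rather than at pauses, so it is not a substitute for a better segmentation.

```python
def split_long_segments(segments, max_duration):
    """Cut every (start, end) segment into pieces of at most max_duration seconds.

    Hypothetical helper, not part of pyannote: it splits at fixed intervals
    and does not look at the audio or at pauses.
    """
    if max_duration <= 0:
        raise ValueError("max_duration must be positive")

    pieces = []
    for start, end in segments:
        cursor = start
        # Emit full-length pieces while more than max_duration remains.
        while end - cursor > max_duration:
            pieces.append((cursor, cursor + max_duration))
            cursor += max_duration
        # Emit the remainder, if any.
        if end > cursor:
            pieces.append((cursor, end))
    return pieces


if __name__ == "__main__":
    print(split_long_segments([(0.0, 25.0), (30.0, 34.0)], 10.0))
```

Segments that are already shorter than `max_duration` pass through unchanged.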

For AdaptiveVoiceActivityDetection I get the error:

  File "/home/.../PycharmProjects/..../venv/lib/python3.10/site-packages/pyannote/audio/pipelines/voice_activity_detection.py", line 313, in apply
    vad_pipeline = VoiceActivityDetection("vad").instantiate(
  File "/home/.../PycharmProjects/..../venv/lib/python3.10/site-packages/pyannote/audio/pipelines/voice_activity_detection.py", line 123, in __init__
    model = get_model(segmentation, use_auth_token=use_auth_token)
  File "/home/.../PycharmProjects/.../venv/lib/python3.10/site-packages/pyannote/audio/pipelines/utils/getter.py", line 89, in get_model
    model.eval()
AttributeError: 'NoneType' object has no attribute 'eval'

Could not download 'vad' model.

I initialize the model like this:

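from pyannote.audio.pipelines.voice_activity_detection import AdaptiveVoiceActivityDetection

self.vad = AdaptiveVoiceActivityDetection(MODEL_PATH_SEG)
self.vad.instantiate({"num_epochs": 1, "batch_size": settings.BATCH_SIZE_SEG, "learning_rate": 0.1})
self.vad.to(self.device)
# call the pipeline on the audio mapping
va = self.vad(tensor_audio_mapping)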

It seems to me that the instantiation is hardcoded (line 313) and that the model key "vad" cannot be found. Is that correct?

For Resegmentation I get the error below. I cannot see what is wrong with the way I instantiate it, since the same approach works for SpeakerDiarization, for example:

  File "/home/.../PycharmProjects/.../pyannote_service.py", line 125, in diarize
    reseg = self.resegmentation_model(file=tensor_audio_mapping, diarization=diarization)
  File "/home/.../PycharmProjects/.../venv/lib/python3.10/site-packages/pyannote/audio/core/pipeline.py", line 304, in __call__
    raise RuntimeError(
RuntimeError: A pipeline must be instantiated with `pipeline.instantiate(parameters)` before it can be applied.

I initialize the model like this:

self.resegmentation_model = Resegmentation(segmentation=MODEL_PATH_SEG)
self.resegmentation_model.instantiate(co["params"])
self.resegmentation_model.to(self.device)
# diarization is the diarization object produced by a speaker_d pipeline from pyannote
reseg = self.resegmentation_model(file=tensor_audio_mapping, diarization=diarization)

where co is loaded from a YAML file that looks like this:

version: 3.1.0
pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    embedding: models/pyannote/pyannote_embedding.bin
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: models/pyannote/pyannote_segmentation.bin
    segmentation_batch_size: 32

params:
    min_duration_off: 0.0
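
For reference, here is a minimal sketch of what `co["params"]` resolves to with this file. It assumes the YAML is loaded into plain nested dicts, which are written out by hand here so the snippet runs without a YAML library. The variable names `hyper_params` and `constructor_args` are illustrative only.

```python
# Nested dict mirroring the YAML above, as a typical YAML loader would
# return it (assumption: plain dicts, no custom loader).
co = {
    "version": "3.1.0",
    "pipeline": {
        "name": "pyannote.audio.pipelines.SpeakerDiarization",
        "params": {
            "clustering": "AgglomerativeClustering",
            "embedding": "models/pyannote/pyannote_embedding.bin",
            "embedding_batch_size": 32,
            "embedding_exclude_overlap": True,
            "segmentation": "models/pyannote/pyannote_segmentation.bin",
            "segmentation_batch_size": 32,
        },
    },
    "params": {"min_duration_off": 0.0},
}

# The top-level "params" block is what the snippet passes to instantiate().
hyper_params = co["params"]

# The model paths and batch sizes live one level deeper, under "pipeline".
constructor_args = co["pipeline"]["params"]

print(hyper_params)
print(sorted(constructor_args))
```

So `instantiate` only receives `min_duration_off`. Whether Resegmentation expects further hyperparameters is not confirmed here, but a partially specified set would be consistent with the "must be instantiated" error above.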

I would appreciate any insights I might have missed, or just a short clarification if this code is not intended for use.

Minimal reproduction example (MRE)

Can be found in my example above


asusdisciple · 2024-05-02T14:59:02Z

Okay, for the Resegmentation pipeline it seems that it does not work with pyannote/segmentation-3.0. It does work with pyannote/segmentation, which unfortunately gives me a few warnings:

Just wanted to let you know.

Best regards
