Repetition in recordings #72

gudrob · 2024-05-10T11:08:08Z

So far everything has been working out of the box, so thank you for this great plugin!

Issue:
I'm having problems with repetition. Recognition is good, but the same sentence is repeated over and over.

What I have tried:
From what I can see in the whisper documentation, the entropy threshold should fix this.
But there seems to be no effect when I change the value.

entropy 2.8, default

entropy 5

entropy 0

If anything, higher values make recognition less precise.
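
For context on what an entropy threshold is meant to catch, here is a minimal sketch. It assumes the check is a Shannon entropy over the most recent token ids of a decoded segment, and that a segment is rejected and retried when the value falls below the threshold. The names `token_entropy` and `looks_repetitive` are hypothetical and are not part of the plugin or of whisper.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <vector>

// Shannon entropy (natural log) of the token-id distribution in `tokens`.
// A window that repeats a few tokens scores low; a varied window scores high.
double token_entropy(const std::vector<int>& tokens) {
    assert(!tokens.empty());
    std::map<int, int> counts;
    for (int id : tokens) {
        ++counts[id];
    }
    double entropy = 0.0;
    for (const auto& entry : counts) {
        const double p = static_cast<double>(entry.second) / tokens.size();
        entropy -= p * std::log(p);
    }
    return entropy;
}

// Hypothetical helper (assumption): true when the window scores below the
// threshold, i.e. when a decoder using this heuristic would treat it as
// repetitive and retry.
bool looks_repetitive(const std::vector<int>& tokens, double entropy_threshold) {
    return token_entropy(tokens) < entropy_threshold;
}
```

Under these assumptions, a window of 32 identical tokens scores 0 and a window of 32 distinct tokens scores ln(32), about 3.47. A threshold of 0 would then never flag anything, and a threshold of 5 would flag every 32-token window, which may be why extreme values do not help.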

Is this related to the other problem regarding Voice Activity Detection (VAD)?
I have tried changing the VAD threshold as well, but that seems to do nothing.

I have also tried using a larger whisper model but that yields the same results, only slower.

gudrob · 2024-05-11T09:29:55Z

So I replaced the Capture Effect of the audio bus with a Record Effect. I used linear interpolation to resample the data I got from GetRecording() from 48000 Hz to 16000 Hz. This works with an astounding accuracy of ~95% (I am not a native English speaker). No repetition, and it even recognizes names correctly.

While this approach works for me, I just couldn't get the sample capture implementation to work.
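
The linear-interpolation resampling described above can be sketched roughly as follows. This is not the code from the linked project; `resample_linear` is a hypothetical name, and the sketch assumes mono float samples and applies no low-pass filter before downsampling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Resample mono float samples from src_rate to dst_rate by linear
// interpolation between the two nearest source samples.
// Assumption: no anti-aliasing filter, so content above dst_rate / 2 can alias.
std::vector<float> resample_linear(const std::vector<float>& input,
                                   int src_rate, int dst_rate) {
    assert(src_rate > 0 && dst_rate > 0);
    if (input.empty()) {
        return {};
    }
    // Distance between output samples, measured in source samples.
    const double step = static_cast<double>(src_rate) / dst_rate;
    const std::size_t out_len = input.size() *
            static_cast<std::size_t>(dst_rate) / static_cast<std::size_t>(src_rate);
    std::vector<float> output;
    output.reserve(out_len);
    for (std::size_t i = 0; i < out_len; ++i) {
        const double pos = i * step;
        const std::size_t left = static_cast<std::size_t>(pos);
        // Clamp the right neighbour at the end of the buffer.
        const std::size_t right = left + 1 < input.size() ? left + 1 : left;
        const float frac = static_cast<float>(pos - left);
        output.push_back(input[left] + (input[right] - input[left]) * frac);
    }
    return output;
}
```

With an integer ratio such as 48000 to 16000, every output position in this sketch lands exactly on a source sample, so it reduces to keeping every third sample; the interpolation only matters for non-integer ratios.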

Ughuuu · 2024-05-13T11:27:39Z

Interesting, this sounds like it could be an issue with how I am doing the interpolation. This plugin currently uses libsamplerate for that, as seen here: https://github.com/V-Sekai/godot-whisper/blob/main/src/speech_to_text.cpp#L32

The resample function also exposes an InterpolatorType:

	enum InterpolatorType {
		SRC_SINC_BEST_QUALITY = 0,
		SRC_SINC_MEDIUM_QUALITY = 1,
		SRC_SINC_FASTEST = 2,
		SRC_ZERO_ORDER_HOLD = 3,
		SRC_LINEAR = 4,
	};

By default it's set to FASTEST (godot-whisper/bin/addons/godot_whisper/capture_stream_to_text.gd, line 66 in c3682d7):

	var resampled = resample(_accumulated_frames, SpeechToText.SRC_SINC_FASTEST)

You could also try setting it to BEST_QUALITY to see if there is a change. If not, the approach you took is pretty good as well; if you want, you can make a new scene with it and open a PR for others to try (if not, I might, if I get some time).

gudrob · 2024-05-14T09:56:25Z

@Ughuuu I have implemented this in C# here: https://github.com/gudatr/godot-ai-rpg/blob/main/scripts/SpeechRecognizer.cs but it differs greatly from the examples of the project. I tried writing the code in GDScript, but I must admit that I am too inexperienced with it, especially if the implementation needs to be close to the samples, and I currently have no motivation to learn it, sorry.

Ughuuu · 2024-05-14T11:16:49Z

No worries, thanks for this, it's great! If anything, it's a sample people can look at if they want to do sampling manually. I'm also busy, but in the future I might take a stab at it.

gudrob changed the title ~~Repetition in recordings, e~~ Repetition in recordings May 10, 2024
