I'm using silence detection to determine the start and end of the blocks I transcribe through Whisper, but I have an inherent issue where blocks around silence don't have the "correct" start time for the transcribed text.
For example, say we have a 30s block of audio:
00:05->00:06 Hi there
(silence for a few seconds)
00:06->00:11 Yeah I'm good thanks
^^^ The problem in this case is that the "Yeah" really starts at the 9 or 10s mark.
Is there some setting in the transcription that causes this, or is this down to the model and just generally how it works?
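
For context, here's a minimal sketch of the kind of pipeline I mean: silence detection to find the blocks, Whisper on each block, then offsetting the chunk-relative timestamps back to the full recording. The library choices (pydub for silence detection, the openai-whisper Python package), the file name, and the thresholds are just illustrative assumptions, not necessarily what's in use; `word_timestamps=True` is one option that tends to give tighter starts around leading silence.

```python
# Sketch only: assumes pydub + the openai-whisper Python package.
import whisper
from pydub import AudioSegment
from pydub.silence import detect_nonsilent

model = whisper.load_model("base")

audio = AudioSegment.from_file("call.wav")  # hypothetical input file

# Find non-silent spans as (start_ms, end_ms); thresholds are illustrative only.
spans = detect_nonsilent(audio, min_silence_len=700, silence_thresh=-40)

for start_ms, end_ms in spans:
    chunk = audio[start_ms:end_ms]
    chunk.export("chunk.wav", format="wav")

    # word_timestamps=True asks Whisper to align individual words, which is
    # usually tighter around leading silence than the segment-level start.
    result = model.transcribe("chunk.wav", word_timestamps=True)

    for seg in result["segments"]:
        # Offset chunk-relative times back to the original recording.
        seg_start = start_ms / 1000 + seg["start"]
        seg_end = start_ms / 1000 + seg["end"]
        print(f"{seg_start:.2f} -> {seg_end:.2f} {seg['text'].strip()}")
```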