
Is it possible to add noise or reverb to the sample labels to increase recall? #38

Open
Leeviber opened this issue Aug 7, 2023 · 4 comments

Comments

@Leeviber

Leeviber commented Aug 7, 2023

Hi,
The model performs really well when the voice is clean. However, when there is background noise or room reverb, the recall rate is very low. Is it possible to add background noise or reverb to the keyword audio samples to increase the detection rate in complex scenes? Would that affect the model's recognition success rate? Was this kind of data augmentation done during training?

@TheSeriousProgrammer
Contributor

We have added some augmentations during training, but reverb was not included.

During high noise situations, the hotword detector may face issues because it is trained to look for all vocal patterns and match them with the user's provided samples.

One possible solution is to treat the base model as a foundation model and fine-tune it on around 5-10 user-provided samples for a specific word. However, this may cripple the model's ability to identify new words out of the blue.

As you suggested, you can add some samples of the word you want to detect that include background noise and accent variations.
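A minimal sketch of how such noisy or reverberant variants of a keyword clip could be generated offline, assuming the clips are already loaded as mono float NumPy arrays. The function names `mix_at_snr` and `add_synthetic_reverb`, the default `rt60` and `wet` values, and the decaying-noise room model are illustrative assumptions, not part of this project's API or its training pipeline.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR in dB."""
    # Loop or trim the noise so it matches the speech length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

def add_synthetic_reverb(speech, sample_rate, rt60=0.4, wet=0.3, seed=0):
    """Convolve with an exponentially decaying noise burst (a crude room model)."""
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # The envelope falls by 60 dB (a factor of 1000) after rt60 seconds.
    impulse = rng.standard_normal(n) * 10 ** (-3 * t / rt60)
    impulse /= np.sqrt(np.sum(impulse ** 2))  # normalise to unit energy
    tail = np.convolve(speech, impulse)[: len(speech)]
    # Blend the dry signal with the reverberant tail.
    return (1 - wet) * speech + wet * tail

# Placeholder signals standing in for a real keyword clip and a noise recording.
sample_rate = 16000
clip = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)
background = np.random.default_rng(1).standard_normal(sample_rate // 4)

noisy_clip = mix_at_snr(clip, background, snr_db=10.0)
reverberant_clip = add_synthetic_reverb(clip, sample_rate)
```

The augmented arrays would then be written back out as audio files and used alongside the clean samples. A recorded room impulse response would be more realistic than the synthetic one used here.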

After this, you can increase the accuracy threshold.
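One way to pick a new threshold is to score a handful of clips that contain the word and a handful that do not, then see how recall and false accepts trade off. The sketch below assumes you already have per-clip confidence scores from the detector; the helper name `recall_and_false_accepts` and the score lists are made-up placeholders for illustration, not measurements.

```python
def recall_and_false_accepts(pos_scores, neg_scores, threshold):
    """Return (recall, false-accept rate) for a given detection threshold."""
    # A clip counts as a detection when its score reaches the threshold.
    recall = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    false_accepts = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return recall, false_accepts

# Placeholder scores: clips with the keyword, and clips without it.
pos_scores = [0.9, 0.8, 0.7, 0.6]
neg_scores = [0.65, 0.3, 0.2, 0.1]

for threshold in (0.6, 0.7):
    recall, false_accepts = recall_and_false_accepts(pos_scores, neg_scores, threshold)
    print(f"threshold={threshold}: recall={recall:.2f}, false accepts={false_accepts:.2f}")
```

With these placeholder numbers, raising the threshold from 0.6 to 0.7 removes the one false accept at the cost of one missed detection, which is the trade-off to watch when tuning on real recordings.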

@TheSeriousProgrammer
Contributor

I am currently looking at a CLIP-like architecture to further boost the performance of the system.

@damian-666

I use a cardioid directional mic and get good performance in general, with a Shure SV 1000 or the legendary SM58. These are heavy and will give you strain in 4 hours, but on a stand, if you are doing hands-free work such as commanding your computer, they are the best. I have a small room, fans running, etc. I use a preamp; it's about $200 total. There might be a cheaper karaoke mic, but look for a dynamic mic, not a condenser (though some might work OK), and make sure its pattern is very directional. Even mic arrays on laptops don't work well for this. A singer's mic is the best, IMO.

@TheSeriousProgrammer
Contributor

As @damian-666 pointed out, voice assistants use directional mics to combat the same problem. The idea is that the noise picked up by all the mics is roughly uniform, while the volume of the voice at each mic is not, so some simple math on the signals achieves a large amount of noise reduction. This is difficult to do with a single mic. It would help if you could share a video recording of the issue.
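A toy illustration of that "simple math", under heavily idealised assumptions: two mics hear exactly the same noise, the voice reaches them at different but known levels, and there is no delay between them. The function name `two_mic_noise_cancel` and the gain values are hypothetical; real devices rely on beamforming and adaptive filtering rather than a plain subtraction.

```python
import numpy as np

def two_mic_noise_cancel(near_mic, far_mic, near_gain, far_gain):
    """Recover the voice when both mics share the same noise but not the same voice level."""
    # Model: near_mic = near_gain * voice + noise
    #        far_mic  = far_gain  * voice + noise
    # Subtracting cancels the shared noise term and leaves a scaled voice.
    return (near_mic - far_mic) / (near_gain - far_gain)

# Placeholder signals standing in for a voice and for room noise.
rng = np.random.default_rng(0)
voice = np.sin(2 * np.pi * 200 * np.arange(1600) / 16000)
noise = rng.standard_normal(1600)

near_mic = 1.0 * voice + noise  # mic close to the speaker
far_mic = 0.3 * voice + noise   # mic further away, same noise

recovered = two_mic_noise_cancel(near_mic, far_mic, near_gain=1.0, far_gain=0.3)
```

With a single mic there is no second equation to subtract, which is why the same trick is not available there.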


3 participants