Provide example of live recognition with microphone on various platforms #316

smbika007 · 2022-10-19T17:33:12Z

Hi.

I need to use pocketsphinx with a microphone only. I was able to do so in the 5prealpha version found on SourceForge with the pocketsphinx_continuous program. It has since been "retired" (Blade Runner style, it seems) and I have not found any real replacement yet in the code on Github (this code).

Does this pocketsphinx (5.0.0) support the use of a microphone?

Thanks,
Sean

dhdaines · 2022-10-19T17:38:21Z

No, the PocketSphinx 5.0.0 command line and C API do not support the use of a microphone. See the rationale here: https://cmusphinx.github.io/2022/08/pocketsphinx-continuous/

PocketSphinx Python API does support microphone input. See documentation here: https://pocketsphinx.readthedocs.io/en/latest/

smbika007 · 2022-10-19T18:10:48Z

David,
Thanks for the reply. I may have to stick with 5prealpha then. I appreciate the rationale page but I have to say, all I needed to do to make use of it the way I needed was to essentially copy the code out of pocketsphinx_continuous and graft it into my program. It worked as near to perfectly as one could expect and was the ideal choice for my company's application needs. I am not allowed to use python for this because our MO is to not use scripting languages in our active environment. They are generally slower and we need lightning fast turnaround. Our use case for it was strictly microphone access against a very small and specific grammar which limited ambiguity in a verbal commanding situation. I've found that pocketsphinx was not very good at general dictation even with a large vocabulary.

Ah, well.

Thanks,
Sean

dhdaines · 2022-10-19T18:18:24Z

Hi Sean,

Thanks for the detailed reply! The issue is mainly that I very much do not want PocketSphinx to be in the business of interfacing with the microphone, because this creates a lot of maintainability and portability issues. I'm actually a bit surprised that the pocketsphinx_continuous code worked so well for you :)

Because I think there are at least a few people in your specific situation, I will provide an example of using PortAudio streams to do live recognition. I'm not enthusiastic about the idea of actually adding PortAudio as a dependency, and I think its API is rather unpleasant, but it seems like the least-hassle solution to the removal of pocketsphinx_continuous.

And yes, PocketSphinx is not to be used for general dictation, it is about 30 years out of date on that front. In fact, I am not convinced it should be used for anything, but I felt it needed to be cleaned up and the build system fixed, so...

dhdaines · 2022-10-19T18:19:09Z

(link to PortAudio documentation: http://files.portaudio.com/docs/v19-doxydocs/tutorial_start.html)

Also I have reopened this issue and changed its name!

smbika007 · 2022-10-19T18:23:16Z

My thanks, again! I will consider PortAudio as a possible mitigation to this. FTR, though, I've found the pocketsphinx_continuous code worked exceedingly well on all of the Windows 10 platforms and on Ubuntu in a VM which used the Windows box's native audio features. Could be I just got lucky ;-)

dhdaines · 2022-10-19T18:24:45Z

Hmm! Perhaps I can just pull out the old audio code and put it in the example then... mainly the issue is not wanting it to be in the library itself.

dhdaines · 2022-10-19T18:28:23Z

For PortAudio, it's specifically the "Blocking I/O" calls that are needed, the callback-based API is totally unsuitable for doing ASR:

http://portaudio.com/docs/v19-doxydocs/blocking_read_write.html
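
To illustrate the blocking pattern described above, here is a minimal sketch of a loop that pulls fixed-size frames from a blocking source and hands each one to a consumer. It is not the code from the repository's examples. The names `read_frames_fn`, `consume_fn`, `pump_audio`, `mem_source`, `mem_read`, `count_sink` and `count_consume` are all hypothetical and exist only for this illustration. In a real program the read callback would presumably wrap PortAudio's `Pa_ReadStream` and the consumer would wrap PocketSphinx's `ps_process_raw`; here an in-memory buffer stands in for the microphone so the loop can be run without any audio hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback types (not part of PortAudio or PocketSphinx). */

/* Blocks until exactly n samples are in buf. Returns 0 on success, nonzero
 * on end of stream or error. With PortAudio this would wrap
 * Pa_ReadStream(stream, buf, n). */
typedef int (*read_frames_fn)(void *src, int16_t *buf, size_t n);

/* Consumes one frame of audio. With PocketSphinx this would wrap
 * ps_process_raw(decoder, buf, n, 0, 0). */
typedef void (*consume_fn)(void *sink, const int16_t *buf, size_t n);

/* Pull fixed-size frames from a blocking source until it stops delivering.
 * Returns the number of complete frames handed to the consumer. */
size_t pump_audio(read_frames_fn read_frames, void *src,
                  consume_fn consume, void *sink,
                  int16_t *frame, size_t frame_size)
{
    size_t n_frames = 0;

    assert(frame_size > 0);
    while (read_frames(src, frame, frame_size) == 0) {
        consume(sink, frame, frame_size);
        ++n_frames;
    }
    return n_frames;
}

/* Stand-in audio source: serves samples from memory instead of a device. */
typedef struct {
    const int16_t *data;
    size_t len;
    size_t pos;
} mem_source;

int mem_read(void *src, int16_t *buf, size_t n)
{
    mem_source *m = (mem_source *)src;

    if (m->len - m->pos < n)
        return -1; /* not enough samples left for a full frame */
    memcpy(buf, m->data + m->pos, n * sizeof(*buf));
    m->pos += n;
    return 0;
}

/* Stand-in consumer: only counts the samples it receives. */
typedef struct {
    size_t n_samples;
} count_sink;

void count_consume(void *sink, const int16_t *buf, size_t n)
{
    (void)buf; /* a real consumer would pass buf to the decoder */
    ((count_sink *)sink)->n_samples += n;
}
```

Calling `pump_audio(mem_read, &src, count_consume, &sink, frame, 4)` on a 10-sample buffer delivers two frames and drops the trailing partial one. The loop owns the timing: the decoder is only called when a full frame is ready, on the same thread, which is what the blocking calls make straightforward.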

smbika007 · 2022-10-19T18:59:13Z

> Hmm! Perhaps I can just pull out the old audio code and put it in the example then... mainly the issue is not wanting it to be in the library itself.

Putting it in the examples is fine by me. The use cases for sphinx should include it for the purposes of verbal commanding, which it seems to do quite well. The Java versions of sphinx all have it, and indeed my first experience with it in our domain was the Java version. It worked fine too, but the reason we moved to the C version was that the grammar compiler they used was too strict, and when I introduced a grammar that included a LOT of variants, the compiler choked. I switched to the simpler version, which is a single Perl script, and that was all I needed to add anything I wanted in free style.

It can easily be caveated as legacy code which some oddballs like me found useful...LOL

Don't write sphinx off as outdated just yet. I've found that if it still works to one's satisfaction and can be maintained easily, it's still a useful member of society ;-)

dhdaines · 2022-10-19T19:06:07Z

Good to know! The grammar support could stand to be improved - there's a bit of a performance regression in 5.0.0 because some optimizations that were being done when compiling JSGF to FSG resulted in incorrect grammars. I just created an issue for this #317

And of course PocketSphinx is actually quite useful for alignment as well.

dhdaines · 2022-10-19T20:08:21Z

Working on this here: #319

The PortAudio example seems to work well though I haven't yet tried it on Windows - the CMake code to detect it almost certainly won't work there, I'll check that soon.

dhdaines · 2022-10-21T23:47:51Z

The Win32 example (https://github.com/cmusphinx/pocketsphinx/blob/live_examples/examples/live_win32.c) ought to work at least as well as the 5prealpha code, which is to say, maybe not all that well at all. The microphone on my Windows laptop seems very noisy, so the endpointer gives a lot of false positives for the first 30 seconds or so.

dhdaines closed this as completed Oct 19, 2022

dhdaines reopened this Oct 19, 2022

dhdaines changed the title ~~Is there microphone support for Pocketsphinx 5.0.0?~~ Provide example of live recognition with PortAudio streams Oct 19, 2022

dhdaines changed the title ~~Provide example of live recognition with PortAudio streams~~ Provide example of live recognition with microphone on various platforms Oct 19, 2022

dhdaines mentioned this issue Oct 19, 2022

CMake fail: Checking for module 'gobject-2.0' Can't find gobject-2.0.pc... #315

Closed
