Provide example of live recognition with microphone on various platforms #316

Open
smbika007 opened this issue Oct 19, 2022 · 11 comments

@smbika007

Hi.

I need to use pocketsphinx with a microphone only. I was able to do so in the 5prealpha version found on SourceForge with the pocketsphinx_continuous program. It has since been "retired" (Blade Runner style, it seems) and I have not yet found any real replacement in the code on GitHub (this code).

Does this pocketsphinx (5.0.0) support the use of a microphone?

Thanks,
Sean

@dhdaines
Contributor

No, the PocketSphinx 5.0.0 command-line tool and C API do not support the use of a microphone. See the rationale here: https://cmusphinx.github.io/2022/08/pocketsphinx-continuous/

The PocketSphinx Python API does support microphone input. See the documentation here: https://pocketsphinx.readthedocs.io/en/latest/

@smbika007
Author

David,
Thanks for the reply. I may have to stick with 5prealpha then. I appreciate the rationale page, but I have to say, all I needed to do to use it the way I wanted was essentially to copy the code out of pocketsphinx_continuous and graft it into my program. It worked as near to perfectly as one could expect and was the ideal choice for my company's application needs. I am not allowed to use Python for this because our MO is to not use scripting languages in our active environment. They are generally slower and we need lightning-fast turnaround. Our use case was strictly microphone access against a very small and specific grammar, which limited ambiguity in a verbal commanding situation. I've found that pocketsphinx was not very good at general dictation even with a large vocabulary.

Ah, well.

Thanks,
Sean

@dhdaines
Contributor

Hi Sean,

Thanks for the detailed reply! The issue is mainly that I very much do not want PocketSphinx to be in the business of interfacing with the microphone, because this creates a lot of maintainability and portability issues. I'm actually a bit surprised that the pocketsphinx_continuous code worked so well for you :)

Because I think there are at least a few people in your specific situation, I will provide an example of using PortAudio streams to do live recognition. I'm not enthusiastic about the idea of actually adding PortAudio as a dependency, and I think its API is rather unpleasant, but it seems like the least-hassle solution to the removal of pocketsphinx_continuous.

And yes, PocketSphinx is not to be used for general dictation; it is about 30 years out of date on that front. In fact, I am not convinced it should be used for anything, but I felt it needed to be cleaned up and the build system fixed, so...

@dhdaines reopened this on Oct 19, 2022
@dhdaines
Contributor

(link to PortAudio documentation: http://files.portaudio.com/docs/v19-doxydocs/tutorial_start.html)

Also I have reopened this issue and changed its name!

@dhdaines changed the title from "Is there microphone support for Pocketsphinx 5.0.0?" to "Provide example of live recognition with PortAudio streams" on Oct 19, 2022
@smbika007
Author

My thanks, again! I will consider PortAudio as a possible mitigation to this. FTR, though, I've found the pocketsphinx_continuous code worked exceedingly well on all of the Windows 10 platforms and on Ubuntu in a VM which used the Windows box's native audio features. Could be I just got lucky ;-)

@dhdaines
Contributor

Hmm! Perhaps I can just pull out the old audio code and put it in the example then... mainly the issue is not wanting it to be in the library itself.

@dhdaines
Contributor

For PortAudio, it's specifically the "Blocking I/O" calls that are needed; the callback-based API is totally unsuitable for doing ASR:

http://portaudio.com/docs/v19-doxydocs/blocking_read_write.html
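To illustrate why blocking reads suit a recognizer, here is a minimal sketch of the loop shape using only the C standard library. It is not the PortAudio example itself: a `FILE *` stands in for the audio stream, and `FRAME_SAMPLES`, `frame_handler`, `process_stream` and `count_samples` are hypothetical names chosen for illustration. With PortAudio, the `fread()` call would presumably be replaced by the blocking `Pa_ReadStream()`, and the handler would pass each frame to the decoder.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical frame size: 480 samples is 30 ms of 16 kHz mono audio. */
#define FRAME_SAMPLES 480

/* Hypothetical stand-in for the recognizer: receives one complete frame. */
typedef void (*frame_handler)(const int16_t *frame, size_t n_samples,
                              void *user);

/* Blocking read loop: each fread() call waits until a whole frame is
 * available (or the stream ends), then the frame is handled on the same
 * thread before the next read. Returns the number of complete frames. */
static size_t
process_stream(FILE *in, frame_handler handle, void *user)
{
    int16_t frame[FRAME_SAMPLES];
    size_t n_frames = 0;

    assert(in != NULL);
    while (fread(frame, sizeof(frame[0]), FRAME_SAMPLES, in)
           == FRAME_SAMPLES) {
        handle(frame, FRAME_SAMPLES, user);
        ++n_frames;
    }
    /* A trailing partial frame is dropped in this sketch. */
    return n_frames;
}

/* Example handler: counts the samples it has seen via the user pointer. */
static void
count_samples(const int16_t *frame, size_t n_samples, void *user)
{
    (void)frame;
    *(size_t *)user += n_samples;
}
```

Calling `process_stream(stdin, count_samples, &total)` with `size_t total = 0;` would consume raw 16-bit samples piped in from another program. The point of the design is that the processing code sets the pace and runs at its own speed between reads, rather than being invoked from an audio callback with tight timing constraints.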

@dhdaines changed the title from "Provide example of live recognition with PortAudio streams" to "Provide example of live recognition with microphone on various platforms" on Oct 19, 2022
@smbika007
Author

> Hmm! Perhaps I can just pull out the old audio code and put it in the example then... mainly the issue is not wanting it to be in the library itself.

Putting it in the examples is fine by me. The use cases for sphinx should include it for the purposes of verbal commanding, which it seems to do quite well. The Java versions of sphinx all have it, and indeed my first experience with it in our domain was the Java version. It worked fine too, but the reason we moved to the C version was that the grammar compiler they used was too strict, and when I introduced a grammar that included a LOT of variants, the compiler choked. I switched to the simpler version, which is a single Perl script, and that was all I needed to add anything I wanted in free style.

It can easily be caveated as legacy code which some oddballs like me found useful...LOL

Don't write sphinx off as outdated just yet. I've found that if it still works to one's satisfaction and can be maintained easily, it's still a useful member of society ;-)

@dhdaines
Contributor

Good to know! The grammar support could stand to be improved - there's a bit of a performance regression in 5.0.0 because some optimizations that were being done when compiling JSGF to FSG resulted in incorrect grammars. I just created an issue for this: #317

And of course PocketSphinx is actually quite useful for alignment as well.

@dhdaines
Contributor

Working on this here: #319

The PortAudio example seems to work well, though I haven't yet tried it on Windows - the CMake code to detect it almost certainly won't work there. I'll check that soon.

@dhdaines
Contributor

The Win32 example (https://github.com/cmusphinx/pocketsphinx/blob/live_examples/examples/live_win32.c) ought to work at least as well as the 5prealpha code, which is to say, maybe not all that well at all. The microphone on my Windows laptop seems very noisy, so the endpointer gives a lot of false positives for the first 30 seconds or so.
