Replies: 8 comments 7 replies
-
Great question, and one I like to talk about A LOT. Willow is the first open source project to deliver truly Alexa-level, commercial-grade wake because I have more than 20 years of experience in audio, speech, etc., and very few people understand just how challenging it is to get truly commercial-grade, reliable wake while minimizing false wake. There are a lot of people out there who think you can just show up, repeat a word a few times, and have reliable wake. Doesn't happen. Not to mention the other issues with on-device audio processing to get clean audio in far-field and acoustically challenging environments...

The process they lay out is essentially the industry standard for wake engines. In fact, the entire stack we build on (from Espressif), including the built-in wake words, has been tested and qualified by Amazon themselves to be an Alexa platform frontend. So when I say we're "Alexa grade" I mean it in a very real sense. As you can see from the docs, the standard is daunting: 20k samples, more than 500 speakers (including children), professional recording, audio engineering, significant testing and validation, etc. So yes, very expensive.

We use the provided wake words (Alexa, Hi ESP, etc.) because they were trained with that process and are freely available with the underlying ESP-SR we leverage. We have plans to fund "Hi Willow" or something along those lines, depending on input from the community. Additionally, Willow has received a bit of commercial interest, as expected, and that is our exact monetization strategy: custom manufactured hardware devices with custom wake, UI, etc. See here for an example.
-
Awesome project! Just what I've been looking for to get away from Amazon or Google for my home automation voice commands. May I put in my vote for a single-word wake word? Just plain "Willow" would be fine.
-
First I just want to say a big thanks for developing this project!! 😄 Willow is super cool, and I'm excited to see where it goes! I had almost given up all hope on voice assistants but this has reinvigorated my interest! I love the name Willow – so much, in fact, that my dog shares the name! Coincidentally, one of my friends also has a dog named Willow. So using "Hi Willow" might lead to some really confused puppers around our homes. As for "Hi ESP," it's not really doing it for me, and I'd rather avoid using "Alexa" as well. (Also, could there be any potential legal issues with enabling "Alexa"?) As a huge Star Trek fan, using "computer" as a wake word would be an awesome throwback, and it does have those three syllables we're aiming for. But I get it; "computer" might be too common in everyday chatter for practical use. Here's to hoping LLMs bring us closer to context-aware voice assistants! A few more ideas:
-
Thanks! We got pretty lucky with Willow. Naming projects and companies is one of my least favorite things, and Willow came from a friend of mine who was excited about Whisper but kept calling it Willow for some reason ;).

Hi ESP is actually a terrible wake word. It's very awkward, and we're learning again and again that some people can't quite nail it, largely because of having to clearly enunciate E-S-P. Right now our only other real alternative is "Alexa", which people also don't like for obvious reasons.

Our likely approach will be to select a few finalists (with my filtering of wake words that are clearly impractical) and start Kickstarter or similar campaigns for each of them, with pass-through pricing of the Espressif costs. Anything that achieves the target fundraising goal will be produced. For example, "computer" does seem to be a popular one. I'd never use it, but that's not a reason not to create it (it's actually the only non-Amazon branded wake word supported with Alexa/Echo).

Espressif currently provides Alexa with the esp-sr framework, and I can't imagine there is any legal exposure for us in leveraging it.

Your other suggestions (Orion, Harmony, Nebula) are excellent, and if we did a poll I suspect at least one of them would be a finalist.
-
Thanks Kristian. 😊 I agree dslugPX, one word and three syllables is definitely the way to go. 👍
-
Three syllables minimum is a hard requirement. In terms of other suggestions, the large commercial voice assistants have carefully selected wake words for good reason. Things like:
We're here because we appreciate Willow and its fundamental merits vs Alexa, etc. We value flexibility, user choice, and user control. However, to reach our primary goal of being competitive with these commercial projects, we should learn from some of the approaches they have taken to achieve the quality of user experience they provide (compared to existing open source voice interfaces). There are some more-or-less hard lessons learned and fundamental rules with voice interfaces, and we'd be foolish to think these rules don't also apply to us. For things as fundamental as wake word, we need to be careful not to significantly nerf the user experience because we didn't consider all of these fundamental issues for additional wake words.

People have strong opinions on wake word, and the scenario everyone seems to want in an ideal world, "let me use anything", is a recipe for disaster, as the many failed projects attempting this route demonstrate.

In short, it's unlikely everyone is going to be completely happy with one or more selected wake words. On that, I don't really know what to tell you: for something like wake word there are many good reasons why it "is what it is". My hope is that Willow is so overwhelmingly valuable and otherwise useful that having to prefix a wake word with "Hi/Hey", or use a name/word you generally don't love, isn't a deal-breaker for use of Willow.

The reality basically boils down to "custom wake words don't work; you can certainly attempt it with the other open source options, but the performant, reliable, and accurate choices currently are Alexa/Google/etc or Willow".
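As a rough illustration of the three-syllable minimum mentioned above, here is a minimal Python sketch that screens candidate wake words. Everything in it is an assumption made for illustration: the `Candidate` type, the `shortlist` helper, and the hand-entered syllable counts are hypothetical and are not part of Willow or ESP-SR. Syllables are entered by hand because simple automatic counters tend to miscount words like "Orion". Passing this check says nothing about whether a word would actually perform well as a wake word; it only encodes the one rule stated here.

```python
from dataclasses import dataclass

# The only rule encoded here: "Three syllables minimum is a hard requirement."
MIN_SYLLABLES = 3


@dataclass(frozen=True)
class Candidate:
    """A proposed wake word (hypothetical type, for illustration only)."""
    phrase: str
    syllables: int  # counted by hand, not computed


def meets_minimum(candidate: Candidate) -> bool:
    """Return True if the candidate has at least MIN_SYLLABLES syllables."""
    return candidate.syllables >= MIN_SYLLABLES


def shortlist(candidates: list[Candidate]) -> list[str]:
    """Keep only the phrases that satisfy the syllable minimum."""
    return [c.phrase for c in candidates if meets_minimum(c)]


# Candidates taken from this thread, with hand-counted syllables.
CANDIDATES = [
    Candidate("Willow", 2),
    Candidate("Hi Willow", 3),
    Candidate("Alexa", 3),
    Candidate("Computer", 3),
    Candidate("Orion", 3),
]

if __name__ == "__main__":
    print(shortlist(CANDIDATES))
```

Under these counts, plain "Willow" is filtered out while "Hi Willow" passes, which is why a "Hi/Hey" prefix keeps coming up for shorter names.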
-
I believe it's essential to find a wake word that resonates with users and avoids any potential awkwardness or annoyance, especially considering it will be used frequently every day. For example, many people find "Okay/Hey Google" cumbersome. Using "Hi Willow" may also lead to frustration, especially for those who have personal connections to the name. Alexa, though an efficient wake word (one word, three syllables, and easy pronunciation), has likely caused inconvenience to individuals with that name in their day-to-day lives. I personally know someone named Siri who wasn't too pleased when Apple chose that name for their voice assistant. In 2021, Willow ranked as the 39th most popular name for girls, which indicates a considerable number of potential users sharing this name.

I'd like to revisit "Computer". This article provides some interesting insights into the benefits and reasons for adopting it: https://www.salon.com/2017/11/26/dont-call-it-siri-why-the-wake-word-should-be-computer/

When LLMs become more efficient, we could potentially use context-aware reasoning to determine user intent and improve conversation state retention. This approach would likely require devices to continuously stream input audio to a server capable of handling the LLM. Local implementation will help address privacy concerns, in addition to the open-source nature of the project. Combining Whisper with an LLM might even eliminate the need for wake word training altogether, and present additional benefits. While this might be a project for the future, I believe it would be advantageous to have the continuous input audio stream capability already in the existing code to support such a feature.
-
I would love to learn the process of wake word generation for these boxes. At the moment, the fact that we are tied to Hi ESP and Alexa, with paying Espressif to generate private wake words as the only alternative, is a show stopper for the ESP32 box. I'd rather throw Rhasspy on a device instead, where I can train my own wake word and point it at WIS. I understand it's a complex process, but we need the ability to train new wake words on our own without Espressif.

Edit: do we know the cost of wake word creation?
-
For the full-service "we organise it all for you, including collecting hundreds of samples" option, it doesn't sound cheap:
https://docs.espressif.com/projects/esp-sr/en/latest/esp32s3/wake_word_engine/ESP_Wake_Words_Customization.html