Replies: 8 comments 18 replies
-
I don't yet have a GPU that can run LLMs, but looking at this model on Hugging Face, TheBloke changed how it works with AutoGPTQ so you no longer pass in a model_basename: https://huggingface.co/TheBloke/vicuna-13b-v1.3.0-GPTQ/discussions/3 Maybe this is the problem?
-
Just got the ESP32-S3-BOX-3. I know it's not yet supported, but just to confirm: the ESP32-S3-BOX firmware doesn't seem to work on it :)
-
If anyone wants to take a look, here is the serial output after flashing; the errors are in WILLOW/CONFIG and WILLOW/MAIN:
-
I reflashed with WAS and attached a serial monitor; here is the end of the log. You can see errors relating to the I2C LCD panel:
I'll also mention that the wake word doesn't seem to be triggering, but with the other hardware issues I won't worry about that for now :)
-
Generally speaking, with LLMs we'll be taking a different route shortly, which will eventually include removing LLM support from WIS. With WAS our intention is to insert WAS into the flow between Willow devices and WIS/HA/etc., with WAS applications/integrations enabling all kinds of interesting pipelining of combined functionality: Willow -> WAS -> WIS transcript -> LLM via the OpenAI API/vllm/lmdeploy/TGI/etc. with WAS -> potentially other things -> what we do today.
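To illustrate, that pipeline stage could be sketched as a small relay that wraps a WIS transcript in an OpenAI-style chat request and posts it to any OpenAI-compatible server (vLLM, TGI, lmdeploy, ...). This is my own illustration, not WAS code; the endpoint URL and function names are assumptions:

```python
import json
import urllib.request

def build_chat_request(transcript, model="vicuna-13b"):
    """Wrap a WIS transcript in an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": transcript}],
    }

def forward_transcript(transcript, endpoint="http://localhost:8000/v1/chat/completions"):
    """POST the transcript to any OpenAI-compatible server and return the reply text."""
    payload = json.dumps(build_chat_request(transcript)).encode()
    req = urllib.request.Request(
        endpoint, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The appeal of routing through an OpenAI-compatible API is that the same relay works whether the backend is a hosted service or a local vLLM/TGI instance.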
-
@paulgrove - If you want to test, I have a branch with several updates and fixes including chatbot support:
Oh, one additional thing - you will probably want to adjust your
-
@kristiankielhofner I finally decided to try this test branch. The chatbot is disabled even with `support_chatbot: bool = True` in the `custom_settings.py` configuration; it says that Device 0 is pre-Volta. The normal "Hi ESP" wake word and Home Assistant functions work the same as in the original WIS.
-
Nvidia has a handy reference showing the compute capability of their GPUs. Compute capability is what the code actually uses to configure/enable/disable functionality, as it's what gets returned in software. The CUDA Wikipedia page is also generally accurate, and it includes the microarchitecture friendly names that we use in WIS log messages to the user.
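For context, the pre-Volta gate mentioned above boils down to comparing the device's (major, minor) compute capability tuple against Volta's 7.0. A minimal sketch (the helper name is mine, not the actual WIS code):

```python
# Volta (e.g. V100) is compute capability 7.0; earlier architectures such as
# Pascal (6.x) lack the Tensor Core features some quantized-LLM kernels need.
VOLTA = (7, 0)

def is_pre_volta(capability):
    """capability is a (major, minor) tuple, e.g. from torch.cuda.get_device_capability(0)."""
    return capability < VOLTA

# In a CUDA environment one would query the device, roughly:
#   import torch
#   if is_pre_volta(torch.cuda.get_device_capability(0)):
#       print("Device 0 is pre-Volta; disabling chatbot support")
```

A 3090 (Ampere, compute capability 8.6) passes this check, so seeing a pre-Volta message on that card would suggest the wrong device is being selected.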
-
Hello Willow community, what a great project - I can't wait to get involved.
I've ordered an ESP32-S3-BOX-3, which should arrive in a few weeks (I couldn't find an ESP32-S3-BOX), and I'm super excited to get everything set up.
I'll run WAS on a Linux server, and for now I'll run WIS on my Windows desktop (with a 3090) under WSL2 + Docker.
Really hoping I can learn about this project and contribute.
I've got WIS working and have tested STT via the web interface; it works great and was super easy to set up!
However, I am struggling to configure the generative chatbot. With the default settings I get the following error:
I can confirm that the URL returns 401.
I had a search on huggingface and I found a very similarly named model called
TheBloke/vicuna-13b-v1.3.0-GPTQ
If I switch to this one I get another error:
I've also tried selecting a few other models, but I've yet to find one that works.
Can you advise how I can find and configure a compatible model from huggingface?
Thanks a million,
Paul