WOW Great extension! The best TTS extension out there! Here are some code fixes for auto play and installation! #3

RandomInternetPreson · 2023-11-19T21:41:54Z

Firstly, thank you for taking the time to do this!!! OMG, it's fast, it does perfect inflections, and it's ElevenLabs quality on my local machine. AMAZING!!!!!

Here is some information to make the extension work a bit better. I'm on a Windows machine, so my experience might be unique to that.

Auto-play keeps trying to play every audio clip in the history. To fix this, change this:

def history_modifier(history):
    if len(history["internal"]) > 0:
        history["visible"][-1] = [
            history["visible"][-1][0],
            history["visible"][-1][1].replace(
                "controls autoplay>", "controls>")
        ]
    return history

to this:

def history_modifier(history):
    if len(history["internal"]) > 0:
        history["visible"][-1] = [
            history["visible"][-1][0],
            history["visible"][-1][1].replace(
                "controls autoplay style="height: 30px;">", "controls style="height: 30px;">")
        ]
    return history

The extension also failed to load initially. This is because the folder created in the oobabooga extensions directory has hyphens in its name, so users need to rename the folder from:

text-generation-webui-xtts

to:

text_generation_webui_xtts
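The rename matters because text-generation-webui imports an extension as `extensions.<folder name>.script`, and a Python module name must be a valid identifier, which hyphens are not. A minimal sketch of doing the rename from Python; `make_importable` is a hypothetical helper, not part of the extension, and the example path assumes you run it from the text-generation-webui folder:

```python
from pathlib import Path


def make_importable(ext_dir: Path) -> Path:
    """Rename an extension folder so Python can import it (hypothetical helper).

    Replaces hyphens with underscores, e.g.
    text-generation-webui-xtts -> text_generation_webui_xtts.
    """
    new_name = ext_dir.name.replace("-", "_")
    if not new_name.isidentifier():
        raise ValueError(f"{new_name!r} is still not a valid module name")
    if new_name == ext_dir.name:
        return ext_dir  # already importable, nothing to do
    target = ext_dir.with_name(new_name)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    return ext_dir.rename(target)  # Path.rename returns the new path (Python 3.8+)


# Example (assumes the current directory is text-generation-webui):
# make_importable(Path("extensions") / "text-generation-webui-xtts")
```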

Seriously amazing stuff; thank you again for integrating this into oobabooga. I will do a PR just to have a copy to mess around with, but I'll direct people to this repo.

allenhs · 2023-11-19T22:29:04Z

Thanks!

I had to make a change to get it to work right in Linux.

I changed:

"controls autoplay style="height: 30px;">", "controls style="height: 30px;">")

to:

'controls autoplay style="height: 30px;">', 'controls style="height: 30px;">'

I used ChatGPT to help me make the fix. It works for me, but I don't know how correct this change is.
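For reference, here is the history_modifier from the first post with allenhs's quoting fix applied: single quotes on the outside, so the double quotes around `height: 30px;` stay inside the string instead of ending it early (as quoted in the first post, Python would reject that line with a SyntaxError). The sample history is made up for illustration; in the extension the dict comes from text-generation-webui.

```python
def history_modifier(history):
    """Strip autoplay from the newest visible reply so old clips don't replay."""
    if len(history["internal"]) > 0:
        history["visible"][-1] = [
            history["visible"][-1][0],
            # Single quotes outside, so the HTML's double quotes stay inside the string.
            history["visible"][-1][1].replace(
                'controls autoplay style="height: 30px;">',
                'controls style="height: 30px;">')
        ]
    return history


# Hypothetical sample history, shaped like the one the extension receives.
sample = {
    "internal": [["Hi", "Hello!"]],
    "visible": [["Hi", '<audio src="file/x.wav" controls autoplay style="height: 30px;">']],
}
print(history_modifier(sample)["visible"][-1][1])
# <audio src="file/x.wav" controls style="height: 30px;">
```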

RandomInternetPreson · 2023-11-19T22:34:04Z

What that bit of code does is replace the strings inside the conversation history, removing the "autoplay" attribute.

Your code embeds the source location of the .wav files slightly differently than the original barkTTS code; look at your format_html function:

def format_html(audiofiles):
    if params["combine"]:
        autoplay = "autoplay" if params["autoplay"] else ""
        combined = combine(audiofiles)
        time_label = audiofiles[0].split("/")[-1].split("_")[0]
        sf.write(f"{this_dir}/generated/{time_label}_combined.wav",
                 combined, 24000)
        return f'<audio src="file/{this_dir}/generated/{time_label}_combined.wav" controls {autoplay} style="height: 30px;">'
    else:
        string = ""
        for audiofile in audiofiles:
            string += f''
        return string

You can see the string the code fix addresses: controls {autoplay} style="height: 30px;">

So we are making sure to change this:

"controls autoplay style="height: 30px;">"

to this:

"controls style="height: 30px;">"

in the conversation history with the AI, so it doesn't keep autoplaying.
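The fix above depends on the exact attribute order and spacing that format_html emits. An alternative sketch (not what the extension does) is to strip the `autoplay` attribute from every `<audio>` tag in the visible history with a regular expression, so it keeps working if the tag's layout changes. `strip_autoplay` is a hypothetical name, and the regex only handles a bare `autoplay` attribute preceded by whitespace.

```python
import re

# Matches whitespace + the bare word "autoplay" inside an <audio ...> tag;
# group 1 keeps everything in the tag before it.
_AUTOPLAY = re.compile(r'(<audio\b[^>]*?)\s+autoplay\b', re.IGNORECASE)


def strip_autoplay(history):
    """Hypothetical alternative: remove autoplay from every <audio> tag in the visible history."""
    history["visible"] = [
        [user, _AUTOPLAY.sub(r"\1", reply)] for user, reply in history["visible"]
    ]
    return history
```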

RandomInternetPreson · 2023-11-19T22:43:36Z

I edited the .py file in my fork for you to reference if you need it:

https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts/blob/main/script.py

Wow, this works so incredibly well!

RandomInternetPreson · 2023-11-19T23:07:40Z

Sorry to keep peppering you here in this issue, but I just wanted to let you know that I'd be okay if you wanted to reference my fork, https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts, for folks installing the extension on Windows.

erew123 · 2023-11-19T23:28:55Z

I'll close my other issue here, but I can confirm that on a 100% fresh install of Text-Gen-WebUI on Windows, I did the following:

Open a command prompt.
cd text-generation-webui (wherever you have it stored on your disk)
cmd_windows.bat (this activates your environment; Linux and Mac equivalents are there too)
cd extensions
git clone https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts
cd text_generation_webui_xtt_Alts
pip install -r requirements.txt
pip install TTS --no-dependencies

cd back up to the text-generation-webui folder.
Run start_windows.bat.

Agree to the license and let it download the other files it needs.
(Ensure the extension is activated on the "Session" tab, then apply and restart.)

With all that done, it's running fine! :) No audio repeats, etc.

One thing I did notice: it keeps the generated audio in \text-generation-webui\extensions\text_generation_webui_xtt_Alts\generated, so that folder may need cleaning up from time to time.

I'm sure the changes will get merged back into the original here at some point!

Thanks for everyone's help and work on this!

erew123 · 2023-11-20T00:34:23Z

A quick note on speed vs. quality, as it's not mentioned anywhere else. I noticed the sample voice file used to generate audio is about 7 seconds long, mono (not stereo), PCM S16 LE, with a sample rate of 22050 Hz and 16 bits per sample.

I'm guessing there are a few factors that may speed up processing:

Keeping it at a lower quality, like the original file.
Fewer seconds in length (I think somewhere it says you need 4 to 12 seconds as a sample).

I tried a very simple test using a 22050 Hz sample voice and a 44100 Hz sample voice (9-second mono sample).

22050Hz > Processing time: 59.185802936553955
44100Hz > Processing time: 125.19529104232788

Both runs generated the same amount of speech. It's not highly scientific (it wasn't run over thousands of tests), but it would appear that if you want to use your favourite celebrity voice, you should get a high-quality sample, make it mono, drop its sample rate to 22050 Hz, and keep it around the 4-9 second mark. (I suspect a shorter voice sample will probably be faster.)
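To check whether a voice sample matches the format described above (mono, 16-bit PCM, 22050 Hz, a few seconds long), Python's standard-library `wave` module can read the header of an uncompressed PCM WAV file. A small sketch; `describe_wav` is a hypothetical helper, and the 4-12 second range is only the rough guideline mentioned in this comment, not a documented requirement.

```python
import wave


def describe_wav(path):
    """Return the basic format of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        info = {
            "channels": w.getnchannels(),
            "bits_per_sample": 8 * w.getsampwidth(),
            "sample_rate": rate,
            "seconds": w.getnframes() / rate,
        }
    # Rough checks based on the observations in this thread, not hard requirements.
    info["mono"] = info["channels"] == 1
    info["length_ok"] = 4 <= info["seconds"] <= 12
    return info
```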

fbradcdsc · 2023-11-20T02:54:58Z

I followed the steps, but it still gives me this error when restarting the webui after activating the extension in the Session tab:

ERROR:Failed to load the extension "text_generation_webui_xtt_Alts".
Traceback (most recent call last):
  File "C:\text-generation-webui\modules\extensions.py", line 36, in load_extensions
    exec(f"import extensions.{name}.script")
  File "", line 1, in
  File "C:\text-generation-webui\extensions\text_generation_webui_xtt_Alts\script.py", line 1, in
    from TTS.api import TTS
ModuleNotFoundError: No module named 'TTS'

RandomInternetPreson · 2023-11-20T03:31:31Z

If you are using Windows, follow these instructions; I've made a video to go with them. They show you how to install TTS.

https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts/tree/main#installation-windows

kanttouchthis · 2023-11-20T03:38:28Z

> Sorry to keep peppering you here in this issue, but just wanted to let you know that I'd be okay if you wanted to reference my fork here: https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts for folks installing the extension for windows.

Thanks for your help!

> One thing I do notice, it keeps the generated audio in \text-generation-webui\extensions\text_generation_webui_xtt_Alts\generated so that may need clean up from time to time.

I added an option to config.json to delete old files on startup.
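The thread doesn't show how that config.json option is implemented. As a rough idea of what such a cleanup can look like, here is a sketch that deletes .wav files older than a given age from the generated folder; `clean_generated` and its `max_age_hours` parameter are hypothetical, not the extension's actual option names.

```python
import time
from pathlib import Path


def clean_generated(folder, max_age_hours=24.0):
    """Delete .wav files in `folder` older than max_age_hours; return the deleted paths."""
    cutoff = time.time() - max_age_hours * 3600
    deleted = []
    for wav in Path(folder).glob("*.wav"):
        # st_mtime is the last-modified time in seconds since the epoch.
        if wav.is_file() and wav.stat().st_mtime < cutoff:
            wav.unlink()
            deleted.append(wav)
    return deleted
```

Deleting by age rather than wiping the folder keeps clips from the current session playable in the chat history.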

kanttouchthis · 2023-11-20T03:42:17Z

> A quick note on speed vs quality etc as its not mentioned anywhere else. I notice the sample audio voice file used to generate audio, is about 7 seconds long, Mono (not stereo), PCM S16 LE with a Sample rate of 22050Hz and Bits per sample 16.
>
> I'm guessing there are a few factors that may speed up processing.
>
> Keeping it the lower quality like the original file.
>
> Fewer seconds in length (I think somewhere it says you need 4 to 12 seconds as a sample)
>
> I tried a very simple test using a 22050Hz sample voice and a 44100Hz sample voice (9 second mono sample).
>
> 22050Hz > Processing time: 59.185802936553955
> 44100Hz > Processing time: 125.19529104232788
>
> This was generating the same amount of speech. Its not highly scientific, run over 1000's tests. But it would appear that if you want to use your favourite celebrity voice, get a high quality sample, make it mono, drop its bit rate to 22050Hz and keep it around the 4-9 second mark. (I suspect a shorter voice sample probably will be faster).

The model outputs 24 kHz mono files, so I presume that is the ideal format for samples as well. I could potentially write code to automatically resample the input files.
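As a rough sketch of that idea, the snippet below converts a 16-bit PCM WAV sample to mono and resamples it with plain linear interpolation, using only the standard library. It assumes 16-bit input; the 24000 Hz default follows the model output rate mentioned above, and `resample_to_mono` is a hypothetical name. Linear interpolation does no anti-alias filtering, so a dedicated resampler (for example ffmpeg) will sound better when downsampling.

```python
import array
import sys
import wave


def resample_to_mono(src, dst, target_rate=24000):
    """Convert a 16-bit PCM WAV to mono at target_rate (linear interpolation, no filtering)."""
    with wave.open(str(src), "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM input is handled in this sketch")
        channels, rate = w.getnchannels(), w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Average the interleaved channels into one mono track.
    mono = [sum(samples[i:i + channels]) / channels
            for i in range(0, len(samples), channels)]
    if not mono:
        raise ValueError("empty audio")
    # Linearly interpolate the mono track onto the new time grid.
    n_out = max(1, round(len(mono) * target_rate / rate))
    step = rate / target_rate
    last = len(mono) - 1
    out = array.array("h")
    for j in range(n_out):
        pos = j * step
        i = int(pos)
        cur, nxt = mono[min(i, last)], mono[min(i + 1, last)]
        value = cur + (nxt - cur) * (pos - i)
        out.append(max(-32768, min(32767, round(value))))  # clamp to int16
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(str(dst), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(target_rate)
        w.writeframes(out.tobytes())
```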

RandomInternetPreson · 2023-11-20T03:44:51Z

Yeass! You got the repo fixed up; thank you again for making this. It is one of the last missing pieces for AI interactions, and the speed and quality are above everything else.

fbradcdsc · 2023-11-20T03:53:40Z

Alright, I got it to work! The problem was that I installed TTS in textgen and not in the base environment.

kanttouchthis · 2023-11-20T03:59:49Z

> Alright I got it to work! The problem was I installed TTS in textgen and not in the base environment

As long as you have textgen activated when running the webui, that shouldn't be an issue.
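A quick way to confirm which environment is active and whether TTS is importable from it is to run a few lines of Python from the prompt opened by cmd_windows.bat (or its Linux/Mac equivalent). A sketch using only the standard library; `check_tts_install` is a hypothetical helper. If TTS is reported missing, installing with `python -m pip install ...` from that same prompt targets the interpreter shown as `python`.

```python
import importlib.util
import sys


def check_tts_install(module_name="TTS"):
    """Report the active interpreter and whether `module_name` can be imported from it."""
    spec = importlib.util.find_spec(module_name)  # None if the top-level module isn't found
    return {
        "python": sys.executable,  # the interpreter (and so the environment) in use
        "prefix": sys.prefix,
        "found": spec is not None,
        "location": spec.origin if spec else None,
    }


print(check_tts_install())
```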

erew123 mentioned this issue on Nov 19, 2023: How to make it work on Windows - Instructions #2 (Closed)

daswer123 mentioned this issue on Nov 21, 2023: Add support for the new TTS - XTTSv2 SillyTavern/SillyTavern#1383 (Merged)
