Query about ideal drive:\location for install to reduce conflicts #49

Closed
Magenta-6 opened this issue Jun 25, 2023 · 22 comments

Comments

@Magenta-6

This is not an issue as such but a request for advice about where to set up an installation.

Firstly, your idea to create a cross-platform TTS tool is extremely valuable, especially with a one-click installer. However,
I am hesitant to just set up a folder in my C:\Users\NAME directory and expect everything to go flawlessly.

Over the past year the advent of AI-generated TTI and LLMs has been an exciting journey.
I am one of those people who has gone through a steep learning curve getting to grips with virtual Python environments, and I do not fully understand the intricacies of how the multitudes of modules inter-relate.
I do know enough to know that conflicts can occur between them and that a lot of time can be spent uninstalling and reinstalling them.

At present I have successfully set up several tts applications: Coqui, Silero, & Bark.
I have also attempted to set up Tortoise and AudioCraft, but have failed to troubleshoot installation errors. (See Note 1 below.)

I first started using Silero in oobabooga, which worked fine.
However, the voices were very limited, so I set up Coqui and Bark to get a better variety of voices and accents.
And the possibility of voice cloning/training is extremely appealing also.

All that is a long way to ask the question:

  1. Is there an ideal place to install your tts-webui that will not create conflicts with other installations?
  2. Should I un-install all the other applications first?
  3. Is it likely that the installation will add duplicate versions of torch, conda and other dependencies that are already installed?

BACKGROUND
I am using Windows 10, RTX 4070Ti, CUDA 11.7, Anaconda3, conda 22.9.0, Torch 2.0.1
2023-06-25_pip list.txt

Here is a list of the folders where I have set up various applications:
Initially I installed the TTS applications so that they could be used with oobabooga and later Silly Tavern:
C:\SuperStableDiffusion2.0\stable-diffusion-webui
C:\SuperStableDiffusion2.0\oobabooga-windows
C:\SuperStableDiffusion2.0\Bark\bark-gui
C:\SuperStableDiffusion2.0\CoquiTTS\TTS

Later I began setting up applications in the Users\ directory:
C:\Users\ABC\Audiocraft\audiocraft-main
C:\Users\ABC\Bark-tts\bark_win\bark-gui
C:\Users\ABC\Silero-tts
C:\Users\ABC\coqui-tts
C:\Users\ABC\tortoise-tts

(Note 1.) Issues Raised:
neonbjb/tortoise-tts#468
facebookresearch/audiocraft#123

@rsxdalv
Owner

rsxdalv commented Jun 25, 2023

OK, so the short answer is that it should not matter. I'd say C:\SuperStableDiffusion2.0\ is a good directory to keep things organized, for example if you want to start or stop using some of them.

The technical background is that the one-click installer is related[1] to oobabooga's one-click installer, and it installs everything. Since I have some development tools installed on my machine I might have a blind spot, but I know that it installs (1) Python, (2) conda, (3) all the packages, including torch and drivers, and (4) git (not sure). Everything is contained within the directory you chose.

If there are conflicts, that would be a bug. The installer is optimized for being standalone and non-conflicting, which is why the installation is slower and bulkier; however, it ought to be more stable and robust.

As for the pip list, these pip packages shouldn't affect the internal virtual environment.

[1] - I saw that there's a second installer and that they aren't always using this one, which is a bit confusing, but the original target was to install oobabooga's UI.
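
A quick way to confirm this kind of isolation is to ask the running interpreter where it lives. The sketch below is an editorial illustration, not part of the installer: `env_is_inside` is a hypothetical helper, and the install path is only an example directory taken from this thread. It assumes you run it with the same Python that the webui uses.

```python
import sys
from pathlib import Path


def env_is_inside(install_root: str, prefix: str = sys.prefix) -> bool:
    """Return True if the Python environment at `prefix` lives under `install_root`."""
    root = Path(install_root).resolve()
    env = Path(prefix).resolve()
    return env == root or root in env.parents


if __name__ == "__main__":
    # Hypothetical install location; substitute the directory you chose.
    root = r"C:\SuperStableDiffusion2.0\TTS-4.0"
    print("interpreter:", sys.executable)
    print("environment:", sys.prefix)
    print("inside install dir:", env_is_inside(root))
```

If the last line prints `False`, the script is probably being run by a different Python (for example a system-wide Anaconda) rather than the one bundled with the installer.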

@Magenta-6
Author

Magenta-6 commented Jun 26, 2023

@rsxdalv - Much appreciate the advice.
Having mucked up things previously, I've become a little wary of simply "pip installing" everything on offer.

Install went absolutely perfectly!
Took about 15 mins, but not a single problem so far.
You have made an exceptional one stop shop with this!!

THANK YOU !!

Kind regards
Magenta-6

@rsxdalv
Owner

rsxdalv commented Jun 26, 2023 via email

@Magenta-6
Author

@rsxdalv I had a great day yesterday looking at the functional aspects of the various programs.
Everything seemed to work fine.
However, today I cannot get it to run. I am simply double-clicking the start_windows.bat file.

Error as below:
++++++++++++++++++++++++++++++++++++++++++++++++++
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading extensions:
Loaded extension: callback_save_generation_ffmpeg
Loaded extension: callback_save_generation_musicgen_ffmpeg
Loaded extension: empty_extension
Loaded 2 callback_save_generation extensions.
Loaded 1 callback_save_generation_musicgen extensions.
Loading Bark models
- Text Generation: GPU: Yes, Small Model: Yes
- Coarse-to-Fine Inference: GPU: Yes, Small Model: Yes
- Fine-tuning: GPU: Yes, Small Model: No
- Codec: GPU: Yes
2023-06-27 16:19:08 | INFO | fairseq.tasks.text_to_speech | Please install tensorboardX: pip install tensorboardX
2023-06-27 16:19:08 | WARNING | xformers | Triton is not available, some optimizations will not be enabled.
This is just a warning: No module named 'triton'
Traceback (most recent call last):
  File "C:\SuperStableDiffusion2.0\TTS-4.0\tts-generation-webui\server.py", line 87, in <module>
    history_tab(register_use_as_history_button)
  File "C:\SuperStableDiffusion2.0\TTS-4.0\tts-generation-webui\src\history_tab\main.py", line 56, in history_tab
    return history_content(
  File "C:\SuperStableDiffusion2.0\TTS-4.0\tts-generation-webui\src\history_tab\main.py", line 81, in history_content
    history_list_as_gallery = gr.Gallery(value=get_wav_files_img(directory))
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\components.py", line 4403, in __init__
    IOComponent.__init__(
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\components.py", line 215, in __init__
    else self.postprocess(initial_value)
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\components.py", line 4468, in postprocess
    file_path = self.make_temp_copy_if_needed(img)
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\components.py", line 259, in make_temp_copy_if_needed
    temp_dir = self.hash_file(file_path)
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\components.py", line 223, in hash_file
    with open(file_path, "rb") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'outputs\2023-06-26_16-33-06__bark__de_speaker_0.png\2023-06-26_16-33-06__bark__de_speaker_0.png.png'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\SuperStableDiffusion2.0\TTS-4.0\tts-generation-webui\server.py", line 66, in <module>
    with gr.Blocks(
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1411, in __exit__
    self.config = self.get_config_file()
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1378, in get_config_file
    props = block.get_config() if hasattr(block, "get_config") else {}
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\components.py", line 4433, in get_config
    "value": self.value,
AttributeError: 'Gallery' object has no attribute 'value'

Done!
Press any key to continue . . .
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From what I've read elsewhere the Triton thing is not a problem,
but I don't know how to tackle the errors in the Traceback.
Should I simply delete the TTS-4.0 directory and re-install, or something else?

@rsxdalv
Owner

rsxdalv commented Jun 27, 2023 via email

@rsxdalv
Owner

rsxdalv commented Jun 27, 2023 via email

@rsxdalv
Owner

rsxdalv commented Jun 27, 2023

The error happens because the output folder was named "2023-06-26_16-33-06__bark__de_speaker_0.png". I tried a simple generation and got the proper folder name, "2023-06-27_13-25-59__bark__de_speaker_0".

Did anything unusual happen?
I even tried changing my installation directory and it still didn't have this issue.

For now I have isolated the issue so that it only affects the history tab and you can still boot up the app. I cannot make a more stable fix until a future upgrade. To get the latest changes, run the "update" script.
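
To look for this kind of entry before it breaks the history tab, a small scan of the outputs folder can help. This is a minimal editorial sketch, not code from tts-generation-webui: `find_suspect_entries` and the suffix list are hypothetical, and it assumes that top-level entries in the outputs folder should be plain generation folders without a media-file extension.

```python
from pathlib import Path

# Assumption: top-level names in the outputs folder never end in one of these.
SUSPECT_SUFFIXES = {".png", ".wav", ".ogg", ".npz", ".json"}


def find_suspect_entries(outputs_dir: str) -> list[Path]:
    """List top-level files or folders whose names end in a media-file extension."""
    root = Path(outputs_dir)
    return sorted(
        entry for entry in root.iterdir()
        if entry.suffix.lower() in SUSPECT_SUFFIXES
    )
```

Calling `find_suspect_entries("outputs")` from the webui directory would list both stray files and folders such as one ending in `.png`, so they can be moved out by hand.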

@Magenta-6
Author

Hey thanks.
I replaced the [outputs] folder with a new empty folder and it worked a treat.

The cause of the error may be this:
After generating heaps of stuff in stable diffusion I usually manually go through the outputs using windows explorer to review and delete the rubbish and move the usable content into a folder with a date and a description. This helps me to theoretically find content later and it reduces the content on the hard drive.

Using the same idea I adopted this practice with TTS-4.0.
I started to extract the .wavs from each of the sub-folders and left them in the main "outputs" folder to aggregate them into a new folder called [2023-06-26_TestPrompt-01].
All the other files (the .png, .ogg, .npz and .json files) and the sub-folders themselves were deleted.

I'm guessing that in this process a folder must have accidentally got re-named
[2023-06-26_16-33-06__bark__de_speaker_0.png]
I cannot find this folder to confirm, but it seems the most likely cause of the glitch.

I appreciate the time you put into handling these queries and in getting back to me with appropriate advice. I've dropped a copy of the file generated in the hope it might give you a chuckle. I'm finding that some of the non-English voices generate really good accented English content. Bark voice: de_speaker_0 is one of my faves!

I did this before I realized that there is a handy tab within the api for reviewing and deleting content. There are also tools for creating favourites and collections, which I have yet to utilize.

With all of these AI gen tools (as with digital photography), a good file management workflow is essential for managing and curating digital assets. I can see that your "One Stop Shop" approach to TTS + Audio gen is similar to Adobe Lightroom in bringing workflow and content creation together. It should get a lot of attention.

As an aside there is a pretty powerful image management tool called "breadboard" which reads metadata within images and can sort by text tags.
Link: https://github.com/cocktailpeanut/breadboard

2023-06-26_16-33-06__bark__de_speaker_0.zip

@rsxdalv
Owner

rsxdalv commented Jun 27, 2023 via email

@rsxdalv
Owner

rsxdalv commented Jun 28, 2023

That's an interesting project! Yes, I have been using Stable Diffusion and I saw a similar issue of collections being a necessity and a bottleneck.
For Bark files, something like this can be ported to run on local files: https://rsxdalv.github.io/bark-speaker-directory/voice-drafts

@Magenta-6
Author

@rsxdalv that card index for voices is an awesome add-on.
I really like the way it is set up with a pic, voice sample and tags.
I assumed that "ported" means it can be brought into TTS-4.0 as an extension, in the same way that oobabooga works.

However, my trial-and-error approach has got me into trouble again.
I copy/pasted your web URL, https://rsxdalv.github.io/bark-speaker-directory/, into the bottom field of the Gradio settings page.
I think it was called Directories.
It crashed the api . . . and caused the following error:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Traceback (most recent call last):
  File "C:\SuperStableDiffusion2.0\TTS-4.0\tts-generation-webui\server.py", line 126, in <module>
    demo.queue(
  File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1757, in launch
    raise ValueError("allowed_paths must be a list of directories.")
ValueError: allowed_paths must be a list of directories.

Done!
Press any key to continue . . .
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
I ran the update_windows.bat hoping it might clear the field, but that was a bit optimistic.
I did see quite a few updates come in though so still worth doing.

If you could let me know whether I should replace a "particular_file.py" it would help me.
I have Visual Studio so I could even manually delete the url from the appropriate file, if I knew where to look.
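
The error message says the setting expects directories, and a web URL is not one. As a rough editorial sketch of how candidate values could be checked before they are saved, the helper below splits a list into existing local directories and everything else. `split_allowed_paths` is a hypothetical name, and nothing here describes how Gradio or the webui validates the setting internally.

```python
from pathlib import Path


def split_allowed_paths(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Split candidates into (existing local directories, everything else)."""
    valid, rejected = [], []
    for entry in candidates:
        # A URL or a mistyped path is not an existing directory, so it is rejected.
        (valid if Path(entry).is_dir() else rejected).append(entry)
    return valid, rejected
```

With the URL from this post as input, it would land in the rejected list, while a real folder such as the local outputs directory would be accepted.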

PS: The card system is super useful too so any tips on doing it the right way would be appreciated.
Sorry to waste your time on this.

Cheers from NZ.

@rsxdalv
Owner

rsxdalv commented Jun 29, 2023

If you create an unrecoverable issue in the settings, you can just delete or backup config.json and it will get recreated.
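
As a minimal sketch of the backup route (an editorial illustration, not a script shipped with the project), the helper below moves config.json aside instead of deleting it. `backup_config` is a hypothetical name, and it assumes config.json sits in the directory the script is run from; adjust the path if yours is elsewhere.

```python
from pathlib import Path
from typing import Optional


def backup_config(config_path: str = "config.json") -> Optional[Path]:
    """Rename config.json to config.json.bak; return the backup path, or None if absent."""
    src = Path(config_path)
    if not src.exists():
        return None
    dst = src.with_name(src.name + ".bak")
    # Path.replace overwrites an older backup with the same name.
    src.replace(dst)
    return dst
```

After running it, starting the app again should recreate config.json with default settings, and the old values remain readable in the `.bak` file.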

@Magenta-6
Author

Thanks - Opened the config.json and removed the link.
Perfect again.
Still interested in your voice card idea, but unsure how to go about porting them.

@Magenta-6
Author

Just found your readme.

@rsxdalv
Owner

rsxdalv commented Jul 1, 2023 via email

@Magenta-6
Author

Magenta-6 commented Jul 2, 2023

@rsxdalv - Thanks for asking.
I think the voice selector is what I like, but I can see that a tree could be useful if the hierarchy can be figured out.

What would be great is if the Voice Tab in your webui could be linked to the Voice Cards as well as to the .npz files.

As an example of a similar situation, below is a screenshot of Textual Inversion (T.I.) cards in Automatic1111's Stable diffusion webui. [2023-07-04, image deleted and replaced with a screenshot of the folder contents showing pairs of embeddings.pt files and .png files]

Each of the images was generated using a particular Textual Inversion.
The images in .png format simply got dropped into the [embeddings] folder with the T.I. files.
If the file name of the T.I. is marsattacks3.pt, then an image called marsattacks3.png gets automatically pulled into the right slot.

(There is a way of embedding (steganizing) data into images, which might be able to be used but I don't know how to do that). - [2023-07-04, See Later post below]

As people start generating content with multiple voices and the number of voices starts to increase, a Voice Selector with fields for #hashtags, etc. will be an extremely useful way of setting up collections, families and characters.
A card system is also a great way of sharing voices with others.

[2023-07-04, image replaced]
Clip

@rsxdalv
Owner

rsxdalv commented Jul 3, 2023

Just to be safe let's censor the image

@rsxdalv
Owner

rsxdalv commented Jul 3, 2023

As for that, yes, I need to see if I can somehow get image generation, and then I could write a plugin that saves voices as images. Currently my approach is that I want to keep the "core" simpler and then enhance it with plugins, which could eventually become part of the "core".

@Magenta-6
Author

A plugin sounds good. A modular approach around a central core seems like the way to go, then others with special skillsets can create add-ons. I guess that's part of the charm of Github and open source.

PS. I've deleted the screenshot on the previous post.
Below is an example of a .png file with image-style data written on the sides as a QR type code.
At least that's what it looks like to me.

Inspecting the file info in photoshop revealed almost no metadata apart from image size and format.

SamDoesArt-5000

@rsxdalv
Owner

rsxdalv commented Jul 16, 2023

I added an initial basic version of this in #78: if an image has the same filename as a voice, you can see it in the UI.
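
The same-filename convention can be sketched roughly as follows. This is an editorial illustration only, not the code from #78: `pair_voices_with_images` is a hypothetical helper, and it assumes voices are .npz files with their images stored next to them as .png files.

```python
from pathlib import Path
from typing import Optional


def pair_voices_with_images(folder: str) -> dict[str, Optional[Path]]:
    """Map each .npz voice file name to a same-named .png, or None if there is none."""
    pairs: dict[str, Optional[Path]] = {}
    for voice in sorted(Path(folder).glob("*.npz")):
        image = voice.with_suffix(".png")
        pairs[voice.name] = image if image.is_file() else None
    return pairs
```

A voice saved as `my_voice.npz` would be matched with `my_voice.png` in the same folder, which mirrors the Textual Inversion example earlier in the thread.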

@Magenta-6
Author

Thanks - Love the way you coded it to automatically rename the image file. Works a treat.

@rsxdalv
Owner

rsxdalv commented Jul 26, 2023

#98
OK, now it will rename both automatically, and you can also select voices from the gallery.
