Alltalkbeta enhancements #255
Conversation
Hi @bmalaski First off, I'd like to say thanks again! I've been testing through this and adding a few small tweaks, and I still have a few to complete. Hopefully I will have those done soon. I think I should be allowed to edit the code on the PR (never done it before), and you should be able to see the differences and we can share notes. I think I've run about 4x full training sessions now and checked that nothing else has gone odd in any way. All generally looks great! For the most part I'm just capturing some logic issues, and I've nudged a few things around in the interface, e.g. Cosine Annealing with Warm Restarts needs a minimum of 4x epochs, and I know if I don't try to capture these errors now, I will get questions. Will post something back soon. Thanks
Yeah, if you edit the commit we can compare and essentially work on it together. Noticed I pushed a bad file; removed that.
@bmalaski I have to travel for a couple of days, so I won't be able to get back to you for a few days. I will finish my changes and upload them when I return. Thanks so much!
Hi @bmalaski Sorry for taking so long on this. I ended up being away travelling for a week (unexpectedly). So it looks like my code update has been pushed back into your GitHub repository (I wasn't exactly sure how it worked until now). I've made a few changes. 90% of what I've done is visual plus additional documentation/explanations. I believe I may have missed your other commit in my version of the code: bmalaski@e081442 So I think that is missing from the version I have uploaded to you and may need pushing back in. Trying to remember the other changes I made:
Lines 870-873: request a minimum number of steps for Cosine Annealing Warm Restarts. I can't remember why this is a minimum, but it was, so I set a user message for this.
1826: Set Cosine Annealing Warm Restarts as the default.
1878: may need the value setting back to your setting!
1938-1947: this is just so that it feeds back the messages, such as "Cosine Annealing Warm Restarts" needing 4x steps.
I'm pretty sure everything else was just visual/documentation. I'm sure I could still improve the documentation somewhat, but I hope I mostly captured the essence of the changes/additions you made. Please take a look and see what you think. Feel free to make any changes, and I'm looking forward to merging this PR in! :)
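As an illustration of the kind of check described above, here is a minimal sketch; the function and variable names are hypothetical, not the actual finetune code.

```python
# Hypothetical sketch of the minimum-epochs check described above.
# Names are illustrative, not the actual PR code.
def validate_scheduler_settings(scheduler_name, num_epochs):
    """Return a user-facing error message if the settings are invalid, else None."""
    if scheduler_name == "CosineAnnealingWarmRestarts" and num_epochs < 4:
        return "Cosine Annealing Warm Restarts requires a minimum of 4 epochs."
    return None
```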
I pushed a few more changes: I added a small warmup option for use with larger learning rates. I have found this helps training. Also, critically, there was a bug when resuming training where it flipped the training and eval CSVs. I have fixed that. Otherwise your changes look great. Feel free to look over the warmup and merge if you are ready.
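One common way to implement a short learning-rate warmup in PyTorch is a LambdaLR ramp; the sketch below is illustrative and not necessarily how the PR implements it (the model, step count, and warmup length are placeholders).

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

warmup_steps = 500  # illustrative value

def warmup_factor(step):
    # Linearly ramp the learning rate from near zero to its full value.
    return min(1.0, (step + 1) / warmup_steps)

warmup = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

for step in range(1000):
    optimizer.zero_grad()
    model(torch.randn(4, 10)).sum().backward()  # stand-in for the real loss
    optimizer.step()
    warmup.step()  # raises the LR towards its full value during warmup_steps
```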
Hi @bmalaski I forgot to properly look at the multi-audio-sample setup in depth, but I had been thinking about that in the past. My thought is that we put some extra logic in it. If the supplied name ends in .wav, we use that as a single wav file (and error out if it's missing). If .wav isn't supplied in the name, we assume it's a sub-directory, catalogue all the files in that sub-directory and use them as the provided sample audio, again erroring out if the sub-directory doesn't exist or no wavs are in there. Pretty sure we need to error out in both instances, otherwise I believe it soft-locks and you have to restart the engine. So my suggested change is this:
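A minimal sketch of the logic described above; the function name, directory layout, and error handling are illustrative, not the actual AllTalk code.

```python
import glob
import os

def resolve_reference_audio(name, voices_dir):
    """If `name` ends in .wav, use it as a single file; otherwise treat it as a
    sub-directory and collect every .wav inside it. Error out instead of
    soft-locking the engine."""
    path = os.path.join(voices_dir, name)
    if name.lower().endswith(".wav"):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Reference wav not found: {path}")
        return [path]
    if not os.path.isdir(path):
        raise NotADirectoryError(f"Reference audio folder not found: {path}")
    wavs = sorted(glob.glob(os.path.join(path, "*.wav")))
    if not wavs:
        raise FileNotFoundError(f"No .wav files found in: {path}")
    return wavs
```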
Does that sound good to you? Otherwise everything looks great and I'm ready to pull in the PR. Thanks
I found one bug when compacting the models: it's not moving all the files. I haven't had time to check; hopefully I'll get to it soon.
The only thing I know of that could be happening on that one is that it may need the refresh button hitting, or it's not pulling in a variable somewhere.
Hi @bmalaski I've merged this now. I'm going to give it a few more tests for bugs, and some may crop up as other people use it. Thanks so much for all your work/help with this. It really has been appreciated.
Hi @bmalaski Hope you are well. If possible, would you be able to look over a question someone has? You may have a better insight than me. The issue is here: #276 (comment) If you have time, great! :) If not, I understand. Thanks so much!
Hello,
Ported the changes over to the beta, as well as adding some more changes, like being able to select an optimizer.
I added some tweaks to how the training works. I know you are in the middle of 2.0, so I will most likely need to fix these when that is released. Anyway, for your consideration:
Git-Ignore: Implements the default Python .gitignore with some changes to ignore the conda env.
Learning Rate Scheduler: Currently it is broken; this allows the user to change the scheduler and adds detailed explanations. Sets the default scheduler to Cosine Annealing Warm Restarts, which I have been using with great results. I tried to add sensible defaults for all the other schedulers, however I have yet to test all of them; mostly I have just been using Cosine Annealing Warm Restarts.
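For reference, the PyTorch scheduler being discussed can be set up roughly like this; the model, learning rate, and restart settings here are illustrative, not the PR's exact defaults.

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the GPT trainer model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

# Restart the cosine cycle every T_0 epochs, doubling the cycle length each time.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=1, T_mult=2, eta_min=1e-7
)

for epoch in range(4):
    optimizer.zero_grad()
    model(torch.randn(4, 10)).sum().backward()  # stand-in for one epoch of training
    optimizer.step()
    scheduler.step()  # advance the cosine schedule once per epoch
```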
Optimizer: Allows you to change the optimizer to one of several that have been preconfigured. Leaving the default as AdamW for right now; I need to test all of them to compare which yields the best training result.
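A sketch of how a preconfigured optimizer selection might look; the optimizer list and hyper-parameters are illustrative, not the PR's exact choices.

```python
import torch

def build_optimizer(name, params, lr=5e-6):
    # Illustrative mapping; the actual PR may expose different optimizers/settings.
    optimizers = {
        "AdamW":   lambda: torch.optim.AdamW(params, lr=lr, weight_decay=1e-2),
        "Adam":    lambda: torch.optim.Adam(params, lr=lr),
        "SGD":     lambda: torch.optim.SGD(params, lr=lr, momentum=0.9),
        "RMSprop": lambda: torch.optim.RMSprop(params, lr=lr),
    }
    if name not in optimizers:
        raise ValueError(f"Unknown optimizer: {name}")
    return optimizers[name]()
```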
Continue Training: I lost power during training a model and it was annoying to have to manually set everything up again. There actually exists a bug in the original Coqui trainer.py that prevented resuming, and since they won't be making any more updates, I added the trainer file locally and fixed the bug. The short version is this: when continuing, it attempted to reload the config as Xtts, but AllTalk is using GPT. This caused a serialization failure. It should be working now.
Multiple Training Projects: I am training several voices, so I want to train one for some epochs, move to the next, and continue as needed. To do this, I have changed the "person name" to be a project name and made out_path dynamic against this name. All files are generated into this project directory. This allows the user to train multiple models and continue training whichever model they like at the time.
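A minimal sketch of the project-name based output path idea; the base directory and function name are illustrative.

```python
import os

def project_output_path(base_output_dir, project_name):
    # Each training project gets its own directory, so multiple models can be
    # trained and resumed independently.
    out_path = os.path.join(base_output_dir, project_name)
    os.makedirs(out_path, exist_ok=True)
    return out_path

# e.g. project_output_path("finetune", "my_fictional_narrator")
#      -> "finetune/my_fictional_narrator"
```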
Metrics Logging: Pretty graphs, using Matplotlib (because I have used it before) to save a training image. I extend the base console logger from Coqui and use it to track the metrics, mostly just arrays that go into memory. Maybe some systems with limited system memory will have an issue, so I can always limit the step data later if needed. While TensorBoard does exist, this seems like a more approachable option and it mirrors the logs.
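A rough sketch of collecting per-step metrics into in-memory lists and saving a plot with Matplotlib; the class and method names are illustrative, not Coqui's actual logger API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, just writes image files
import matplotlib.pyplot as plt

class MetricsTracker:
    """Collects loss values in memory and renders them to a PNG."""

    def __init__(self):
        self.steps, self.train_loss, self.eval_loss = [], [], []

    def log_step(self, step, train_loss, eval_loss=None):
        self.steps.append(step)
        self.train_loss.append(train_loss)
        if eval_loss is not None:
            self.eval_loss.append((step, eval_loss))

    def save_plot(self, path="training_metrics.png"):
        plt.figure(figsize=(8, 4))
        plt.plot(self.steps, self.train_loss, label="train loss")
        if self.eval_loss:
            xs, ys = zip(*self.eval_loss)
            plt.plot(xs, ys, label="eval loss")
        plt.xlabel("step")
        plt.ylabel("loss")
        plt.legend()
        plt.tight_layout()
        plt.savefig(path)
        plt.close()
```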
Estimated Completion Time: A weighted-average completion time, where the most recent epochs have a stronger influence on the estimate. Needs 2 epochs completed before it will give an ETA.
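The weighted-average ETA can be sketched like this; the linear weighting scheme is illustrative, and the PR may weight recent epochs differently.

```python
def estimate_remaining_seconds(epoch_durations, epochs_left):
    """Weighted average of past epoch times, with recent epochs counting more."""
    if len(epoch_durations) < 2:
        return None  # need at least 2 completed epochs before giving an ETA
    weights = range(1, len(epoch_durations) + 1)  # 1, 2, ..., n (newest heaviest)
    weighted_avg = sum(d * w for d, w in zip(epoch_durations, weights)) / sum(weights)
    return weighted_avg * epochs_left

# e.g. estimate_remaining_seconds([620.0, 600.0, 580.0], epochs_left=7)
```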
Limit Shared Memory: I would rather get an out-of-memory error than have the training use shared memory. To accomplish this, we limit the overall GPU memory to 95%. This should remove the behaviour of training spilling into shared memory, however some will still be used by torch.
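Capping the per-process GPU allocation can be done with PyTorch's memory-fraction API; a minimal sketch, using the 95% figure from the description above (device index is an assumption).

```python
import torch

if torch.cuda.is_available():
    # Cap this process at ~95% of the GPU's memory so allocations fail with an
    # out-of-memory error rather than quietly growing further.
    torch.cuda.set_per_process_memory_fraction(0.95, device=0)
```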
BPE Tokenizer: Allows the user to create a custom tokenizer based on their own data during training. This can lead to better training results in cases with unique words and vocabulary. I found this especially useful when processing voices that use fictional words.
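Training a custom BPE tokenizer from the dataset's transcripts could look roughly like this, using the Hugging Face tokenizers package; the vocab size and special tokens are illustrative, not necessarily what the PR uses.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def train_bpe_tokenizer(transcript_files, vocab_size=5000, save_path="bpe_tokenizer.json"):
    # Build a byte-pair-encoding tokenizer and train it on the transcript text files.
    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(
        vocab_size=vocab_size,
        special_tokens=["[UNK]", "[STOP]", "[SPACE]"],  # illustrative special tokens
    )
    tokenizer.train(files=transcript_files, trainer=trainer)
    tokenizer.save(save_path)
    return tokenizer
```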
Dataset Progress: Just simple progress tracking and an estimated time for dataset creation.
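Progress tracking with an ETA for dataset creation can be as small as wrapping the processing loop in tqdm; this is illustrative, not necessarily the PR's approach.

```python
from tqdm import tqdm

def build_dataset(audio_files, process_clip):
    # tqdm prints a progress bar with rate and estimated remaining time.
    for audio_file in tqdm(audio_files, desc="Building dataset"):
        process_clip(audio_file)
```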
Allow Reference Audio to be a Directory: According to the TTS documentation, "You can pass multiple audio files to the speaker_wav argument for better voice cloning." I have tested this out, and it does appear to work better when passing several wavs.
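With the Coqui TTS Python API, passing multiple reference wavs looks roughly like this; the model name, file paths, and text are illustrative.

```python
from TTS.api import TTS

# Illustrative model name and reference clip paths.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="Testing multiple reference clips for voice cloning.",
    speaker_wav=["voices/sample_01.wav", "voices/sample_02.wav", "voices/sample_03.wav"],
    language="en",
    file_path="output.wav",
)
```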
Take a look and let me know what you think.