Alltalkbeta enhancements #255
Conversation
Hi @bmalaski First off, I'd like to say thanks again! I've been testing through this and adding a few small tweaks, and I still have a few to complete. Hopefully I will have those done soon. I think I should be allowed to edit the code on the PR (never done it before), and you should be able to see the differences and we can share notes. I think I've run about 4x full training sessions now and checked that nothing else has gone odd in any way. All generally looks great! For the most part I'm just capturing some logic issues, and I've nudged a few things around in the interface, e.g. Cosine Annealing with Warm Restarts needs a minimum of 4x epochs, and I know if I don't try to capture these errors now, I will get questions. Will post something back soon. Thanks
Yeah, if you edit the commit we can compare and essentially work on it together. Noticed I pushed a bad file; removed that.
@bmalaski I have to travel for a couple of days, so I won't be able to get back to you for a few days. I will finish my changes and upload them when I return. Thanks so much!
Hi @bmalaski Sorry for taking so long on this. I ended up being away travelling for a week (unexpectedly). So it looks like my code update has been pushed back into your GitHub repository (I wasn't exactly sure how it worked until now). I've made a few changes. 90% of what I've done is visual plus additional documentation/explanations. I believe I may have missed your other commit in my version of the code: bmalaski@e081442 So I think that is missing from the version I have uploaded to you and may need pushing back in. Trying to remember the other changes I made:
Lines 870-873: request a minimum number of steps for Cosine Annealing Warm Restarts. I can't remember why this is a minimum, but it was, so I set a user message for this.
1826: Set Cosine Annealing Warm Restarts as the default.
1878: may need the value setting back to your setting!
1938-1947: this is just so that it feeds back the messages, such as "Cosine Annealing Warm Restarts" needing 4x steps.
I'm pretty sure everything else was just visual/documentation. I'm sure I could still improve the documentation somewhat, but I hope I mostly captured the essence of the changes/additions you made. Please take a look and see what you think. Feel free to make any changes, and I'm looking forward to merging this PR in! :)
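As an illustration of the kind of check described above, here is a minimal sketch; the function and variable names are hypothetical, not the actual finetune code.

```python
# Hypothetical sketch of the minimum-epochs check described above.
# Names are illustrative, not the actual PR code.
def validate_scheduler_settings(scheduler_name, num_epochs):
    """Return a user-facing error message if the settings are invalid, else None."""
    if scheduler_name == "CosineAnnealingWarmRestarts" and num_epochs < 4:
        return "Cosine Annealing Warm Restarts requires a minimum of 4 epochs."
    return None
```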
I pushed a few more changes: I added a small warmup option for use with larger learning rates. I have found this helps training. Also, critically, there was a bug when resuming training where it flipped the training and eval CSVs. I have fixed that. Otherwise your changes look great. Feel free to look over the warmup and merge if you are ready.
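One common way to implement a short learning-rate warmup in PyTorch is a LambdaLR ramp; the sketch below is illustrative and not necessarily how the PR implements it (the model, step count, and warmup length are placeholders).

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

warmup_steps = 500  # illustrative value

def warmup_factor(step):
    # Linearly ramp the learning rate from near zero to its full value.
    return min(1.0, (step + 1) / warmup_steps)

warmup = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_factor)

for step in range(1000):
    optimizer.zero_grad()
    model(torch.randn(4, 10)).sum().backward()  # stand-in for the real loss
    optimizer.step()
    warmup.step()  # raises the LR towards its full value during warmup_steps
```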
Hi @bmalaski I forgot to properly look at the multi-audio-sample setup in depth, but I had been thinking about that in the past. My thought is that we put some extra logic in it. If the supplied name ends in .wav, we use that as a single wav file (and error out if it's missing). If .wav isn't supplied in the name, we assume it's a sub-directory, catalogue all the files in that sub-directory and use them as the provided sample audio, again erroring out if the sub-directory doesn't exist or no wavs are in there. Pretty sure we need to error out in both instances, otherwise I believe it soft-locks and you have to restart the engine. So my suggested change is this:
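A minimal sketch of the logic described above; the function name, directory layout, and error handling are illustrative, not the actual AllTalk code.

```python
import glob
import os

def resolve_reference_audio(name, voices_dir):
    """If `name` ends in .wav, use it as a single file; otherwise treat it as a
    sub-directory and collect every .wav inside it. Error out instead of
    soft-locking the engine."""
    path = os.path.join(voices_dir, name)
    if name.lower().endswith(".wav"):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"Reference wav not found: {path}")
        return [path]
    if not os.path.isdir(path):
        raise NotADirectoryError(f"Reference audio folder not found: {path}")
    wavs = sorted(glob.glob(os.path.join(path, "*.wav")))
    if not wavs:
        raise FileNotFoundError(f"No .wav files found in: {path}")
    return wavs
```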
Does that sound good to you? Otherwise everything looks great and I'm ready to pull in the PR. Thanks
I found one bug when compacting the models: it's not moving all the files. I haven't had time to check; hopefully I'll get to it soon.
The only thing I know of that could be happening on that one is that it may need the refresh button hitting, or it's not pulling in a variable somewhere.
Hi @bmalaski I've merged this now. I'm going to give it a few more tests for bugs, and some may crop up as other people use it. Thanks so much for all your work/help with this. It really has been appreciated.
Hi @bmalaski Hope you are well. If possible, would you be able to look over a question someone has? You may have a better insight than me. The issue is here: #276 (comment) If you have time, great! :) If not, I understand. Thanks so much!
Hello,
Ported the changes over to the beta, as well as adding some more changes, like being able to select an optimizer.
I added some tweaks to how the training works. I know you are in the middle of 2.0, so I will most likely need to fix these when that is released. Anyway, for your consideration:
Git-Ignore: Implements the default Python .gitignore with some changes to ignore the conda env.
Learning Rate Scheduler: Currently it is broken; this allows the user to change the scheduler and adds detailed explanations. Sets the default scheduler to Cosine Annealing Warm Restarts, which I have been using with great results. I tried to add sensible defaults for all the other schedulers, however I have yet to test all of them; mostly I have just been using Cosine Annealing Warm Restarts.
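For reference, the PyTorch scheduler being discussed can be set up roughly like this; the model, learning rate, and restart settings here are illustrative, not the PR's exact defaults.

```python
import torch

model = torch.nn.Linear(10, 10)  # stand-in for the GPT trainer model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-6)

# Restart the cosine cycle every T_0 epochs, doubling the cycle length each time.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
    optimizer, T_0=1, T_mult=2, eta_min=1e-7
)

for epoch in range(4):
    optimizer.zero_grad()
    model(torch.randn(4, 10)).sum().backward()  # stand-in for one epoch of training
    optimizer.step()
    scheduler.step()  # advance the cosine schedule once per epoch
```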
Optimizer: Allows you to change the optimizer to one of several that have been preconfigured. Leaving the default as AdamW for right now; I need to test all of them to compare which yields the best training result.
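A sketch of how a preconfigured optimizer selection might look; the optimizer list and hyper-parameters are illustrative, not the PR's exact choices.

```python
import torch

def build_optimizer(name, params, lr=5e-6):
    # Illustrative mapping; the actual PR may expose different optimizers/settings.
    optimizers = {
        "AdamW":   lambda: torch.optim.AdamW(params, lr=lr, weight_decay=1e-2),
        "Adam":    lambda: torch.optim.Adam(params, lr=lr),
        "SGD":     lambda: torch.optim.SGD(params, lr=lr, momentum=0.9),
        "RMSprop": lambda: torch.optim.RMSprop(params, lr=lr),
    }
    if name not in optimizers:
        raise ValueError(f"Unknown optimizer: {name}")
    return optimizers[name]()
```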
Continue Training: I lost power during training a model and it was annoying to have to manually set everything up again. There actually exists a bug in the original Coqui trainer.py that prevented resuming, and since they won't be making any more updates, I added the trainer file locally and fixed the bug. The short version is this: when continuing, it attempted to reload the config as Xtts, but AllTalk is using GPT. This caused a serialization failure. It should be working now.
Multiple Training Projects: I am training several voices, so I want to train one for some epochs, move to the next, and continue as needed. To do this, I have changed the "person name" to be a project name and made out_path dynamic against this name. All files are generated into this project directory. This allows the user to train multiple models and continue training whichever model they like at the time.
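A minimal sketch of the project-name based output path idea; the base directory and function name are illustrative.

```python
import os

def project_output_path(base_output_dir, project_name):
    # Each training project gets its own directory, so multiple models can be
    # trained and resumed independently.
    out_path = os.path.join(base_output_dir, project_name)
    os.makedirs(out_path, exist_ok=True)
    return out_path

# e.g. project_output_path("finetune", "my_fictional_narrator")
#      -> "finetune/my_fictional_narrator"
```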
Metrics Logging: Pretty graphs, using Matplotlib (because I have used it before) to save a training image. I extend the base console logger from Coqui and use it to track the metrics, mostly just arrays that go into memory. Maybe some systems with limited system memory will have an issue, so I can always limit the step data later if needed. While TensorBoard does exist, this seems like a more approachable option and it mirrors the logs.
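A rough sketch of collecting per-step metrics into in-memory lists and saving a plot with Matplotlib; the class and method names are illustrative, not Coqui's actual logger API.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, just writes image files
import matplotlib.pyplot as plt

class MetricsTracker:
    """Collects loss values in memory and renders them to a PNG."""

    def __init__(self):
        self.steps, self.train_loss, self.eval_loss = [], [], []

    def log_step(self, step, train_loss, eval_loss=None):
        self.steps.append(step)
        self.train_loss.append(train_loss)
        if eval_loss is not None:
            self.eval_loss.append((step, eval_loss))

    def save_plot(self, path="training_metrics.png"):
        plt.figure(figsize=(8, 4))
        plt.plot(self.steps, self.train_loss, label="train loss")
        if self.eval_loss:
            xs, ys = zip(*self.eval_loss)
            plt.plot(xs, ys, label="eval loss")
        plt.xlabel("step")
        plt.ylabel("loss")
        plt.legend()
        plt.tight_layout()
        plt.savefig(path)
        plt.close()
```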
Estimated Completion Time: A weighted-average completion time, where the most recent epochs have a stronger influence on the estimate. Needs 2 epochs completed before it will give an ETA.
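The weighted-average ETA can be sketched like this; the linear weighting scheme is illustrative, and the PR may weight recent epochs differently.

```python
def estimate_remaining_seconds(epoch_durations, epochs_left):
    """Weighted average of past epoch times, with recent epochs counting more."""
    if len(epoch_durations) < 2:
        return None  # need at least 2 completed epochs before giving an ETA
    weights = range(1, len(epoch_durations) + 1)  # 1, 2, ..., n (newest heaviest)
    weighted_avg = sum(d * w for d, w in zip(epoch_durations, weights)) / sum(weights)
    return weighted_avg * epochs_left

# e.g. estimate_remaining_seconds([620.0, 600.0, 580.0], epochs_left=7)
```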
Limit Shared Memory: I would rather get an out-of-memory error than have the training use shared memory. To accomplish this, we limit the overall GPU memory to 95%. This should remove the behaviour of training spilling into shared memory, however some will still be used by torch.
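Capping the per-process GPU allocation can be done with PyTorch's memory-fraction API; a minimal sketch, using the 95% figure from the description above (device index is an assumption).

```python
import torch

if torch.cuda.is_available():
    # Cap this process at ~95% of the GPU's memory so allocations fail with an
    # out-of-memory error rather than quietly growing further.
    torch.cuda.set_per_process_memory_fraction(0.95, device=0)
```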
BPE Tokenizer: Allows the user to create a custom tokenizer based on their own data during training. This can lead to better training results in cases with unique words and vocabulary. I found this especially useful when processing voices that use fictional words.
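Training a custom BPE tokenizer from the dataset's transcripts could look roughly like this, using the Hugging Face tokenizers package; the vocab size and special tokens are illustrative, not necessarily what the PR uses.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

def train_bpe_tokenizer(transcript_files, vocab_size=5000, save_path="bpe_tokenizer.json"):
    # Build a byte-pair-encoding tokenizer and train it on the transcript text files.
    tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
    tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
    trainer = trainers.BpeTrainer(
        vocab_size=vocab_size,
        special_tokens=["[UNK]", "[STOP]", "[SPACE]"],  # illustrative special tokens
    )
    tokenizer.train(files=transcript_files, trainer=trainer)
    tokenizer.save(save_path)
    return tokenizer
```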
Dataset Progress: Just simple progress tracking and an estimated time for dataset creation.
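Progress tracking with an ETA for dataset creation can be as small as wrapping the processing loop in tqdm; this is illustrative, not necessarily the PR's approach.

```python
from tqdm import tqdm

def build_dataset(audio_files, process_clip):
    # tqdm prints a progress bar with rate and estimated remaining time.
    for audio_file in tqdm(audio_files, desc="Building dataset"):
        process_clip(audio_file)
```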
Allow Reference Audio to be a Directory: According to the TTS documentation, "You can pass multiple audio files to the speaker_wav argument for better voice cloning." I have tested this out, and it does appear to work better when passing several wavs.
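With the Coqui TTS Python API, passing multiple reference wavs looks roughly like this; the model name, file paths, and text are illustrative.

```python
from TTS.api import TTS

# Illustrative model name and reference clip paths.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

tts.tts_to_file(
    text="Testing multiple reference clips for voice cloning.",
    speaker_wav=["voices/sample_01.wav", "voices/sample_02.wav", "voices/sample_03.wav"],
    language="en",
    file_path="output.wav",
)
```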
Take a look and let me know what you think.