discrete token for audio resynthesis #423

South-Twilight · 2024-02-02T13:18:17Z

This PR adds audio resynthesis from discrete tokens:

We extend hubert_voc1 to token_voc1 so that it can handle tokens from more models;
We add f0 for training and inference, after finding poor pronunciation in singing;
We add multi-stream methods, including residual clustering and weighted sum;
Embedding features from the models can also be used as input.

The following models have been validated in the opencpop recipe: HuBERT, XLS-R, WavLM, MERT, Encodec.
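To illustrate the weighted-sum idea, here is a minimal pure-Python sketch. It is not the code from this PR: the function names, the per-stream embedding tables, and the softmax-normalized `layer_logits` are all assumptions made for the example. It only shows how several token streams of equal length could be fused into one feature sequence.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum_streams(token_streams, embedding_tables, layer_logits):
    """Fuse several token streams into one feature sequence (hypothetical helper).

    token_streams: one list of token ids per stream, all of the same length.
    embedding_tables: one lookup table per stream; table[token_id] is a vector.
    layer_logits: one scalar per stream; softmax turns them into mixing weights.
    """
    weights = softmax(layer_logits)
    num_frames = len(token_streams[0])
    dim = len(embedding_tables[0][0])
    fused = []
    for t in range(num_frames):
        frame = [0.0] * dim
        for stream, table, w in zip(token_streams, embedding_tables, weights):
            vec = table[stream[t]]
            for d in range(dim):
                frame[d] += w * vec[d]
        fused.append(frame)
    return fused


# Toy example: two streams, a vocabulary of 2 tokens each, 2-dim embeddings.
streams = [[0, 1, 1], [1, 1, 0]]
tables = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 2.0], [4.0, 0.0]],
]
fused = weighted_sum_streams(streams, tables, layer_logits=[0.0, 0.0])
```

With equal logits both streams get weight 0.5, so each fused frame is the average of the two looked-up embeddings. In a trained model the logits would presumably be learnable parameters.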


South-Twilight added 27 commits August 28, 2023 10:56

update hubert and mert in opencpop

85e1152

update opencpop hubert(in progress)

be6e500

add f0 in hubert(WIP)

121a151

update gitigonre

95160c2

discrete unit: add process of mfcc

d68da20

merge: update merge

2b4db7e

to(1): update base info

c39020d

sync hubert/mert/mfcc

5e0e580

sync with remote(2023.9.6)

6c27921

feat: add f0

8a8dc8f

fix merge conflict: merge add_f0

f601448

update run.sh

bd56257

feat: use pretrain feature embedding as input(for top-line)

a10a575

feat: merge f0 and use pretrain feature

94add27

feat: use different layer of pretrain model

1c0f192

feat(egs/opencpop/hubert_voc1, hifigan.py):

42f8aac

1) add f0 2) use embedding feat as input (test topline) 3) add weight sum token

fix(hifigan.py : DiscreteSymbolF0Generator)

8a9b948

1) separate single layer config: hifigan_hubert_16k_nodp_f0.v1.yaml 2) add annotation to DiscreteSymbolF0Generator.infer

feat(hubert_voc): add evaluate MCD

56ecd19

1) add stage 4 of run.sh -- "Scoring"

fix(hubert_voc/conf):

c4c3478

1) update training steps from 25w to 40w

Merge remote-tracking branch 'origin/master'

9850df5

fix(hubert_voc1/run.sh):

f805522

1) add f0 rmse,semitone acc,uvu acc evaluation indicators
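As a rough sketch of what these three indicators could look like, the snippet below computes them for two frame-aligned f0 tracks. The definitions are assumptions, not taken from the recipe: a frame with f0 equal to 0 is treated as unvoiced, "uvu acc" is read as voiced/unvoiced decision accuracy, RMSE and semitone accuracy are computed only over frames voiced in both tracks, and a semitone hit means the two pitches differ by at most `semitone_tol` semitones. The function names are hypothetical.

```python
import math


def hz_to_semitone(f0_hz):
    """MIDI-style semitone number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def f0_metrics(ref_f0, gen_f0, semitone_tol=0.5):
    """Return (f0_rmse_hz, semitone_acc, vuv_acc) for two aligned f0 tracks.

    Assumption: a frame is unvoiced when its f0 value is 0.
    """
    assert len(ref_f0) == len(gen_f0) and len(ref_f0) > 0
    vuv_match = 0
    sq_err = 0.0
    semitone_hits = 0
    both_voiced = 0
    for r, g in zip(ref_f0, gen_f0):
        r_voiced, g_voiced = r > 0, g > 0
        if r_voiced == g_voiced:
            vuv_match += 1
        if r_voiced and g_voiced:
            both_voiced += 1
            sq_err += (r - g) ** 2
            if abs(hz_to_semitone(r) - hz_to_semitone(g)) <= semitone_tol:
                semitone_hits += 1
    vuv_acc = vuv_match / len(ref_f0)
    if both_voiced == 0:
        # No frame is voiced in both tracks, so pitch errors are undefined.
        return float("nan"), float("nan"), vuv_acc
    rmse = math.sqrt(sq_err / both_voiced)
    return rmse, semitone_hits / both_voiced, vuv_acc


ref = [0.0, 220.0, 220.0, 440.0]
gen = [0.0, 220.0, 233.0, 0.0]
rmse, semitone_acc, vuv_acc = f0_metrics(ref, gen)
```

In this toy case three of four voicing decisions agree, and one of the two jointly voiced frames is within half a semitone of the reference.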

Merge remote-tracking branch 'origin/master'

b27b05b

fix(hubert/voc1): update the way to calculate f0

e3359da

1) add continuous f0 2) add yaml for 48khz wav
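One common way to obtain a continuous f0 track is to interpolate across unvoiced frames. The sketch below assumes that approach; the commit does not say how the recipe computes it, and `to_continuous_f0` is a hypothetical name. Unvoiced frames are assumed to be marked with 0.

```python
def to_continuous_f0(f0):
    """Fill unvoiced (zero) frames by linear interpolation between voiced neighbours.

    Leading and trailing unvoiced frames copy the nearest voiced value.
    An all-unvoiced track is returned unchanged.
    """
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        return list(f0)
    out = list(f0)
    first, last = voiced[0], voiced[-1]
    # Pad the edges with the nearest voiced value.
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(f0)):
        out[i] = f0[last]
    # Interpolate linearly inside every gap between consecutive voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            alpha = (i - left) / gap
            out[i] = (1.0 - alpha) * f0[left] + alpha * f0[right]
    return out


cont = to_continuous_f0([0.0, 100.0, 0.0, 0.0, 160.0, 0.0])
```

For this input the result is approximately 100, 100, 120, 140, 160, 160 Hz, up to floating-point rounding, so the generator never sees a hard drop to zero at voicing boundaries.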

Merge branch 'master' of https://github.com/South-Twilight/ParallelWaveGAN

f7e9c96

feat(egs/opencpop/hubert_voc1): add multi-stream layer and fix some bugs: 1) add multi-stream RVQ cluster 2) add 48kHz encodec token 3) update some annotations 4) remove git some useless tracks

7427151
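For readers unfamiliar with residual clustering, the following is a minimal sketch of residual vector quantization encoding, assuming each stream has its own codebook (for example, k-means centres fitted on the residual left by the previous stream). The codebooks, names, and numbers are invented for illustration and do not come from this commit.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, centre in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(centre, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def residual_encode(feature, codebooks):
    """Encode one feature vector into one token per stream (hypothetical helper).

    Each stage quantizes what the previous stages left unexplained.
    Returns the token list and the final residual.
    """
    residual = list(feature)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


# Toy codebooks: a coarse first stage and a finer second stage.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
tokens, residual = residual_encode([5.0, 4.2], codebooks)
```

Here the first stream picks the coarse centre nearest the feature and the second stream refines the remainder, which is why later streams tend to carry finer detail.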

refactor(egs/*/hubert_voc1->egs/*/token_voc1): refactor hubert_voc to token_voc1 for PR 1) refactor conf 2) add annotations

f7249ed

Merge branch 'kan-bayashi:master' into master

6107aaf

kan-bayashi self-requested a review February 5, 2024 08:42
