discrete token for audio resynthesis #423

Open
wants to merge 27 commits into
base: master


@South-Twilight South-Twilight commented Feb 2, 2024

This PR adds audio resynthesis from discrete tokens:

  1. We extend hubert_voc1 to token_voc1 so that it can handle tokens from more models;
  2. We add f0 for training and inference, after observing poor pronunciation in singing;
  3. We add multi-stream methods, including residual clustering and weighted sum;
  4. Using the models' embedding features as input is also supported.
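
For readers unfamiliar with item 3, here is a minimal sketch of the general idea, not the code in this PR. It assumes that "residual clustering" means each stream quantizes whatever the previous streams left unexplained, and that "weighted sum" means the per-stream embeddings are combined with softmax-normalized weights. The function names (`nearest`, `residual_quantize`, `weighted_sum`) and the toy codebooks are hypothetical and chosen only for illustration.

```python
import math


def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def residual_quantize(vec, codebooks):
    """Quantize vec with a stack of codebooks.

    Stream k encodes the residual left over after streams 0..k-1.
    Returns one token per stream and the final residual.
    """
    tokens, residual = [], list(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, residual


def weighted_sum(streams, logits):
    """Combine per-stream vectors using softmax-normalized weights."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(streams[0])
    return [sum(w * s[d] for w, s in zip(weights, streams)) for d in range(dim)]


# Hypothetical toy codebooks: a coarse stream and a refinement stream.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]

tokens, residual = residual_quantize([1.2, 0.8], codebooks)

# Look up each stream's vector and fuse them with equal (untrained) weights.
stream_vectors = [codebooks[k][t] for k, t in enumerate(tokens)]
fused = weighted_sum(stream_vectors, logits=[0.0, 0.0])
```

In a real system the codebooks would come from clustering model features (for example k-means on each residual), and the fusion weights would be learned rather than fixed; both are assumptions here.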

The following models have been validated in the opencpop recipe: HuBERT, XLS-R, WavLM, MERT, Encodec.

1) add f0
2) use embedding feat as input (test topline)
3) add weight sum token
1) separate single layer config: hifigan_hubert_16k_nodp_f0.v1.yaml
2) add annotation to DiscreteSymbolF0Generator.infer
1) add stage 4 of run.sh -- "Scoring"
1) update training steps from 250k to 400k
1) add f0 rmse, semitone acc, uvu acc evaluation metrics
1) add continuous f0
2) add yaml for 48khz wav
…ugs:

1) add multi-stream RVQ cluster
2) add 48kHz encodec token
3) update some annotations
4) remove some useless tracked files from git
… token_voc1 for PR

1) refactor conf
2) add annotations
@kan-bayashi kan-bayashi self-requested a review February 5, 2024 08:42