
HuggingFace Version #6

Open
leng-yue opened this issue Mar 27, 2023 · 14 comments

Comments

@leng-yue

Thanks to the author for this fantastic model, which nearly eliminated the timbre-leakage problem.
For anyone who doesn't want to use fairseq, I made a HuggingFace version of ContentVec best legacy: Link.
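For anyone who wants a concrete starting point, here is a minimal sketch (my own, based on this thread, not the repo's official API) of pulling features through the transformers HuBERT interface without applying final_proj. The demo instantiates a tiny randomly initialized HuBERT so it runs offline; for the real checkpoint you would load `HubertModel.from_pretrained("lengyue233/content-vec-best")` instead (model id taken from later in this thread).

```python
# Sketch: extract ContentVec-style features via the transformers HuBERT
# interface, without applying final_proj. A tiny randomly initialized
# HuBERT stands in for the real checkpoint so the demo runs offline.
import torch
from transformers import HubertConfig, HubertModel

def extract_content_features(model: HubertModel, wav: torch.Tensor) -> torch.Tensor:
    """Return frame-level features; final_proj is intentionally not applied."""
    model.eval()
    with torch.no_grad():
        out = model(wav)
    return out.last_hidden_state  # (batch, frames, hidden_size)

# For the real model, use:
#   model = HubertModel.from_pretrained("lengyue233/content-vec-best")
cfg = HubertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
                   intermediate_size=64,
                   conv_dim=[32, 32], conv_stride=[5, 2], conv_kernel=[10, 3])
model = HubertModel(cfg)

wav = torch.randn(1, 16000)  # one second of 16 kHz mono audio (dummy input)
feats = extract_content_features(model, wav)
print(feats.shape)  # (1, n_frames, hidden_size)
```

Real audio should be 16 kHz mono, shaped (batch, samples), the same input convention as HuBERT.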

@auspicious3000
Owner

@leng-yue Thank you for your contribution. Based on the model definition in our paper, it appears that removing the final projection is necessary to achieve the desired outcome.

@leng-yue
Author

Appreciate your quick update ⚡️
Although I recommended removing the final_proj call in the sample code, I decided to keep the layer in the model for backward compatibility: several existing projects, such as so-vits-svc, ddsp-svc, and fish-diffusion, have been using final_proj for the past one or two months. 😂

@auspicious3000
Owner

It is surprising that adding the final projection did not cause any problems in those projects.
First, the final projection is trained on the output of the final layer, not the 9th layer, so applying it to the 9th layer may cause a mismatch.
Second, the final projection is injected with speaker information; since the purpose of ContentVec is to remove speaker information, adding the final projection may defeat that purpose.

@leng-yue
Author

In my view, this is probably because they apply the final projection to the 9th layer, which may not lie in the same latent space. In that case, the final projection may not be able to inject speaker information...

@auspicious3000
Owner

I believe removing the final projection may improve the performance of those projects.

Anyway, for those who would like to use this HuggingFace interface, please remove the final projection to get the correct output.

@leng-yue
Author

Currently, final_proj is simply left in the model and is not called by default (in forward).
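That pattern can be sketched as follows. This is an illustrative toy, not the repo's actual class; the class name is invented, the Linear backbone stands in for the HuBERT encoder, and the 768-to-256 projection size is an assumption borrowed from HuBERT-base conventions.

```python
# Hedged sketch of the pattern described above: keep final_proj as a module
# attribute (so old checkpoints still load), but never call it in forward().
import torch
from torch import nn

class ContentEncoder(nn.Module):  # illustrative name, not the repo's class
    def __init__(self, hidden_size: int = 768, proj_size: int = 256):
        super().__init__()
        self.backbone = nn.Linear(hidden_size, hidden_size)  # stand-in for HuBERT
        self.final_proj = nn.Linear(hidden_size, proj_size)  # kept, unused by default

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # final_proj is intentionally not applied; callers who need the old
        # behavior can still invoke enc.final_proj(features) themselves.
        return self.backbone(x)

enc = ContentEncoder()
feats = enc(torch.randn(1, 10, 768))
print(feats.shape)  # torch.Size([1, 10, 768])
```

Because final_proj remains a registered submodule, its weights survive `state_dict()` round-trips even though no forward path uses them.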

@w-okada

w-okada commented Apr 1, 2023

Very great discussion!

I converted the HuggingFace model to ONNX. The size is very small (280 MB), and it works fine in my app, a realtime voice changer based on so-vits-svc.

The converter repo is here.

@freds0

freds0 commented Aug 31, 2023

@leng-yue How did you convert to HuggingFace? I trained my own version of ContentVec and am using the "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py" script to convert it; however, I am getting this error:

2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.10.attention.out_proj.weight was initialized from encoder.layers.10.self_attn.out_proj.weight.
2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.10.attention.out_proj.bias was initialized from encoder.layers.10.self_attn.out_proj.bias.
[... identical INFO lines for the remaining layer 10 and layer 11 parameters ...]
2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.11.final_layer_norm.weight was initialized from encoder.layers.11.final_layer_norm.weight.
2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.11.final_layer_norm.bias was initialized from encoder.layers.11.final_layer_norm.bias.
Traceback (most recent call last):
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 248, in <module>
    args.checkpoint_path, args.pytorch_dump_folder_path, args.config_path, args.dict_path, not args.not_finetuned
  File "/root/anaconda/envs/contentvec_hf/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 232, in convert_hubert_checkpoint
    recursively_load_weights(model, hf_wav2vec, is_finetuned)
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 122, in recursively_load_weights
    set_recursively(hf_model, mapped_key, value, name, weight_type)
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 60, in set_recursively
    hf_pointer = getattr(hf_pointer, attribute)
  File "/root/anaconda/envs/contentvec_hf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1270, in __getattr__
    type(self).__name__, name))
AttributeError: 'ModuleList' object has no attribute '12'
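One possible cause, offered as an assumption rather than a verified diagnosis: the traceback shows a 12-element ModuleList being asked for index 12, which suggests the checkpoint contains more encoder layers (indices 0 through 12) than the HubertConfig the conversion script built (indices 0 through 11). Counting the layers actually present in the checkpoint and passing a matching config may help; the helper below and the key pattern it matches are illustrative.

```python
# Hedged sketch: infer the encoder layer count from fairseq-style parameter
# names, so a HubertConfig with a matching num_hidden_layers can be built.
import re

def count_encoder_layers(state_dict_keys):
    """Return 1 + the highest encoder.layers.<i>. index found, or 0 if none."""
    idx = [int(m.group(1)) for k in state_dict_keys
           if (m := re.match(r"encoder\.layers\.(\d+)\.", k))]
    return max(idx) + 1 if idx else 0

# Demo with synthetic keys shaped like the log above.
keys = [f"encoder.layers.{i}.self_attn.k_proj.weight" for i in range(13)]
n = count_encoder_layers(keys)
print(n)  # 13
```

With the real checkpoint you would run this over `torch.load(ckpt)["model"].keys()` (exact checkpoint layout is an assumption) and pass a `HubertConfig(num_hidden_layers=n)` to the conversion script via its `--config_path` option.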

@leng-yue
Author

leng-yue commented Sep 1, 2023

You can use this file: https://huggingface.co/lengyue233/content-vec-best/blob/main/convert.py
@freds0

freds0 commented Sep 11, 2023

@leng-yue Great! Thanks!

@zdj97

zdj97 commented Sep 13, 2023

> You can use this file: https://huggingface.co/lengyue233/content-vec-best/blob/main/convert.py

Hi lengyue, how can I convert this .pth ContentVec model to ONNX?

@leng-yue
Author

Check out HuggingFace Optimum.

@li-henan

li-henan commented Apr 1, 2024

Dear friend, could I ask for the detailed steps to train a better ContentVec model for the SVC task, starting from the open-source model? Are there any other methods besides switching to a bigger dataset? Thanks a lot @leng-yue

@Bella-Tim

> Very great discussion!
>
> I converted the HuggingFace model to ONNX. The size is very small (280 MB), and it works fine in my app, a realtime voice changer based on so-vits-svc.
>
> The converter repo is here.

Hello, can you share your way of converting to ONNX with me? I also need it.
