
HuggingFace Version #6

Open
leng-yue opened this issue Mar 27, 2023 · 14 comments

Comments

@leng-yue

Thanks to the author for this fantastic model, which nearly eliminated the timbre-leakage problem.
For anyone who doesn't want to use fairseq, I made a HuggingFace version of ContentVec best legacy: Link.
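For anyone who wants a concrete starting point, here is a minimal sketch (my own, based on this thread, not the repo's official API) of pulling features through the transformers HuBERT interface without applying final_proj. The demo instantiates a tiny randomly initialized HuBERT so it runs offline; for the real checkpoint you would load `HubertModel.from_pretrained("lengyue233/content-vec-best")` instead (model id taken from later in this thread).

```python
# Sketch: extract ContentVec-style features via the transformers HuBERT
# interface, without applying final_proj. A tiny randomly initialized
# HuBERT stands in for the real checkpoint so the demo runs offline.
import torch
from transformers import HubertConfig, HubertModel

def extract_content_features(model: HubertModel, wav: torch.Tensor) -> torch.Tensor:
    """Return frame-level features; final_proj is intentionally not applied."""
    model.eval()
    with torch.no_grad():
        out = model(wav)
    return out.last_hidden_state  # (batch, frames, hidden_size)

# For the real model, use:
#   model = HubertModel.from_pretrained("lengyue233/content-vec-best")
cfg = HubertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
                   intermediate_size=64,
                   conv_dim=[32, 32], conv_stride=[5, 2], conv_kernel=[10, 3])
model = HubertModel(cfg)

wav = torch.randn(1, 16000)  # one second of 16 kHz mono audio (dummy input)
feats = extract_content_features(model, wav)
print(feats.shape)  # (1, n_frames, hidden_size)
```

Real audio should be 16 kHz mono, shaped (batch, samples), the same input convention as HuBERT.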

@auspicious3000
Owner

@leng-yue Thank you for your contribution. Based on the model definition in our paper, it appears that removing the final projection is necessary to achieve the desired outcome.

@leng-yue
Author

Appreciate your quick update ⚡️
Although I recommended removing the final_proj call in the sample code, I decided to keep the layer in the model for backward compatibility: several existing projects, such as so-vits-svc, ddsp-svc, and fish-diffusion, have been using final_proj for the past one or two months. 😂

@auspicious3000
Owner

It is surprising that adding the final projection did not cause any problems in those projects.
First, the final projection is trained on the output of the final layer, not the 9th layer, so applying it to the 9th layer may cause a mismatch.
Second, the final projection is injected with speaker information; since the purpose of ContentVec is to remove speaker information, adding the final projection may defeat that purpose.

@leng-yue
Author

In my view, this is probably because they apply the final projection to the 9th layer, which may not lie in the same latent space. In that case, the final projection may not be able to inject speaker information...

@auspicious3000
Owner

I believe removing the final projection may improve the performance of those projects.

Anyway, for those who would like to use this HuggingFace interface, please remove the final projection to get the correct output.

@leng-yue
Author

Currently, final_proj is simply left in the model and is not called by default (in forward).
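That pattern can be sketched as follows. This is an illustrative toy, not the repo's actual class; the class name is invented, the Linear backbone stands in for the HuBERT encoder, and the 768-to-256 projection size is an assumption borrowed from HuBERT-base conventions.

```python
# Hedged sketch of the pattern described above: keep final_proj as a module
# attribute (so old checkpoints still load), but never call it in forward().
import torch
from torch import nn

class ContentEncoder(nn.Module):  # illustrative name, not the repo's class
    def __init__(self, hidden_size: int = 768, proj_size: int = 256):
        super().__init__()
        self.backbone = nn.Linear(hidden_size, hidden_size)  # stand-in for HuBERT
        self.final_proj = nn.Linear(hidden_size, proj_size)  # kept, unused by default

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # final_proj is intentionally not applied; callers who need the old
        # behavior can still invoke enc.final_proj(features) themselves.
        return self.backbone(x)

enc = ContentEncoder()
feats = enc(torch.randn(1, 10, 768))
print(feats.shape)  # torch.Size([1, 10, 768])
```

Because final_proj remains a registered submodule, its weights survive `state_dict()` round-trips even though no forward path uses them.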

@w-okada

w-okada commented Apr 1, 2023

Very great discussion!

I converted the HuggingFace model to ONNX. The size is very small (280 MB), and it works fine in my app, a realtime voice changer based on so-vits-svc.

The converter repo is here.

@freds0

freds0 commented Aug 31, 2023

@leng-yue How did you convert to HuggingFace? I trained my own version of ContentVec and am using the "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py" script to convert it; however, I am getting this error:

2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.10.attention.out_proj.weight was initialized from encoder.layers.10.self_attn.out_proj.weight.
2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.10.attention.out_proj.bias was initialized from encoder.layers.10.self_attn.out_proj.bias.
[... identical INFO lines for the remaining layer 10 and layer 11 parameters ...]
2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.11.final_layer_norm.weight was initialized from encoder.layers.11.final_layer_norm.weight.
2023-08-31 09:05:07 | INFO | __main__ | hubert.encoder.layers.11.final_layer_norm.bias was initialized from encoder.layers.11.final_layer_norm.bias.
Traceback (most recent call last):
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 248, in <module>
    args.checkpoint_path, args.pytorch_dump_folder_path, args.config_path, args.dict_path, not args.not_finetuned
  File "/root/anaconda/envs/contentvec_hf/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 232, in convert_hubert_checkpoint
    recursively_load_weights(model, hf_wav2vec, is_finetuned)
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 122, in recursively_load_weights
    set_recursively(hf_model, mapped_key, value, name, weight_type)
  File "transformers/src/transformers/models/hubert/convert_hubert_original_pytorch_checkpoint_to_pytorch.py", line 60, in set_recursively
    hf_pointer = getattr(hf_pointer, attribute)
  File "/root/anaconda/envs/contentvec_hf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1270, in __getattr__
    type(self).__name__, name))
AttributeError: 'ModuleList' object has no attribute '12'
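One possible cause, offered as an assumption rather than a verified diagnosis: the traceback shows a 12-element ModuleList being asked for index 12, which suggests the checkpoint contains more encoder layers (indices 0 through 12) than the HubertConfig the conversion script built (indices 0 through 11). Counting the layers actually present in the checkpoint and passing a matching config may help; the helper below and the key pattern it matches are illustrative.

```python
# Hedged sketch: infer the encoder layer count from fairseq-style parameter
# names, so a HubertConfig with a matching num_hidden_layers can be built.
import re

def count_encoder_layers(state_dict_keys):
    """Return 1 + the highest encoder.layers.<i>. index found, or 0 if none."""
    idx = [int(m.group(1)) for k in state_dict_keys
           if (m := re.match(r"encoder\.layers\.(\d+)\.", k))]
    return max(idx) + 1 if idx else 0

# Demo with synthetic keys shaped like the log above.
keys = [f"encoder.layers.{i}.self_attn.k_proj.weight" for i in range(13)]
n = count_encoder_layers(keys)
print(n)  # 13
```

With the real checkpoint you would run this over `torch.load(ckpt)["model"].keys()` (exact checkpoint layout is an assumption) and pass a `HubertConfig(num_hidden_layers=n)` to the conversion script via its `--config_path` option.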

@leng-yue
Author

leng-yue commented Sep 1, 2023

You can use this file: https://huggingface.co/lengyue233/content-vec-best/blob/main/convert.py
@freds0

freds0 commented Sep 11, 2023

@leng-yue Great! Thanks!

@zdj97

zdj97 commented Sep 13, 2023

> You can use this file: https://huggingface.co/lengyue233/content-vec-best/blob/main/convert.py

Hi lengyue, how can I convert this .pth ContentVec model to ONNX?

@leng-yue
Author

Check out HuggingFace Optimum.

@li-henan

li-henan commented Apr 1, 2024

Dear friend, could I ask for the detailed steps to train a better ContentVec model for the SVC task, starting from the open-source model? Are there any other methods besides switching to a bigger dataset? Thanks a lot @leng-yue

@Bella-Tim

> Very great discussion!
>
> I converted the HuggingFace model to ONNX. The size is very small (280 MB), and it works fine in my app, a realtime voice changer based on so-vits-svc.
>
> The converter repo is here.

Hello, can you share your way of converting to ONNX with me? I also need it.
