Update Zipformer-xl 700M Results on multi-hans-zh #1694

Merged
merged 3 commits into from
Jul 18, 2024

yuekaizhang
Collaborator

Update training results using 14k hours of open-source Chinese data with a 750M zipformer.
The model achieves the current state of the art among open models on the WenetSpeech test_meeting set, with 5.85% WER.

| Model | yuekai/icefall-asr-multi-zh-hans-zipformer-large | yuekai/icefall-asr-multi-zh-hans-zipformer-xl |
|---|---|---|
| Config | Transducer Greedy Offline | Transducer Greedy Offline (blank_penalty 0.7) |
| aishell-1 test | 1.38 | 1.31 |
| aishell-2 test | 3.23 | 3.27 |
| aishell-4 test | 15.36 | 14.64 |
| WenetSpeech test_meeting | 6.26 | 5.85 |
| WenetSpeech test_net | 7.07 | 6.89 |

(The model reused the 80-dim Whisper fbank features from previous experiments.)

@marcoyang1998
Collaborator

Hi Yuekai,

Nice results! Do you think blank penalty will also help with other decoding methods, e.g. modified_beam_search?

@yuekaizhang
Collaborator Author

@marcoyang1998
I have not decoded with modified_beam_search yet. Even if blank penalty helps there, the optimal penalty value may differ from the one for greedy search.

BTW, I did not use blank penalty for the 140M zipformer-large model. I used it for the 700M model because its deletion errors increased as training went on, while its substitution errors kept decreasing. (The model was trained for 20 epochs in total, and this trend started to appear around the 10th epoch.)
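To make the deletion versus substitution trend concrete, here is a minimal sketch of how an error breakdown can be counted from one reference/hypothesis pair with an edit-distance alignment. The function name `error_breakdown` and the example sentence are illustrative assumptions, not code or data from this PR or from icefall's scoring scripts.

```python
def error_breakdown(ref, hyp):
    """Return (substitutions, deletions, insertions) along one
    minimum-edit-distance alignment of hyp against ref."""
    # dp[i][j] = (total_errors, subs, dels, ins) for ref[:i] vs hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference token deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis token inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            t, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, n)          # match, no cost
            else:
                diag = (t + 1, s + 1, d, n)  # substitution
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)  # reference token missing in hyp
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)  # extra token in hyp
            dp[i][j] = min(diag, deletion, insertion, key=lambda c: c[0])
    _, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins


# Made-up example: one character is dropped and one is misrecognized.
ref = list("今天天气很好")
hyp = list("今天气很号")
print(error_breakdown(ref, hyp))  # (1, 1, 0)
```

A model that emits blank too eagerly tends to drop tokens, which shows up in the second count (deletions) rather than the first.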

For greedy search, I also have some logs on tuning the blank penalty, which can be found here: https://huggingface.co/yuekai/icefall-asr-multi-zh-hans-zipformer-xl/tree/main/exp-xl/greedy_search/tmp. I found that a value of 0.7 worked best.
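For readers unfamiliar with the idea, the sketch below shows one common way a blank penalty can enter a greedy decoding step: a constant is subtracted from the blank score before taking the argmax, so borderline frames emit a token instead of blank. It assumes the blank symbol has index 0 and uses made-up scores; `greedy_step` is a hypothetical helper, not the icefall implementation.

```python
BLANK_ID = 0  # assumption: the blank symbol has index 0


def greedy_step(logits, blank_penalty=0.0, blank_id=BLANK_ID):
    """Pick the best symbol for one frame after penalizing blank."""
    adjusted = list(logits)              # copy so the caller's scores are untouched
    adjusted[blank_id] -= blank_penalty  # make blank less attractive
    return max(range(len(adjusted)), key=adjusted.__getitem__)


# Made-up scores for one frame: blank narrowly beats token 1.
frame = [2.0, 1.5, 0.3]
print(greedy_step(frame))                     # 0: blank wins without a penalty
print(greedy_step(frame, blank_penalty=0.7))  # 1: blank drops to 1.3, token 1 wins
```

A larger penalty trades deletions for insertions, which is why the value is tuned on held-out data rather than fixed in advance.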

@yuekaizhang yuekaizhang merged commit 4af81af into k2-fsa:master Jul 18, 2024
253 checks passed
yfyeung pushed a commit to yfyeung/icefall that referenced this pull request Aug 9, 2024
* add blank penalty

* update zipformer-xl results

* fix typo