
# [Ready to merge] Pruned-transducer-stateless2 recipe for aidatatang_200zh #375

## Conversation

@luomingshuang (Collaborator):

This PR is opened to facilitate code review; it inherits from PR #355 and aims to be merged into master. The results and comparisons are as follows. Next, I plan to train a conformer-ctc model for aidatatang_200zh to produce a new baseline.

The WERs are:

| decoding method | dev | test | comment |
|---|---|---|---|
| greedy search | 5.53 | 6.59 | --epoch 29, --avg 19, --max-duration 100 |
| modified beam search (beam size 4) | 5.27 | 6.33 | --epoch 29, --avg 19, --max-duration 100 |
| fast beam search (set as default) | 5.30 | 6.34 | --epoch 29, --avg 19, --max-duration 1500 |

The results with Kaldi:

```
%WER 37.09 [ 173936 / 468933, 4868 ins, 31143 del, 137925 sub ] exp/mono/decode_test/cer_10_0.0
%WER 17.98 [ 84305 / 468933, 4724 ins, 12637 del, 66944 sub ] exp/tri1/decode_test/cer_13_0.0
%WER 17.94 [ 84149 / 468933, 5025 ins, 12427 del, 66697 sub ] exp/tri2/decode_test/cer_13_0.0
%WER 17.26 [ 80945 / 468933, 4421 ins, 12958 del, 63566 sub ] exp/tri3a/decode_test/cer_14_0.0
%WER 14.16 [ 66424 / 468933, 4567 ins, 10224 del, 51633 sub ] exp/tri4a/decode_test/cer_14_0.0
%WER 12.22 [ 57304 / 468933, 4799 ins, 8197 del, 44308 sub ] exp/tri5a/decode_test/cer_14_0.0
%WER 5.59 [ 26232 / 468933, 1701 ins, 4377 del, 20154 sub ] exp/chain/tdnn_1a_sp/decode_test/cer_10_0.0

# nnet3 tdnn with online pitch, local/nnet3/tuning/run_tdnn_2a.sh
%WER 7.21 [ 33797 / 468933, 2141 ins, 6117 del, 25539 sub ] exp/nnet3/tdnn_sp/decode_test/cer_13_0.0
%WER 7.44 [ 34878 / 468933, 2252 ins, 5854 del, 26772 sub ] exp/nnet3/tdnn_sp_online/decode_test/cer_12_0.0
%WER 7.79 [ 36542 / 468933, 2527 ins, 5674 del, 28341 sub ] exp/nnet3/tdnn_sp_online/decode_test_per_utt/cer_12_0.0

# chain with online pitch, local/chain/tuning/run_tdnn_2a.sh
%WER 5.61 [ 26311 / 468933, 1773 ins, 4789 del, 19749 sub ] exp/chain/tdnn_2a_sp/decode_test/cer_11_0.0
%WER 5.69 [ 26661 / 468933, 1723 ins, 4724 del, 20214 sub ] exp/chain/tdnn_2a_sp_online/decode_test/cer_11_0.0
%WER 5.98 [ 28046 / 468933, 2031 ins, 4527 del, 21488 sub ] exp/chain/tdnn_2a_sp_online/decode_test_per_utt/cer_11_0.0
```

The results with ESPnet (Conformer encoder + SpecAugment + Transformer decoder):

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_lm | 24216 | 234524 | 95.8 | 3.6 | 0.5 | 0.1 | 4.3 | 21.8 |
| decode_test_decode_lm | 48144 | 468933 | 95.2 | 4.3 | 0.5 | 0.2 | 5.0 | 24.0 |

**README.md** (outdated):

```markdown
### Aidatatang_200zh

We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aidatatang_200zh
_pruned_transducer_stateless2].
```
Collaborator:

The line-break here leads to a broken link.

Please remove the line-break.

**README.md** (outdated):

```markdown
| fast beam search | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)(https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
```
Collaborator:

Suggested change
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)(https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)

@@ -0,0 +1,39 @@
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/355
Collaborator:

Suggested change
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/355
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/375

@@ -0,0 +1,39 @@
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/355
And the SpecAugment codes from this PR https://github.com/lhotse-speech/lhotse/pull/604.
Collaborator:

Suggested change
And the SpecAugment codes from this PR https://github.com/lhotse-speech/lhotse/pull/604.

It has been merged. No need to mention it.


#### 2022-05-16

Using the codes from this PR https://github.com/k2-fsa/icefall/pull/355.
Collaborator:

Suggested change
Using the codes from this PR https://github.com/k2-fsa/icefall/pull/355.
Using the codes from this PR https://github.com/k2-fsa/icefall/pull/375.



"""
This file computes fbank features of the aishell dataset.
Collaborator:

Please update the comment; this file is for aidatatang_200zh, not aishell.
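
Presumably the fix is just a one-line change to name the right dataset in the docstring, e.g.:

```python
"""
This file computes fbank features of the aidatatang_200zh dataset.
"""
```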

```python
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=self.args.max_duration,
    rank=0,
```
Collaborator:

I think Piotr has suggested that you remove rank and world_size here.
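
A minimal sketch of the suggested change, keeping the surrounding `self.args` context from the snippet above and assuming (as lhotse samplers do by default) that rank and world_size are inferred from the torch.distributed environment when not passed explicitly:

```python
from lhotse.dataset import DynamicBucketingSampler

# Omit rank/world_size so the sampler picks them up from the
# distributed environment instead of hard-coding rank=0.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=self.args.max_duration,
    shuffle=False,
)
```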

@@ -0,0 +1,955 @@
# Copyright 2021 Xiaomi Corp. (authors: Fangjun Kuang)
Collaborator:

Could you replace it with a symlink?
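
For illustration only, with hypothetical paths, replacing a copied file with a symlink back to the librispeech recipe might look like:

```python
import os

# Hypothetical paths: keep a single copy of the file in the
# librispeech recipe and link to it from this recipe.
target = "../../../librispeech/ASR/pruned_transducer_stateless2/conformer.py"
os.symlink(target, "conformer.py")
```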

@luomingshuang (Collaborator, Author):

Thanks, all of the requested changes are done and have been tested by running the code.

@danpovey (Collaborator):

If you have a model with attention_dim=512, it may be too large; you could try 256.
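
As a rough sketch of the suggested change, with a hypothetical constructor (the actual parameter names in this recipe's conformer.py may differ):

```python
from conformer import Conformer  # hypothetical recipe-local import

# Hypothetical settings: halve the attention dimension while keeping
# the other encoder hyperparameters unchanged.
encoder = Conformer(
    num_features=80,
    d_model=256,  # was 512
    nhead=4,
    dim_feedforward=2048,
    num_encoder_layers=12,
)
```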

@@ -0,0 +1,103 @@
# Copyright 2021 Xiaomi Corp. (authors: Fangjun Kuang)
Collaborator:

Please replace such files with symlinks.

@luomingshuang (Collaborator, Author):

Done and tested.

@luomingshuang (Collaborator, Author):

I can run an experiment with dim=256 later.

@luomingshuang (Collaborator, Author):

When using dim=256 for the conformer, the WERs (best performance) are:

| decoding method | dev | test | comment |
|---|---|---|---|
| greedy search | 6.16 | 7.36 | --epoch 29, --avg 24, --max-duration 100 |
| modified beam search (beam size 4) | 5.66 | 6.75 | --epoch 29, --avg 24, --max-duration 100 |
| fast beam search (set as default) | 5.85 | 7.08 | --epoch 29, --avg 24, --max-duration 1500 |

The results with dim=256 are worse than those with dim=512.

@luomingshuang added and then removed the `ready` label on May 23, 2022.
@csukuangfj (Collaborator):

> When using dim=256 for the conformer, the WERs (best performance) are:
>
> | decoding method | dev | test | comment |
> |---|---|---|---|
> | greedy search | 6.16 | 7.36 | --epoch 29, --avg 24, --max-duration 100 |
> | modified beam search (beam size 4) | 5.66 | 6.75 | --epoch 29, --avg 24, --max-duration 100 |
> | fast beam search (set as default) | 5.85 | 7.08 | --epoch 29, --avg 24, --max-duration 1500 |
>
> The results with dim=256 are worse than those with dim=512.

Did you change any settings other than dim=256, e.g., the number of encoder layers or the feedforward dimension?

@luomingshuang (Collaborator, Author):

No, I just changed the encoder model dim from 512 to 256.

@luomingshuang (Collaborator, Author):

I think this PR can be merged.

@csukuangfj (Collaborator):

Thanks!

@csukuangfj merged commit c8c8645 into k2-fsa:master on May 24, 2022.
@csukuangfj (Collaborator):

Thanks!

I have updated https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
to include aidatatang_200zh.
