
# [Ready to merge] Pruned-transducer-stateless2 recipe for aidatatang_200zh #375

## Conversation

@luomingshuang (Collaborator):

This PR is opened to facilitate code review; it inherits from PR #355 and aims to be merged into master. The results and comparisons are as follows. Next, I plan to train a conformer-ctc model for aidatatang_200zh to produce a new baseline.

The WERs are:

| decoding method | dev | test | comment |
|---|---|---|---|
| greedy search | 5.53 | 6.59 | --epoch 29, --avg 19, --max-duration 100 |
| modified beam search (beam size 4) | 5.27 | 6.33 | --epoch 29, --avg 19, --max-duration 100 |
| fast beam search (set as default) | 5.30 | 6.34 | --epoch 29, --avg 19, --max-duration 1500 |

The results with Kaldi:

```
%WER 37.09 [ 173936 / 468933, 4868 ins, 31143 del, 137925 sub ] exp/mono/decode_test/cer_10_0.0
%WER 17.98 [ 84305 / 468933, 4724 ins, 12637 del, 66944 sub ] exp/tri1/decode_test/cer_13_0.0
%WER 17.94 [ 84149 / 468933, 5025 ins, 12427 del, 66697 sub ] exp/tri2/decode_test/cer_13_0.0
%WER 17.26 [ 80945 / 468933, 4421 ins, 12958 del, 63566 sub ] exp/tri3a/decode_test/cer_14_0.0
%WER 14.16 [ 66424 / 468933, 4567 ins, 10224 del, 51633 sub ] exp/tri4a/decode_test/cer_14_0.0
%WER 12.22 [ 57304 / 468933, 4799 ins, 8197 del, 44308 sub ] exp/tri5a/decode_test/cer_14_0.0
%WER 5.59 [ 26232 / 468933, 1701 ins, 4377 del, 20154 sub ] exp/chain/tdnn_1a_sp/decode_test/cer_10_0.0

# nnet3 tdnn with online pitch, local/nnet3/tuning/run_tdnn_2a.sh
%WER 7.21 [ 33797 / 468933, 2141 ins, 6117 del, 25539 sub ] exp/nnet3/tdnn_sp/decode_test/cer_13_0.0
%WER 7.44 [ 34878 / 468933, 2252 ins, 5854 del, 26772 sub ] exp/nnet3/tdnn_sp_online/decode_test/cer_12_0.0
%WER 7.79 [ 36542 / 468933, 2527 ins, 5674 del, 28341 sub ] exp/nnet3/tdnn_sp_online/decode_test_per_utt/cer_12_0.0

# chain with online pitch, local/chain/tuning/run_tdnn_2a.sh
%WER 5.61 [ 26311 / 468933, 1773 ins, 4789 del, 19749 sub ] exp/chain/tdnn_2a_sp/decode_test/cer_11_0.0
%WER 5.69 [ 26661 / 468933, 1723 ins, 4724 del, 20214 sub ] exp/chain/tdnn_2a_sp_online/decode_test/cer_11_0.0
%WER 5.98 [ 28046 / 468933, 2031 ins, 4527 del, 21488 sub ] exp/chain/tdnn_2a_sp_online/decode_test_per_utt/cer_11_0.0
```

The results with ESPnet (Conformer encoder + SpecAugment + Transformer decoder):

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| decode_dev_decode_lm | 24216 | 234524 | 95.8 | 3.6 | 0.5 | 0.1 | 4.3 | 21.8 |
| decode_test_decode_lm | 48144 | 468933 | 95.2 | 4.3 | 0.5 | 0.2 | 5.0 | 24.0 |

**README.md** (outdated):

```markdown
### Aidatatang_200zh

We provide one model for this recipe: [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][Aidatatang_200zh
_pruned_transducer_stateless2].
```
Collaborator:

The line-break here leads to a broken link.

Please remove the line-break.

**README.md** (outdated):

```markdown
| fast beam search | 5.30 | 6.34 |
| modified beam search | 5.27 | 6.33 |

We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)(https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
```
Collaborator:

Suggested change
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)(https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)
We provide a Colab notebook to run a pre-trained Pruned Transducer Stateless model: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wNSnSj3T5oOctbh5IGCa393gKOoQw2GH?usp=sharing)

@@ -0,0 +1,39 @@
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/355
Collaborator:

Suggested change
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/355
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/375

@@ -0,0 +1,39 @@
Note: This recipe is trained with the codes from this PR https://github.com/k2-fsa/icefall/pull/355
And the SpecAugment codes from this PR https://github.com/lhotse-speech/lhotse/pull/604.
Collaborator:

Suggested change
And the SpecAugment codes from this PR https://github.com/lhotse-speech/lhotse/pull/604.

It has been merged. No need to mention it.


#### 2022-05-16

Using the codes from this PR https://github.com/k2-fsa/icefall/pull/355.
Collaborator:

Suggested change
Using the codes from this PR https://github.com/k2-fsa/icefall/pull/355.
Using the codes from this PR https://github.com/k2-fsa/icefall/pull/375.



"""
This file computes fbank features of the aishell dataset.
Collaborator:

Please update the comment; this file is for aidatatang_200zh, not aishell.
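
Presumably the fix is just a one-line change to name the right dataset in the docstring, e.g.:

```python
"""
This file computes fbank features of the aidatatang_200zh dataset.
"""
```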

```python
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=self.args.max_duration,
    rank=0,
```
Collaborator:

I think Piotr has suggested that you remove rank and world_size here.
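
A minimal sketch of the suggested change, keeping the surrounding `self.args` context from the snippet above and assuming (as lhotse samplers do by default) that rank and world_size are inferred from the torch.distributed environment when not passed explicitly:

```python
from lhotse.dataset import DynamicBucketingSampler

# Omit rank/world_size so the sampler picks them up from the
# distributed environment instead of hard-coding rank=0.
sampler = DynamicBucketingSampler(
    cuts,
    max_duration=self.args.max_duration,
    shuffle=False,
)
```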

@@ -0,0 +1,955 @@
# Copyright 2021 Xiaomi Corp. (authors: Fangjun Kuang)
Collaborator:

Could you replace it with a symlink?
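
For illustration only, with hypothetical paths, replacing a copied file with a symlink back to the librispeech recipe might look like:

```python
import os

# Hypothetical paths: keep a single copy of the file in the
# librispeech recipe and link to it from this recipe.
target = "../../../librispeech/ASR/pruned_transducer_stateless2/conformer.py"
os.symlink(target, "conformer.py")
```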

@luomingshuang (Collaborator, Author):

Thanks, all of the requested changes are done and have been tested by running the code.

@danpovey (Collaborator):

If you have a model with attention_dim=512, it may be too large; you could try 256.
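
As a rough sketch of the suggested change, with a hypothetical constructor (the actual parameter names in this recipe's conformer.py may differ):

```python
from conformer import Conformer  # hypothetical recipe-local import

# Hypothetical settings: halve the attention dimension while keeping
# the other encoder hyperparameters unchanged.
encoder = Conformer(
    num_features=80,
    d_model=256,  # was 512
    nhead=4,
    dim_feedforward=2048,
    num_encoder_layers=12,
)
```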

@@ -0,0 +1,103 @@
# Copyright 2021 Xiaomi Corp. (authors: Fangjun Kuang)
Collaborator:

Please replace such files with symlinks.

@luomingshuang (Collaborator, Author):

Done and tested.

@luomingshuang (Collaborator, Author):

I can run an experiment with dim=256 later.

@luomingshuang (Collaborator, Author):

When using dim=256 for the conformer, the WERs (best performance) are:

| decoding method | dev | test | comment |
|---|---|---|---|
| greedy search | 6.16 | 7.36 | --epoch 29, --avg 24, --max-duration 100 |
| modified beam search (beam size 4) | 5.66 | 6.75 | --epoch 29, --avg 24, --max-duration 100 |
| fast beam search (set as default) | 5.85 | 7.08 | --epoch 29, --avg 24, --max-duration 1500 |

The results with dim=256 are worse than those with dim=512.

@luomingshuang added and then removed the `ready` label on May 23, 2022.
@csukuangfj (Collaborator):

> When using dim=256 for the conformer, the WERs (best performance) are:
>
> | decoding method | dev | test | comment |
> |---|---|---|---|
> | greedy search | 6.16 | 7.36 | --epoch 29, --avg 24, --max-duration 100 |
> | modified beam search (beam size 4) | 5.66 | 6.75 | --epoch 29, --avg 24, --max-duration 100 |
> | fast beam search (set as default) | 5.85 | 7.08 | --epoch 29, --avg 24, --max-duration 1500 |
>
> The results with dim=256 are worse than those with dim=512.

Did you change any settings other than dim=256, e.g., the number of encoder layers or the feedforward dimension?

@luomingshuang (Collaborator, Author):

No, I just changed the encoder model dim from 512 to 256.

@luomingshuang (Collaborator, Author):

I think this PR can be merged.

@csukuangfj (Collaborator):

Thanks!

@csukuangfj merged commit c8c8645 into k2-fsa:master on May 24, 2022.
@csukuangfj (Collaborator):

Thanks!

I have updated https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
to include aidatatang_200zh.
