Support CTC/AED option for Zipformer recipe (#1389)
* add attention-decoder loss option for zipformer recipe

* add attention-decoder-rescoring

* update export.py and pretrained_ctc.py

* update RESULTS.md
yaozengwei authored Jul 5, 2024
1 parent cbcac23 commit f76afff
Showing 9 changed files with 1,221 additions and 21 deletions.
egs/librispeech/ASR/RESULTS.md: 70 additions, 0 deletions
@@ -1,5 +1,75 @@
## Results

### zipformer (zipformer + CTC/AED)

See <https://github.com/k2-fsa/icefall/pull/1389> for more details.

[zipformer](./zipformer)

#### Non-streaming

##### large-scale model, number of model parameters: 174,319,650, i.e., 174.3 M

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/Zengwei/icefall-asr-librispeech-zipformer-large-ctc-attention-decoder-2024-05-26>
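
If you want to fetch that Hugging Face repository programmatically (not part of the recipe itself), one option is `huggingface_hub`; the snippet below is a hedged convenience example, and `local_dir` is just an illustrative variable name:

```python
from huggingface_hub import snapshot_download

# Download the whole model repository (checkpoints, logs, decoding results)
# into the local Hugging Face cache and return its path.
local_dir = snapshot_download(
    repo_id="Zengwei/icefall-asr-librispeech-zipformer-large-ctc-attention-decoder-2024-05-26"
)
print(local_dir)
```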

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

Results of the CTC head:

| decoding method                       | test-clean (WER %) | test-other (WER %) | comment             |
|---------------------------------------|--------------------|--------------------|---------------------|
| ctc-decoding                          | 2.29               | 5.14               | --epoch 50 --avg 29 |
| attention-decoder-rescoring-no-ngram  | 2.1                | 4.57               | --epoch 50 --avg 29 |

The training command is:
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"
# For non-streaming model training:
./zipformer/train.py \
--world-size 4 \
--num-epochs 50 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp-large \
--full-libri 1 \
--use-ctc 1 \
--use-transducer 0 \
--use-attention-decoder 1 \
--ctc-loss-scale 0.1 \
--attention-decoder-loss-scale 0.9 \
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 512,768,1536,2048,1536,768 \
--encoder-dim 192,256,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
--max-duration 1200 \
--master-port 12345
```
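
For orientation, a minimal sketch of the training objective these flags imply: with `--use-transducer 0`, only the CTC and attention-decoder terms remain, weighted by `--ctc-loss-scale` and `--attention-decoder-loss-scale`. The function below is illustrative, not the actual code in `zipformer/train.py`:

```python
import torch

def total_loss(
    ctc_loss: torch.Tensor,
    attention_decoder_loss: torch.Tensor,
    ctc_loss_scale: float = 0.1,
    attention_decoder_loss_scale: float = 0.9,
) -> torch.Tensor:
    # Weighted sum matching --ctc-loss-scale 0.1 and
    # --attention-decoder-loss-scale 0.9 in the command above.
    return (
        ctc_loss_scale * ctc_loss
        + attention_decoder_loss_scale * attention_decoder_loss
    )
```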

The decoding command is:
```bash
export CUDA_VISIBLE_DEVICES="0"
for m in ctc-decoding attention-decoder-rescoring-no-ngram; do
./zipformer/ctc_decode.py \
--epoch 50 \
--avg 29 \
--exp-dir zipformer/exp-large \
--use-ctc 1 \
--use-transducer 0 \
--use-attention-decoder 1 \
--attention-decoder-loss-scale 0.9 \
--num-encoder-layers 2,2,4,5,4,2 \
--feedforward-dim 512,768,1536,2048,1536,768 \
--encoder-dim 192,256,512,768,512,256 \
--encoder-unmasked-dim 192,192,256,320,256,192 \
--max-duration 100 \
--causal 0 \
--num-paths 100 \
--decoding-method $m
done
```
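
Conceptually, `attention-decoder-rescoring-no-ngram` extracts up to `--num-paths` hypotheses from the CTC lattice and lets the attention decoder re-rank them. Below is a simplified sketch of that re-ranking step; the hypothesis tuples and `attn_score_fn` are placeholders, and the real implementation in `zipformer/ctc_decode.py` works on k2 lattices instead:

```python
from typing import Callable, List, Tuple

def rescore_nbest(
    hypotheses: List[Tuple[List[int], float]],
    attn_score_fn: Callable[[List[int]], float],
) -> List[int]:
    """Return the token sequence with the best combined score.

    Each hypothesis is a (token_ids, ctc_log_prob) pair from n-best
    extraction over the CTC lattice (cf. --num-paths 100);
    attn_score_fn stands in for the attention decoder's sequence
    log-probability.
    """
    best_tokens, _ = max(
        hypotheses, key=lambda h: h[1] + attn_score_fn(h[0])
    )
    return best_tokens
```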


### zipformer (zipformer + pruned stateless transducer + CTC)

See <https://github.com/k2-fsa/icefall/pull/1111> for more details.