[ALGORITHM]
@inproceedings{li2019show,
title={Show, attend and read: A simple and strong baseline for irregular text recognition},
author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={33},
number={01},
pages={8610--8617},
year={2019}
}
trainset | instance_num | repeat_num | source |
---|---|---|---|
icdar_2011 | 3567 | 20 | real |
icdar_2013 | 848 | 20 | real |
icdar2015 | 4468 | 20 | real |
coco_text | 42142 | 20 | real |
IIIT5K | 2000 | 20 | real |
SynthText | 2400000 | 1 | synth |
SynthAdd | 1216889 | 1 | synth, 1.6m in [1] |
Syn90k | 2400000 | 1 | synth |
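
To make the resulting data mix concrete, the sketch below tallies the table above. It assumes that `repeat_num` is the number of times each instance of a dataset is repeated in one pass over the concatenated training data; that reading, and the helper name `effective_counts`, are illustrative assumptions rather than something taken from the config.

```python
# (name, instance_num, repeat_num, source) copied from the training-set table.
TRAIN_SETS = [
    ("icdar_2011", 3567, 20, "real"),
    ("icdar_2013", 848, 20, "real"),
    ("icdar2015", 4468, 20, "real"),
    ("coco_text", 42142, 20, "real"),
    ("IIIT5K", 2000, 20, "real"),
    ("SynthText", 2400000, 1, "synth"),
    ("SynthAdd", 1216889, 1, "synth"),
    ("Syn90k", 2400000, 1, "synth"),
]


def effective_counts(datasets):
    """Sum instance_num * repeat_num for each source type.

    Assumption: repeat_num is a per-pass repetition factor.
    """
    totals = {}
    for _name, instance_num, repeat_num, source in datasets:
        totals[source] = totals.get(source, 0) + instance_num * repeat_num
    return totals


totals = effective_counts(TRAIN_SETS)
grand_total = sum(totals.values())
for source, count in totals.items():
    print(f"{source}: {count} samples ({count / grand_total:.1%})")
```

Under that assumption, repeating the real datasets 20 times brings them to roughly 15% of the samples seen per pass, instead of well under 1% without repetition.
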
testset | instance_num | type |
---|---|---|
IIIT5K | 3000 | regular |
SVT | 647 | regular |
IC13 | 1015 | regular |
IC15 | 2077 | irregular |
SVTP | 645 | irregular, 639 in [1] |
CT80 | 288 | irregular |
Methods | Backbone | Decoder | IIIT5K (regular) | SVT (regular) | IC13 (regular) | IC15 (irregular) | SVTP (irregular) | CT80 (irregular) | download |
---|---|---|---|---|---|---|---|---|---|
SAR | R31-1/8-1/4 | ParallelSARDecoder | 95.0 | 89.6 | 93.7 | 79.0 | 82.2 | 88.9 | model, log |
SAR | R31-1/8-1/4 | SequentialSARDecoder | 95.2 | 88.7 | 92.4 | 78.2 | 81.9 | 89.6 | model, log |
Methods | Backbone | Decoder | download |
---|---|---|---|
SAR | R31-1/8-1/4 | ParallelSARDecoder | model, log, dict |
Notes:
- `R31-1/8-1/4` means that the feature map from the backbone has 1/8 of the input image's height and 1/4 of its width.
- We did not use beam search during decoding.
- We implemented two kinds of decoder, namely `ParallelSARDecoder` and `SequentialSARDecoder`.
  - `ParallelSARDecoder`: parallel decoding during training with an `LSTM` layer. It is faster.
  - `SequentialSARDecoder`: sequential decoding during training with `LSTMCell`. It is easier to understand.
- For the training dataset:
  - We did not construct distinct data groups (20 groups in [1]) to train the model group-by-group, since that would make model training too complicated.
  - Instead, we randomly selected `2.4m` patches from `Syn90k`, `2.4m` from `SynthText` and `1.2m` from `SynthAdd`, and grouped all data together. See config for details.
- We used 48 GPUs with `total_batch_size = 64 * 48` in the experiment above to speed up training, while keeping the `initial lr = 1e-3` unchanged.
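
The difference between the two decoder styles noted above can be illustrated with a minimal PyTorch sketch. It is not the MMOCR implementation: it only shows that an `nn.LSTM` run over a whole sequence in one call produces the same hidden states as an `nn.LSTMCell` stepped through the sequence, provided the weights match and every step's input is known in advance (assumed here to be the case during training). All tensor sizes and variable names are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes chosen for illustration only.
batch, steps, in_dim, hid_dim = 2, 5, 8, 16

lstm = nn.LSTM(in_dim, hid_dim, batch_first=True)  # whole sequence per call
cell = nn.LSTMCell(in_dim, hid_dim)                # one time step per call

# Share weights so both modules compute the same function.
with torch.no_grad():
    cell.weight_ih.copy_(lstm.weight_ih_l0)
    cell.weight_hh.copy_(lstm.weight_hh_l0)
    cell.bias_ih.copy_(lstm.bias_ih_l0)
    cell.bias_hh.copy_(lstm.bias_hh_l0)

# Stand-in for decoder inputs that are all available before decoding starts.
inputs = torch.randn(batch, steps, in_dim)

with torch.no_grad():
    # "Parallel" style: one call covers every time step.
    parallel_out, _ = lstm(inputs)

    # "Sequential" style: an explicit Python loop over time steps.
    h = torch.zeros(batch, hid_dim)
    c = torch.zeros(batch, hid_dim)
    step_outputs = []
    for t in range(steps):
        h, c = cell(inputs[:, t], (h, c))
        step_outputs.append(h)
    sequential_out = torch.stack(step_outputs, dim=1)

max_diff = (parallel_out - sequential_out).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

The single `nn.LSTM` call lets the framework process the sequence with fused kernels, which is consistent with the speed remark above, while the explicit loop makes each decoding step visible.
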
[1] Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu. Show, attend and read: A simple and strong baseline for irregular text recognition. In AAAI 2019.