This is a PyTorch implementation of the paper TMIM.
- Python==3.8.12
- Pytorch==1.11.0
- CUDA==11.3
```shell
conda create -n tmim python==3.8.12
conda activate tmim
pip install --upgrade pip
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install -r requirements.txt
```
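After installing, a quick sanity check confirms the environment matches the pinned versions above. This sketch uses only the standard library to probe for the package, so it also works before torch is installed:

```python
import importlib.util

def package_available(name: str) -> bool:
    """Return True if `name` can be imported in the current environment."""
    return importlib.util.find_spec(name) is not None

if package_available("torch"):
    import torch
    # Should report 1.11.0+cu113 and CUDA available on a correctly set-up machine.
    print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
else:
    print("torch is not installed; run the pip command above first")
```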
- Create a `data` folder. Download the text removal dataset (SCUT-EnsText) and the text detection datasets (TextOCR, Total-Text, ICDAR2015, COCO-Text, MLT19, ArT, LSVT (fully annotated), ReCTS).
- Create COCO-style annotations for the text detection datasets with the code in utils/prepare_dataset/ (or download them from here (data.zip)).
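For reference, a COCO-style annotation file is a single JSON object with `images`, `annotations`, and `categories` lists. The sketch below is illustrative only; the field names follow the general COCO convention, and the scripts in utils/prepare_dataset/ may emit additional fields:

```python
import json

# Minimal COCO-style skeleton for one image with one text instance
# (illustrative values; not the exact output of utils/prepare_dataset/).
annotation = {
    "images": [
        {"id": 1, "file_name": "img_1.jpg", "width": 1280, "height": 720}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            # axis-aligned box as [x, y, width, height] in pixels
            "bbox": [100, 50, 200, 40],
            # polygon outline of the text region, flattened [x1, y1, x2, y2, ...]
            "segmentation": [[100, 50, 300, 50, 300, 90, 100, 90]],
            "area": 8000,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 1, "name": "text"}],
}

with open("annotation.json", "w") as f:
    json.dump(annotation, f)
```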
- The structure of the `data` folder is shown below.
```
data
├── text_det
│   ├── art
│   │   ├── train_images
│   │   └── annotation.json
│   ├── cocotext
│   │   ├── train2014
│   │   └── cocotext.v2.json
│   ├── ic15
│   │   ├── train_images
│   │   └── annotation.json
│   ├── lsvt
│   │   ├── train_images
│   │   └── annotation.json
│   ├── mlt19
│   │   ├── train_images
│   │   └── annotation.json
│   ├── rects
│   │   ├── img
│   │   └── annotation.json
│   ├── textocr
│   │   ├── train_images
│   │   ├── TextOCR_0.1_train.json
│   │   └── TextOCR_0.1_val.json
│   └── totaltext
│       ├── train_images
│       └── annotation.json
└── text_rmv
    └── SCUT-EnsText
        ├── train
        │   ├── all_images
        │   ├── all_labels
        │   └── mask
        └── test
            ├── all_images
            ├── all_labels
            └── mask
```
| Model | Method | PSNR | MSSIM | MSE | AGE | Download |
|---|---|---|---|---|---|---|
| Uformer-B | Pretrained | 36.66 | 97.66 | 0.0637 | 1.70 | uformer_b_tmim.pth |
| Uformer-B | Finetuned | 37.42 | 97.70 | 0.0459 | 1.52 | uformer_b_tmim_str.pth |
| PERT | Pretrained | 34.51 | 96.63 | 0.1231 | 2.11 | pert_tmim.pth |
| PERT | Finetuned | 35.66 | 97.18 | 0.0729 | 1.76 | pert_tmim_str.pth |
| EraseNet | Pretrained | 34.25 | 97.03 | 0.1141 | 2.23 | erasenet_tmim.pth |
| EraseNet | Finetuned | 35.47 | 97.30 | 0.0765 | 1.95 | erasenet_tmim_str.pth |
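For context, PSNR and MSE in the table are standard image-reconstruction metrics. The sketch below shows the PSNR computation on flattened 8-bit pixel sequences; the repository's own evaluation code may normalize or average differently:

```python
import math
from typing import Sequence

def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length pixel sequences."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a: Sequence[float], b: Sequence[float], max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    err = mse(a, b)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)
```

Identical images give infinite PSNR; an average pixel error of 10 gray levels on 8-bit images gives roughly 28 dB.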
- Download the pretrained models and run the following command for inference.
```shell
python -m torch.distributed.launch --master_port 29501 --nproc_per_node=1 demo.py --cfg configs/uformer_b_str.py --resume path/to/uformer_b_tmim_str.pth --test-dir path/to/image/folder --visualize-dir path/to/result/folder
```
- Set "snapshot_dir" (the location for saving checkpoints) and "dataroot" (the location of the datasets) in configs/*.py.
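The configs are Python files, so the two fields above would look roughly like this (a sketch with illustrative values only; the actual configs contain many more options):

```python
# configs/uformer_b_str.py (fragment; paths are illustrative)
snapshot_dir = "ckpt/"  # where training checkpoints are written
dataroot = "data/"      # root of the data folder described above
```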
- EraseNet and PERT require four 1080 Ti GPUs; Uformer requires eight 1080 Ti GPUs.
- Run the following command to pretrain the model on text detection datasets.
```shell
python -m torch.distributed.launch --master_port 29501 --nproc_per_node=8 train.py --cfg configs/uformer_b_tmim.py --ckpt-name uformer_b_tmim --save-log
```
- Run the following command to test the performance of the pretrained model.
```shell
python test.py --cfg configs/uformer_b_tmim.py --ckpt-name uformer_b_tmim/latest.pth --save-log --visualize
```
- Run the following command to finetune the model on text removal datasets.
```shell
python -m torch.distributed.launch --master_port 29501 --nproc_per_node=8 train.py --cfg configs/uformer_b_str.py --ckpt-name uformer_b_tmim_str --save-log --resume 'ckpt/uformer_b_tmim/latest.pth'
```
- Run the following command to test the performance of the finetuned model.
```shell
python test.py --cfg configs/uformer_b_str.py --ckpt-name uformer_b_tmim_str/latest.pth --save-log --visualize
```