Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling

This is a PyTorch implementation of the paper TMIM.

Installation

1. Requirements

  • Python==3.8.12
  • PyTorch==1.11.0
  • CUDA==11.3
conda create -n tmim python==3.8.12
conda activate tmim
pip install --upgrade pip
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html 
pip install -r requirements.txt

2. Datasets

  • Create a "data" folder. Download the text removal dataset (SCUT-EnsText) and the text detection datasets (TextOCR, Total-Text, ICDAR2015, COCO-Text, MLT19, ArT, LSVT (fully annotated), ReCTS).

  • Create the COCO-style annotations for the text detection datasets with the code in utils/prepare_dataset/ (or download them from here (data.zip)).

  • The structure of the data folder is shown below.

    data
    ├── text_det
    │   ├── art
    │   │   ├── train_images
    │   │   └── annotation.json
    │   ├── cocotext
    │   │   ├── train2014
    │   │   └── cocotext.v2.json
    │   ├── ic15
    │   │   ├── train_images
    │   │   └── annotation.json 
    │   ├── lsvt
    │   │   ├── train_images
    │   │   └── annotation.json 
    │   ├── mlt19
    │   │   ├── train_images
    │   │   └── annotation.json 
    │   ├── rects
    │   │   ├── img
    │   │   └── annotation.json 
    │   ├── textocr
    │   │   ├── train_images
    │   │   ├── TextOCR_0.1_train.json 
    │   │   └── TextOCR_0.1_val.json 
    │   └── totaltext
    │       ├── train_images
    │       └── annotation.json
    └── text_rmv
        └── SCUT-EnsText
            ├── train
            │   ├── all_images
            │   ├── all_labels
            │   └── mask
            └── test
                ├── all_images
                ├── all_labels
                └── mask
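
Before training, it can help to confirm that the folder matches the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the default root `data` are assumptions made for illustration, and the expected paths are copied from the tree.

```python
from pathlib import Path

# Expected entries, copied from the tree above (relative to the data root).
EXPECTED = [
    "text_det/art/train_images",
    "text_det/art/annotation.json",
    "text_det/cocotext/train2014",
    "text_det/cocotext/cocotext.v2.json",
    "text_det/ic15/train_images",
    "text_det/ic15/annotation.json",
    "text_det/lsvt/train_images",
    "text_det/lsvt/annotation.json",
    "text_det/mlt19/train_images",
    "text_det/mlt19/annotation.json",
    "text_det/rects/img",
    "text_det/rects/annotation.json",
    "text_det/textocr/train_images",
    "text_det/textocr/TextOCR_0.1_train.json",
    "text_det/textocr/TextOCR_0.1_val.json",
    "text_det/totaltext/train_images",
    "text_det/totaltext/annotation.json",
] + [
    f"text_rmv/SCUT-EnsText/{split}/{sub}"
    for split in ("train", "test")
    for sub in ("all_images", "all_labels", "mask")
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # "data" is the folder name used in this README; adjust if yours differs.
    missing = missing_entries("data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data folder matches the expected layout.")
```

The check only looks at whether each path exists; it does not validate the contents of the images or annotation files.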
    

Models

| Model     | Method     | PSNR  | MSSIM | MSE    | AGE  | Download               |
|-----------|------------|-------|-------|--------|------|------------------------|
| Uformer-B | Pretrained | 36.66 | 97.66 | 0.0637 | 1.70 | uformer_b_tmim.pth     |
| Uformer-B | Finetuned  | 37.42 | 97.70 | 0.0459 | 1.52 | uformer_b_tmim_str.pth |
| PERT      | Pretrained | 34.51 | 96.63 | 0.1231 | 2.11 | pert_tmim.pth          |
| PERT      | Finetuned  | 35.66 | 97.18 | 0.0729 | 1.76 | pert_tmim_str.pth      |
| EraseNet  | Pretrained | 34.25 | 97.03 | 0.1141 | 2.23 | erasenet_tmim.pth      |
| EraseNet  | Finetuned  | 35.47 | 97.30 | 0.0765 | 1.95 | erasenet_tmim_str.pth  |
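
As a reminder of how two of the reported metrics relate, here is a minimal sketch of MSE and PSNR in plain Python. It assumes pixel values scaled to [0, 1] and images flattened to lists; the function names are illustrative, and the evaluation code in this repository may use different scaling or averaging, so the numbers it produces are not guaranteed to match the table.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length pixel sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    output = [0.50, 0.25, 0.75, 1.00]        # hypothetical erased result
    ground_truth = [0.50, 0.30, 0.70, 1.00]  # hypothetical clean image
    print("MSE :", mse(output, ground_truth))
    print("PSNR:", psnr(output, ground_truth))
```

A lower MSE gives a higher PSNR, which is why the finetuned rows improve on both columns together.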

Inference

  • Download the pretrained models and run the following command for inference.
python -m torch.distributed.launch --master_port 29501 --nproc_per_node=1 demo.py --cfg configs/uformer_b_str.py --resume path/to/uformer_b_tmim_str.pth --test-dir path/to/image/folder --visualize-dir path/to/result/folder
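
If you run inference on several folders, it may be convenient to assemble the command above from Python. The sketch below is an assumption-laden wrapper, not part of the repository: the function name `build_demo_command` is hypothetical, and it only reuses the flags shown in the command above.

```python
import shlex
import sys


def build_demo_command(cfg, checkpoint, test_dir, visualize_dir, master_port=29501):
    """Assemble the single-GPU inference command as an argument list."""
    return [
        sys.executable, "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        "--nproc_per_node=1",
        "demo.py",
        "--cfg", cfg,
        "--resume", checkpoint,
        "--test-dir", test_dir,
        "--visualize-dir", visualize_dir,
    ]


if __name__ == "__main__":
    cmd = build_demo_command(
        cfg="configs/uformer_b_str.py",
        checkpoint="path/to/uformer_b_tmim_str.pth",
        test_dir="path/to/image/folder",
        visualize_dir="path/to/result/folder",
    )
    # Print the command for inspection; to execute it, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.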

Training and Testing

  • Set "snapshot_dir" (the location for saving checkpoints) and "dataroot" (the location of the datasets) in configs/*.py.
  • EraseNet and PERT require four 1080 Ti GPUs; Uformer requires eight.
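
For orientation, the two settings might look like the fragment below. This is only a sketch: the variable names come from this README, but the values are examples, and the assumption that they are plain top-level assignments should be checked against the actual files in configs/.

```python
# Hypothetical excerpt of a file in configs/; check the real file for the
# exact form these settings take.
snapshot_dir = "ckpt"  # example value: where checkpoints are written
dataroot = "data"      # example value: the data folder described under "Datasets"
```

The example value "ckpt" is chosen to be consistent with the --resume path used in the finetuning command.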

1. Pretraining

  • Run the following command to pretrain the model on text detection datasets.
python -m torch.distributed.launch --master_port 29501 --nproc_per_node=8 train.py --cfg configs/uformer_b_tmim.py --ckpt-name uformer_b_tmim --save-log 
  • Run the following command to test the performance of the pretrained model.
python test.py --cfg configs/uformer_b_tmim.py --ckpt-name uformer_b_tmim/latest.pth --save-log --visualize

2. Finetuning

  • Run the following command to finetune the model on text removal datasets.
python -m torch.distributed.launch --master_port 29501 --nproc_per_node=8 train.py --cfg configs/uformer_b_str.py --ckpt-name uformer_b_tmim_str --save-log --resume 'ckpt/uformer_b_tmim/latest.pth'
  • Run the following command to test the performance of the finetuned model.
python test.py --cfg configs/uformer_b_str.py --ckpt-name uformer_b_tmim_str/latest.pth --save-log --visualize
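
The two stages are linked through the checkpoint: finetuning resumes from the file written by pretraining. The sketch below shows that hand-off by building both training commands in Python. It is not part of the repository; the function names are hypothetical, and the checkpoint location `<snapshot_dir>/<ckpt-name>/latest.pth` is an assumption inferred from the --resume path in the finetuning command.

```python
import shlex


def launch_prefix(nproc, master_port=29501):
    """Common launcher arguments for a multi-GPU run."""
    return [
        "python", "-m", "torch.distributed.launch",
        "--master_port", str(master_port),
        f"--nproc_per_node={nproc}",
    ]


def latest_checkpoint(snapshot_dir, ckpt_name):
    """Assumed checkpoint location: <snapshot_dir>/<ckpt_name>/latest.pth."""
    return f"{snapshot_dir}/{ckpt_name}/latest.pth"


def pretrain_command(cfg, ckpt_name, nproc):
    """Stage 1: pretraining on the text detection datasets."""
    return launch_prefix(nproc) + [
        "train.py", "--cfg", cfg, "--ckpt-name", ckpt_name, "--save-log",
    ]


def finetune_command(cfg, ckpt_name, resume, nproc):
    """Stage 2: finetuning on the text removal dataset from a stage 1 checkpoint."""
    return pretrain_command(cfg, ckpt_name, nproc) + ["--resume", resume]


if __name__ == "__main__":
    stage1 = pretrain_command("configs/uformer_b_tmim.py", "uformer_b_tmim", nproc=8)
    resume = latest_checkpoint("ckpt", "uformer_b_tmim")
    stage2 = finetune_command("configs/uformer_b_str.py", "uformer_b_tmim_str", resume, nproc=8)
    # Printed for inspection only; run each with subprocess.run(cmd, check=True).
    print(shlex.join(stage1))
    print(shlex.join(stage2))
```

For EraseNet or PERT, pass nproc=4 and the corresponding config files.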