Human Pose Regression (HPR) is a simple way to estimate human keypoints, since it needs no post-processing step to transform heatmaps into coordinates. HPR's drawback is that its accuracy has been much lower than that of heatmap-based models, but recently, with a flow-based model, HPR has improved enough to be worth replacing heatmap-based models.
Human Pose Regression with Residual Log-likelihood Estimation
Jiefeng Li, Siyuan Bian, Ailing Zeng, Can Wang, Bo Pang, Wentao Liu, Cewu Lu
ICCV 2021 Oral
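To see why regression skips the post-process, compare the two decoding schemes: a heatmap head needs an argmax over the output grid (which carries quantization error), while a regression head emits coordinates directly. A toy NumPy sketch (the function names here are illustrative, not the repo's API):

```python
import numpy as np

# Heatmap decoding: the keypoint is the argmax of an HxW heatmap,
# so precision is limited by the heatmap grid resolution.
def decode_heatmap(heatmap, input_size):
    h, w = heatmap.shape
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    # scale grid coordinates back to the input resolution
    return np.array([x * input_size[1] / w, y * input_size[0] / h])

# Regression: the network emits normalized (x, y) directly, no decoding.
def decode_regression(raw_xy, input_size):
    return raw_xy * np.array([input_size[1], input_size[0]])

heatmap = np.zeros((64, 48))
heatmap[40, 30] = 1.0                       # peak at grid cell (row 40, col 30)
print(decode_heatmap(heatmap, (256, 192)))  # -> [120. 160.]
print(decode_regression(np.array([0.625, 0.625]), (256, 192)))  # -> [120. 160.]
```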
Looking into the official repository, there are only basic sources for reproducing the scores reported in the paper. Those are important, but practical experiments should also be run: tests with mobile backbones, mobile deployment, and so on. Let's have these!
To compare with the official results, the regression model (TensorFlow) was trained on MSCOCO with the official configuration.
Model | input shape | #Params (M) | GFLOPs | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L)
---|---|---|---|---|---|---|---|---|---|---|---|---|---
Benchmark (ResNet50) | 256x192 | 23.6 | 4.0 | 0.713 | 0.889 | 0.783 | - | - | - | - | - | - | -
Ours (ResNet50) | 256x192 | 23.6 | 3.78 | 0.694 | 0.904 | 0.760 | 0.668 | 0.736 | 0.727 | 0.912 | 0.786 | 0.695 | 0.776
- AP is calculated with `flip_test=True`.
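The flip test follows the common recipe: run the model on the image and on its horizontal mirror, un-mirror the second prediction (swapping left/right joints), and average the two. A minimal NumPy sketch; the `predict` callable and function name are illustrative, not the repo's API:

```python
import numpy as np

# COCO left/right keypoint index pairs (17-keypoint layout)
FLIP_PAIRS = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10),
              (11, 12), (13, 14), (15, 16)]

def flip_test(predict, image, width):
    """Average a prediction with its horizontally-flipped counterpart.
    `predict` maps an HxW image to a (17, 2) array of pixel coords."""
    kpts = predict(image)
    kpts_flip = predict(image[:, ::-1])               # run on mirrored image
    kpts_flip[:, 0] = width - 1 - kpts_flip[:, 0]     # mirror x back
    for left, right in FLIP_PAIRS:                    # swap left/right joints
        kpts_flip[[left, right]] = kpts_flip[[right, left]]
    return (kpts + kpts_flip) / 2
```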
The backbones used in the paper are ResNet50 and HRNet, which are not suitable for mobile devices. So some tests were run applying lightweight backbones to this model, listed below.
- `MobileNetV2`, the most widely used mobile backbone network.
- `EfficientNet-B0`, which reaches a considerable score with fast inference.
- `GhostNetV2`, which has more parameters but is more efficient than the other backbones tested.
After training, something noticeable is that the gap between `flip=True` and `flip=False` is small, much smaller than that of heatmap-based models.
Model | input shape | #Params (M) | GFLOPs | model size (MB) | latency (fps) | AP (flip=True) | AP (flip=False)
---|---|---|---|---|---|---|---
Ours (MobileNetV2) | 256x192 | 2.31 | 0.2935 | 4.7 | 10~11 | 0.614 | 0.600
Ours (EfficientNet-B0) | 256x192 | 4.09 | 0.3854 | 8.3 | 5~6 | 0.671 | 0.665
Ours (GhostNetV2 1.0x) | 256x192 | 3.71 | 0.1647 | 7.6 | 9~10 | 0.632 | 0.624
- AP is calculated with `flip=False`, because flip inference is not used on mobile.
- The model is tested on a `Galaxy Tab A7` with `num_threads=4`.
- GFLOPs affects FPS less than model size and the number of parameters do.
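For the latency numbers above, a rough on-device FPS estimate can be made by timing repeated invocations after a warm-up. A generic sketch; the `measure_fps` helper is hypothetical, and `run_once` would wrap one inference call (e.g. a TFLite `interpreter.invoke()`):

```python
import time

def measure_fps(run_once, n_warmup=5, n_iter=50):
    """Rough FPS estimate: warm up, then average wall-clock time
    over n_iter runs of a single-inference callable."""
    for _ in range(n_warmup):
        run_once()                       # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(n_iter):
        run_once()
    return n_iter / (time.perf_counter() - start)
```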
Since the `Galaxy Tab A7` is less powerful than recent Android devices or iPads, it is hard to reach real-time latency even though our models are so lightweight. The same models should show noticeably lower latency on a `Galaxy Tab S7` or newer, or on an `iPad Pro`.
Model | input shape | #Params (M) | GFLOPs | fps | AP | AP .5 | AP .75 | AP (M) | AP (L) | AR | AR .5 | AR .75 | AR (M) | AR (L)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
GhostNetV2 | 224x160 | 3.71 | 0.1187 | 10~11 | 0.597 | 0.859 | 0.670 | 0.574 | 0.638 | 0.635 | 0.871 | 0.701 | 0.604 | 0.681 |
EfficientNetB0 | 224x160 | 4.09 | 0.2810 | 6~7 | 0.645 | 0.882 | 0.717 | 0.623 | 0.680 | 0.680 | 0.893 | 0.746 | 0.651 | 0.723 |
GhostNetV2 | 192x128 | 3.71 | 0.0832 | 12~13 | 0.565 | 0.839 | 0.627 | 0.549 | 0.594 | 0.605 | 0.853 | 0.666 | 0.580 | 0.643 |
EfficientNetB0 | 192x128 | 4.09 | 0.1929 | 8~9 | 0.608 | 0.862 | 0.675 | 0.586 | 0.644 | 0.645 | 0.875 | 0.710 | 0.614 | 0.690 |
Everything in this repo is based on Ubuntu 18.04, and before starting, `docker` and `nvidia-docker` should be installed.
```shell
docker build -t rle:tf .
```
Before cloning this repo, you have to set up the directory tree as below; otherwise, the code will throw errors.
```
root
├── datasets
│   └── mscoco
│       ├── annotations
│       └── images
├── $project_dir
│   ├── src/
│   ├── train.py
│   ├── evaluate.py
│   ├── README
│   └── ...
└── ...
```
Training & evaluation operate on tfrecord files, so download the raw dataset from https://cocodataset.org and convert it to `.tfrecord`.
```
# after running the command below, a `tfrecords` directory is made.
root
├── datasets
│   └── mscoco
│       ├── annotations
│       ├── images
│       └── **tfrecords**
├── $project_dir
│   └── ...
└── ...
```
If you follow the directory tree above, conversion is easy: just run the command below. If your tree differs, change the dataset directory with the `-c` option on the command line.
```shell
python write_tfrecord.py
```
```shell
python train.py -c config/256x192_res50_regress-flow.yaml
```
```shell
python export.py -b ${BACKBONE_TYPE} -w ${WEIGHT_PATH}
# e.g.
python export.py -b resnet50 -w results/resnet50/ckpt/best_model.tf
```