The official repository for the dataset introduced in the CVPR 2020 (Oral) paper
Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild
by Dominik Kulon, Riza Alp Güler, Iasonas Kokkinos, Michael Bronstein, Stefanos Zafeiriou
Project website: https://www.arielai.com/mesh_hands/
Contact: [email protected]
The dataset contains 3D vertex coordinates of 50,175 hand meshes aligned with in-the-wild images comprising hundreds of subjects performing a wide variety of tasks.
The training set was generated from 102 videos, resulting in 47,125 hand annotations. The validation and test sets cover 7 videos with no subject overlap with the training set and contain 1,525 samples each.
The dataset has been collected in a fully automated manner. Please refer to our paper for the details.
The dataset is published in the form of JSON files containing mesh annotations. The size of compressed files is approximately 800 MB.
The dataset is available only to persons affiliated with academic research institutions (PhD/Master's students, postdocs, faculty members, and researchers), and exclusively for research purposes, as detailed in the LICENSE document. You will need to provide a valid institutional email address.
To access the dataset, please fill out the provided request form: Dataset Request Form.
To load the JSON dataset, please check the load_dataset method in load_db.py. The JSON files are assumed to be placed in ./data/.
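As a rough illustration of what the loader does, here is a minimal sketch of reading one of the annotation files. The file name below is a placeholder; check load_db.py for the actual file names and loading logic shipped with the dataset.

```python
import json


def load_dataset(path="./data/youtube_train.json"):
    """Load a YouTube 3D Hands annotation file.

    The default path is a placeholder, not the dataset's actual
    file name -- see load_db.py for the canonical loader.
    """
    with open(path, "r") as f:
        db = json.load(f)
    # Top-level keys follow the documented format: a list of image
    # records and a list of per-hand annotations.
    assert "images" in db and "annotations" in db
    return db
```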
JSON files have the following format:

images
  name - Image name in the form of youtube/VIDEO_ID/video/frames/FRAME_ID.png.
  width - Width of the image.
  height - Height of the image.
  id - Image ID.
annotations
  vertices - 3D vertex coordinates.
  is_left - Binary value indicating a right/left hand.
  image_id - ID of the corresponding entry in images.
  id - Annotation ID (an image can contain multiple hands).
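Since an image can contain multiple hands, grouping annotations by image_id is a common first step when working with these files. A minimal sketch, using the field names documented above (the sample values below are made up for illustration):

```python
from collections import defaultdict


def group_by_image(db):
    """Map each image ID to the list of hand annotations on that image."""
    by_image = defaultdict(list)
    for ann in db["annotations"]:
        by_image[ann["image_id"]].append(ann)
    return by_image


# Example input in the documented format (values are illustrative only).
db = {
    "images": [{"name": "youtube/VIDEO_ID/video/frames/0001.png",
                "width": 1280, "height": 720, "id": 7}],
    "annotations": [
        {"vertices": [[0.1, 0.2, 0.3]], "is_left": 0, "image_id": 7, "id": 1},
        {"vertices": [[0.4, 0.5, 0.6]], "is_left": 1, "image_id": 7, "id": 2},
    ],
}
hands = group_by_image(db)[7]  # both hand annotations on image 7
```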
The order of vertices corresponds to MANO. Mesh faces can be downloaded from the MANO website.
In the paper, the error on the YouTube dataset is measured in px. Pose Error on the YouTube dataset is evaluated on 2D OpenPose landmarks while MAE and Mesh Error are evaluated on 3D vertices (all without rigid alignment).
We suggest using the test set mainly for self-comparison. Otherwise, running OpenPose on the images and comparing against Pose Error will result in a perfect score. Similarly, fitting MANO to OpenPose predictions with our proposed method will result in a perfect score on 3D evaluation metrics.
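For self-comparison, the 3D vertex metrics are straightforward to compute. A sketch of a mean per-vertex Euclidean error without rigid alignment (the exact reduction over samples used in the paper may differ; this function is an assumption, not the official evaluation script):

```python
import numpy as np


def mesh_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth
    vertices, with no rigid alignment applied beforehand.

    pred, gt: arrays of shape (num_vertices, 3).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    # Per-vertex distances, then the mean over all vertices.
    return np.linalg.norm(pred - gt, axis=1).mean()
```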
The repository contains a code example that uses the pytube and OpenCV libraries to extract video frames; this code is not part of the licensed publication. The following software is required to run it:
Python
pytube
opencv-python
Commands to run the example:
# Download a specific video.
python download_images.py --vid VIDEO_ID
# Download all videos from the training set.
python download_images.py --set train
# Download all videos from the test and validation sets.
python download_images.py --set test
We cannot provide any support if a video is protected or no longer accessible.
After downloading video frames, you can check the viz_sample method in load_db.py to retrieve and visualize a specific sample.
If you use the dataset, please make sure you are familiar with the LICENSE and cite our paper. In particular, the LICENSE forbids any commercial use, including training neural networks/algorithms/systems/etc. for commercial purposes.
@InProceedings{Kulon_2020_CVPR,
author = {Kulon, Dominik and
Guler, Riza Alp and
Kokkinos, Iasonas and
Bronstein, Michael M. and
Zafeiriou, Stefanos},
title = {Weakly-Supervised Mesh-Convolutional Hand Reconstruction in the Wild},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}
Ground-truth annotations are also derivatives of MANO, which is distributed under its own license.