
A question about training on 3D keypoints datasets #1325

Open
Indigo6 opened this issue Apr 19, 2022 · 19 comments
@Indigo6 (Contributor) commented Apr 19, 2022

I'm new to 3D keypoint detection. While preparing the Human3.6M dataset, I found that the structure of the preprocessed data in MMPose differs from that of PoseNet or RLE.
Could someone explain the difference, and is there a way to convert between the two (since PoseNet provides parsed data)?

mmpose:

```text
└── data
    └── h36m
        ├── annotation_body3d
        └── images
            ├── S1
            │   ├── S1_Directions_1.54138969
            │   │   ├── S1_Directions_1.54138969_00001.jpg
            │   │   ├── S1_Directions_1.54138969_00002.jpg
            │   │   └── ...
            └── ...
```

PoseNet:

```text
└── h36m
    ├── annotations
    │   ├── Sample_trainmin_train_Human36M_protocol_2.json
    │   └── Sample_64_test_Human36M_protocol_2.json
    └── images
        └── s_01_act_02_subact_01_ca_01
```

@Indigo6 Indigo6 changed the title A question about preprocessed Human3.6M dataset A question about preparing 3D keypoints datasets Apr 19, 2022
@Indigo6 (Contributor, Author) commented Apr 19, 2022

Also, the MPI-INF-3DHP dataset preparation is listed under mesh recovery, but its evaluation is listed under 3D keypoints, which confuses me.

  1. Can I evaluate pretrained keypoint detection models on MPI-INF-3DHP?
  2. If yes, is the folder structure the same as that in the mesh recovery dataset preparation?

@ly015 ly015 self-assigned this Apr 19, 2022
@ly015 (Member) commented Apr 19, 2022

The Human3.6M dataset is used for two different tasks in MMPose, namely 3D keypoint detection and mesh recovery, with different annotation structures and preparation processes. For 3D keypoint detection, the data is parsed with this script from the raw data downloaded from the official website; please refer to the docs for details. For 3D mesh, please refer to here for data preparation.

The MPI-INF-3DHP dataset is only used for 3D keypoint detection in MMPose. Its data preparation guide is also for this task but is wrongly placed, which we will fix soon. The data parsing script is here.

Also, please note that the algorithms and features related to 3D mesh recovery in MMPose are being deprecated and are no longer maintained. Please check out our new codebase, MMHuman3D, for human pose and shape recovery with parametric models.

@Indigo6 (Contributor, Author) commented Apr 20, 2022

Thank you for your reply! Now I know how to prepare the MPI-INF-3DHP dataset for keypoint detection.

About the Human3.6M dataset, I understand there are two tasks with two respective folder structures. My question is: for keypoint detection, why are there two preprocessing methods and results? One, from anibali/h36m-fetch, is used by MMPose here; the other, CHUNYUWANG/H36M-Toolbox, is built on top of the former and used by PoseNet and RLE.
I used to work on 2D keypoints and am new to 3D.

@Indigo6 (Contributor, Author) commented Apr 21, 2022

I tried the preprocess_h36m script in MMPose to get the documented structure (fps10 and fps50). The final data takes more than 322 GB, while the processed data in RLE takes only about 100 GB.
Has anyone else tried the preprocess_h36m script? I wonder about the difference, and why MMPose did not adopt the CHUNYUWANG/H36M-Toolbox preprocessing?

@ly015 (Member) commented Apr 27, 2022

I am not sure why there is such a large difference in data size. Maybe it's because of the video-to-image extraction step? We use OpenCV, while CHUNYUWANG/H36M-Toolbox uses FFmpeg directly.
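For reference, the two extraction routes differ mainly in JPEG encoding quality. Below is a minimal sketch of what each side roughly does, not the actual code of either toolbox; the file names and the qscale value are illustrative:

```python
# Sketch of the two video-to-image routes. FFmpeg's -qscale:v controls
# JPEG quality (2 = best/largest, 31 = worst/smallest), while OpenCV's
# cv2.imwrite defaults to JPEG quality 95, which produces much larger
# files. File names here are illustrative.

def ffmpeg_extract_cmd(video, out_pattern, qscale=3):
    # Command an FFmpeg-based pipeline (H36M-Toolbox-style) might run.
    return ["ffmpeg", "-i", video, "-qscale:v", str(qscale), out_pattern]

def opencv_imwrite_params(quality=95):
    # Flags one would pass to cv2.imwrite to shrink its output;
    # 1 is the numeric value of cv2.IMWRITE_JPEG_QUALITY, 95 the default.
    IMWRITE_JPEG_QUALITY = 1
    return [IMWRITE_JPEG_QUALITY, quality]

cmd = ffmpeg_extract_cmd("S1_Directions_1.54138969.mp4",
                         "S1_Directions_1.54138969_%05d.jpg")
print(" ".join(cmd))
```

Lowering OpenCV's JPEG quality (or re-encoding with FFmpeg) should close most of the size gap, at the cost of some extra compression artifacts.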

@ly015 (Member) commented Apr 27, 2022

A correction on the MPI-INF-3DHP dataset: it is also used for both mesh recovery and 3D keypoint detection, but the data preprocessing guide for the 3D keypoint task is missing from the docs. We will add it soon.

@Indigo6 (Contributor, Author) commented Apr 27, 2022

Thank you for your reply and excellent project!

@Indigo6 (Contributor, Author) commented Apr 27, 2022

Thanks for the clue. I'll look into the preprocessing scripts and try to find the difference(s).

@Indigo6 (Contributor, Author) commented May 10, 2022

You are right: images extracted by FFmpeg with qscale:v set to 3 are much smaller than those extracted by OpenCV. With the same number of extracted frames, the total preprocessed data from h36m-fetch is about 3 times the size of that from H36M-Toolbox. For example, for the 11980 frames of S1_Act2, h36m-fetch produces 2.2 GB while H36M-Toolbox produces 760 MB.
I wonder whether, and by how much, the image extraction method influences the results?
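As a quick sanity check on the numbers above (plain arithmetic, no dataset needed):

```python
# 2.2 GB (h36m-fetch) vs 760 MB (H36M-Toolbox) for the same 11980 frames.
fetch_mb = 2.2 * 1024       # h36m-fetch total, in MB
toolbox_mb = 760.0          # H36M-Toolbox total, in MB
frames = 11980

ratio = fetch_mb / toolbox_mb
print(round(ratio, 1))      # about 3x, matching the reported difference

# Average size per extracted JPEG frame under each pipeline, in KB.
print(round(fetch_mb * 1024 / frames), round(toolbox_mb * 1024 / frames))
```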

@ly015 (Member) commented May 10, 2022

So far, the Human3.6M dataset is only used for SimpleBaseline3D and VideoPose3D in MMPose, which are both 2D-to-3D lifting algorithms. The images themselves are therefore not actually used, and we don't know how much the extraction method would affect the results of RGB-based methods.

@ly015 (Member) commented May 10, 2022

@Indigo6 BTW, would you be interested in an internship in OpenMMLab? If so please reach me via [email protected] :)

@Indigo6 (Contributor, Author) commented May 10, 2022

OK, I'll try to implement some methods not based on 2D-to-3D lifting and test the difference then.

@Indigo6 (Contributor, Author) commented May 10, 2022

Thank you sincerely for your invitation! I am quite interested in an internship at OpenMMLab and really appreciate the opportunity. Sadly, however, my mentor does not allow any internships.

@Indigo6 Indigo6 changed the title A question about preparing 3D keypoints datasets A question about training on 3D keypoints datasets May 11, 2022
@Indigo6 (Contributor, Author) commented May 11, 2022

I found there were PRs for the Human3.6M one-stage dataset, but they were closed: #868 and #975. May I ask the reason?

@ly015 (Member) commented May 11, 2022

There were two reasons: 1) the developer was an intern on the MMPose team, and he left for an exchange opportunity before these PRs were ready to merge; 2) Coarse-to-Fine is a rather old work (CVPR 2017), and we are reconsidering which algorithms in this category to support in MMPose.

@jin-s13 jin-s13 added the question Further information is requested label May 12, 2022
@Indigo6 (Contributor, Author) commented May 13, 2022

I'd like to help support direct 3D pose methods, since my mentor assigned me a national project on this topic, but I'm totally new to 3D pose datasets and transforms.
What's your plan for direct 3D pose, and how can I help? Can we support direct 3D pose with a simple regression head first?

@ly015 (Member) commented May 13, 2022

That would be great and thank you very much!
@jin-s13 Could you please give some suggestions here?

@jin-s13 (Collaborator) commented May 13, 2022

@Indigo6 For now, we do not have enough manpower to support all these awesome algorithms. Your contribution is really helpful! We appreciate it very much.

> Can we support direct 3d pose with a simple regression head first?

Yes, I think it is okay. We may start from this simple baseline. One minor concern is that it may not work very well.

If you need a better model and are still interested, it is suggested to also consider implementing this.

@Indigo6 (Contributor, Author) commented May 13, 2022

Integral regression and variations of soft-argmax are OK with me. My major concern is how to implement the Dataset class and data pipelines.
Do you have any suggestions on building them from scratch versus adapting them from Integral/RLE?
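For context, the soft-argmax mentioned above can be sketched in a few lines. This is a 1-D toy version in plain Python, illustrative only; real heads apply it per keypoint over 2-D or 3-D heatmaps, typically in PyTorch:

```python
import math

def soft_argmax_1d(heatmap, beta=1.0):
    # Differentiable argmax: softmax over the heatmap scores, then the
    # expected coordinate. Larger beta sharpens the distribution toward
    # the hard argmax.
    m = max(heatmap)
    exps = [math.exp(beta * (v - m)) for v in heatmap]  # stable softmax
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))

# A peaked heatmap: the soft estimate lands near the hard argmax (index 3).
hm = [0.0, 0.1, 0.5, 5.0, 0.4, 0.0]
print(round(soft_argmax_1d(hm, beta=4.0), 2))  # ~3.0
```

A plain regression head maps backbone features to per-joint coordinates directly; integral-style heads instead predict heatmaps and read coordinates out with this expectation, which is the idea behind the Integral/soft-argmax methods discussed above.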
