
A question about training on 3D keypoints datasets #1325

Open
Indigo6 opened this issue Apr 19, 2022 · 19 comments
@Indigo6 (Contributor) commented Apr 19, 2022

I'm new to 3D keypoint detection. While preparing the Human3.6M dataset, I found that the structure of the preprocessed data in MMPose differs from that of PoseNet or RLE.
Could someone explain the difference, and is there a way to convert between the two (since PoseNet provides parsed data)?

mmpose:

```text
└── data
    └── h36m
        ├── annotation_body3d
        └── images
            ├── S1
            │   ├── S1_Directions_1.54138969
            │   │   ├── S1_Directions_1.54138969_00001.jpg
            │   │   ├── S1_Directions_1.54138969_00002.jpg
            │   │   └── ...
            └── ...
```

PoseNet:

```text
└── h36m
    ├── annotations
    │   ├── Sample_trainmin_train_Human36M_protocol_2.json
    │   └── Sample_64_test_Human36M_protocol_2.json
    └── images
        └── s_01_act_02_subact_01_ca_01
```

@Indigo6 Indigo6 changed the title A question about preprocessed Human3.6M dataset A question about preparing 3D keypoints datasets Apr 19, 2022
@Indigo6 (Contributor, Author) commented Apr 19, 2022

Also, the MPI-INF-3DHP dataset preparation is listed under mesh recovery, but its evaluation is listed under 3D keypoints, which confuses me.

  1. Can I evaluate pretrained keypoint detection models on MPI-INF-3DHP?
  2. If yes, is the folder structure the same as that in the mesh recovery dataset preparation?

@ly015 ly015 self-assigned this Apr 19, 2022
@ly015 (Member) commented Apr 19, 2022

The Human3.6M dataset is used for two different tasks in MMPose, namely 3D keypoint detection and mesh recovery, with different annotation structures and preparation processes. For 3D keypoint detection, the data is parsed with this script from the raw data downloaded from the official website; please refer to the docs for details. For 3D mesh, please refer to here for data preparation.

The MPI-INF-3DHP dataset is only used for 3D keypoint detection in MMPose. Its data preparation guide is also for this task but is wrongly placed, which we will fix soon. The data parsing script is here.

Also, please note that the algorithms and features related to 3D mesh recovery in MMPose are being deprecated and are no longer maintained. Please check out our new codebase, MMHuman3D, for human pose and shape recovery with parametric models.

@Indigo6 (Contributor, Author) commented Apr 20, 2022

Thank you for your reply! Now I know how to prepare the MPI-INF-3DHP dataset for keypoint detection.

About the Human3.6M dataset, I understand there are two tasks with two respective folder structures. My question is: for keypoint detection, why are there two preprocessing methods and results? One, from anibali/h36m-fetch, is used by MMPose here; the other, CHUNYUWANG/H36M-Toolbox, is built on top of the former and used by PoseNet and RLE.
I used to work on 2D keypoints and am new to 3D.

@Indigo6 (Contributor, Author) commented Apr 21, 2022

I tried the preprocess_h36m script in MMPose to get the documented structure (fps10 and fps50). The final data takes more than 322 GB, while the processed data in RLE takes only about 100 GB.
Has anyone else tried the preprocess_h36m script? I wonder about the difference, and why MMPose did not adopt the CHUNYUWANG/H36M-Toolbox preprocessing?

@ly015 (Member) commented Apr 27, 2022

I am not sure why there is such a large difference in data size. Maybe it's because of the video-to-image extraction step? We use OpenCV, while CHUNYUWANG/H36M-Toolbox uses FFmpeg directly.
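For reference, the two extraction routes differ mainly in JPEG encoding quality. Below is a minimal sketch of what each side roughly does, not the actual code of either toolbox; the file names and the qscale value are illustrative:

```python
# Sketch of the two video-to-image routes. FFmpeg's -qscale:v controls
# JPEG quality (2 = best/largest, 31 = worst/smallest), while OpenCV's
# cv2.imwrite defaults to JPEG quality 95, which produces much larger
# files. File names here are illustrative.

def ffmpeg_extract_cmd(video, out_pattern, qscale=3):
    # Command an FFmpeg-based pipeline (H36M-Toolbox-style) might run.
    return ["ffmpeg", "-i", video, "-qscale:v", str(qscale), out_pattern]

def opencv_imwrite_params(quality=95):
    # Flags one would pass to cv2.imwrite to shrink its output;
    # 1 is the numeric value of cv2.IMWRITE_JPEG_QUALITY, 95 the default.
    IMWRITE_JPEG_QUALITY = 1
    return [IMWRITE_JPEG_QUALITY, quality]

cmd = ffmpeg_extract_cmd("S1_Directions_1.54138969.mp4",
                         "S1_Directions_1.54138969_%05d.jpg")
print(" ".join(cmd))
```

Lowering OpenCV's JPEG quality (or re-encoding with FFmpeg) should close most of the size gap, at the cost of some extra compression artifacts.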

@ly015 (Member) commented Apr 27, 2022

A correction on the MPI-INF-3DHP dataset: it is also used for both mesh recovery and 3D keypoint detection, but the data preprocessing guide for the 3D keypoint task is missing from the docs. We will add it soon.

@Indigo6 (Contributor, Author) commented Apr 27, 2022

Thank you for your reply and excellent project!

@Indigo6 (Contributor, Author) commented Apr 27, 2022

Thanks for the clue. I'll look into the preprocessing scripts and try to find the difference(s).

@Indigo6 (Contributor, Author) commented May 10, 2022

You are right: images extracted by FFmpeg with qscale:v set to 3 are much smaller than those extracted by OpenCV. With the same number of extracted frames, the total preprocessed data from h36m-fetch is about 3 times the size of that from H36M-Toolbox. For example, for the 11980 frames of S1_Act2, h36m-fetch produces 2.2 GB while H36M-Toolbox produces 760 MB.
I wonder whether, and by how much, the image extraction method influences the results?
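As a quick sanity check on the numbers above (plain arithmetic, no dataset needed):

```python
# 2.2 GB (h36m-fetch) vs 760 MB (H36M-Toolbox) for the same 11980 frames.
fetch_mb = 2.2 * 1024       # h36m-fetch total, in MB
toolbox_mb = 760.0          # H36M-Toolbox total, in MB
frames = 11980

ratio = fetch_mb / toolbox_mb
print(round(ratio, 1))      # about 3x, matching the reported difference

# Average size per extracted JPEG frame under each pipeline, in KB.
print(round(fetch_mb * 1024 / frames), round(toolbox_mb * 1024 / frames))
```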

@ly015 (Member) commented May 10, 2022

So far, the Human3.6M dataset is only used for SimpleBaseline3D and VideoPose3D in MMPose, which are both 2D-to-3D lifting algorithms. The images themselves are therefore not actually used, and we don't know how much the extraction method would affect the results of RGB-based methods.

@ly015 (Member) commented May 10, 2022

@Indigo6 BTW, would you be interested in an internship in OpenMMLab? If so please reach me via [email protected] :)

@Indigo6 (Contributor, Author) commented May 10, 2022

OK, I'll try to implement some methods not based on 2D-to-3D lifting and test the difference then.

@Indigo6 (Contributor, Author) commented May 10, 2022

Thank you sincerely for your invitation! I am quite interested in an internship at OpenMMLab and really appreciate the opportunity. Sadly, however, my mentor does not allow any internships.

@Indigo6 Indigo6 changed the title A question about preparing 3D keypoints datasets A question about training on 3D keypoints datasets May 11, 2022
@Indigo6 (Contributor, Author) commented May 11, 2022

I found there were PRs for the Human3.6M one-stage dataset, but they were closed: #868 and #975. May I ask the reason?

@ly015 (Member) commented May 11, 2022

There were two reasons: 1) the developer was an intern on the MMPose team, and he left for an exchange opportunity before these PRs were ready to merge; 2) Coarse-to-Fine is a rather old work (CVPR 2017), and we are reconsidering which algorithms in this category to support in MMPose.

@jin-s13 jin-s13 added the question Further information is requested label May 12, 2022
@Indigo6 (Contributor, Author) commented May 13, 2022

I'd like to help support direct 3D pose methods, since my mentor assigned me a national project on this topic, but I'm totally new to 3D pose datasets and transforms.
What's your plan for direct 3D pose, and how can I help? Can we support direct 3D pose with a simple regression head first?

@ly015 (Member) commented May 13, 2022

That would be great and thank you very much!
@jin-s13 Could you please give some suggestions here?

@jin-s13 (Collaborator) commented May 13, 2022

@Indigo6 For now, we do not have enough manpower to support all these awesome algorithms. Your contribution is really helpful! We appreciate it very much.

> Can we support direct 3d pose with a simple regression head first?

Yes, I think it is okay. We may start from this simple baseline. One minor concern is that it may not work very well.

If you need a better model and are still interested, it is suggested to also consider implementing this.

@Indigo6 (Contributor, Author) commented May 13, 2022

Integral regression and variations of soft-argmax are OK with me. My major concern is how to implement the Dataset class and data pipelines.
Do you have any suggestions on building them from scratch versus adapting them from Integral/RLE?
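For context, the soft-argmax mentioned above can be sketched in a few lines. This is a 1-D toy version in plain Python, illustrative only; real heads apply it per keypoint over 2-D or 3-D heatmaps, typically in PyTorch:

```python
import math

def soft_argmax_1d(heatmap, beta=1.0):
    # Differentiable argmax: softmax over the heatmap scores, then the
    # expected coordinate. Larger beta sharpens the distribution toward
    # the hard argmax.
    m = max(heatmap)
    exps = [math.exp(beta * (v - m)) for v in heatmap]  # stable softmax
    total = sum(exps)
    return sum(i * e / total for i, e in enumerate(exps))

# A peaked heatmap: the soft estimate lands near the hard argmax (index 3).
hm = [0.0, 0.1, 0.5, 5.0, 0.4, 0.0]
print(round(soft_argmax_1d(hm, beta=4.0), 2))  # ~3.0
```

A plain regression head maps backbone features to per-joint coordinates directly; integral-style heads instead predict heatmaps and read coordinates out with this expectation, which is the idea behind the Integral/soft-argmax methods discussed above.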
