About the generalization capability of the model #45

Open
logic03 opened this issue Mar 18, 2021 · 9 comments
Comments

logic03 commented Mar 18, 2021

I also found that this model performs very poorly on my own images. I wonder whether it is because my own depth images were not standardized with the mean and std. Has anyone solved this problem? In theory, a model trained on ITOP should generalize to your own depth maps.


logic03 commented Mar 18, 2021

Could you provide the training code for the ITOP dataset, just like the code that is provided for the NYU dataset?

@zhangboshen (Owner)

@logic03, sorry, the ITOP training code is kind of a mess and would take some effort to reorganize, but most of the ITOP training details are similar to the NYU code, except that we use a bounding box instead of center points.
As for the poor performance on your data, my guess is also that the mean and std of your images are very different from the ITOP dataset; maybe you can train your own model.
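For illustration, here is a minimal sketch of matching the training-time normalization before running inference on your own frames. The mean/std constants and the metre conversion below are placeholders, not the repo's exact preprocessing; substitute the statistics that were actually used during training.

```python
import numpy as np

# Placeholder statistics -- replace with the mean/std actually used when the
# model was trained (e.g. the constants from the ITOP preprocessing code).
TRAIN_MEAN = 3.0   # assumed mean depth in metres
TRAIN_STD = 0.5    # assumed std of depth in metres

def normalize_depth(depth, mean=TRAIN_MEAN, std=TRAIN_STD):
    """Standardize a depth frame the same way the training data was standardized."""
    return (depth.astype(np.float32) - mean) / std

# Example: a synthetic 240x320 depth frame reported in millimetres by a custom camera.
raw_mm = np.random.randint(500, 4500, size=(240, 320)).astype(np.float32)
normalized = normalize_depth(raw_mm / 1000.0)   # convert mm -> m first so units match
```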

@Shreyas-NR

Hi @logic03,
Were you able to use this model to predict joints on a custom dataset?
I'm also trying to pass in a depth frame alongside the ITOP side dataset and shift its mean so that the input depth frame matches the ITOP_side data. Unfortunately, the results are very bad.
Could you tell me whether you got any further with this?

bmmtstb commented Jun 28, 2023

I tried for a long time to predict keypoints on a custom dataset, cleaned up the code, and modified my depth images to be as close to the ITOP_side ones as possible, but the results are pretty much garbage. My guess is that the model overfit the ITOP data. Has anyone trained a more general model on multiple datasets? As far as I can see, none of the models were tested on a dataset other than the one they were trained on. On different data, yes, but not on different datasets.

Additionally, I cannot reach the speeds claimed in the original paper. I get roughly 10 iterations per second, and that is on a better GPU with my fully optimized, torch-only code with better data loading and no output... With e.g. YOLOv3 predicting the human bounding boxes, I get not much more than 8 iterations per second.
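As a reference point, one way to time the raw network in isolation (the model and input shape here are stand-ins, not the repo's exact configuration) is to synchronize the GPU around the loop so that only queued kernels, not just launch overhead, are measured:

```python
import time
import torch

# Stand-in model and input -- replace with the actual A2J network and crop size.
model = torch.nn.Conv2d(1, 16, 3).cuda().eval()
dummy = torch.randn(1, 1, 288, 288, device="cuda")

with torch.no_grad():
    for _ in range(10):              # warm-up iterations (cuDNN autotuning, allocator)
        model(dummy)
    torch.cuda.synchronize()
    start = time.time()
    n = 100
    for _ in range(n):
        model(dummy)
    torch.cuda.synchronize()         # wait for queued kernels before stopping the clock

print(f"{n / (time.time() - start):.1f} iterations / second")
```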

@YHaooo-4508


Your reply has saved me a lot of time. I originally intended to use a depth camera for real-time inference, but now it seems unnecessary.

It's a novel idea to do HPE with anchors, but I feel there are several questionable design choices. First, the weighted sum of (anchor coordinates + offset) can almost be replaced by a weighted sum of the anchor coordinates alone. Second, for the depth (z coordinate), the network predicts a depth value for every anchor point (11×11×16 anchors × 14 joints) and then takes a weighted sum to obtain the final keypoint depth... why not directly take a weighted sum of the depth values at the anchor points themselves?
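To make the comparison concrete, here is a toy sketch of the two aggregation variants being contrasted; the tensor shapes and values are illustrative only, not the repo's exact layout.

```python
import torch

A, J = 11 * 11 * 16, 14                         # anchor points and joints (illustrative)
anchors = torch.rand(A, 2) * 288                # fixed in-plane anchor coordinates (pixels)
offsets = torch.randn(A, J, 2)                  # predicted per-anchor, per-joint offsets
depth_pred = torch.rand(A, J) * 4.0             # predicted per-anchor, per-joint depth
weights = torch.softmax(torch.randn(A, J), 0)   # anchor weights, normalized over anchors

# Weighted sum of (anchor + offset) for x/y, plus weighted sum of predicted depths for z.
xy = (weights.unsqueeze(-1) * (anchors.unsqueeze(1) + offsets)).sum(0)   # (J, 2)
z = (weights * depth_pred).sum(0)                                        # (J,)

# The simplification suggested above for the x/y branch: weight the anchor coordinates directly.
xy_simplified = (weights.unsqueeze(-1) * anchors.unsqueeze(1)).sum(0)    # (J, 2)
```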

@git-xuefu


Hi, have you successfully used other models for real-time inference?

@YHaooo-4508


Currently there are few depth-based algorithms, but there are many RGB-based 3D HPE algorithms, such as RLE, PoseFormer, MotionBERT, and so on.

@git-xuefu


Thanks for your reply, it helps me a lot. One more question: do you know of an RGBD-based algorithm that can be used for real-time inference?

@YHaooo-4508


There is little research using depth maps in this area. I don't know of an RGBD-based algorithm that is both fast and accurate. From my paper search, A2J appears to be the algorithm closest to your requirements.
