
How to eval depth with Cityscapes #18

Open
ChengJianjia opened this issue Apr 11, 2021 · 7 comments

Comments

@ChengJianjia

No description provided.

@ChengJianjia
Author

I want to evaluate depth on the Cityscapes dataset. What parameters should I pay attention to?
Thanks

@klingner
Contributor

Hi,

so mainly, Cityscapes has a different aspect ratio than KITTI, which is something I would look out for when resizing the Cityscapes images. It might be possible to pass them through the network at the same resolution as the KITTI images, but with a significantly altered aspect ratio. Alternatively, since the network is fully convolutional, you could also try to pass the Cityscapes images through the network with an unaltered aspect ratio.
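Just to illustrate the two options, here is a minimal sketch of how one might pick an input size. The 640x192 KITTI-style resolution and the divisible-by-32 constraint are assumptions on my side (typical for a 5-stage stride-2 encoder), not values taken from this repository:

```python
# Hypothetical sketch: choose a network input size for Cityscapes (2048x1024)
# given a model trained on KITTI-style input (e.g. 640x192). Dimensions are
# kept divisible by 32 so a fully convolutional encoder with five stride-2
# stages still downsamples cleanly.

def target_size(src_w, src_h, max_w, divisor=32):
    """Largest size at most max_w wide that preserves the source aspect
    ratio, with both dimensions rounded to multiples of `divisor`."""
    w = (max_w // divisor) * divisor
    h = round(w * src_h / src_w / divisor) * divisor
    return w, h

# Option A: force the KITTI training resolution (distorts 2:1 into 10:3).
print((640, 192))
# Option B: an aspect-preserving size for Cityscapes frames.
print(target_size(2048, 1024, 512))  # -> (512, 256)
```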

If you also want to evaluate the depth performance on Cityscapes, it is important to note that Cityscapes only provides ground truth depth maps calculated from Multi-View Stereo, so be careful when drawing conclusions from comparisons between the performance on KITTI and Cityscapes. These depth maps also cover a higher fraction of the image pixels than the KITTI depth maps (which come from sparse LiDAR beams only). Finally, the images in the Cityscapes dataset are stored in a slightly different format than the KITTI images (I think this is described somewhere in the Cityscapes README).

Hope this helps!

@ChengJianjia
Author

Thanks for the reply. I have evaluated depth on Cityscapes, but its performance does not seem good. I will try to train it on Cityscapes.

@klingner
Contributor

Thank you for sharing your initial results,

maybe one thought: I would not directly compare a performance metric between KITTI and Cityscapes due to the large structural difference in the ground truths. Have you looked at some images and tried to judge qualitatively what the results look like? This might help in the evaluation process. Training directly on Cityscapes is, however, likely to improve performance with the right set of hyperparameters.

@ChengJianjia
Author

Thanks for the reply.
I have used inference.py to output depth prediction images for the Cityscapes dataset. The output images do not look bad.
Then I used the two models provided in the code to evaluate on Cityscapes, and the results are as follows:
depth_full.pth:
{'abs_rel': 0.9437683314528279, 'sq_rel': 46.842625193553495, 'rmse': 13.539455869561722, 'rmse_log': 0.5114394837606806}
{'delta1': 0.46745538016478827, 'delta2': 0.7847414059078959, 'delta3': 0.9212443854481077}
depth_only.pth:
{'abs_rel': 0.838633564325633, 'sq_rel': 36.69276755167625, 'rmse': 13.141963201280008, 'rmse_log': 0.5032015791397804}
{'delta1': 0.4857175435314797, 'delta2': 0.7851770513177866, 'delta3': 0.9191037496773082}
And it is strange that depth_only.pth performs better than depth_full.pth.
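For reference, the metrics in the dictionaries above can be computed with the usual Eigen-style evaluation formulas; a rough sketch (gt and pred are hypothetical per-pixel depth arrays in meters, with invalid pixels already masked out; this is my own illustration, not code from the repository):

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard depth error and accuracy metrics over valid pixels."""
    thresh = np.maximum(gt / pred, pred / gt)  # per-pixel max ratio
    return {
        "abs_rel": np.mean(np.abs(gt - pred) / gt),
        "sq_rel": np.mean((gt - pred) ** 2 / gt),
        "rmse": np.sqrt(np.mean((gt - pred) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "delta1": np.mean(thresh < 1.25),
        "delta2": np.mean(thresh < 1.25 ** 2),
        "delta3": np.mean(thresh < 1.25 ** 3),
    }

# Toy usage with made-up depths:
gt = np.array([10.0, 20.0, 30.0])
pred = np.array([11.0, 18.0, 30.0])
print({k: round(v, 4) for k, v in depth_metrics(gt, pred).items()})
```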
About the hyperparameters, I made three changes:
1. the mask_fn and clamb_fn,
2. in mytransform.py, when generating depth_gt, I used this line of code:
   sample[key][sample[key] > 1.0] = 0.209313 * 2262.52 / ((np.array(sample[key][sample[key] > 1.0]).astype(np.float) - 1.0) / 256.)
3. the input images are resized to 256x512.

I don't know if there are any other parameters that need to be changed.
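For clarity, the conversion in my change 2 can be written as a standalone function. The invalid-pixel handling reflects my understanding of the Cityscapes disparity encoding (stored value p > 0 marks a valid pixel with disparity d = (p - 1) / 256), and the baseline of about 0.209313 m and fx of about 2262.52 px are the stereo parameters used in the line above:

```python
import numpy as np

BASELINE = 0.209313  # stereo baseline in meters (from Cityscapes camera files)
FOCAL_X = 2262.52    # horizontal focal length in pixels

def disparity_png_to_depth(raw):
    """Convert a raw Cityscapes disparity PNG array to metric depth.
    Pixels with stored value <= 1 are left at depth 0 (invalid)."""
    raw = raw.astype(np.float64)
    depth = np.zeros_like(raw)
    valid = raw > 1.0  # p == 0 is invalid; p == 1 would give zero disparity
    depth[valid] = BASELINE * FOCAL_X / ((raw[valid] - 1.0) / 256.0)
    return depth

# Toy example: a stored value of 25601 corresponds to a disparity of 100 px.
raw = np.array([[0.0, 25601.0]])
print(disparity_png_to_depth(raw))  # invalid pixel stays at 0
```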

@klingner
Contributor

Well, I think at the moment, there is not really a standard on how to evaluate on Cityscapes. Some thoughts regarding your changes:

  1. It could make sense to exclude the region at the bottom of the Cityscapes images showing the ego vehicle from the evaluation. Maybe this is already considered by your mask_fn?
  2. This line of code seems correct to me; I would do it the same way.
  3. I am not sure if this is the optimal size, but it seems close enough to the KITTI image size to yield meaningful results.
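Regarding point 1, a minimal sketch of what such an evaluation mask could look like (the 0.8 keep fraction is a made-up example value, not something from this repository; the exact cutoff would need to be tuned to where the hood appears in the frames):

```python
import numpy as np

def ego_vehicle_mask(height, width, keep_frac=0.8):
    """Boolean evaluation mask: True for rows kept, False for the bottom
    strip where the ego-vehicle hood is visible."""
    mask = np.zeros((height, width), dtype=bool)
    mask[: int(height * keep_frac), :] = True
    return mask

# For a 2048x1024 Cityscapes frame, this keeps the top 819 rows.
mask = ego_vehicle_mask(1024, 2048)
print(mask.sum())  # number of pixels included in the evaluation
```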

Although I have not observed the depth-only model being better than the depth-full model before, it might be that the results differ from those obtained on KITTI due to the domain shift. In the end, the model is still optimized for operation on KITTI, so it would also be interesting to see whether training both models on Cityscapes yields the same relation.

@ZhuYingJessica

@ChengJianjia Hi, how do you compute the errors on Cityscapes? Could you share a link to the ground truth depth data of the Cityscapes dataset? Thanks a lot!
