
How to jointly optimize the pose? #5

Open
Seasandwpy opened this issue May 12, 2022 · 4 comments

@Seasandwpy

Hi,
I tried to optimize the camera pose jointly with the shape and texture codes: I initialized azimuth and elevation to 0 and distance to 0.5, and added these parameters to the optimizer. However, the result is still blurry after 500 iterations. I would like to ask whether this is normal or whether I am missing a step.
[Attached renders: opt1_499, opt1_999]

@wbjang
Owner

wbjang commented May 13, 2022

Hello @Seasandwpy ,

From my experience, the pose is optimized first, then the shape/texture latent vectors are optimized according to the roughly estimated pose. If the network cannot find the right pose in the first few iterations, please try other hyper-parameters.

For distance: ShapeNet-SRN Cars has near = 0.8 and far = 1.8, so I would suggest starting from 1.3. For elevation, it is better to start from 0.5 so that the camera is not on the surface.

If you train on ShapeNet-SRN Cars and apply the model to other datasets, scaling also matters. To me, the second car seems a bit larger than the cars in the training set.
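The suggested initialization can be sketched as a small helper that places the camera on a sphere around the object from (azimuth, elevation, distance). Note this is an illustrative sketch: the function name and the exact axis convention are assumptions, not the repository's actual code.

```python
import numpy as np

def pose_from_spherical(azimuth, elevation, distance):
    """Hypothetical helper: camera position on a sphere around the origin.

    Angles are in radians; the axis convention here is an assumption
    and may differ from the CodeNeRF repository.
    """
    x = distance * np.cos(elevation) * np.sin(azimuth)
    y = distance * np.sin(elevation)
    z = distance * np.cos(elevation) * np.cos(azimuth)
    return np.array([x, y, z])

# Initialization suggested above: distance 1.3 (inside SRN Cars'
# near/far range of 0.8-1.8) and elevation 0.5 so the camera is
# not on the object's surface.
cam = pose_from_spherical(azimuth=0.0, elevation=0.5, distance=1.3)
```

Starting with distance 0.5 (as in the question) would put the camera closer than the near plane of 0.8, which alone can explain degenerate renders.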

Hope this helps.

Wonbong

@Kulbear

Kulbear commented Jun 20, 2022

Hi Wonbong,

In the paper, you mentioned "We minimize the photometric loss (5) jointly with respect to shape and texture codes and camera parameters (fixing the decoder parameters Θ)". But according to your previous reply, "From my experience, the pose is optimized first, then shape/texture latent vectors are optimized later according to the roughly estimated pose."

Am I misunderstanding something, or are these two statements contradictory?
At test time the shape code and the appearance code are unknown; even with a frozen network, how can you find the camera pose under this condition?

@wbjang
Owner

wbjang commented Jun 21, 2022

Hello @Kulbear

The shape/texture codes and the camera pose are all optimized simultaneously. What I meant in the previous reply was that, even though all three are optimized simultaneously, the model tends to find the camera pose first and then refines the shape/texture codes afterwards.

The frozen network works as a prior so that the model finds the camera pose and shape/texture codes accordingly.

In the failure cases, the model gets stuck in a bad local optimum for the camera pose (it cannot move out of the local optimum), and the shape/texture codes are then updated based on that incorrect pose.
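The test-time procedure described here can be illustrated with a toy PyTorch loop: a frozen decoder acts as the prior while the pose parameters and the latent code receive gradients from a photometric-style loss. The decoder, tensor shapes, and learning rate below are placeholders for illustration, not the paper's architecture or hyper-parameters.

```python
import torch

torch.manual_seed(0)

# Frozen "decoder" standing in for the trained network (placeholder MLP).
decoder = torch.nn.Sequential(
    torch.nn.Linear(3 + 8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 16)
)
for p in decoder.parameters():
    p.requires_grad_(False)  # fix decoder parameters Theta, as in the paper

# Synthesize a target from a "true" pose/code so the loop has a goal.
true_pose = torch.tensor([0.3, 0.5, 1.3])
true_code = torch.randn(8)
target = decoder(torch.cat([true_pose, true_code]))

# Test-time unknowns: pose and latent code, optimized jointly.
pose = torch.tensor([0.0, 0.5, 1.3], requires_grad=True)
code = torch.zeros(8, requires_grad=True)
opt = torch.optim.Adam([pose, code], lr=1e-2)

losses = []
for _ in range(200):
    opt.zero_grad()
    pred = decoder(torch.cat([pose, code]))
    loss = torch.mean((pred - target) ** 2)  # photometric-style loss
    loss.backward()
    opt.step()
    losses.append(loss.item())
```

If the pose initialization lands in a bad basin, the same loop still drives the loss down, but by distorting the code rather than correcting the pose, which matches the blurry failure cases described above.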

Cheers,
Wonbong

@chengzhag

Hi @Seasandwpy, @Kulbear,

How are your attempts to reproduce the pose estimation going? I would appreciate it if you could share your own implementation.

Hi @wbjang,

This work is really impressive; thank you for sharing your code. It would be great if you could also publish the part that optimizes the codes and the pose simultaneously.
