
regarding inverse_warping #147

Open
wtpro opened this issue Jan 23, 2023 · 10 comments
@wtpro commented Jan 23, 2023

Hi Clement,

Thanks for translating the tf repo to a pytorch one!

I do have a small question with the 6 DOF pose used in

def inverse_warp(img, depth, pose, intrinsics, rotation_mode='euler', padding_mode='zeros'):

My own dataset only contains intrinsics and extrinsics matrices. I wonder if it is possible to translate this 6 DOF pose into a series of matrix multiplications.

I already have a rough idea that the pose can be expressed in the form of:

extrinsics_src @ extrinsics_tgt.inverse()

where extrinsics_src is the extrinsic matrix of the source image and extrinsics_tgt is the extrinsic matrix of the target image. So the whole warping process can be written (roughly) as:

grid_sample(source_image, intrinsics @ extrinsics_src @ extrinsics_tgt.inverse() @ intrinsics_inverse() @ target_depth)

assuming all images are taken by the same camera and matrices are in homogeneous coordinates.
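
Spelled out, something like the following rough sketch is what I have in mind (warp_with_extrinsics is just a placeholder name of mine, assuming 4x4 world-to-camera extrinsics for both views; this is not code from this repo):

import torch
import torch.nn.functional as F

def warp_with_extrinsics(src_img, tgt_depth, intrinsics, extr_src, extr_tgt):
    # Rough sketch of the warp with matrices instead of a pose vector.
    # src_img: [B,3,H,W], tgt_depth: [B,H,W], intrinsics: [B,3,3],
    # extr_src / extr_tgt: [B,4,4] world-to-camera matrices.
    B, _, H, W = src_img.shape
    device = src_img.device

    # homogeneous pixel grid of the target view, [3, H*W]
    v, u = torch.meshgrid(torch.arange(H, device=device),
                          torch.arange(W, device=device), indexing='ij')
    pix = torch.stack((u, v, torch.ones_like(u)), dim=0).float().reshape(3, -1)

    # back-project target pixels to 3D points in the target camera frame
    cam_points = intrinsics.inverse() @ pix.unsqueeze(0)            # [B,3,H*W]
    cam_points = cam_points * tgt_depth.reshape(B, 1, -1)
    cam_points = torch.cat((cam_points,
                            torch.ones(B, 1, H * W, device=device)), dim=1)

    # relative transform: target camera frame -> source camera frame
    rel = extr_src @ extr_tgt.inverse()                             # [B,4,4]
    src_cam = (rel @ cam_points)[:, :3]                             # [B,3,H*W]

    # project into the source image and normalize for grid_sample
    src_pix = intrinsics @ src_cam
    x = src_pix[:, 0] / src_pix[:, 2].clamp(min=1e-6)
    y = src_pix[:, 1] / src_pix[:, 2].clamp(min=1e-6)
    grid = torch.stack((2 * x / (W - 1) - 1, 2 * y / (H - 1) - 1), dim=2)
    grid = grid.reshape(B, H, W, 2)

    # align_corners=True matches the (W-1)/(H-1) normalization above
    return F.grid_sample(src_img, grid, padding_mode='zeros', align_corners=True)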

Really appreciate your input here!

@ClementPinard (Owner) commented

Hello,

thank you for your interest in this repo.

Actually, inverse_warp already implements a conversion from a 6DoF pose to an extrinsics matrix:

def pose_vec2mat(vec, rotation_mode='euler'):

So you would have to change the code a little to use matrices instead of pose vectors, and then this line would have to go:

pose_mat = pose_vec2mat(pose, rotation_mode) # [B,3,4]

One quick warning about using matrices instead of pose vectors concerns the output of the network. Training is much more stable when the network outputs pose vectors rather than extrinsics matrices in which every coefficient is a learnable parameter. So the best strategy for you is to embed the euler2mat function directly in your network and call it at each forward pass.

That way, your network will output matrices directly instead of pose vectors, allowing you to do the inverse warp with matrices, while remaining as stable as if it were using pose vectors.
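
For example, a minimal sketch of what I mean (PoseHead and the feature size are placeholders, and it assumes the euler2mat helper from this repo's inverse_warp.py):

import torch
import torch.nn as nn
from inverse_warp import euler2mat  # same helper used by pose_vec2mat in this repo

class PoseHead(nn.Module):
    # Hypothetical sketch: the last layer still predicts a 6DoF vector
    # (tx, ty, tz, rx, ry, rz), but forward() converts it on the fly and
    # returns a [B,3,4] matrix, so downstream code only ever sees matrices.
    def __init__(self, in_features):
        super().__init__()
        self.fc = nn.Linear(in_features, 6)

    def forward(self, features):
        vec = 0.01 * self.fc(features)           # keep initial poses small for stability
        translation = vec[:, :3].unsqueeze(-1)   # [B,3,1]
        rotation = euler2mat(vec[:, 3:])         # [B,3,3]
        return torch.cat([rotation, translation], dim=2)  # [B,3,4]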

Hope it helped,

Clément

@wtpro (Author) commented Jan 23, 2023

Thanks for your reply!
Yes, I am aware that the pose can be converted to a matrix using the pose_vec2mat function.
What I am asking is whether it is possible to express the pose equivalently in terms of extrinsics and intrinsics matrices.

@ClementPinard (Owner) commented

The code already uses an intrinsics matrix separately from the pose, so the pose is totally independent from the intrinsics.
When we convert the pose to a 3x4 matrix, we actually get the extrinsics matrix, so in that sense they are equivalent.
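
Concretely, with 4x4 world-to-camera extrinsics for both views, a minimal sketch of that equivalence would be the following (pose_from_extrinsics is just an illustrative name, and the exact composition order is discussed further down in this thread):

import torch

def pose_from_extrinsics(extr_src, extr_tgt):
    # relative pose playing the role of pose_vec2mat(pose):
    # it maps target-camera coordinates into source-camera coordinates
    rel = extr_src @ extr_tgt.inverse()   # [B,4,4]
    return rel[:, :3, :]                  # [B,3,4], same shape as pose_vec2mat output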

@wtpro (Author) commented Jan 24, 2023

So which extrinsics is the pose equivalent to? The target image extrinsics, the source image extrinsics, or
extrinsics_src @ extrinsics_tgt.inverse() ?
Judging from the code, I believe the pose represents the transformation from target pixel coordinates to source pixel coordinates, so a single extrinsics matrix might not suffice.

@ClementPinard (Owner) commented

In this setup, the extrinsics of the target image is always the identity matrix (or the null pose vector).

There is no way for the network to know the pose of both the target and reference images; it can only estimate the difference between the two. In other words, it can estimate the extrinsics of the reference image in a coordinate system centered on the target image.

If you want the pose of both target and reference images, you need an anchor somewhere. If the anchor is e.g. the first image of the whole sequence, and you need the pose of the Nth image relative to the first frame, you will need to accumulate the pose differences and thus compute the composition of several extrinsics, because they are not expressed in the same coordinate system.
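
A minimal sketch of that accumulation, assuming each relative pose is a 4x4 matrix mapping frame-i coordinates into frame-(i+1) coordinates:

import torch

def accumulate_poses(relative_mats):
    # relative_mats[i]: 4x4 matrix mapping frame-i coordinates to frame-(i+1) coordinates
    absolute = [torch.eye(4)]                # frame 0 is the anchor
    for rel in relative_mats:
        absolute.append(rel @ absolute[-1])  # compose in the same coordinate convention
    return absolute                          # absolute[n] maps frame-0 coords to frame-n coords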

@wtpro (Author) commented Jan 25, 2023

Sorry for the unclear question. I am currently trying to adapt your inverse_warp to my own dataset without using the pose prediction network, because my dataset provides extrinsics and intrinsics.

I could technically just use the network for pose and call it a day, but I want to explore other possibilities, since extrinsics and intrinsics are easy to obtain with specialized hardware, so a network is not needed.

@ClementPinard (Owner) commented

Oh, now I understand, sorry!

If you have extrinsics for both views, then yes, the inverse warp does use the difference of extrinsics implicitly, which results in the formula you proposed in your first post.

Now the tricky part is to make sure the order is right. Is it intrinsics @ extrinsics_src @ extrinsics_tgt.inverse() @ intrinsics_inverse() @ target_depth, or is it intrinsics @ extrinsics_tgt.inverse() @ extrinsics_src @ intrinsics_inverse() @ target_depth? Honestly, the best way to be sure is trial and error, visualizing the result.
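
For instance, reusing the rough warp_with_extrinsics sketch from your first post (photometric_error is a made-up helper, and src_img, tgt_img, tgt_depth, intrinsics, extr_src, extr_tgt are your own tensors), you could compare the two orderings numerically as well as visually:

import torch

def photometric_error(warped, target):
    # mean absolute difference; crude, since out-of-view pixels are zero-padded
    return (warped - target).abs().mean().item()

# feed each candidate transform as the "source extrinsics" and the identity as
# the "target extrinsics", so the warp uses exactly that transform
eye = torch.eye(4, device=extr_src.device).expand_as(extr_src)
warp_a = warp_with_extrinsics(src_img, tgt_depth, intrinsics,
                              extr_src @ extr_tgt.inverse(), eye)
warp_b = warp_with_extrinsics(src_img, tgt_depth, intrinsics,
                              extr_tgt.inverse() @ extr_src, eye)
print(photometric_error(warp_a, tgt_img), photometric_error(warp_b, tgt_img))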

Hope it helped!

@wtpro (Author) commented Jan 26, 2023

I did some quick visualization and here are the results:

intrinsics @ extrinsics_src @ extrinsics_tgt.inverse() @ intrinsics_inverse() @ target_depth:
[image: warped result with the first ordering]

intrinsics @ extrinsics_tgt.inverse() @ extrinsics_src @ intrinsics_inverse() @ target_depth:
[image: warped result with the second ordering]

target image:
[image: target image]

The first one seems more correct to me. The second one contains some pixels that were unseen in the target image.

Appreciate your input here!

@ClementPinard (Owner) commented

Hard to tell without the depth (are you using ground truth from the sensor?), but I'd agree with you that the first one looks better, especially for the chair at the bottom.

Note that the duplication of pixels in the first one is normal for areas that are occluded in the ref image but not in the tgt image. It's impossible to reconstruct them since they are not visible, so the algorithm takes the color of the foreground. This is visible for the table at the bottom as well, for example.

@wtpro (Author) commented Jan 30, 2023

[image: ground-truth depth of the target image (000049)]

This is the GT depth of the target image. It is actually taken from a synthetic dataset, not a real-life scene.

Also, I have a small question: if the source and target images do not have any overlap, does this warp still work?
