
Get world depth instead of FOV depth [C++] [RealSense] #7279

Closed
Jethrootje opened this issue Sep 8, 2020 · 11 comments

@Jethrootje

Required Info
Camera Model: D435
Firmware Version: Latest
Operating System & Version: Windows 10
Kernel Version (Linux Only): (e.g. 4.14.13)
Platform: PC
SDK Version: latest
Language: C++ / OpenCV
Segment: ?

Issue Description

I've made a small project to test some things with rs2_deproject_pixel_to_point. I've seen some issue threads about it, but wasn't able to find any useful information. I'm working on detection within zones, and the problem is that the camera measures depth in its own field-of-view perspective instead of the actual depth. I was going to work out a Pythagorean method myself to calculate it, but then I found the deproject method. My problem now is that it returns the exact same depth, and X is sometimes a negative number, which shouldn't be happening if I'm correct.

    #include <librealsense2/rs.hpp>
    #include <librealsense2/rsutil.h>   // rs2_deproject_pixel_to_point
    #include <opencv2/opencv.hpp>
    #include <iostream>
    #include "cv-helpers.hpp"           // frame_to_mat, from the SDK's OpenCV examples

    using namespace cv;
    using namespace rs2;
    using namespace std;

    int main()
    {
        pipeline p;
        p.start();
        // Align depth to the color stream; construct the align object once,
        // outside the loop, instead of on every frame
        rs2::align align_to(RS2_STREAM_COLOR);

        float add = 0;
        while (true)
        {
            rs2::frameset frames = p.wait_for_frames();
            rs2::frameset aligned_frameset = align_to.process(frames);
            rs2::depth_frame depth = aligned_frameset.get_depth_frame();
            rs2::frame color_frame = aligned_frameset.get_color_frame();
            auto img = frame_to_mat(color_frame);

            // Intrinsics of the (color-aligned) stream, needed for deprojection
            auto frt = color_frame.get_profile()
                .as<video_stream_profile>().get_intrinsics();
            float point[3] = { 0, 0, 0 };
            // The sample pixel sweeps across the image: x = 50, 100, ..., 400
            float checkPoint[2] = { 50 + add, 300 };
            add += 50;
            if (add > 400) {
                add = 0;
            }

            // Depth ("Z") in meters at the sample pixel
            float checkDepth = depth.get_distance((int)checkPoint[0], (int)checkPoint[1]);
            cout << checkDepth << "\n";
            // Deproject pixel + Z into a 3D point [X, Y, Z] in the camera's CS
            rs2_deproject_pixel_to_point(point, &frt, checkPoint, checkDepth);
            cout << point[0] << ", " << point[1] << ", " << point[2] << "\n\n";

            imshow("Test", img);
            waitKey(1);
        }
    }

Output:

[image: console output]

As you can see, the first depth (from get_distance) stays the same as the second depth (point[2] after deprojection), and X is sometimes a negative number.

@MartyG-RealSense
Collaborator

MartyG-RealSense commented Sep 8, 2020

Hi @Jethrootje The 3D origin (0,0,0) of the camera is at the center of the physical left IR imager component. As described in the Projection documentation (see the link below), the positive x-axis points to the right, the positive y-axis points down, and the positive z-axis points forward.

https://github.com/IntelRealSense/librealsense/wiki/Projection-in-RealSense-SDK-2.0#point-coordinates

So if a point is in the top-left area of the image, it could have negative x and negative y values, because it is to the left of the imager center (minus X axis) and above the imager center (minus Y axis).

Therefore, a minus X and positive Y result would refer to a coordinate that is to the left of and below the imager's center.
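
To make the sign convention concrete, here is a minimal sketch (editorial, not from the thread) that reuses `frt` and `depth` from the code earlier in the issue and deprojects a pixel from the top-left quadrant:

    // Deproject a pixel left of and above the principal point.
    // `frt` (rs2_intrinsics) and `depth` (rs2::depth_frame) come from
    // the code posted earlier in this issue.
    float pixel[2] = { 100.0f, 100.0f };
    float z = depth.get_distance((int)pixel[0], (int)pixel[1]);
    float pt[3];
    rs2_deproject_pixel_to_point(pt, &frt, pixel, z);
    // For a principal point near the image center: pt[0] < 0 (left of center),
    // pt[1] < 0 (above center), and pt[2] == z always.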

@Jethrootje
Author

> Hi @Jethrootje The 3D origin (0,0,0) of the camera is at the center of the physical left IR imager component. As described in the Projection documentation (see the link below), the positive x-axis points to the right, the positive y-axis points down, and the positive z-axis points forward.
>
> https://github.com/IntelRealSense/librealsense/wiki/Projection-in-RealSense-SDK-2.0#point-coordinates
>
> So if a point is in the top-left area of the image, it could have negative x and negative y values, because it is to the left of the imager center (minus X axis) and above the imager center (minus Y axis).
>
> Therefore, a minus X and positive Y result would refer to a coordinate that is to the left of and below the imager's center. That is my understanding of the principles.

So basically a minus X and positive Y result is normal? Then what about the distance?
The distance doesn't seem to change if I use rs2_deproject_pixel_to_point instead of only using get_distance.

@Jethrootje
Author

Jethrootje commented Sep 9, 2020

So now I've found this, but my code looks exactly the same (in C++ instead of Python), and the depth from the FOV isn't any different from the depth I'd get if I viewed it as "world depth".

He also says:

> I translated the camera coordinates of each point found by deprojection to world coordinates, I ignored the Y coordinate and projected the X and the Z coordinates to the XZ world coordinate plane.

Now I'm kind of confused about how he translates the Z coordinate to a world coordinate, because right now it still depends on the FOV of the camera.
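
For reference, the approach quoted above (drop Y and project onto the XZ plane) would look something like this minimal sketch (editorial; `p` is assumed to be a 3D point produced by rs2_deproject_pixel_to_point):

    #include <cmath>

    // Horizontal-plane distance to a deprojected point p[3] = {X, Y, Z}:
    // ignore the height (Y) and project onto the XZ world plane.
    float planar_range(const float p[3])
    {
        return std::sqrt(p[0] * p[0] + p[2] * p[2]);
    }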

@MartyG-RealSense
Collaborator

Could you please clarify what you mean when you say "The distance doesn't seem to change"?

Do you mean that the distance does not update when you move the camera while using rs2_deproject_pixel_to_point? Or that the depth reading is the same when using rs2_deproject_pixel_to_point as when using get_distance()?

@Jethrootje
Author

> Could you please clarify what you mean when you say "The distance doesn't seem to change"?
>
> Do you mean that the distance does not update when you move the camera while using rs2_deproject_pixel_to_point? Or that the depth reading is the same when using rs2_deproject_pixel_to_point as when using get_distance()?

The second.
Basically, my problem is that I'm trying to detect things within an area:
X (starting X)
Y (starting Y)
Width (end X)
Height (end Y)
MinimumZ (minimum depth)
MaximumZ (maximum depth)

My problem is basically this:
[image: diagram of the problem]
I want it to detect everything at the same depth, but if I stack multiple areas on top of each other, the areas sometimes get mismatched because the depth is calculated from the FOV. I thought rs2_deproject_pixel_to_point would be able to fix that, but I think I'm wrong about that part. Not sure how I'd do it, though.
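
A minimal sketch of such a zone test (editorial; the `Area` struct and its field names are hypothetical, mirroring the list above) that compares against the deprojected camera-space point rather than raw pixel coordinates:

    // Hypothetical detection zone expressed in camera-space meters.
    struct Area {
        float startX, startY;  // zone origin (starting X, starting Y)
        float endX, endY;      // zone extent (end X, end Y)
        float minZ, maxZ;      // accepted depth range
    };

    // True if a deprojected point pt[3] = {X, Y, Z} falls inside the zone.
    bool inside(const Area& a, const float pt[3])
    {
        return pt[0] >= a.startX && pt[0] <= a.endX &&
               pt[1] >= a.startY && pt[1] <= a.endY &&
               pt[2] >= a.minZ   && pt[2] <= a.maxZ;
    }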

@MartyG-RealSense
Collaborator

The question above is best suited to RealSense team member @ev-mp, an expert on stereo depth. @ev-mp, could you kindly provide advice to @Jethrootje on the question above, please?

@ev-mp
Collaborator

ev-mp commented Sep 10, 2020

@Jethrootje , I added some details for clarification:
[image: annotated sketch contrasting depth ("Z") with radial range]

The terms "FOV depth" and "World depth" are subjective and can mean different things to different people. So to clarify the terms:

  1. The depth stream defines a Euclidean coordinate system (CS) with the origin set at the camera's base (0,0,0).
    The definition of the axes is according to @MartyG-RealSense's explanation given above.
    From that point on, all the depth calculations are performed in that coordinate system.

  2. The content of the depth frame is the "Z" value calculated for every pixel in the camera's frustum (or cropped FOV). The sketch makes clear that while the range (or radial distance) may coincide with the depth ("Z"), in 99.99% of cases they will be different.

  3. This should answer your question:

> The distance doesn't seem to change if i use rs2_deproject_pixel_to_point instead of only using get_distance.

The call to frame.get_distance(x,y) provides the "Z" value.
The function rs2_deproject_pixel_to_point takes the "Z" component of the 3D coordinate and calculates the missing X and Y components so that it produces a coherent [X,Y,Z] location within the mentioned CS. So the "Z" component will be identical to the result obtained with frame.get_distance(x,y).
If you need to find the (radial) range from the camera to the object, then you need to calculate the Euclidean distance: sqrt(x^2 + y^2 + z^2).

  4. In case you need to translate the location of the pixel from the camera CS to an arbitrary ("World") CS (for example, relative to a person standing 2 meters behind the camera), you need to find the transformation matrix between the origin of the depth CS and the "World" CS and apply it.
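
Putting points 3 and 4 into code, a minimal sketch (editorial, not from the thread; the rotation R and translation t are placeholders you would have to determine for your own setup):

    #include <cmath>

    // p[3] = {X, Y, Z} from rs2_deproject_pixel_to_point;
    // p[2] is identical to depth.get_distance(x, y).

    // Point 3: radial range from the sensor origin (Euclidean norm).
    float radial_range(const float p[3])
    {
        return std::sqrt(p[0]*p[0] + p[1]*p[1] + p[2]*p[2]);
    }

    // Point 4: camera CS -> "World" CS via a rigid transform world = R*p + t.
    // R and t below are placeholders; measure or calibrate them yourself.
    void camera_to_world(const float p[3], float world[3])
    {
        const float R[3][3] = { {1,0,0}, {0,1,0}, {0,0,1} };  // placeholder: identity
        const float t[3]    = { 0.0f, 0.0f, 0.0f };           // placeholder: zero offset
        for (int i = 0; i < 3; ++i)
            world[i] = R[i][0]*p[0] + R[i][1]*p[1] + R[i][2]*p[2] + t[i];
    }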

@Jethrootje
Author

Jethrootje commented Sep 17, 2020

> @Jethrootje , I added some details for clarification:
> [image: annotated sketch contrasting depth ("Z") with radial range]
>
> The terms "FOV depth" and "World depth" are subjective and can mean different things to different people. So to clarify the terms:
>
> 1. The depth stream defines a Euclidean coordinate system (CS) with the origin set at the camera's base (0,0,0). The definition of the axes is according to @MartyG-RealSense's explanation given above. From that point on, all the depth calculations are performed in that coordinate system.
> 2. The content of the depth frame is the "Z" value calculated for every pixel in the camera's frustum (or cropped FOV). The sketch makes clear that while the range (or radial distance) may coincide with the depth ("Z"), in 99.99% of cases they will be different.
> 3. This should answer your question:
>
> > The distance doesn't seem to change if i use rs2_deproject_pixel_to_point instead of only using get_distance.
>
> The call to frame.get_distance(x,y) provides the "Z" value.
> The function rs2_deproject_pixel_to_point takes the "Z" component of the 3D coordinate and calculates the missing X and Y components so that it produces a coherent [X,Y,Z] location within the mentioned CS. So the "Z" component will be identical to the result obtained with frame.get_distance(x,y).
> If you need to find the (radial) range from the camera to the object, then you need to calculate the Euclidean distance: sqrt(x^2 + y^2 + z^2).
> 4. In case you need to translate the location of the pixel from the camera CS to an arbitrary ("World") CS (for example, relative to a person standing 2 meters behind the camera), you need to find the transformation matrix between the origin of the depth CS and the "World" CS and apply it.

I've been trying a lot of things, but I really can't get it to work. Basically, I want the blue arrows that you've drawn, in meters. In the picture you say that's 1.30 m, but that only holds within the FOV perspective; if you measured it straight on, it would be different. How would I calculate that distance? In other words, how can I pretend the camera is facing straight on instead of at an FOV angle? Because if you put the camera more to the side, it would give the same distance in the middle, but if you check the X and Y of the spot that was in the middle before, they would be different.

[image: diagram of the 1.10 m distances at the sides]
Basically, I want to calculate the 1.10 m at the sides. Sorry if I'm being a little confusing.

Edit:
Now I know that this would work:
[image: diagram of the working case]

But if the depth isn't at the same height, this would happen:
[image: diagram of the failing case]
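
One editorial reading of this: if the camera is tilted by a known pitch angle, rotating every deprojected point by that angle about the X axis levels the coordinate system, so Z becomes the forward distance and Y the height regardless of where the point sits in the FOV. A minimal sketch under that assumption (`theta` is a tilt angle you would have to measure yourself; its sign depends on your convention):

    #include <cmath>

    // Rotate a deprojected point p[3] = {X, Y, Z} about the X axis by
    // theta (radians) to compensate for a known camera pitch.
    void level_point(const float p[3], float theta, float out[3])
    {
        const float c = std::cos(theta), s = std::sin(theta);
        out[0] = p[0];                 // X is unchanged by a pitch rotation
        out[1] = c * p[1] - s * p[2];  // levelled height
        out[2] = s * p[1] + c * p[2];  // levelled forward distance
    }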

@MartyG-RealSense
Collaborator

Hi @Jethrootje, do you still require assistance with this case? Thanks!

@Jethrootje
Author

No, not really. I tried a few new methods. Thanks to both of you for your help!

@MartyG-RealSense
Collaborator

You're very welcome @Jethrootje - thanks for the update!
