Get 3D coordinates from 2D image pixel if extrinsic and intrinsic parameters are known

38.8k views Asked by At

I am doing camera calibration from tsai algo. I got intrensic and extrinsic matrix, but how can I reconstruct the 3D coordinates from that inormation?

enter image description here

1) I can use Gaussian Elimination for find X,Y,Z,W and then points will be X/W , Y/W , Z/W as homogeneous system.

2) I can use the OpenCV documentation approach:

enter image description here

as I know u, v, R , t , I can compute X,Y,Z.

However both methods end up in different results that are not correct.

What am I'm doing wrong?

2

There are 2 answers

10
Jav_Rock On BEST ANSWER

If you got extrinsic parameters then you got everything. That means that you can have Homography from the extrinsics (also called CameraPose). Pose is a 3x4 matrix, homography is a 3x3 matrix, H defined as

                   H = K*[r1, r2, t],       //eqn 8.1, Hartley and Zisserman

with K being the camera intrinsic matrix, r1 and r2 being the first two columns of the rotation matrix, R; t is the translation vector.

Then normalize dividing everything by t3.

What happens to column r3, don't we use it? No, because it is redundant as it is the cross-product of the 2 first columns of pose.

Now that you have homography, project the points. Your 2d points are x,y. Add them a z=1, so they are now 3d. Project them as follows:

        p          = [x y 1];
        projection = H * p;                   //project
        projnorm   = projection / p(z);      //normalize

Hope this helps.

2
DerekG On

As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinates, as this information is totally lost in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image space points before projection as answered by Jav_Rock.

p          = [x y 1];
projection = H * p;                   //project
projnorm   = projection / p(z);      //normalize

One interesting alternative to this dummy solution is to train a model to predict the depth of each point prior to reprojection into 3D camera-space. I tried this method and had a high degree of success using a Pytorch CNN trained on 3D bounding boxes from the KITTI dataset. Would be happy to provide code but it'd be a bit lengthy for posting here.