Calculate 3D coordinates from a camera matrix and a known distance


I have been struggling with this quiz question. It was part of the FSG 2022 registration quiz, and I can't figure out how to solve it.

At first I thought that I could use the extrinsic and intrinsic parameters to calculate the 3D coordinates using the equations described by Mathworks or in this article. Later I realized that the distance to the object is given in the camera frame, which means that this could be treated as a depth camera, with the depth info converted into 3D space as described in a medium.com article.

That article uses the formula shown below to calculate the x and y coordinates and is very similar to this question, yet I can't get the correct solution.

enter image description here

One of my Matlab scripts attempting to solve it:

rot = eul2rotm(deg2rad([102 0 90]));
trans = [500 160 1140]' / 1000; % mm to m
t = [rot trans];


u = 795; % there was a typo here, as pointed out by solstad
v = 467;

cx = 636;
cy = 548;

fx = 241;
fy = 238;

z = 2100 / 1000 % mm to m

tmp_x = (u - cx) * z / fx;
tmp_y = (v - cy) * z / fy;

% attempt 1
tmp_cords = [tmp_x; tmp_y; z; 1]
linsolve(t', tmp_cords)'
% result is: 1.8913    1.8319   -0.4292

% attempt 2
tmp_cords = [tmp_x; tmp_y; z]
rot * tmp_cords + trans
% result is: 2.2661    1.9518    0.4253

If possible, I would like to see the calculation process rather than any kind of Python code. The correct answer is under the image.

quiz question

The correct solution provided by the organisers was 2.030, 1.272, 0.228 m.


There are 2 answers

Christoph Rackwitz On BEST ANSWER

The task states that the object's euclidean (straight-line) distance is 2.1 m. That doesn't mean its distance along z is 2.1 m. Those two only coincide if there is no x or y component in the object's translation to the camera frame.

The z component of the object's translation will be less than 2.1 meters.

You need to take the ray (direction vector) through the screen-space coordinates, normalize it to unit length, and multiply it by the Euclidean distance.

v_x = (u - cx) / fx;
v_y = (v - cy) / fy;
v_z = 1;
v = [v_x; v_y; v_z];

dist = 2.1;
tmp = v / norm(v) * dist;
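The same step can be sketched in plain Python, which makes it easy to check that the resulting point really lies 2.1 m from the camera. This is only an illustration: the function name `pixel_to_camera_point` is made up for this sketch, and the pixel coordinates and intrinsics are the values from the question's script.

```python
import math

def pixel_to_camera_point(u, v, fx, fy, cx, cy, dist):
    """Back-project pixel (u, v) to the camera-frame point at Euclidean distance dist."""
    # Direction of the viewing ray, scaled so that its z component is 1.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rescale the ray so that its length equals the given distance.
    scale = dist / math.sqrt(sum(c * c for c in ray))
    return tuple(c * scale for c in ray)

# Values from the question: pixel (795, 467), fx=241, fy=238, cx=636, cy=548, 2.1 m.
point_cam = pixel_to_camera_point(795, 467, 241, 238, 636, 548, 2.1)
print([round(c, 4) for c in point_cam])
print(round(math.sqrt(sum(c * c for c in point_cam)), 4))  # length of the vector
```

The z component comes out at roughly 1.686 m rather than 2.1 m, which is exactly the difference between depth along the optical axis and straight-line distance.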

The rotation may be an issue. Roll happens around X, then pitch happens around Y, and then yaw happens around Z. These operations are applied in that order, i.e. inner to outer, so the roll matrix sits closest to the vector:

R_Z * R_Y * R_X * v

My rotation matrix is

[[ 0.       0.20791  0.97815]
 [ 1.       0.       0.     ]
 [ 0.       0.97815 -0.20791]]
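To see where this matrix comes from, here is a minimal sketch that composes the three elementary rotations by hand. It assumes roll = 102°, pitch = 0° and yaw = 90°, which are the angles used in the scripts in this thread; the helper names (`rot_x`, `rot_y`, `rot_z`, `matmul`, `rotation_from_rpy`) are made up for the illustration.

```python
import math

def rot_x(a):
    """Rotation by angle a (radians) about the X axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(a):
    """Rotation by angle a (radians) about the Y axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(a):
    """Rotation by angle a (radians) about the Z axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rotation_from_rpy(roll_deg, pitch_deg, yaw_deg):
    """Compose R_Z(yaw) * R_Y(pitch) * R_X(roll): roll is applied first."""
    roll, pitch, yaw = (math.radians(d) for d in (roll_deg, pitch_deg, yaw_deg))
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))

R = rotation_from_rpy(102, 0, 90)
for row in R:
    print([round(x, 5) for x in row])
```

Up to rounding (and the sign of floating-point zeros), the printed rows agree with the matrix above. Swapping the roll and yaw angles, or reversing the multiplication order, gives a different matrix.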

That camera, taking the usual (X right, Y down, Z far) frame, would be looking, upside down, out the windshield, and slightly down.

Make sure that eul2rotm() does the right thing or that you use something else. With its default 'ZYX' sequence it expects the angles in the order [yaw pitch roll] and returns R_Z * R_Y * R_X.

You can use rotvec2mat3d() to build individual rotation matrices from an axis-angle encoding.

Perhaps also review different MATLAB conventions regarding matrix multiplication: https://www.mathworks.com/help/images/migrate-geometric-transformations-to-premultiply-convention.html

I used Python and scipy.spatial.transform.Rotation.from_euler('xyz', [R_roll, R_pitch, R_yaw], degrees=True).as_matrix() to arrive at the sample solution.
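Since the question asks for the calculation process rather than code, the whole computation can also be written out by hand. The numbers below are rounded to four decimals; the translation t = (0.5, 0.16, 1.14) m is the one from the question's script, and R is the rotation matrix given above.

```latex
\begin{align*}
\mathbf{v} &= \begin{pmatrix} (795-636)/241 \\ (467-548)/238 \\ 1 \end{pmatrix}
  \approx \begin{pmatrix} 0.6598 \\ -0.3403 \\ 1 \end{pmatrix},
  \qquad \lVert \mathbf{v} \rVert \approx 1.2454 \\
\mathbf{p}_{cam} &= \frac{2.1}{\lVert \mathbf{v} \rVert}\,\mathbf{v}
  \approx \begin{pmatrix} 1.1124 \\ -0.5739 \\ 1.6862 \end{pmatrix} \\
R\,\mathbf{p}_{cam} &\approx \begin{pmatrix}
  0.20791 \cdot (-0.5739) + 0.97815 \cdot 1.6862 \\
  1.1124 \\
  0.97815 \cdot (-0.5739) - 0.20791 \cdot 1.6862
  \end{pmatrix}
  \approx \begin{pmatrix} 1.5300 \\ 1.1124 \\ -0.9119 \end{pmatrix} \\
\mathbf{p} &= R\,\mathbf{p}_{cam} + \mathbf{t}
  \approx \begin{pmatrix} 1.5300 + 0.5 \\ 1.1124 + 0.16 \\ -0.9119 + 1.14 \end{pmatrix}
  \approx \begin{pmatrix} 2.030 \\ 1.272 \\ 0.228 \end{pmatrix}\ \text{m}
\end{align*}
```

This agrees with the organisers' solution of 2.030, 1.272, 0.228 m.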

Properly, the task should have specified a frame conversion step between vehicle and camera because the differing views are quite confusing, with a car having +X being forward and a camera having +Z being forward...

Pisikoll On

In addition to Christoph Rackwitz's answer, which is correct and should get all the credit, here is a working MATLAB script:

rot = eul2rotm(deg2rad([90 0 102])); % default 'ZYX' sequence, angles given as [yaw pitch roll]
trans = [500 160 1140]' / 1000; % mm to m

u = 795;
v = 467;

cx = 636;
cy = 548;

fx = 241;
fy = 238;


v_x = (u - cx) / fx;
v_y = (v - cy) / fy;
v_z = 1;
v = [v_x; v_y; v_z];

dist = 2.1;
tmp = v / norm(v) * dist;

rot * tmp + trans % object position in metres, approx. 2.030 1.272 0.228