2D image coordinate to 3D space coordinate through camera matrix


I am trying to understand how to project 2D image coordinates into 3D space using my camera matrix, but I can't for the life of me figure it out. I am hoping someone here can point me to a guide or explanation that helps. Here is what I have so far. I have read and tried to follow all of these articles:

Find 3D coordinate with respect to the camera using 2D image coordinates
https://en.wikipedia.org/wiki/Camera_matrix
https://se.mathworks.com/help/vision/ug/camera-calibration.html#bu0nh2_
https://staff.fnwi.uva.nl/r.vandenboomgaard/IPCV20162017/LectureNotes/CV/PinholeCamera/PinholeCamera.html
https://towardsdatascience.com/camera-calibration-fda5beb373c3

I have a camera pointing "straight down" at a table, centered over it. I am guessing that from this I can create my translation matrix and rotation matrix (I am unsure which angle "down" corresponds to: 0, 90 or 180 degrees?).


T = [0.0, 0.0, 0.0]
R = [[cos(angle), -sin(angle), 0.0],
    [sin(angle), cos(angle), 0.0],
    [0.0, 0.0, 1.0]]

These are my extrinsic parameters. My 2D photo is 1280x720 px and my camera's focal length is 1.88 mm, and from this I can create a camera matrix based on this:

camera matrix

fx = 1280 / 1.88
fy = 720 / 1.88
u0 = 1280 / 2
v0 = 720 / 2

K = [[0.00146875, 0.0, 640.0, 0.0],
     [0.0, 0.00261111, 360.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
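As far as I can tell, fx and fy are meant to be the focal length expressed in pixels, i.e. the focal length in mm divided by the physical size of one pixel in mm, so the image resolution alone is not enough to compute them. Below is a minimal sketch of that conversion. The sensor dimensions (3.6 mm x 2.025 mm) and the helper name `focal_length_px` are made up for illustration and are not from my camera's datasheet; real values would have to come from the datasheet or from a calibration.

```python
def focal_length_px(focal_mm, image_size_px, sensor_size_mm):
    """Convert a focal length in mm to pixels along one image axis.

    pixel pitch (mm/px) = sensor_size_mm / image_size_px
    focal length (px)   = focal_mm / pixel pitch
    """
    return focal_mm * image_size_px / sensor_size_mm


# HYPOTHETICAL sensor dimensions, chosen only so the example runs.
SENSOR_W_MM = 3.6
SENSOR_H_MM = 2.025

fx = focal_length_px(1.88, 1280, SENSOR_W_MM)  # focal length in px, x axis
fy = focal_length_px(1.88, 720, SENSOR_H_MM)   # focal length in px, y axis
cx, cy = 1280 / 2, 720 / 2                     # principal point in px

# 3x3 intrinsic matrix in the column-vector convention, all entries in pixels.
K = [[fx, 0.0, cx],
     [0.0, fy, cy],
     [0.0, 0.0, 1.0]]

for row in K:
    print(row)
```

With square pixels fx and fy come out equal, which is not the case for the 1280 / 1.88 and 720 / 1.88 values above.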

I know that the distance between my camera and the table is 650 mm. As far as I understand, I am supposed to use linear algebra (matrix multiplication) to take my 2D coordinate (300, 200) and put it into 3D space, but I can't figure out how to actually do it. Most of the material I can find is about the opposite direction, mapping a 3D coordinate into 2D image space.
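For reference, here is a minimal sketch of what I understand the inverse pinhole mapping to be when the depth is known. It assumes no lens distortion, a table plane perpendicular to the optical axis (so every table pixel has depth 650 mm), and a focal length in pixels. The value 900.0 is only a placeholder, not my camera's real focal length, and the function names `backproject` and `project` are my own. The result is expressed in the camera frame, so R and T would still have to be applied to get world coordinates.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) at a known depth -> 3D point in the camera frame.

    The returned point is in the same unit as `depth` (here mm).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project(point, fx, fy, cx, cy):
    """3D point in the camera frame -> pixel (u, v), the forward pinhole model."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


# PLACEHOLDER intrinsics in pixels (assumed, not measured).
FX = FY = 900.0
CX, CY = 640.0, 360.0

point = backproject(300, 200, 650.0, FX, FY, CX, CY)
u, v = project(point, FX, FY, CX, CY)

print("3D point in camera frame (mm):", point)
print("re-projected pixel:", (u, v))
```

Projecting the recovered point forward again should give back the original pixel, which is a quick sanity check on the intrinsics.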

From this question

How do I reverse-project 2D points into 3D?

I found this formula:

mat = [
[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0],
[0, 0, 0, 1],
]
x = mat[0][0] * p.x + mat[0][1] * p.y + mat[0][2] * p.z + mat[0][3] * 1
y = mat[1][0] * p.x + mat[1][1] * p.y + mat[1][2] * p.z + mat[1][3] * 1
w = mat[3][0] * p.x + mat[3][1] * p.y + mat[3][2] * p.z + mat[3][3] * 1

But again I am not sure if this is the way to go since it gives me some weird results.

I am really hoping someone can help me out. Please ask if any more information is needed.

Edit: I noticed that there are two different formulas for the intrinsic camera matrix:

camera matrix

camera matrix2

They have u0, v0 and cx, cy in different locations, but both express the center of the image. Which one is correct to use, and with which units, mm or pixels?

I looked into vector x matrix multiplication and I think I understand that part.

vector x matrix multiplication

The second matrix formula, with cx, cy in the third row, looks to me like it will never take the Z distance into account because of the way the multiplication works. I am not entirely sure how this works, but that does not make sense to me.
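One possible explanation, which is an assumption on my part: the two forms are transposes of each other, one written for column vectors (K times p) and the other for row vectors (p times K). In that case the third row [cx, cy, 1] is exactly the row that gets multiplied by Z. A small sketch to check this, using placeholder intrinsics in pixels and an arbitrary test point (neither comes from my setup):

```python
def mat_vec(m, v):
    """Column-vector convention: 3x3 matrix times 3x1 vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def vec_mat(v, m):
    """Row-vector convention: 1x3 vector times 3x3 matrix."""
    return [sum(v[k] * m[k][j] for k in range(3)) for j in range(3)]


# PLACEHOLDER intrinsics in pixels (assumed values).
FX, FY, CX, CY = 900.0, 900.0, 640.0, 360.0

# First form: cx, cy in the third column, used as K @ p.
K_col = [[FX, 0.0, CX],
         [0.0, FY, CY],
         [0.0, 0.0, 1.0]]

# Second form: cx, cy in the third row, used as p @ K. It is the transpose.
K_row = [list(row) for row in zip(*K_col)]

p = [-100.0, 50.0, 650.0]  # arbitrary 3D point (X, Y, Z) in the camera frame

a = mat_vec(K_col, p)  # [FX*X + CX*Z, FY*Y + CY*Z, Z]
b = vec_mat(p, K_row)  # same three numbers

# Dividing by the last component gives the pixel coordinates.
pixel = (a[0] / a[2], a[1] / a[2])

print(a)
print(b)
print(pixel)
```

Both products give the same homogeneous result, and Z appears in every component, so neither form ignores the depth as long as it is paired with the matching vector convention.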
