Find the real-world size of a rectangle drawn in an image


I can't find a solution to this problem anywhere, so I'm asking here in the hope that someone has one.

The problem is the following:

Given an image, is it possible to find the "real size" of a rectangle drawn in it, meaning the size of its projection onto the horizontal plane of the road?

I probably have all the information needed to solve this problem:

  • focal length of the camera in mm
  • height of the camera above the road surface in cm
  • inclination of the camera relative to the road surface in degrees
  • sensor size in inches
  • size of the image in px
  • size of the rectangle in px
  • position of the rectangle within the image in px, retrieved using OpenCV

However, I lack the formulas needed to even start writing the code.

Here are 2 screenshots taken from Google. The rectangle is the same size in both of them; it has only been moved upward, which raises another big issue: perspective.

image 1

image 2

Looking around, no one seems to even mention the projection onto the horizontal plane or perspective. Honestly, I can't even figure out where to start; I've never dealt with something like this before. If anything is unclear, don't hesitate to ask.

EDIT 1: Here are 2 pictures to help explain the problem.

  • The first one as a context image:

Context

  • The second one as a visual example from https://docs.opencv.org/4.x/d9/d0c/group__calib3d.html of what I am trying to achieve (sorry for the low quality). The blue area indicates the road, while the orange rectangle is the projection of the red one onto the road plane:

Visual example

There is 1 answer

MvG On

Your problem essentially boils down to “what is the position on the ground that corresponds to a given pixel position?”, because once you have that, determining the ground positions of the four corners of a quadrilateral is easy.

If you assume a pinhole camera, you have a direct ray of light from a point on the ground through the pinhole to the sensor. So the most important part is to embed your sensor coordinate system correctly into 3d space.

  1. Start with a rectangle of the known pixel dimensions of the sensor.
  2. Scale it to the physical dimensions of the sensor. So the scale factor is essentially the inverse of pixel density.
  3. Shift it so that the center of the sensor is at the origin. We assume the optical axis of the camera is through the center.
  4. Offset it by the physical focal length (not the 35mm equivalent focal length) perpendicular to the sensor plane. Now the sensor is facing the pinhole at the right distance.
  5. Rotate to the known inclination of the camera.
  6. Model the ground as a horizontal plane with the distance to the origin matching the height of the camera.

Steps 1 through 5 should allow you to take (x, y) positions in the resulting image and turn them into 3d positions of the corresponding point on the correctly positioned sensor. Since the pinhole sits at the origin, any multiple of that 3d vector is a point on the line spanned by the point on the sensor and the pinhole. Pick the multiple that also lies in the ground plane, i.e. the one whose vertical coordinate equals the constant vertical coordinate of the ground plane. That is your position on the ground.
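As a rough illustration, the procedure could be sketched in Python as follows. This is only a sketch under assumptions that are not stated in the question: the function name pixel_to_ground and its parameters are made up for this example, the sensor size is assumed to be available as physical width and height in mm, the inclination is taken to be the angle of the optical axis below the horizontal, and pixel coordinates are assumed to start in the top-left corner with v growing downward. The world frame used here has x to the right, y horizontally forward and z up, with the pinhole at the origin.

```python
import math


def pixel_to_ground(u, v, image_size_px, sensor_size_mm, focal_mm, height, tilt_deg):
    """Map pixel (u, v) to a point (X, Y) on the ground plane.

    Returns None if the pixel looks at or above the horizon.
    All names and conventions here are illustrative assumptions.
    """
    width_px, height_px = image_size_px
    sensor_w, sensor_h = sensor_size_mm

    # Steps 1-3: scale pixels to mm and move the sensor centre to the origin.
    dx = (u - width_px / 2) * sensor_w / width_px   # mm to the right of centre
    dy = (v - height_px / 2) * sensor_h / height_px  # mm below centre

    # Steps 4-5: offset by the focal length along the optical axis, then
    # rotate the axis downward by the tilt angle.
    s = math.sin(math.radians(tilt_deg))
    c = math.cos(math.radians(tilt_deg))
    ray_x = dx
    ray_y = focal_mm * c - dy * s
    ray_z = -focal_mm * s - dy * c

    # Step 6: the ground is the plane z = -height. A ray that does not
    # point downward never reaches it.
    if ray_z >= 0:
        return None

    # Pick the multiple t of the ray whose z coordinate equals -height.
    t = height / -ray_z
    return (t * ray_x, t * ray_y)
```

The result comes out in the same unit as height, because the millimetres of the ray cancel. For example, with a camera 300 cm above the road tilted 45° downward, the centre pixel of the image should land roughly 300 cm straight ahead of the camera.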

All of this will take care of perspective by having different positions in the image correspond to different positions on the ground so that the same pixel distance on the sensor doesn't correspond to equal distances on the ground.
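To make the perspective effect concrete, here is a worked formula for the centre column of the image. It assumes symbols that do not appear in the question: h is the camera height, f the physical focal length, θ the inclination measured as the angle of the optical axis below the horizontal, and y the physical distance of the pixel below the sensor centre. Under those assumptions the forward distance Y on the ground would be:

```latex
\[
  Y(y) \;=\; h\,\frac{f\cos\theta - y\sin\theta}{f\sin\theta + y\cos\theta}
       \;=\; h\cot\!\bigl(\theta + \arctan(y/f)\bigr),
  \qquad
  \frac{dY}{dy} \;=\; -\,\frac{h f}{\bigl(f\sin\theta + y\cos\theta\bigr)^{2}} .
\]
```

The derivative is not constant: as a pixel moves up in the image (y decreases) the denominator shrinks, so the same step on the sensor covers a longer stretch of road, and it grows without bound as the pixel approaches the horizon where f sin θ + y cos θ = 0.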

OpenCV might have some tools to help with all of this, but understanding how to correctly use those tools might be harder than multiplying 4 matrices to combine the basic transformations I outlined above. If you can't assume the camera to act like a pinhole camera, things become a lot more complicated, though.
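For the matrix route, one possible decomposition is sketched below in plain Python without OpenCV. The split into these particular four matrices, the helper names (matmul, pixel_to_ground_matrix, apply, ground_side_lengths) and the conventions are assumptions made for this example: sensor size given as width and height in mm, inclination measured as the angle of the optical axis below the horizontal, pixel origin in the top-left corner, and all rectangle corners lying below the horizon.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def pixel_to_ground_matrix(image_size_px, sensor_size_mm, focal_mm, height, tilt_deg):
    """Build a 3x3 matrix taking homogeneous pixels (u, v, 1) to ground points."""
    (w_px, h_px), (w_mm, h_mm) = image_size_px, sensor_size_mm
    s = math.sin(math.radians(tilt_deg))
    c = math.cos(math.radians(tilt_deg))
    # Scale pixels to millimetres on the sensor.
    scale = [[w_mm / w_px, 0, 0], [0, h_mm / h_px, 0], [0, 0, 1]]
    # Centre the sensor and place it one focal length along the optical axis.
    shift = [[1, 0, -w_mm / 2], [0, 1, -h_mm / 2], [0, 0, focal_mm]]
    # Columns: world directions of image-right, image-down and the optical axis
    # (world frame: x right, y forward, z up).
    rotate = [[1, 0, 0], [0, -s, c], [0, -c, -s]]
    # Intersect the ray (rx, ry, rz) with the plane z = -height:
    # (X, Y, 1) is proportional to (height * rx, height * ry, -rz).
    to_ground = [[height, 0, 0], [0, height, 0], [0, 0, -1]]
    return matmul(to_ground, matmul(rotate, matmul(shift, scale)))


def apply(m, u, v):
    """Apply the matrix to pixel (u, v); None if at or above the horizon."""
    x, y, w = (m[i][0] * u + m[i][1] * v + m[i][2] for i in range(3))
    if w <= 0:
        return None
    return (x / w, y / w)


def ground_side_lengths(m, corners_px):
    """Side lengths of the ground quadrilateral for four pixel corners in order.

    Assumes every corner maps to a ground point (none above the horizon).
    """
    pts = [apply(m, u, v) for u, v in corners_px]
    return [math.dist(pts[i], pts[(i + 1) % 4]) for i in range(4)]
```

Building the matrix once and applying it to all four corners keeps the per-pixel work to a single matrix-vector product and one division. The lengths come out in the unit used for height.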

Related topics: see Finding the transform matrix from 4 projected points for how to compute the matrix if instead of camera parameters you have 4 matching points, and see How to calculate true lengths from perspective projection? if instead of camera parameters you know the size of a visible object in the plane.