I am using OpenCV for visual odometry. I have a video of a road taken from a monocular camera mounted on a moving car. I would like to obtain the translation vectors between frames.
What I have done so far:
- I obtained matches of keypoints between one frame and the next one.
- I then used recoverPose and got the rotation matrices, the translation vectors (up to scale), and the coordinates of some 3D points.
My issue is that with just two frames I cannot recover the real translation vector (just the direction). But if I find the same point in three different frames and triangulate its 3D coordinates with respect to the first and second reference frame, I think I can retrieve the real translation vector between the camera at t0 and t1 as the difference of the coordinates which I find in the two frames.
Is the above statement correct?
Of course it would be better to have multiple points and some kind of voting method. I just want to know if the method is feasible or I am missing some fundamental problem.
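It is incorrect. From a set of images alone you can only recover the translation vector up to an unknown scale factor, regardless of the technique used or the number of frames. The 3D points you triangulate are expressed in that same unknown scale, so the difference of their coordinates does not give a metric translation. You need another source of real-world information to recover the correct scale and get the real translation vector. See my answer to a similar question.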
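A small numerical check may make the ambiguity easier to see. The sketch below is not OpenCV code; it is a plain-Python pinhole model with made-up values (the points, the rotation angle, the translation, the focal length, and the helper names `rotate_y`, `project`, and `scale_scene` are all assumptions for illustration). It scales the 3D points and the camera translation by the same factor and compares the projected pixel coordinates.

```python
import math

def rotate_y(p, angle):
    """Rotate a 3D point about the y axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)

def project(point, angle, t, focal=700.0):
    """Pinhole projection for a camera with pose x_cam = R * x + t."""
    x, y, z = rotate_y(point, angle)
    x, y, z = x + t[0], y + t[1], z + t[2]
    return (focal * x / z, focal * y / z)

def scale_scene(points, t, s):
    """Scale every 3D point and the translation by the same factor s."""
    scaled_points = [tuple(s * c for c in p) for p in points]
    scaled_t = tuple(s * c for c in t)
    return scaled_points, scaled_t

# Hypothetical scene and second-camera pose (illustrative values only).
points = [(1.0, 0.5, 8.0), (-2.0, 0.3, 12.0), (0.5, -1.0, 20.0)]
angle = 0.02
t = (0.1, 0.0, -1.0)

# The same scene, three times larger, with a three times longer translation.
scaled_points, scaled_t = scale_scene(points, t, 3.0)

original = [project(p, angle, t) for p in points]
rescaled = [project(p, angle, scaled_t) for p in scaled_points]

# Largest pixel difference between the two scenes; expected to be
# zero up to floating-point rounding.
max_diff = max(
    abs(a - b)
    for o, r in zip(original, rescaled)
    for a, b in zip(o, r)
)
print(max_diff)
```

Both scenes produce the same image measurements, because the scale factor cancels in the division by depth. The same cancellation happens for every additional frame if all points and all translations are scaled together, which is why a third view does not fix the scale.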