I'm trying to do 3D scene reconstruction and camera pose estimation on video input; however, the camera positions I recover do not match what I see in the video.
Here is the code I wrote to recover the pose and the landmark positions:
def SfM(self, points1, points2):
    x = 800 / 2
    y = 600 / 2
    fov = 80 * (math.pi / 180)
    f_x = x / math.tan(fov / 2)
    f_y = y / math.tan(fov / 2)
    # intrinsic camera matrix
    K = np.array([[f_x, 0, x],
                  [0, f_y, y],
                  [0, 0, 1]])
    # find the fundamental matrix
    E, mask = cv2.findFundamentalMat(np.float32(points2), np.float32(points1), cv2.FM_8POINT)
    # get the rotation matrix and translation vector
    points, R, t, mask = cv2.recoverPose(E, np.float32(points2), np.float32(points1), K, 500)
    # calculate the new camera position based on the translation; camPose is the previous camera position
    self.cam_xyz.append([self.camPose[0] + t[0], self.camPose[1] + t[1], self.camPose[2] + t[2]])
    # calculate the extrinsic matrix
    C = np.hstack((R, t))
    # calculate the landmark positions
    for i in range(len(points2)):
        # convert coordinates into a 3x1 array
        pts2d = np.asmatrix([points2[i][0], points2[i][1], 1]).T
        # calculate the camera matrix
        P = np.asmatrix(K) * np.asmatrix(C)
        # find the 3D coordinate
        pts3d = np.asmatrix(P).I * pts2d
        # add to the list of landmarks
        self.lm_xyz.append([pts3d[0][0] * self.scale + self.camPose[0],
                            pts3d[1][0] * self.scale + self.camPose[1],
                            pts3d[2][0] * self.scale + self.camPose[2]])
    # update the previous camera position
    self.camPose = [self.camPose[0] + t[0], self.camPose[1] + t[1], self.camPose[2] + t[2]]
When I passed in this video, I got this as my output.
I can't figure out why it is veering to the right when the camera only heads straight in the video. I suspect that I am using the cv2.recoverPose
method incorrectly, but I don't know what else I can do to make it better. I put the full code in a PasteBin in case anyone wants to replicate the program. Any help would be greatly appreciated. Thank you so much!
Shouldn't you calculate the essential matrix E with cv2.findEssentialMat instead? As written, you have computed the fundamental matrix F, but cv2.recoverPose expects an essential matrix; if you keep F, you must pass E = K^T * F * K, where K is the camera matrix.
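For example, here is a minimal sketch of both options, reusing the K, points1, and points2 from your function (the RANSAC parameters are only placeholders, not tuned values):

    # Option 1: estimate the essential matrix directly from the correspondences
    E, mask = cv2.findEssentialMat(np.float32(points2), np.float32(points1), K,
                                   method=cv2.RANSAC, prob=0.999, threshold=1.0)

    # Option 2: keep the fundamental matrix and convert it: E = K^T * F * K
    F, mask = cv2.findFundamentalMat(np.float32(points2), np.float32(points1), cv2.FM_8POINT)
    E = K.T @ F @ K

    # recoverPose expects an essential matrix, not a fundamental matrix
    _, R, t, mask = cv2.recoverPose(E, np.float32(points2), np.float32(points1), K)

Either way, the matrix you feed into cv2.recoverPose should be an essential matrix; passing F directly will give you a rotation and translation that drift away from the true motion.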