I have been capturing a photo of my face every day for the last couple of months, resulting in a sequence of images taken from the same spot, but with slight variations in the orientation of my face. I have tried several ways to stabilize this sequence using Python and OpenCV, with varying degrees of success. My question is: is the process I have now the best way to tackle this, or are there better techniques, or a better order in which to execute things?
My process so far looks like this:
- Collect images; keep the original image, a downscaled version and a downscaled grayscale version.
- Using `dlib.get_frontal_face_detector()` on the grayscale image, get a rectangle `face` containing my face.
- Using the `dlib` shape predictor `shape_predictor_68_face_landmarks.dat`, obtain the coordinates of the 68 face landmarks, and extract the positions of the eyes, nose, chin and mouth (specifically landmarks 8, 30, 36, 45, 48 and 54).
- Using a 3D representation of my face (i.e. a `numpy` array containing 3D coordinates of an approximation of these landmarks on my actual face, in an arbitrary reference frame) and `cv2.solvePnP`, calculate a perspective transform matrix `M1` to align the face with my 3D representation.
- Using the transformed face landmarks (i.e. `cv2.projectPoints(face_points_3D, rvec, tvec, ...)` with `_, rvec, tvec = cv2.solvePnP(...)`), calculate the 2D rotation and translation required to align the eyes vertically, center them horizontally and place them at a fixed distance from each other, and obtain the transformation matrix `M2`.
- Using `M = np.matmul(M2, M1)` and `cv2.warpPerspective`, warp the image (a condensed sketch of these steps is included below).
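Put together, these steps look roughly like the sketch below. To keep it self-contained I have filled in placeholder values for everything that is not part of the pipeline itself: the camera matrix, the distortion coefficients, the 3D reference coordinates, the assumed frontal distance, the output size and the target eye positions are all made up, and `M1` is implemented here as a homography from the re-projected landmarks to their zero-rotation projection, which is just one way of turning the `solvePnP` result into a 2D warp:

```python
import cv2
import dlib
import numpy as np

# Placeholder values (not calibrated): output size, focal length, distortion,
# 3D reference coordinates, assumed camera distance and target eye positions.
OUT_SIZE = (500, 500)
FOCAL = 500.0
K = np.array([[FOCAL, 0, OUT_SIZE[0] / 2],
              [0, FOCAL, OUT_SIZE[1] / 2],
              [0, 0, 1]], dtype=np.float64)
DIST = np.zeros((4, 1))

# Landmarks used: 8 = chin, 30 = nose tip, 36/45 = outer eye corners, 48/54 = mouth corners.
LM_IDS = [8, 30, 36, 45, 48, 54]

# Rough 3D coordinates of those landmarks in an arbitrary reference frame (placeholder model).
FACE_3D = np.array([
    [  0.0, -63.6, -12.5],   # 8  chin
    [  0.0,   0.0,   0.0],   # 30 nose tip
    [-43.3,  32.7, -26.0],   # 36 left eye outer corner
    [ 43.3,  32.7, -26.0],   # 45 right eye outer corner
    [-28.9, -28.9, -24.1],   # 48 left mouth corner
    [ 28.9, -28.9, -24.1],   # 54 right mouth corner
], dtype=np.float64)

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")


def detect_landmarks(gray):
    """Return the six tracked landmarks as a (6, 2) array, or None if no face is found."""
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    return np.array([[shape.part(i).x, shape.part(i).y] for i in LM_IDS], dtype=np.float64)


def frontal_projection():
    """Project the 3D model with zero rotation: where the landmarks would land
    if the face were perfectly frontal at an assumed distance from the camera."""
    rvec = np.zeros((3, 1))
    tvec = np.array([[0.0], [0.0], [400.0]])
    proj, _ = cv2.projectPoints(FACE_3D, rvec, tvec, K, DIST)
    return proj.reshape(-1, 2)


def similarity_from_eyes(src, dst):
    """3x3 rotation + uniform scale + translation mapping the two eye corners in
    src onto the fixed target positions in dst."""
    s_vec, d_vec = src[1] - src[0], dst[1] - dst[0]
    scale = np.linalg.norm(d_vec) / np.linalg.norm(s_vec)
    angle = np.arctan2(d_vec[1], d_vec[0]) - np.arctan2(s_vec[1], s_vec[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    M = np.eye(3)
    M[:2, :2] = R
    M[:2, 2] = dst[0] - R @ src[0]
    return M


def stabilize(image, gray):
    pts = detect_landmarks(gray)
    if pts is None:
        return None
    # Head pose relative to the 3D reference face.
    ok, rvec, tvec = cv2.solvePnP(FACE_3D, pts, K, DIST)
    if not ok:
        return None
    # "Transformed" landmarks: the 3D model re-projected with the estimated pose.
    observed, _ = cv2.projectPoints(FACE_3D, rvec, tvec, K, DIST)
    observed = observed.reshape(-1, 2)
    # M1: perspective transform from the observed pose to the frontal projection.
    front = frontal_projection()
    M1, _ = cv2.findHomography(observed, front)
    # M2: place the (frontalized) eye corners at fixed output positions.
    eyes_dst = np.array([[180.0, 200.0], [320.0, 200.0]])
    M2 = similarity_from_eyes(front[[2, 3]], eyes_dst)
    # Compose and warp the original image.
    M = np.matmul(M2, M1)
    return cv2.warpPerspective(image, M, OUT_SIZE)
```

`stabilize()` returns the warped frame, so running it over every image in the sequence (with the same constants throughout) gives the stabilized sequence.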
Using this method, I get okay-ish results, but the 68-landmark prediction is far from perfect, resulting in twitchy stabilization and sometimes very skewed images (I can't remember having such a large forehead...). For example, the predicted position of one of the eye corners does not always align with the actual eye, resulting in a perspective transform that skews the actual eye 20px down.
In an attempt to fix this, I have tried using SIFT to find features in two different photos (aligned using the method above) and obtain another perspective transform. I then force the features to lie around my detected face landmarks so that the background is not aligned (using a mask in `cv2.SIFT_create().detectAndCompute(...)`), but this sometimes results in features being found predominantly around one of the eyes, or not around the mouth at all, again resulting in extremely skewed images.
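The SIFT step looks roughly like the sketch below; the mask radius, the Lowe ratio threshold and the minimum match count are arbitrary placeholder values, and `ref_landmarks` / `cur_landmarks` are the 2D landmark arrays obtained from the detection step above:

```python
import cv2
import numpy as np


def face_mask(shape_hw, landmarks, radius=60):
    """Binary mask keeping only the regions around the detected face landmarks.
    The radius is an arbitrary placeholder value."""
    mask = np.zeros(shape_hw, dtype=np.uint8)
    for x, y in landmarks.astype(int):
        cv2.circle(mask, (int(x), int(y)), radius, 255, -1)
    return mask


def sift_homography(ref_gray, cur_gray, ref_landmarks, cur_landmarks):
    """Perspective transform mapping the current frame onto the reference frame,
    using SIFT features restricted to the face region."""
    sift = cv2.SIFT_create()
    kp_ref, des_ref = sift.detectAndCompute(ref_gray, face_mask(ref_gray.shape, ref_landmarks))
    kp_cur, des_cur = sift.detectAndCompute(cur_gray, face_mask(cur_gray.shape, cur_landmarks))
    if des_ref is None or des_cur is None:
        return None

    # Brute-force matching with Lowe's ratio test (0.7 is an arbitrary threshold).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_cur, des_ref, k=2)
    good = [m[0] for m in matches if len(m) == 2 and m[0].distance < 0.7 * m[1].distance]
    if len(good) < 10:  # arbitrary minimum number of matches
        return None

    src = np.float32([kp_cur[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_ref[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H
```

The resulting homography can then be applied with `cv2.warpPerspective(cur_image, H, (ref_gray.shape[1], ref_gray.shape[0]))` to map the current frame onto the reference frame.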
What would be a good way to get a smooth sequence of images, stabilized around my face? For reference, see this video (not mine), which is stabilized around the eyes.