Combining facial landmarks from the iOS Vision framework with depth images


I am capturing depth images with an iPhone TrueDepth camera and using the iOS Vision framework to find face landmarks in the image. The capture device resolution is 3088x2136 and the depth map is 640x480. I am trying to find the depth of each face landmark point, but I cannot correctly scale the landmarks down to match the depth map dimensions.

This is the code I am currently using:

// Landmark regions to export, with matching names for the CSV output.
let landmarks = [
    lastFaceObservation.landmarks?.leftEye,
    lastFaceObservation.landmarks?.rightEye,
    lastFaceObservation.landmarks?.nose,
    lastFaceObservation.landmarks?.noseCrest,
    lastFaceObservation.landmarks?.medianLine,
    lastFaceObservation.landmarks?.faceContour
]

let landmarkNames = [
    "leftEye",
    "rightEye",
    "nose",
    "noseCrest",
    "medianLine",
    "faceContour"
]

var data = ""

let frameSize = CGSize(width: 640, height: 480)

for (index, landmark) in landmarks.enumerated() {

    guard let landmark = landmark else { continue }

    for (pointIndex, point) in landmark.normalizedPoints.enumerated() {

        // Convert the normalized landmark point into pixel coordinates
        // in the full-resolution capture image.
        let vectorPoint = simd_float2(Float(point.x), Float(point.y))
        var pixel: CGPoint = VNImagePointForFaceLandmarkPoint(
            vectorPoint,
            lastFaceObservation.boundingBox,
            Int(captureDeviceResolution.width),
            Int(captureDeviceResolution.height)
        )

        // Scale the pixel coordinates down to the 640x480 depth map.
        let transform = CGAffineTransform(
            scaleX: frameSize.width / captureDeviceResolution.width,
            y: frameSize.height / captureDeviceResolution.height
        )
        pixel = pixel.applying(transform)

        let pixelX = Float(pixel.x)
        let pixelY = Float(pixel.y)

        // Look up the depth at that pixel, then back-project to camera
        // space with the intrinsics (each coordinate is truncated before
        // computing the row offset so a fractional x or y can't bleed
        // into the wrong row).
        let Z = depthPointer[Int(pixelY) * width + Int(pixelX)]
        let X = (pixelX - principalPointX) * Z / focalX
        let Y = (pixelY - principalPointY) * Z / focalY

        data.append("\(landmarkNames[index]), \(pointIndex), \(X), \(Y), \(Z)\n")
    }
}
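For reference, `depthPointer`, `width`, and the intrinsics above (`focalX`, `focalY`, `principalPointX`, `principalPointY`) come from the `AVDepthData` delivered with the capture, along these lines (simplified sketch; `rawDepthData` stands in for the depth data from my capture callback):

import AVFoundation

// Convert to 32-bit float depth so each pixel is a single Float32.
let depthData = rawDepthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
let depthMap = depthData.depthDataMap

CVPixelBufferLockBaseAddress(depthMap, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

let width = CVPixelBufferGetWidth(depthMap)    // 640
let height = CVPixelBufferGetHeight(depthMap)  // 480
let depthPointer = CVPixelBufferGetBaseAddress(depthMap)!
    .assumingMemoryBound(to: Float32.self)

// The intrinsic matrix is expressed at a reference resolution, so it
// has to be rescaled to depth-map pixels before back-projecting.
if let calibration = depthData.cameraCalibrationData {
    let intrinsics = calibration.intrinsicMatrix
    let reference = calibration.intrinsicMatrixReferenceDimensions
    let scaleX = Float(width) / Float(reference.width)
    let scaleY = Float(height) / Float(reference.height)

    let focalX = intrinsics.columns.0.x * scaleX
    let focalY = intrinsics.columns.1.y * scaleY
    let principalPointX = intrinsics.columns.2.x * scaleX
    let principalPointY = intrinsics.columns.2.y * scaleY
    // focalX, focalY, principalPointX, principalPointY feed the loop above.
}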

It is evidently not as simple as taking a landmark point on the full-size image and scaling it down into a 640x480 image. When I run this code and plot the resulting depth cloud and landmark points, I get the following:

[Image: front view of the depth cloud with landmark points]

[Image: back view (reverse side of the front face)]

And some of the landmarks are way off, as you can see from this zoomed-out picture of the back:

[Image: zoomed-out back view]

The side-on view of the model shows the extent of the misalignment:

[Image: side view]

I have tried scaling the pixel values down at various points in the pipeline, but that didn't work. I've also tried removing the principal point offset, without success. I suspect something is wrong with the affine transform, but I'm not sure how to correct it. I would expect the landmark points to line up with the depth cloud.
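One thing I am wondering about (an unverified assumption on my part): `VNImagePointForFaceLandmarkPoint` returns points in Vision's image coordinate space, which has its origin at the lower left, whereas rows in a `CVPixelBuffer` start at the top left, so the y coordinate might need to be flipped before indexing into the depth map. The buffer's rows can also be padded, in which case `y * width + x` is the wrong index and `bytesPerRow` should be used. A sketch of both adjustments:

// Sketch: flip Vision's lower-left-origin y to the buffer's top-left
// origin, clamp to the map bounds, and step rows by bytesPerRow in
// case the rows are padded. Uses depthMap and the scaled `pixel`
// from the code above.
let col = min(max(Int(pixel.x), 0), width - 1)
let row = min(max(height - 1 - Int(pixel.y), 0), height - 1)

let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
let rowBase = CVPixelBufferGetBaseAddress(depthMap)! + row * bytesPerRow
let Z = rowBase.assumingMemoryBound(to: Float32.self)[col]

It may also be relevant that 3088:2136 is not the same aspect ratio as 640:480 (4:3), so scaling x and y independently stretches the geometry; if the two images really do cover different aspect ratios, a uniform scale plus crop would be needed instead.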
