Combining facial landmarks from the iOS Vision framework with depth images


I am capturing depth images with an iPhone TrueDepth camera and using the iOS Vision framework to find face landmarks in the image. The capture device resolution is 3088x2136 and the depth map is 640x480. I am trying to find the depth of each face landmark point, but I cannot correctly scale the landmarks down to match the depth map dimensions.

This is the code I am currently using:

// Landmark regions to export, with matching names for the CSV output.
let landmarks = [
    lastFaceObservation.landmarks?.leftEye,
    lastFaceObservation.landmarks?.rightEye,
    lastFaceObservation.landmarks?.nose,
    lastFaceObservation.landmarks?.noseCrest,
    lastFaceObservation.landmarks?.medianLine,
    lastFaceObservation.landmarks?.faceContour
]

let landmarkNames = [
    "leftEye",
    "rightEye",
    "nose",
    "noseCrest",
    "medianLine",
    "faceContour"
]

var data = ""

let frameSize = CGSize(width: 640, height: 480)

for (index, landmark) in landmarks.enumerated() {

    guard let landmark = landmark else { continue }

    for (pointIndex, point) in landmark.normalizedPoints.enumerated() {

        // Convert the normalized landmark point into pixel coordinates
        // in the full-resolution capture image.
        let vectorPoint = simd_float2(Float(point.x), Float(point.y))
        var pixel: CGPoint = VNImagePointForFaceLandmarkPoint(
            vectorPoint,
            lastFaceObservation.boundingBox,
            Int(captureDeviceResolution.width),
            Int(captureDeviceResolution.height)
        )

        // Scale the pixel coordinates down to the 640x480 depth map.
        let transform = CGAffineTransform(
            scaleX: frameSize.width / captureDeviceResolution.width,
            y: frameSize.height / captureDeviceResolution.height
        )
        pixel = pixel.applying(transform)

        let pixelX = Float(pixel.x)
        let pixelY = Float(pixel.y)

        // Look up the depth at that pixel, then back-project to camera
        // space with the intrinsics (each coordinate is truncated before
        // computing the row offset so a fractional x or y can't bleed
        // into the wrong row).
        let Z = depthPointer[Int(pixelY) * width + Int(pixelX)]
        let X = (pixelX - principalPointX) * Z / focalX
        let Y = (pixelY - principalPointY) * Z / focalY

        data.append("\(landmarkNames[index]), \(pointIndex), \(X), \(Y), \(Z)\n")
    }
}
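For reference, `depthPointer`, `width`, and the intrinsics above (`focalX`, `focalY`, `principalPointX`, `principalPointY`) come from the `AVDepthData` delivered with the capture, along these lines (simplified sketch; `rawDepthData` stands in for the depth data from my capture callback):

import AVFoundation

// Convert to 32-bit float depth so each pixel is a single Float32.
let depthData = rawDepthData.converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
let depthMap = depthData.depthDataMap

CVPixelBufferLockBaseAddress(depthMap, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

let width = CVPixelBufferGetWidth(depthMap)    // 640
let height = CVPixelBufferGetHeight(depthMap)  // 480
let depthPointer = CVPixelBufferGetBaseAddress(depthMap)!
    .assumingMemoryBound(to: Float32.self)

// The intrinsic matrix is expressed at a reference resolution, so it
// has to be rescaled to depth-map pixels before back-projecting.
if let calibration = depthData.cameraCalibrationData {
    let intrinsics = calibration.intrinsicMatrix
    let reference = calibration.intrinsicMatrixReferenceDimensions
    let scaleX = Float(width) / Float(reference.width)
    let scaleY = Float(height) / Float(reference.height)

    let focalX = intrinsics.columns.0.x * scaleX
    let focalY = intrinsics.columns.1.y * scaleY
    let principalPointX = intrinsics.columns.2.x * scaleX
    let principalPointY = intrinsics.columns.2.y * scaleY
    // focalX, focalY, principalPointX, principalPointY feed the loop above.
}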

It is evidently not as simple as taking a landmark point on the full-size image and scaling it down into a 640x480 image. When I run this code and plot the resulting depth cloud and landmark points, I get the following:

[Image: front view of the depth cloud with landmark points]

[Image: back view (reverse side of the front face)]

And some of the landmarks are way off, as you can see from this zoomed-out picture of the back:

[Image: zoomed-out back view]

The side-on view of the model shows the extent of the misalignment:

[Image: side view]

I have tried scaling the pixel values down at various points in the pipeline, but that didn't work. I've also tried removing the principal point offset, without success. I suspect something is wrong with the affine transform, but I'm not sure how to correct it. I would expect the landmark points to line up with the depth cloud.
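One thing I am wondering about (an unverified assumption on my part): `VNImagePointForFaceLandmarkPoint` returns points in Vision's image coordinate space, which has its origin at the lower left, whereas rows in a `CVPixelBuffer` start at the top left, so the y coordinate might need to be flipped before indexing into the depth map. The buffer's rows can also be padded, in which case `y * width + x` is the wrong index and `bytesPerRow` should be used. A sketch of both adjustments:

// Sketch: flip Vision's lower-left-origin y to the buffer's top-left
// origin, clamp to the map bounds, and step rows by bytesPerRow in
// case the rows are padded. Uses depthMap and the scaled `pixel`
// from the code above.
let col = min(max(Int(pixel.x), 0), width - 1)
let row = min(max(height - 1 - Int(pixel.y), 0), height - 1)

let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
let rowBase = CVPixelBufferGetBaseAddress(depthMap)! + row * bytesPerRow
let Z = rowBase.assumingMemoryBound(to: Float32.self)[col]

It may also be relevant that 3088:2136 is not the same aspect ratio as 640:480 (4:3), so scaling x and y independently stretches the geometry; if the two images really do cover different aspect ratios, a uniform scale plus crop would be needed instead.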
