I am trying to build a real-time face detector that should meet these requirements:
- Fast
- As accurate as possible, even in dark environments
- Easy to add a face recognition model after detection
These requirements led me to build a script that captures frames from my webcam and passes them to an MTCNN model, which produces the detection output. This solution gives me decent accuracy, even in dark environments, and detects a face in around 3.6 seconds (analyzing 3 frames at a time, which seems like a good trade-off between accuracy and speed). This is the code I currently use:
import cv2
from deepface import DeepFace
import os
import datetime

frames_analyzed = 3

backends = [
    'opencv',
    'ssd',
    'dlib',
    'mtcnn',
    'retinaface',
    'mediapipe',
    'yolov8',
    'yunet',
]

while True:
    start_time = datetime.datetime.now()

    # Capture frames_analyzed frames from the webcam and save them to disk
    for i in range(frames_analyzed):
        vidcap = cv2.VideoCapture(0)
        if vidcap.isOpened():
            ret, frame = vidcap.read()
            if ret:
                cv2.imwrite("Frames/" + str(i) + ".jpg", frame)
            else:
                print("Error: Failed to capture frame")
        else:
            print("Cannot open camera")
        vidcap.release()

    # Run MTCNN (backends[3]) on each saved frame and count how many contain a face
    face_detected = 0
    for i in range(frames_analyzed):
        try:
            DeepFace.extract_faces(img_path="Frames/" + str(i) + ".jpg",
                                   detector_backend=backends[3])
            face_detected = face_detected + 1
        except ValueError:
            # extract_faces raises ValueError when no face is found in the frame
            pass
        finally:
            os.remove("Frames/" + str(i) + ".jpg")

    end_time = datetime.datetime.now()
    print(face_detected, (end_time - start_time).total_seconds())
These are my questions:
- Is my approach reasonable for real-time face detection?
- Is there any change I can apply to get a better trade-off between speed and accuracy?
The default face recognition model is VGG-Face. Try the Facenet or ArcFace model to get faster results in your real-time application.
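As a minimal sketch of how the recognition model could be swapped in once detection works, assuming a recent deepface version where verify accepts model_name; reference.jpg is a hypothetical enrolled image:

from deepface import DeepFace

# Compare a captured frame against a hypothetical enrolled image,
# using Facenet instead of the default VGG-Face model.
result = DeepFace.verify(
    img1_path="Frames/0.jpg",        # one of the frames saved by the script above
    img2_path="reference.jpg",       # hypothetical enrolled face image
    model_name="Facenet",            # or "ArcFace"
    detector_backend="mtcnn",
)
print(result["verified"], result["distance"])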
The default detector is opencv, which is fast but has low precision. Mediapipe gives better results than opencv, but mtcnn is still the best option in terms of speed and accuracy.
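For reference, the backend is chosen through the detector_backend argument already used in the question's code; the sketch below simply times two backends on the same saved frame (the frame path is illustrative):

import time
from deepface import DeepFace

# Time mtcnn against mediapipe on one saved frame from the script above
for backend in ("mtcnn", "mediapipe"):
    start = time.time()
    try:
        faces = DeepFace.extract_faces(img_path="Frames/0.jpg",
                                       detector_backend=backend)
        print(backend, len(faces), "face(s) in", round(time.time() - start, 2), "s")
    except ValueError:
        # no face found with this backend
        print(backend, "found no face")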