Implementing object detection like OpenCV

I'm trying to implement the Viola-Jones algorithm for object detection using Haar cascades (like OpenCV's implementation) in C, to detect faces. I'm writing the C code in a Vivado HLS compatible way so that I can port the implementation to an FPGA. My main goal is to learn as much as possible, rather than just getting it to work. I would also appreciate any help with improving my question.

I started by reading G. Bradski's Learning OpenCV, watched some online tutorials, and began writing the code. Sure enough, it's not detecting faces and I don't know why. At this point I care more about understanding my mistakes than about being able to detect faces.

My Implementation Steps

I'm not sure how much detail is appropriate, but to keep it short:

  • Extracting the Haar cascade data from haarcascade_frontalface_default.xml into C-readable structures (huge arrays)
  • Writing a function that creates the integral image of any given 8-bit greyscale image of size 24x24 (the same size as listed in the cascade)
  • Applying knowledge from this great post to make the necessary calculations
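
For readers following along, the integral image step above can be sketched roughly as follows. This is not the function from the question (which is not shown) but a minimal illustration under one common convention: the integral image gets an extra zero row and zero column, so it has (w+1) x (h+1) entries. The name integral_image_padded and the padded layout are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical helper (not from the question): builds a (w+1) x (h+1)
 * integral image from a row-major 8-bit greyscale image.
 * Row 0 and column 0 are all zeros, so ii[y * (w+1) + x] holds the sum
 * of all pixels strictly above and to the left of position (x, y). */
void integral_image_padded(const uint8_t *img, int w, int h, int *ii)
{
    int stride = w + 1;

    /* first row of the padded integral image is zero */
    for (int x = 0; x <= w; x++)
        ii[x] = 0;

    for (int y = 1; y <= h; y++) {
        int row_sum = 0;       /* running sum of the current image row */
        ii[y * stride] = 0;    /* first column is zero */
        for (int x = 1; x <= w; x++) {
            row_sum += img[(y - 1) * w + (x - 1)];
            ii[y * stride + x] = ii[(y - 1) * stride + x] + row_sum;
        }
    }
}
```

For a 24x24 window this layout needs 25 * 25 = 625 integers rather than 576. The zero border means rectangles that touch the top or left edge need no special case when their corner values are looked up.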

My Testing Scheme

  • A Python script detects faces using the OpenCV library with the same Haar cascade as mentioned above to create golden data. Each detected face is cut out of the image (ensuring a size of 24x24) and stored.
  • The stored images are converted to one-dimensional C arrays containing the pixel values row-wise: img = {row0col0, row0col1, row1col0, row1col1, ... }
  • The integral image is calculated and face detection is applied.

Result

  • Faces pass only 6 of the 25 stages of the Haar cascade and are therefore not detected by my implementation. I know they should have been detected, since the Python script using OpenCV and the same Haar cascade did detect them.
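
To make "passing a stage" concrete, here is a rough sketch of how a cascade accumulates node values stage by stage. It is a simplified illustration, not the code from this question: it assumes each node's vote (its left or right value) has already been computed, and that the number of nodes per stage is read from a table in the way stageOrga is meant to be used. The function name count_passed_stages and its parameters are hypothetical.

```c
/* Hypothetical sketch: node_votes holds one already-computed vote per node
 * (the node's left or right value), stored stage after stage.
 * stage_sizes[s] is the number of nodes in stage s.
 * Returns the number of stages passed before the first rejection;
 * a return value equal to n_stages means every stage was passed. */
int count_passed_stages(const int *stage_sizes,
                        const float *stage_thresh,
                        const float *node_votes,
                        int n_stages)
{
    int used = 0; /* nodes consumed by earlier stages */

    for (int s = 0; s < n_stages; s++) {
        float acc = 0.0f;

        /* only the nodes belonging to this stage contribute */
        for (int n = 0; n < stage_sizes[s]; n++)
            acc += node_votes[used + n];
        used += stage_sizes[s];

        /* the first stage whose sum is below its threshold rejects the window */
        if (acc < stage_thresh[s])
            return s;
    }
    return n_stages;
}
```

In this sketch the inner loop runs over the node count of the current stage only, and the offset into the flat node array advances by exactly that count.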

My Code

 /*
 * This is detectFace.c
 */

#include <stdio.h>
#include "detectFace.h"

// define constants based on Haar cascade in use
// Each feature is made of max 3 rects
//#define FEAT_NO 1     // max no. of features (= 2912 for face_default.xml)
#define RECTS_IN_FEAT 3 // max no. of rect's per feature
//#define INTS_IN_RECT 5    // no. of int's needed to describe a rect
// each node has one feature (bijective relation) and three doubles
#define STAGE_NO 25 // no. of stages
#define NODE_NO 211 // no of nodes per stage, corresponds to FEAT_NO since each Node has always one feature in haarcascade_frontalface_default.xml
//#define ELMNT_IN_NODE 3   // no. of doubles needed to describe a node

// constants for frame size
#define WIN_WIDTH 24 // width = height =24

//int detectFace(int features[FEAT_NO][RECTS_IN_FEAT][INTS_IN_RECT], double stages[STAGE_NO][NODE_NO][ELMNT_IN_NODE], double stageThresh[STAGE_NO], int ii[24][24]){
int detectFace(
    int ii[576],
    int stageNum,
    int stageOrga[25],
    float stageThresholds[25],
    float nodes[8739],
    int featOrga[2913],
    int rectangles[6383][5])
{
    int passedStages = 0; // number of stages passed in this run
    int faceDetected = 0; // turns to 1 if face is detected and to 0 if its not detected
    // Debug:
    int nodesUsed = 0; // number of floats out of nodes[] processed, use to skip to the unprocessed floats
    int rectsUsed = 0; // number of rects processed
    int droppedInStage0 = 0;

    // loop through all stages
    int i;
detectFace_label1:
    for (i = 0; i < STAGE_NO; i++)
    {
        double tmp = 0.0;           //variable to accumulate node-values, to then compare to stage threshold
        int nodeNum = stageOrga[i]; // get number of nodes for this stage from stageOrga using stage index i
        // loop through nodes inside each stage
        // NOTE: it is assumed that each node maps to one corresponding feature. Ex: node[0] has feat[0] and node[1] has feat[1]
        // because this is how it is written in the haarcascade_frontalface_default.xml
        int j;
    detectFace_label0:
        for (j = 0; j < NODE_NO; j++)
        {
            // a node is defined by 3 values:
            double nodeThresh = nodes[nodesUsed]; // the first value is the node threshold
            double lValue = nodes[nodesUsed + 1]; // the second value is the left value
            double rValue = nodes[nodesUsed + 2]; // the third value is the right value
            int sum = 0;                          // contains the weighted value of rectangles in one Haar feature
            // loop through rect's in a feature, some have 2 and some have 3 rect's.
            // Each node always refers to one feature in a way that node0 maps to feature0 and node1 to feature1 (The XML file is build like that)
            //int rectNum = featOrga[j]; // get number of rects for current feature using current node index j
            int k;
        detectFace_label2:
            for (k = 0; k < RECTS_IN_FEAT; k++)
            {
                int x = 0, y = 0, width = 0, height = 0, weight = 0, coordUpL = 0, coordUpR = 0, coordDownL = 0, coordDownR = 0;

                // a rect is defined by 5 values:
                x = rectangles[rectsUsed][0];      // the first value is the x coordinate of the top left corner pixel
                y = rectangles[rectsUsed][1];      // the second value is the y coordinate of the top left corner pixel
                width = rectangles[rectsUsed][2];  // the third value is the width of the current rectangle
                height = rectangles[rectsUsed][3]; // the fourth value is the height of this rectangle
                weight = rectangles[rectsUsed][4]; // the fifth value is the weight of this rectangle

                // calculating 1-Dim index for points of interest. Formula: index = width * row + column, assuming values are stored in row order
                coordUpL = ((WIN_WIDTH * y) - WIN_WIDTH) + (x - 1);
                coordUpR = coordUpL + width;
                coordDownL = coordUpL + (height * WIN_WIDTH);
                coordDownR = coordDownL + width;

                // calculate the area sum according to Viola-Jones
                //sum += (ii[x][y] + ii[x+width][y+height] - ii[x][y+height] - ii[x+width][y]) * weight;
                sum += (ii[coordUpL] + ii[coordDownR] - ii[coordUpR] - ii[coordDownL]) * weight;
                // Debug: counting the number of actual rectangles used
                rectsUsed++; //
            }
            // decide whether the result of the feature calculation reaches the node threshold
            if (sum < nodeThresh)
            {
                tmp += lValue; // add left value to tmp if node threshold was not reached
            }
            else
            {
                tmp += rValue; // add right value to tmp if node threshold was reached
            }
            nodesUsed = nodesUsed + 3; // one node is processed, increase nodesUsed by number of floats needed to represent a node (3)
        }
        //########  at this point we went through each node in the current stage #######
        // check if threshold of current stage was reached
        if (tmp < stageThresholds[i])
        {
            faceDetected = 0; // if any stage threshold is not reached the operation is done and no face is present
            // Debug: show in which stage the frame was dropped
            printf("Face detection failed in stage %d \n", i);
            //i = stageNum;         // breaks out this loop, because i is supposed to stay smaller than STAGE_NO
        }
        else
        {
            passedStages++; // stage threshold is reached, therefore passedStages will count up
        }
    }
    //########  at this point we went through all stages ###############################
    //----------------------------------------------------------------------------------
    // if the number of passed stages reaches the total number of stages, a face is detected
    if (passedStages == stageNum)
    {
        faceDetected = 1; // one symbolizes that the input is a face
    }
    else
    {
        faceDetected = 0; // zero symbolizes that the input is not a face
    }
    return faceDetected;
}
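
As a cross-check for the rectangle lookup in the listing above, the following sketch shows the four-corner sum on an integral image that carries an extra zero row and column (an assumption for this example, so the row length is the image width plus one). The helper names rect_sum_padded and question_up_left_index are hypothetical. The second function only reproduces the coordUpL expression from detectFace.c so that its result can be inspected for border rectangles.

```c
/* Hypothetical helper: sum of the pixels inside the rectangle with top-left
 * pixel (x, y), width w and height h, read from a padded integral image
 * whose rows are `stride` entries long (stride = image width + 1). */
int rect_sum_padded(const int *ii, int stride, int x, int y, int w, int h)
{
    int up_l   = y * stride + x;       /* corner above-left of the rectangle */
    int up_r   = up_l + w;             /* corner above-right */
    int down_l = up_l + h * stride;    /* corner below-left */
    int down_r = down_l + w;           /* corner below-right */

    return ii[up_l] + ii[down_r] - ii[up_r] - ii[down_l];
}

/* The top-left index expression as written in detectFace.c, for comparison. */
int question_up_left_index(int win_width, int x, int y)
{
    return ((win_width * y) - win_width) + (x - 1);
}
```

With WIN_WIDTH set to 24, question_up_left_index(24, 0, 0) evaluates to -25. If the cascade contains any rectangle with x = 0 or y = 0, the unpadded formula would therefore read outside the ii array, which may be worth checking.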