Emotion recognition matrix input to SVM

I am building an emotion recognition system based on Paul Ekman's emotion model (happiness, sadness, anger, fear, surprise, disgust, neutral), using STASM and OpenCV's SVM, with video clips as the source.

However, I have no clue what kind of data should be fed to the SVM in the training phase. I know the SVM takes a Mat as input, but I'm not sure what the matrix should contain.

For example, suppose we have the landmark points returned by STASM. Each landmark point has its own [x, y] coordinates, and each facial expression has many landmark points; let's say we use 17. After extracting those 17 landmark points for one facial expression, we would pack them into the first row of the matrix, add one row in the same way for each of the other expressions, and feed the matrix to the SVM (the rows need labels too, but let's set that aside for now).
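To make the layout concrete, here is a minimal sketch in plain Python of what I have in mind. The names `landmarks_to_row` and `build_training_matrix`, the integer labels, and the landmark values are all made up for illustration; real coordinates would come from STASM.

```python
# Hypothetical layout: one row per face, 17 landmarks -> 34 columns.
NUM_LANDMARKS = 17


def landmarks_to_row(landmarks):
    """Flatten [(x0, y0), ..., (x16, y16)] into [x0, y0, ..., x16, y16]."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError("expected %d landmarks, got %d"
                         % (NUM_LANDMARKS, len(landmarks)))
    row = []
    for x, y in landmarks:
        row.extend([float(x), float(y)])
    return row


def build_training_matrix(samples):
    """samples is a list of (landmarks, label) pairs.

    Returns (rows, labels): rows[i] is the feature row for sample i and
    labels[i] is its integer class label.
    """
    rows = [landmarks_to_row(landmarks) for landmarks, _ in samples]
    labels = [int(label) for _, label in samples]
    return rows, labels


# Made-up landmark data for two faces (not real STASM output).
face_a = [(i, 2 * i) for i in range(NUM_LANDMARKS)]
face_b = [(i + 0.5, 2 * i + 1.0) for i in range(NUM_LANDMARKS)]

# Labels 0 and 1 are arbitrary placeholders for two expression classes.
rows, labels = build_training_matrix([(face_a, 0), (face_b, 1)])
```

As far as I understand, for OpenCV's SVM the rows would become an N x 34 matrix of type CV_32F and the labels an N x 1 column of type CV_32S.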

My questions:

  1. Is feeding the x and y coordinates of the landmark points for the system enough?

Intuitively, I don't think it is enough. We should probably use some sort of displacement from a neutral expression to a 'peak' expression, as described in this paper here. For example, we could compute the Euclidean distance each landmark point moves between the neutral state and the happy state, and feed those displacements to the matrix instead of the raw coordinates.
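A rough sketch of the displacement idea, again in plain Python. The function name `displacement_features` and the sample coordinates are hypothetical, and this is only my reading of the approach, not necessarily what the paper does.

```python
import math


def displacement_features(neutral, peak):
    """Per-landmark Euclidean distance between two frames of the same face.

    neutral and peak are equal-length lists of (x, y) landmark coordinates,
    listed in the same landmark order.
    """
    if len(neutral) != len(peak):
        raise ValueError("both frames must have the same number of landmarks")
    return [
        math.hypot(px - nx, py - ny)
        for (nx, ny), (px, py) in zip(neutral, peak)
    ]


# Made-up coordinates: the first landmark moves by (3, 4), the second stays put.
neutral_frame = [(0.0, 0.0), (1.0, 1.0)]
peak_frame = [(3.0, 4.0), (1.0, 1.0)]

features = displacement_features(neutral_frame, peak_frame)
```

With 17 landmarks this would give 17 values per row instead of 34. It also throws away the direction of movement, so keeping the signed (dx, dy) pairs might be an alternative.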

But I feel like there is still something missing with this idea.

  2. If every expression is compared to the neutral state, then how can the machine know if someone's face is in a neutral state?

I am confused because I don't know what the neutral expression itself should be compared against, since every other expression uses the neutral expression as its reference.
