Using libxtract or another small C/C++ library for VAD functionality


I'm trying to create a speaker identification system on Android. Currently I'm using libxtract to calculate MFCC vectors from frames and libsvm for classification.

Do you have any idea how to use libxtract, or another small C/C++ library that I can compile with the NDK, to detect voice in frames (VAD, Voice Activity Detection)?



Answer by j b

Robust VAD is a non-trivial problem, and there are many approaches.

The approach you take depends on factors such as:

  • the specifics of your application context and how your application will be used
  • what sort of assumptions you can make about the audio you will be processing (what types of background noise or non-voice audio you can expect)
  • whether or not your system needs to operate in real-time

A simple approach might involve computing a "bag of features" (e.g. f0, noisiness, magnitudes of the first 10 partials) for each audio frame after noise reduction, and training a machine learning algorithm (an SVM would suffice) on a wide selection of voice and non-voice exemplars.

However, it is probably best not to treat VAD as a simple framewise audio classification problem, but rather to take the time-varying aspects of the audio into account. This will give you a better estimate of where speech segments begin and end. For this you could use an envelope follower or spectral flux. You could set high and low thresholds on these envelope values and use them (for example) to control a gate on the audio stream: the gate opens when the value rises above the high threshold and closes only when it falls below the low one, so it does not flicker on and off around a single threshold.

Answer by Charles

How about LibVAD? www.libvad.com

It seems to do exactly what you're describing.

Disclosure: I'm the developer behind LibVAD

Answer by ruoho ruotsi

The Voicebox toolkit has a good VAD implementation that uses a few of the techniques Jamie describes. You can find it in vadsohn.m, which implements "A Statistical Model-Based Voice Activity Detection" (1999) by Sohn et al.

You can also find other implementations on GitHub, for example of the VAD from the G.729 codec (used in VoIP applications). For example, this master's thesis.

These implementations are in MATLAB/Octave, but can be ported to C/C++ with a bit of work. Good luck!