I'm trying to create a speaker identification system on Android. Currently I'm using libxtract to calculate MFCC vectors from frames and libsvm for classification.
Does anyone know how to use libxtract, or another small C/C++ library that I can compile with the NDK, to detect voice (VAD, Voice Activity Detection) in frames?
Robust VAD is a non-trivial problem, and there are many approaches.
The approach you take depends on factors such as:
A simple approach might involve extracting a "bag of features" (e.g. f0, noisiness, and the magnitudes of the first 10 partials) from each audio frame after noise reduction, and training a machine learning algorithm (an SVM would suffice) on a wide selection of voice and non-voice examples.
However, it is probably best not to treat VAD as a simple frame-wise audio classification problem, but rather to take the time-varying aspects of the audio into account. This will give you a better estimate of where speech segments begin and end. For this you could use an envelope follower or spectral flux. You could set a high and a low threshold on these envelope values and use them (for example) to control a gate on the audio stream.