Is it feasible to perform sound recognition training on a mobile device?


There is a body of literature on categorizing sounds where the possible matches could be any sound found in the modern world (for instance: http://projects.csail.mit.edu/soundnet/). This question is different in that it is limited to recognizing just a handful of specific sounds, recorded and trained locally. It is about the feasibility of coding a mobile application that would record and convert a small set of sounds (say, fewer than 10), then "listen" for and identify those sounds.

In this similar, unanswered SO question, the author gives the sound of a doorbell as an example. My example is a bit different in that I'd like to categorize vocalizations of dogs. I might define "fido bark", "rover bark", "fido whine", and "rover whine", giving four buttons when the app is in training mode. The dogs would then make their sounds, and the human user would categorize each one. The app would then be switched to listening mode, and if a certain dog made a certain vocalization, the app would match the sound and display which dog made it and which vocalization occurred.

Is it feasible to code an application such as the one outlined above on a typical mobile device, without external processing? If so, how?


There are 2 answers

OfirD

It's doable. I found an article that deployed a sound-based bird classification model to iOS, using the Core ML and Skafos libraries: Detecting Bird Sounds with Create ML, CoreML3, and Skafos.

So it can be done with dogs as well, assuming you've got the data and then a trained model.

Dale

Analyzing audio on a mobile device requires the same techniques as offline analysis (typically a spectrogram, frequency shifting, a CNN classifier, and ensembling), but under the tighter resource and time constraints of a mobile device.
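As a rough illustration of the first technique in that list, here is a minimal sketch of turning raw audio samples into a log-magnitude spectrogram with plain NumPy. The function name `log_spectrogram`, the frame length of 512 samples, the hop of 256 samples, and the 16 kHz sample rate are all assumptions chosen for the example, and a synthetic 1 kHz tone stands in for a real recording.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=256):
    """Return a (frames, frame_len // 2 + 1) log-magnitude spectrogram.

    frame_len and hop are illustrative defaults, not tuned values.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)

# Synthetic stand-in for a recording: one second of a 1 kHz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 512
print(spec.shape, peak_hz)
```

The resulting 2-D array can be treated like a grayscale image, which is what makes image-oriented classifiers such as CNNs applicable to audio.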

Training the model is probably best done offline; only the trained model is then deployed to the mobile device. Mobile platforms often have efficient libraries for image matching and comparison. By converting audio to a spectrogram, these same comparison techniques can be leveraged.
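To make the comparison idea concrete for a handful of locally recorded sounds, here is a hedged sketch of a template matcher: each labeled recording is reduced to a time-averaged spectrum, and a new sound is assigned the label of the most similar stored template by cosine similarity. This is a much simpler swapped-in technique than the CNN approach, not what the linked posts implement. The names `fingerprint` and `SoundMatcher`, the 0.8 threshold, and the synthetic tones standing in for dog recordings are all assumptions for illustration.

```python
import numpy as np

def fingerprint(signal, frame_len=512, hop=256):
    """Time-averaged magnitude spectrum, scaled to unit length."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s : s + frame_len] * window for s in starts])
    spectrum = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return spectrum / (np.linalg.norm(spectrum) + 1e-12)

class SoundMatcher:
    """Hypothetical helper: stores labeled templates, matches new sounds."""

    def __init__(self):
        self.templates = {}  # label -> list of unit-length fingerprints

    def train(self, label, signal):
        self.templates.setdefault(label, []).append(fingerprint(signal))

    def identify(self, signal, threshold=0.8):
        """Return the best-matching label, or None if nothing is close."""
        probe = fingerprint(signal)
        best_label, best_score = None, threshold
        for label, prints in self.templates.items():
            # Cosine similarity, since all fingerprints have unit length.
            score = max(float(p @ probe) for p in prints)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Synthetic stand-ins for recordings (assumption: real barks are far messier).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000

def fake_recording(freq_hz):
    return np.sin(2 * np.pi * freq_hz * t) + 0.1 * rng.normal(size=t.size)

matcher = SoundMatcher()
matcher.train("fido bark", fake_recording(400))    # "training mode"
matcher.train("rover bark", fake_recording(1200))

heard = matcher.identify(fake_recording(400))      # "listening mode"
unknown = matcher.identify(rng.normal(size=t.size))
print(heard, unknown)
```

Averaging over time discards the temporal shape of a sound, so this is only a starting point; distinguishing a bark from a whine of the same dog would likely need the full spectrogram and a trained classifier.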

More specifically, training offline with TensorFlow and deploying to Android is described in this Net Guru blog post: Audio Classification with Machine Learning – Implementation on Mobile Devices. That post also describes the more involved steps required to get the model deployed to iOS. Additionally, jlibrosa is an open-source library that helps implement some of the audio processing steps.

Vasanthkumar Velayudham has written several articles that would be a good place to start understanding the landscape of apps in this realm, for instance on heartbeat.fritz.ai and on medium.com.