What algorithm is used for audio feature extraction in Google's AudioSet?

Question


800 views · Asked by jerpint on 19 May 2017 at 22:48

I am getting started with Google's AudioSet. While the dataset is extensive, I find the information about the audio feature extraction very vague. The website mentions:

128-dimensional audio features extracted at 1Hz. The audio features were extracted using a VGG-inspired acoustic model described in Hershey et al., trained on a preliminary version of YouTube-8M. The features were PCA-ed and quantized to be compatible with the audio features provided with YouTube-8M. They are stored as TensorFlow Record files.

In the paper, the authors discuss computing mel spectrograms on 960 ms chunks to get a 96x64 representation. It is unclear to me how they get from that to the 1x128 representation used in AudioSet. Does anyone know more about this?


There is 1 answer

**foxer lee** · Accepted Answer · 2018-08-13T06:34:09+00:00


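They use the 96x64 data as input to a modified VGG network. The last layer of that network is a fully connected layer with 128 units (FC-128), so its output for each 96x64 patch is 1x128. That is where the 128-dimensional representation comes from.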

The architecture of VGG can be found here: https://github.com/tensorflow/models/blob/master/research/audioset/vggish_slim.py
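To make the 96x64 to 1x128 step concrete, here is a rough shape trace through a VGG-style stack. It only tracks tensor shapes and allocates no weights. The function name `vggish_like_shapes`, the four conv stages with channel widths 64/128/256/512, the 2x2 pooling, and the 4096-unit hidden layer are my assumptions about a VGG-like layout, not something stated in the question or the website text; the linked `vggish_slim.py` is the authoritative definition.

```python
def vggish_like_shapes(frames=96, mel_bands=64, embedding_size=128):
    """Trace tensor shapes through a VGG-style network (shapes only, no weights).

    Assumed layout (hypothetical, for illustration): four conv stages with
    'same' padding, each followed by 2x2 max pooling with stride 2, then
    fully connected layers ending in an FC layer of size `embedding_size`.
    """
    shapes = [("input", (frames, mel_bands, 1))]
    h, w = frames, mel_bands
    channels = 1
    for stage, channels in enumerate((64, 128, 256, 512), start=1):
        # 'same'-padded convolutions keep height and width, change channels.
        shapes.append((f"conv{stage}", (h, w, channels)))
        # 2x2 max pooling with stride 2 halves height and width.
        h, w = h // 2, w // 2
        shapes.append((f"pool{stage}", (h, w, channels)))
    # Flatten the remaining feature map into a single vector.
    shapes.append(("flatten", (h * w * channels,)))
    # Assumed hidden width of 4096 before the final embedding layer.
    shapes.append(("fc1", (4096,)))
    # Final FC-128 layer: one 128-dimensional vector per input patch.
    shapes.append(("fc2", (embedding_size,)))
    return shapes


if __name__ == "__main__":
    for name, shape in vggish_like_shapes():
        print(f"{name:8s} {shape}")
```

Under these assumptions, the pooling steps shrink the 96x64 patch to 6x4 before flattening, and the final fully connected layer maps that vector to 128 values. With one 960 ms patch per roughly one second of audio, that matches the "128-dimensional audio features extracted at 1Hz" wording, before the PCA and quantization the website mentions.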
