I have one main folder that contains 20 sub-folders, and each of those sub-folders contains 6 sub-folders of its own (20 speakers; each speaker's *.wav recordings are classified into 6 classes).

I want to read all of the *.wav files and extract features from them. The extracted features are the input for training my neural network model.

How can I read all of the .wav files and extract features from them?

Must all of the classes be trained together? If so, how?

My code for reading the wav files from the main folder is as follows (but it reads only one sub-folder):

import os
import scipy.io.wavfile as wav

r_dir = '/my path/'

data = []
rate = []
for root,sub,files in os.walk(r_dir):
    files = sorted(files)
    for f in files:
        s_rate, x = wav.read(os.path.join(root, f))
        rate.append(s_rate)
        data.append(x)

For feature extraction I use this code (I want to extract features for all of my sub-folders and wav files):

from python_speech_features import fbank
import scipy.io.wavfile as wav

(rate,sig)=wav.read("/my path for one .wav file")
fbank_feat = fbank(sig,rate)

print(fbank_feat)

I'm so confused. Please help me with how to do this, step by step.

Thanks.

2 Answers

Sharan Arumugam On

glob is even better when used with pathlib.Path.

from pathlib import Path

path = Path('D:\\test path').glob('**/*.wav')
wavs = [str(wavf) for wavf in path if wavf.is_file()]

print(*wavs, sep='\n')  # one path per line

yields

D:\test path\a..wav
D:\test path\b.wav
D:\test path\sub 1\1a..wav
D:\test path\sub 1\1b.wav
D:\test path\sub 1\nest a\aaa..wav
D:\test path\sub 1\nest a\bbb.wav
D:\test path\sub 2\2a..wav
D:\test path\sub 2\2b.wav
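
Because the layout in the question is main/<speaker>/<class>/*.wav, the folder names themselves can serve as labels. The following is only a minimal sketch under that assumption (exactly two folder levels above each file); `label_wavs` is a hypothetical helper name, not part of pathlib.

```python
from pathlib import Path


def label_wavs(root):
    """Return (path, speaker, class) triples for root/<speaker>/<class>/*.wav."""
    root = Path(root)
    rows = []
    # '*/*/*.wav' matches only files that sit exactly two folders below root.
    for wav_path in sorted(root.glob('*/*/*.wav')):
        # The first two path components below root are the speaker
        # folder and the class folder, in that order.
        speaker, cls = wav_path.relative_to(root).parts[:2]
        rows.append((str(wav_path), speaker, cls))
    return rows
```

Calling `label_wavs('/my path/')` would then give one triple per recording, which keeps each file tied to its speaker and class for later training.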
mf Al Fafa On

To read all *.wav files in a directory and its subdirectories, you can use the following:

# Read all *.wav files inside a directory and its subdirectories
import os
import scipy.io.wavfile as wav

mydir = (os.getcwd()).replace('\\','/') + '/'

#Get all *wav files include subdir
filelist=[]
for path, subdirs, files in os.walk(mydir):
    for file in files:
        if (file.endswith('.wav') or file.endswith('.WAV')):
            filelist.append(os.path.join(path, file))
number_of_files=len(filelist)
print(filelist)

wav_data=[]
for i in range(number_of_files):
    #extract all *.wav files here
    samplerate, data = wav.read(filelist[i])
    wav_data.append(data)
print(wav_data)

Output:

['D:/SOF/answer30/file_example_WAV_5MG.wav', 'D:/SOF/answer30/subdir1\\file_example_WAV_1MG.wav', 'D:/SOF/answer30/subdir2\\file_example_WAV_2MG.wav']

WAV Data:

[array([[-204,   23],
       [-232,   32],
       [-192,   34],
       ...,
       [4938, 4256],
       [4974, 3977],
       [4734, 3798]], dtype=int16), array([[ -114,    23],
       [ -241,     3],
       [ -285,   -29],
       ...,
       [ -772, -1059],
       [ -422,  -840],
       [ -787,  -314]], dtype=int16), array([[ -139,    18],
       [ -215,    34],
       [ -196,     6],
       ...,
       [ -523,  -563],
       [ -765,  -319],
       [-1002,  -190]], dtype=int16)]
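
On the question of training all classes together: one common option is to pool the features of every file into a single set, with an integer class label per row, and train one network on the pooled set. The sketch below assumes that approach and assumes mono recordings; `fbank_features` and `build_dataset` are hypothetical helper names. Note that `fbank` from python_speech_features returns a pair (features, frame energies), so only the first element is kept here.

```python
def fbank_features(wav_path):
    """Filterbank features for one file, one row per frame."""
    # Imported here so build_dataset can also be tried with another extractor
    # when these packages are not installed.
    import scipy.io.wavfile as wav
    from python_speech_features import fbank
    rate, sig = wav.read(wav_path)
    feat, _energy = fbank(sig, rate)  # fbank returns (features, energies)
    return feat


def build_dataset(labelled_files, extract=fbank_features):
    """Pool per-frame features from all files into X with matching labels y.

    labelled_files is a list of (path, class_name) pairs.
    """
    class_names = sorted({cls for _path, cls in labelled_files})
    class_index = {name: i for i, name in enumerate(class_names)}
    X, y = [], []
    for path, cls in labelled_files:
        frames = extract(path)
        X.extend(frames)                            # one row per frame
        y.extend([class_index[cls]] * len(frames))  # same label for each frame
    return X, y, class_names
```

`X` and `y` can then be converted with `numpy.asarray` and passed to whichever network library is used. Treating each frame as a separate training example is a simplification; averaging or padding the frames per file are alternatives when one example per recording is wanted.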