MFCC produces "ValueError: index can't contain negative values" for parsing wav file


I am using the following general-purpose code to extract scaled MFCC data:

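import librosa
import numpy as np

def extract_features(file_name):
    """Return the time-averaged MFCC vector for a file, or None on failure."""
    try:
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        # Average each of the 40 coefficients over all frames
        mfccsscaled = np.mean(mfccs.T, axis=0)
    except Exception as e:
        # The original printed the undefined name `file` here
        print("Error encountered while parsing file: ", file_name)
        return None

    return mfccsscaled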

Example code used on a single file:

max_pad_len = 174
file_name = '201-AWCKARAK47Close0116BIT.wav'
audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast', sr=None)
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
pad_width = max_pad_len - mfccs.shape[1]
mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
mfccsscaled

This throws the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-118328675a5f> in <module>
      4 mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
      5 pad_width = max_pad_len - mfccs.shape[1]
----> 6 mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
      7 mfccsscaled

<__array_function__ internals> in pad(*args, **kwargs)

c:\python\lib\site-packages\numpy\lib\arraypad.py in pad(array, pad_width, mode, **kwargs)
    746 
    747     # Broadcast to shape (array.ndim, 2)
--> 748     pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
    749 
    750     if callable(mode):

c:\python\lib\site-packages\numpy\lib\arraypad.py in _as_pairs(x, ndim, as_index)
    517 
    518     if as_index and x.min() < 0:
--> 519         raise ValueError("index can't contain negative values")
    520 
    521     # Converting the array with `tolist` seems to improve performance

ValueError: index can't contain negative values

Can you tell me why this error is being thrown and how to work around it?

BACKGROUND

I am using files obtained from https://www.boomlibrary.com/. Most of the files have a 24-bit depth. I tried converting the original wav files both down to 16-bit and up to 32-bit. Even after passing both versions through librosa, the min~max range of the data does not fall within [-1, 1]. I get Librosa audio file min~max range: -1.2105241 to 1.2942984. I am not sure whether this information helps in resolving my question. Thanks!

Lukasz Tracewski On BEST ANSWER

You are passing a negative pad width to np.pad, as indicated by the exception.

The problem stems from this line:

pad_width = max_pad_len - mfccs.shape[1]

mfccs.shape[1] is the number of frames. It is proportional to the audio length and depends on the hop length used for computing the MFCCs. By default, hop_length is 512.

The audio in question is 201-AWCKARAK47Close0116BIT.wav, a roughly 45-second-long clip sampled at 96 kHz. A back-of-the-envelope calculation tells us that the number of MFCC frames you will get for this audio file is:

45 seconds * (96000 samples / second) / (512 samples / frame) ~ 8500 frames
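
The same estimate can be sketched in a few lines of Python. The helper name estimate_frames is hypothetical, and the formula 1 + n_samples // hop_length is an assumption that holds for centered framing (which, as far as I know, is the librosa default); treat the result as an approximation, not the exact value librosa will return for this file.

```python
def estimate_frames(duration_s, sample_rate, hop_length=512):
    """Approximate the number of MFCC frames for a clip.

    Assumes centered framing, where the frame count is
    1 + n_samples // hop_length (an assumption, not read from the file).
    """
    n_samples = int(duration_s * sample_rate)
    return 1 + n_samples // hop_length


# Roughly 45 seconds at 96 kHz, as in the clip discussed above
frames = estimate_frames(45, 96000)
print(frames)
```

The exact count differs slightly from the rounded figure of 8500, but it is far larger than 174 either way.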

In turn:

pad_width = max_pad_len - mfccs.shape[1] = 174 - 8500 => NEGATIVE NUMBER
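
One possible workaround, sketched below, is to pad only when the clip is shorter than max_pad_len and to truncate when it is longer. The helper pad_or_truncate is a hypothetical name, not part of librosa or NumPy, and the arrays are dummy data standing in for real MFCC matrices. Note that truncating keeps only the first 174 frames, so for a 45-second clip most of the audio is discarded; raising max_pad_len or splitting the clip into segments may suit your data better.

```python
import numpy as np


def pad_or_truncate(mfccs, max_len):
    """Force an (n_mfcc, n_frames) array to exactly max_len frames."""
    n_frames = mfccs.shape[1]
    if n_frames >= max_len:
        # Too long: keep only the first max_len frames
        return mfccs[:, :max_len]
    # Too short: zero-pad on the right; pad_width is positive here
    pad_width = max_len - n_frames
    return np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')


# Dummy stand-ins for MFCC matrices of a long and a short clip
long_clip = np.ones((40, 8438))
short_clip = np.ones((40, 100))

fixed_long = pad_or_truncate(long_clip, 174)
fixed_short = pad_or_truncate(short_clip, 174)
print(fixed_long.shape, fixed_short.shape)
```

Both results have the same shape, so they can be stacked into a single array for training.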