Effect of window shifting in spectrogram?


I am quite new to DSP and am generating the spectrogram of an audio file. My spectrogram is not smooth; it shows a rather raw image with visible pixels, something like this:

enter image description here

I am looking for a smooth spectrogram like this instead:

enter image description here

Where am I making a mistake? Is it because of the window size? My code to generate the Mel spectrogram is:

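import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
# normalize() and waveplot() are used below but not defined in this snippet;
# they are assumed to be imported elsewhere.


def readData(file):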
    origData,origSampFreq = librosa.load(file, sr=None)
    return origData, origSampFreq


def resample(originalData, origSampFreq, targetSampFreq):
    # Keyword arguments work across librosa versions (newer releases require them)
    resampledData = librosa.resample(originalData, orig_sr=origSampFreq, target_sr=targetSampFreq)
    return resampledData


def normalizeSound(resampledData, axis):
    """ Axis is 0 for row-wise and 1 
    for column wise"""
    normalizedData = normalize(resampledData, axis)
    return normalizedData

def calculateMelSpectogram(normalizedData, hop_length, win_length, sr):
    #newSamplingFreq = 16000
    S = librosa.feature.melspectrogram(y=normalizedData, sr=sr, hop_length=hop_length, win_length=win_length)
    return S

#Plot melspectogram

def plotMelSpectogram(S, sr, name, ref=np.max):
    plt.figure(figsize=(10,3))
    # Use the ref and sr arguments instead of hard-coded values
    S_dB = librosa.power_to_db(S, ref=ref)
    librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=sr)
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-frequency spectrogram')
    # Apply the layout before saving so the saved image matches what is shown
    plt.tight_layout()
    plt.savefig('./chunk_images/' + name + "mel.png", dpi=300, bbox_inches='tight')
    plt.show()


def featureExtraction(audioFile, name, targetSampFreq = 16000, 
                      axis =0 , 
                      hop_length= 256,
                      win_length=512):
    
    y, y_sr = readData(file=audioFile)
    print(y, y_sr)
    resampledData = resample(originalData=y, origSampFreq=y_sr, targetSampFreq=targetSampFreq)
    normalizedData = normalizeSound(resampledData, axis=axis)
    S = calculateMelSpectogram(normalizedData=normalizedData, hop_length=hop_length, win_length=win_length, sr=targetSampFreq)
    plotSound(soundData=normalizedData, sr=targetSampFreq,x_axis_string='time' , name = name)
    plotMelSpectogram(S, sr=targetSampFreq, name = name, ref=np.max)
    return S

# plot original time domain data

def plotSound(soundData, sr, x_axis_string, name):
    plt.figure(figsize=(10,3))
    waveplot(soundData, sr, x_axis=x_axis_string)
    plt.savefig('./chunk_images/' + name + "sound.png",dpi=(300), bbox_inches='tight')

There is 1 answer

Jon Nordby

The time resolution of a mel spectrogram is set by hop_length. 256 samples at 16 kHz is 16 ms, which is a rather high resolution; lower values mean higher resolution. You can achieve some smoothing by using an n_fft (and win_length) that is larger than hop_length. The librosa default is n_fft = 4x hop_length (2048 and 512), whereas your win_length is only 2x your hop_length.
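To make the arithmetic concrete, here is a small sketch in plain Python. The helper name frame_timing is made up for illustration and is not part of librosa; it only converts sample counts to milliseconds and computes how much adjacent analysis windows overlap.

```python
def frame_timing(sr, hop_length, win_length):
    """Return (hop in ms, window in ms, overlap fraction of adjacent frames)."""
    hop_ms = 1000.0 * hop_length / sr
    win_ms = 1000.0 * win_length / sr
    overlap = 1.0 - hop_length / win_length
    return hop_ms, win_ms, overlap


# Settings from the question: 16 kHz, hop of 256 samples, window of 512 samples
question = frame_timing(16000, 256, 512)

# Same hop, but a window that is 4x the hop (hypothetical alternative setting)
wider = frame_timing(16000, 256, 1024)

print(question)  # (16.0, 32.0, 0.5)
print(wider)     # (16.0, 64.0, 0.75)
```

With the wider window each frame shares 75% of its samples with its neighbour instead of 50%, so neighbouring columns of the spectrogram change more gradually, at the cost of blurring fast events in time.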

The frequency resolution is given by n_mels, which you have not specified. Higher values give higher resolution. It is usually in the range of 32-256 bands, with 128 being a typical value (and the default in librosa). If you want even higher resolution than that along the frequency axis, you are better off using the plain STFT, without applying the Mel filterbank.
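As a rough comparison of how many rows each representation puts on the frequency axis, the sketch below uses plain Python. The helper names are illustrative only, and it assumes the librosa defaults of n_fft = 2048 and n_mels = 128 together with the 16 kHz rate from the question.

```python
def stft_bins(n_fft):
    """Number of frequency bins in a one-sided STFT of a real signal."""
    return n_fft // 2 + 1


def bin_spacing_hz(sr, n_fft):
    """Spacing between adjacent STFT bins in Hz."""
    return sr / n_fft


sr = 16000      # sample rate used in the question
n_fft = 2048    # assumed: librosa's default FFT size
n_mels = 128    # assumed: librosa's default number of mel bands

rows_stft = stft_bins(n_fft)
spacing = bin_spacing_hz(sr, n_fft)

print(rows_stft, n_mels)  # 1025 128
print(spacing)            # 7.8125
```

So the plain STFT would give roughly eight times as many rows as the default mel spectrogram. In librosa the number of bands is set with the n_mels argument of librosa.feature.melspectrogram, and the unfiltered transform is available as librosa.stft.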

By the way, you are not passing hop_length to librosa.display.specshow, so the time axis of the plot is likely to be wrong.
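The size of that error can be estimated without plotting anything. The sketch below is plain Python with a made-up helper name, and it assumes that specshow falls back to its default hop_length of 512 when the argument is omitted.

```python
def frame_times(n_frames, sr, hop_length):
    """Start time in seconds of each frame, given the hop between frames."""
    return [k * hop_length / sr for k in range(n_frames)]


sr = 16000
n_frames = 5

# Times implied by the hop actually used to compute the spectrogram
actual = frame_times(n_frames, sr, hop_length=256)

# Times the plot would show if it assumed a hop of 512 (specshow's default)
assumed = frame_times(n_frames, sr, hop_length=512)

print(actual)
print(assumed)
```

Every label on the time axis comes out twice as large as it should be, so the clip appears twice as long. Passing hop_length=hop_length to librosa.display.specshow, alongside sr, keeps the axis consistent with how the spectrogram was computed.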