Effect of window shifting in spectrogram?

798 views Asked by Aaditya Ura At 25 September 2020 at 23:03

I am quite new to DSP and am generating the spectrogram of an audio file. My spectrogram is not smooth: it looks like a raw, blocky image with visible pixels, something like this:

What I am looking for is a smooth spectrogram like this:

Where am I making a mistake? Is it because of the window size? My code to generate the Mel spectrogram is:

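import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np


def readData(file):
    # sr=None keeps the file's native sampling rate
    origData, origSampFreq = librosa.load(file, sr=None)
    return origData, origSampFreq


def resample(originalData, origSampFreq, targetSampFreq):
    # Keyword arguments work across librosa versions (newer ones require them)
    resampledData = librosa.resample(originalData, orig_sr=origSampFreq, target_sr=targetSampFreq)
    return resampledData


def normalizeSound(resampledData, axis):
    """Axis is 0 for row-wise and 1 for column-wise."""
    # The question does not show where normalize comes from;
    # librosa.util.normalize is assumed here.
    normalizedData = librosa.util.normalize(resampledData, axis=axis)
    return normalizedData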

def calculateMelSpectogram(normalizedData, hop_length, win_length, sr):
    # Recent librosa versions require the signal as the keyword argument y
    S = librosa.feature.melspectrogram(y=normalizedData, sr=sr, hop_length=hop_length, win_length=win_length)
    return S

# Plot the mel spectrogram

def plotMelSpectogram(S, sr, name, ref=np.max, hop_length=256):
    plt.figure(figsize=(10, 3))
    S_dB = librosa.power_to_db(S, ref=ref)
    # Pass the real sr and hop_length so the time axis is labelled correctly
    librosa.display.specshow(S_dB, x_axis='time', y_axis='mel', sr=sr, hop_length=hop_length)
    plt.colorbar(format='%+2.0f dB')
    plt.title('Mel-frequency spectrogram')
    plt.tight_layout()
    # Save before plt.show(), which may clear the figure
    plt.savefig('./chunk_images/' + name + "mel.png", dpi=300, bbox_inches='tight')
    plt.show()


def featureExtraction(audioFile, name, targetSampFreq=16000,
                      axis=0,
                      hop_length=256,
                      win_length=512):

    y, y_sr = readData(file=audioFile)
    print(y, y_sr)
    resampledData = resample(originalData=y, origSampFreq=y_sr, targetSampFreq=targetSampFreq)
    normalizedData = normalizeSound(resampledData, axis=axis)
    S = calculateMelSpectogram(normalizedData=normalizedData, hop_length=hop_length, win_length=win_length, sr=targetSampFreq)
    plotSound(soundData=normalizedData, sr=targetSampFreq, x_axis_string='time', name=name)
    plotMelSpectogram(S, sr=targetSampFreq, name=name, ref=np.max, hop_length=hop_length)
    return S

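# Plot the original time-domain data

def plotSound(soundData, sr, x_axis_string, name):
    plt.figure(figsize=(10, 3))
    # waveplot exists only in older librosa releases; newer ones provide
    # librosa.display.waveshow instead
    librosa.display.waveplot(soundData, sr=sr, x_axis=x_axis_string)
    plt.savefig('./chunk_images/' + name + "sound.png", dpi=300, bbox_inches='tight')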

Original Q&A

There is 1 answer

**Jon Nordby** · Answer 1 · 2020-09-26T20:29:39+00:00

The time resolution of a mel spectrogram is set by hop_length. 256 samples at 16 kHz is 16 ms per frame, which is already a fairly fine resolution; lower values mean higher resolution. You can achieve some smoothing by using an analysis window that is larger than hop_length, so that neighbouring frames overlap more. In librosa the window length defaults to n_fft, and the default n_fft is 4x the default hop_length, whereas your win_length is only 2x your hop_length.
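The arithmetic behind these numbers can be checked with a few lines of plain Python. This is only a sketch: the values mirror the settings in the question (16 kHz, hop_length=256, win_length=512), the three-second duration is an arbitrary example, and the helper name `frame_stats` is made up for illustration. The frame count assumes librosa's default centered framing, which yields `1 + n_samples // hop_length` frames.

```python
def frame_stats(sr, hop_length, win_length, n_samples):
    """Return (hop duration in ms, window overlap fraction, frame count)."""
    hop_ms = 1000.0 * hop_length / sr
    overlap = 1.0 - hop_length / win_length
    # Frame count for centered framing (librosa's default, center=True)
    n_frames = 1 + n_samples // hop_length
    return hop_ms, overlap, n_frames


sr = 16000
n_samples = 3 * sr  # three seconds of audio (arbitrary example length)

for hop_length, win_length in [(256, 512), (128, 512), (64, 512)]:
    hop_ms, overlap, n_frames = frame_stats(sr, hop_length, win_length, n_samples)
    print(f"hop={hop_length:4d}  {hop_ms:5.1f} ms/frame  "
          f"overlap={overlap:.0%}  frames={n_frames}")
```

Halving hop_length doubles the number of columns in the image, and the larger overlap between neighbouring windows makes adjacent columns more similar, so the plot tends to look less blocky along the time axis.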

The frequency resolution is given by n_mels, which you have not specified. Higher values give higher resolution. It is usually in the range of 32 to 256 bands, with 128 being a typical value (and the librosa default). If you want even more detail along the frequency axis, you are better off using the plain STFT, without applying the mel filterbank.
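To get a feel for how many rows each representation has, the following sketch compares the linear STFT bins with the number of mel bands. It uses only NumPy; the values for n_fft and n_mels are librosa's defaults, and sr is taken from the question.

```python
import numpy as np

sr = 16000
n_fft = 2048   # librosa's default FFT size
n_mels = 128   # librosa's default number of mel bands

# Center frequencies of the linear STFT bins, from 0 Hz up to sr / 2
stft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)

print("STFT rows:", stft_freqs.size)
print("STFT bin spacing (Hz):", stft_freqs[1] - stft_freqs[0])
print("mel rows:", n_mels)
```

The mel filterbank pools the linear bins into n_mels bands, so the mel image has far fewer rows than the STFT it is computed from.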

By the way, you are not passing hop_length to librosa.display.specshow, so the time axis of the plot is likely to be wrong: specshow assumes its own default of hop_length=512.
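A rough illustration of the size of that error, using NumPy only. The helper `frame_times` is hypothetical (written for this example, not a librosa function), and the frame count of 188 is an assumed value corresponding to about three seconds of audio at hop_length=256.

```python
import numpy as np


def frame_times(n_frames, sr, hop_length):
    """Start time in seconds of each spectrogram column."""
    return np.arange(n_frames) * hop_length / sr


sr = 16000
n_frames = 188  # assumed: roughly 3 s of audio at hop_length=256

true_times = frame_times(n_frames, sr, hop_length=256)
# What the axis shows when the plot falls back to hop_length=512
assumed_times = frame_times(n_frames, sr, hop_length=512)

print("actual start of last frame:", true_times[-1], "s")
print("labelled start of last frame:", assumed_times[-1], "s")
```

With these settings the labelled times come out twice as long as the real ones. Passing `hop_length=hop_length` together with the actual `sr` to `librosa.display.specshow` keeps the axis consistent with the data.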
