Different results of Griffin-Lim from librosa and torchaudio

I'm trying to transform a spectrogram back into audio. First I used librosa.griffinlim, and it worked well but was slow. So I am trying to use torchaudio on the GPU to speed up the transformation. However, I get a different reconstruction than with librosa.

This is my code:

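```python
import librosa
import matplotlib.pyplot as plt
import numpy as np
from scipy import signal

# Preprocess: load at 44.1 kHz, then band-pass filter to 20-1000 Hz.
data, fs = librosa.load('waveform.wav', sr=44100)
# Cutoffs given in Hz via fs=; without fs=, butter expects frequencies normalized
# to Nyquist (20 / (fs / 2)), so 20 / fs would actually select 10-500 Hz.
b, a = signal.butter(3, [20, 1000], btype='bandpass', fs=fs)
data = signal.filtfilt(b, a, data)
plt.plot(data)

# STFT (hop = 10% of n_fft), then magnitude in decibels relative to the peak
DMatrix = librosa.stft(data, n_fft=2048, hop_length=int(2048 * 0.1), window='hann')
dbMatrix = librosa.amplitude_to_db(np.abs(DMatrix), ref=np.max)
```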

[Figure: the original waveform]

Using librosa, I obtained a reconstruction similar to the original waveform:

```python
# dB -> amplitude, then Griffin-Lim phase reconstruction with librosa
spec = librosa.db_to_amplitude(dbMatrix)
re_wav = librosa.griffinlim(spec, n_iter=100, n_fft=2048, hop_length=int(2048 * 0.1), window='hann')
plt.plot(re_wav)
```

[Figure: librosa reconstruction]

But when I switched to torchaudio, the result was different:

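```python
import torch
import torchaudio

# Same STFT parameters as above, run on the GPU; power is left at its default
griffinlim = torchaudio.transforms.GriffinLim(n_fft=2048, n_iter=100, hop_length=int(2048 * 0.1)).to('cuda')
spec = librosa.db_to_amplitude(dbMatrix)
re_wav = griffinlim(torch.tensor(spec).to('cuda'))
plt.plot(re_wav.cpu().detach().numpy())
```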

[Figure: torchaudio reconstruction]

What am I missing?

Answer by Jon Nordby (accepted)

There are three common representations for the values in a magnitude spectrogram: amplitude, power (amplitude squared) and decibels. Griffin-Lim needs to know which of these it is given in order to convert back to a waveform correctly. spec = librosa.db_to_amplitude(dbMatrix) produces an amplitude spectrogram, and librosa.griffinlim expects an amplitude (magnitude) spectrogram, so you get a good reconstruction.
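The three representations are related by simple formulas: power is amplitude squared, and decibels are 20·log10 of an amplitude or, equivalently, 10·log10 of a power. The sketch below uses hypothetical helpers that mirror only the core formulas of librosa.amplitude_to_db and librosa.power_to_db (librosa additionally applies an amin floor and top_db clipping, omitted here). It also shows what happens if amplitude values are mistaken for power values: taking the square root halves every dB value relative to the reference, compressing the dynamic range.

```python
import numpy as np

# Hypothetical helpers: core formulas only, no amin floor or top_db clipping.
def amplitude_to_db(amp, ref=1.0):
    return 20.0 * np.log10(amp / ref)

def power_to_db(power, ref=1.0):
    return 10.0 * np.log10(power / ref)

amp = np.array([1.0, 0.1, 1e-4])  # amplitude (magnitude) values
power = amp ** 2                  # power spectrogram = amplitude squared

# Both routes give the same decibel values (0, -20, -80 dB here).
db_from_amp = amplitude_to_db(amp)
db_from_power = power_to_db(power)

# Treating an amplitude spectrogram as a power spectrogram takes a square root,
# so every dB value is halved (0, -10, -40 dB here): the dynamic range shrinks.
misread_amp = amp ** 0.5
db_misread = amplitude_to_db(misread_amp)
print(db_from_amp, db_from_power, db_misread)
```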

torchaudio.transforms.GriffinLim, on the other hand, expects a power spectrogram by default (power=2.0). To make it work with an amplitude spectrogram, pass the argument power=1.