Getting soundfile.LibsndfileError: Error opening 'speech.wav': Format not recognized when passing a 2D NumPy array to soundfile


I tried generating audio from tensors produced by the NVIDIA NeMo TTS models and ran into the error in the title.

Here is the code for it:

import soundfile as sf

from nemo.collections.tts.models import FastPitchModel
from nemo.collections.tts.models import HifiGanModel

spec_generator = FastPitchModel.from_pretrained("tts_en_fastpitch")
vocoder = HifiGanModel.from_pretrained(model_name="tts_hifigan")

text = "Just keep being true to yourself, if you're passionate about something go for it. Don't sacrifice anything, just have fun."
parsed = spec_generator.parse(text)
spectrogram = spec_generator.generate_spectrogram(tokens=parsed)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
audio = audio.to('cpu').detach().numpy()

sf.write("speech.wav", audio, 22050)

I expected to get an audio file, speech.wav.


There are 3 answers

jlamperez (best answer)

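Looking at your example, I see that your audio has shape (1, 173056). soundfile interprets a 2D array as (frames, channels), so this is read as a single frame with 173056 channels rather than 173056 mono samples.

Based on https://github.com/bastibe/python-soundfile/issues/309, I converted the audio to a 1D array of size 173056 and it worked fine.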

Code used:

>>> import numpy as np
>>> import soundfile as sf
>>> # audio has shape (1, 173056); np.ravel flattens it to (173056,)
>>> sf.write("speech.wav", np.ravel(audio), 22050)
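
To see why the flattening helps, here is a minimal sketch that only inspects array shapes. The zero-filled array is a stand-in for the vocoder output (an assumption, so the snippet runs without NeMo), and np.squeeze is shown as an alternative to np.ravel that only drops the batch axis.

```python
import numpy as np

# Stand-in for the vocoder output: a batch of one waveform, shape (1, n_samples).
audio = np.zeros((1, 173056), dtype=np.float32)

# soundfile treats a 2D array as (frames, channels), so this array would be
# interpreted as 1 frame with 173056 channels.
frames, channels = audio.shape
print(f"interpreted as {frames} frame(s) x {channels} channel(s)")

# Dropping the batch axis leaves a 1D mono signal of 173056 samples.
mono = np.squeeze(audio, axis=0)  # same result as np.ravel(audio) for this shape
print(mono.shape)

# The 1D array can then be written as in the answer above:
# sf.write("speech.wav", mono, 22050)
```

Unlike np.ravel, np.squeeze(audio, axis=0) raises an error if the first axis is not of length 1, which can catch an unexpected batch size before it is silently flattened into one long signal.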

Regards,

Ayodeji Babalola
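import librosa as lib  # assumption: "lib" in the original snippet refers to librosa
import soundfile as sf
x, _ = lib.load(path, sr=4000, mono=True)  # resample to 4 kHz and downmix to a 1D mono array
sf.write('new-file.wav', x, 4000)  # for a file we want to write with 4k sample rate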

Check that mono=True, so that a stereo file is downmixed to a single channel when it is loaded.

The code above solves the problem. You need to check that the channel layout of the data you loaded matches what you are trying to write.

wleong

In case you really need the audio in stereo (like I did), transpose the array. Per the soundfile documentation, the expected shape is (frames, channels), i.e. samples along the first axis and channels along the second.
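
A rough sketch of that transpose, using only NumPy. The helper name to_frames_first is hypothetical (it is not part of soundfile), and it assumes the shorter axis is the channel axis, which holds for typical audio but is a heuristic rather than a guarantee.

```python
import numpy as np

def to_frames_first(audio):
    """Return audio as (frames, channels), or as a 1D array for mono.

    Hypothetical helper; assumes the shorter axis of a 2D array is the
    channel axis.
    """
    audio = np.asarray(audio)
    if audio.ndim == 1:
        return audio  # already a mono signal
    if audio.ndim != 2:
        raise ValueError(f"expected a 1D or 2D array, got {audio.ndim}D")
    if audio.shape[0] < audio.shape[1]:
        audio = audio.T  # (channels, frames) -> (frames, channels)
    if audio.shape[1] == 1:
        return audio[:, 0]  # single channel: collapse to 1D
    return audio

# Stand-in stereo signal laid out as (channels, frames): one second at 22050 Hz.
stereo = np.zeros((2, 22050), dtype=np.float32)
fixed = to_frames_first(stereo)
print(fixed.shape)

# The result is in the layout soundfile expects:
# sf.write("stereo.wav", fixed, 22050)
```

The same helper also covers the (1, n_samples) case from the question, where it returns a 1D array.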