ffmpeg: converting mp3 chunks to wav chunks adds a gap at the start of the audio

I have an mp3 streaming from a URL, and I save the chunks using a 1024-byte buffer size. After I get all the chunks, I use ffmpeg to convert each incoming mp3 chunk (22050 Hz, mono) to a wav chunk.

When I open/play the wav chunk I see that there is an empty gap at the start of every chunk.

Here is the code I'm running via Python's subprocess module, in a loop over all the saved chunks:

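import subprocess
from pathlib import Path

# Runs inside a loop; `path` is the file name of the current mp3 chunk.
subprocess.run(["ffmpeg", "-i",
    f"{Path.cwd()}/input/{path}",
    f"{Path.cwd()}/temp_output/{path.replace('.mp3', '')}.wav"
])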

Here is the output in the terminal:

processing: test-016.mp3
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers
  built with Apple clang version 15.0.0 (clang-1500.0.40.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[mp3 @ 0x7fd48e104480] Format mp3 detected only with low score of 25, misdetection possible!
[mp3 @ 0x7fd48e104480] Skipping 463 bytes of junk at 0.
[mp3 @ 0x7fd48e104480] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/Users/mayur/Projects/input/test-016.mp3':
  Duration: 00:00:00.39, start: 0.000000, bitrate: 169 kb/s
  Stream #0:0: Audio: mp3, 22050 Hz, mono, fltp, 160 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, wav, to '/Users/mayur/Projects/temp_output/test-016.wav':
  Metadata:
    ISFT            : Lavf60.3.100
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 22050 Hz, mono, s16, 352 kb/s
    Metadata:
      encoder         : Lavc60.3.100 pcm_s16le
size=      17kB time=00:00:00.36 bitrate= 379.7kbits/s speed= 253x    
video:0kB audio:17kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.451389%

I tried pydub as well and ran into a similar issue.

There is 1 answer

Answer by Colin Andrews:

Audio compression algorithms operate on one block of data at a time. Each block of compressed data is decompressed into a buffer of raw digital audio of a fixed duration. These blocks are referred to as packets, frames, samples, or access units in different contexts; ffmpeg calls them packets. With MP3 and similar lossy codecs, the size of a compressed block varies with the nature of the audio and the level of compression. With CBR encoding there is less variation in size than with VBR, but the blocks are still not all the same size.
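As a rough illustration of how big these blocks are, here is a small sketch based on the standard Layer III frame-length formula. The 160 kb/s and 22050 Hz values are taken from the ffmpeg output in the question; the function name `mp3_frame_size` is made up for this example, and the formula only covers Layer III.

```python
def mp3_frame_size(bitrate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Size in bytes of one MPEG Layer III frame, header included."""
    # MPEG-1 sample rates (32, 44.1, 48 kHz) carry 1152 samples per frame.
    # The lower MPEG-2/2.5 rates carry 576, which halves the coefficient.
    coefficient = 144 if sample_rate_hz >= 32000 else 72
    # The padding bit adds one byte to some frames so the average bitrate
    # comes out right, which is why even CBR frames are not all equal.
    return coefficient * bitrate_bps // sample_rate_hz + padding


unpadded = mp3_frame_size(160_000, 22_050)
padded = mp3_frame_size(160_000, 22_050, padding=1)
print(unpadded, padded)
```

With these numbers a block is 522 or 523 bytes, so a 1024-byte buffer holds slightly less than two of them and its end almost never coincides with the end of a block.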

What is happening with your approach is that your fixed 1024-byte input buffers do not line up with the packet boundaries, so packets get split across buffer boundaries. When ffmpeg is spawned on a buffer taken from the middle of the stream, it has to skip ahead to find the start of the next packet, which is what the "Skipping 463 bytes of junk at 0" message in your output indicates. The packet that spanned the buffer boundary is lost, because ffmpeg can't decode a partial packet.

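MP3 marks the start of each packet (a "frame" in MP3 terms) with a sync pattern: 0xFF followed by a byte whose top three bits are set. The often-quoted 0xFF 0xFB is the MPEG-1 Layer III form; a 22050 Hz stream like yours is MPEG-2 Layer III, where frames normally start with 0xFF 0xF3 (or 0xFF 0xF2 if CRC protection is enabled). To avoid losing data, you need to find the last sync pattern in the previous buffer and move all the data from there to the end of that buffer to the beginning of the next buffer. Note that the same byte pattern can also occur by chance inside the audio data, so a robust splitter should validate the rest of the header too.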
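A minimal sketch of that carry-over, assuming the chunks arrive as `bytes` objects. It matches the general frame sync (0xFF followed by a byte whose top three bits are set), which covers 0xFF 0xFB as well as the 0xFF 0xF3 typical of 22050 Hz streams. The names `split_at_last_sync` and `realign` are hypothetical, and the code does not validate the remaining header fields, so a sync-like pattern inside the audio data could still fool it.

```python
from typing import Iterable, Iterator


def split_at_last_sync(buffer: bytes) -> tuple[bytes, bytes]:
    """Split a buffer into (complete, tail) at the last candidate frame sync."""
    # Scan backwards for 0xFF followed by a byte with its top three bits set.
    for i in range(len(buffer) - 2, -1, -1):
        if buffer[i] == 0xFF and (buffer[i + 1] & 0xE0) == 0xE0:
            return buffer[:i], buffer[i:]
    # No sync found: keep everything pending so that nothing is dropped.
    return b"", buffer


def realign(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield buffers that end on a candidate frame boundary."""
    carry = b""
    for chunk in chunks:
        # Prepend the unfinished tail of the previous buffer to this one.
        complete, carry = split_at_last_sync(carry + chunk)
        if complete:
            yield complete
    if carry:
        yield carry  # whatever is left once the stream ends


# Synthetic data: five fake 10-byte "frames" cut into 16-byte chunks.
frame = b"\xff\xf3" + bytes(8)
stream = frame * 5
chunks = [stream[i:i + 16] for i in range(0, len(stream), 16)]
pieces = list(realign(chunks))
```

Because the tail is always carried forward, a sync pattern that happens to be split across two chunks is picked up once the next chunk has been appended.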

Your audio is still not going to sound right, though. With nearly every audio compression scheme, each packet after the first depends on some information from the previous packet in order to sound right. The decoder keeps state from the previous packet and uses it while decoding the next one. Because you are spawning a separate ffmpeg process for each buffer, that state is lost at every buffer boundary. This will cause the start of the WAV files to sound slightly wrong at times when played.

What you really need to do is append the new buffers to a single stream and have a single ffmpeg process decode the whole thing. I assume you want to decode while the stream is still arriving, so you probably don't want to download the whole thing and then decode it all at once. ffmpeg can read its input from a pipe (for example -i pipe:0 for stdin) and from network sources, so you can spawn the ffmpeg process once and then write the bytes to its stdin as they arrive, or have it read from a local network port.
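A rough sketch of the single-process approach, assuming ffmpeg is on the PATH and that the chunks come from some iterable of `bytes`. The helper names `build_ffmpeg_command` and `decode_stream` and the output path are placeholders for illustration, not part of the original setup.

```python
import subprocess
from typing import Iterable


def build_ffmpeg_command(output_path: str) -> list[str]:
    """ffmpeg invocation that reads MP3 data from stdin and writes one WAV file."""
    return [
        "ffmpeg", "-hide_banner",
        "-y",             # overwrite the output file if it already exists
        "-f", "mp3",      # state the input format instead of probing for it
        "-i", "pipe:0",   # read the input from stdin
        output_path,
    ]


def decode_stream(chunks: Iterable[bytes], output_path: str) -> int:
    """Feed every chunk to one ffmpeg process and return its exit code."""
    proc = subprocess.Popen(build_ffmpeg_command(output_path), stdin=subprocess.PIPE)
    assert proc.stdin is not None
    try:
        for chunk in chunks:
            # The decoder keeps its state across chunks, so the chunk
            # boundaries no longer matter to the decoded audio.
            proc.stdin.write(chunk)
        proc.stdin.close()  # signal end of input so ffmpeg can finish the file
    except BrokenPipeError:
        pass  # ffmpeg exited early; the return code below reports the failure
    return proc.wait()
```

If the stream is fetched with requests, something like `decode_stream(response.iter_content(chunk_size=1024), "output.wav")` would feed the chunks in as they arrive. The decoded audio then ends up in one continuous WAV file that can be split afterwards if separate pieces are still needed.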

For more info on how to do this, I would check out this question about how to pipe input into ffmpeg and this one about how to write to a subprocess's stdin.