How to re-encode one audio file to match another, so they can be concatenated without re-encoding everything


I have an audio editor in the browser that uses ffmpeg (WebAssembly), and I want to insert new audio into existing audio without re-encoding everything. Re-encoding everything takes a long time, especially in the browser, so I would like to re-encode only the inserted file, match it to the original, and concatenate the two using stream copy (-c copy).

The ffmpeg concatenation docs say:

All files must have the same streams (same codecs, same time base, etc.)

But it is not clear what is meant by time base. So far I have observed I need to match:

  • codec
  • bit rate
  • sample rate
  • channels (mono, stereo)

Is there anything else I need to match so that the resulting audio is not corrupt/broken when concatenating?

MP3, for example, can be encoded as CBR, VBR, or ABR. If the original audio reports a bit rate of 128 kb/s, I assume it is CBR, so I match it with:

ffmpeg -i original.mp3
# > Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s

ffmpeg -i input.mp3 -b:a 128k -ar 44100 -ac 2 re_encoded.mp3

# then merge
# concat_list.txt contains the original audio and the re_encoded.mp3

# -safe 0 is an option of the concat demuxer, so it must come before -i
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy merged.mp3
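
For reference, the concat demuxer reads a plain-text list with one file '<path>' directive per line. Below is a minimal sketch of generating concat_list.txt from the shell. The helper name make_concat_list is made up for illustration, and the file names are the ones used above.

```shell
# Hypothetical helper: print one "file '<path>'" line per argument,
# in the order the files should be joined.
make_concat_list() {
  for f in "$@"; do
    # A single quote inside the path has to be written as '\''
    # (close the quote, escaped quote, reopen the quote).
    escaped=$(printf '%s' "$f" | sed "s/'/'\\\\''/g")
    printf "file '%s'\n" "$escaped"
  done
}

# Usage:
#   make_concat_list original.mp3 re_encoded.mp3 > concat_list.txt
```

To insert in the middle rather than append, the original would first have to be split into a head and a tail, and the list would name head, inserted file, and tail in that order.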

As far as I have tested, that works fine for the standard CBR rates: 8, 16, 24, 32, 40, 48, 64, 80, 96, 112, 128, 160, 192, 224, 256, or 320 (docs).

The problem arises when original.mp3 is VBR (variable bit rate) or ABR (average bit rate) and reports something like 150 kb/s.

If I try to match it like below:

ffmpeg -i input.mp3 -b:a 150k -ar 44100 -ac 2 re_encoded.mp3
ffmpeg -i re_encoded.mp3
# Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 160 kb/s

The resulting bitrate is rounded to the nearest CBR value, 160 kb/s.

I can solve this with mp3 by using -abr 1:

ffmpeg -i input.mp3 -abr 1 -b:a 150k -ar 44100 -ac 2 re_encoded.mp3
ffmpeg -i re_encoded.mp3
# Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 150 kb/s

Now the bitrate matches the original audio. However, I am not sure this is correct, since I am encoding the new audio as ABR and concatenating it with a VBR file. I also don't know how to check with ffmpeg whether a file is VBR, CBR, or ABR, or whether that even matters when concatenating.
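
One rough way to tell CBR from VBR/ABR is to look at the per-packet sizes that ffprobe reports. The sketch below is a heuristic, not a built-in ffmpeg feature: it assumes that CBR MP3 frames differ only by the one-byte padding slot (for example 417 and 418 bytes at 128 kb/s and 44100 Hz), so a spread of more than one byte suggests VBR or ABR. The function name classify_packet_sizes is made up for illustration.

```shell
# Hypothetical helper: read one packet size per line on stdin and
# report whether the sizes look constant.
classify_packet_sizes() {
  awk -F, '
    NF == 0 { next }                          # skip blank lines
    { size = $1 + 0 }                         # first CSV field as a number
    n == 0 || size < min { min = size }
    n == 0 || size > max { max = size }
    { n++ }
    END {
      if (n == 0) { print "unknown"; exit }
      # Heuristic: allow a 1-byte difference for the padding slot.
      print ((max - min) <= 1 ? "cbr-like" : "vbr-or-abr")
    }'
}

# Usage with a real file (requires ffprobe):
#   ffprobe -v error -select_streams a:0 -show_entries packet=size \
#     -of csv=p=0 original.mp3 | classify_packet_sizes
```

This cannot distinguish ABR from VBR, since both produce frames of varying size.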

A similar issue occurs with AAC files: I can't match the original audio's bitrate.

ffmpeg -i input.mp3 -b:a 128k -ar 44100 -ac 2 re_encoded.aac
ffmpeg -i re_encoded.aac
# Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 135 kb/s

The resulting bitrate always seems to be variable (135 in this case), and hence I can't match it to the original one.

So my question is: what conditions must be met when concatenating audio files with stream copy, and how can I re-encode only one file so that it matches the other? If there is a package that can do this, that would be a great help.

There is 1 answer

Answer by Brad

You need to match codec, channel count, and sample rate. You do not need to match bitrate. The decoder will work with a varying bitrate as if it were any other VBR stream. Each frame can indicate its size. For CBR, all the frames just happen to be the same size.
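
As a rough sketch of checking those three properties before a stream-copy concat, the following compares what ffprobe reports for the first audio stream of each file. The function names audio_signature and can_stream_copy_concat are made up for illustration, and the check covers only the three parameters named above.

```shell
# Print codec, sample rate, and channel count of the first audio stream
# as key=value lines (requires ffprobe).
audio_signature() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate,channels \
    -of default=noprint_wrappers=1 "$1"
}

# Exit status 0 when both files report the same three values.
can_stream_copy_concat() {
  [ "$(audio_signature "$1")" = "$(audio_signature "$2")" ]
}

# Usage:
#   if can_stream_copy_concat original.mp3 re_encoded.mp3; then
#     echo "parameters match"
#   fi
```

Bitrate is deliberately left out of the comparison, in line with the point that it does not need to match.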

Realistically though, you're not going to want to bother with this. You're going to want to decode everything to raw PCM and re-encode. While this does result in a generation of loss, the upsides are clear:

  • Sample accurate timing requires splicing in a format that can actually split on samples. You can't do that with a lossy codec that works in chunks (i.e. frames).
  • Mixing inputs of different codecs requires picking a winner to output to, and you'll have a generation of loss there anyway.
  • Your users will inevitably want to bring in sources of varying sample rates. The resampling needs to be done in the time domain, which requires another generation of decoding and encoding anyway.
  • It's easy and reliable to convert to PCM for the edit stage. It's hard, problematic, and error prone to try to edit with chunks of compressed data.
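
A minimal sketch of that decode-and-re-encode route, done in a single ffmpeg call with the atrim and concat filters: the original is split at the insertion point, the new audio goes in between, and everything is encoded once on output. The function name insert_audio and its argument order are assumptions for illustration. The concat filter lets the filter system pick a common sample rate and channel layout, so the inserted file does not need to be matched beforehand.

```shell
# Hypothetical helper: insert_audio ORIGINAL INSERT AT_SECONDS OUTPUT
insert_audio() {
  original=$1
  insert=$2
  at=$3          # insertion point in seconds
  output=$4

  # Split the original into head (before $at) and tail (after $at),
  # reset timestamps, then join head + inserted audio + tail.
  graph="[0:a]asplit=2[a][b];"
  graph="${graph}[a]atrim=end=${at},asetpts=PTS-STARTPTS[head];"
  graph="${graph}[b]atrim=start=${at},asetpts=PTS-STARTPTS[tail];"
  graph="${graph}[head][1:a][tail]concat=n=3:v=0:a=1[out]"

  ffmpeg -i "$original" -i "$insert" \
    -filter_complex "$graph" -map "[out]" "$output"
}

# Usage:
#   insert_audio original.mp3 input.mp3 12.5 merged.mp3
```

The output codec follows the output file extension here; encoder options such as -b:a can be added before the output name if a specific bitrate is wanted.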