FFmpeg does not concatenate my media files correctly, no matter which flags I test with. One of the videos is an .mp4 (H.264 codec) that was generated earlier from an .mp3 and a JPEG background. The commands below are the closest I have come to a working final output.
My main issue with the current test is that, once the two videos are stitched together, the audio in the final video is delayed by about 3 seconds.
Here are all the files I'm using:
Input Files:
Output Files:
files.txt
file '/tmp/new_image_video.mp4'
file '/tmp/main_video.mp4'
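For reference, a list file like the one above can be generated instead of written by hand. This is only a minimal sketch: write_concat_list is a hypothetical helper name, and it assumes the paths contain no single quotes.

```shell
# Write a concat-demuxer list file: one "file '<path>'" line per clip.
# Assumption: clip paths do not contain single quotes.
write_concat_list() {
    list_path=$1
    shift
    : > "$list_path"    # create or truncate the list file
    for clip in "$@"; do
        printf "file '%s'\n" "$clip" >> "$list_path"
    done
}
```

Calling write_concat_list /tmp/files.txt /tmp/new_image_video.mp4 /tmp/main_video.mp4 would write the two lines shown above, in playback order.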
Image Video Creation:
ffmpeg -loop 1 -i /tmp/image.jpg -i /tmp/audio.mp3 -acodec libfdk_aac -framerate 30 -vcodec libx264 -shortest /tmp/new_image_video_raw.mp4
Part two:
ffmpeg -threads 0 -i /tmp/new_image_video_raw.mp4 -vf "scale=w=560:h=320:force_original_aspect_ratio=decrease, pad=560:320:(560-iw*min(560/iw\,320/ih))/2:(320-ih*min(560/iw\,320/ih))/2" -acodec libfdk_aac -af aresample=resampler=soxr -qp 20 -ar 44100 -r 30 -ab 128k -ac 1 -vcodec libx264 -max_muxing_queue_size 9999 -shortest -movflags +faststart /tmp/new_image_video.mp4 -y
Main Video Transcode:
ffmpeg -i /tmp/main_video_raw.mp4 -vf "scale=iw*min(560/iw\,320/ih):ih*min(560/iw\,320/ih), pad=560:320:(560-iw*min(560/iw\,320/ih))/2:(320-ih*min(560/iw\,320/ih))/2" -acodec libfdk_aac -af aresample=resampler=soxr -ar 44100 -aspect 16:9 -qp 20 -framerate 30 -ab 128k -ac 1 -vcodec libx264 -max_muxing_queue_size 9999 -movflags +faststart /tmp/main_video.mp4 -y
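One difference between the two transcode commands is visible in the logs further down: the part-two command passes -r 30 and its output is reported as 30 fps, while this command passes -framerate 30 and its output stays at 25 fps. A hedged sketch of running both clips through a single shared set of output options, so that the segments match before they are concatenated, might look like this. normalize_clip is a hypothetical helper name, the setsar=1 and format=yuv420p filters are assumptions added for illustration, and nothing here is confirmed to be the fix for the audio delay.

```shell
# Transcode one clip with a fixed set of output parameters so that
# every segment ends up with the same size, frame rate and audio layout.
# Assumptions: 560x320 target, 30 fps, mono 44.1 kHz AAC via libfdk_aac.
normalize_clip() {
    src=$1
    dst=$2
    # Fit inside 560x320, pad to exactly 560x320, square pixels, yuv420p.
    vf="scale=560:320:force_original_aspect_ratio=decrease,pad=560:320:(ow-iw)/2:(oh-ih)/2,setsar=1,format=yuv420p"
    ffmpeg -y -i "$src" \
        -vf "$vf" -r 30 -c:v libx264 -qp 20 \
        -c:a libfdk_aac -ar 44100 -ac 1 -b:a 128k \
        -movflags +faststart "$dst"
}
```

It would be called once per segment, for example normalize_clip /tmp/main_video_raw.mp4 /tmp/main_video.mp4, so that both files in the list go through identical settings.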
Concat Video:
ffmpeg -threads 0 -f concat -safe 0 -i /tmp/files.txt -vf "scale=iw*min(560/iw\,320/ih):ih*min(560/iw\,320/ih), pad=560:320:(560-iw*min(560/iw\,320/ih))/2:(320-ih*min(560/iw\,320/ih))/2" -preset veryslow -crf 15 -acodec libfdk_aac -af aresample=resampler=soxr -ar 44100 -aspect 16:9 -qp 20 -framerate 30 -ab 128k -ac 1 -vcodec libx264 -max_muxing_queue_size 9999 -movflags +faststart /tmp/final_output_video.mp4 -y
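The concat demuxer expects every file in the list to have the same streams (same codecs, same time base, and so on). The logs below report 30 fps, yuvj420p and SAR 1:1 for new_image_video.mp4, but 25 fps, yuv420p and SAR 64:63 for main_video.mp4. A small sketch for printing these values side by side with ffprobe follows; show_stream_params is a hypothetical helper name and the choice of fields is an assumption about what is worth comparing.

```shell
# Print the video and audio stream parameters of each clip so that
# mismatches between concat segments are easy to spot.
show_stream_params() {
    for clip in "$@"; do
        echo "== $clip"
        # First video stream: size, pixel format, SAR, frame rate, timing.
        ffprobe -v error -select_streams v:0 \
            -show_entries stream=codec_name,width,height,pix_fmt,sample_aspect_ratio,r_frame_rate,time_base,start_time,duration \
            -of default=noprint_wrappers=1 "$clip"
        # First audio stream: codec, sample rate, channels, timing.
        ffprobe -v error -select_streams a:0 \
            -show_entries stream=codec_name,sample_rate,channels,start_time,duration \
            -of default=noprint_wrappers=1 "$clip"
    done
}
```

Running show_stream_params /tmp/new_image_video.mp4 /tmp/main_video.mp4 and comparing the two blocks, including the per-stream start_time and duration, is one way to check whether the segments line up before concatenating them.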
Output for new_image_video.mp4:
Stream mapping:
Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
Stream #1:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
[libx264 @ 0x150ce00] using SAR=1/1
[libx264 @ 0x150ce00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x150ce00] profile High, level 2.1
[libx264 @ 0x150ce00] 264 - core 152 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:-3:-3 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=2.00:0.70 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-4 threads=10 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.20
Output #0, mp4, to '/tmp/new_image_video.mp4':
Metadata:
encoder : Lavf57.76.100
Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuvj420p(pc), 560x320 [SAR 1:1 DAR 7:4], q=-1--1, 1 fps, 16384 tbn, 1 tbc
Metadata:
encoder : Lavc57.102.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1: Audio: mp3 (mp4a / 0x6134706D), 44100 Hz, stereo, s16p, 157 kb/s
Metadata:
encoder : Lavc56.41
frame= 73 fps=0.0 q=17.0 Lsize= 362kB time=00:00:16.00 bitrate= 185.3kbits/s speed=88.6x
video:49kB audio:308kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.166542%
[libx264 @ 0x150ce00] frame I:1 Avg QP: 4.09 size: 38729
[libx264 @ 0x150ce00] frame P:18 Avg QP: 5.77 size: 843
[libx264 @ 0x150ce00] frame B:54 Avg QP: 0.64 size: 49
[libx264 @ 0x150ce00] consecutive B-frames: 1.4% 0.0% 0.0% 98.6%
[libx264 @ 0x150ce00] mb I I16..4: 54.6% 18.9% 26.6%
[libx264 @ 0x150ce00] mb P I16..4: 0.0% 0.0% 0.0% P16..4: 9.1% 0.1% 0.5% 0.0% 0.0% skip:90.3%
[libx264 @ 0x150ce00] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 2.6% 0.0% 0.0% direct: 0.0% skip:97.4% L0:69.1% L1:30.9% BI: 0.0%
[libx264 @ 0x150ce00] 8x8 transform intra:18.9% inter:59.9%
[libx264 @ 0x150ce00] coded y,uvDC,uvAC intra: 44.1% 45.3% 45.0% inter: 1.4% 0.0% 0.0%
[libx264 @ 0x150ce00] i16 v,h,dc,p: 91% 2% 6% 1%
[libx264 @ 0x150ce00] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 18% 18% 8% 5% 6% 7% 9% 7%
[libx264 @ 0x150ce00] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 16% 8% 7% 10% 9% 10% 9% 9%
[libx264 @ 0x150ce00] i8c dc,h,v,p: 71% 12% 12% 5%
[libx264 @ 0x150ce00] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x150ce00] ref P L0: 79.3% 0.1% 19.5% 1.1%
[libx264 @ 0x150ce00] ref B L0: 68.3% 30.5% 1.2%
[libx264 @ 0x150ce00] ref B L1: 98.4% 1.6%
[libx264 @ 0x150ce00] kb/s:6.20
Output for new_image_video.mp4 (Part 2):
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/new_image_video_raw.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.76.100
Duration: 00:00:19.00, start: 0.000000, bitrate: 156 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc), 560x320 [SAR 1:1 DAR 7:4], 21 kb/s, 1 fps, 1 tbr, 16384 tbn, 2 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: mp3 (mp4a / 0x6134706D), 44100 Hz, stereo, s16p, 157 kb/s (default)
Metadata:
handler_name : SoundHandler
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Stream #0:1 -> #0:1 (mp3 (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
[libx264 @ 0x2175560] using SAR=1/1
[libx264 @ 0x2175560] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x2175560] profile High, level 3.0
[libx264 @ 0x2175560] 264 - core 152 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=10 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc=cqp mbtree=0 qp=20 ip_ratio=1.40 pb_ratio=1.30 aq=0
Output #0, mp4, to '/tmp/new_image_video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.76.100
Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuvj420p(pc), 560x320 [SAR 1:1 DAR 7:4], q=-1--1, 30 fps, 15360 tbn, 30 tbc (default)
Metadata:
handler_name : VideoHandler
encoder : Lavc57.102.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1(und): Audio: aac (libfdk_aac) (mp4a / 0x6134706D), 44100 Hz, mono, s16, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
encoder : Lavc57.102.100 libfdk_aac
[mp4 @ 0x2150cc0] Starting second pass: moving the moov atom to the beginning of the file
frame= 569 fps=0.0 q=-1.0 Lsize= 351kB time=00:00:18.86 bitrate= 152.3kbits/s dup=579 drop=0 speed=32.5x
video:81kB audio:251kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 5.851973%
[libx264 @ 0x2175560] frame I:3 Avg QP:17.00 size: 23393
[libx264 @ 0x2175560] frame P:143 Avg QP:20.00 size: 26
[libx264 @ 0x2175560] frame B:423 Avg QP:21.67 size: 19
[libx264 @ 0x2175560] consecutive B-frames: 0.9% 0.0% 0.0% 99.1%
[libx264 @ 0x2175560] mb I I16..4: 54.7% 26.0% 19.4%
[libx264 @ 0x2175560] mb P I16..4: 0.0% 0.0% 0.0% P16..4: 0.1% 0.0% 0.0% 0.0% 0.0% skip:99.9%
[libx264 @ 0x2175560] mb B I16..4: 0.0% 0.0% 0.0% B16..8: 0.2% 0.0% 0.0% direct: 0.0% skip:99.7% L0:23.7% L1:76.3% BI: 0.0%
[libx264 @ 0x2175560] 8x8 transform intra:26.0% inter:14.0%
[libx264 @ 0x2175560] coded y,uvDC,uvAC intra: 39.8% 44.1% 43.4% inter: 0.0% 0.0% 0.0%
[libx264 @ 0x2175560] i16 v,h,dc,p: 91% 3% 5% 1%
[libx264 @ 0x2175560] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 17% 15% 8% 6% 9% 6% 8% 8%
[libx264 @ 0x2175560] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 17% 6% 7% 10% 9% 11% 10% 9%
[libx264 @ 0x2175560] i8c dc,h,v,p: 71% 11% 13% 5%
[libx264 @ 0x2175560] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x2175560] ref P L0: 95.4% 0.7% 3.9%
[libx264 @ 0x2175560] ref B L0: 44.6% 55.4%
[libx264 @ 0x2175560] ref B L1: 98.3% 1.7%
[libx264 @ 0x2175560] kb/s:34.62
Output for main_video.mp4:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/main_video_raw.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
creation_time : 1970-01-01T00:00:00.000000Z
encoder : Lavf53.24.2
Duration: 00:01:02.32, start: 0.000000, bitrate: 1347 kb/s
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 959 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 383 kb/s (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : SoundHandler
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Stream #0:1 -> #0:1 (aac (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
[libx264 @ 0x758900] using SAR=64/63
[libx264 @ 0x758900] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x758900] profile High, level 2.1
[libx264 @ 0x758900] 264 - core 152 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=10 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc=cqp mbtree=0 qp=20 ip_ratio=1.40 pb_ratio=1.30 aq=0
Output #0, mp4, to '/tmp/main_video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.76.100
Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p, 560x320 [SAR 64:63 DAR 16:9], q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : VideoHandler
encoder : Lavc57.102.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1(und): Audio: aac (libfdk_aac) (mp4a / 0x6134706D), 44100 Hz, mono, s16, 128 kb/s (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : SoundHandler
encoder : Lavc57.102.100 libfdk_aac
[mp4 @ 0x755900] Starting second pass: moving the moov atom to the beginning of the file
frame= 1557 fps=275 q=-1.0 Lsize= 5144kB time=00:01:02.32 bitrate= 676.1kbits/s speed= 11x
video:4119kB audio:975kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.989500%
[libx264 @ 0x758900] frame I:13 Avg QP:17.00 size: 34937
[libx264 @ 0x758900] frame P:657 Avg QP:20.00 size: 3546
[libx264 @ 0x758900] frame B:887 Avg QP:21.69 size: 1615
[libx264 @ 0x758900] consecutive B-frames: 18.9% 12.6% 8.1% 60.4%
[libx264 @ 0x758900] mb I I16..4: 12.5% 51.8% 35.7%
[libx264 @ 0x758900] mb P I16..4: 0.2% 1.9% 1.0% P16..4: 17.9% 9.3% 8.4% 0.0% 0.0% skip:61.3%
[libx264 @ 0x758900] mb B I16..4: 0.1% 0.3% 0.3% B16..8: 18.0% 5.6% 2.4% direct: 2.8% skip:70.6% L0:33.9% L1:42.5% BI:23.6%
[libx264 @ 0x758900] 8x8 transform intra:55.4% inter:56.4%
[libx264 @ 0x758900] coded y,uvDC,uvAC intra: 84.0% 93.3% 75.0% inter: 12.6% 14.9% 3.3%
[libx264 @ 0x758900] i16 v,h,dc,p: 8% 38% 3% 51%
[libx264 @ 0x758900] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 16% 20% 8% 7% 9% 9% 10% 10% 11%
[libx264 @ 0x758900] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 20% 9% 8% 11% 10% 10% 9% 9%
[libx264 @ 0x758900] i8c dc,h,v,p: 41% 26% 17% 16%
[libx264 @ 0x758900] Weighted P-Frames: Y:0.2% UV:0.0%
[libx264 @ 0x758900] ref P L0: 72.3% 14.4% 9.7% 3.6% 0.0%
[libx264 @ 0x758900] ref B L0: 89.9% 7.7% 2.4%
[libx264 @ 0x758900] ref B L1: 97.1% 2.9%
[libx264 @ 0x758900] kb/s:541.66
Output for concat:
ffmpeg version 3.3.3 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.4.7 (GCC) 20120313 (Red Hat 4.4.7-18)
configuration: --prefix=/root/ffmpeg_build --extra-cflags=-I/root/ffmpeg_build/include --extra-ldflags='-L/root/ffmpeg_build/lib -ldl' --bindir=/root/bin --pkg-config-flags=--static --enable-gpl --enable-version3 --disable-debug --enable-shared --enable-runtime-cpudetect --enable-postproc --enable-pic --enable-libfdk_aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libtheora --enable-libvo-amrwbenc --enable-gray --enable-libopenjpeg --enable-libass --enable-libvidstab --enable-libsoxr --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-libwebp --enable-fontconfig --enable-libspeex --enable-nonfree
libavutil 55. 73.100 / 55. 73.100
libavcodec 57.102.100 / 57.102.100
libavformat 57. 76.100 / 57. 76.100
libavdevice 57. 7.100 / 57. 7.100
libavfilter 6. 98.100 / 6. 98.100
libswscale 4. 7.102 / 4. 7.102
libswresample 2. 8.100 / 2. 8.100
libpostproc 54. 6.100 / 54. 6.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/main_video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
creation_time : 1970-01-01T00:00:00.000000Z
encoder : Lavf53.24.2
Duration: 00:01:02.32, start: 0.000000, bitrate: 1347 kb/s
Stream #0:0(und): Video: h264 (Main) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 959 kb/s, 25 fps, 25 tbr, 12800 tbn, 50 tbc (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, 5.1, fltp, 383 kb/s (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : SoundHandler
Stream mapping:
Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Stream #0:1 -> #0:1 (aac (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
[libx264 @ 0x1563900] using SAR=64/63
[libx264 @ 0x1563900] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x1563900] profile High, level 2.1
[libx264 @ 0x1563900] 264 - core 152 - H.264/MPEG-4 AVC codec - Copyleft 2003-2017 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=10 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc=cqp mbtree=0 qp=20 ip_ratio=1.40 pb_ratio=1.30 aq=0
Output #0, mp4, to '/tmp/new_image_video.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.76.100
Stream #0:0(und): Video: h264 (libx264) (avc1 / 0x31637661), yuv420p, 560x320 [SAR 64:63 DAR 16:9], q=-1--1, 25 fps, 12800 tbn, 25 tbc (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : VideoHandler
encoder : Lavc57.102.100 libx264
Side data:
cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: -1
Stream #0:1(und): Audio: aac (libfdk_aac) (mp4a / 0x6134706D), 44100 Hz, mono, s16, 128 kb/s (default)
Metadata:
creation_time : 1970-01-01T00:00:00.000000Z
handler_name : SoundHandler
encoder : Lavc57.102.100 libfdk_aac
[mp4 @ 0x1560900] Starting second pass: moving the moov atom to the beginning of the file
frame= 1557 fps=277 q=-1.0 Lsize= 5144kB time=00:01:02.32 bitrate= 676.1kbits/s speed=11.1x
video:4119kB audio:975kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.989500%
[libx264 @ 0x1563900] frame I:13 Avg QP:17.00 size: 34937
[libx264 @ 0x1563900] frame P:657 Avg QP:20.00 size: 3546
[libx264 @ 0x1563900] frame B:887 Avg QP:21.69 size: 1615
[libx264 @ 0x1563900] consecutive B-frames: 18.9% 12.6% 8.1% 60.4%
[libx264 @ 0x1563900] mb I I16..4: 12.5% 51.8% 35.7%
[libx264 @ 0x1563900] mb P I16..4: 0.2% 1.9% 1.0% P16..4: 17.9% 9.3% 8.4% 0.0% 0.0% skip:61.3%
[libx264 @ 0x1563900] mb B I16..4: 0.1% 0.3% 0.3% B16..8: 18.0% 5.6% 2.4% direct: 2.8% skip:70.6% L0:33.9% L1:42.5% BI:23.6%
[libx264 @ 0x1563900] 8x8 transform intra:55.4% inter:56.4%
[libx264 @ 0x1563900] coded y,uvDC,uvAC intra: 84.0% 93.3% 75.0% inter: 12.6% 14.9% 3.3%
[libx264 @ 0x1563900] i16 v,h,dc,p: 8% 38% 3% 51%
[libx264 @ 0x1563900] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 16% 20% 8% 7% 9% 9% 10% 10% 11%
[libx264 @ 0x1563900] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 20% 9% 8% 11% 10% 10% 9% 9%
[libx264 @ 0x1563900] i8c dc,h,v,p: 41% 26% 17% 16%
[libx264 @ 0x1563900] Weighted P-Frames: Y:0.2% UV:0.0%
[libx264 @ 0x1563900] ref P L0: 72.3% 14.4% 9.7% 3.6% 0.0%
[libx264 @ 0x1563900] ref B L0: 89.9% 7.7% 2.4%
[libx264 @ 0x1563900] ref B L1: 97.1% 2.9%
[libx264 @ 0x1563900] kb/s:541.66
It looks like the main_video.mp4 audio track was variable. I was able to get it working by transcoding the video like: