I am trying to write code that resamples an audio file to 16 kHz mono and then encodes it to PCM, but I am running into a problem with channel layouts.
In a nutshell:
My AVCodecContext, and the frames I get from the stream via avcodec_receive_frame(), have a channel layout order of AV_CHANNEL_ORDER_UNSPEC. But swr_init() changes the resampler's internal in_ch_layout order to AV_CHANNEL_ORDER_NATIVE. When I then call swr_convert_frame() with my AVFrames, the channel layout orders no longer match, so resampling fails because it thinks the input has changed.
More details:
I create an AVCodecContext from my audio stream's codec. It has a 2-channel layout with order AV_CHANNEL_ORDER_UNSPEC, and every frame I decode from the stream via avcodec_receive_frame() also has order AV_CHANNEL_ORDER_UNSPEC.
I set the SwrContext's in_ch_layout to the channel layout from the codec context:
AVChannelLayout in_ch_layout = in_codec_context->ch_layout,
...
int ret = swr_alloc_set_opts2(&swr_ctx, ...
&in_ch_layout,
...);
But swr_init() changes the SwrContext's internal in_ch_layout from AV_CHANNEL_ORDER_UNSPEC to AV_CHANNEL_ORDER_NATIVE, so the next call to swr_convert_frame() fails because the input frame's channel layout differs from the SwrContext's. Whenever swr_init() runs (in my case indirectly via swr_convert_frame(), but the same happens if I call it directly), SwrContext->used_ch_layout and SwrContext->in_ch_layout are both updated to order AV_CHANNEL_ORDER_NATIVE:
// swresample.c
av_cold int swr_init(struct SwrContext *s){
...
if (!av_channel_layout_check(&s->used_ch_layout)) <-- This hits if I don't set anything for used_ch_layout
av_channel_layout_default(&s->used_ch_layout, s->in.ch_count); <-- default is AV_CHANNEL_ORDER_NATIVE
...
if (s->used_ch_layout.order == AV_CHANNEL_ORDER_UNSPEC) <-- This hits if I do set used_ch_layout
av_channel_layout_default(&s->used_ch_layout, s->used_ch_layout.nb_channels); <-- default is AV_CHANNEL_ORDER_NATIVE
The next time I call swr_convert_frame(), the frame still has the same layout as the audio stream's codec (order AV_CHANNEL_ORDER_UNSPEC), which differs from SwrContext->in_ch_layout (now AV_CHANNEL_ORDER_NATIVE), so it exits early with ret |= AVERROR_INPUT_CHANGED:
// swresample_frame.c
int swr_convert_frame(SwrContext *s,
AVFrame *out, const AVFrame *in)
{
...
if ((ret = config_changed(s, out, in)))
return ret;
...
static int config_changed(SwrContext *s,
const AVFrame *out, const AVFrame *in)
{
...
if ((err = av_channel_layout_copy(&ch_layout, &in->ch_layout)) < 0)
...
if (av_channel_layout_compare(&s->in_ch_layout, &ch_layout) || ...) { <-- This hits the next time I call swr_convert_frame()
ret |= AVERROR_INPUT_CHANGED;
}
// channel_layout.c
int av_channel_layout_compare(const AVChannelLayout *chl, const AVChannelLayout *chl1)
{
...
// if only one is unspecified -> not equal
if ((chl->order == AV_CHANNEL_ORDER_UNSPEC) !=
(chl1->order == AV_CHANNEL_ORDER_UNSPEC))
return 1;
If I hardcode the channel layout order of each input AVFrame to AV_CHANNEL_ORDER_NATIVE before resampling, then resampling and the subsequent encoding work, but this feels like a really bad idea and of course wouldn't work as soon as I resample an audio file with a different channel layout:
avcodec_receive_frame(in_codec_context, input_frame);
AVChannelLayout input_frame_ch_layout;
av_channel_layout_default(&input_frame_ch_layout, 2 /* = nb_channels*/);
input_frame->ch_layout = input_frame_ch_layout;
// Bad idea - but "fixes" my issue!
My questions
What do I need to change in the resampler and/or the decoded audio frames so that they have the same channel layout order and resampling works?
How can I make the channel order of the AVFrames that I get from avcodec_receive_frame() match the input channel order of SwrContext so the resampling works? My understanding is that the decoded frames should be 'correct' already and I shouldn't need to change any of their values, only values of the output (resampled) frames that I create.
Is there something I need to set on the AVFrame before I resample it?
Why does the SwrContext choose to change the channel order to AV_CHANNEL_ORDER_NATIVE?
Note:
A workaround could be to call swr_convert() on the frames' raw data buffers instead of swr_convert_frame(), since it appears to bypass this check (no frames are involved). I haven't tried this; it shouldn't be necessary, and I would prefer to use swr_convert_frame() because I am already working with input and output frames.
Unfortunately, I can't find any example code that uses swr_convert_frame() (not even the FFmpeg code base seems to call it).
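My full C++ source code (error handling omitted for readability):
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/channel_layout.h>
#include <libavutil/opt.h>
#include <libswresample/swresample.h>
}
#include <cstdio>
#include <string>
// Minimal stand-in for my logging helper
static void logging(const char* msg) { std::fprintf(stderr, "%s\n", msg); }
std::string fileToUse = "/home/projects/audioFileProject/Audio files/14 Black Cadillacs.wma";
const std::string outputFilename = "out.wav";
const std::string PCMS16BE_encoder_name = "pcm_f32le"; // despite the variable name, this selects 32-bit float little-endian PCM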
int main()
{
// Open audio file
AVFormatContext* in_format_context = avformat_alloc_context();
avformat_open_input(&in_format_context, fileToUse.c_str(), NULL, NULL);
avformat_find_stream_info(in_format_context, NULL);
// Get audio stream from file and corresponding decoder
AVStream* in_stream = in_format_context->streams[0];
AVCodecParameters* codec_params = in_stream->codecpar;
const AVCodec* in_codec = avcodec_find_decoder(codec_params->codec_id);
AVCodecContext *in_codec_context = avcodec_alloc_context3(in_codec);
avcodec_parameters_to_context(in_codec_context, codec_params);
avcodec_open2(in_codec_context, in_codec, NULL);
// Prepare output stream and output encoder (PCM)
AVFormatContext* out_format_context = nullptr;
avformat_alloc_output_context2(&out_format_context, NULL, NULL, outputFilename.c_str());
AVStream* out_stream = avformat_new_stream(out_format_context, NULL);
const AVCodec* output_codec = avcodec_find_encoder_by_name(PCMS16BE_encoder_name.c_str());
AVCodecContext* output_codec_context = avcodec_alloc_context3(output_codec);
// -------------------------------
AVChannelLayout output_ch_layout;
av_channel_layout_default(&output_ch_layout, 1); // AV_CHANNEL_LAYOUT_MONO
output_codec_context->ch_layout = output_ch_layout;
auto out_sample_rate = 16000;
output_codec_context->sample_rate = out_sample_rate;
output_codec_context->sample_fmt = output_codec->sample_fmts[0];
//output_codec_context->bit_rate = output_codec_context->bit_rate; // TODO Do we need to set the bit rate?
output_codec_context->time_base = AVRational{1, out_sample_rate}; // (AVRational){...} compound literals are C, not standard C++
out_stream->time_base = output_codec_context->time_base;
auto in_sample_rate = in_codec_context->sample_rate;
AVChannelLayout in_ch_layout = in_codec_context->ch_layout,
out_ch_layout = output_ch_layout; // AV_CHANNEL_LAYOUT_MONO;
enum AVSampleFormat in_sample_fmt = in_codec_context->sample_fmt,
out_sample_fmt = output_codec_context->sample_fmt; // the resampler must output the format the encoder expects
SwrContext *swr_ctx = nullptr;
int ret = swr_alloc_set_opts2(&swr_ctx,
&out_ch_layout,
out_sample_fmt,
out_sample_rate,
&in_ch_layout,
in_sample_fmt,
in_sample_rate,
0, // log_offset
NULL); // log_ctx
// Probably not necessary - documentation says "This option is
// only used for special remapping."
av_opt_set_chlayout(swr_ctx, "used_chlayout", &in_ch_layout, 0);
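// AV_CODEC_FLAG_GLOBAL_HEADER belongs on the encoder context and must be set before avcodec_open2()
if (out_format_context->oformat->flags & AVFMT_GLOBALHEADER)
output_codec_context->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;
// Open the encoder and copy its parameters to the output stream
avcodec_open2(output_codec_context, output_codec, NULL);
avcodec_parameters_from_context(out_stream->codecpar, output_codec_context);
// Open output file for writing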
avio_open(&out_format_context->pb, outputFilename.c_str(), AVIO_FLAG_WRITE);
AVDictionary* muxer_opts = nullptr;
avformat_write_header(out_format_context, &muxer_opts);
AVFrame* input_frame = av_frame_alloc();
AVPacket* in_packet = av_packet_alloc();
// Loop through the input packets, decode them, and resample each decoded frame into a new output frame.
// I think PCM supports a variable number of samples per frame, so each output frame can probably be written out immediately.
int64_t next_out_pts = 0; // running count of output samples, i.e. pts in the encoder time base (1/out_sample_rate)
while (av_read_frame(in_format_context, in_packet) >= 0) {
// Skip packets that belong to other streams
if (in_packet->stream_index != in_stream->index) {
av_packet_unref(in_packet);
continue;
}
avcodec_send_packet(in_codec_context, in_packet);
av_packet_unref(in_packet);
// A packet can decode to zero, one or several frames
while (avcodec_receive_frame(in_codec_context, input_frame) >= 0) {
// I don't want to do this, but it 'fixes' the error where the channel layout of input frames
// doesn't match what the resampler expects - hardcoded the number 2 to fit my sample audio file.
AVChannelLayout input_frame_ch_layout;
av_channel_layout_default(&input_frame_ch_layout, 2 /* = nb_channels*/);
input_frame->ch_layout = input_frame_ch_layout;
AVFrame* output_frame = av_frame_alloc();
output_frame->sample_rate = out_sample_rate;
output_frame->format = out_sample_fmt;
av_channel_layout_copy(&output_frame->ch_layout, &out_ch_layout);
// No buffer is allocated here, so swr_convert_frame() sets nb_samples and allocates the frame itself
if (swr_convert_frame(swr_ctx, output_frame, input_frame))
{ logging("Swr Convert failed"); return -1; }
/// ^ Without the workaround above, this fails on the second call (the first call runs swr_init() internally)
// Timestamps: the encoder time base is 1/out_sample_rate, so pts is the number of samples output so far
output_frame->pts = next_out_pts;
next_out_pts += output_frame->nb_samples;
AVPacket *output_packet = av_packet_alloc();
int response = avcodec_send_frame(output_codec_context, output_frame);
while (response >= 0) {
response = avcodec_receive_packet(output_codec_context, output_packet);
if (response == AVERROR(EAGAIN) || response == AVERROR_EOF) {
break;
}
output_packet->stream_index = 0;
// Encoded packets carry timestamps in the encoder time base, not the input stream's
av_packet_rescale_ts(output_packet, output_codec_context->time_base, out_stream->time_base);
av_interleaved_write_frame(out_format_context, output_packet);
}
av_packet_free(&output_packet);
av_frame_unref(input_frame); // Free references held by the frame before reading new data into it.
av_frame_free(&output_frame); // unref alone would leak the AVFrame struct itself
}
}
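// TODO Also flush the resampler (swr_convert_frame() with a NULL input frame) and encode what it returns
// Drain the encoder, then finalize the file (av_write_trailer() fills in the WAV header sizes)
avcodec_send_frame(output_codec_context, NULL);
AVPacket* flush_packet = av_packet_alloc();
while (avcodec_receive_packet(output_codec_context, flush_packet) >= 0) {
flush_packet->stream_index = 0;
av_packet_rescale_ts(flush_packet, output_codec_context->time_base, out_stream->time_base);
av_interleaved_write_frame(out_format_context, flush_packet);
}
av_packet_free(&flush_packet);
av_write_trailer(out_format_context);
// Clean up
av_frame_free(&input_frame);
av_packet_free(&in_packet);
swr_free(&swr_ctx);
avcodec_free_context(&in_codec_context);
avcodec_free_context(&output_codec_context);
avio_closep(&out_format_context->pb);
avformat_free_context(out_format_context);
avformat_close_input(&in_format_context);
return 0;
}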