Calculate PTS before frame encoding in FFmpeg

How do I calculate the correct PTS value for a frame before encoding it with the FFmpeg C API?

For encoding I'm using avcodec_encode_video2, and then I write the packet with av_interleaved_write_frame.

I found some formulas, but none of them work.

The doxygen example uses:

frame->pts = 0;
for (;;) {
    // encode & write frame
    // ...
    frame->pts += av_rescale_q(1, video_st->codec->time_base, video_st->time_base);
}
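The increment in that loop is one codec tick expressed in stream ticks, rounded to an integer. The sketch below works through that arithmetic without linking FFmpeg: `rational_t` and `rescale_q` are hypothetical stand-ins for `AVRational` and `av_rescale_q` (round to nearest, non-negative inputs only, no overflow protection), and the time bases in the tests are illustrative. It also compares adding the rounded step every frame with rescaling the frame index once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational. */
typedef struct { int num, den; } rational_t;

/* Hypothetical stand-in for av_rescale_q(a, bq, cq): a * bq / cq rounded to
 * nearest. Only handles a >= 0 and positive time bases; no overflow checks. */
static int64_t rescale_q(int64_t a, rational_t bq, rational_t cq)
{
    int64_t n = a * bq.num * cq.den;
    int64_t d = (int64_t)bq.den * cq.num;
    assert(a >= 0 && d > 0);
    return (n + d / 2) / d;
}

/* PTS after `frames` iterations of the doxygen loop: the rounded one-tick
 * step is added once per frame, so any rounding error accumulates. */
static int64_t pts_by_accumulating(int64_t frames, rational_t codec_tb,
                                   rational_t stream_tb)
{
    int64_t pts = 0;
    for (int64_t i = 0; i < frames; i++)
        pts += rescale_q(1, codec_tb, stream_tb);
    return pts;
}

/* PTS obtained by rescaling the frame index directly: rounded only once. */
static int64_t pts_by_rescaling(int64_t frames, rational_t codec_tb,
                                rational_t stream_tb)
{
    return rescale_q(frames, codec_tb, stream_tb);
}
```

For example, with a 1/30 codec time base and a 1/1000 stream time base the step rounds to 33, so after 30 frames the accumulated PTS is 990 instead of 1000, while rescaling the index directly gives 1000.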

This blog says the formula should be:

(1 / FPS) * sample rate * frame number
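Read literally, the blog formula only makes sense if "sample rate" means the tick rate of the time base the PTS is expressed in (for example 90000 for a 1/90000 time base); that reading is an assumption. A minimal sketch in plain C also shows a pitfall: with integer types, `1 / FPS` evaluates to 0, so the multiplication has to come before the division. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Blog formula (1 / FPS) * sample_rate * frame_number, reordered so the
 * division happens last. sample_rate is assumed to be the tick rate of the
 * target time base (e.g. 90000 for 1/90000). */
static int64_t blog_pts(int64_t frame_number, int fps, int sample_rate)
{
    assert(fps > 0);
    return frame_number * sample_rate / fps;
}

/* The literal left-to-right version, shown only to illustrate the pitfall:
 * with integer operands, 1 / fps is 0 for any fps > 1. */
static int64_t blog_pts_naive(int64_t frame_number, int fps, int sample_rate)
{
    return (1 / fps) * sample_rate * frame_number;
}
```

Note that `blog_pts` truncates when the per-frame step is not a whole number of ticks (30 fps in a 1/1000 time base gives 33 for frame 1).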

Some people use only the frame number to set the PTS:

frame->pts = videoCodecCtx->frame_number;

Or an alternative way:

int64_t now = av_gettime();
frame->pts = av_rescale_q(now, (AVRational){1, 1000000}, videoCodecCtx->time_base);

And the last one:

// 40 * 90: 40 ms per frame (25 fps) times 90 ticks per ms, from the standard 90 kHz clock for PTS values.
frame->pts = encodedFrames * 40 * 90;

Which one is correct? I think the answer to this question will be helpful not only for me.

user3523581

There's also the option of setting frame->pts = av_frame_get_best_effort_timestamp(frame), but I'm not sure this is the correct approach either.

Jack

It's better to think about PTS more abstractly before trying code.

What you're doing is meshing three "time sets" together. The first is the time we're used to: 1000 ms per second, 60 seconds per minute, and so on. The second is the codec time for the particular codec you are using. Each codec has a certain way it wants to represent time, usually in a 1/number format, meaning that every second is divided into "number" ticks. The third works like the second, except that it is the time base of the container you are using.

Some people prefer to start with actual time, others frame count, neither is "wrong".

Starting with a frame count, you first need to convert it based on your frame rate. Note that all the conversions I mention use av_rescale_q(...). The purpose of this conversion is to turn a counter into time, so you rescale using your frame rate (usually the video stream time base). Then you have to convert that into the time_base of your video codec before encoding.
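A minimal sketch of the frame-count route, using hypothetical stand-ins `rational_t` and `rescale_q` for `AVRational` and `av_rescale_q` so it compiles without FFmpeg. It assumes the frame rate is expressed as the duration of one frame (1/fps, or 1001/30000 for NTSC-style rates) and does a single rescale from that into the codec time base; the specific time bases are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational. */
typedef struct { int num, den; } rational_t;

/* Hypothetical stand-in for av_rescale_q(a, bq, cq): a * bq / cq rounded to
 * nearest. Only handles a >= 0 and positive time bases; no overflow checks. */
static int64_t rescale_q(int64_t a, rational_t bq, rational_t cq)
{
    int64_t n = a * bq.num * cq.den;
    int64_t d = (int64_t)bq.den * cq.num;
    assert(a >= 0 && d > 0);
    return (n + d / 2) / d;
}

/* Frame counter -> codec ticks. frame_duration is the length of one frame
 * (the inverse of the frame rate); codec_tb is the encoder's time base. */
static int64_t pts_from_frame_index(int64_t frame_index,
                                    rational_t frame_duration,
                                    rational_t codec_tb)
{
    return rescale_q(frame_index, frame_duration, codec_tb);
}
```

The last test case shows why a bare frame counter happens to work when the codec time base is exactly 1/fps: the rescale is then the identity.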

Similarly, when starting from real time, your first conversion is current_time - start_time, rescaled to your video codec's time base.
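The same idea starting from wall-clock time, as a sketch with hypothetical `rational_t`/`rescale_q` stand-ins for `AVRational`/`av_rescale_q`. It assumes both timestamps are in microseconds, as returned by `av_gettime()`, and subtracts the start time before rescaling; the question's `av_gettime()` snippet rescales the absolute time instead, so its first frame would not start near 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational. */
typedef struct { int num, den; } rational_t;

/* Hypothetical stand-in for av_rescale_q(a, bq, cq): a * bq / cq rounded to
 * nearest. Only handles a >= 0 and positive time bases; no overflow checks. */
static int64_t rescale_q(int64_t a, rational_t bq, rational_t cq)
{
    int64_t n = a * bq.num * cq.den;
    int64_t d = (int64_t)bq.den * cq.num;
    assert(a >= 0 && d > 0);
    return (n + d / 2) / d;
}

/* Elapsed wall-clock time (microseconds) -> codec ticks. */
static int64_t pts_from_wallclock(int64_t now_us, int64_t start_us,
                                  rational_t codec_tb)
{
    const rational_t microseconds = {1, 1000000};
    assert(now_us >= start_us);
    return rescale_q(now_us - start_us, microseconds, codec_tb);
}
```

With this approach, PTS values follow the actual capture timing, so gaps appear if frames arrive irregularly.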

Anyone using only a frame counter is probably using a codec with a time_base equal to the inverse of their frame rate (1/fps). Most codecs do not work like this, and that hack is not portable. Example:

frame->pts = videoCodecCtx->frame_number;  // BAD

Additionally, anyone using hardcoded numbers in their av_rescale_q is leveraging the fact that they know what their time_base is and this should be avoided. The code isn't portable to other video formats. Instead use video_st->time_base, video_st->codec->time_base, and output_ctx->time_base to figure things out.
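To illustrate the portability problem, this sketch compares the question's hardcoded `encodedFrames * 40 * 90` with a value derived from the time bases, using hypothetical `rational_t`/`rescale_q` stand-ins for `AVRational`/`av_rescale_q`. It assumes the codec time base is 1/fps and uses 1/90000 and 1/1000 stream time bases as examples.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FFmpeg's AVRational. */
typedef struct { int num, den; } rational_t;

/* Hypothetical stand-in for av_rescale_q(a, bq, cq): a * bq / cq rounded to
 * nearest. Only handles a >= 0 and positive time bases; no overflow checks. */
static int64_t rescale_q(int64_t a, rational_t bq, rational_t cq)
{
    int64_t n = a * bq.num * cq.den;
    int64_t d = (int64_t)bq.den * cq.num;
    assert(a >= 0 && d > 0);
    return (n + d / 2) / d;
}

/* The question's version: 40 ms per frame (25 fps) x 90 ticks per ms. */
static int64_t pts_hardcoded(int64_t frame_index)
{
    return frame_index * 40 * 90;
}

/* Derived version: the frame index counts codec ticks (codec time base
 * assumed to be 1/fps) and is rescaled into the stream time base. */
static int64_t pts_derived(int64_t frame_index, rational_t codec_tb,
                           rational_t stream_tb)
{
    return rescale_q(frame_index, codec_tb, stream_tb);
}
```

The two agree only for 25 fps with a 1/90000 stream time base; at 30 fps the hardcoded value for frame 30 is 108000 instead of 90000.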

I hope understanding it from a higher level will help you see which of those are "correct" and which are "bad practice". There is no single answer, but maybe now you can decide which approach is best for you.

Bill Yan

Time is not measured in seconds, milliseconds, or any other standard unit. Instead, it is measured in units of the AVCodecContext's time_base.

So if you set codecContext->time_base to 1/1, time is measured in seconds:

cctx->time_base = (AVRational){1, 1};

Suppose you want to encode at a steady 30 fps. Then the time at which a frame is presented is framenumber * (1.0/fps) seconds.

But once again, the PTS is not measured in seconds or any other standard unit either. It's measured in units of the AVStream's time_base.

In the question, the author mentions 90k as the standard resolution for PTS, but this is not always true. The exact resolution is stored in the AVStream, and you can read it back like this:

    if ((err = avformat_write_header(ofctx, NULL)) < 0) {
        std::cout << "Failed to write header" << err << std::endl;
        return -1;
    }

    av_dump_format(ofctx, 0, "test.webm", 1);
    std::cout << stream->time_base.den  << " " << stream->time_base.num << std::endl;

The value of stream->time_base is only reliable after calling avformat_write_header, because the muxer may overwrite it there with the time base it actually uses.

Therefore, the right formula for calculating PTS is:

//The following assumes that codecContext->time_base = (AVRational){1, 1};
videoFrame->pts = frameduration * (frameCounter++) * stream->time_base.den / (stream->time_base.num * fps);

So really there are 3 components in the formula,

  1. fps
  2. codecContext->time_base
  3. stream->time_base

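so, for frame number n, pts = n * (1/fps) / stream->time_base: n * (1/fps) is the frame's time in codecContext->time_base units (seconds, when it is 1/1), and dividing by stream->time_base converts that into stream ticks.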
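Bill Yan's formula as a small C function. The answer never defines `frameduration`, so this sketch assumes it is 1. The stream time base is passed in as two integers standing for `stream->time_base.num` and `stream->time_base.den`, which should be read after `avformat_write_header()`; the 1/1000 and 1/90000 values in the tests are examples, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bill Yan's formula with the undefined `frameduration` assumed to be 1.
 * stream_tb_num / stream_tb_den stand for stream->time_base.num / .den. */
static int64_t bill_yan_pts(int64_t frame_counter, int fps,
                            int stream_tb_num, int stream_tb_den)
{
    const int64_t frameduration = 1; /* assumption: not defined in the answer */
    assert(fps > 0 && stream_tb_num > 0);
    /* Multiply before dividing so integer division truncates only once. */
    return frameduration * frame_counter * stream_tb_den /
           ((int64_t)stream_tb_num * fps);
}
```

The same frame counter maps to very different PTS values depending on the stream time base the muxer settles on, which is why the value has to be read back rather than assumed.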

I have detailed my discovery here