How to drop frames while recording with MediaCodec and InputSurface?


In my Android app I want to record a time-lapse video. My pipeline is InputSurface -> MediaCodec (encoder) -> MediaMuxer.

But if I want to speed the video up (for example, 3x), the resulting video has a very high frame rate. At normal speed I get a 30fps video; sped up 3x, I get a 90fps video.

Since the frame rate is so high, my phone's video player cannot play the video smoothly (a desktop video player handles it without any problem). So I think I have to drop some frames to keep the frame rate below 60fps.

But I don't know how to drop frames. An AVC stream contains I, B, and P frames, which can depend on one another, so I can't drop them arbitrarily. Can anybody help me?

Answer from fadden:

You have to decode and re-encode the stream, dropping frames as you go. Simply halving the time stamps in a 60fps video will leave you with a 120fps video.

Bear in mind that the raw H.264 video stream does not have any timestamps embedded in it. The .mp4 wrapper parsed by MediaExtractor and added by MediaMuxer holds the timing information. The MediaCodec interfaces appear to accept and produce the presentation time stamp, but it's mostly just passing it through to help you keep the timestamp associated with the correct frame -- frames can be reordered by the encoder. (Some encoders do look at the timestamps to try to meet the bit rate target, so you can't pass bogus values through.)
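To make that concrete, here is a minimal sketch of an encoder drain loop, assuming encoder, muxer, videoTrackIndex, and TIMEOUT_US are set up elsewhere (INFO_OUTPUT_FORMAT_CHANGED, muxer start, and EOS handling omitted). The point is that the PTS rides along in the BufferInfo, and rescaling it there is what changes playback speed:

    MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
    int index = encoder.dequeueOutputBuffer(info, TIMEOUT_US);
    if (index >= 0) {
        ByteBuffer encoded = encoder.getOutputBuffer(index);
        // The PTS lives in the BufferInfo, not in the H.264 bytes. Dividing
        // by 3 gives a 3x speed-up -- but also a 3x frame rate, which is
        // exactly why frames must be dropped as well.
        info.presentationTimeUs /= 3;
        muxer.writeSampleData(videoTrackIndex, encoded, info);
        encoder.releaseOutputBuffer(index, false);
    }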

You can do something like the DecodeEditEncode example. When the decoder calls releaseOutputBuffer(), you just pass "false" for the render argument on every other frame.
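A rough sketch of that loop, with decoder and TIMEOUT_US standing in for your own setup (format-change and error handling omitted):

    MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
    int frameCount = 0;
    boolean done = false;
    while (!done) {
        int index = decoder.dequeueOutputBuffer(info, TIMEOUT_US);
        if (index >= 0) {
            if ((info.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                done = true;
            }
            // Rendering only every other frame halves the frame rate. The
            // decoder has already resolved the I/B/P dependencies, so at
            // this point any frame can be dropped safely.
            boolean render = (frameCount % 2) == 0;
            decoder.releaseOutputBuffer(index, render);
            frameCount++;
        }
    }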

If you're accepting video frames from some other source, such as a virtual display for screen recording, you can't hand the encoder's Surface directly to the display. You would have to create a SurfaceTexture, create a Surface from that, and then process the frames as they arrive. The DecodeEditEncode example does exactly this, modifying each frame with a GLES shader as it does so.
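A hedged sketch of that setup; shouldRenderThisFrame() and drawToEncoderSurface() are hypothetical stand-ins for your frame-drop decision and the GLES draw call:

    // Create a GLES external texture and wrap it in a Surface for the
    // producer (decoder or virtual display) to draw into.
    int[] tex = new int[1];
    GLES20.glGenTextures(1, tex, 0);
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, tex[0]);
    SurfaceTexture surfaceTexture = new SurfaceTexture(tex[0]);
    Surface producerSurface = new Surface(surfaceTexture);

    surfaceTexture.setOnFrameAvailableListener(st -> {
        // In real code, marshal to the thread that owns the EGL context
        // before touching the texture.
        st.updateTexImage();              // latch the newest frame
        if (shouldRenderThisFrame()) {    // hypothetical drop decision
            drawToEncoderSurface();       // hypothetical GLES draw + swap on
        }                                 // the encoder's input Surface
    });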

Screen recording does present an additional difficulty though. Frames from virtual displays arrive as they are produced, not at a fixed frame rate, yielding variable-frame-rate video. For example, you might have a sequence of frames like this:

[1] [2] <10 seconds pass> [3] [4] [5] ...

While most of the frames are arriving 16.7ms apart (60fps), there are gaps when the display isn't updating. If your recording grabs every other frame, you will get:

[1] <10+ seconds pass> [3] [5] ...

You end up paused for 10 seconds on the wrong frame, which can be glaring if there was a lot of movement between 1 and 2. Making this work correctly requires some intelligence in the frame-dropping, e.g. repeating the previous frame as needed to produce constant-frame-rate 30fps video.