VP8 C/C++ source: how to encode frames in ARGB format from memory instead of from a file


I'm trying to get started with the VP8 library. I'm not building it the standard way they tell you to; I just loaded all of the main files and the "encoder" folder into a new Visual Studio C++ DLL project and included the C files in an extern "C" DLL export function. So far that builds fine, but I have no idea where to start with the C++ API to encode, say, 3 frames of ARGB data into a very basic video.

The only example I could find is simple_encoder.c in the examples folder, but its premise is that it loads another file, parses its frames, and then converts them, so it seems a bit complicated. I just want to pass in a byte array of a few ARGB frames and have it output a very simple VP8 video.

I've seen "How to encode series of images into VP8 using WebM VP8 Encoder API? (C/C++)", but the accepted answer just links to the build instructions and references the general specification of the VP8 format. The closest thing I could find there is the example encoding parameters, but I want to do everything from C++ and I can't find any examples other than the default simple_encoder.c.

Just to cite some of the relevant parts that I think I understand but still need more help with:

//in int main...
...
vpx_image_t raw;
if (!vpx_img_alloc(&raw, VPX_IMG_FMT_I420, info.frame_width,
                     info.frame_height, 1)) {
    //"Failed to allocate image." error
}

So that part I think I understand for the most part. VPX_IMG_FMT_I420 is the only part that isn't defined in this file itself; it's in vpx_image.h, first as

#define VPX_IMG_FMT_PLANAR 
//then after...
typedef enum vpx_img_fmt {
    VPX_IMG_FMT_NONE,
    VPX_IMG_FMT_RGB24,   /**< 24 bit per pixel packed RGB */
    ///some other formats....
    VPX_IMG_FMT_ARGB,     /**< 32 bit packed ARGB, alpha=255 */

    VPX_IMG_FMT_YV12    = VPX_IMG_FMT_PLANAR | VPX_IMG_FMT_UV_FLIP | 1, /**< planar YVU */
    VPX_IMG_FMT_I420    = VPX_IMG_FMT_PLANAR | 2,
   
  } vpx_img_fmt_t; /**< alias for enum vpx_img_fmt */

So I guess part of my question is answered already just from writing this: one of the formats is VPX_IMG_FMT_ARGB, although I don't know where it's defined. I'm guessing that in the above code I would replace it like this:

const VpxInterface *encoder = get_vpx_encoder_by_name("vp8");

vpx_image_t raw;
VpxVideoInfo info = { 0, 0, 0, { 0, 0 } };

info.frame_width = 1920;
info.frame_height = 1080;
info.codec_fourcc = encoder->fourcc;
info.time_base.numerator = 1;
info.time_base.denominator = 24;

vpx_image_t *didIt = vpx_img_alloc(&raw, VPX_IMG_FMT_ARGB, 
          info.frame_width, info.frame_height/*example width and height*/, 1);
//check if !didIt for error (vpx_img_alloc returns NULL on failure)

vpx_codec_enc_cfg_t cfg;
vpx_codec_ctx_t codec;
vpx_codec_err_t res;

res = vpx_codec_enc_config_default(encoder->codec_interface(), &cfg, 0);
//res is VPX_CODEC_OK (0) on success, so check if res != VPX_CODEC_OK for error

cfg.g_w = info.frame_width;
cfg.g_h = info.frame_height;
cfg.g_timebase.num = info.time_base.numerator;
cfg.g_timebase.den = info.time_base.denominator;
cfg.rc_target_bitrate = 200;

VpxVideoWriter *writer = NULL;

writer = vpx_video_writer_open(outfile_arg, kContainerIVF, &info);
//check if !writer for error

res = vpx_codec_enc_init(&codec, encoder->codec_interface(), &cfg, 0);
//not even sure where codec was set actually..

//check if res != VPX_CODEC_OK for error starting

//now the next part in the original is where it reads from the input file, but instead
//I need to pass in an array of some ARGB byte arrays..
//thing is, in the next step they use a while loop for 
//vpx_img_read(&raw, fopen("path/to/YV12formatVideo", "rb"))
//to set the contents of the raw vpx image allocated earlier, then
//they call another program that writes it to the writer object,
//but I don't know how to read the actual ARGB data directly into the raw image
//without using fopen, so that's one question (review at end)

//so I'll just put a placeholder here for the **question**

//assuming I have an array of byte arrays stored individually
//for simplicity sake
const int size = 1920 * 1080 * 4; //const so it can be used as an array size

uint8_t imgOne[size] = {/*some big byte array*/};
uint8_t imgTwo[size] = {/*some big byte array*/};
uint8_t imgThree[size] = {/*some big byte array*/};

uint8_t *images[] = {imgOne, imgTwo, imgThree};

int framesDone = 0;
int maxFrames = 3;

//so now I can replace the while loop with a filler function 
//until I find out how to set the raw image with ARGB data
while(framesDone < maxFrames) {
    magicalFunctionToSetARGBOfRawImage(&raw, images[framesDone]);
    
    encode_frame(&codec, &raw, framesDone, 0, writer);
    
    framesDone++;
}

//now apparently it needs to be flushed after

while(encode_frame(&codec, 0, -1, 0, writer)){}
vpx_img_free(&raw);
res = vpx_codec_destroy(&codec);
//check if res != VPX_CODEC_OK for error
vpx_video_writer_close(writer);

//now we have to define the encode_frame function, but simpler
//(it must be defined above the function that calls it,
//or declared in a header)

static int encode_frame(
     vpx_codec_ctx_t *coydek, 
     vpx_image_t *pic,
     int currentFrame, 
     int flags,
     VpxVideoWriter *koysayv/*writer*/
) {
    //now to substitute their encodeFrame function for
    //the actual raw calls to simplify things
    const vpx_codec_err_t res = vpx_codec_encode(
        coydek,
        pic,
        currentFrame,
        1,//duration I think
        flags,//whatever that is
        VPX_DL_REALTIME//different than simple_encoder
    );
    
    if(res != VPX_CODEC_OK) return 0;//error here (0 also ends the flush loop)
    
    vpx_codec_iter_t iter = 0;
    const vpx_codec_cx_pkt_t *pkt = 0;
    int gotThings = 0;
    
    while((pkt = vpx_codec_get_cx_data(coydek, &iter)) != 0) {
        gotThings = 1;
        
        if(pkt->kind == VPX_CODEC_CX_FRAME_PKT) { //don't exactly understand this part
            const int keyframe =
                (pkt->data.frame.flags & VPX_FRAME_IS_KEY) != 0;
            //don't exactly understand the & operator here
            //or how it gets the keyframe
            
            const int wroteFrame = vpx_video_writer_write_frame(
                koysayv,
                pkt->data.frame.buf, //I'm guessing this is the encoded frame data
                pkt->data.frame.sz,
                pkt->data.frame.pts
            );
            
            if(!wroteFrame) return 0; //error
        }
    }
    
    return gotThings;
}
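A side note on the `&` in the keyframe check above: the flags field is a set of bits, and `&` with a single-bit mask keeps only that bit, so the result is non-zero exactly when the bit is set. A minimal sketch of that idea follows; the constant names and values here are stand-ins chosen for illustration, and real code should use the VPX_FRAME_IS_* macros from the libvpx headers.

```cpp
#include <cstdint>

// Stand-in flag bits for illustration only (hypothetical names); real code
// should use the VPX_FRAME_IS_* macros from the libvpx headers instead.
constexpr std::uint32_t kFrameIsKey       = 0x1;
constexpr std::uint32_t kFrameIsDroppable = 0x2;

// Returns true when the key-frame bit is set, whatever the other bits are.
bool is_keyframe(std::uint32_t flags) {
    // Bitwise AND clears every bit except the one in the mask.
    return (flags & kFrameIsKey) != 0;
}
```

So a flags value with both bits set still counts as a keyframe, while a value with only the droppable bit set does not.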

Thing is though, I don't know how to actually read the ARGB data into the raw image buffer itself. As mentioned above, the original example uses vpx_img_read(&raw, fopen("path/to/file", "rb")), but if I'm starting off with the byte arrays themselves, what function do I use instead of the file-based one?

I have a feeling it can be solved by looking at the source code of the vpx_img_read function, found in tools_common.c:

int vpx_img_read(vpx_image_t *img, FILE *file) {
  int plane;

  for (plane = 0; plane < 3; ++plane) {
    unsigned char *buf = img->planes[plane];
    const int stride = img->stride[plane];
    const int w = vpx_img_plane_width(img, plane) *
                  ((img->fmt & VPX_IMG_FMT_HIGHBITDEPTH) ? 2 : 1);
    const int h = vpx_img_plane_height(img, plane);
    int y;

    for (y = 0; y < h; ++y) {
      if (fread(buf, 1, w, file) != (size_t)w) return 0;
      buf += stride;
    }
  }

  return 1;
}

I'm not experienced enough to know how to get a single frame's ARGB data in, but I think the key part is fread(buf, 1, w, file), which seems to read parts of file into buf, which points to img->planes[plane]. I think that reading into buf therefore automatically fills img->planes[plane], but I'm not sure if that is the case, and I'm also not sure how to replace the fread from a file with something that takes a byte array that is already loaded into memory...
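For what it's worth, buf is just a pointer into the plane's memory, so anything written through buf does end up in img->planes[plane]. A rough sketch of the in-memory equivalent of that inner loop is below, with memcpy standing in for fread. The function name copy_plane_from_memory and its parameters are made up for this example, and it assumes the source bytes are already laid out plane by plane with no padding (as in the file that vpx_img_read expects), so it does not convert ARGB by itself.

```cpp
#include <cstddef>
#include <cstring>

// Copies `rows` rows of `row_bytes` bytes each from a tightly packed source
// buffer into a destination plane whose rows start `dst_stride` bytes apart.
// Returns a pointer just past the consumed source bytes, so the next plane
// can continue from there (much like fread advances the file position).
const unsigned char *copy_plane_from_memory(unsigned char *dst,
                                            std::size_t dst_stride,
                                            const unsigned char *src,
                                            std::size_t row_bytes,
                                            std::size_t rows) {
    for (std::size_t y = 0; y < rows; ++y) {
        std::memcpy(dst, src, row_bytes);  // takes the place of fread(buf, 1, w, file)
        dst += dst_stride;                 // skip any padding at the end of the row
        src += row_bytes;                  // the source rows have no padding
    }
    return src;
}
```

Called once per plane with img->planes[plane], img->stride[plane], and the plane's width and height, this would fill the image the same way the file-based loop does.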

There are 1 answers

Phyrs On

VPX_IMG_FMT_ARGB is not defined because it is not supported by libvpx (as far as I have seen). To compress an image using this library, you must first convert it to one of the supported formats, like I420 (VPX_IMG_FMT_I420). The code here (not mine): https://gist.github.com/racerxdl/8164330 does it well for the RGB format. If you don't want to use libswscale to make the conversion from RGB to I420, you can do something like this (this code converts an RGBA array of bytes to an I420 vpx_image that can be used by libvpx):

    unsigned int   tx       = <width of your image>
    unsigned int   ty       = <height of your image>
    unsigned char *image    = <array of bytes : RGBARGBA... of size ty*tx*4>
    vpx_image_t   *imageVpx = <result that must have been properly initialized by libvpx>

    imageVpx->stride[VPX_PLANE_U    ] = tx/2;
    imageVpx->stride[VPX_PLANE_V    ] = tx/2;
    imageVpx->stride[VPX_PLANE_Y    ] = tx;
    imageVpx->stride[VPX_PLANE_ALPHA] = tx;
    imageVpx->planes[VPX_PLANE_U    ] = new unsigned char[ty*tx/4];
    imageVpx->planes[VPX_PLANE_V    ] = new unsigned char[ty*tx/4];
    imageVpx->planes[VPX_PLANE_Y    ] = new unsigned char[ty*tx  ];
    imageVpx->planes[VPX_PLANE_ALPHA] = new unsigned char[ty*tx  ];

    unsigned char *planeY  = imageVpx->planes[VPX_PLANE_Y    ];
    unsigned char *planeU  = imageVpx->planes[VPX_PLANE_U    ];
    unsigned char *planeV  = imageVpx->planes[VPX_PLANE_V    ];
    unsigned char *planeA  = imageVpx->planes[VPX_PLANE_ALPHA];

    for (unsigned int y=0; y<ty; y++)
    {
        if (!(y % 2))
        {
            for (unsigned int x=0; x<tx; x+=2)
            {
                int r = *image++;
                int g = *image++;
                int b = *image++;
                int a = *image++;

                *planeY++ = max(0, min(255, (( 66*r + 129*g +  25*b) >> 8) + 16));
                *planeU++ = max(0, min(255, ((-38*r + -74*g + 112*b) >> 8) + 128));
                *planeV++ = max(0, min(255, ((112*r + -94*g + -18*b) >> 8) + 128));
                *planeA++ = a;

                r = *image++;
                g = *image++;
                b = *image++;
                a = *image++;

                *planeA++ = a;
                *planeY++ = max(0, min(255, ((66*r + 129*g + 25*b) >> 8) + 16));
            }
        }
        else
        {
            for (unsigned int x=0; x<tx; x++)
            {
                int const r = *image++;
                int const g = *image++;
                int const b = *image++;
                int const a = *image++;

                *planeA++ = a;
                *planeY++ = max(0, min(255, ((66*r + 129*g + 25*b) >> 8) + 16));
            }
        }
    }
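As a possible variant of the same idea, here is a self-contained sketch that writes into planes that already exist and respects their strides instead of allocating new ones. The function name rgba_to_i420 and its parameter list are invented for this example. It assumes even width and height, R,G,B,A byte order (for ARGB input, read src[1], src[2], src[3] instead), and, like the code above, takes chroma from the top-left pixel of each 2x2 block rather than averaging. The coefficients are the same; the +128 offset is folded in before the shift only so that no negative value is ever shifted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

// Clamps an intermediate value to the 0..255 range of one byte.
static std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Converts packed 8-bit RGBA (alpha ignored) to planar I420.
// Assumes width and height are even; strides are in bytes.
void rgba_to_i420(const std::uint8_t *rgba, int width, int height,
                  std::uint8_t *y_plane, int y_stride,
                  std::uint8_t *u_plane, int u_stride,
                  std::uint8_t *v_plane, int v_stride) {
    for (int y = 0; y < height; ++y) {
        const std::uint8_t *src =
            rgba + static_cast<std::size_t>(y) * width * 4;
        std::uint8_t *y_row = y_plane + static_cast<std::size_t>(y) * y_stride;
        for (int x = 0; x < width; ++x, src += 4) {
            const int r = src[0], g = src[1], b = src[2];
            y_row[x] = clamp_u8(((66 * r + 129 * g + 25 * b) >> 8) + 16);
            if (y % 2 == 0 && x % 2 == 0) {
                // One U and one V sample per 2x2 block of pixels.
                // Adding 32768 (128 << 8) before shifting equals adding 128
                // afterwards and keeps the shifted value non-negative.
                const std::size_t c = static_cast<std::size_t>(y / 2);
                u_plane[c * u_stride + x / 2] =
                    clamp_u8((-38 * r - 74 * g + 112 * b + 32768) >> 8);
                v_plane[c * v_stride + x / 2] =
                    clamp_u8((112 * r - 94 * g - 18 * b + 32768) >> 8);
            }
        }
    }
}
```

With an image allocated through vpx_img_alloc using VPX_IMG_FMT_I420, the plane pointers and strides would presumably come from raw.planes[VPX_PLANE_Y], raw.stride[VPX_PLANE_Y], and the matching U and V entries.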