gstreamer debayering on GPU: Insight needed for slow conversion time

Short description

The GStreamer pipeline works, but runs slowly (~1 FPS) when offloading debayering to the GPU. I'd like some insight into why this happens, and whether there is any way to improve this behaviour.

Full story

Context

  • Tested Hardware: IMX6DL (Duallite). Has a Vivante GC880 GPU
  • Linux version: 5.4.70-F+S
  • DMA: ION-allocator
  • GStreamer version (& plugins): 1.16.2-imx
  • OpenGL 3.0

Goal

We are developing a revision of our hardware that uses an IMX6 Duallite core on our efus A9 computer-on-module. My task is to interface our ov9732 camera, which should output a 640x480 YUY2 image.

The method

To accomplish this, I've had to write a camera driver, which is then integrated into a GStreamer pipeline. The camera does not debayer its output itself, so this has to happen somewhere in the pipeline. The pipeline, shown further below, performs the following steps:

  1. Take in the camera image
  2. Convert the image from bayer to rgba
  3. Convert the rgba image to YUY2
  4. Expose a video sink that our streamer can use.

Problem #1 (which is solved)

Bayer conversion is slow, and doing it on the CPU takes too much of our processing power. Instead, I've opted to do this task on the GPU. Unfortunately, GStreamer does not support uploading video/x-bayer to the GPU (or color-converting it), so I had to integrate debayering into the OpenGL plugin of gst-plugins-base.

This works: glupload can now upload video/x-bayer to video/x-bayer(memory:GLMemory), and glcolorconvert can convert video/x-bayer(memory:GLMemory) to video/x-raw(memory:GLMemory).

Here is the pipeline for reference (the RGBA -> YUY2 conversion is offloaded to the IPU, which is very quick):

gst-launch-1.0 --gst-debug-level=$GST_DEBUG_LEVEL imxv4l2videosrc device=/dev/video0 ! "video/x-bayer,format=bggr,width=640,height=480,framerate=15/1" ! glupload   ! glcolorconvert ! "video/x-raw(memory:GLMemory),format=RGBA,framerate=15/1" ! gldownload ! imxipuvideotransform ! "video/x-raw,format=YUY2,framerate=15/1"  ! tee name=tp !   queue  ! v4l2sink device=/dev/video2

Problem #2 (open)

Despite our modified pipeline generating a correctly debayered image, the framerate is unacceptably slow (~1 FPS). I suspect the slow framerate has something to do with the underlying DRM format. The code snippet below is my modified version of the _drm_rgba_fourcc_from_info function in gsteglimage.c, which is part of the negotiation for the underlying DMA-buf. When it returns the following format:

      *out_format = GST_GL_RED; 
      return DRM_FORMAT_R8;

the pipeline shows a correct image after all processing, but is slow.

When using a range of other formats, e.g. this one:

      //*out_format = GST_GL_RGBA;
      //return rgba_fourcc;

The end image is distorted (e.g. two parallel monochrome images), but the pipeline runs smoothly.

The full method can be found below:

/*
 * GStreamer format descriptions differ from DRM formats as the representation
 * is relative to a register, hence in native endianness. To reduce the driver
 * requirement, we only import with a subset of texture formats and use
 * shaders to convert. This way we avoid having to use external texture
 * target.
 */
static int
_drm_rgba_fourcc_from_info (GstVideoInfo * info, int plane,
    GstGLFormat * out_format)
{
  GstVideoFormat format = GST_VIDEO_INFO_FORMAT (info);
#if G_BYTE_ORDER == G_LITTLE_ENDIAN
  const gint rgba_fourcc = DRM_FORMAT_ABGR8888;
  const gint rgb_fourcc = DRM_FORMAT_BGR888;
  const gint rg_fourcc = DRM_FORMAT_GR88;
#else
  const gint rgba_fourcc = DRM_FORMAT_RGBA8888;
  const gint rgb_fourcc = DRM_FORMAT_RGB888;
  const gint rg_fourcc = DRM_FORMAT_RG88;
#endif

  GST_DEBUG ("Getting DRM fourcc for %s plane %i",
      gst_video_format_to_string (format), plane);

  switch (format) {
    case GST_VIDEO_FORMAT_RGB16:
    case GST_VIDEO_FORMAT_BGR16:
      *out_format = GST_GL_RGB565;
      return DRM_FORMAT_RGB565;

    case GST_VIDEO_FORMAT_RGB:
    case GST_VIDEO_FORMAT_BGR:
      *out_format = GST_GL_RGB;
      return rgb_fourcc;

    case GST_VIDEO_FORMAT_RGBA:
    case GST_VIDEO_FORMAT_RGBx:
    case GST_VIDEO_FORMAT_BGRA:
    case GST_VIDEO_FORMAT_BGRx:
    case GST_VIDEO_FORMAT_ARGB:
    case GST_VIDEO_FORMAT_xRGB:
    case GST_VIDEO_FORMAT_ABGR:
    case GST_VIDEO_FORMAT_xBGR:
    case GST_VIDEO_FORMAT_AYUV:
      *out_format = GST_GL_RGBA;
      return rgba_fourcc;

    case GST_VIDEO_FORMAT_GRAY8:
      *out_format = GST_GL_RED;
      return DRM_FORMAT_R8;

    case GST_VIDEO_FORMAT_YUY2:
    case GST_VIDEO_FORMAT_UYVY:
    case GST_VIDEO_FORMAT_GRAY16_LE:
    case GST_VIDEO_FORMAT_GRAY16_BE:
      *out_format = GST_GL_RG;
      return rg_fourcc;

    case GST_VIDEO_FORMAT_NV12:
    case GST_VIDEO_FORMAT_NV21:
      *out_format = plane == 0 ? GST_GL_RED : GST_GL_RG;
      return plane == 0 ? DRM_FORMAT_R8 : rg_fourcc;

    case GST_VIDEO_FORMAT_I420:
    case GST_VIDEO_FORMAT_YV12:
    case GST_VIDEO_FORMAT_Y41B:
    case GST_VIDEO_FORMAT_Y42B:
    case GST_VIDEO_FORMAT_Y444:
      *out_format = GST_GL_RED;
      return DRM_FORMAT_R8;
    /* Cases added by me */
    case GST_VIDEO_FORMAT_rggb:
    case GST_VIDEO_FORMAT_bggr:
    case GST_VIDEO_FORMAT_grbg:
    case GST_VIDEO_FORMAT_gbrg:
      /* The format below produces a correct bayer image, but upload is slow. */
      *out_format = GST_GL_RED;
      return DRM_FORMAT_R8;
      /* Changing the DRM format to the (incorrect) one below changes the
       * output (after YUY2 conversion) to a distorted image (2 split-screen
       * images, one red and one blue), but runs at a smooth 15 FPS:
       *   *out_format = GST_GL_RGBA;
       *   return rgba_fourcc;
       */
    default:
      GST_ERROR ("Unsupported format for DMABuf.");
      return -1;
  }
}

It puzzles me why simply changing the DRM format results in a smooth stream (albeit with a wrong image). The steps the pipeline takes are still the same, and I'm certain that in both cases the debayering step is performed on the GPU. The algorithm is based on this paper. I can provide the full source if anyone needs it.

So, in conclusion: can anyone suggest a reason why the pipeline is slow, or at least point me in the direction of why DRM_FORMAT_R8 results in such a slow pipeline? Any help is greatly appreciated.

I can provide any of the modified GStreamer code, or the OpenGL shader code, if desired.

To verify that the DMA-buf path is used instead of the raw-data path, I've put a breakpoint in the _dma_buf_upload_accept function of gstglupload.c. It is hit on each frame.

Had I not modified _drm_rgba_fourcc_from_info, this method would return -1 and gstglupload.c would eventually fall back to _raw_data_upload_accept.
