How should you correctly encode a large number of blit or scaling commands in Metal?

In an application I'm working on that uses a traditional Metal rendering loop, I periodically need to copy a lot of image data from IOSurfaces to MTLTextures so that the data can be accessed in the fragment shader. I am trying to learn the most effective way to do that.

Each IOSurface represents a tile in a potentially very large image. (Like a stitched panorama.) Core Image is used for rendering image data into each IOSurface.

In Metal, I have an MTLTexture of type 2DArray with enough slices to cover the viewport, or the entire image itself if the image is "zoomed out" to be smaller than the view.
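
For context, here's a minimal sketch of how such a texture might be created. The tile size, slice count, and pixel format are illustrative assumptions, not values from the question:

import Metal

let device = MTLCreateSystemDefaultDevice()!

// Hypothetical values: one power-of-two tile per slice.
let tileSize = 512
let sliceCount = 16

let descriptor = MTLTextureDescriptor()
descriptor.textureType = .type2DArray
descriptor.pixelFormat = .bgra8Unorm
descriptor.width = tileSize
descriptor.height = tileSize
descriptor.arrayLength = sliceCount
// .shaderWrite is only needed if MPS kernels will write into the slices.
descriptor.usage = [.shaderRead, .shaderWrite]

let destinationTexture = device.makeTexture(descriptor: descriptor)!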

The IOSurface and MTLTexture each have dimensions that are powers of two, but their dimensions might differ at times. When they are the same size, I use an MTLBlitCommandEncoder; when they differ in size, I use MPSImageScale.
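
For the mismatched-size case, here's a minimal sketch using MPSImageBilinearScale (one concrete subclass of MPSImageScale). Since MPS kernels write to a plain 2D texture, a texture view targets one slice of the 2D-array destination; `slice`, `commandBuffer`, and `surfaceTexture` are placeholders:

import MetalPerformanceShaders

let scaleKernel = MPSImageBilinearScale(device: device)

// View a single slice of the array texture as a 2D texture.
// (This requires the destination's usage to include .shaderWrite.)
let sliceView = destinationTexture.makeTextureView(pixelFormat: destinationTexture.pixelFormat,
                                                   textureType: .type2D,
                                                   levels: 0..<1,
                                                   slices: slice..<(slice + 1))!

scaleKernel.encode(commandBuffer: commandBuffer,
                   sourceTexture: surfaceTexture,
                   destinationTexture: sliceView)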

If I need to copy a lot of IOSurfaces to a lot of Metal textures, should I do it one at a time, in batches, or all at once?

Attempt #1: All At Once

This method works but starts to break down when the number of visible surfaces becomes quite large, because you end up pre-allocating a bunch of surface-backed textures before committing the buffer. This method seems the most logical to me, but it also produces the most warnings in Xcode's GPU insights and uses more texture memory than it needs to.

Pseudo-code below:

func renderAllAtOnce() { 

  // Create one command buffer. 
  let commandBuffer = commandQueue.makeCommandBuffer()
  let blitEncoder = commandBuffer.makeBlitCommandEncoder()

  // Encode a copy for each surface.
  for surface in visibleSurfaces { 

    // Make a texture from the surface.
    let surfaceTexture = makeTextureFromSurface(surface)

    // Copy from the surface-backed texture into the appropriate slice in the destination texture.
    blitEncoder.copy(surfaceTexture, to: destinationTexture, slice: ...)
  }

  // End encoding and commit the buffer.
  blitEncoder.endEncoding()
  commandBuffer.commit()
  commandBuffer.waitUntilCompleted()

  // Bind textures and issue draw calls using a render encoder.
  renderEncoder.draw(...)
}
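
For reference, here is the actual blit call spelled out with its full signature; the zero origins and `slice` assume each copy fills an entire tile:

blitEncoder.copy(from: surfaceTexture,
                 sourceSlice: 0,
                 sourceLevel: 0,
                 sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
                 sourceSize: MTLSize(width: surfaceTexture.width,
                                     height: surfaceTexture.height,
                                     depth: 1),
                 to: destinationTexture,
                 destinationSlice: slice,
                 destinationLevel: 0,
                 destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))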

Attempt #2: In Batches

In this implementation, I arbitrarily group the copy commands into batches of 10. This means I only ever pre-allocate up to 10 surface-backed textures before committing the buffer. This seems to make the GPU a bit happier, but the value of 10 is rather arbitrary. Is there an optimal number one could determine based on the hardware?

func renderInBatches() { 

  // Arbitrarily group surfaces into groups of 10.
  for group in visibleSurfaces(groupsOf: 10) {

    // Create a new command buffer and encoder for each group.
    let commandBuffer = commandQueue.makeCommandBuffer()
    let blitEncoder = commandBuffer.makeBlitCommandEncoder()

    // Encode only up to 10 copy commands.
    for surface in group { 
      let surfaceTexture = makeTextureFromSurface(surface)
      blitEncoder.copy(surfaceTexture, to: destinationTexture, slice: ...)
    }
  
    blitEncoder.endEncoding()
    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()
  }

  // Bind textures and issue draw calls using a render encoder.
}

Attempt #3: One at a Time

No code here; this option is just the batch option above with groups of 1. In effect, it creates a new command buffer and blit encoder for every surface that needs to be copied to a texture. Initially this seemed incredibly wasteful, but I've since realized that command buffers and encoders are quite lightweight; after all, you create new ones on every render pass anyway.

But is doing it one at a time under-utilizing the GPU? There are no dependencies between the copy operations.

TL;DR

If you have to issue a lot of blit copy commands, or scale commands using MPS, what is the most efficient and "correct" way of doing that?

For now I'm building against macOS 11.0 and higher. The application is expected to run on any supported hardware.

1 Answer

Answered by Spo1ler:

You should definitely put as much work into each command buffer and encoder as possible.

In this case, you can use a single command buffer: populate it with the image-scaling filters first, and then do all of the blits in a single blit command encoder.
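
A sketch of that structure, reusing the placeholders from the question (makeTextureFromSurface, destinationTexture, the slice bookkeeping, and the two surface lists are assumptions):

let commandBuffer = commandQueue.makeCommandBuffer()!

// 1. Encode the MPS scaling work first. Each MPS kernel manages its own
//    internal encoding, so it must happen outside any open blit encoder.
let scaleKernel = MPSImageBilinearScale(device: device)
for (surface, slice) in surfacesNeedingScaling {
    let surfaceTexture = makeTextureFromSurface(surface)
    let sliceView = destinationTexture.makeTextureView(pixelFormat: destinationTexture.pixelFormat,
                                                       textureType: .type2D,
                                                       levels: 0..<1,
                                                       slices: slice..<(slice + 1))!
    scaleKernel.encode(commandBuffer: commandBuffer,
                       sourceTexture: surfaceTexture,
                       destinationTexture: sliceView)
}

// 2. Then encode every same-size copy inside one blit encoder.
let blitEncoder = commandBuffer.makeBlitCommandEncoder()!
for (surface, slice) in surfacesNeedingCopies {
    let surfaceTexture = makeTextureFromSurface(surface)
    blitEncoder.copy(from: surfaceTexture, sourceSlice: 0, sourceLevel: 0,
                     sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0),
                     sourceSize: MTLSize(width: surfaceTexture.width,
                                         height: surfaceTexture.height, depth: 1),
                     to: destinationTexture, destinationSlice: slice,
                     destinationLevel: 0,
                     destinationOrigin: MTLOrigin(x: 0, y: 0, z: 0))
}
blitEncoder.endEncoding()

// 3. One commit covers everything; if the render pass is encoded into the
//    same buffer afterwards, no waitUntilCompleted() is needed at all.
commandBuffer.commit()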

On another note, you can also create an MTLTexture directly from an IOSurface, so you won't have to blit at all when they have the same dimensions.

https://developer.apple.com/documentation/metal/mtldevice/1433378-newtexturewithdescriptor?language=objc
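
A sketch of that direct approach; the pixel format is an assumption and must match whatever Core Image rendered into the surface:

import IOSurface
import Metal

func makeTexture(from surface: IOSurfaceRef, device: MTLDevice) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .bgra8Unorm,
                                                              width: IOSurfaceGetWidth(surface),
                                                              height: IOSurfaceGetHeight(surface),
                                                              mipmapped: false)
    descriptor.usage = .shaderRead
    // The texture aliases the surface's memory: no copy, no extra allocation.
    return device.makeTexture(descriptor: descriptor, iosurface: surface, plane: 0)
}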