Fine-grained kernel scheduling with MPS


I am using NVIDIA CUDA Multi-Process Service (MPS) to run multiple TensorFlow inference jobs on the same GPU. In my use case, when the GPU is shared by more than one process, I sometimes need to prioritize the execution of one process's kernels over another's. Is this supported?

To explain the problem in more detail, consider an example in which we have two processes, p1 and p2 (each with just one kernel execution stream) sharing a GPU.

Scenario: there are one or more kernels in the ready queue for both p1 and p2.

Default MPS behavior (my understanding):

If there are enough resources, kernels from both p1 and p2 execute at the same time.

Desired behavior: the ability to decide, based on priority, whether to:

  • Execute the kernel of p1 first, then p2.
  • Execute the kernel of p2 first, then p1.
  • If there are enough resources, execute kernels from both p1 and p2 at the same time.
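
To make the desired policy concrete, here is a minimal sketch of the decision logic in Python. It only models the choice described in the list above; it is not an MPS or CUDA API, and every name in it (`PendingKernel`, `choose_launches`, the `priority` and `demand` fields, and the idea of counting "resource units") is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingKernel:
    process: str   # owning process, e.g. "p1" or "p2"
    priority: int  # hypothetical: a larger value means more urgent
    demand: int    # hypothetical resource units the kernel needs


def choose_launches(ready: List[PendingKernel], free_units: int) -> List[PendingKernel]:
    """Pick the kernels to start now, in strict priority order.

    Kernels are considered from highest to lowest priority. As long as
    resources remain, several kernels are started together; as soon as
    one does not fit, nothing with a lower priority is started.
    """
    launches = []
    # sorted() is stable, so kernels with equal priority keep queue order.
    for kernel in sorted(ready, key=lambda k: -k.priority):
        if kernel.demand > free_units:
            break  # strict ordering: lower-priority work must wait
        launches.append(kernel)
        free_units -= kernel.demand
    return launches


if __name__ == "__main__":
    queue = [PendingKernel("p1", 1, 60), PendingKernel("p2", 5, 60)]
    # Not enough room for both: only the higher-priority kernel starts.
    print([k.process for k in choose_launches(queue, free_units=100)])
    # Enough room for both: they start together, higher priority first.
    print([k.process for k in choose_launches(queue, free_units=120)])
```

With the priorities swapped, the same function would start p1's kernel first, which covers the first two cases in the list; the third case corresponds to the call where both kernels fit.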

If this kind of customized scheduling is not supported, it would be great if someone could explain what code changes would be needed to make it work.

There is 1 answer

Best answer, by talonmies

"I sometimes need to prioritize execution of kernels of one process over the other. Is this supported?"

No, it is not.