cuSOLVER has a Cholesky decomposition, unlike cuBLAS. I see cusolverDnDpotrsBatched and cusolverDnDpotrfBatched, but unfortunately I can't seem to find cusolverDnDpotriBatched in the documentation.
Is there any way I can batch cusolverDnDpotri without massive overhead, or a way to do the equivalent of what the API would have done?
Unfortunately, the only way would be to write your own kernel, as there is no automatic way to convert a non-batched routine into a batched one (writing a well-performing batched version of a kernel is by itself work that can get accepted to a high-profile HPC conference).
Are you sure you actually need the inverse? Operations with the inverse can usually be expressed as the solution of a linear system, for which you could use cusolverDnDpotrsBatched.

If you really need the inverse, the only way I can think of that avoids writing CUDA code would be to call cusolverDnDpotrsBatched with the right-hand sides Barray set to a batch of identity matrices. The solutions Xi of the systems Ai * Xi = I (which overwrite Barray) are then the inverses of the matrices in Aarray. This needs extra memory and is not as efficient as a dedicated inversion kernel, but it should be faster than inverting the matrices one at a time; a sketch follows below.
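Here is a minimal, untested sketch of that identity-RHS trick, assuming the batch is stored contiguously in column-major order on the device; d_A, d_Ainv, n and batchSize are placeholder names, the cusolverDn handle is assumed to exist already, and error checking is omitted. Note that the cuSOLVER documentation states that cusolverDnDpotrsBatched only supports nrhs = 1, so the sketch solves for one column of the identity at a time.

```c
#include <cuda_runtime.h>
#include <cusolverDn.h>
#include <stdlib.h>

/* Inverts a batch of n x n SPD matrices via Cholesky + identity right-hand
 * sides.  d_A and d_Ainv are contiguous, column-major device buffers holding
 * batchSize matrices each; d_A is overwritten by the Cholesky factors. */
static void batched_spd_inverse(cusolverDnHandle_t handle,
                                double *d_A, double *d_Ainv,
                                int n, int batchSize)
{
    /* Host-side pointer arrays expected by the batched API. */
    double **hA = (double **)malloc(batchSize * sizeof(double *));
    double **hB = (double **)malloc(batchSize * sizeof(double *));
    for (int i = 0; i < batchSize; ++i)
        hA[i] = d_A + (size_t)i * n * n;

    double **dA_array, **dB_array;
    int *d_info;
    cudaMalloc((void **)&dA_array, batchSize * sizeof(double *));
    cudaMalloc((void **)&dB_array, batchSize * sizeof(double *));
    cudaMalloc((void **)&d_info,   batchSize * sizeof(int));
    cudaMemcpy(dA_array, hA, batchSize * sizeof(double *), cudaMemcpyHostToDevice);

    /* 1) Batched Cholesky factorization: A_i = L_i * L_i^T. */
    cusolverDnDpotrfBatched(handle, CUBLAS_FILL_MODE_LOWER, n,
                            dA_array, n, d_info, batchSize);

    /* 2) Initialize every X_i (stored in d_Ainv) to the identity matrix. */
    double *h_I = (double *)calloc((size_t)n * n, sizeof(double));
    for (int j = 0; j < n; ++j) h_I[j * n + j] = 1.0;
    for (int i = 0; i < batchSize; ++i)
        cudaMemcpy(d_Ainv + (size_t)i * n * n, h_I,
                   (size_t)n * n * sizeof(double), cudaMemcpyHostToDevice);

    /* 3) Solve A_i * X_i = I one column at a time (the documentation says
     * potrsBatched supports only nrhs = 1).  Solving for column e_j
     * overwrites column j of X_i with column j of A_i^{-1}. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < batchSize; ++i)
            hB[i] = d_Ainv + (size_t)i * n * n + (size_t)j * n;
        cudaMemcpy(dB_array, hB, batchSize * sizeof(double *), cudaMemcpyHostToDevice);
        cusolverDnDpotrsBatched(handle, CUBLAS_FILL_MODE_LOWER, n, 1,
                                dA_array, n, dB_array, n, d_info, batchSize);
    }

    cudaFree(dA_array); cudaFree(dB_array); cudaFree(d_info);
    free(hA); free(hB); free(h_I);
}
```

Each batched solve handles one column of the identity for the whole batch, so the loop issues n launches in total instead of batchSize separate inversions, which is where the speedup over the sequential approach should come from.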
Another option would be to forget that the matrices are symmetric and treat them as general matrices. You could then use the MAGMA library and its magma_dgetri_outofplace_batched() function to invert the matrices (again, not in place). Unfortunately, MAGMA also does not support a batched version of the symmetric inverse.
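For completeness, a sketch of the MAGMA route (untested; the prototypes follow the MAGMA 2.x batched testing code and should be checked against the headers of your installation; d_A, d_Ainv, n and batchCount are placeholders for your own column-major, contiguous device data, and magma_init() is assumed to have been called):

```c
#include <cuda_runtime.h>
#include <magma_v2.h>
#include <stdlib.h>

/* LU-factors every matrix in the batch and inverts it out of place into
 * d_Ainv, treating the matrices as general (non-symmetric). */
static void batched_general_inverse(double *d_A, double *d_Ainv,
                                    magma_int_t n, magma_int_t batchCount)
{
    magma_queue_t queue;
    magma_queue_create(0, &queue);   /* device 0 */

    /* Host pointer arrays: one pointer per matrix / pivot vector. */
    double **hA      = (double **)malloc(batchCount * sizeof(double *));
    double **hX      = (double **)malloc(batchCount * sizeof(double *));
    magma_int_t **hP = (magma_int_t **)malloc(batchCount * sizeof(magma_int_t *));
    magma_int_t *d_ipiv, *d_info;
    magma_imalloc(&d_ipiv, batchCount * n);
    magma_imalloc(&d_info, batchCount);
    for (magma_int_t i = 0; i < batchCount; ++i) {
        hA[i] = d_A    + (size_t)i * n * n;
        hX[i] = d_Ainv + (size_t)i * n * n;
        hP[i] = d_ipiv + (size_t)i * n;
    }

    /* Copy the pointer arrays to the device. */
    double **dA_array, **dAinv_array;
    magma_int_t **dipiv_array;
    cudaMalloc((void **)&dA_array,    batchCount * sizeof(double *));
    cudaMalloc((void **)&dAinv_array, batchCount * sizeof(double *));
    cudaMalloc((void **)&dipiv_array, batchCount * sizeof(magma_int_t *));
    cudaMemcpy(dA_array,    hA, batchCount * sizeof(double *),      cudaMemcpyHostToDevice);
    cudaMemcpy(dAinv_array, hX, batchCount * sizeof(double *),      cudaMemcpyHostToDevice);
    cudaMemcpy(dipiv_array, hP, batchCount * sizeof(magma_int_t *), cudaMemcpyHostToDevice);

    /* Batched LU factorization followed by out-of-place inversion. */
    magma_dgetrf_batched(n, n, dA_array, n, dipiv_array, d_info,
                         batchCount, queue);
    magma_dgetri_outofplace_batched(n, dA_array, n, dipiv_array,
                                    dAinv_array, n, d_info,
                                    batchCount, queue);
    magma_queue_sync(queue);

    cudaFree(dA_array); cudaFree(dAinv_array); cudaFree(dipiv_array);
    magma_free(d_ipiv); magma_free(d_info);
    free(hA); free(hX); free(hP);
    magma_queue_destroy(queue);
}
```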