I plan to use getrf and getrs from the cuSolver package to solve AX = B with B = I.
Is this the best way to solve this problem?
If so, what is the best way to create the column-major identity matrix
`B` in device memory? It can be done trivially using a `for` loop, but this would 1. take up a lot of memory and 2. be quite slow. Is there a faster way?
Note that cuSolver unfortunately does not provide getri, so I have to use getrs.
Until CUDA provides the LAPACK routine `getri`, I think `getrf` and `getrs` are the best choice for large matrix inversion.

The matrix `B` is the same size as `A`, so I don't think allocating `B` makes this task consume much more memory than its input/output data does.

`getrf` is O(n^3) and `getrs` is O(n^2) per right-hand side (O(n^3) in total for the n columns of `B`), while setting `B = I` is only O(n^2) + O(n) (zeroing the matrix plus writing the diagonal). I don't think it should be the bottleneck of the whole procedure. You could share your implementation so we can check where the problem might be.
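For reference, here is a minimal sketch of one way to do this, not a tuned implementation. It builds `B = I` directly on the device with `cudaMemset` plus a small kernel for the diagonal, then calls `cusolverDnDgetrf` and `cusolverDnDgetrs`. The names `set_diagonal`, `make_identity_device` and `invert_device` are hypothetical helpers made up for this example. It assumes double precision, a leading dimension equal to `n`, and a cuSolver handle that runs on the default stream.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <cusolverDn.h>

// Write 1.0 on the diagonal of an n x n column-major matrix (leading dimension n).
__global__ void set_diagonal(double* B, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) B[i + (size_t)i * n] = 1.0;
}

// Hypothetical helper: allocate an n x n identity matrix directly in device memory.
// Returns nullptr if the allocation fails.
double* make_identity_device(int n) {
    assert(n > 0);
    double* B = nullptr;
    size_t bytes = (size_t)n * n * sizeof(double);
    if (cudaMalloc(&B, bytes) != cudaSuccess) return nullptr;
    cudaMemset(B, 0, bytes);                  // all-zero bytes represent 0.0
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    set_diagonal<<<blocks, threads>>>(B, n);  // O(n) writes for the diagonal
    return B;
}

// Hypothetical helper: invert the n x n column-major matrix d_A on the device.
// d_A is overwritten by its LU factors. Returns a device pointer to the inverse,
// or nullptr on failure (for example, a singular matrix).
// Assumes the handle uses the default stream, so the calls below are ordered.
double* invert_device(cusolverDnHandle_t handle, double* d_A, int n) {
    double* d_B = make_identity_device(n);
    if (!d_B) return nullptr;

    int lwork = 0;
    cusolverDnDgetrf_bufferSize(handle, n, n, d_A, n, &lwork);

    double* d_work = nullptr;
    int* d_ipiv = nullptr;
    int* d_info = nullptr;
    cudaMalloc(&d_work, sizeof(double) * lwork);
    cudaMalloc(&d_ipiv, sizeof(int) * n);
    cudaMalloc(&d_info, sizeof(int));

    // LU factorization with partial pivoting: P * A = L * U.
    cusolverDnDgetrf(handle, n, n, d_A, n, d_work, d_ipiv, d_info);
    int info = 0;
    cudaMemcpy(&info, d_info, sizeof(int), cudaMemcpyDeviceToHost);

    if (info == 0) {
        // Solve A * X = I in place; d_B holds the inverse afterwards.
        cusolverDnDgetrs(handle, CUBLAS_OP_N, n, n, d_A, n, d_ipiv, d_B, n, d_info);
        cudaMemcpy(&info, d_info, sizeof(int), cudaMemcpyDeviceToHost);
    }

    cudaFree(d_work);
    cudaFree(d_ipiv);
    cudaFree(d_info);
    if (info != 0) {
        cudaFree(d_B);
        return nullptr;
    }
    return d_B;
}
```

To use it, create a handle with `cusolverDnCreate`, copy `A` to the device in column-major order, call `invert_device(handle, d_A, n)`, and release the result with `cudaFree` when done. Keep a copy of `A` if you still need it afterwards, since `getrf` overwrites its input.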