I'm building medical imaging equipment, and I want to use CUDA to make it faster.
I receive a 1D line of 1024 samples from the CCD, 512 times. Before I perform the IFFT, I have to apply a high-performance interpolation algorithm (such as cubic spline interpolation) to each 1024-sample line, i.e. 512 separate 1D interpolations.
- Is there a CUDA library that performs cubic spline interpolation? (I found one library, but it is for 2D or 3D images. Since I also need to run other complicated filtering functions, I need the data in global memory, not in texture memory.)
- Is there an NUFFT (non-uniform fast Fourier transform) library? It doesn't need to be written for CUDA. My thinking is that with an NUFFT function I wouldn't have to do the interpolation and the IFFT separately, which could make the equipment even faster.
 
                        
I don't know that algorithm, but if the library you found is fast enough for your equipment, why not change its implementation to read from a plain array in global memory instead of texture memory? You might also get a further speedup by using shared memory.
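As a rough sketch of that idea (this is not the code of the library you found): the kernel below reads each 1024-sample line from an ordinary global-memory array, stages it in shared memory, and evaluates a cubic at arbitrary query positions. To keep it short I used a Catmull-Rom cubic, which is a local 4-point scheme and not a true cubic spline (a real spline needs a tridiagonal solve per line first). The names interpLines, catmullRom and the resampling grid in pos are made up for illustration, and I assumed all 512 lines share the same query positions.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int N_IN    = 1024;  // samples per line (from the question)
constexpr int N_LINES = 512;   // number of lines (from the question)

// Catmull-Rom cubic between p1 and p2, t in [0, 1]. Not a true cubic spline.
__host__ __device__ inline float catmullRom(float p0, float p1, float p2, float p3, float t)
{
    return 0.5f * (2.0f * p1 + (p2 - p0) * t
                 + (2.0f * p0 - 5.0f * p1 + 4.0f * p2 - p3) * t * t
                 + (3.0f * (p1 - p2) + p3 - p0) * t * t * t);
}

// One block per line. The line is copied from global to shared memory once,
// then every thread interpolates some of the output samples from it.
__global__ void interpLines(const float* __restrict__ in,   // [N_LINES * N_IN], global memory
                            const float* __restrict__ pos,  // [nOut], in input-sample units
                            float* __restrict__ out,        // [N_LINES * nOut], global memory
                            int nOut)
{
    __shared__ float line[N_IN];
    const float* src = in + (size_t)blockIdx.x * N_IN;
    for (int i = threadIdx.x; i < N_IN; i += blockDim.x)
        line[i] = src[i];
    __syncthreads();

    for (int k = threadIdx.x; k < nOut; k += blockDim.x) {
        float x  = fminf(fmaxf(pos[k], 0.0f), (float)(N_IN - 1));
        int   i1 = min((int)floorf(x), N_IN - 2);
        float t  = x - (float)i1;
        int   i0 = max(i1 - 1, 0);          // clamp neighbours at the line ends
        int   i3 = min(i1 + 2, N_IN - 1);
        out[(size_t)blockIdx.x * nOut + k] =
            catmullRom(line[i0], line[i1], line[i1 + 1], line[i3], t);
    }
}

int main()
{
    const int nOut = N_IN;  // assumption: as many output samples as input samples
    std::vector<float> hIn((size_t)N_LINES * N_IN), hPos(nOut), hOut((size_t)N_LINES * nOut);
    for (int l = 0; l < N_LINES; ++l)
        for (int i = 0; i < N_IN; ++i)
            hIn[(size_t)l * N_IN + i] = 2.0f * i + l;        // linear ramp as test data
    for (int k = 0; k < nOut; ++k)                           // hypothetical resampling grid
        hPos[k] = 1.0f + k * (N_IN - 3.0f) / (nOut - 1);     // stays inside [1, N_IN - 2]

    float *dIn, *dPos, *dOut;
    cudaMalloc(&dIn,  hIn.size()  * sizeof(float));
    cudaMalloc(&dPos, hPos.size() * sizeof(float));
    cudaMalloc(&dOut, hOut.size() * sizeof(float));
    cudaMemcpy(dIn,  hIn.data(),  hIn.size()  * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dPos, hPos.data(), hPos.size() * sizeof(float), cudaMemcpyHostToDevice);

    interpLines<<<N_LINES, 256>>>(dIn, dPos, dOut, nOut);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // A Catmull-Rom cubic reproduces a linear ramp, so compare against 2*x + l.
    float maxErr = 0.0f;
    for (int l = 0; l < N_LINES; ++l)
        for (int k = 0; k < nOut; ++k)
            maxErr = fmaxf(maxErr, fabsf(hOut[(size_t)l * nOut + k] - (2.0f * hPos[k] + l)));
    printf("max abs error on ramp: %g\n", maxErr);

    cudaFree(dIn); cudaFree(dPos); cudaFree(dOut);
    return 0;
}
```

The output stays in global memory (dOut), so your other filtering kernels and the IFFT can run on it directly without a round trip through textures.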
I've found some NUFFT implementations written in MATLAB and Fortran 77:
http://www.cims.nyu.edu/cmcl/nufft/nufft.html
http://www.mathworks.com/matlabcentral/fileexchange/25135-nufft-nufft-usffft
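
If you want something to check a NUFFT port against, here is a sketch of the direct (slow) non-uniform Fourier sum that those libraries approximate. It is not code from either link and it is O(N*M) per line, so treat it as a reference only. Assumptions on my side: the samples are real, the sample positions nu[j] are normalized to [0, 1) and shared by all lines, and the sign convention is the inverse-transform one, out[m] = sum_j s[j] * exp(+2*pi*i*m*nu[j]). The name ndftLines is made up.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

constexpr int N_IN    = 1024;  // samples per line (from the question)
constexpr int N_LINES = 512;   // number of lines (from the question)

// Direct non-uniform DFT: one thread per output bin m, one grid row per line.
__global__ void ndftLines(const float* __restrict__ in,  // [N_LINES * N_IN], real samples
                          const float* __restrict__ nu,  // [N_IN], positions in [0, 1)
                          float2* __restrict__ out,      // [N_LINES * nOut], (re, im)
                          int nOut)
{
    int m    = blockIdx.x * blockDim.x + threadIdx.x;
    int line = blockIdx.y;
    if (m >= nOut) return;

    const float* s = in + (size_t)line * N_IN;
    float re = 0.0f, im = 0.0f;
    for (int j = 0; j < N_IN; ++j) {
        float sn, cs;
        sincospif(2.0f * m * nu[j], &sn, &cs);  // sin and cos of 2*pi*m*nu[j]
        re += s[j] * cs;
        im += s[j] * sn;
    }
    out[(size_t)line * nOut + m] = make_float2(re, im);
}

int main()
{
    const int   nOut = N_IN;
    const float PI   = 3.14159265358979f;
    std::vector<float>  hIn((size_t)N_LINES * N_IN), hNu(N_IN);
    std::vector<float2> hOut((size_t)N_LINES * nOut);

    // Uniform positions here only so the result can be compared with an
    // ordinary inverse DFT; real positions would come from your calibration.
    for (int j = 0; j < N_IN; ++j)
        hNu[j] = (float)j / N_IN;
    for (int l = 0; l < N_LINES; ++l)
        for (int j = 0; j < N_IN; ++j)
            hIn[(size_t)l * N_IN + j] = cosf(2.0f * PI * 5.0f * j / N_IN);  // tone in bin 5

    float *dIn, *dNu;
    float2* dOut;
    cudaMalloc(&dIn,  hIn.size()  * sizeof(float));
    cudaMalloc(&dNu,  hNu.size()  * sizeof(float));
    cudaMalloc(&dOut, hOut.size() * sizeof(float2));
    cudaMemcpy(dIn, hIn.data(), hIn.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dNu, hNu.data(), hNu.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(256);
    dim3 grid((nOut + block.x - 1) / block.x, N_LINES);
    ndftLines<<<grid, block>>>(dIn, dNu, dOut, nOut);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float2), cudaMemcpyDeviceToHost);

    // For this test tone the magnitude in bin 5 should be close to N_IN / 2.
    printf("line 0, |out[5]| = %g\n", hypotf(hOut[5].x, hOut[5].y));

    cudaFree(dIn); cudaFree(dNu); cudaFree(dOut);
    return 0;
}
```

With uniform positions this is just an unnormalized inverse DFT, which makes it easy to sanity-check against cuFFT before switching to your real non-uniform positions.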