This simple CUFFT code was built and run in two environments:
- VS 2013 with CUDA 7.0
- VS 2010 with CUDA 4.2
I found that the VS 2013 / CUDA 7.0 build was roughly 1000 times slower. On average, the code executed in 0.6 ms when built with VS 2010 and took 520 ms when built with VS 2013.
    #include "stdafx.h"
    #include "cuda.h"
    #include "cuda_runtime_api.h"
    #include "cufft.h"
    typedef cuComplex Complex;
    #include <iostream>
    using namespace std;

    int _tmain(int argc, _TCHAR* argv[])
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);

        const int SIZE = 10000;
        Complex *h_col = (Complex*)malloc(SIZE*sizeof(Complex));
        for (int i = 0; i < SIZE; i++)
        {
            h_col[i].x = i;
            h_col[i].y = i;
        }

        Complex *d_col;
        cudaMalloc((void**)&d_col, SIZE*sizeof(Complex));
        cudaMemcpy(d_col, h_col, SIZE*sizeof(Complex), cudaMemcpyHostToDevice);

        cufftHandle plan;
        const int BATCH = 1;
        cufftPlan1d(&plan, SIZE, CUFFT_C2C, BATCH);
        cufftExecC2C(plan, d_col, d_col, CUFFT_FORWARD);
        cudaMemcpy(h_col, d_col, SIZE*sizeof(Complex), cudaMemcpyDeviceToHost);

        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float milliseconds = 0;
        cudaEventElapsedTime(&milliseconds, start, stop);
        cufftDestroy(plan);

        cout << milliseconds;
        return 0;
    }
The code was run on the same computer, with the same OS and the same graphics card, one build immediately after the other. Both projects were built in the x64 Release configuration. Visual Studio lets you choose whether to compile the file with the C++ compiler or as CUDA C/C++; I tried both options in both projects and it made no difference.
Any ideas on how to fix this?
FWIW, I get the same results with CUDA 6.5 on VS 2013 as with CUDA 7.0.
The cufft library has gotten considerably larger between CUDA 4.2 and 7.0, and this results in substantially more initialization time. If you remove this initialization time as a factor, I think you will find far less than a 1000x difference in execution time.
Here's a modified code demonstrating this:
The second number above represents essentially the same code with the cufft initialization removed (since it was done on the first pass).