I have a 2D array, of shape (144, 522720), and I need to compute its FFT.
pyFFTW seems slower than numpy and scipy, which is not expected.
Am I doing something obviously wrong?
Below is my code:
import time

import numpy
import pyfftw
import scipy.fft

n1 = 144
n2 = 522720
loops = 2

pyfftw.config.NUM_THREADS = 4
pyfftw.config.PLANNER_EFFORT = 'FFTW_ESTIMATE'
# pyfftw.config.PLANNER_EFFORT = 'FFTW_MEASURE'

Q_1 = pyfftw.empty_aligned([n1, n2], dtype='float64')
Q_2 = pyfftw.empty_aligned([n1, n2], dtype='complex128')
Q_ref = pyfftw.empty_aligned([n1, n2], dtype='complex128')

# repeat a few times to see if the pyfftw planner helps
for i in range(loops):
    Q_1 = numpy.random.rand(n1, n2)

    s1 = time.time()
    Q_ref = numpy.fft.fft(Q_1, axis=0)
    print('NUMPY - elapsed time: ', time.time() - s1, 's.')

    s1 = time.time()
    Q_2 = scipy.fft.fft(Q_1, axis=0)
    print('SCIPY - elapsed time: ', time.time() - s1, 's.')
    print('Equal = ', numpy.allclose(Q_2, Q_ref))

    s1 = time.time()
    Q_2 = pyfftw.interfaces.numpy_fft.fft(Q_1, axis=0)
    print('PYFFTW NUMPY - elapsed time = ', time.time() - s1, 's.')
    print('Equal = ', numpy.allclose(Q_2, Q_ref))

    s1 = time.time()
    Q_2 = pyfftw.interfaces.scipy_fftpack.fft(Q_1, axis=0)
    print('PYFFTW SCIPY - elapsed time = ', time.time() - s1, 's.')
    print('Equal = ', numpy.allclose(Q_2, Q_ref))

    s1 = time.time()
    fft_object = pyfftw.builders.fft(Q_1, axis=0)
    Q_2 = fft_object()
    print('FFTW PURE - elapsed time = ', time.time() - s1, 's.')
    print('Equal = ', numpy.allclose(Q_2, Q_ref))
Firstly, if you turn on the cache before your main loop, the interfaces work largely as expected.
It's interesting that, despite the wisdom that should be stored, construction of the pyFFTW objects is still rather slow when the cache is off. No matter; this is exactly what the cache is for. In your case you need to make the cache keep-alive time quite long, because each pass through your loop takes a long time.

Secondly, it's not a fair comparison to include the construction time of `fft_object` in the final test. If you move the construction outside the timer, then calling `fft_object` is a better measure.

Thirdly, it's also interesting to see that, even with the cache turned on, the call through `numpy_fft` is slower than the call through `scipy_fft`. Since there is no obvious difference in the code path, I suggest that is a CPU caching issue, which is the sort of issue that `timeit` seeks to mitigate. Timing each case with `timeit` gives more meaningful numbers.
You can do a bit better if you don't force the input array to be copied into a complex data type, by changing `Q_1` to be `complex128` from the start. (That interesting `scipy` slow-down is repeatable.)
That said, if your input is real, you should be doing a real transform (for a >50% speed-up with `pyfftw`) and manipulating the resultant complex output.

What's interesting about this example is (I think) how important the CPU cache is to the results (which I suggest is why switching to a real transform is so effective at speeding things up). You also see something dramatic when you change the array size to 524288 (the next power of two, which you might think would speed things up, not slow things down dramatically). In that case everything slows down quite a bit, `scipy` particularly. It feels to me that `scipy` is more cache sensitive, which would explain the slow-down from changing the input to `complex128` (522720 is quite a nice number for FFTing, though, so perhaps we should expect a slow-down anyway).

Finally, if accuracy is secondary to speed, you can always use 32-bit floats as the data type. If you combine that with doing a real transform, you get better than a factor-of-10 speed-up over the initial `numpy` best given above (`numpy` and `scipy` don't change much, as I think they use 64-bit floats internally).
Edit: I forgot that scipy's `fftpack` real FFTs have a weird output structure, which `pyfftw` replicates with some slowdown. This was changed to be more sensible in the new `scipy.fft` module.

The new FFT interface is also implemented in pyFFTW and should be preferred. Unfortunately there was a problem with the docs being rebuilt, so they were out of date for a long time and didn't show the new interface; hopefully that is fixed now.