Hello,
I am trying to match the speed of Matlab's FFT in Julia. By default:
r=randn(2^10,2^10)
using FFTW
@time s=fft(r)
it is slower by more than one order of magnitude.
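To separate the transform itself from Julia's one-time costs: the first call to fft also pays JIT compilation and planning, so a fairer number comes from a second call or from BenchmarkTools.jl (assumed installed here). A minimal timing sketch:
using FFTW, BenchmarkTools
r = randn(2^10, 2^10)
fft(r)             # warm-up: the first call includes JIT compilation and planning
@time s = fft(r)   # second call reflects the transform itself
@btime fft($r);    # many samples, reports the minimum, comparable to repeated tic/toc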
I noticed I could set the number of threads:
FFTW.set_num_threads(16)
and use the real-input version of the FFT: rfft(),
and finally precompute the FFT plan:
pr=plan_rfft(randn(2^10,2^10); flags=FFTW.PATIENT,timelimit=Inf);
@time s=pr*r
...to get much closer to Matlab's FFT speed (measured with tic/toc). However, Julia remains almost always slower than Matlab (which also shows far less variation between successive computation times than Julia does).
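Putting the three steps above together, the full setup I am timing looks roughly like this (a sketch; BenchmarkTools.jl assumed available, and the PATIENT planning step itself takes a while and is not part of the timing):
using FFTW, BenchmarkTools
FFTW.set_num_threads(16)
r = randn(2^10, 2^10)
# Plan once; PATIENT lets FFTW search for the fastest algorithm for this size
pr = plan_rfft(r; flags=FFTW.PATIENT, timelimit=Inf)
s = pr * r            # output is (2^9 + 1) × 2^10, complex
@btime $pr * $r;      # repeated measurement, excludes planning and compilation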
Am I still missing some optimization (that Matlab apparently does under the hood)?
Thank you!
One relatively uninformed idea: Matlab may ship with Intel MKL binaries for some of its math libraries, which Julia does not do by default since it is open source. See MKL.jl for faster Intel performance.
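If MKL is indeed the difference, FFTW.jl can also be pointed at MKL's FFT implementation directly (this is separate from MKL.jl, which swaps BLAS/LAPACK); a minimal sketch, which as far as I know only takes effect after restarting Julia:
using FFTW
FFTW.set_provider!("mkl")   # switch the FFT backend from the bundled FFTW binary to MKL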
I use an AMD CPU if that matters.
You might be looking for the big note here: https://juliamath.github.io/AbstractFFTs.jl/stable/api/#AbstractFFTs.fft
This performs a multidimensional FFT by default. FFT libraries in other languages such as Python and Octave perform a one-dimensional FFT along the first non-singleton dimension of the array. This is worth noting while performing comparisons.
Matlab says:
If X is a matrix, then fft(X) treats the columns of X as vectors and returns the Fourier transform of each column.
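Concretely, the two conventions look like this on the Julia side (a small sketch to make the distinction explicit):
using FFTW
r = randn(2^10, 2^10)
s1 = fft(r, 1)   # 1-D FFT of each column: what Matlab's fft(X) does on a matrix
s2 = fft(r)      # full 2-D FFT over both dimensions: what Matlab's fft2(X) does
s1 == s2         # false in general, so make sure the comparison is like for like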
Indeed, it's important to specify that I am using the two-dimensional FFT in Matlab:
r=randn(2^10);
tic; s=fft2(r); toc;
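On the Julia side, the comparable complex 2-D setup is roughly the following (same 16 threads and PATIENT planning as above):
using FFTW
FFTW.set_num_threads(16)
r = randn(2^10, 2^10)
p = plan_fft(r; flags=FFTW.PATIENT)   # complex 2-D plan, the counterpart of fft2
s = p * r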
When drawing the histogram of timings for the regular (complex) 2D FFT, I get this significant difference, even though I use plan_fft:
[Figure FFT_Julia_Matlab_speed.png: histograms of 2D complex FFT timings, Julia (plan_fft) vs Matlab (fft2)]
However, when I use Julia's rfft (real version) against Matlab's regular fft2 (complex), it looks more reassuring than yesterday, for some reason (less memory allocated to the Matlab process?)
[Figure RFFT_Julia_Matlab_speed.png: histograms of timings, Julia rfft vs Matlab fft2]
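For what it's worth, comparing rfft against a fully complex fft2 is still comparing the same transform: for real input, rfft just returns the non-redundant half of the spectrum along the first dimension (the rest follows by conjugate symmetry). A quick check:
using FFTW
r = randn(2^10, 2^10)
rfft(r) ≈ fft(r)[1:size(r, 1) ÷ 2 + 1, :]   # true: rfft is the first half of the full FFT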
But r=randn(2^10) is a vector, while above it was a matrix? I'm confused, but I suggest you make very sure (on small arrays) that you are computing the same thing.
No, randn(N) returns an N×N matrix in Matlab.
Oh, beware that Matlab's 2D FFT and Julia's 2D FFT might not be the same thing. One could be a true 2D transform, and the other just an array of 1D transforms.
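This is easy to check on small arrays, as suggested above. The 2-D DFT is separable, so Julia's multidimensional fft should agree with successive 1-D transforms along each dimension, but not with a single 1-D transform of the columns. A minimal sketch:
using FFTW
A = randn(4, 4)
fft(A) ≈ fft(fft(A, 1), 2)   # true: full 2-D transform = 1-D along dim 1, then dim 2
fft(A) ≈ fft(A, 1)           # false in general: columns-only transform (Matlab's fft)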
OK, I can give some more insights.
Actually, what I have measured and plotted seems to depend quite a lot on the computation history, presumably through memory and CPU state. After many repeated attempts, Julia's standard (complex) 2D fft, both without and with plan_fft, keeps up quite well with Matlab. I don't know whether Matlab uses a garbage collector, but after many runs the timing distributions for both Julia and Matlab are quite close and bimodal (presumably runs without garbage collection and runs with it...).
Here are the differences: Julia's fft has an overhead of about 1 ms (roughly 10%) compared to Julia's plan_fft, which achieves the same absolute minimal timing as Matlab (6 ms). However, Julia's plan_fft (complex 2D) is 17% slower on average than Matlab's (complex 2D) fft. This could mean that garbage collection is quicker in Matlab.
Also, for this same computation, Julia uses 66% of the total CPU resources while Matlab takes only 41% (both using all 16 threads). Here again, Matlab's maturity translates into efficiency.
Satisfyingly, Julia's plan_rfft (for real 2D input) is 4 to 5 times faster (min 1.4 ms, mean 1.8 ms) than Matlab's always-complex fft ;-) (At least today... a fresh run of Julia seems slower, while a fresh run of Matlab seems faster.)
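If the bimodal timings really do come from allocation and garbage collection, one thing that might flatten Julia's distribution (a sketch, not something I have measured here) is applying the plan into a preallocated output with mul!, so that no new array is allocated on each transform:
using FFTW, LinearAlgebra
FFTW.set_num_threads(16)
r  = randn(2^10, 2^10)
pr = plan_rfft(r; flags=FFTW.PATIENT)
out = zeros(ComplexF64, 2^9 + 1, 2^10)   # preallocate once: rfft output size for a 2^10 × 2^10 real input
mul!(out, pr, r)                         # in-place application of the plan, no per-call allocation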