Hello,
I am trying to match the speed of Matlab's FFT in Julia. By default:
using FFTW
r = randn(2^10, 2^10)
@time s = fft(r)
it is slower by more than one order of magnitude.
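(Note that the first call measured with @time also includes compilation; a small sketch with BenchmarkTools, assuming that package is installed, gives more stable numbers:)
using FFTW, BenchmarkTools
r = randn(2^10, 2^10)
@btime fft($r);   # $ interpolates r so the global-variable access is not part of the measurement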
I noticed I could set the number of threads:
FFTW.set_num_threads(16)
and use the real-input version of the FFT, rfft(),
and also precompute the FFT plan:
pr = plan_rfft(randn(2^10, 2^10); flags=FFTW.PATIENT, timelimit=Inf);
@time s = pr * r
...to get much closer to Matlab's fft speed (measured with tic/toc). However, it remains almost always slower than Matlab (which also shows much less variation across successive runs than Julia).
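Putting the pieces together, here is a minimal sketch of what I am timing (with a preallocated output and mul!, which I believe avoids allocating a new array on every transform):
using FFTW, LinearAlgebra
FFTW.set_num_threads(16)
pr = plan_rfft(randn(2^10, 2^10); flags=FFTW.PATIENT, timelimit=Inf)  # plan on a throwaway array
r = randn(2^10, 2^10)
s = similar(r, Complex{Float64}, (2^10 ÷ 2 + 1, 2^10))  # rfft output has (N ÷ 2 + 1) rows
@time mul!(s, pr, r)  # write the transform into the preallocated buffer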
Am I still missing some optimization (that Matlab apparently applies under the hood)?
Thank you!
One relatively uninformed idea: Matlab may ship with Intel (MKL) binaries for some math libraries, while Julia does not by default, since it is open source. See MKL.jl for faster performance on Intel CPUs.
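(If you want to try it, FFTW.jl can, as far as I know, also be switched to the MKL backend through a preference, for example:)
using FFTW
FFTW.set_provider!("mkl")   # stores a preference; takes effect after restarting Julia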
I use an AMD CPU, if that matters.
You might be looking for the big note here: https://juliamath.github.io/AbstractFFTs.jl/stable/api/#AbstractFFTs.fft
This performs a multidimensional FFT by default. FFT libraries in other languages such as Python and Octave perform a one-dimensional FFT along the first non-singleton dimension of the array. This is worth noting while performing comparisons.
Matlab says:
If X is a matrix, then fft(X) treats the columns of X as vectors and returns the Fourier transform of each column.
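So, to reproduce Matlab's column-wise fft(X) in Julia you would, as I understand it, restrict the transform to the first dimension, while the default transforms all dimensions:
using FFTW
A = randn(4, 4)
fft(A)      # transforms both dimensions (like Matlab's fft2)
fft(A, 1)   # transforms along the first dimension only, i.e. each column (like Matlab's fft)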
Indeed, it's important to specify that I am using the 2-dimensional FFT in Matlab:
r = randn(2^10);
tic; s = fft2(r); toc;
When I draw the histogram of timings for the regular (complex) 2D FFT, I still see a significant difference, even though I use plan_fft:
[figure: FFT_Julia_Matlab_speed.png — histogram of complex 2D FFT timings, Julia vs Matlab]
However, when I use Julia's rfft (real version) against Matlab's regular fft2 (complex), the result looks more reassuring than yesterday, for some reason (less memory allocated to the Matlab process?):
[figure: RFFT_Julia_Matlab_speed.png — histogram of Julia rfft vs Matlab fft2 timings]
But r = randn(2^10)
gives a vector, while above it was a matrix? I'm confused, but I suggest you make very sure (on small arrays) that you are computing the same thing.
No, randn(N)
returns an N×N matrix in Matlab.
Oh, beware that Matlab's 2D FFT and Julia's 2D FFT might not be the same thing.
One could be a true 2D transform, and the other just an array of 1D transforms.
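An easy way to check on a small array (assuming the full 2D transform is just a 1D transform along each dimension in turn):
using FFTW
A = randn(8, 8)
fft(A) ≈ fft(fft(A, 1), 2)   # should be true if both are the full 2D transform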
OK, I can give some more insights.
Actually, what I have measured and plotted seems to depend quite a lot on the computation history, presumably through memory and CPU state. After many repeated attempts, Julia's standard (complex) 2D fft, both without and with plan_fft, keeps up quite well with Matlab. I don't know whether Matlab uses a garbage collector, but after many runs the timing distributions for Julia and Matlab are quite close and both bimodal (presumably runs without garbage collection and runs with it...).
Here are the differences: Julia's fft has an overhead of about 1 ms (roughly 10%) compared to Julia's plan_fft, which achieves the same minimal timing as Matlab (6 ms). However, Julia's plan_fft (complex 2D) is 17% slower on average than Matlab's (complex 2D) fft. That could mean that garbage collection is quicker in Matlab.
Also, for this same computation, Julia uses 66% of the total CPU resources while Matlab takes only 41% (both using all 16 threads). Here again, Matlab's maturity translates into efficiency.
Satisfyingly, Julia's plan_rfft (for real 2D input) is 4 to 5 times faster (min 1.4 ms, mean 1.8 ms) than Matlab's always-complex fft ;-) (At least today... a fresh run of Julia seems slower, while a fresh run of Matlab seems faster.)
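For completeness, rfft only stores roughly half of the spectrum (the rest follows from Hermitian symmetry), which I assume explains much of that speed gap; a small consistency check:
using FFTW
r = randn(2^10, 2^10)
s_full = fft(r)
s_half = rfft(r)                        # size (2^10 ÷ 2 + 1, 2^10)
s_half ≈ s_full[1:(2^10 ÷ 2 + 1), :]    # the stored half matches the full transform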