Stream: helpdesk (published)

Topic: How to reach Matlab speed for FFT?


view this post on Zulip Alexandre Guillet (May 21 2023 at 15:49):

Hello,
I try to reach the speed of Matlab's FFT in Julia. By default:

r=randn(2^10,2^10)
using FFTW
@time s=fft(r)

it is slower by more than one order of magnitude.
I noticed I could set the number of thread:

FFTW.set_num_threads(16)

and use the real version of the fft: rfft(), and eventually compute the fft plan:

 pr=plan_rfft(randn(2^10,2^10); flags=FFTW.PATIENT,timelimit=Inf);
@time s=pr*r

...to get much closer to Matlab's fft speed (measured with tic toc). However, it remains almost always slower than Matlab (that has very few variations in successive computation times, as compared to Julia).
Do I still miss some optimization (that Matlab apparently does under the hood)?
Thank you!

view this post on Zulip Alec (May 21 2023 at 19:50):

One relatively uninformed idea: Matlab may ship with Intel binaries for some math libraries and high Julia does not because open source. See MKL.jl for faster intel performance.

view this post on Zulip Alexandre Guillet (May 21 2023 at 22:09):

I use an AMD CPU if that matters.

view this post on Zulip Michael Abbott (May 21 2023 at 22:28):

You might be looking for the big note here: https://juliamath.github.io/AbstractFFTs.jl/stable/api/#AbstractFFTs.fft

This performs a multidimensional FFT by default. FFT libraries in other languages such as Python and Octave perform a one-dimensional FFT along the first non-singleton dimension of the array. This is worth noting while performing comparisons.

Malab says:

If X is a matrix, then fft(X) treats the columns of X as vectors and returns the Fourier transform of each column.

view this post on Zulip Alexandre Guillet (May 22 2023 at 08:41):

Indeed, that's important to specify that I am using the 2-dimensional fft in Matlab:

r=randn(2^10);
tic; s=fft2(r); toc;

When drawing the histogram of timings for the regular (complex) 2D FFT, I get this significant difference, even though I use plan_fft:
FFT_Julia_Matlab_speed.png

However, when I use Julia's rfft (real version) against Matlab's regular fft2 (complex), it looks more reassuring that yesterday, for some reason (less memory allocation to Matlab process?)
RFFT_Julia_Matlab_speed.png

view this post on Zulip Michael Abbott (May 22 2023 at 12:59):

But r=randn(2^10) is a vector, while above it was a matrix? I'm confused, but I suggest you make very sure (on small arrays) that you are computing the same thing.

view this post on Zulip Sebastian Pfitzner (May 22 2023 at 13:54):

no, randn(N) returns a NxN matrix in Matlab

view this post on Zulip Martin D. Maas (May 23 2023 at 23:25):

Oh, beware that the Matlab's 2d FFT and Julia's 2d FFT might not be the same thing

view this post on Zulip Martin D. Maas (May 23 2023 at 23:26):

One could be a true 2d transform, and the other just an array of 1d transforms

view this post on Zulip Alexandre Guillet (May 23 2023 at 23:34):

OK, I can give some more insights.
Actually, what I have measured and plotted seems to depend quite a lot on the computation history, certainly through memory and CPU states. After many joint attempts, Julia's standard (complex) 2D fft without and with plan_fft keep up quite well with Matlab. I don't know whether Matlab uses a garbage collector, but after many runs, the timing distributions for both Julia and Matlab are quite close and bimodal (supposedly runs without garbage collection and runs with it...).

Here are the differences: Julia's fft has a delay (overhead) of 1ms as compared to Julia's plan_fft (about 10%), that achieves the same absolute minimal timing as Matlab (6ms). However, Julia's plan_fft (complex 2D) is 17% slower in average than Matlab's (complex 2D) fft. It could mean that garbage collection is quicker in Matlab.
Also, for this same computation, Julia uses 66% of the total CPU resources while Matlab takes only 41% (both using all 16 threads). Here again, Matlab's old age is a synonym of efficiency.

Satisfyingly, Julia's plan_rfft (for real 2D input) is 4 to 5 times faster (min 1.4ms, mean 1.8ms) than Matlab's always complex fft ;-) (At least today... A fresh run of Julia seems slower while a fresh run in Matlab seems faster.)


Last updated: Nov 22 2024 at 04:41 UTC