Nicer performant syntax for replacing elements conditionally · helpdesk (published)

x[x .<= maximum(x)/4] .= 0.0

cutoff = maximum(x)/4
for i in eachindex(x)
    if x[i] <= cutoff
        x[i] = 0.0
    end
end

I see about a 2x performance hit from the first and 3 allocations (which surprised me, I was expecting one from the comparison)

Sukera (Jan 05 2024 at 11:56):

Mason Protter (Jan 05 2024 at 12:33):

Mason Protter (Jan 05 2024 at 12:34):

x[(i for i in eachindex(x) if x[i] <= maximum(x)/4)] .= 0.0

Mason Protter (Jan 05 2024 at 12:46):

x .= ifelse.(x .<= maximum(x)/4, 0.0, x)

Mason Protter (Jan 05 2024 at 12:47):

julia> f1(x) = x[x .<= maximum(x)/4] .= 0.0;

julia> f2(x) = let cutoff=maximum(x)/4
           for i in eachindex(x)
               if x[i] <= cutoff
                   x[i] = 0.0
               end
           end
           x
       end;

julia> f3(x) = x .= ifelse.(x .<= maximum(x)/4, 0.0, x);

julia> for N ∈ (100, 1000, 10000)
           @show N
           x = rand(N)
           print("f1  "); @btime f1(x) setup=(x = rand($N)) evals=1
           print("f2  "); @btime f2(x) setup=(x = rand($N)) evals=1
           print("f3  "); @btime f3(x) setup=(x = rand($N)) evals=1
       end
N = 100
f1    290.000 ns (3 allocations: 256 bytes)
f2    130.000 ns (0 allocations: 0 bytes)
f3    110.000 ns (0 allocations: 0 bytes)
N = 1000
f1    1.012 μs (4 allocations: 6.17 KiB)
f2    671.000 ns (0 allocations: 0 bytes)
f3    401.000 ns (0 allocations: 0 bytes)
N = 10000
f1    6.723 μs (5 allocations: 23.53 KiB)
f2    4.799 μs (0 allocations: 0 bytes)
f3    2.174 μs (0 allocations: 0 bytes)

Nils (Jan 05 2024 at 13:34):

Nils (Jan 05 2024 at 13:37):

Fwiw I don't see a speedup, so might be CPU dependent - in any case it's at least as fast and shorter and cleaner, so wins all round, thank you!

Mason Protter (Jan 05 2024 at 13:46):

yeah, the relative performance of ifelse versus actual branching can be very CPU dependant

Mason Protter (Jan 05 2024 at 13:49):

ifelse tends to be easier to make work with SIMD, but the branch-predictor can make branching more efficient

Mason Protter (Jan 05 2024 at 13:49):

could also be julia-version dependant, since it'll depend a lot on various low-level optimizations LLVM may or may not make

chriselrod (Jan 05 2024 at 15:08):

Yeah, for me f2 and f3 perform about the same, both of them SIMDing thanks to LLVM.

AMJ (Jan 05 2024 at 16:14):

On my machine, if I am understanding correctly(which is a big if), f3 creates 2 main branches which both are heavily SIMDed, while f2 has one SIMDed branch with fewer instructions. So probably a better vectorization plus a better branch prediction pattern?

Stream: helpdesk (published)

Topic: Nicer performant syntax for replacing elements conditionally

Nils (Jan 05 2024 at 11:28):

Sukera (Jan 05 2024 at 11:56):

Sukera (Jan 05 2024 at 11:56):

Sukera (Jan 05 2024 at 11:56):

Mason Protter (Jan 05 2024 at 12:33):

Mason Protter (Jan 05 2024 at 12:34):

Mason Protter (Jan 05 2024 at 12:46):

Mason Protter (Jan 05 2024 at 12:47):

Nils (Jan 05 2024 at 13:34):

Nils (Jan 05 2024 at 13:37):

Mason Protter (Jan 05 2024 at 13:46):

Mason Protter (Jan 05 2024 at 13:49):

Mason Protter (Jan 05 2024 at 13:49):

chriselrod (Jan 05 2024 at 15:08):

AMJ (Jan 05 2024 at 16:14):