Stream: helpdesk (published)

Topic: closure vs function vs global performance


Filippos Christou (Jun 06 2022 at 14:11):

It is said that using a closure is fast (at least in comparison with global variables).
I decided to try that out with a simple example.

The following functions each add a random number 100 times: once using a closure, once with function parameters, and once with a global variable.
I used a random number generator so that the compiler can't cheat by precomputing the result.

using BenchmarkTools
using Random

function addclos()
    rng = MersenneTwister(0)
    x = 0
    function ()
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
end

function addfun(rng, x)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

gx = 0

function addglob(rng)
    for _ in 1:100
        global gx += rand(rng, 1:10)
    end
    gx
end

@assert addclos()() == addfun(MersenneTwister(0), 0) == addglob(MersenneTwister(0))

gx = 0

clos = addclos()
@btime clos()
# 2.597 μs (100 allocations: 1.56 KiB)

@btime addfun(MersenneTwister(0),0)
# 14.521 μs (13 allocations: 36.45 KiB)

@btime addglob(MersenneTwister(0))
# 16.939 μs (113 allocations: 38.02 KiB)

Closure scores 2.597 μs (100 allocations: 1.56 KiB)
Passing as parameters in a function scores 14.521 μs (13 allocations: 36.45 KiB)
Using a global variable scores 16.939 μs (113 allocations: 38.02 KiB)

So, with this very simplistic example it looks like using closures is the way to go.
Although I find it weird that they even outperform the traditional function style. :thinking:
It would be nice if someone could explain why.

Lastly, if you have any comments or suggestions regarding closures, best practices, and recommended use cases, please let us know!

Run on Julia 1.7.3 (stable).

Chad Scherrer (Jun 06 2022 at 14:13):

For this to be a fair comparison, I think you need

@btime addfun($(MersenneTwister(0)),0)

Chad Scherrer (Jun 06 2022 at 14:14):

Oh wait, maybe that's wrong. addclos should also include the RNG creation cost

Chad Scherrer (Jun 06 2022 at 14:17):

But I think I'd still do

function addfun(x)
    rng = MersenneTwister(0)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

Otherwise the RNG difference is confounded with the function call structure difference

Filippos Christou (Jun 06 2022 at 14:24):

@btime clos() doesn't include the RNG initialization, so I guess passing @btime addfun($(MersenneTwister(0)),0) should be fine, since the interpolated expression is evaluated before benchmarking starts?

e.g. by doing

clos = addclos()
@btime clos()
# 2.500 μs (100 allocations: 1.56 KiB)

@btime addfun($(MersenneTwister(0)),0)
# 426.075 ns (0 allocations: 0 bytes)

@btime addglob($(MersenneTwister(0)))
# 2.630 μs (100 allocations: 1.56 KiB)

The results are also more credible, actually suggesting that using a closure is somewhat faster than global but still way slower than function params.

Filippos Christou (Jun 06 2022 at 14:28):

Putting the RNG initialization inside the benchmarked function makes all three variants perform similarly:

using BenchmarkTools
using Random

function addclos()
    x = 0
    function ()
        rng = MersenneTwister(0)
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
end

function addfun(x)
    rng = MersenneTwister(0)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

gx = 0

function addglob()
    rng = MersenneTwister(0)
    for _ in 1:100
        global gx += rand(rng, 1:10)
    end
    gx
end

@assert addclos()() == addfun(0) == addglob()

gx = 0

clos = addclos()
@btime clos()
# 16.564 μs (113 allocations: 38.02 KiB)

@btime addfun(0)
# 14.517 μs (13 allocations: 36.45 KiB)

@btime addglob()
# 16.765 μs (113 allocations: 38.02 KiB)

Filippos Christou (Jun 06 2022 at 14:30):

which suggests that there is only a minor win from using closures. So, what's the hype? :stuck_out_tongue:

Chad Scherrer (Jun 06 2022 at 14:39):

I don't know of any "closures are better than pure functions" hype - I've never heard that claim before.

Filippos Christou (Jun 06 2022 at 14:42):

I can't rule out that this hype was a misconception that existed only in my head. xD

Chad Scherrer (Jun 06 2022 at 14:42):

To see why these are slower than you might expect, you should try @code_warntype:

julia> @code_warntype addclos()
MethodInstance for addclos()
  from addclos() in Main at REPL[6]:1
Arguments
  #self#::Core.Const(addclos)
Locals
  #3::var"#3#4"
  x::Core.Box
Body::var"#3#4"
1      (x = Core.Box())
       Core.setfield!(x, :contents, 0)
       (#3 = %new(Main.:(var"#3#4"), x))
└──     return #3

julia> @code_warntype addfun(0)
MethodInstance for addfun(::Int64)
  from addfun(x) in Main at REPL[8]:1
Arguments
  #self#::Core.Const(addfun)
  x@_2::Int64
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  rng::Any
  x@_5::Int64
Body::Int64
1        (x@_5 = x@_2)
         (rng = Main.MersenneTwister(0))
   %3  = (1:100)::Core.Const(1:100)
         (@_3 = Base.iterate(%3))
   %5  = (@_3::Core.Const((1, 1)) === nothing)::Core.Const(false)
   %6  = Base.not_int(%5)::Core.Const(true)
└──       goto #4 if not %6
2  %8  = @_3::Tuple{Int64, Int64}
         Core.getfield(%8, 1)
   %10 = Core.getfield(%8, 2)::Int64
   %11 = x@_5::Int64
   %12 = rng::Any
   %13 = (1:10)::Core.Const(1:10)
   %14 = Main.rand(%12, %13)::Int64
         (x@_5 = %11 + %14)
         (@_3 = Base.iterate(%3, %10))
   %17 = (@_3 === nothing)::Bool
   %18 = Base.not_int(%17)::Bool
└──       goto #4 if not %18
3        goto #2
4        return x@_5



julia> @code_warntype addglob()
MethodInstance for addglob()
  from addglob() in Main at REPL[2]:1
Arguments
  #self#::Core.Const(addglob)
Locals
  @_2::Union{Nothing, Tuple{Int64, Int64}}
  rng::Any
Body::Any
1        (rng = Main.MersenneTwister(0))
   %2  = (1:100)::Core.Const(1:100)
         (@_2 = Base.iterate(%2))
   %4  = (@_2::Core.Const((1, 1)) === nothing)::Core.Const(false)
   %5  = Base.not_int(%4)::Core.Const(true)
└──       goto #4 if not %5
2  %7  = @_2::Tuple{Int64, Int64}
         Core.getfield(%7, 1)
   %9  = Core.getfield(%7, 2)::Int64
         nothing
   %11 = rng::Any
   %12 = (1:10)::Core.Const(1:10)
   %13 = Main.rand(%11, %12)::Int64
   %14 = (Main.gx + %13)::Any
   %15 = Core.get_binding_type(Main, :gx)::Core.Const(Any)
   %16 = Base.convert(%15, %14)::Any
   %17 = Core.typeassert(%16, %15)::Any
         (Main.gx = %17)
         (@_2 = Base.iterate(%2, %9))
   %20 = (@_2 === nothing)::Bool
   %21 = Base.not_int(%20)::Bool
└──       goto #4 if not %21
3        goto #2
4        return Main.gx

On your screen, problem areas will show in red. Here you see lots of Any and Core.Box.
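
The boxing can also be seen without reading the lowered code, by inspecting the closure object itself. The following is a minimal sketch that reuses the addclos definition from the first post (the one where rng is created in the outer function); it relies on the fact that captured variables are stored as fields of the closure.

```julia
using Random

# Same definition as in the first post: x is rebound inside the inner function.
function addclos()
    rng = MersenneTwister(0)
    x = 0
    function ()
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
end

clos = addclos()

# Captured variables are stored as fields of the closure object.
for name in fieldnames(typeof(clos))
    println(name, " :: ", fieldtype(typeof(clos), name))
end

# x is wrapped in a Core.Box, whose `contents` field is untyped (Any) ...
@show clos.x isa Core.Box
@show fieldtype(Core.Box, :contents)
# ... while rng, which is never reassigned, keeps its concrete type.
@show clos.rng isa MersenneTwister
```

Because the contents of a Core.Box are untyped, every read of x inside the closure is inferred as Any, which is where the extra allocations come from.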

Filippos Christou (Jun 06 2022 at 14:50):

> Here you see lots of Any and Core.Box

which is weird, because the closure definition looks to me as type stable as it can get.

I've heard before of a similar issue https://github.com/JuliaLang/julia/issues/15276, which maybe is to blame (?)

But on the other hand, if we hit this issue with such simple code, it seems impossible (at least for non-expert users) to write type-stable closures.

Chad Scherrer (Jun 06 2022 at 14:56):

If you want to use a closure, I'd probably do

function addclos2()
    function f(rng)
        x = 0
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end

    f(MersenneTwister(0))
end


julia> @code_warntype addclos2()
MethodInstance for addclos2()
  from addclos2() in Main at REPL[17]:1
Arguments
  #self#::Core.Const(addclos2)
Locals
  f::var"#f#11"
Body::Int64
1       (f = %new(Main.:(var"#f#11")))
   %2 = Main.MersenneTwister(0)::MersenneTwister
   %3 = (f)(%2)::Int64
└──      return %3

Filippos Christou (Jun 06 2022 at 15:04):

Technically it is a closure, but this way you cannot pass the closure around (e.g. return it to the caller) or have it hold internal state, the way addclos holds a state x that is implicitly accessible from outside the function.

Chad Scherrer (Jun 06 2022 at 15:05):

I think this is the right way to think about it. First figure out what semantics you need, then look for an efficient way to do that.

Mason Protter (Jun 06 2022 at 15:30):

The problem here is that you’re benchmarking incorrectly. When you write

clos = addclos()
@btime clos()

Then clos is a non-constant global variable.

Mason Protter (Jun 06 2022 at 15:31):

In this case, you should write

@btime $clos()

Or

let rclos = Ref(clos)
    @btime $rclos[]()
end
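
To illustrate why the non-constant global matters, here is a minimal sketch using only Base. The names f_global, f_const, call_global and call_const are made up for this illustration; the point is only that a call through an untyped, non-constant global cannot be inferred, while a call through a const binding can.

```julia
# Two bindings to equivalent anonymous functions.
f_global = x -> x + 1        # non-constant global: its type may change at any time
const f_const = x -> x + 1   # constant global: its type is fixed

call_global() = f_global(1)
call_const() = f_const(1)

# Inferred return types of the two callers.
@show only(Base.return_types(call_global, Tuple{}))
@show only(Base.return_types(call_const, Tuple{}))
```

Interpolating with $ in @btime has a similar effect to the const binding here: the benchmarked expression no longer goes through an untyped global lookup.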

Chad Scherrer (Jun 06 2022 at 15:32):

Good catch, I missed that!

Mason Protter (Jun 06 2022 at 15:33):

But yeah, it’ll still be slower due to the closure boxing issue

Filippos Christou (Jun 06 2022 at 17:08):

In this example, I think it hardly makes any difference, because the global variable is looked up only once per call.
But thank you for mentioning it. I will keep it in mind as a best practice!

Mason Protter (Jun 06 2022 at 18:22):

In case you didn’t see in the linked issue, you can solve the boxing problem by turning x into a Ref and mutating it instead of rebinding it

Mason Protter (Jun 06 2022 at 18:23):

That is

function addclos()
    rng = MersenneTwister(0)
    x = Ref(0)
    function ()
        for _ in 1:100
            x[] += rand(rng, 1:10)
        end
        x[]
    end
end

Mason Protter (Jun 06 2022 at 18:24):

Basically, any variable you capture in a closure should not be rebound in the closure body; it should instead be mutated.
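
The difference between rebinding and mutating can be checked through the inferred return type. This is a small sketch with hypothetical names (counter_rebind, counter_ref) and a single draw per call instead of the loop of 100, to keep it short.

```julia
using Random

# Rebinding: `x += ...` assigns a new value to the captured variable, so x is boxed.
function counter_rebind()
    rng = MersenneTwister(0)
    x = 0
    return function ()
        x += rand(rng, 1:10)
        return x
    end
end

# Mutating: x is bound once to a Ref; only the Ref's contents change.
function counter_ref()
    rng = MersenneTwister(0)
    x = Ref(0)
    return function ()
        x[] += rand(rng, 1:10)
        return x[]
    end
end

boxed = counter_rebind()
typed = counter_ref()

# Return types as seen by inference.
@show only(Base.return_types(boxed, Tuple{}))
@show only(Base.return_types(typed, Tuple{}))

# Both start from the same seed, so they compute the same value.
@show boxed() == typed()
```

The two closures behave identically from the caller's point of view; only the Ref version gives the compiler a concrete type for the captured state.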

Filippos Christou (Jun 07 2022 at 06:04):

Very cool. It takes around 450 ns, i.e. similar to the pure function case. When this Box issue is fixed, I can imagine it will be a game changer for replacing global variables. For example, one can define different operations that can be applied to the enclosed variable.

@enum operation add_op subtract_op
function process()
    rng = MersenneTwister(0)
    x = Ref(0)
    function (op::operation)
        if op == add_op
            for _ in 1:100
                x[] += rand(rng, 1:10)
            end
        elseif op == subtract_op
            for _ in 1:100
                x[] -= rand(rng, 1:10)
            end
        end
        x[]
    end
end

which is a decent alternative and way faster than globals.
Well, this deserves some "hype" :grinning_face_with_smiling_eyes:
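
For comparison, the same kind of stateful object can be written as an explicit callable struct, which is roughly what a closure becomes internally. This is only a sketch of an alternative, not something proposed in the thread: the Accumulator type and the idea of passing the operator (+ or -) directly instead of an enum value are assumptions made for this illustration.

```julia
using Random

# Hypothetical type: the state a closure would capture is declared as typed fields.
mutable struct Accumulator{R<:AbstractRNG}
    rng::R
    x::Int
end

# Making the struct callable; `op` is expected to be + or -.
function (acc::Accumulator)(op)
    for _ in 1:100
        acc.x = op(acc.x, rand(acc.rng, 1:10))
    end
    return acc.x
end

acc = Accumulator(MersenneTwister(0), 0)

up = acc(+)      # adds 100 random draws to the stored state
down = acc(-)    # subtracts the next 100 draws from the same state

@show up
@show down
@show only(Base.return_types(acc, Tuple{typeof(+)}))
```

Since the field types are declared up front, there is nothing for the compiler to box, at the cost of a bit more boilerplate than the Ref-based closure.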


Last updated: Oct 02 2023 at 04:34 UTC