closure vs function vs global performance · helpdesk (published)

It is said that closures are fast (at least compared with global variables).
I decided to test that with a simple example.

The following functions add a random number to a running total 100 times: once using a closure, once with function parameters, and once with a global variable.
I used a random number generator so that the compiler can't cheat (for example, by precomputing the sum at compile time).

using BenchmarkTools
using Random

function addclos()
    rng = MersenneTwister(0)
    x = 0
    function ()
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
end

function addfun(rng, x)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

gx = 0

function addglob(rng)
    for _ in 1:100
        global gx += rand(rng, 1:10)
    end
    gx
end

@assert addclos()() == addfun(MersenneTwister(0), 0) == addglob(MersenneTwister(0))

gx = 0

clos = addclos()
@btime clos()
# 2.597 μs (100 allocations: 1.56 KiB)

@btime addfun(MersenneTwister(0),0)
# 14.521 μs (13 allocations: 36.45 KiB)

@btime addglob(MersenneTwister(0))
# 16.939 μs (113 allocations: 38.02 KiB)

The closure scores 2.597 μs (100 allocations: 1.56 KiB)
Passing parameters to a function scores 14.521 μs (13 allocations: 36.45 KiB)
Using a global variable scores 16.939 μs (113 allocations: 38.02 KiB)

So, with this very simplistic example, it looks like closures are the way to go.
Although I find it weird that they even outperform the traditional function style. :thinking:
It would be nice if someone could explain why.

Lastly, if you have any comments or suggestions about closures, best practices, or recommended use cases, please let us know!

Chad Scherrer (Jun 06 2022 at 14:13):

@btime addfun($(MersenneTwister(0)),0)

Chad Scherrer (Jun 06 2022 at 14:14):

Oh wait, maybe that's wrong. The addclos measurement should also include the RNG creation cost.

Chad Scherrer (Jun 06 2022 at 14:17):

function addfun(x)
    rng = MersenneTwister(0)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

Otherwise the cost of creating the RNG is confounded with the difference in function-call structure.

Filippos Christou (Jun 06 2022 at 14:24):

@btime clos() doesn't include the RNG initialization, so I guess benchmarking @btime addfun($(MersenneTwister(0)),0) should be fine, since the interpolated RNG is created before benchmarking starts?

clos = addclos()
@btime clos()
# 2.500 μs (100 allocations: 1.56 KiB)

@btime addfun($(MersenneTwister(0)),0)
# 426.075 ns (0 allocations: 0 bytes)

@btime addglob($(MersenneTwister(0)))
# 2.630 μs (100 allocations: 1.56 KiB)

The results are also more credible: they suggest that using a closure is slightly faster than using a global, but still much slower than passing function parameters.
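In addglob, gx is an untyped, non-constant global, so the compiler cannot assume its type inside the loop. A common Julia workaround, which was not benchmarked in this thread, is to make the global binding const and keep the changing value inside a Ref. This is a minimal sketch under that assumption; gref and addglob_ref are hypothetical names:

```julia
using Random

# Hypothetical names. The binding `gref` is constant, so the compiler knows it is a
# Base.RefValue{Int}; only the contents of the Ref change between calls.
const gref = Ref(0)

function addglob_ref(rng)
    for _ in 1:100
        gref[] += rand(rng, 1:10)  # mutate the Ref's contents instead of rebinding a global
    end
    gref[]
end

gref[] = 0
result = addglob_ref(MersenneTwister(0))

# With a constant binding, inference can determine a concrete return type
rt = Base.return_types(addglob_ref, (MersenneTwister,))
```

The state is still global and shared, so this only addresses type stability, not the design concerns of using globals.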

Filippos Christou (Jun 06 2022 at 14:28):

Putting the RNG initialization inside the benchmarked functions gives similar results:

using BenchmarkTools
using Random

function addclos()
    x = 0
    function ()
        rng = MersenneTwister(0)
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
end

function addfun(x)
    rng = MersenneTwister(0)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

gx = 0

function addglob()
    rng = MersenneTwister(0)
    for _ in 1:100
        global gx += rand(rng, 1:10)
    end
    gx
end

@assert addclos()() == addfun(0) == addglob()

gx = 0

clos = addclos()
@btime clos()
# 16.564 μs (113 allocations: 38.02 KiB)

@btime addfun(0)
# 14.517 μs (13 allocations: 36.45 KiB)

@btime addglob()
# 16.765 μs (113 allocations: 38.02 KiB)

Filippos Christou (Jun 06 2022 at 14:30):

which suggests that there is only a minor win from using closures. So, what's the hype? :stuck_out_tongue:

Chad Scherrer (Jun 06 2022 at 14:39):

I don't know of any "closures are better than pure functions" hype - I've never heard that claim before.

Filippos Christou (Jun 06 2022 at 14:42):

I can't rule out that this hype was a misconception that existed only in my head. xD

Chad Scherrer (Jun 06 2022 at 14:42):

To see why these are slower than you might expect, you should try @code_warntype:

julia> @code_warntype addclos()
MethodInstance for addclos()
  from addclos() in Main at REPL[6]:1
Arguments
  #self#::Core.Const(addclos)
Locals
  #3::var"#3#4"
  x::Core.Box
Body::var"#3#4"
1 ─     (x = Core.Box())
│       Core.setfield!(x, :contents, 0)
│       (#3 = %new(Main.:(var"#3#4"), x))
└──     return #3

julia> @code_warntype addfun(0)
MethodInstance for addfun(::Int64)
  from addfun(x) in Main at REPL[8]:1
Arguments
  #self#::Core.Const(addfun)
  x@_2::Int64
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  rng::Any
  x@_5::Int64
Body::Int64
1 ─       (x@_5 = x@_2)
│         (rng = Main.MersenneTwister(0))
│   %3  = (1:100)::Core.Const(1:100)
│         (@_3 = Base.iterate(%3))
│   %5  = (@_3::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %6  = Base.not_int(%5)::Core.Const(true)
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Int64, Int64}
│         Core.getfield(%8, 1)
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = x@_5::Int64
│   %12 = rng::Any
│   %13 = (1:10)::Core.Const(1:10)
│   %14 = Main.rand(%12, %13)::Int64
│         (x@_5 = %11 + %14)
│         (@_3 = Base.iterate(%3, %10))
│   %17 = (@_3 === nothing)::Bool
│   %18 = Base.not_int(%17)::Bool
└──       goto #4 if not %18
3 ─       goto #2
4 ┄       return x@_5
julia> @code_warntype addglob()
MethodInstance for addglob()
  from addglob() in Main at REPL[2]:1
Arguments
  #self#::Core.Const(addglob)
Locals
  @_2::Union{Nothing, Tuple{Int64, Int64}}
  rng::Any
Body::Any
1 ─       (rng = Main.MersenneTwister(0))
│   %2  = (1:100)::Core.Const(1:100)
│         (@_2 = Base.iterate(%2))
│   %4  = (@_2::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %5  = Base.not_int(%4)::Core.Const(true)
└──       goto #4 if not %5
2 ┄ %7  = @_2::Tuple{Int64, Int64}
│         Core.getfield(%7, 1)
│   %9  = Core.getfield(%7, 2)::Int64
│         nothing
│   %11 = rng::Any
│   %12 = (1:10)::Core.Const(1:10)
│   %13 = Main.rand(%11, %12)::Int64
│   %14 = (Main.gx + %13)::Any
│   %15 = Core.get_binding_type(Main, :gx)::Core.Const(Any)
│   %16 = Base.convert(%15, %14)::Any
│   %17 = Core.typeassert(%16, %15)::Any
│         (Main.gx = %17)
│         (@_2 = Base.iterate(%2, %9))
│   %20 = (@_2 === nothing)::Bool
│   %21 = Base.not_int(%20)::Bool
└──       goto #4 if not %21
3 ─       goto #2
4 ┄       return Main.gx

On your screen, problem areas are highlighted in red. Here you see lots of Any and Core.Box.
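The Core.Box can also be spotted without reading the whole @code_warntype listing. In current Julia versions a closure is lowered to a struct whose fields are the captured variables, so a boxed capture shows up as a field of type Core.Box. This is a minimal sketch that relies on that implementation detail; make_boxed and boxed_fields are hypothetical names:

```julia
using Random

# Same shape as the thread's addclos: x is captured and rebound inside the closure
function make_boxed()
    rng = MersenneTwister(0)
    x = 0
    function ()
        for _ in 1:100
            x += rand(rng, 1:10)  # rebinding a captured variable forces a Core.Box
        end
        x
    end
end

# Names of the captured variables that the compiler stored in a Core.Box
boxed_fields(f) = [n for n in fieldnames(typeof(f)) if fieldtype(typeof(f), n) === Core.Box]

clos = make_boxed()
println(boxed_fields(clos))           # only :x is boxed; rng is assigned once and never rebound
println(Base.return_types(clos, ()))  # the Box hides x's type, so the inferred result is Any
```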

Filippos Christou (Jun 06 2022 at 14:50):

which is weird, because the closure definition looks to me as type stable as it can get.

But on the other hand, if we hit this issue with such simple code, it seems impossible (at least for non-expert users) to write type-stable closures.

Chad Scherrer (Jun 06 2022 at 14:56):

function addclos2()
    function f(rng)
        x = 0
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end

    f(MersenneTwister(0))
end


julia> @code_warntype addclos2()
MethodInstance for addclos2()
  from addclos2() in Main at REPL[17]:1
Arguments
  #self#::Core.Const(addclos2)
Locals
  f::var"#f#11"
Body::Int64
1 ─      (f = %new(Main.:(var"#f#11")))
│   %2 = Main.MersenneTwister(0)::MersenneTwister
│   %3 = (f)(%2)::Int64
└──      return %3

Filippos Christou (Jun 06 2022 at 15:04):

technically it is a closure, but this way you cannot pass the closure around (e.g. return it to the caller), nor can it hold internal state (the way addclos holds the state of x, which persists between calls and is implicitly accessible outside the function).

Chad Scherrer (Jun 06 2022 at 15:05):

I think this is the right way to think about it. First figure out what semantics you need, then look for an efficient way to do that.
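Under the hood, a Julia closure is a struct holding the captured variables plus a call method. When the semantics needed are "a function that carries mutable state and can be handed back to the caller", one option not discussed in the thread is to write that struct explicitly. This is a minimal sketch; Adder is a hypothetical name:

```julia
using Random

# Hypothetical callable struct holding the same state as the thread's addclos:
# an RNG and a running total. Both field types are concrete, so nothing is boxed.
mutable struct Adder
    rng::MersenneTwister
    x::Int
end

# Calling an Adder runs the same loop as the closure and updates the stored total
function (a::Adder)()
    for _ in 1:100
        a.x += rand(a.rng, 1:10)
    end
    a.x
end

adder = Adder(MersenneTwister(0), 0)
first_total = adder()   # the total is stored in the object...
second_total = adder()  # ...so the second call continues from the first one
```

Like the closure returned by addclos, the object can be returned, stored, or passed around, while its field types stay visible to the compiler.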

Mason Protter (Jun 06 2022 at 15:30):

clos = addclos()
@btime clos()

Mason Protter (Jun 06 2022 at 15:31):

let rclos = Ref(clos)
    @btime $rclos[]()
end

Chad Scherrer (Jun 06 2022 at 15:32):

Mason Protter (Jun 06 2022 at 15:33):

Filippos Christou (Jun 06 2022 at 17:08):

in this example, I think it hardly makes any difference, because the global variable is evaluated only once.
But thank you for mentioning it. I will keep it in mind as a best practice for the future!

Mason Protter (Jun 06 2022 at 18:22):

In case you didn’t see it in the linked issue: you can solve the boxing problem by turning x into a Ref and mutating it instead of rebinding it.

Mason Protter (Jun 06 2022 at 18:23):

function addclos()
    rng = MersenneTwister(0)
    x = Ref(0)
    function ()
        for _ in 1:100
            x[] += rand(rng, 1:10)
        end
        x[]
    end
end

Mason Protter (Jun 06 2022 at 18:24):

Basically, any variable you capture in a closure should not be rebound in the closure body, it should instead be mutated.
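The rule can be checked directly: a Ref that is created once and only mutated (x[] += ...) needs no Box, while rebinding a captured variable does. This is a minimal sketch that inspects the closure's field types (an implementation detail of current Julia versions); make_ref, the two counter factories, and has_box are hypothetical names:

```julia
using Random

# Mason's fix: x is bound once to a Ref, and only its contents are mutated
function make_ref()
    rng = MersenneTwister(0)
    x = Ref(0)
    function ()
        for _ in 1:100
            x[] += rand(rng, 1:10)
        end
        x[]
    end
end

# Smallest counter-example: the captured variable is rebound inside the closure
function make_counter_rebound()
    count = 0
    () -> (count += 1)
end

# Same counter, but mutating a Ref instead of rebinding
function make_counter_ref()
    count = Ref(0)
    () -> (count[] += 1)
end

# true if any captured variable was stored in a Core.Box
has_box(f) = any(T -> T === Core.Box, fieldtypes(typeof(f)))

r = make_ref()
c_rebound = make_counter_rebound()
c_ref = make_counter_ref()
```

Both counters behave the same, but only the Ref version keeps its result type (Int) visible to inference.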

Filippos Christou (Jun 07 2022 at 06:04):

very cool. It takes around 450 ns, i.e., similar to the pure-function case. Once this Box issue is fixed, I imagine it will be a game changer for replacing global variables. For example, one can define different operations to apply to the enclosed variable:

@enum operation add_op subtract_op
function process()
    rng = MersenneTwister(0)
    x = Ref(0)
    function (op::operation)
        if op == add_op
            for _ in 1:100
                x[] += rand(rng, 1:10)
            end
        elseif op == subtract_op
            for _ in 1:100
                x[] -= rand(rng, 1:10)
            end
        end
        x[]
    end
end

which is a decent alternative and way faster than globals.
Well, this deserves some "hype" :grinning_face_with_smiling_eyes:

Filippos Christou (Jun 06 2022 at 14:11):