It is said that using a closure is fast (at least in comparison with global variables).
I decided to try that out with a simple example.
The following functions add a number 100 times: once using a closure, once with function parameters, and once with a global variable.
I used a random number generator so that the compiler can't just constant-fold the result.
using BenchmarkTools
using Random
function addclos()
    rng = MersenneTwister(0)
    x = 0
    function ()
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
end

function addfun(rng, x)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

gx = 0
function addglob(rng)
    for _ in 1:100
        global gx += rand(rng, 1:10)
    end
    gx
end
@assert addclos()() == addfun(MersenneTwister(0), 0) == addglob(MersenneTwister(0))
gx = 0
clos = addclos()
@btime clos()
# 2.597 μs (100 allocations: 1.56 KiB)
@btime addfun(MersenneTwister(0),0)
# 14.521 μs (13 allocations: 36.45 KiB)
@btime addglob(MersenneTwister(0))
# 16.939 μs (113 allocations: 38.02 KiB)
The closure scores 2.597 μs (100 allocations: 1.56 KiB).
Passing parameters to a function scores 14.521 μs (13 allocations: 36.45 KiB).
Using a global variable scores 16.939 μs (113 allocations: 38.02 KiB).
So, with this very simplistic example it looks like using closures is the way to go.
Although I find it weird that they even outperform the traditional function style. :thinking:
It would be nice if someone could explain why.
Lastly, if you have any comments or suggestions regarding closures, best practices, and recommended use cases, please let me know!
Run on Julia stable 1.7.3.
For this to be a fair comparison, I think you need
@btime addfun($(MersenneTwister(0)),0)
Oh wait, maybe that's wrong. addclos
should also include the RNG creation cost
But I think I'd still do
function addfun(x)
    rng = MersenneTwister(0)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end
Otherwise the RNG difference is confounded with the function call structure difference
@btime clos()
doesn't include the RNG initialization, so I guess interpolating with @btime addfun($(MersenneTwister(0)), 0)
should be fine, since the RNG construction already happens before the benchmark runs?
e.g. by doing
clos = addclos()
@btime clos()
# 2.500 μs (100 allocations: 1.56 KiB)
@btime addfun($(MersenneTwister(0)),0)
# 426.075 ns (0 allocations: 0 bytes)
@btime addglob($(MersenneTwister(0)))
# 2.630 μs (100 allocations: 1.56 KiB)
The results are also more credible, actually suggesting that using a closure is somewhat faster than global but still way slower than function params.
Putting the RNG initialization inside the benchmarked functions gives similar results:
using BenchmarkTools
using Random
function addclos()
    x = 0
    function ()
        rng = MersenneTwister(0)
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
end

function addfun(x)
    rng = MersenneTwister(0)
    for _ in 1:100
        x += rand(rng, 1:10)
    end
    x
end

gx = 0
function addglob()
    rng = MersenneTwister(0)
    for _ in 1:100
        global gx += rand(rng, 1:10)
    end
    gx
end
@assert addclos()() == addfun(0) == addglob()
gx = 0
clos = addclos()
@btime clos()
# 16.564 μs (113 allocations: 38.02 KiB)
@btime addfun(0)
# 14.517 μs (13 allocations: 36.45 KiB)
@btime addglob()
# 16.765 μs (113 allocations: 38.02 KiB)
which suggests that there is only a minor win by using closures. So, what's the hype ? :stuck_out_tongue:
I don't know of any "closures are better than pure functions" hype - I've never heard that claim before.
I do not exclude that this hype may be a misconception I had only inside my head. xD
To see why these are slower than you might expect, you should try @code_warntype:
julia> @code_warntype addclos()
MethodInstance for addclos()
from addclos() in Main at REPL[6]:1
Arguments
#self#::Core.Const(addclos)
Locals
#3::var"#3#4"
x::Core.Box
Body::var"#3#4"
1 ─ (x = Core.Box())
│ Core.setfield!(x, :contents, 0)
│ (#3 = %new(Main.:(var"#3#4"), x))
└── return #3
julia> @code_warntype addfun(0)
MethodInstance for addfun(::Int64)
from addfun(x) in Main at REPL[8]:1
Arguments
#self#::Core.Const(addfun)
x@_2::Int64
Locals
@_3::Union{Nothing, Tuple{Int64, Int64}}
rng::Any
x@_5::Int64
Body::Int64
1 ─ (x@_5 = x@_2)
│ (rng = Main.MersenneTwister(0))
│ %3 = (1:100)::Core.Const(1:100)
│ (@_3 = Base.iterate(%3))
│ %5 = (@_3::Core.Const((1, 1)) === nothing)::Core.Const(false)
│ %6 = Base.not_int(%5)::Core.Const(true)
└── goto #4 if not %6
2 ┄ %8 = @_3::Tuple{Int64, Int64}
│ Core.getfield(%8, 1)
│ %10 = Core.getfield(%8, 2)::Int64
│ %11 = x@_5::Int64
│ %12 = rng::Any
│ %13 = (1:10)::Core.Const(1:10)
│ %14 = Main.rand(%12, %13)::Int64
│ (x@_5 = %11 + %14)
│ (@_3 = Base.iterate(%3, %10))
│ %17 = (@_3 === nothing)::Bool
│ %18 = Base.not_int(%17)::Bool
└── goto #4 if not %18
3 ─ goto #2
4 ┄ return x@_5
julia> @code_warntype addglob()
MethodInstance for addglob()
from addglob() in Main at REPL[2]:1
Arguments
#self#::Core.Const(addglob)
Locals
@_2::Union{Nothing, Tuple{Int64, Int64}}
rng::Any
Body::Any
1 ─ (rng = Main.MersenneTwister(0))
│ %2 = (1:100)::Core.Const(1:100)
│ (@_2 = Base.iterate(%2))
│ %4 = (@_2::Core.Const((1, 1)) === nothing)::Core.Const(false)
│ %5 = Base.not_int(%4)::Core.Const(true)
└── goto #4 if not %5
2 ┄ %7 = @_2::Tuple{Int64, Int64}
│ Core.getfield(%7, 1)
│ %9 = Core.getfield(%7, 2)::Int64
│ nothing
│ %11 = rng::Any
│ %12 = (1:10)::Core.Const(1:10)
│ %13 = Main.rand(%11, %12)::Int64
│ %14 = (Main.gx + %13)::Any
│ %15 = Core.get_binding_type(Main, :gx)::Core.Const(Any)
│ %16 = Base.convert(%15, %14)::Any
│ %17 = Core.typeassert(%16, %15)::Any
│ (Main.gx = %17)
│ (@_2 = Base.iterate(%2, %9))
│ %20 = (@_2 === nothing)::Bool
│ %21 = Base.not_int(%20)::Bool
└── goto #4 if not %21
3 ─ goto #2
4 ┄ return Main.gx
On your screen, problem areas will show in red. Here you see lots of Any and Core.Box.
| Here you see lots of Any and Core.Box
which is weird, because the closure definition looks to me as type-stable as it can get.
I've heard of a similar issue before, https://github.com/JuliaLang/julia/issues/15276, which may be to blame(?)
But on the other hand, if we hit this issue with such simple code, it seems impossible (at least for non-expert users) to write type-stable closures.
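Reading that issue, the usual workaround is to re-bind captured variables in a let block so the closure captures a fresh binding that is never reassigned. A sketch of my attempt (note this changes the semantics: the real fix is keeping the accumulator x local to each call instead of persistent closure state, and the let re-binding of rng is the general pattern from the issue for guaranteeing an un-boxed capture):

```julia
using Random

function addclos_let()
    rng = MersenneTwister(0)
    let rng = rng              # fresh binding, never reassigned => not boxed
        function ()
            x = 0              # local accumulator: nothing is rebound across
            for _ in 1:100     # the closure boundary, so nothing gets boxed
                x += rand(rng, 1:10)
            end
            x
        end
    end
end
```

Since the captured rng now has a concrete type, @code_warntype on the inner function should show no Core.Box fields.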
If you want to use a closure, I'd probably do
function addclos2()
    function f(rng)
        x = 0
        for _ in 1:100
            x += rand(rng, 1:10)
        end
        x
    end
    f(MersenneTwister(0))
end
julia> @code_warntype addclos2()
MethodInstance for addclos2()
from addclos2() in Main at REPL[17]:1
Arguments
#self#::Core.Const(addclos2)
Locals
f::var"#f#11"
Body::Int64
1 ─ (f = %new(Main.:(var"#f#11")))
│ %2 = Main.MersenneTwister(0)::MersenneTwister
│ %3 = (f)(%2)::Int64
└── return %3
Technically it is a closure, but this way you cannot pass the inner function around, e.g. to the caller, or have it hold internal state (the way addclos
holds a state x
that is implicitly accessible outside the function).
I think this is the right way to think about it. First figure out what semantics you need, then look for an efficient way to do that.
The problem here is that you're benchmarking incorrectly. When you write
clos = addclos()
@btime clos()
then clos is a non-constant global variable.
In this case, you should write
@btime $clos()
or
let rclos = Ref(clos)
    @btime $rclos[]()
end
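Another option (my addition, and standard BenchmarkTools advice) is to make the global binding const: a const global has a known concrete type, so no $-interpolation is needed at the call site. A minimal sketch with a hypothetical make_clos, using a Ref so the capture isn't boxed:

```julia
# make_clos is a toy stateful closure for illustration only
make_clos() = let r = Ref(0)
    () -> (r[] += 1)    # mutates the captured Ref, never rebinds it
end

const cclos = make_clos()   # const global: type is inferable at the call site
cclos()                     # with BenchmarkTools loaded, `@btime cclos()`
                            # would then need no interpolation
```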
Good catch, I missed that!
But yeah, it’ll still be slower due to the closure boxing issue
In this example, I think it hardly makes any difference, because the global variable is evaluated only once per call.
But thank you for mentioning it; I will keep it in mind for future best practice!
In case you didn’t see in the linked issue, you can solve the boxing problem by turning x
into a Ref
and mutating it instead of rebinding it
That is
function addclos()
    rng = MersenneTwister(0)
    x = Ref(0)
    function ()
        for _ in 1:100
            x[] += rand(rng, 1:10)
        end
        x[]
    end
end
Basically, any variable you capture in a closure should not be rebound in the closure body; it should instead be mutated.
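As a minimal illustration (my own toy counters, not the benchmarks above), you can inspect the closure's field types to see the difference between rebinding and mutating a captured variable:

```julia
# Rebinding a captured variable forces the compiler to box it:
function counter_rebind()
    n = 0
    () -> (n += 1)        # `n` is reassigned => captured as Core.Box
end

# Mutating a captured container keeps the capture concretely typed:
function counter_mutate()
    n = Ref(0)
    () -> (n[] += 1)      # only `n[]` changes; `n` itself is never rebound
end

fieldtype(typeof(counter_rebind()), :n)   # Core.Box
fieldtype(typeof(counter_mutate()), :n)   # Base.RefValue{Int64}
```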
Very cool. It takes around 450 ns, i.e., similar to the pure-function case. When this Box issue is fixed, I can imagine it will be a game changer as a substitute for global variables. For example, one can define different operations that act on the enclosed variable.
@enum operation add_op subtract_op

function process()
    rng = MersenneTwister(0)
    x = Ref(0)
    function (op::operation)
        if op == add_op
            for _ in 1:100
                x[] += rand(rng, 1:10)
            end
        elseif op == subtract_op
            for _ in 1:100
                x[] -= rand(rng, 1:10)
            end
        end
        x[]
    end
end
which is a decent alternative and way faster than globals.
Well, this deserves some "hype" :grinning_face_with_smiling_eyes:
Last updated: Nov 06 2024 at 04:40 UTC