There's a particular pattern I use frequently, but I've realized that it creates some type instability (due to a closure, IIUC). That pattern is to declare some helper variables, and then map
over a range/collection and return a vector of a custom struct.
Here's a MWE:
struct D{A,B}
a::A
b::B
end
function foo(n)
x = 0.
map(1:n) do i
x += exp(i)
D(x,i)
end
end
I could do something like the following:
function foo2(n)
x = 0.
output = Vector{D{Float64,Int64}}(undef,n)
for i in 1:n
x += exp(i)
output[i] = D(x,i)
end
output
end
But:
map
approach is a lot cleaner.D
can be a pretty complex parametric type and pre-allocating the right concrete output type can be difficult manage.Are there any other patterns or ways to avoid the type instability that don't require a lot of forethought about pre-allocating the right output container?
(My MWE is a simplified version of the (d::DeferredAnnuity)(curve)
function from my new blogpost here.)
Taking your question at face value, the reason for the type instability is that x
is defined in one function but mutated in another (in the closure). The compiler then is not able to infer the type of x
. Some ways to fix that:
declare the type of x
, like x::Float64 = 0.
make x
something like a Ref{Float64}
, or Box{Float64}
(from ZeroDimensionalArrays.jl)
However I think there may be other problems with your code.
The Transducers.jl way would be to write
using Transducers
foo(n) = 1:n |>
Scan((0, nothing)) do (x, _), i
x′ = x + exp(i)
x′, D(x′,i)
end |> Map(last) |> collect
Or if we're allowed to use the fact that x
is stored in the previous iteration's .a
field, you could write it with Base.accumulate
:
foo(n) = accumulate(1:n; init=D(0.0, 0)) do (; a,), i
x = a + exp(i)
D(x,i)
end
I only brought in Transducers.jl for the composition of Map
with Scan
. I guess you could also do it with Iterators.accumulate
/ Iterators.map
.
Whenever I write something clever like this with accumulate, I'm proud of myself, later horrified when I have to read it...
Ref
etc got mentioned, but someone ought to say that technically map
claims not to guarantee order.
Another possible pattern is just to push!
repeatedly:
julia> function foo_ref(n)
x = Ref(0.0)
map(1:n) do i
x[] += exp(i)
D(x[],i)
end
end
foo_ref (generic function with 1 method)
julia> function foo_1(n)
i = 1
x = exp(i)
out = sizehint!([D(x,i)], n) # sizehint! optional
for i in 2:n
x += exp(i)
push!(out, D(x, i))
end
out
end
foo_1 (generic function with 1 method)
Of course it's a bit awful to have to write the loop body twice. Sometimes I wish we had a macro @pushalloc for i in 1:n
which would expand to something like this -- separate the first iteration, find every push!
and allocate space using the first time's types. But I never wrote one.
Of course it's a bit awful to have to write the loop body twice. Sometimes I wish we had a macro
@pushalloc for i in 1:n
which would expand to something like this -- separate the first iteration, find everypush!
and allocate space using the first time's types. But I never wrote one.
What I use for stuff like this is an @unroll
macro:
macro unroll(N::Int, loop)
Base.isexpr(loop, :for) || error("only works on for loops")
Base.isexpr(loop.args[1], :(=)) || error("This loop pattern isn't supported")
val, itr = esc.(loop.args[1].args)
body = esc(loop.args[2])
@gensym loopend
label = :(@label $loopend)
goto = :(@goto $loopend)
out = Expr(:block, :(itr = $itr), :(next = iterate(itr)))
unrolled = map(1:N) do _
quote
isnothing(next) && @goto loopend
$val, state = next
$body
next = iterate(itr, state)
end
end
append!(out.args, unrolled)
remainder = quote
while !isnothing(next)
$val, state = next
$body
next = iterate(itr, state)
end
@label loopend
end
push!(out.args, remainder)
out
end
Combined with BangBang / MicroCollections, you can avoid type instability and still be generic:
julia> using BangBang, MicroCollections
julia> function foo_bang(n)
out = UndefVector{Union{}}(n)
x = 0.0
@unroll 2 for i ∈ 1:n
x += exp(i)
out = setindex!!(out, D(x, i), i)
end
out
end;
julia> @btime foo_bang(4)
11.934 ns (2 allocations: 128 bytes)
4-element Vector{D{Float64, Int64}}:
D{Float64, Int64}(2.718281828459045, 1)
D{Float64, Int64}(10.107337927389695, 2)
D{Float64, Int64}(30.19287485057736, 3)
D{Float64, Int64}(84.7910248837216, 4)
Had a go at writing the dumb one I imagined:
julia> function foo_mac(n)
x = 0.0
@pusher out = for i in 1:n
x += exp(i)
push!(out, D(x, i))
end
end
foo_mac (generic function with 1 method)
julia> foo_mac(3)
3-element Vector{D{Float64, Int64}}:
D{Float64, Int64}(2.718281828459045, 1)
D{Float64, Int64}(10.107337927389695, 2)
D{Float64, Int64}(30.19287485057736, 3)
defined:
"""
@pusher out = for i in ...
Looks for `push!(out, x)` in the body of a given `for` loop,
and replaces the first iteration with `out = [x]` to create the vector.
Not very smart, may be fooled by nested loops, and by `push!(out, x, y)`.
"""
macro pusher(ex)
Meta.isexpr(ex, :(=)) || error("expects @pusher out = for i in ...")
out, loop = ex.args
out isa Symbol || error("expects @pusher out = for i in ...")
Meta.isexpr(loop, :for) || error("expects @pusher out = for i in ...")
top, body = loop.args
Meta.isexpr(top, :(=)) || error("expects @pusher out = for i in ...")
index, iter = top.args
@gensym rest
firstbody = deepcopy(body)
_firstbody!(firstbody, out)
quote
$index, $rest = $Iterators.peel($iter)
$firstbody
for $index in $rest
$body
end
$out
end |> esc
end
function _firstbody!(ex, out::Symbol)
if Meta.isexpr(ex, :call) && ex.args[1] == :push! && ex.args[2] == out
rhs = ex.args[3]
ex.head = :(=) # apparently you can mutate this
empty!(ex.args)
push!(ex.args, out, Expr(:vect, rhs))
elseif ex isa Expr
foreach(x -> _firstbody!(x, out), ex.args)
end
end
Yours is more clever about allocating exactly the right amount of space, thanks to UndefVector{Union{}}(n)
I guess:
julia> @btime foo_bang(100); # same as @btime foo2(100); with Vector{}(undef, ...)
415.415 ns (2 allocations: 1.62 KiB)
julia> @btime foo_mac(100); # repeated simple push!
636.364 ns (6 allocations: 3.98 KiB)
julia> @btime foo_1(100); # sizehint! then push!... could be added to @pusher macro
487.031 ns (3 allocations: 1.67 KiB)
julia> @btime foo(100); # original map with Core.Box
19.417 μs (307 allocations: 8.00 KiB)
I get the same performance if I delete the macro in foo_bang(n)
, i.e. just call out = setindex!!(out, D(x, i), i)
in the plain loop without unrolling.
that probably works out due to small union optimizations and branch prediction
the reason for the unroll is to make sure the loop body is able to know the concrete type of out
I guess that's right. Both versions infer to return out::Union{Vector{D{Float64, Int64}}, UndefVector{...}}
but inside the loop the unrolled one should be more certain.
julia> foo(0)
Any[]
julia> foo_bang(0)
0-element UndefVector{Union{}}(0)
julia> foo_mac(0)
ERROR: MethodError: no method matching iterate(::Nothing)
yeah. Might want a length check to eliminate the 0-arg path and make the final return concrete.
Y'all are amazingly helpful as always! I updated the blog post with the more advanced unrolling version which was 3x faster on a large AD calculation after making the function type stable.
Last updated: Jul 01 2025 at 04:54 UTC