Is there a quick and dirty way to see where compilation/inference time is being spent when running a particular script? I feel like I've done this before, but I can't remember how. Is the answer just using SnoopCompile?
I didn't realize y'all had created #helpdesk (published) in addition to this stream since I last visited zulip. Could a mod please move this message to that stream?
This topic was moved here from #helpdesk > Compilation/Inference time by Mason Protter
Welcome back @Adam non-jedi Beckmeyer! Yeah, unfortunately this is kinda a hard thing to profile right now. SnoopCompile.jl has the best tools for this as far as I'm aware
It won't necessarily help you find which functions the compiler spends the most time on, but it'll definitely help identify things like invalidations, which lead to excessive re-compilation
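Roughly, the usual starting point looks something like this (just a sketch, not exact — the macro names have shifted between SnoopCompile versions, and "myscript.jl" / SomePackage are placeholders):

using SnoopCompileCore
tinf = @snoopi_deep include("myscript.jl")  # collect inference timings while the script runs

using SnoopCompile
flatten(tinf)        # per-method inference timings, sorted by time
# flamegraph(tinf)   # or build a flame graph of the inference tree (view with ProfileView)

# and for hunting invalidations:
invs = @snoopr using SomePackage
invalidation_trees(invs)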
Thanks. Guess I'll have to dig back into how SnoopCompile works. This script has just a handful of new functions (none are methods of existing functions, so no invalidations afaik), so it's very confusing that it's spending over 0.75 seconds on compilation/inference (compared to a 5.5 second total runtime).
Hmm, interesting. Are you loading packages? It could be that you are loading packages which invalidate Base methods which end up in the call chain of your code.
Nope. But I think I found the culprit; just a sec
A big chunk of the time wasn't actually compilation but evaluation of code I was using to define global consts (please don't judge the code below):
julia> @time begin
           REVMASKS = Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i=1:16)
           SHIFTMASKS = Tuple(Tuple(insert!(VecElement.(1%Int8:15%Int8), i, VecElement(0%Int8))) for i=1:16)
       end;
  0.273808 seconds (760.64 k allocations: 46.730 MiB, 1.84% gc time, 100.70% compilation time)
Replacing that with the equivalent using nested Base.Cartesian.@ntuple fixed things right up.
Moral of the story: the difference between /usr/bin/time and @btime shouldn't immediately be blamed on compilation/inference time.
Always fun to see "compilation time" > 100%
Code like Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i=1:16) defines some complicated closures, so I wouldn't be surprised if it takes time to compile it. I wonder if sandwiching it with @nospecialize and @specialize helps. Another approach is to work with Any[].
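E.g. something along these lines for the Any[] idea (a hypothetical, untested rewrite; same thing for SHIFTMASKS):

# build into an untyped vector so no complicated generator/closure type has to be inferred,
# then convert to a tuple at the end
revmasks = Any[]
for i in 1:16
    push!(revmasks, Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)))
end
REVMASKS = Tuple(revmasks)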
Second time running it only gets you from 0.27 down to 0.17, so it's slow regardless of compilation fwiw.
I think it's more the dynamism in creating the tuples than compilation per se that's the problem. I.e. dynamically using Val types will be slow regardless of whether the code has been compiled already or not
Out of curiosity, how much faster was the @ntuple solution?
Quite a bit...
julia> using Base.Cartesian

julia> const V = VecElement
VecElement

julia> @time begin
           REVMASKS = @ntuple 16 i->
               @ntuple 16 k-> V{Int8}(k <= i ? i - k : k - 1)
           SHIFTMASKS = @ntuple 16 i->
               @ntuple 16 k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1)
       end;
  0.000001 seconds
Adam non-jedi Beckmeyer said:
Second time running it only gets you from 0.27 down to 0.17, so it's slow regardless of compilation fwiw.
how do you run it a "second time"? I think evaluating GLOBAL = $RHS twice creates a different closure type each time if $RHS needs one, so the inference of the closure is always done from scratch (though the inference for functions called inside is cached). You need to write f() = $RHS; GLOBAL = f() if you want to exclude the compilation.
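i.e. something like this (make_revmasks is just a placeholder name):

make_revmasks() = Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i=1:16)

@time REVMASKS = make_revmasks()  # first call pays for inferring/compiling the closure
@time REVMASKS = make_revmasks()  # second call reuses the same closure type, so compilation is excluded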
Takafumi Arakaki (tkf) said:
I think evaluating GLOBAL = $RHS twice creates a different closure type each time if $RHS needs one, so the inference of the closure is always done from scratch
Ah. I didn't think about the fact that the closure the second time around wouldn't have the same type; good point.
(On a related note, julia is about to be faster than both C and Rust on one of the benchmarks from the benchmarks game...)
Btw, if you don't want to do an explicit using Base.Cartesian, I believe that doing
@time begin
    REVMASKS = ntuple(Val(16)) do i
        ntuple(k -> V{Int8}(k <= i ? i - k : k - 1), Val(16))
    end
    SHIFTMASKS = ntuple(Val(16)) do i
        ntuple(k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1), Val(16))
    end
end;
should be equivalent
Slower than the version with Tuple actually
julia> const V = VecElement
VecElement

julia> @time begin
           REVMASKS = ntuple(Val(16)) do i
               ntuple(k -> V{Int8}(k <= i ? i - k : k - 1), Val(16))
           end
           SHIFTMASKS = ntuple(Val(16)) do i
               ntuple(k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1), Val(16))
           end
       end;
  0.303107 seconds (1.38 M allocations: 91.848 MiB, 3.30% gc time, 100.96% compilation time)
It has the same problem as the original version where it's creating closures. @ntuple doesn't actually treat the anonymous functions it operates on as functions, appearances to the contrary.
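You can check that with @macroexpand (a quick sketch; the expansion is just a tuple literal with i substituted by the constants 1..N, so no closure ever gets compiled):

using Base.Cartesian
@macroexpand @ntuple 3 i -> 2i  # expands to a literal 3-tuple expression, not a call to an anonymous function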
Ah interesting
ntuple(_, Val(n)) is a @generated function so the compiler tries very hard to optimize it
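E.g. (a small illustrative sketch; g is just a made-up function here):

g(x) = ntuple(i -> x + i, Val(4))
@code_typed g(1)  # ntuple's generated body is (f(1), f(2), f(3), f(4)) for the closure f,
                  # so after inlining this should show four additions and no loop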
I don't really like Base.Cartesian but I think it's the best option here
I think we should open an issue to track and hopefully fix why we get >100% compilation times. These are quite nice MWEs
I thought compilation was multi-threaded now, so seeing >100% compilation time was expected behavior?
Precompilation of multiple separate modules can be multithreaded, but not compilation itself (yet)
Ah. hrm
I think the >100% is due to some nesting of the timing blocks in src
Mason Protter said:
Precompilation of multiple separate modules can be multithreaded, but not compilation itself (yet)
Kinda a nitpick, but parallelization of Pkg.precompile is process-based, so it's not "threaded" in the sense of Base.Threads.
Oh I didn't realize it used processes, that makes sense, thanks