Is there a quick and dirty way to see where compilation/inference time is being spent when running a particular script? I feel like I've done this before, but I can't remember how. Is the answer just using SnoopCompile?
I didn't realize y'all had created #helpdesk (published) in addition to this stream since I last visited zulip. Could a mod please move this message to that stream?
This topic was moved here from #helpdesk > Compilation/Inference time by Mason Protter
Welcome back @Adam non-jedi Beckmeyer! Yeah, unfortunately this is kinda a hard thing to profile right now. SnoopCompile.jl has the best tools for this as far as I'm aware
It won't necessarily help you find which functions the compiler spends the most time on, but it'll definitely help identify things like invalidations, which lead to excessive re-compilation
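For reference, a rough sketch of how that usually looks with SnoopCompile (these are the SnoopCompile 2.x names as far as I remember; newer releases may have renamed some of them, and the script/package names below are placeholders):
using SnoopCompileCore
tinf = @snoopi_deep begin
    include("myscript.jl")   # placeholder for whatever workload you want to time
end

using SnoopCompile
flatten(tinf)      # per-method inference timings
flamegraph(tinf)   # can be handed to ProfileView.view for a visual breakdown

# and to check whether loading a package invalidates methods your code relies on:
invalidations = @snoopr using SomePackage   # SomePackage is a stand-in
invalidation_trees(invalidations)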
Thanks. Guess I'll have to dig back into how SnoopCompile works. This script has just a handful of new functions (none are methods of existing functions, so no invalidations afaik), so it's very confusing that it's spending over 0.75 seconds on compilation/inference (compared to a 5.5 second total runtime).
Hmm, interesting. Are you loading packages? It could be that you are loading packages which invalidate Base methods which end up in the call chain of your code.
Nope. But I think I found the culprit; just a sec
A big chunk of the time wasn't actually compilation but evaluation of code I was using to define global consts (please don't judge the code below):
julia> @time begin
REVMASKS = Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i=1:16)
SHIFTMASKS = Tuple(Tuple(insert!(VecElement.(1%Int8:15%Int8), i, VecElement(0%Int8))) for i=1:16)
end;
0.273808 seconds (760.64 k allocations: 46.730 MiB, 1.84% gc time, 100.70% compilation time)
Replacing that with the equivalent using nested Base.Cartesian.@ntuple fixed things right up.
Moral of the story: the difference between /usr/bin/time and @btime shouldn't immediately be blamed on compilation/inference time.
Always fun to see "compilation time" > 100%
Code like Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i=1:16) defines some complicated closures, so I wouldn't be surprised if it takes time to compile. I wonder if sandwiching it with @nospecialize and @specialize helps. Another approach is to work with Any[].
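(A guess at what the Any[] approach could look like for REVMASKS; untested sketch, variable names made up:)
# build the entries into an untyped container with a plain top-level loop, so
# no fresh generator closure has to be inferred, then convert at the end
masks = Any[]
for i in 1:16
    push!(masks, Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)))
end
REVMASKS = Tuple(masks)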
Second time running it only gets you from 0.27 down to 0.17, so it's slow regardless of compilation fwiw.
I think it's more the dynamism in creating the tuples than compilation per se that's the problem. I.e. dynamically using Val types will be slow regardless of whether the code has been compiled already or not
Out of curiosity, how much faster was the @ntuple solution?
Quite a bit...
julia> using Base.Cartesian
julia> const V = VecElement
VecElement
julia> @time begin
REVMASKS = @ntuple 16 i->
@ntuple 16 k-> V{Int8}(k <= i ? i - k : k - 1)
SHIFTMASKS = @ntuple 16 i->
@ntuple 16 k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1)
end;
0.000001 seconds
Adam non-jedi Beckmeyer said:
Second time running it only gets you from 0.27 down to 0.17, so it's slow regardless of compilation fwiw.
How do you run it a "second time"? I think evaluating GLOBAL = $RHS twice creates a different closure type if $RHS needs one, so the inference of the closure is always done from scratch (though the inference for functions called inside is cached). You need to write f() = $RHS; GLOBAL = f() if you want to exclude the compilation.
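(A small sketch of that pattern using the REVMASKS definition from above; the function name make_revmasks is made up:)
# the generator's closure now belongs to make_revmasks, so its type is defined
# once when the method is defined; re-evaluating the assignment reuses the
# cached inference instead of starting from scratch
make_revmasks() = Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i = 1:16)

@time REVMASKS = make_revmasks()   # first call pays the compilation cost
@time REVMASKS = make_revmasks()   # second call should report essentially no compilation time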
Takafumi Arakaki (tkf) said:
I think evaluating GLOBAL = $RHS twice creates a different closure type if $RHS needs one, so the inference of the closure is always done from scratch
Ah. I didn't think about the fact that the closure the second time around wouldn't have the same type; good point.
(On a related note, julia is about to be faster than both C and Rust on one of the benchmarks from the benchmarks game...)
Btw, if you don't want to do an explicit using Base.Cartesian, I believe that doing
@time begin
REVMASKS = ntuple(Val(16)) do i
ntuple(k -> V{Int8}(k <= i ? i - k : k - 1), Val(16))
end
SHIFTMASKS = ntuple(Val(16)) do i
ntuple(k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1), Val(16))
end
end;
should be equivalent
Slower than the version with Tuple actually
julia> const V = VecElement
VecElement
julia> @time begin
REVMASKS = ntuple(Val(16)) do i
ntuple(k -> V{Int8}(k <= i ? i - k : k - 1), Val(16))
end
SHIFTMASKS = ntuple(Val(16)) do i
ntuple(k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1), Val(16))
end
end;
0.303107 seconds (1.38 M allocations: 91.848 MiB, 3.30% gc time, 100.96% compilation time)
It has the same problem as the original version where it's creating closures. @ntuple doesn't actually treat the anonymous functions it operates on as functions, appearances to the contrary.
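(You can see that with @macroexpand; a toy example, and the expansion shown is just roughly what I'd expect:)
using Base.Cartesian

# @ntuple with a literal count splices the body with i replaced by 1, ..., N,
# so no anonymous function object is ever created or compiled
@macroexpand @ntuple 4 i -> i^2
# expands to something like :((1 ^ 2, 2 ^ 2, 3 ^ 2, 4 ^ 2))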
Ah interesting
ntuple(_, Val(n)) is a @generated function so the compiler tries very hard to optimize it
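(A toy example of what that unrolling buys you once the closure has a fixed type inside a function; the name is made up and the @code_typed claim is my expectation, not something I've verified here:)
# inside a function the closure k -> k^2 has a single, known type, so the
# generated ntuple body, essentially (f(1), f(2), ..., f(16)), can be fully
# unrolled and constant-folded by the compiler
squares16() = ntuple(k -> k^2, Val(16))

# @code_typed squares16() should show little more than returning a tuple constant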
I don't really like Base.Cartesian but I think it's the best option here
I think we should open an issue to track and hopefully fix why we get >100% compilation times. These are quite nice MWEs
I thought compilation was multi-threaded now, so seeing >100% compilation time was expected behavior?
Precompilation of multiple separate modules can be multithreaded, but not compilation itself (yet)
Ah. hrm
I think the >100% is due to some nesting of the timing blocks in src
Mason Protter said:
Precompilation of multiple separate modules can be multithreaded, but not compilation itself (yet)
Kinda a nitpick, but the parallelization of Pkg.precompile is process-based, so it's not "threaded" in the sense of Base.Threads.
Oh I didn't realize it used processes, that makes sense, thanks