Stream: helpdesk (published)

Topic: Compilation/Inference time


Adam non-jedi Beckmeyer (Apr 20 2021 at 18:28):

Is there a quick and dirty way to see where compilation/inference time is being spent when running a particular script? I feel like I've done this before, but I can't remember how. Is the answer just using SnoopCompile?

Adam non-jedi Beckmeyer (Apr 20 2021 at 19:09):

I didn't realize y'all had created #helpdesk (published) in addition to this stream since I last visited Zulip. Could a mod please move this message to that stream?

Notification Bot (Apr 20 2021 at 19:13):

This topic was moved here from #helpdesk > Compilation/Inference time by Mason Protter

Mason Protter (Apr 20 2021 at 19:14):

Welcome back @Adam non-jedi Beckmeyer! Yeah, unfortunately this is kind of a hard thing to profile right now. SnoopCompile.jl has the best tools for this as far as I'm aware.
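
For inference specifically, the usual starting point is something like this (a sketch from memory, so double-check the SnoopCompile docs for the current API; "script.jl" stands in for your script):

using SnoopCompile

tinf = @snoopi_deep include("script.jl")  # capture inference timings for everything the script triggers
tinf                                      # summary: total time vs. time spent in inference
flatten(tinf)                             # per-method inference timings, sorted by exclusive time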

Mason Protter (Apr 20 2021 at 19:15):

It won't necessarily help you find which functions the compiler spends the most time on, but it'll definitely help identify things like invalidations, which lead to excessive re-compilation.
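
For the invalidation side, the pattern is roughly this (again a sketch; SomePackage is just a stand-in for whatever you're loading):

using SnoopCompileCore
invs = @snoopr using SomePackage   # record method invalidations triggered while loading
using SnoopCompile
trees = invalidation_trees(invs)   # group them by the method whose definition caused them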

Adam non-jedi Beckmeyer (Apr 20 2021 at 20:32):

Thanks. Guess I'll have to dig back into how SnoopCompile works. This script has just a handful of new functions (none are methods of existing functions, so no invalidations afaik), so it's very confusing that it's spending over 0.75 seconds on compilation/inference (compared to 5.5 seconds total runtime).

Mason Protter (Apr 20 2021 at 21:00):

Hmm, interesting. Are you loading packages? It could be that you're loading packages that invalidate Base methods in the call chain of your code.

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:02):

Nope. But I think I found the culprit; just a sec

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:08):

A big chunk of the time wasn't actually compilation but evaluation of code I was using to define global consts (please don't judge the code below):

julia> @time begin
         REVMASKS = Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i=1:16)
         SHIFTMASKS = Tuple(Tuple(insert!(VecElement.(1%Int8:15%Int8), i, VecElement(0%Int8))) for i=1:16)
       end;
  0.273808 seconds (760.64 k allocations: 46.730 MiB, 1.84% gc time, 100.70% compilation time)

Replacing that with the equivalent using nested Base.Cartesian.@ntuple fixed things right up.

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:09):

Moral of the story: the difference between /usr/bin/time and @btime shouldn't immediately be blamed on compilation/inference time.
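
I.e. the gap between whole-process time and @btime can be one-off top-level evaluation rather than compilation. Something like this (compute() standing in for whatever the script does):

using BenchmarkTools

@time compute()    # one shot: compilation + top-level evaluation + the actual run
@btime compute()   # many warmed-up samples, so neither of those shows up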

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:10):

Always fun to see "compilation time" > 100%

Takafumi Arakaki (tkf) (Apr 20 2021 at 21:25):

Code like Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i=1:16) defines some complicated closures, so I wouldn't be surprised if it takes time to compile. I wonder if sandwiching it with @nospecialize and @specialize helps. Another approach is to work with Any[].
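
By "work with Any[]" I mean something like this (untested): build the table with a plain loop into an untyped vector, so no generator closure is created, and only convert to a Tuple at the end:

masks = Any[]                      # untyped container: nothing for the compiler to specialize on
for i in 1:16
    push!(masks, Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)))
end
REVMASKS = Tuple(masks)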

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:27):

Running it a second time only gets you from 0.27 down to 0.17 seconds, so it's slow regardless of compilation, fwiw.

Mason Protter (Apr 20 2021 at 21:29):

I think it's more the dynamism in creating the tuples than compilation per se that's the problem. I.e. dynamically using Val types will be slow regardless of whether the code has already been compiled or not.
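
E.g. (an illustrative snippet, not from the script):

n = 16                     # a runtime value, not a compile-time constant
ntuple(identity, Val(n))   # Val{16} is only known at runtime, so this call dispatches dynamically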

Mason Protter (Apr 20 2021 at 21:29):

Out of curiosity, how much faster was the @ntuple solution?

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:32):

Quite a bit...

julia> using Base.Cartesian

julia> const V = VecElement
VecElement

julia> @time begin
           REVMASKS = @ntuple 16 i->
               @ntuple 16 k-> V{Int8}(k <= i ? i - k : k - 1)
           SHIFTMASKS = @ntuple 16 i->
               @ntuple 16 k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1)
       end;
  0.000001 seconds

Takafumi Arakaki (tkf) (Apr 20 2021 at 21:33):

Adam non-jedi Beckmeyer said:

Running it a second time only gets you from 0.27 down to 0.17 seconds, so it's slow regardless of compilation, fwiw.

How do you run it a "second time"? I think evaluating GLOBAL = $RHS twice creates a different closure type if $RHS needs one, so the inference of the closure is always done from scratch (though inference for the functions called inside is cached). You need to write f() = $RHS; GLOBAL = f() if you want to exclude the compilation.
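
i.e. something like this (make_masks is just a name I made up):

make_masks() = Tuple(Tuple(reverse!(VecElement.(0%Int8:15%Int8), 1, i)) for i in 1:16)

@time make_masks()   # first call pays inference/compilation once
@time make_masks()   # second call hits the cache, so compilation is excluded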

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:34):

Takafumi Arakaki (tkf) said:

I think evaluating GLOBAL = $RHS twice creates a different closure type if $RHS needs one, so the inference of the closure is always done from scratch

Ah. I didn't think about the fact that the closure the second time around wouldn't have the same type; good point.

Adam non-jedi Beckmeyer (Apr 20 2021 at 21:39):

(On a related note, Julia is about to be faster than both C and Rust on one of the benchmarks from the benchmarks game...)

Mason Protter (Apr 20 2021 at 21:57):

Btw, if you don't want to do an explicit using Base.Cartesian, I believe that doing

@time begin
    REVMASKS = ntuple(Val(16)) do i
        ntuple(k -> V{Int8}(k <= i ? i - k : k - 1), Val(16))
    end
    SHIFTMASKS = ntuple(Val(16)) do i
        ntuple(k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1), Val(16))
    end
end;

should be equivalent

Adam non-jedi Beckmeyer (Apr 20 2021 at 22:28):

Slower than the version with Tuple, actually:

julia> const V = VecElement
VecElement

julia> @time begin
           REVMASKS = ntuple(Val(16)) do i
               ntuple(k -> V{Int8}(k <= i ? i - k : k - 1), Val(16))
           end
           SHIFTMASKS = ntuple(Val(16)) do i
               ntuple(k-> V{Int8}(k == i ? 0 : k < i ? k : k - 1), Val(16))
           end
       end;
  0.303107 seconds (1.38 M allocations: 91.848 MiB, 3.30% gc time, 100.96% compilation time)

Adam non-jedi Beckmeyer (Apr 20 2021 at 22:29):

It has the same problem as the original version where it's creating closures. @ntuple doesn't actually treat the anonymous functions it operates on as functions, appearances to the contrary.
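
You can see that in the expansion; the i -> ... is consumed by the macro and substituted textually, so no closure ever exists (roughly):

julia> using Base.Cartesian

julia> @macroexpand @ntuple 2 i -> i^2
:((1 ^ 2, 2 ^ 2))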

Mason Protter (Apr 20 2021 at 22:30):

Ah interesting

Takafumi Arakaki (tkf) (Apr 20 2021 at 22:52):

ntuple(_, Val(n)) is a @generated function, so the compiler tries very hard to optimize it.

Takafumi Arakaki (tkf) (Apr 20 2021 at 22:53):

I don't really like Base.Cartesian but I think it's the best option here

Ian Butterworth (Apr 22 2021 at 13:17):

I think we should open an issue to track and hopefully fix why we get >100% compilation times. These are quite nice MWEs.

Adam non-jedi Beckmeyer (Apr 22 2021 at 17:49):

I thought compilation was multi-threaded now, so seeing >100% compilation time was expected behavior?

Mason Protter (Apr 22 2021 at 17:57):

Precompilation of multiple separate modules can be multithreaded, but not compilation itself (yet)

Adam non-jedi Beckmeyer (Apr 22 2021 at 17:57):

Ah. hrm

Ian Butterworth (Apr 22 2021 at 22:01):

I think the >100% is due to some nesting of the timing blocks in src

Takafumi Arakaki (tkf) (Apr 22 2021 at 22:39):

Mason Protter said:

Precompilation of multiple separate modules can be multithreaded, but not compilation itself (yet)

Kind of a nitpick, but parallelization of Pkg.precompile is process-based, so it's not "threaded" in the sense of Base.Threads.

Mason Protter (Apr 22 2021 at 22:42):

Oh, I didn't realize it used processes; that makes sense, thanks.

