Stream: helpdesk (published)

Topic: Ensuring const propagation


view this post on Zulip Jack C (Mar 29 2022 at 04:47):

(cross posted from helpdesk)
Does anyone have advice on how to ensure that the compiler will propagate constants into compiled code? I've specifically got a bunch of abstractions operating over NamedTuples, and the generated code is "forgetting" at a crucial point that a given symbol is of a certain value. Looking at the @code_warntype output, everything is fine up until this one method, but inside the method the argument is just a ::Symbol, not a Core.Const.

view this post on Zulip Jack C (Mar 29 2022 at 04:48):

I feel like I am throwing ::Val{symb} in random places without any understanding of whether it will help or not.

view this post on Zulip Mason Protter (Mar 29 2022 at 05:41):

Unfortunately there are no real systematic ways to approach this, other than perhaps the new effect macros that are currently on the master version of the language. This stuff mostly just needs to be taken on a case-by-case basis.

One thing I’d advise: if your function bodies are very large, see if you can break the functionality down into smaller pieces and compose them. Julia’s optimizer is more likely to give up on optimizing a very large function body.
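
For concreteness, the "break it into smaller pieces" advice often takes the shape of a function barrier: hoist work out of a big, possibly type-unstable body into small functions, so each call compiles against concrete argument types. A minimal sketch (the names process and process_all are illustrative, not from this thread):

```julia
# Function-barrier sketch: `data` has element type Any, so code written
# directly inside `process_all` would be type-unstable. Dispatching to a
# small function lets Julia compile a specialized `process` for each
# concrete type it actually encounters.
process(x::Int) = x + 1
process(x::Float64) = 2x

function process_all(data::Vector{Any})
    total = 0.0
    for x in data
        total += process(x)  # one dynamic dispatch; the callee itself is fully optimized
    end
    return total
end

process_all(Any[1, 2.0])  # (1 + 1) + 2 * 2.0 == 6.0
```

The same idea helps even for type-stable code: smaller functions give the optimizer smaller units to reason about, so it is less likely to exhaust its inference budget and give up.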

view this post on Zulip Zachary P Christensen (Mar 29 2022 at 07:40):

You can use Static.jl's StaticSymbol type as a more structured, dispatchable type for this, but Mason gave you the best answer: many small, efficient methods to ensure inference works correctly. The places I've found inference to fail in large function bodies are when there are complicated conditions in if-else branches, or when the condition is computed secondary to a loop.

For example, something like this can be problematic when combined with more complicated stuff

function foo(x::NamedTuple)
    index = 1
    for i in 1:length(x)
        if is_bar(keys(x)[i]) && is_buz(x[i])
            index = i
        end
    end
    x[index]
end

view this post on Zulip Mason Protter (Mar 29 2022 at 17:48):

Jack C said:

I feel like I am throwing ::Val{symb} in random places without any understanding of whether it will help or not.

Generally, what you'd find is that this (and generated functions) can create a 'whack-a-mole' situation if your code is sufficiently complex.

By lifting these values explicitly into the type domain, you can force the compiler to specialize and propagate a specific thing. But in doing so you stress the compiler and cause it to give up earlier on other things it would normally have optimized better, which forces you to use even more type-domain lifting. It all feeds back on itself until either your compile times are horrendous or you find a better way to approach the problem.
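
The Val trick being described can be sketched like this: a plain Symbol argument leaves the field choice to runtime, while Val{s} makes the symbol part of the method's type, so each symbol gets its own fully specialized method (a minimal sketch; get_runtime and get_lifted are hypothetical names):

```julia
# Indexing a NamedTuple by a runtime Symbol: the compiler cannot know
# which field is selected, so the inferred return type is a Union at best.
get_runtime(nt::NamedTuple, s::Symbol) = nt[s]

# Lifting the symbol into the type domain: each Val{s} instantiation gets
# its own specialization, so the field and its type are known at compile time.
get_lifted(nt::NamedTuple, ::Val{s}) where {s} = nt[s]

nt = (a = 1, b = "two")
get_runtime(nt, :b)      # "two", but type-unstable in general
get_lifted(nt, Val(:a))  # 1, fully inferred when Val(:a) is a compile-time constant
```

This is exactly the lever that, when overused, produces the feedback loop described above: every distinct Val instantiation is another method specialization the compiler has to pay for.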

view this post on Zulip Jack C (Mar 29 2022 at 18:11):

Awesome, thank you everyone for the advice!

view this post on Zulip Jack C (Mar 29 2022 at 18:13):

@Chad Scherrer (on the other topic) you said that assume_effects may be able to guarantee constant propagation in the future? How would that work? From reading the 1.9-dev docs, I don't see an obvious connection.

view this post on Zulip Mason Protter (Mar 29 2022 at 18:14):

assume_effects lets the compiler skip proving certain invariants that are needed for it to do optimizations, so it makes those optimizations more likely to happen.

view this post on Zulip chriselrod (Mar 29 2022 at 18:46):

Mason Protter said:

By lifting these values explicitly into the type domain, you can force the compiler to specialize and propagate a specific thing. But in doing so you stress the compiler and cause it to give up earlier on other things it would normally have optimized better, which forces you to use even more type-domain lifting. It all feeds back on itself until either your compile times are horrendous or you find a better way to approach the problem.

Yeah, VectorizationBase.jl and LoopVectorization.jl have this problem, which is why I've been pushing against others following the same path, as they are likely to come to regret it too.
Or, because my on-the-job performance work is predominantly combating latency, I'll likely come to regret it if I don't succeed in convincing other people to write low-latency code.

view this post on Zulip Mason Protter (Mar 29 2022 at 18:50):

To take @Zachary P Christensen's example, observe this:

julia> function foo(x::NamedTuple)
           index = 1
           for i in 1:length(x)
               if is_bar(keys(x)[i]) && is_buz(x[i])
                   index = i
               end
           end
           x[index]
       end
foo (generic function with 1 method)

julia> is_bar(s) = s == :hi
is_bar (generic function with 1 method)

julia> is_buz(x) = iseven(x)
is_buz (generic function with 1 method)

julia> bar() = foo((;a=1, b=2, c=3, hi=4))
bar (generic function with 1 method)

julia> @code_typed bar()
CodeInfo(
1 ──       goto #10 if not true
2 ┄─ %2  = φ (#1 => 1, #9 => %20)::Int64
│    %3  = φ (#1 => 1, #9 => %21)::Int64
│    %4  = φ (#1 => 1, #9 => %13)::Int64
│    %5  = Base.getfield((:a, :b, :c, :hi), %2, true)::Symbol
│    %6  = (%5 === :hi)::Bool
└───       goto #5 if not %6
3 ── %8  = Base.getfield($(QuoteNode((a = 1, b = 2, c = 3, hi = 4))), %2)::Int64
│    %9  = Base.checked_srem_int(%8, 2)::Int64
│    %10 = (%9 === 0)::Bool
└───       goto #5 if not %10
4 ──       nothing::Nothing
5 ┄─ %13 = φ (#4 => %2, #3 => %4, #2 => %4)::Int64
│    %14 = (%3 === 4)::Bool
└───       goto #7 if not %14
6 ──       Base.nothing::Nothing
└───       goto #8
7 ── %18 = Base.add_int(%3, 1)::Int64
└───       goto #8
8 ┄─ %20 = φ (#7 => %18)::Int64
│    %21 = φ (#7 => %18)::Int64
│    %22 = φ (#6 => true, #7 => false)::Bool
│    %23 = Base.not_int(%22)::Bool
└───       goto #10 if not %23
9 ──       goto #2
10 ┄ %26 = φ (#8 => %13, #1 => 1)::Int64
│    %27 = Base.getfield($(QuoteNode((a = 1, b = 2, c = 3, hi = 4))), %26)::Int64
└───       goto #11
11 ─       return %27
) => Int64

view this post on Zulip Mason Protter (Mar 29 2022 at 18:51):

Now compare (using a recent build of the master branch) with this:

julia> Base.@assume_effects :consistent :effect_free :terminates_locally function foo(x::NamedTuple)
           index = 1
           for i in 1:length(x)
               if is_bar(keys(x)[i]) && is_buz(x[i])
                   index = i
               end
           end
           x[index]
       end
foo (generic function with 1 method)

julia> @code_typed bar()
CodeInfo(
1      return 4
) => Int64

view this post on Zulip Expanding Man (Mar 29 2022 at 18:54):

That does look pretty scary; it's not realistic to expect to be doing Base.@assume_effects on every piece of code you write.

view this post on Zulip Jack C (Mar 29 2022 at 18:55):

That's very impressive, but what is each of those directives doing? Is the compiler basically encouraged to "look further" when it doesn't have to waste its time checking for effects on each call?

view this post on Zulip Mason Protter (Mar 29 2022 at 18:56):

Yeah, the compiler has heuristics for how long it's allowed to spend trying to discover if various optimizations are allowed.

view this post on Zulip Mason Protter (Mar 29 2022 at 18:56):

Expanding Man said:

That does look pretty scary; it's not realistic to expect to be doing Base.@assume_effects on every piece of code you write.

Sure, but IMO this is a much more sane and scalable solution than lifting everything into generated functions.

view this post on Zulip Expanding Man (Mar 29 2022 at 18:57):

I suppose, but relying on constant propagation seems like a dubious proposition in Julia right now, even with whatever fanciness is happening here on master.

view this post on Zulip Mason Protter (Mar 29 2022 at 18:57):

And it's not necessary for every piece of code you write. It's something you reach for when things like constant propagation are not being aggressive enough in a use case you've identified.

view this post on Zulip Zachary P Christensen (Mar 29 2022 at 19:02):

It's actually good, in a way, that the compiler sometimes fails at constant propagation. If constant propagation were absolute, we couldn't get away with a lot of the stuff we do.

view this post on Zulip Simeon Schaub (Mar 29 2022 at 19:52):

Do you actually need to tell it about the consistent and effect free part? I would have assumed the compiler would be able to figure out that part on its own.

view this post on Zulip Mason Protter (Mar 29 2022 at 20:01):

Just tried out all the combinations of those three options and found that the minimal set necessary was :consistent :terminates_locally.

view this post on Zulip Ari Katz (Mar 29 2022 at 20:01):

Simeon Schaub said:

Do you actually need to tell it about the consistent and effect free part? I would have assumed the compiler would be able to figure out that part on its own.

See this thread where Keno discusses that: https://twitter.com/akatzzzzz/status/1504433279876935685

@KenoFischer Is it feasible to come in from the invariant side also, something like trait constraints on generic functions or abstract types? Or effect handlers? Hopefully make proving easier if people are willing to opt into that stuff.

- Ari Katz (@akatzzzzz)

view this post on Zulip Mason Protter (Mar 29 2022 at 20:03):

Also, @Simeon Schaub or anyone else, would you happen to know why functions like this, which compile away due to const-prop, still show a single-clock-cycle runtime rather than sub-nanosecond timings?

julia> @code_typed bar()
CodeInfo(
1      return 4
) => Int64

julia> @benchmark bar()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.257 ns … 15.821 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.518 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.502 ns ±  0.397 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                             
  █▂▂▄▃▁▂▅▁▁▃▄▁▁▄▆▂▁▂▄▃▁▁█▇▂▁▂█▇▂▁▂▆▃▂▂▁▃▃▂▂▁▂▅▃▂▁▁▂▄▂▂▁▁▁▄▄ 
  1.26 ns        Histogram: frequency by time         1.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Wait, hm, maybe BenchmarkTools is somehow getting smarter? This is a bit weird.

julia> bar2() = 4
bar2 (generic function with 1 method)

julia> @benchmark bar2()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.258 ns … 16.736 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.522 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.557 ns ±  0.441 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       
  ▆▂▄▃▁▄▃▂▇▃▁▅▂▁▄▃▁█▆▂▂█▅▂▃▆▂▂▃▄▂▁▂▆▃▂▁▃▃▂▂▁▅▄▂▂▂▆▄▂▂▂▂▂▂▂▂▂ 
  1.26 ns        Histogram: frequency by time        1.97 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Compare with 1.7.2 where I get

julia> bar2() = 4
bar2 (generic function with 1 method)

julia> @benchmark bar2()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  0.033 ns … 0.053 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     0.038 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   0.038 ns ± 0.002 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                         
  ▂▁▁▁▁▁▃▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁▆▁▁▁▁▁▂ 
  0.033 ns       Histogram: frequency by time      0.042 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

view this post on Zulip Ari Katz (Mar 29 2022 at 20:03):

chriselrod said:

Mason Protter said:

By lifting these values explicitly into the type domain, you can force the compiler to specialize and propagate a specific thing. But in doing so you stress the compiler and cause it to give up earlier on other things it would normally have optimized better, which forces you to use even more type-domain lifting. It all feeds back on itself until either your compile times are horrendous or you find a better way to approach the problem.

Yeah, VectorizationBase.jl and LoopVectorization.jl have this problem, which is why I've been pushing against others following the same path, as they are likely to come to regret it too.
Or, because my on-the-job performance work is predominantly combating latency, I'll likely come to regret it if I don't succeed in convincing other people to write low-latency code.

I really hope the new compiler-plugin work that @Shuhei Kadowaki is doing can help here, as it's supposed to supplant/augment generated functions.

view this post on Zulip Ari Katz (Mar 29 2022 at 20:04):

Ari Katz said:

Simeon Schaub said:

Do you actually need to tell it about the consistent and effect free part? I would have assumed the compiler would be able to figure out that part on its own.

See this thread where Keno discusses that: https://twitter.com/akatzzzzz/status/1504433279876935685

It seems that doing this in general requires some theorem proving... and adding more invariants to make it easier to prove would be difficult, though I'm currently asking some type-system wizards in Jan's group about that on Slack here: https://julialang.slack.com/archives/C67910KEH/p1648581752673659?thread_ts=1648485094.033829&cid=C67910KEH

view this post on Zulip Ari Katz (Mar 29 2022 at 20:05):

Other languages like Koka and Dex have effects as part of the type system, and can type functions to that effect (heh)

view this post on Zulip Simeon Schaub (Mar 29 2022 at 20:06):

Mason Protter said:

Also, Simeon Schaub or anyone else, would you happen to know why functions like this, which compile away due to const-prop, still show a single-clock-cycle runtime rather than sub-nanosecond timings?

julia> @code_typed bar()
CodeInfo(
1      return 4
) => Int64

julia> @benchmark bar()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.257 ns … 15.821 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.518 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.502 ns ±  0.397 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                             
  █▂▂▄▃▁▂▅▁▁▃▄▁▁▄▆▂▁▂▄▃▁▁█▇▂▁▂█▇▂▁▂▆▃▂▂▁▃▃▂▂▁▂▅▃▂▁▁▂▄▂▂▁▁▁▄▄ 
  1.26 ns        Histogram: frequency by time         1.8 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Wait, hm, maybe BenchmarkTools is somehow getting smarter? This is a bit weird.

julia> bar2() = 4
bar2 (generic function with 1 method)

julia> @benchmark bar2()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.258 ns … 16.736 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.522 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.557 ns ±  0.441 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                       
  ▆▂▄▃▁▄▃▂▇▃▁▅▂▁▄▃▁█▆▂▂█▅▂▃▆▂▂▃▄▂▁▂▆▃▂▁▃▃▂▂▁▅▄▂▂▂▆▄▂▂▂▂▂▂▂▂▂ 
  1.26 ns        Histogram: frequency by time        1.97 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Yeah, it's just a benchmarking artifact. That shouldn't have any effect on real-world code.
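
A benchmark-free way to confirm that a call folds away is to inspect the optimized IR programmatically, the same information @code_typed prints above (a sketch with a hypothetical, trivially foldable function, not the foo/bar from this thread):

```julia
# If constant propagation and folding succeeded, the optimized body
# collapses to a single `return` of the folded value.
add2(x) = x + 2
wrapped() = add2(2)

ci, rettype = only(code_typed(wrapped, Tuple{}))
length(ci.code)  # 1 if the body is just a ReturnNode of the constant
rettype          # Int64
```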

view this post on Zulip Mosè Giordano (Mar 29 2022 at 20:09):

Are you using Julia master? Isn't that the effect of https://github.com/JuliaCI/BenchmarkTools.jl/pull/275?

view this post on Zulip Mason Protter (Mar 29 2022 at 20:10):

Ah nice

view this post on Zulip jar (Mar 29 2022 at 20:14):

https://github.com/olynch/CompTime.jl seems relevant

view this post on Zulip Chad Scherrer (Mar 29 2022 at 20:15):

Of course! Just talking about this with @Owen Lynch, sorry I left it out


Last updated: Nov 06 2024 at 04:40 UTC