(cross posted from helpdesk)
Does anyone have advice on how to ensure that the compiler will propagate constants into compiled code? I've specifically got a bunch of abstractions operating over NamedTuples, and the generated code is "forgetting" at a crucial point that a given symbol is of a certain value. Looking at the warntype output, it is okay up until this one method, but inside the method the argument is just a ::Symbol
, not a Core.Const
.
I feel like I am throwing ::Val{symb}
in random places without any understanding of whether it will help or not.
Unfortunately there are no real systematic ways to approach this, other than perhaps the new effect macros that are on the master version of the language currently. This stuff mostly just needs to be taken on a case by case basis.
One thing I’d advise is that if your function bodies are very large, start seeing if you can break down functionality into smaller pieces and compose them. Julia’s optimizer is more likely to give up optimizing a very large function body
You can use Static.jl's StaticSymbol
type as a more structured dispatchable type for this but Mason gave you the best answer, many small efficient methods to ensure inference works correctly. The places I've found inference to fail in large function bodies is when there are complicated conditions in if-else branches or if the condition is computed secondary to a loop.
For example, something like this can be problematic when combined with more complicated stuff
function foo(x::NamedTuple)
index = 0
for i in 1:length(x)
if is_bar(keys(x)[i]) && is_buz(x[i])
index = i
end
end
x[index]
end
Jack C said:
I feel like I am throwing
::Val{symb}
in random places without any understanding of whether it will help or not.
Generally, what you'd find is that this, and generated functions can create a 'whack-a-mole' situation if your code is sufficiently complex.
By lifting these values explicitly to the type domain, you can force it to specialize and propagate a specific thing, but in doing so you'll end up stressing the compiler and cause it to give up more early on other things that it would have normally optimized better, which forces you to use even more type domain lifting, and it all feeds back on itself until your compile times are horrendous or you find a better way to approach it
Awesome, thank you everyone for the advice!
@Chad Scherrer (on other topic) you said that assume_effects
may be able to guarantee constant propagation in the future? How would that work, from reading the 1.9-dev docs I do not see an obvious connection.
assume_effects
lets the compiler not worry about proving certain invariants that are needed for it to do optimizations, so it makes the optimizations more likely to happen
Mason Protter said:
By lifting these values explicitly to the type domain, you can force it to specialize and propagate a specific thing, but in doing so you'll end up stressing the compiler and cause it to give up more early on other things that it would have normally optimized better, which forces you to use even more type domain lifting, and it all feeds back on itself until your compile times are horrendous or you find a better way to approach it
Yeah, VectorizationBase.jl and LoopVectorization.jl have this problem, and why I've been pushing against others following the same path, as they are likely to also come to regret it.
Or, because my on-the-job performance work is predominantly to combat latency, I'll likely come to regret if I don't succeed in convincing other people to write low-latency code.
To take @Zachary P Christensen's example, observe this:
julia> function foo(x::NamedTuple)
index = 1
for i in 1:length(x)
if is_bar(keys(x)[i]) && is_buz(x[i])
index = i
end
end
x[index]
end
foo (generic function with 1 method)
julia> is_bar(s) = s == :hi
is_bar (generic function with 1 method)
julia> is_buz(x) = iseven(x)
is_buz (generic function with 1 method)
julia> bar() = foo((;a=1, b=2, c=3, hi=4))
bar (generic function with 1 method)
julia> @code_typed bar()
CodeInfo(
1 ── goto #10 if not true
2 ┄─ %2 = φ (#1 => 1, #9 => %20)::Int64
│ %3 = φ (#1 => 1, #9 => %21)::Int64
│ %4 = φ (#1 => 1, #9 => %13)::Int64
│ %5 = Base.getfield((:a, :b, :c, :hi), %2, true)::Symbol
│ %6 = (%5 === :hi)::Bool
└─── goto #5 if not %6
3 ── %8 = Base.getfield($(QuoteNode((a = 1, b = 2, c = 3, hi = 4))), %2)::Int64
│ %9 = Base.checked_srem_int(%8, 2)::Int64
│ %10 = (%9 === 0)::Bool
└─── goto #5 if not %10
4 ── nothing::Nothing
5 ┄─ %13 = φ (#4 => %2, #3 => %4, #2 => %4)::Int64
│ %14 = (%3 === 4)::Bool
└─── goto #7 if not %14
6 ── Base.nothing::Nothing
└─── goto #8
7 ── %18 = Base.add_int(%3, 1)::Int64
└─── goto #8
8 ┄─ %20 = φ (#7 => %18)::Int64
│ %21 = φ (#7 => %18)::Int64
│ %22 = φ (#6 => true, #7 => false)::Bool
│ %23 = Base.not_int(%22)::Bool
└─── goto #10 if not %23
9 ── goto #2
10 ┄ %26 = φ (#8 => %13, #1 => 1)::Int64
│ %27 = Base.getfield($(QuoteNode((a = 1, b = 2, c = 3, hi = 4))), %26)::Int64
└─── goto #11
11 ─ return %27
) => Int64
Now compare (using a recent build of the master branch) with this:
julia> Base.@assume_effects :consistent :effect_free :terminates_locally function foo(x::NamedTuple)
index = 1
for i in 1:length(x)
if is_bar(keys(x)[i]) && is_buz(x[i])
index = i
end
end
x[index]
end
foo (generic function with 1 method)
julia> @code_typed bar()
CodeInfo(
1 ─ return 4
) => Int64
That does look pretty scary, it's not realistic to expect to be doing Base.@assume_effects
on every piece of code you write
That's very impressive, but what is each of those directives doing? Is the compiler basically encouraged to "look further" when it doesn't have to waste its time checking for effects on each call?
Yeah, the compiler has heuristics for how long it's allowed to spend trying to discover if various optimizations are allowed.
Expanding Man said:
That does look pretty scary, it's not realistic to expect to be doing
Base.@assume_effects
on every piece of code you write
Sure, but IMO is this is a much more sane and scalable solution than lifting everything into generated functions
I suppose, but relying on constant propagation seems like a dubious proposition in Julia right now, even with whatever fanciness is happening here on master.
And it's not necessary for every piece of code you write. It's something you reach for when things like constant propagation and whatnot are not being agressive enough in a usecase you've identified.
It's actually good that the compiler fails for constant propagation, in a way. If we had absolute constant propagation we couldn't get away with a lot of stuff we do.
Do you actually need to tell it about the consistent and effect free part? I would have assumed the compiler would be able to figure out that part on its own.
Just tried out all the combinations of those three options and found that the minimal options necessary were :consistent :terminates_locally
Simeon Schaub said:
Do you actually need to tell it about the consistent and effect free part? I would have assumed the compiler would be able to figure out that part on its own.
See this thread where Keno discusses that: https://twitter.com/akatzzzzz/status/1504433279876935685
@KenoFischer Is it feasible to come in from the invariant side also-, something like trait constraints on generic functions or abstract types? Or effect handlers? Hopefully make proving easier if people are willing to opt into that stuff.
- Ari Katz (@akatzzzzz)Also, @Simeon Schaub or anyone else, would you happen to know why functions like this that compile away due to const-prop still have a single clock cycle runtime rather than sub-nanosecond?
julia> @code_typed bar()
CodeInfo(
1 ─ return 4
) => Int64
julia> @benchmark bar()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.257 ns … 15.821 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.518 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.502 ns ± 0.397 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ █
█▂▂▄▃▁▂▅▁▁▃▄▁▁▄▆▂▁▂▄▃▁▁█▇▂▁▂█▇▂▁▂▆▃▂▂▁▃▃▂▂▁▂▅▃▂▁▁▂▄▂▂▁▁▁▄▄ ▃
1.26 ns Histogram: frequency by time 1.8 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Wait, hm maybe benchmark tools is somehow getting smarter? this is a bit weird.
julia> bar2() = 4
bar2 (generic function with 1 method)
julia> @benchmark bar2()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 1.258 ns … 16.736 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.522 ns ┊ GC (median): 0.00%
Time (mean ± σ): 1.557 ns ± 0.441 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█
▆▂▄▃▁▄▃▂▇▃▁▅▂▁▄▃▁█▆▂▂█▅▂▃▆▂▂▃▄▂▁▂▆▃▂▁▃▃▂▂▁▅▄▂▂▂▆▄▂▂▂▂▂▂▂▂▂ ▃
1.26 ns Histogram: frequency by time 1.97 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
Compare with 1.7.2 where I get
julia> bar2() = 4
bar2 (generic function with 1 method)
julia> @benchmark bar2()
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 0.033 ns … 0.053 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 0.038 ns ┊ GC (median): 0.00%
Time (mean ± σ): 0.038 ns ± 0.002 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▂ ▂ ▁ █ ▇ ▅
▂▁▁▁▁▁▃▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁▁█▁▁▁▁▁▆▁▁▁▁▁▂ ▃
0.033 ns Histogram: frequency by time 0.042 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
chriselrod said:
Mason Protter said:
By lifting these values explicitly to the type domain, you can force it to specialize and propagate a specific thing, but in doing so you'll end up stressing the compiler and cause it to give up more early on other things that it would have normally optimized better, which forces you to use even more type domain lifting, and it all feeds back on itself until your compile times are horrendous or you find a better way to approach it
Yeah, VectorizationBase.jl and LoopVectorization.jl have this problem, and why I've been pushing against others following the same path, as they are likely to also come to regret it.
Or, because my on-the-job performance work is predominantly to combat latency, I'll likely come to regret if I don't succeed in convincing other people to write low-latency code.
I really hope the new compiler plugin work that @Shuhei Kadowaki is working on could help here, as it's supposed to supplant/augment generated functions
Ari Katz said:
Simeon Schaub said:
Do you actually need to tell it about the consistent and effect free part? I would have assumed the compiler would be able to figure out that part on its own.
See this thread where Keno discusses that: https://twitter.com/akatzzzzz/status/1504433279876935685
It seems to do this in general requires some theorem proving...and to put in more invariants to make it easier to prove would be difficult, though I'm currently asking about that to some type system wizards in Jan's group on slack here: https://julialang.slack.com/archives/C67910KEH/p1648581752673659?thread_ts=1648485094.033829&cid=C67910KEH
Other languages like Koka and Dex have effects as part of the type system, and can type functions to that effect (heh)
Mason Protter said:
Also, Simeon Schaub or anyone else, would you happen to know why functions like this that compile away due to const-prop still have a single clock cycle runtime rather than sub-nanosecond?
julia> @code_typed bar() CodeInfo( 1 ─ return 4 ) => Int64 julia> @benchmark bar() BenchmarkTools.Trial: 10000 samples with 1000 evaluations. Range (min … max): 1.257 ns … 15.821 ns ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.518 ns ┊ GC (median): 0.00% Time (mean ± σ): 1.502 ns ± 0.397 ns ┊ GC (mean ± σ): 0.00% ± 0.00% ▂ █ █▂▂▄▃▁▂▅▁▁▃▄▁▁▄▆▂▁▂▄▃▁▁█▇▂▁▂█▇▂▁▂▆▃▂▂▁▃▃▂▂▁▂▅▃▂▁▁▂▄▂▂▁▁▁▄▄ ▃ 1.26 ns Histogram: frequency by time 1.8 ns < Memory estimate: 0 bytes, allocs estimate: 0.
Wait, hm maybe benchmark tools is somehow getting smarter? this is a bit weird.
julia> bar2() = 4 bar2 (generic function with 1 method) julia> @benchmark bar2() BenchmarkTools.Trial: 10000 samples with 1000 evaluations. Range (min … max): 1.258 ns … 16.736 ns ┊ GC (min … max): 0.00% … 0.00% Time (median): 1.522 ns ┊ GC (median): 0.00% Time (mean ± σ): 1.557 ns ± 0.441 ns ┊ GC (mean ± σ): 0.00% ± 0.00% █ ▆▂▄▃▁▄▃▂▇▃▁▅▂▁▄▃▁█▆▂▂█▅▂▃▆▂▂▃▄▂▁▂▆▃▂▁▃▃▂▂▁▅▄▂▂▂▆▄▂▂▂▂▂▂▂▂▂ ▃ 1.26 ns Histogram: frequency by time 1.97 ns < Memory estimate: 0 bytes, allocs estimate: 0.
Yeah, it's just a benchmarking artifact. That shouldn't have any effect on real-world code
Are you using Julia master
? Isn't that the effect of https://github.com/JuliaCI/BenchmarkTools.jl/pull/275?
Ah nice
https://github.com/olynch/CompTime.jl seems relevant
Of course! Just talking about this with @Owen Lynch, sorry I left it out
Last updated: Nov 22 2024 at 04:41 UTC