Stream: helpdesk (published)

Topic: Compiler having trouble with type inference


view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 15:14):

I've been fighting for hours with the compiler and can't figure out what is happening. Appreciate any help.

The following code works fine with unitless columns:

using GeoStats

# table with single column z
t = georef((z=rand(1000,300),))

d = Detrend()

@time d(t);
26.553297 seconds (10.20 M allocations: 407.415 MiB, 0.17% gc time, 0.02% compilation time)

If I add units to the column:

t = georef((z=rand(1000,300)u"K",))

@time d(t);

then the execution never ends (or takes too long, I don't know).

The output of the functor is inferred properly:

@code_warntype d(t)
MethodInstance for (::Detrend{ColumnSelectors.AllSelector})(::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}})
  from (transform::TransformsBase.Transform)(object) @ TransformsBase ~/.julia/packages/TransformsBase/w6gll/src/interface.jl:124
Arguments
  transform::Detrend{ColumnSelectors.AllSelector}
  object::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}
Body::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}
1  %1 = TransformsBase.:|>::Core.Const(|>)
   %2 = TransformsBase.apply(transform, object)::Tuple{GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}, Tuple{Vector{Symbol}, GeoStatsModels.FittedPolynomial{Polynomial, GeoStatsModels.PolynomialState{GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, TableTransforms.TableSelection{@NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}}, Matrix{Float64}}}}}
   %3 = TransformsBase.first::Core.Const(first)
   %4 = (%1)(%2, %3)::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}
└──      return %4

And I don't know what to do to investigate the issue further.

The actual code that is executed by the functor is here: https://github.com/JuliaEarth/GeoStatsTransforms.jl/blob/b378c9d9fe19ec094c455f6a380ab2a706b28676/src/detrend.jl#L44-L70

As you can see, there is a final map call with some captures. Is that the reason for the compiler struggle? How to fix the code? I tried to follow the let trick discussed in the Julia manual, but it didn't solve the issue: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-captured

view this post on Zulip Kyle Beggs (Jul 16 2025 at 15:26):

these issues are tough to figure out. Usually, I would try to factor out the lines in the functor and just evaluate line by line to try to isolate which is the offender. One would say "use the debugger", too bad the Julia debugger sucks.

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 15:27):

I know the offender line is the predict call in the map. But I can't figure out a fix.

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 15:31):

And the worst thing is that this same code used to work in previous Julia versions I think.

view this post on Zulip Kyle Beggs (Jul 16 2025 at 15:36):

are you on 1.11.6?

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 15:40):

Yes

view this post on Zulip jar (Jul 16 2025 at 16:51):

JET.@report_opt might be worth trying

view this post on Zulip Mosè Giordano (Jul 16 2025 at 20:28):

Kyle Beggs said:

One would say "use the debugger", too bad the Julia debugger sucks.

I don't think anyone would suggest to use a debugger to figure out why inference is failing, since it wouldn't help at all whit that. Using Cthulhu.jl would be a better idea instead.

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 20:52):

Cthulhu.jl doesn't compile for me on Julia v1.11.6

view this post on Zulip jar (Jul 16 2025 at 20:55):

In a ]activate --temp environment, even?

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 20:57):

Yes, it fails to precompile:

julia> using Cthulhu
Precompiling Cthulhu...
Info Given Cthulhu was explicitly requested, output will be shown live
WARNING: could not import JuliaSyntax.child into Cthulhu
 Error: Errorred while running the precompile workload, the package may or may not work but latency will be long
   exeption =
    UndefVarError: `child` not defined in `TypedSyntax`
    Suggestion: check for spelling errors or missing imports.
    Stacktrace:
      [1] collect_symbol_nodes(rootnode::JuliaSyntax.SyntaxNode)
        @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:417
      [2] map_ssas_to_source(src::Core.CodeInfo, mi::Core.MethodInstance, rootnode::JuliaSyntax.SyntaxNode, Δline::Int64)
        @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:529
      [3] map_ssas_to_source
        @ ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:868 [inlined]
      [4] tsn_and_mappings(mi::Core.MethodInstance, src::Core.CodeInfo, rt::Any, sourcetext::SubString{String}, lineno::Int32; warn::Bool, strip_macros::Bool, kwargs::@Kwargs{})
        @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:54
      [5] tsn_and_mappings(mi::Core.MethodInstance, src::Core.CodeInfo, rt::Any; warn::Bool, strip_macros::Bool, kwargs::@Kwargs{})
        @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:39
      [6] get_typed_sourcetext(mi::Core.MethodInstance, src::Core.CodeInfo, rt::Any; warn::Bool)
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/reflection.jl:377
      [7] find_callsites(interp::Cthulhu.CthulhuInterpreter, CI::Core.CodeInfo, stmt_infos::Vector{Core.Compiler.CallInfo}, mi::Core.MethodInstance, slottypes::Vector{Any}, optimize::Bool, annotate_source::Bool, pc2excts::Nothing)
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/reflection.jl:33
      [8] _descend(term::REPL.Terminals.TTYTerminal, interp::Cthulhu.CthulhuInterpreter, curs::Cthulhu.CthulhuCursor; override::Nothing, debuginfo::Symbol, optimize::Bool, interruptexc::Bool, iswarn::Bool, hide_type_stable::Bool, verbose::Nothing, remarks::Bool, with_effects::Bool, exception_type::Bool, inline_cost::Bool, type_annotations::Bool, annotate_source::Bool, inlay_types_vscode::Bool, diagnostics_vscode::Bool, jump_always::Bool)
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:405
      [9] _descend(term::REPL.Terminals.TTYTerminal, interp::Cthulhu.CthulhuInterpreter, mi::Core.MethodInstance; kwargs::@Kwargs{iswarn::Bool})
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:709
     [10] _descend(term::REPL.Terminals.TTYTerminal, args::Any; interp::Core.Compiler.NativeInterpreter, kwargs::@Kwargs{iswarn::Bool})
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:725
     [11] __descend_with_error_handling(args::Any; terminal::Any, kwargs...)
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:138
     [12] _descend_with_error_handling(f::Any, argtypes::Any; kwargs::@Kwargs{iswarn::Bool, terminal::REPL.Terminals.TTYTerminal})
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:127
     [13] descend_impl(::Any, ::Vararg{Any}; kwargs::@Kwargs{terminal::REPL.Terminals.TTYTerminal})
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:152
     [14] descend(::Any, ::Vararg{Any}; kwargs...)
        @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:116
     [15] descend
        @ ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:115 [inlined]
     [16] macro expansion
        @ ~/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
     [17] macro expansion
        @ ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:203 [inlined]
     [18] macro expansion
        @ ~/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
     [19] top-level scope
        @ ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:197
     [20] include
        @ ./Base.jl:562 [inlined]
     [21] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
        @ Base ./loading.jl:2881
     [22] top-level scope
        @ stdin:6
     [23] eval
        @ ./boot.jl:430 [inlined]
     [24] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
        @ Base ./loading.jl:2734
     [25] include_string
        @ ./loading.jl:2744 [inlined]
     [26] exec_options(opts::Base.JLOptions)
        @ Base ./client.jl:321
     [27] _start()
        @ Base ./client.jl:531
 @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:213
  2 dependencies successfully precompiled in 20 seconds. 17 already precompiled.
  2 dependencies had output during precompilation:
 Cthulhu
  [Output was shown above]

 TypedSyntax
  WARNING: could not import JuliaSyntax.child into TypedSyntax

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 20:58):

It is stuck in a previous version:

(jl_NQMOcX) pkg> st --outdated
Status `/tmp/jl_NQMOcX/Project.toml`
 [f68482b8] Cthulhu v2.16.5 (<v2.17.4): julia

view this post on Zulip Eric Hanson (Jul 16 2025 at 22:03):

to me that looks like your Cthulhu needs old JuliaSyntax (https://github.com/JuliaDebug/Cthulhu.jl/blob/747aaf89e727d35563c31c48694ccc8576ffe355/Project.toml#L11) but you likely have a different version loaded. Common issue with stacked environments, e.g. https://github.com/JuliaTesting/ExplicitImports.jl/issues/119

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 22:08):

This is a temp env

view this post on Zulip Mason Protter (Jul 16 2025 at 22:17):

Yes, but it was likely stacked on another environment that already loaded a different juliasyntax version

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 22:18):

You mean I should uninstall the packages in my @v1.11 env?

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 22:19):

In order to install Cthulhu.jl in a temp env without conflict ?

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 22:19):

I thought a temp env was completely independent from the global env

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 23:52):

I found the issue by just reading the code carefully. There is a matrix-vector multiplication P * z where z is the column of the table. When z is unitful, the call hits a super slow multiplication fallback.

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 23:58):

Suppose P = rand(3, 300000) and z = rand(300000) * u"K". How to efficiently compute P * z without allocating intermediate memory?

The following attempt is memory-hungry:

u = unit(eltype(z))
(P * ustrip.(z)) * u

and because this multiplication happens millions of times in a hot-loop I can't afford the allocations.

view this post on Zulip Júlio Hoffimann (Jul 16 2025 at 23:59):

Even if a new method Base.:*(A::AbstractMatrix, z::AbstractVector{<:Quantity}) is added to Unitful.jl, I don't see how to avoid the intermediate allocation? Is there a low-level workaround to manipulate the buffer of floats without the units during matrix-vector multiplication?

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 00:02):

I will try to open up the multiplication as a linear combination of columns of P to see how the performance is affected.

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 00:06):

The fastest alternative so far:

sum(pj * zj for (pj, zj) in zip(eachcol(P), z))

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 00:27):

Actually, I don't know if that is the actual issue. Just measured the time again and the difference is small compared to the unitless case.

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 00:27):

I am back to where I started. Will keep debugging.

view this post on Zulip wheeheee (Jul 17 2025 at 01:01):

Have you tried the profiler? If you’re already using vscode you can use @profview with the code you run.

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 01:02):

I'll dive into it again tomorrow after fixing other issues in other places

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 01:03):

It is frustrating to encounter these situations in Julia. The code is ok one day and then out of the sudden it stops working.

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 01:03):

I discovered this issue today trying to rerun a workshop material I prepared last year

view this post on Zulip wheeheee (Jul 17 2025 at 03:47):

I can't reproduce your Cthulhu problem with a temp environment with only ]add GeoStats, Cthulhu then using GeoStats, Cthulhu. But with a smaller z = rand(1000, 60); it takes > 2x the time for the unitful version. @profview shows that most of the time is spent at P * z which you've already identified so I guess you just have to hammer away at that.

view this post on Zulip Júlio Hoffimann (Jul 17 2025 at 19:34):

So it seems that this is indeed the root of the issue? The matrix-vector multiplication where the vector is unitful. I wonder why this script I wrote 10 months ago for a workshop worked without a problem. The Julia version back then was v1.10.5 but I am having issue running it with juliaup: #helpdesk (published) > juliaup failing to launch v1.10.5

view this post on Zulip wheeheee (Jul 18 2025 at 06:32):

Júlio Hoffimann said:

Suppose P = rand(3, 300000) and z = rand(300000) * u"K". How to efficiently compute P * z without allocating intermediate memory?

The following attempt is memory-hungry:

u = unit(eltype(z))
(P * ustrip.(z)) * u

and because this multiplication happens millions of times in a hot-loop I can't afford the allocations.

Actually, I think you almost have it. ustrip.(z) allocates a nice new Array, but you can get away with ustrip(z) which gets you a ReinterpretArray without allocating, like

u = unit(eltype(z))
θ = P * ustrip(z)
return V[1, :]' * θ * u

Assuming P is unitless, generic_matvecmul! should use gemv! and be fast.
incidentally vandermonde looks like it allocates a lot more, and since you use this in a hot loop...

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 06:35):

Didn't know about the ustrip method for arrays returned a ReinterpretArray. That should certainly help! Thank you! Will try as soon I'm back to my workstation.

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 11:40):

Weirdly enough, the trick with ustrip solved the case where z is unitful, but slowed down the case where it is a plain vector of floats.

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 11:40):

I will add an auxiliary function to branch between behaviors at compile time. But it is annoying to deal with this manually.

view this post on Zulip aplavin (Jul 18 2025 at 11:50):

I also wish there was an official "free" way to do ustrip (and the other way around, to add units for free).
See https://github.com/PainterQubits/Unitful.jl/issues/717 for some discussion.

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 11:52):

Very relevant issue, thanks for sharing :100:

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 11:53):

The problem I mentioned however is only concerned with Array, and I don't know why ustrip is causing a slowdown when the input array is a plain array of floats. I thought it should just forward the array as is.

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 11:54):

But it is allocating every time.

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 11:56):

Commented on the issue.

view this post on Zulip aplavin (Jul 18 2025 at 12:08):

Hmm, idk how but I get faster multiplication with Unitful and slower with plain numbers:
image.png

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 12:10):

Can you try with zu = z * u"K" just to make sure we are using the exact same inputs?

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 12:10):

I will run locally here as well.

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 12:12):

Can't reproduce:

julia> @btime $P * $z;
  71.367 μs (2 allocations: 80 bytes)

julia> @btime $P * $zu;
  747.104 μs (2 allocations: 80 bytes)

view this post on Zulip aplavin (Jul 18 2025 at 13:00):

For me it's very stable, unitful ~2x faster. Macbook m2.

view this post on Zulip Júlio Hoffimann (Jul 18 2025 at 13:02):

Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
  JULIA_NUM_THREADS = 8

view this post on Zulip wheeheee (Jul 19 2025 at 01:48):

I mean, it’s possible gemv! hasn’t been optimized well for the M2 / ARM chips in general?

view this post on Zulip wheeheee (Jul 19 2025 at 01:49):

In which case the generic fallback could be better

view this post on Zulip wheeheee (Jul 19 2025 at 01:51):

Júlio Hoffimann said:

Weirdly enough, the trick with ustrip solved the case where z is unitful, but slowed down the case where it is a plain vector of floats.

That’s really weird, it should compile to identity once it’s specialized or something


Last updated: Jul 22 2025 at 04:56 UTC