I've been fighting for hours with the compiler and can't figure out what is happening. Appreciate any help.
The following code works fine with unitless columns:
using GeoStats
# table with single column z
t = georef((z=rand(1000,300),))
d = Detrend()
@time d(t);
26.553297 seconds (10.20 M allocations: 407.415 MiB, 0.17% gc time, 0.02% compilation time)
If I add units to the column:
t = georef((z=rand(1000,300)u"K",))
@time d(t);
then the execution never ends (or takes too long, I don't know).
The output of the functor is inferred properly:
@code_warntype d(t)
MethodInstance for (::Detrend{ColumnSelectors.AllSelector})(::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}})
from (transform::TransformsBase.Transform)(object) @ TransformsBase ~/.julia/packages/TransformsBase/w6gll/src/interface.jl:124
Arguments
transform::Detrend{ColumnSelectors.AllSelector}
object::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}
Body::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}
1 ─ %1 = TransformsBase.:|>::Core.Const(|>)
│ %2 = TransformsBase.apply(transform, object)::Tuple{GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}, Tuple{Vector{Symbol}, GeoStatsModels.FittedPolynomial{Polynomial, GeoStatsModels.PolynomialState{GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, TableTransforms.TableSelection{@NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}}, Matrix{Float64}}}}}
│ %3 = TransformsBase.first::Core.Const(first)
│ %4 = (%1)(%2, %3)::GeoTable{CartesianGrid{𝔼{2}, Cartesian2D{NoDatum, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}, 2, Tuple{Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(m,), 𝐋, nothing}}}}, @NamedTuple{z::Vector{Quantity{Float64, 𝚯, Unitful.FreeUnits{(K,), 𝚯, nothing}}}}}
└── return %4
And I don't know what to do to investigate the issue further.
The actual code that is executed by the functor is here: https://github.com/JuliaEarth/GeoStatsTransforms.jl/blob/b378c9d9fe19ec094c455f6a380ab2a706b28676/src/detrend.jl#L44-L70
As you can see, there is a final map call with some captures. Is that the reason for the compiler struggle? How to fix the code? I tried to follow the let trick for captured variables discussed in the Julia manual, but it didn't solve the issue: https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-captured
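For reference, a minimal illustration of that let trick (a hypothetical, self-contained example, not the actual detrend.jl code):
# hypothetical example of the captured-variable workaround from the manual
function rescale(xs)
    s = sum(xs)          # s is captured by the closure passed to map
    let s = s            # rebind in a let block so the closure captures a local s
        map(x -> x / s, xs)
    end
end
rescale(rand(10))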
These issues are tough to figure out. Usually I would try to factor out the lines in the functor and evaluate them one by one to isolate the offender. One would say "use the debugger"; too bad the Julia debugger sucks.
I know the offender line is the predict call in the map. But I can't figure out a fix.
And the worst thing is that this same code used to work in previous Julia versions I think.
are you on 1.11.6?
Yes
JET.@report_opt might be worth trying
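The invocation would look something like this (a sketch, assuming JET.jl is installed and d, t are as above):
using JET
# report runtime dispatch and other optimization failures along the call
@report_opt d(t)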
Kyle Beggs said:
One would say "use the debugger", too bad the Julia debugger sucks.
I don't think anyone would suggest using a debugger to figure out why inference is failing, since it wouldn't help at all with that. Using Cthulhu.jl would be a better idea instead.
Cthulhu.jl doesn't compile for me on Julia v1.11.6
In a ]activate --temp environment, even?
Yes, it fails to precompile:
julia> using Cthulhu
Precompiling Cthulhu...
Info Given Cthulhu was explicitly requested, output will be shown live
WARNING: could not import JuliaSyntax.child into Cthulhu
┌ Error: Errorred while running the precompile workload, the package may or may not work but latency will be long
│ exeption =
│ UndefVarError: `child` not defined in `TypedSyntax`
│ Suggestion: check for spelling errors or missing imports.
│ Stacktrace:
│ [1] collect_symbol_nodes(rootnode::JuliaSyntax.SyntaxNode)
│ @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:417
│ [2] map_ssas_to_source(src::Core.CodeInfo, mi::Core.MethodInstance, rootnode::JuliaSyntax.SyntaxNode, Δline::Int64)
│ @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:529
│ [3] map_ssas_to_source
│ @ ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:868 [inlined]
│ [4] tsn_and_mappings(mi::Core.MethodInstance, src::Core.CodeInfo, rt::Any, sourcetext::SubString{String}, lineno::Int32; warn::Bool, strip_macros::Bool, kwargs::@Kwargs{})
│ @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:54
│ [5] tsn_and_mappings(mi::Core.MethodInstance, src::Core.CodeInfo, rt::Any; warn::Bool, strip_macros::Bool, kwargs::@Kwargs{})
│ @ TypedSyntax ~/.julia/packages/TypedSyntax/eS4sW/src/node.jl:39
│ [6] get_typed_sourcetext(mi::Core.MethodInstance, src::Core.CodeInfo, rt::Any; warn::Bool)
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/reflection.jl:377
│ [7] find_callsites(interp::Cthulhu.CthulhuInterpreter, CI::Core.CodeInfo, stmt_infos::Vector{Core.Compiler.CallInfo}, mi::Core.MethodInstance, slottypes::Vector{Any}, optimize::Bool, annotate_source::Bool, pc2excts::Nothing)
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/reflection.jl:33
│ [8] _descend(term::REPL.Terminals.TTYTerminal, interp::Cthulhu.CthulhuInterpreter, curs::Cthulhu.CthulhuCursor; override::Nothing, debuginfo::Symbol, optimize::Bool, interruptexc::Bool, iswarn::Bool, hide_type_stable::Bool, verbose::Nothing, remarks::Bool, with_effects::Bool, exception_type::Bool, inline_cost::Bool, type_annotations::Bool, annotate_source::Bool, inlay_types_vscode::Bool, diagnostics_vscode::Bool, jump_always::Bool)
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:405
│ [9] _descend(term::REPL.Terminals.TTYTerminal, interp::Cthulhu.CthulhuInterpreter, mi::Core.MethodInstance; kwargs::@Kwargs{iswarn::Bool})
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:709
│ [10] _descend(term::REPL.Terminals.TTYTerminal, args::Any; interp::Core.Compiler.NativeInterpreter, kwargs::@Kwargs{iswarn::Bool})
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:725
│ [11] __descend_with_error_handling(args::Any; terminal::Any, kwargs...)
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:138
│ [12] _descend_with_error_handling(f::Any, argtypes::Any; kwargs::@Kwargs{iswarn::Bool, terminal::REPL.Terminals.TTYTerminal})
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:127
│ [13] descend_impl(::Any, ::Vararg{Any}; kwargs::@Kwargs{terminal::REPL.Terminals.TTYTerminal})
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/CthulhuBase.jl:152
│ [14] descend(::Any, ::Vararg{Any}; kwargs...)
│ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:116
│ [15] descend
│ @ ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:115 [inlined]
│ [16] macro expansion
│ @ ~/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:78 [inlined]
│ [17] macro expansion
│ @ ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:203 [inlined]
│ [18] macro expansion
│ @ ~/.julia/packages/PrecompileTools/L8A3n/src/workloads.jl:140 [inlined]
│ [19] top-level scope
│ @ ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:197
│ [20] include
│ @ ./Base.jl:562 [inlined]
│ [21] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::Nothing)
│ @ Base ./loading.jl:2881
│ [22] top-level scope
│ @ stdin:6
│ [23] eval
│ @ ./boot.jl:430 [inlined]
│ [24] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│ @ Base ./loading.jl:2734
│ [25] include_string
│ @ ./loading.jl:2744 [inlined]
│ [26] exec_options(opts::Base.JLOptions)
│ @ Base ./client.jl:321
│ [27] _start()
│ @ Base ./client.jl:531
└ @ Cthulhu ~/.julia/packages/Cthulhu/kMTSr/src/Cthulhu.jl:213
2 dependencies successfully precompiled in 20 seconds. 17 already precompiled.
2 dependencies had output during precompilation:
┌ Cthulhu
│ [Output was shown above]
└
┌ TypedSyntax
│ WARNING: could not import JuliaSyntax.child into TypedSyntax
It is stuck on a previous version:
(jl_NQMOcX) pkg> st --outdated
Status `/tmp/jl_NQMOcX/Project.toml`
⌅ [f68482b8] Cthulhu v2.16.5 (<v2.17.4): julia
To me that looks like your Cthulhu needs an old JuliaSyntax (https://github.com/JuliaDebug/Cthulhu.jl/blob/747aaf89e727d35563c31c48694ccc8576ffe355/Project.toml#L11), but you likely have a different version loaded. This is a common issue with stacked environments, e.g. https://github.com/JuliaTesting/ExplicitImports.jl/issues/119
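One way to check which JuliaSyntax version the stacked environments actually resolve (a sketch; run it from the temp environment):
import Pkg
# show the JuliaSyntax entry from the manifest of the active environment
Pkg.status("JuliaSyntax"; mode = Pkg.PKGMODE_MANIFEST)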
This is a temp env
Yes, but it was likely stacked on another environment that already loaded a different JuliaSyntax version
You mean I should uninstall the packages in my @v1.11 env?
In order to install Cthulhu.jl in a temp env without conflicts?
I thought a temp env was completely independent from the global env
I found the issue by just reading the code carefully. There is a matrix-vector multiplication P * z where z is the column of the table. When z is unitful, the call hits a super slow multiplication fallback.
Suppose P = rand(3, 300000) and z = rand(300000) * u"K". How to efficiently compute P * z without allocating intermediate memory?
The following attempt is memory-hungry:
u = unit(eltype(z))
(P * ustrip.(z)) * u
and because this multiplication happens millions of times in a hot loop, I can't afford the allocations.
Even if a new method Base.:*(A::AbstractMatrix, z::AbstractVector{<:Quantity}) is added to Unitful.jl, I don't see how to avoid the intermediate allocation. Is there a low-level workaround to manipulate the buffer of floats without the units during the matrix-vector multiplication?
I will try to open up the multiplication as a linear combination of columns of P to see how the performance is affected.
The fastest alternative so far:
sum(pj * zj for (pj, zj) in zip(eachcol(P), z))
Actually, I don't know if that is the actual issue. Just measured the time again and the difference is small compared to the unitless case.
I am back to where I started. Will keep debugging.
Have you tried the profiler? If you're already using VS Code you can use @profview with the code you run.
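A sketch of what that could look like on a smaller input so the run finishes quickly (@profview comes from the VS Code Julia extension; ProfileView.jl provides a macro with the same name outside VS Code):
using GeoStats   # u"K" is available as in the snippets above
t = georef((z=rand(1000, 60) * u"K",))  # smaller table than the original example
d = Detrend()
d(t)             # run once to compile
@profview d(t)   # the flame graph should point at the hot spot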
I'll dive into it again tomorrow after fixing other issues in other places
It is frustrating to encounter these situations in Julia. The code is fine one day and then all of a sudden it stops working.
I discovered this issue today trying to rerun a workshop material I prepared last year
I can't reproduce your Cthulhu problem with a temp environment with only ]add GeoStats, Cthulhu and then using GeoStats, Cthulhu. But with a smaller z = rand(1000, 60); it takes >2x the time for the unitful version. @profview shows that most of the time is spent at P * z, which you've already identified, so I guess you just have to hammer away at that.
So it seems that this is indeed the root of the issue? The matrix-vector multiplication where the vector is unitful. I wonder why this script I wrote 10 months ago for a workshop worked without a problem. The Julia version back then was v1.10.5, but I am having issues running it with juliaup: #helpdesk (published) > juliaup failing to launch v1.10.5
Júlio Hoffimann said:
Suppose P = rand(3, 300000) and z = rand(300000) * u"K". How to efficiently compute P * z without allocating intermediate memory? The following attempt is memory-hungry:
u = unit(eltype(z))
(P * ustrip.(z)) * u
and because this multiplication happens millions of times in a hot loop I can't afford the allocations.
Actually, I think you almost have it. ustrip.(z) allocates a nice new Array, but you can get away with ustrip(z), which gets you a ReinterpretArray without allocating, like
u = unit(eltype(z))
θ = P * ustrip(z)
return V[1, :]' * θ * u
Assuming P is unitless, generic_matvecmul! should use gemv! and be fast.
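A quick way to confirm the view-vs-copy behavior and time the fast path (a sketch, assuming BenchmarkTools is installed):
using Unitful, BenchmarkTools
P = rand(3, 300_000)
z = rand(300_000) * u"K"
@show ustrip(z) isa Base.ReinterpretArray   # true: same buffer, no copy
@btime $P * ustrip($z);                     # should take the fast strided matvec path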
Incidentally, vandermonde looks like it allocates a lot more, and since you use this in a hot loop...
I didn't know the ustrip method for arrays returns a ReinterpretArray. That should certainly help! Thank you! Will try as soon as I'm back to my workstation.
Weirdly enough, the trick with ustrip solved the case where z is unitful, but slowed down the case where it is a plain vector of floats.
I will add an auxiliary function to branch between behaviors at compile time. But it is annoying to deal with this manually.
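Something along these lines might work for that compile-time branch (a sketch with made-up helper names, not part of GeoStatsTransforms):
using Unitful
# hypothetical helpers: strip units into a ReinterpretArray view when present,
# forward plain arrays untouched
rawvalues(z::AbstractVector{<:Quantity}) = ustrip(z)
rawvalues(z::AbstractVector) = z
# reattach the unit of the source column to a computed result, if it had one
withunit(x, z::AbstractVector{<:Quantity}) = x * unit(eltype(z))
withunit(x, z::AbstractVector) = x
# usage sketch: θ = P * rawvalues(z); withunit(θ, z)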
I also wish there was an official "free" way to do ustrip (and the other way around, to add units for free).
See https://github.com/PainterQubits/Unitful.jl/issues/717 for some discussion.
Very relevant issue, thanks for sharing :100:
The problem I mentioned, however, is only concerned with Array, and I don't know why ustrip is causing a slowdown when the input array is a plain array of floats. I thought it should just forward the array as is, but it is allocating every time.
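A quick check for the plain-float case (a sketch):
using Unitful
x = rand(10)
@show ustrip(x) === x   # false would mean ustrip made a copy for a plain Float64 vector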
Commented on the issue.
Hmm, idk how but I get faster multiplication with Unitful and slower with plain numbers:
(screenshot: benchmark timings)
Can you try with zu = z * u"K" just to make sure we are using the exact same inputs?
I will run locally here as well.
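For reference, a fully self-contained version of that comparison (a sketch, assuming BenchmarkTools and the sizes from earlier in the thread):
using Unitful, BenchmarkTools
P  = rand(3, 300_000)
z  = rand(300_000)
zu = z * u"K"
@btime $P * $z;    # plain Float64 vector
@btime $P * $zu;   # unitful vector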
Can't reproduce:
julia> @btime $P * $z;
71.367 μs (2 allocations: 80 bytes)
julia> @btime $P * $zu;
747.104 μs (2 allocations: 80 bytes)
For me it's very stable, unitful ~2x faster. MacBook M2.
Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, skylake)
Threads: 8 default, 0 interactive, 4 GC (on 8 virtual cores)
Environment:
JULIA_NUM_THREADS = 8
I mean, it's possible gemv! hasn't been optimized well for the M2 / ARM chips in general?
In which case the generic fallback could be better
Júlio Hoffimann said:
Weirdly enough, the trick with ustrip solved the case where z is unitful, but slowed down the case where it is a plain vector of floats.
That’s really weird, it should compile to identity once it’s specialized or something