Is there currently a solid way of checking whether it's safe to precompile code making use of specific CPU instructions?
I have this package that uses llvmcall, by checking which CPU instructions the host CPU supports. Problem is, when the user then sets export JULIA_CPU_TARGET="generic", it produces the wrong code and crashes.
How does LoopVectorization handle this? cc @chriselrod
Looks like the JuliaSIMD packages use https://github.com/JuliaSIMD/HostCPUFeatures.jl
e.g.
julia> using HostCPUFeatures
julia> HostCPUFeatures.simd_integer_register_size()
static(32)
julia> HostCPUFeatures.pick_vector_width(Float64)
static(4)
julia> HostCPUFeatures.pick_vector_width(Float16)
static(8)
Looks like it, but then there is https://github.com/JuliaSIMD/HostCPUFeatures.jl/issues/12 in which it's not quite clear if it was fixed.
Anyway, I can try it!
It seems like the PR that closed the issue should fix it, right?
Oh yeah it does seem to work actually :smiley:
nice!
Never mind, it still doesn't work:
julia> using ScanByte
[ Info: Precompiling ScanByte [7b38b023-a4d7-4c5e-8d43-3f3097f304eb]
LLVM ERROR: Do not know how to split the result of this operator!
Ah that's too bad. I'd maybe open a new issue or try to get that old one re-opened
it's basically branching on what Base.Sys claims
As far as I can tell, it is just not possible to robustly write ISA-specific code in Julia, because there is no documented or even semi-legit looking method of checking which code can be validly emitted :frown: Too bad
yup
there's no API for checking which architecture you're actually being compiled for
Base.Sys only tells you the host system, after all
Missed this --
HostCPUFeatures can only detect the host's features (which is why it's not called TargetCPUFeatures.jl).
We don't have an API for actually detecting Julia's LLVM's JIT target, even though one has easy/direct access to this from within LLVM. Which is another one of the motivations for working as an LLVM pass; then things like sys image multiversioning, setting pkg image targets, etc should all "just work", as they do with every other LLVM-level optimization.
So the JuliaSIMD approach is to generate code for the host, and on __init__ try to detect if the host is wrong. If so, it'll try and @eval some methods to fix things, and invalidate any cached compiled code that may now be invalid.
(This is why all queries are functions, rather than global consts, so we get backedges)
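A minimal sketch of that pattern, not the actual HostCPUFeatures code. The module and every function in it (`has_wide_simd`, `runtime_has_wide_simd`, `lanes`) are hypothetical names, and the runtime check is a hard-coded stand-in for real feature detection:

```julia
module TargetAwareSketch

# Hypothetical query. It is a function rather than a global const, so callers
# record a backedge and get recompiled when the method is redefined.
# Pretend `true` was baked in when the package was precompiled.
has_wide_simd() = true

# Hypothetical stand-in for detecting the features at load time.
runtime_has_wide_simd() = false

# Downstream code that branches on the query.
lanes() = has_wide_simd() ? 8 : 2

function __init__()
    # If the cached answer disagrees with what we see at load time,
    # redefine the query. This invalidates compiled callers such as `lanes`.
    if has_wide_simd() != runtime_has_wide_simd()
        @eval has_wide_simd() = $(runtime_has_wide_simd())
    end
end

end # module

TargetAwareSketch.lanes()
```

Because `lanes` only ever calls the query function, the redefinition in `__init__` is enough to make later calls pick the other branch. The cost is recompilation of everything that depended on the old answer.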
Even though sysimages for generic targets have existed for ages now, I didn't pay enough attention to the fact that precompilation like that could fail. I guess we weren't running/actually compiling enough code, even for sysimage use cases, to run into that.
The most correct thing to do would be to get builtin support for the queries:
julia> have_fma(::Type{T}) where {T} = Core.Intrinsics.have_fma(T)
have_fma (generic function with 1 method)
julia> have_fma(Float64)
true
julia> Core.Intrinsics.have_fma(Float64)
false
Note that it actually needs to compile to not give a safe/conservative/possibly pessimizing answer.
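As a rough illustration of how such a builtin query would be used, assuming a Julia build that provides Core.Intrinsics.have_fma as in the session above (`fused_or_plain` is a hypothetical name):

```julia
# Branch on the builtin query inside a function, so the answer is resolved
# when the function is compiled rather than taken from the host CPU.
function fused_or_plain(x::T, y::T, z::T) where {T<:Union{Float32,Float64}}
    if Core.Intrinsics.have_fma(T)
        return fma(x, y, z)      # hardware FMA is available for this target
    else
        return x * y + z         # plain multiply-add (two roundings)
    end
end

fused_or_plain(2.0, 3.0, 1.0)
```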
But failing that, as discussed in the issue, I would set it up to check on precompilation whether the ENV variable has been set. If so, load the appropriate generic set of capabilities.
This is pretty bad, but better than nothing.
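Roughly something like the sketch below. The capability tables and `pick_caps` are hypothetical, the numbers are placeholders rather than detected values, and anything other than an unset or "native" target is treated conservatively as generic:

```julia
# Hypothetical capability sets; the values are placeholders for illustration.
const GENERIC_CAPS = (register_bytes = 16, fma = false)
const HOST_CAPS    = (register_bytes = 32, fma = true)

# Pick a capability set from the JULIA_CPU_TARGET environment variable.
# `env` is a parameter only so the function can be exercised without touching ENV.
function pick_caps(env = ENV)
    target = strip(get(env, "JULIA_CPU_TARGET", "native"))
    # Only trust host detection when the target is exactly the host.
    return target == "native" ? HOST_CAPS : GENERIC_CAPS
end

pick_caps(Dict("JULIA_CPU_TARGET" => "generic"))
```

This only sees the environment variable, so a target passed some other way (for example on the command line) would still be missed.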
Note also that LLVM is able to legalize many, but not all, generic intrinsics.
It probably cannot legalize instruction-specific intrinsics.
And some generic ones can also cause aborts, until LLVM actually implements legalizing fallbacks.
For example, the matmul intrinsics, or the compressed stores/expand loads on older versions of LLVM (recent LLVM versions can scalarize these; I haven't checked if they can scalarize or vectorize the matmul intrinsics, but maybe; these are obviously aimed at tensor cores).
Ok - I'll try to make a PR to HostCPUFeatures today or one of the next days, but to me it looks like a relatively easy feature to add to Base itself. I'll also make an issue in Julia itself.
wait, which feature is easy to add to Base?
"detecting SIMD capability" can mean a whole lot of things
Sukera said:
"detecting SIMD capability" can mean a whole lot of things
I haven't looked at how have_fma is implemented, but at least there's a template we should be able to follow.
Last updated: Dec 28 2024 at 04:38 UTC