When writing a GPU kernel, is it a bad idea to rely on constant propagation?
For example, if I have the kernel helper function `u01` from PhiloxRNG.jl:
```julia
@inline function u01(::Type{Float32}, u::UInt32)::Float32
    fma(Float32(u), Float32(2)^(-32), Float32(2)^(-33))
end
```
Is it better to instead do:
```julia
const _f32_2_to_the_neg_32 = Float32(2)^(-32)
const _f32_2_to_the_neg_33 = Float32(2)^(-33)

@inline function u01(::Type{Float32}, u::UInt32)::Float32
    fma(Float32(u), _f32_2_to_the_neg_32, _f32_2_to_the_neg_33)
end
```
Because in the first case the `^` would need to be computed on the GPU?
I see that in the C++ version of this function they also needed some workarounds to get this to work.
The background is that I'm trying to understand why https://github.com/JuliaGPU/OpenCL.jl/pull/428 was needed, but this seems like a more general question about how to write GPU kernels in Julia.
I'm moderately sure the constant expressions are expanded very early in the Julia front end; you should already see them with `code_typed`.
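One way to check that (a sketch; the exact printout depends on your Julia version): if the powers are folded, the typed IR shows the literal `Float32` constants and no call to `^`.

```julia
using InteractiveUtils  # provides @code_typed (already loaded in the REPL)

@inline function u01(::Type{Float32}, u::UInt32)::Float32
    fma(Float32(u), Float32(2)^(-32), Float32(2)^(-33))
end

# Expect to see the constants 2.3283064f-10 and 1.1641532f-10 in the IR,
# with no remaining call to ^.
display(@code_typed u01(Float32, UInt32(1)))
```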
Nathan Zimmerberg said:

> The background is that I'm trying to understand why https://github.com/JuliaGPU/OpenCL.jl/pull/428 was needed, but this seems like a more general question about how to write GPU kernels in Julia.
The motivation for that PR seems to be that whenever the power operation actually runs on the GPU, it widens the numbers to a larger type, which is either a performance hit or outright impossible (some GPUs, e.g. under Metal, don't support double precision). But that seems orthogonal to the first part of your question (when the constant expression is resolved). Maybe I'm misunderstanding something, though.
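If you want a guarantee that no runtime `^` (and no Float64 detour) can reach the GPU regardless of what the compiler folds, one option is to avoid `^` entirely. A sketch, not necessarily what that PR does; the names here are hypothetical:

```julia
# Powers of two are exactly representable, so plain Float32 literals work:
const F32_2POW_NEG32 = 2.3283064f-10  # == Float32(2)^(-32)
const F32_2POW_NEG33 = 1.1641532f-10  # == Float32(2)^(-33)

@inline u01_literal(::Type{Float32}, u::UInt32) =
    fma(Float32(u), F32_2POW_NEG32, F32_2POW_NEG33)

# ldexp(1f0, n) == 2f0^n and stays entirely in Float32, if your GPU
# backend supports or constant-folds it:
@inline u01_ldexp(::Type{Float32}, u::UInt32) =
    fma(Float32(u), ldexp(1.0f0, -32), ldexp(1.0f0, -33))
```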