I always forget when we need to use $. The results below vary a lot:
julia> using LinearAlgebra
julia> using StaticArrays
julia> using Distances
julia> using BenchmarkTools
julia> x = SVector(1.0, 2.0, 3.0)
3-element SVector{3, Float64} with indices SOneTo(3):
1.0
2.0
3.0
julia> y = SVector(4.0, 5.0, 6.0)
3-element SVector{3, Float64} with indices SOneTo(3):
4.0
5.0
6.0
julia> w = SVector(1.0, 1.0/4.0, 1.0/9.0)
3-element SVector{3, Float64} with indices SOneTo(3):
1.0
0.25
0.1111111111111111
julia> d1 = WeightedEuclidean(w)
WeightedEuclidean{SVector{3, Float64}}([1.0, 0.25, 0.1111111111111111])
julia> d2 = Mahalanobis(Diagonal(w))
Mahalanobis{Diagonal{Float64, SVector{3, Float64}}}([1.0 0.0 0.0; 0.0 0.25 0.0; 0.0 0.0 0.1111111111111111])
julia> @btime d1(x, y);
18.138 ns (1 allocation: 16 bytes)
julia> @btime d1($x, $y);
30.080 ns (3 allocations: 80 bytes)
julia> @btime $d1($x, $y);
3.026 ns (0 allocations: 0 bytes)
julia> @btime d2(x, y);
19.406 ns (1 allocation: 16 bytes)
julia> @btime d2($x, $y);
30.923 ns (3 allocations: 80 bytes)
julia> @btime $d2($x, $y);
3.093 ns (0 allocations: 0 bytes)
Apparently d1 wins in all scenarios, but how should I read these numbers?
The results with Chairmarks.jl are different:
julia> @b d1(x, y)
18.923 ns (1 allocs: 16 bytes)
julia> @b d1($x, $y)
30.557 ns (3 allocs: 80 bytes)
julia> @b $d1($x, $y)
3.235 ns
julia> @b d2(x, y)
19.091 ns (1 allocs: 16 bytes)
julia> @b d2($x, $y)
30.215 ns (3 allocs: 80 bytes)
julia> @b $d2($x, $y)
3.198 ns
Appreciate any help interpreting these results.
$ means treat it like a local variable. If the computation will be carried out in an inner loop somewhere where all the variables are local, then interpolate everything.
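Concretely, for the numbers in the question, the three variants can be read roughly like this (the comments are my interpretation, but it is consistent with the allocation counts reported above):
@btime d1(x, y)      # d1, x, y are untyped globals: dynamic dispatch dominates, the globals are
                     # already boxed, so only the Float64 result is allocated
@btime d1($x, $y)    # x and y are spliced in as concrete locals, but d1 is still an untyped
                     # global, so the call is dynamic and the isbits SVectors get boxed for it
@btime $d1($x, $y)   # everything behaves like a local of known type: only the distance
                     # computation itself is timed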
Can you explain what you see as different in the Chairmarks results? They look remarkably similar to me: less than 1 ns difference from their BenchmarkTools counterparts for each of them.
They are pretty much the same, and I guess a 1 ns difference is irrelevant in general. Thank you for clarifying the use of $.
Nowadays global variables may be typed, so it often makes sense to pass a typed global variable to the benchmarked function without interpolation, to prevent unwanted constant folding.
Can you elaborate on that @Neven Sajko? How is the example above affected by your comment?
When you do @btime $d1($x, $y) or @btime $d2($x, $y) above, you're more or less benchmarking a constant expression. This can be pointless, and it's possibly not what you want to benchmark, because the compiler could potentially optimize away the whole thing, leaving you benchmarking nothing. The timing would still come up at around 1-3 ns due to measurement error.
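As an aside, this constant-folding failure mode is easiest to see on a tiny arithmetic example (a contrived sketch, not from this thread; the typed-global variant shows the alternative described below):
using BenchmarkTools

a = 2.0
b = 3.0
# Interpolation splices the values in as compile-time constants; for something this
# cheap the compiler may fold the whole expression away, so the sub-nanosecond
# result measures essentially nothing.
@btime $a + $b

# With typed globals the compiler knows the types but not the values, so the
# addition actually has to run at benchmark time.
c::Float64 = 2.0
e::Float64 = 3.0
@btime c + e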
My suggestion is to instead use global variables directly when benchmarking, but those global variables should have a declared type. The compiler doesn't know the value of a global variable, so it will not be constant propagated (unless the type has only a single instance, as with Nothing, Missing, or Val{SomeParameter}). If some expression is intended to actually be constant during benchmarking, personally I still never use interpolation, preferring to use const and auxiliary functions where some arguments are constant. So my style is to run @btime f(a, b, c), where f is a constant binding (usually just a global function constant), while a, b, etc., are typed global variables. That way it's more clear what's happening.
Here's how I would perhaps have done the benchmark, assuming the metric functions are supposed to be compile-time constants, while the values of x and y should not be known to the compiler:
using LinearAlgebra, StaticArrays, Distances, BenchmarkTools
x::typeof(SVector(1.0, 2.0, 3.0)) = SVector(1.0, 2.0, 3.0)
y::typeof(x) = SVector(4.0, 5.0, 6.0)
const w = SVector(1.0, 1.0/4.0, 1.0/9.0)
const d1 = WeightedEuclidean(w)
const d2 = Mahalanobis(Diagonal(w))
@btime d1(x, y);
@btime d2(x, y);
There are also some tips specifically relevant to the floating-point code that's being benchmarked here: benchmarking merely a single call of the metric doesn't seem relevant, because that's not what happens in programs in the real world. In the real world, usually what you want is for the code to be vectorized by the compiler, which may happen if you're running the metric function in each iteration of a loop. So, to benchmark something like this, a more relevant experiment might be to benchmark a function which calls the metric for each point in a vector of points.
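For instance, something along these lines (a rough sketch; the helper name total_distance, the 10_000 points, and the random data are my own choices for illustration, not from the thread):
using StaticArrays, Distances, BenchmarkTools

const w = SVector(1.0, 1.0/4.0, 1.0/9.0)
const d1 = WeightedEuclidean(w)

# Call the metric once per point, inside a function where everything is local,
# so the compiler gets a realistic chance to inline and vectorize.
function total_distance(metric, ref, points)
    acc = 0.0
    for p in points
        acc += metric(ref, p)
    end
    return acc
end

# Typed globals for the data, as above: known type, unknown values.
points::Vector{SVector{3, Float64}} = [rand(SVector{3, Float64}) for _ in 1:10_000]
ref::SVector{3, Float64} = SVector(1.0, 2.0, 3.0)

@btime total_distance(d1, ref, points);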
Thank you @Neven Sajko . Very useful tips.
The usual trick to "interpolate the type but not the value" is @btime f(($(Ref(x)))[]), which isn't the most readable thing, so you may instead want to do xref = Ref(x); @btime f(($xref)[]).
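Applied to the example at the top of this thread (a quick sketch, assuming d1, x, and y are defined as in the first post):
# Interpolate a Ref: the compiler sees the type of x and y, but not their values.
@btime $d1(($(Ref(x)))[], ($(Ref(y)))[]);

# Same idea, slightly more readable:
xref = Ref(x); yref = Ref(y)
@btime $d1(($xref)[], ($yref)[]);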
This is similar to doing const xref = Ref(x) and not interpolating, but without limiting the rest of your Julia session by adding a const to your global namespace. (Admittedly less of an issue with the new binding partitioning in 1.12, but even there you can't un-const a name, only rebind it to a different const value).
It's also very similar to a typed global x::MyType = ... and no interpolation, but similarly avoids irreversibly attaching MyType to x for the rest of the Julia session. Also, the implementation of typed globals guarantees atomic reads and writes, which adds a tiny but nonzero extra cost to accessing them, compared to dereferencing a constant/interpolated Ref.
I wonder if BenchmarkTools.jl or Chairmarks.jl could save users from these pitfalls. It is really tricky to get things right even as an experienced Julia programmer. You basically need to understand how the compiler works to do benchmarks correctly nowadays.
I don't think there's been any change over time except the introduction of typed globals, which gives you a third alternative to const Ref(...) or ($(Ref(...)))[] but otherwise hasn't changed anything. (And the only effect of the binding partitioning in 1.12 is that you can change the value of a const later, which may reduce your hesitation to use const for one-off benchmarking values. It changes nothing about how const and benchmarking interact.)
The point being, you've always needed some understanding of this to get microbenchmarks right.
But honestly, I think the best way of addressing most of these things is to use setup more actively. You never need to interpolate values defined in setup (and it's not possible anyway).
@btime f(x, y[], z) setup=begin
    x = rand(...)  # random value, compiler can only know the type
    y = Ref(...)   # fixed value, but the compiler should only see the type, no constprop
    z = ...        # otherwise
end
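For example, applied to the metrics from this thread (a sketch; the random points are just for illustration, and d1 is assumed to be the const binding from the earlier snippet):
@btime d1(a, b) setup=begin
    a = rand(SVector{3, Float64})   # fresh random point before each sample; only the type is known to the compiler
    b = rand(SVector{3, Float64})
end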
Setup is a good suggestion. Will try to use it more.