How do I implement three_bytes_to_UInt32 more efficiently?
julia> three_bytes_to_UInt32( bytes::NTuple{3,UInt8} ) = reinterpret(UInt32, [ bytes..., UInt8(0) ] )[1]
three_bytes_to_UInt32 (generic function with 1 method)
julia> data = (0x39,0xfa,0x14)
(0x39, 0xfa, 0x14)
julia> three_bytes_to_UInt32( data )
0x0014fa39
julia> f(a,b,c) = (UInt32(c) << 2^4) | (UInt32(b) << 2^3) | (UInt32(a))
f (generic function with 1 method)
julia> f(data) = f(data...)
f (generic function with 2 methods)
julia> @btime f($data)
1.700 ns (0 allocations: 0 bytes)
0x0014fa39
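For readers skimming the shift amounts: 2^4 and 2^3 are simply 16 and 8, so f packs the first byte into the lowest 8 bits, the second into bits 8-15, and the third into bits 16-23. A minimal sketch with the shifts spelled out follows. The name pack3 is hypothetical and not from the thread. The check against the reinterpret version uses ltoh because reinterpret reads the bytes in host byte order.

```julia
# Hypothetical helper (not from the thread): same arithmetic as f, with literal shifts.
pack3(a::UInt8, b::UInt8, c::UInt8) = (UInt32(c) << 16) | (UInt32(b) << 8) | UInt32(a)
pack3(bytes::NTuple{3,UInt8}) = pack3(bytes...)

data = (0x39, 0xfa, 0x14)
packed = pack3(data)

# The original approach: build a 4-byte Vector and reinterpret it.
# ltoh is a no-op on little-endian hosts and byte-swaps on big-endian ones,
# so the comparison below does not depend on the machine.
via_reinterpret = ltoh(reinterpret(UInt32, [data..., 0x00])[1])

println(packed == via_reinterpret)
```

The shift version never materializes an array, which is why it shows 0 allocations in the benchmark above.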
Performance is 15x slower if the compiler is prevented from optimizing the operation away :(
julia> @btime f(d) setup=(d=data)
26.908 ns (1 allocation: 16 bytes)
0x0014fa39
I believe setup is not doing what you think it is there. One should use Ref for the benchmark barrier instead:
julia> f(a,b,c) = (UInt32(c) << 2^4) | (UInt32(b) << 2^3) | (UInt32(a))
f (generic function with 1 method)
julia> f(data) = f(data...)
f (generic function with 2 methods)
julia> let data = Ref((0x39,0xfa,0x14))
           @btime f($data[])
       end
1.290 ns (0 allocations: 0 bytes)
0x0014fa39
Hah, you're right!
julia> v = fill(data, 1000);
julia> @btime f.($v);
541.327 ns (1 allocation: 4.06 KiB)
versus
julia> @btime three_bytes_to_UInt32.($v);
41.300 μs (1001 allocations: 109.50 KiB)
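The gap in allocation counts (1 versus 1001) is consistent with three_bytes_to_UInt32 building a temporary 4-element Vector on every call, on top of the single output array that both broadcasts need. A rough way to see this without BenchmarkTools is Base's @allocated macro, as sketched below. The variable names alloc_array and alloc_shift are illustrative, and exact byte counts will vary by Julia version.

```julia
# Definitions copied from the thread.
three_bytes_to_UInt32(bytes::NTuple{3,UInt8}) = reinterpret(UInt32, [bytes..., UInt8(0)])[1]
f(a, b, c) = (UInt32(c) << 2^4) | (UInt32(b) << 2^3) | (UInt32(a))
f(data) = f(data...)

v = fill((0x39, 0xfa, 0x14), 1000)

# Wrap the broadcasts in functions so the measurement is not dominated by
# global-variable overhead, and call each once first so compilation is excluded.
run_array(v) = three_bytes_to_UInt32.(v)
run_shift(v) = f.(v)
run_array(v); run_shift(v)

alloc_array = @allocated run_array(v)   # output array plus one temporary Vector per element
alloc_shift = @allocated run_shift(v)   # essentially just the output array

println((alloc_array, alloc_shift))
```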
Thanks. I generalized this to convert any tuple of bytes into a larger unsigned integer:
julia> (::Type{T})(bytes::NTuple{N,UInt8}) where {T <: Unsigned} where N = N > sizeof(T) ?
           error("Number of bytes larger than sizeof($T)") :
           |( [T(b) << 8(p-1) for (p,b) in enumerate(bytes)]... )
julia> UInt32((0x01,0x03,0x05))
0x00050301
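A possible variation on the generalized version, offered as a sketch rather than a replacement: the definition above adds a constructor method to every Unsigned type, including Base's own (which is usually called type piracy), and the array comprehension allocates a temporary Vector on each call. The hypothetical function bytes_to_unsigned below keeps the same little-endian packing but uses a separate function name and a plain loop, so nothing is allocated and no Base constructors are extended.

```julia
# Hypothetical name, not from the thread. Packs a tuple of bytes into an
# unsigned integer of type T, least significant byte first.
function bytes_to_unsigned(::Type{T}, bytes::NTuple{N,UInt8}) where {T<:Unsigned,N}
    N <= sizeof(T) || throw(ArgumentError("number of bytes ($N) larger than sizeof($T)"))
    result = zero(T)
    for (p, b) in enumerate(bytes)
        # Byte p (1-based) lands at bit offset 8 * (p - 1).
        result |= T(b) << (8 * (p - 1))
    end
    return result
end

x = bytes_to_unsigned(UInt32, (0x01, 0x03, 0x05))
y = bytes_to_unsigned(UInt64, (0x39, 0xfa, 0x14))
println((x, y))
```

Unlike the splatted `|` call, the loop also handles an empty tuple, returning zero(T).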
Last updated: Nov 22 2024 at 04:41 UTC