Is there any package which provides a non-allocating shuffle for tuples or SVectors?
I've tried to use StaticArrays.jl, but the default algorithm works only with MVector, and it allocates:
using Random
using StaticArrays
using BenchmarkTools  # for @btime

x = MVector(1, 2, 3, 4)

julia> @btime Random.shuffle($x)
38.575 ns (1 allocation: 48 bytes)
I can use Random.shuffle!, but then I still need to allocate the initial MVector.
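For instance, a minimal sketch of the pattern I mean:

using Random
using StaticArrays

# shuffle! works in place without allocating, but the MVector itself
# still has to be heap-allocated up front.
x = MVector(1, 2, 3, 4)   # this allocation is what I'd like to avoid
Random.shuffle!(x)        # in-place shuffle, no further allocations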
Interestingly enough, it is easy to write a non-allocating version, but it is slower than the MVector version:
using Setfield

# Fisher–Yates over an immutable collection; @set! rebuilds the whole tuple on every swap.
function myshuffle(x)
    L = length(x)
    for i in 1:(L-1)
        j = rand(i:L)
        i == j && continue
        t = x[i]
        @set! x[i] = x[j]
        @set! x[j] = t
    end
    return x
end
y = (1, 2, 3, 4)
julia> @btime myshuffle($y)
46.262 ns (0 allocations: 0 bytes)
which is substantially worse than the MVector version. And the issue is not with Setfield; I've tried to write setindex manually, but the performance is the same.
Maybe https://jutho.github.io/TupleTools.jl/latest/#TupleTools.permute and https://docs.julialang.org/en/v1/stdlib/Random/#Random.randperm! with a workspace permutation? Not ideal :/
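For example, something along these lines (a rough sketch; the name shuffle_ws and the fixed length-4 workspace are just for illustration, and ntuple stands in for TupleTools.permute):

using Random  # for randperm!

const ws = Vector{Int}(undef, 4)          # workspace permutation, allocated once up front

function shuffle_ws(t::NTuple{4})
    randperm!(ws)                         # fill the workspace with a random permutation, in place
    return ntuple(i -> t[ws[i]], Val(4))  # apply the permutation without allocating
end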
Ah, thank you for TupleTools.jl, it's definitely a useful package.
Unfortunately, randperm produces a Vector, so it allocates...
Could you wrap it in an SVector?
How?
Who?
I mean, who should be wrapped in an SVector?
Sorry, I should have read the OP more carefully, one sec
I hoped that because this doesn’t allocate:
julia> let x = Ref(MVector(1,2,3,4))
           @btime shuffle!($x[])
       end
30.424 ns (0 allocations: 0 bytes)
4-element MArray{Tuple{4},Int64,1,4} with indices SOneTo(4):
2
1
4
3
that you could do:
julia> function Random.shuffle(v::SArray)
           mv = MArray(v)
           shuffle!(mv)
           SArray(mv)
       end
But it didn’t end up stack-allocating; I guess shuffle! is too complicated to inline or something:
julia> let x = Ref(SVector(1,2,3,4))
           @btime shuffle($x[])
       end
38.550 ns (1 allocation: 48 bytes)
4-element SArray{Tuple{4},Int64,1,4} with indices SOneTo(4):
3
1
4
2
I yet again wish we had call-site inlining.
Ah, yes, that's what I see as well.
I have opposite feelings :-) MVector is faster than the hand-written version, but it allocates, so it can potentially put pressure on the GC.
What do you have opposite feelings about?
Which one to use :-)
The allocating but fast version with MVector, or the handwritten non-allocating but slow version over tuples.
I don’t know how you disagree with me then because I didn’t advocate either of those :stuck_out_tongue:
I was trying to get the best of both worlds by stack-allocating an MVector, shuffling it in place, then casting back to an SVector.
But it didn’t work because shuffle! doesn’t inline
Note that shuffle!(::MVector) does not allocate
Ah! Sorry! I was not disagreeing with you, it's just a wrong translation.
It should have been something like "I have contradictory (conflicting?) feelings about which approach to choose, because they both have good and bad sides."
Opposing and conflicting are synonyms for me :-)
Oh, and it should be opposing, not opposite.
Damn, I am sorry, that was really stupid.
no worries!
Okay, this is a little janky, but I managed to get StrideArrays.jl working for this so that it's non-allocating and fast:
julia> using StrideArrays, StaticArrays, Random

julia> function myshuffle(v::StaticArray)
           strv = StrideArray(undef, eltype(v), StrideArrays.size(v)...)
           pv = PtrArray(strv)
           GC.@preserve strv begin
               pv .= v
               shuffle!(pv)
               typeof(v)(strv.data)
           end
       end
julia> let v = Ref(SVector(1,2,3,4))
           @btime myshuffle($v[])
       end
33.899 ns (0 allocations: 0 bytes)
4-element SVector{4, Int64} with indices SOneTo(4):
4
1
2
3
That said, we should maybe just open an issue in StaticArrays.jl requesting an implementation of shuffle for static arrays.
Mason Protter said:
Note that shuffle!(::MVector) does not allocate
Ah! I wanted to answer you that I need to create a new MVector in a loop, but it occurs to me that shuffle! of shuffle! is shuffle!! I mean, I can shuffle the shuffled vector and use it again.
I.e., my original task was something like:
for x in v
    a0 = shuffle((1, 2, 3, 4))
    apply(a0, x)
end
And I can do it like this:
a0 = MVector(1, 2, 3, 4)
for x in v
    a0 = shuffle!(a0)
    apply(a0, x)
end
So yes, it solves my task.
The only question that is left is why MVector shuffles faster than Tuple (I've looked at the definition of Random.shuffle!; it's basically the same as my handwritten version).
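For reference, roughly what the in-place version does (a sketch; the actual Random.shuffle! differs in details like RNG handling, and the name inplace_shuffle! is made up):

# In-place Fisher–Yates: same loop structure as myshuffle above,
# but it swaps two elements per iteration instead of rebuilding the tuple.
function inplace_shuffle!(a::AbstractVector)
    for i in length(a):-1:2
        j = rand(1:i)
        a[i], a[j] = a[j], a[i]
    end
    return a
end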
The difference is that in your handwritten version, @set! x[i] = x[j] is something like
x = Tuple(k == i ? x[j] : x[k] for k in eachindex(x))
so it's creating entire tuples instead of moving around individual values.
Often LLVM / julia are smart enough to not create values on the stack that aren't actually used, but I suspect they aren't able to prove which values aren't being used in each iteration, so space is being wasted on the stack.
So if we can use mutable stack-allocated arrays, we can speed up this calculation, right?
Ah, I guess this is where StrideArrays.jl comes in, right?
Should take a look at them.
Yeah, that’s correct, I’m using a StrideArrays.PtrArray to get a stack-allocated mutable array. You can often do this with just a plain MArray, but it’s harder to force Julia to do it.
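For example, something like this (a sketch with a made-up function name; the allocation is only elided when the MVector doesn't escape and everything inlines, which is exactly what fails for shuffle!):

using StaticArrays

function double_first(v::SVector)
    mv = MVector(v)     # mutable scratch copy
    mv[1] *= 2          # simple, inlineable mutation
    return SVector(mv)  # convert back; mv itself never escapes the function
end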
This is awesome.
@Mason Protter, @gc_preserve from StrideArrays automates it:
julia> using StrideArrays, StaticArrays, Random

julia> function shuffle_v1(v::SArray)
           mv = MArray(v)
           shuffle!(mv)
           SArray(mv)
       end
shuffle_v1 (generic function with 1 method)

julia> function shuffle_v2(v::SArray)
           mv = MArray(v)
           @gc_preserve shuffle!(mv)
           SArray(mv)
       end
shuffle_v2 (generic function with 1 method)
julia> x = @SVector([1,2,3,4]); x'
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
1 2 3 4
julia> @btime shuffle_v1($x)'
26.022 ns (1 allocation: 48 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
4 3 2 1
julia> @btime shuffle_v2($x)'
20.040 ns (0 allocations: 0 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
3 4 1 2
julia> @btime myshuffle($x)' # your version
19.653 ns (0 allocations: 0 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
4 3 2 1
Nice!
Wow, it is really short and fast.