Nonallocating shuffle · helpdesk (published)

Is there any package, which provides non allocating shuffle for tuples or SVectors?

I've tried to use StaticArrays.jl, but default algorithm works only with MVector and it allocates

using Random
using StaticArrays
x = MVector(1, 2, 3, 4)

julia> @btime Random.shuffle($x)
  38.575 ns (1 allocation: 48 bytes)

I can use Random.shuffle!, but then I still need to allocate initial MVector.

Interestingly enough, it is easy to write non allocating version, but it is slower than MVector version

using Setfield

function myshuffle(x)
    L = length(x)
    for i in 1:(L-1)
        j = rand(i:L)
        i == j && continue
        t = x[i]
        @set! x[i] = x[j]
        @set! x[j] = t
    end

    return x
end

y = (1, 2, 3, 4)

julia> @btime myshuffle($y)
  46.262 ns (0 allocations: 0 bytes)

which is substantially worse than MVector version. And issue is not with Setfield, I've tried to write setindex manually, but performance is the same.

Eric Hanson (Jun 22 2021 at 21:48):

Kwaku Oskin (Jun 23 2021 at 18:05):

Mason Protter (Jun 23 2021 at 18:06):

Kwaku Oskin (Jun 23 2021 at 18:06):

Mason Protter (Jun 23 2021 at 18:07):

Mason Protter (Jun 23 2021 at 18:16):

julia> let x = Ref(MVector(1,2,3,4))
    @btime shuffle!($x[])
end
 30.424 ns (0 allocations: 0 bytes)
4-element MArray{Tuple{4},Int64,1,4} with indices SOneTo(4):
 2
 1
 4
 3

julia> function Random.shuffle(v::SArray)
    mv = MArray(v)
    shuffle!(mv)
    SArray(mv)
end

But it didn’t end up stack allocating, I guess shuffle! is too complicated to in-line or something:

julia> let x = Ref(SVector(1,2,3,4))
    @btime shuffle($x[])
end
38.550 ns (1 allocation: 48 bytes)
4-element SArray{Tuple{4},Int64,1,4} with indices SOneTo(4):
 3
 1
 4
 2

Mason Protter (Jun 23 2021 at 18:16):

Kwaku Oskin (Jun 23 2021 at 18:26):

I have opposite feelings :-) MVector is faster than hand-written, but it allocates, so it can potentially give pressure to GC.

Mason Protter (Jun 23 2021 at 18:30):

Kwaku Oskin (Jun 23 2021 at 18:47):

Which one to use :-)
Allocating but fast version with MVector
Or handwritten nonallocating but slow version over tuples.

Mason Protter (Jun 23 2021 at 18:56):

I don’t know how you disagree with me then because I didn’t advocate either of those :stuck_out_tongue:

I was trying to get the best of both worlds by stack allocating an MVector, then in place shuffling it, the casting back to SVector.

Mason Protter (Jun 23 2021 at 18:56):

Mason Protter (Jun 23 2021 at 19:16):

Kwaku Oskin (Jun 23 2021 at 19:16):

It should have been something like "I have contradictorary (conflicting?) feelings, which approach to choose, because they both have good and bad sides"

Kwaku Oskin (Jun 23 2021 at 19:17):

Kwaku Oskin (Jun 23 2021 at 19:18):

Oh, and it should be opposing not opposite.
Damn, I am sorry, that was really stupid.

Mason Protter (Jun 23 2021 at 19:18):

Mason Protter (Jun 23 2021 at 19:26):

Okay, this is a little janky, but I managed to get StrideArrays.jl working for this so that it's non-allocating and fast:

julia> using StrideArrays, StaticArrays, Random

julia> function myshuffle(v::StaticArray)
           strv = StrideArray(undef, eltype(v), StrideArrays.size(v)...)
           pv   = PtrArray(strv)
           GC.@preserve strv begin
               pv .= v
               shuffle!(pv)
               typeof(v)(strv.data)
           end
       end

julia> let v = Ref(SVector(1,2,3,4))
           @btime myshuffle($v[])
       end
  33.899 ns (0 allocations: 0 bytes)
4-element SVector{4, Int64} with indices SOneTo(4):
 4
 1
 2
 3

Mason Protter (Jun 23 2021 at 19:27):

That said, we should maybe just open an issue in StaticArrays.jl requesting an implementation of shuffle for static arrays.

Kwaku Oskin (Jun 23 2021 at 19:30):

Ah! I wanted to answer you that I need to create new MVector in a loop, but it occurs to me, that shuffle! of shuffle! is shuffle!! I mean, I can shuffle shuffled vector and use it again.

for x in v
   a0 = shuffle((1, 2, 3, 4))
  apply(a0, x)
end

a0 = MVector(1, 2, 3, 4)
for x in v
  a0 = shuffle!(a0)
  apply(a0, x)
end

So yes, it solves my task.
The only question that is left, why MVector shuffles faster than Tuple (I've looked at the definition of Random.shuffle! it's basically the same as my handwritten version).

Mason Protter (Jun 23 2021 at 19:35):

x = Tuple(k == i ? x[j] : x[k] for k in eachindex(x))

Mason Protter (Jun 23 2021 at 19:36):

Often LLVM / julia are smart enough to not create values on the stack that aren't actually used, but I suspect they aren't able to prove which values aren't being used in each iteration, so space is being wasted on the stack.

Kwaku Oskin (Jun 23 2021 at 19:39):

So if we can use mutable stack allocated arrays we can speed up this calculation, right?
Ah, I guess this is where StrideArrays.jl comes in, right?

Kwaku Oskin (Jun 23 2021 at 19:39):

Mason Protter (Jun 23 2021 at 20:17):

Yeah, that’s correct, I’m using a StrideArrays.PtrArray to get a stack allocated mutable array. You can often do this with just a plain MArray, but it’s harder to force julia to do it.

Kwaku Oskin (Jun 23 2021 at 20:41):

chriselrod (Jun 23 2021 at 23:27):

chriselrod (Jun 23 2021 at 23:28):

julia> using StrideArrays, StaticArrays, Random

julia> function shuffle_v1(v::SArray)
          mv = MArray(v)
          shuffle!(mv)
          SArray(mv)
       end
shuffle_v1 (generic function with 1 method)

julia> function shuffle_v2(v::SArray)
          mv = MArray(v)
          @gc_preserve shuffle!(mv)
          SArray(mv)
       end
shuffle_v2 (generic function with 1 method)

julia> x = @SVector([1,2,3,4]); x'
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 1  2  3  4

julia> @btime shuffle_v1($x)'
  26.022 ns (1 allocation: 48 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 4  3  2  1

julia> @btime shuffle_v2($x)'
  20.040 ns (0 allocations: 0 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 3  4  1  2

julia> @btime myshuffle($x)' # your version
  19.653 ns (0 allocations: 0 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 4  3  2  1

Stream: helpdesk (published)

Topic: Nonallocating shuffle

Kwaku Oskin (Jun 22 2021 at 08:43):

Eric Hanson (Jun 22 2021 at 21:48):

Kwaku Oskin (Jun 23 2021 at 18:05):

Mason Protter (Jun 23 2021 at 18:06):

Kwaku Oskin (Jun 23 2021 at 18:06):

Kwaku Oskin (Jun 23 2021 at 18:06):

Kwaku Oskin (Jun 23 2021 at 18:06):

Mason Protter (Jun 23 2021 at 18:07):

Mason Protter (Jun 23 2021 at 18:16):

Mason Protter (Jun 23 2021 at 18:16):

Kwaku Oskin (Jun 23 2021 at 18:26):

Mason Protter (Jun 23 2021 at 18:30):

Kwaku Oskin (Jun 23 2021 at 18:47):

Mason Protter (Jun 23 2021 at 18:56):

Mason Protter (Jun 23 2021 at 18:56):

Mason Protter (Jun 23 2021 at 19:16):

Kwaku Oskin (Jun 23 2021 at 19:16):

Kwaku Oskin (Jun 23 2021 at 19:17):

Kwaku Oskin (Jun 23 2021 at 19:18):

Mason Protter (Jun 23 2021 at 19:18):

Mason Protter (Jun 23 2021 at 19:26):

Mason Protter (Jun 23 2021 at 19:27):

Kwaku Oskin (Jun 23 2021 at 19:30):

Mason Protter (Jun 23 2021 at 19:35):

Mason Protter (Jun 23 2021 at 19:36):

Kwaku Oskin (Jun 23 2021 at 19:39):

Kwaku Oskin (Jun 23 2021 at 19:39):

Mason Protter (Jun 23 2021 at 20:17):

Kwaku Oskin (Jun 23 2021 at 20:41):

chriselrod (Jun 23 2021 at 23:27):

chriselrod (Jun 23 2021 at 23:28):

Mason Protter (Jun 23 2021 at 23:35):

Kwaku Oskin (Jun 24 2021 at 13:20):