Stream: helpdesk (published)

Topic: Nonallocating shuffle


view this post on Zulip Kwaku Oskin (Jun 22 2021 at 08:43):

Is there any package, which provides non allocating shuffle for tuples or SVectors?

I've tried to use StaticArrays.jl, but default algorithm works only with MVector and it allocates

using Random
using StaticArrays
x = MVector(1, 2, 3, 4)

julia> @btime Random.shuffle($x)
  38.575 ns (1 allocation: 48 bytes)

I can use Random.shuffle!, but then I still need to allocate initial MVector.

Interestingly enough, it is easy to write non allocating version, but it is slower than MVector version

using Setfield

function myshuffle(x)
    L = length(x)
    for i in 1:(L-1)
        j = rand(i:L)
        i == j && continue
        t = x[i]
        @set! x[i] = x[j]
        @set! x[j] = t
    end

    return x
end

y = (1, 2, 3, 4)

julia> @btime myshuffle($y)
  46.262 ns (0 allocations: 0 bytes)

which is substantially worse than MVector version. And issue is not with Setfield, I've tried to write setindex manually, but performance is the same.

view this post on Zulip Eric Hanson (Jun 22 2021 at 21:48):

Maybe https://jutho.github.io/TupleTools.jl/latest/#TupleTools.permute and https://docs.julialang.org/en/v1/stdlib/Random/#Random.randperm! with a workspace permutation? Not ideal :/

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 18:05):

Ah, thank you for TupleTools.jl, it's definitely useful package.

Unfortunately, randperm produces Vector, so it allocates...

view this post on Zulip Mason Protter (Jun 23 2021 at 18:06):

Could you wrap it in a SVector?

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 18:06):

How?

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 18:06):

Who?

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 18:06):

I mean, who should be wrapped in SVector?

view this post on Zulip Mason Protter (Jun 23 2021 at 18:07):

Sorry, I should have read the OP more carefully, one sec

view this post on Zulip Mason Protter (Jun 23 2021 at 18:16):

I hoped that because this doesn’t allocate:

julia> let x = Ref(MVector(1,2,3,4))
    @btime shuffle!($x[])
end
 30.424 ns (0 allocations: 0 bytes)
4-element MArray{Tuple{4},Int64,1,4} with indices SOneTo(4):
 2
 1
 4
 3

That you could do

julia> function Random.shuffle(v::SArray)
    mv = MArray(v)
    shuffle!(mv)
    SArray(mv)
end

But it didn’t end up stack allocating, I guess shuffle! is too complicated to in-line or something:

julia> let x = Ref(SVector(1,2,3,4))
    @btime shuffle($x[])
end
38.550 ns (1 allocation: 48 bytes)
4-element SArray{Tuple{4},Int64,1,4} with indices SOneTo(4):
 3
 1
 4
 2

view this post on Zulip Mason Protter (Jun 23 2021 at 18:16):

I yet again wish we had call-site inlining.

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 18:26):

Ah, yes, that's what I see as well.

I have opposite feelings :-) MVector is faster than hand-written, but it allocates, so it can potentially give pressure to GC.

view this post on Zulip Mason Protter (Jun 23 2021 at 18:30):

What do you have opposite feelings about?

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 18:47):

Which one to use :-)
Allocating but fast version with MVector
Or handwritten nonallocating but slow version over tuples.

view this post on Zulip Mason Protter (Jun 23 2021 at 18:56):

I don’t know how you disagree with me then because I didn’t advocate either of those :stuck_out_tongue:

I was trying to get the best of both worlds by stack allocating an MVector, then in place shuffling it, the casting back to SVector.

view this post on Zulip Mason Protter (Jun 23 2021 at 18:56):

But it didn’t work because shuffle! doesn’t inline

view this post on Zulip Mason Protter (Jun 23 2021 at 19:16):

Note that shuffle!(::MVector) does not allocate

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 19:16):

Ah! Sorry! I was not disagreeing with you, it's just a wrong translation.

It should have been something like "I have contradictorary (conflicting?) feelings, which approach to choose, because they both have good and bad sides"

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 19:17):

Opposing and conflicting are synonyms for me :-)

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 19:18):

Oh, and it should be opposing not opposite.
Damn, I am sorry, that was really stupid.

view this post on Zulip Mason Protter (Jun 23 2021 at 19:18):

no worries!

view this post on Zulip Mason Protter (Jun 23 2021 at 19:26):

Okay, this is a little janky, but I managed to get StrideArrays.jl working for this so that it's non-allocating and fast:

julia> using StrideArrays, StaticArrays, Random

julia> function myshuffle(v::StaticArray)
           strv = StrideArray(undef, eltype(v), StrideArrays.size(v)...)
           pv   = PtrArray(strv)
           GC.@preserve strv begin
               pv .= v
               shuffle!(pv)
               typeof(v)(strv.data)
           end
       end

julia> let v = Ref(SVector(1,2,3,4))
           @btime myshuffle($v[])
       end
  33.899 ns (0 allocations: 0 bytes)
4-element SVector{4, Int64} with indices SOneTo(4):
 4
 1
 2
 3

view this post on Zulip Mason Protter (Jun 23 2021 at 19:27):

That said, we should maybe just open an issue in StaticArrays.jl requesting an implementation of shuffle for static arrays.

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 19:30):

Mason Protter said:

Note that shuffle!(::MVector) does not allocate

Ah! I wanted to answer you that I need to create new MVector in a loop, but it occurs to me, that shuffle! of shuffle! is shuffle!! I mean, I can shuffle shuffled vector and use it again.

I.e. my original task was something like

for x in v
   a0 = shuffle((1, 2, 3, 4))
  apply(a0, x)
end

And I can do it

a0 = MVector(1, 2, 3, 4)
for x in v
  a0 = shuffle!(a0)
  apply(a0, x)
end

So yes, it solves my task.
The only question that is left, why MVector shuffles faster than Tuple (I've looked at the definition of Random.shuffle! it's basically the same as my handwritten version).

view this post on Zulip Mason Protter (Jun 23 2021 at 19:35):

The difference is that in your handwritten version,

 @set! x[i] = x[j]

is something like

x = Tuple(k == i ? x[j] : x[k] for k in eachindex(x))

so it's creating entire tuples instead of moving around individual values.

view this post on Zulip Mason Protter (Jun 23 2021 at 19:36):

Often LLVM / julia are smart enough to not create values on the stack that aren't actually used, but I suspect they aren't able to prove which values aren't being used in each iteration, so space is being wasted on the stack.

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 19:39):

So if we can use mutable stack allocated arrays we can speed up this calculation, right?
Ah, I guess this is where StrideArrays.jl comes in, right?

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 19:39):

Should take a look at them.

view this post on Zulip Mason Protter (Jun 23 2021 at 20:17):

Yeah, that’s correct, I’m using a StrideArrays.PtrArray to get a stack allocated mutable array. You can often do this with just a plain MArray, but it’s harder to force julia to do it.

view this post on Zulip Kwaku Oskin (Jun 23 2021 at 20:41):

This is awesome.

view this post on Zulip chriselrod (Jun 23 2021 at 23:27):

@Mason Protter @gc_preserve from StrideArrays automates it

view this post on Zulip chriselrod (Jun 23 2021 at 23:28):

julia> using StrideArrays, StaticArrays, Random

julia> function shuffle_v1(v::SArray)
          mv = MArray(v)
          shuffle!(mv)
          SArray(mv)
       end
shuffle_v1 (generic function with 1 method)

julia> function shuffle_v2(v::SArray)
          mv = MArray(v)
          @gc_preserve shuffle!(mv)
          SArray(mv)
       end
shuffle_v2 (generic function with 1 method)

julia> x = @SVector([1,2,3,4]); x'
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 1  2  3  4

julia> @btime shuffle_v1($x)'
  26.022 ns (1 allocation: 48 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 4  3  2  1

julia> @btime shuffle_v2($x)'
  20.040 ns (0 allocations: 0 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 3  4  1  2

julia> @btime myshuffle($x)' # your version
  19.653 ns (0 allocations: 0 bytes)
1×4 adjoint(::SVector{4, Int64}) with eltype Int64 with indices SOneTo(1)×SOneTo(4):
 4  3  2  1

view this post on Zulip Mason Protter (Jun 23 2021 at 23:35):

Nice!

view this post on Zulip Kwaku Oskin (Jun 24 2021 at 13:20):

Wow, it is really short and fast.


Last updated: Dec 28 2024 at 04:38 UTC