Unexpected behavior in `Iterators.take` · helpdesk (published)

mbaz (Dec 22 2022 at 19:25):

struct RandIter
    x :: Vector{Int}
end

Base.length(ri :: RandIter) = 5

function Base.iterate(i::RandIter, state = 0)
    if state < 5
        i.x[1] = rand((1,2,3))
        i.x[2] = rand((1,2,3))
        i.x[3] = rand((1,2,3))
        state += 1
        return (i.x, state)
    else
        return nothing
    end
end

julia> ri = RandIter([0,0,0]);

julia> collect(Iterators.take(ri, 4))
4-element Vector{Any}:
 [1, 3, 2]
 [1, 3, 2]
 [1, 3, 2]
 [1, 3, 2]

julia> for v in ri println(v); end
[3, 3, 1]
[2, 3, 1]
[3, 3, 3]
[1, 2, 2]
[1, 1, 2]

Ultimately, what I want to do is generate the random values without allocating; that's the idea behind reusing the memory allocated in ri.x. How can I do that and still have a working take iterator?

Sukera (Dec 22 2022 at 20:26):

mbaz (Dec 22 2022 at 20:27):

Hmm, I'm starting to think that this will be impossible... for take to work, a copy will have to be made at some point. Unfortunately, copying makes the iterator almost 10x slower.

Sukera (Dec 22 2022 at 20:27):

Sukera (Dec 22 2022 at 20:28):

the only reason println works is because it prints the current state of the array, and doesn't keep a reference to it around for longer than one iteration

mbaz (Dec 22 2022 at 20:28):

Yep -- the problem with tuples is that later on I'd like to use the random values in more flexible ways.

mbaz (Dec 22 2022 at 20:29):

I guess what I was hoping would happen is that collect(take(...)) would allocate an array of the required size and stick the iterator values in there, requiring only one allocation.

If I need to copy the array, I end up doing one allocation per iteration, which is sad.

Sukera (Dec 22 2022 at 20:30):

it's just that every entry returned is the exact same array (===), where you just changed its contents

Sukera (Dec 22 2022 at 20:31):

julia> fill([], 2)
2-element Vector{Vector{Any}}:
 []
 []

julia> ans[1] === ans[2]
true
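
The same aliasing can be reproduced with an iterator that reuses its buffer. The sketch below uses a hypothetical `ReuseIter` (a deterministic stand-in for `RandIter`, not from the thread) so the result is reproducible; it assumes nothing beyond Base.

```julia
# Hypothetical stand-in for RandIter: it overwrites one shared buffer with the
# iteration counter, so the output is deterministic.
struct ReuseIter
    buf::Vector{Int}
end

Base.length(::ReuseIter) = 5
Base.eltype(::Type{ReuseIter}) = Vector{Int}

function Base.iterate(it::ReuseIter, state = 0)
    state < 5 || return nothing
    state += 1
    fill!(it.buf, state)     # overwrite the shared buffer in place
    return (it.buf, state)   # hand back the same array object every time
end

it = ReuseIter(zeros(Int, 3))
taken = collect(Iterators.take(it, 4))

# Every slot of `taken` refers to the one buffer owned by `it`,
# so all entries show whatever was written last.
same_object = all(v -> v === it.buf, taken)
```

Here `same_object` is `true`, and every entry of `taken` displays the contents of the last write.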

mbaz (Dec 22 2022 at 20:31):

Sukera (Dec 22 2022 at 20:32):

Michael Abbott (Dec 22 2022 at 20:32):

If you do something with the result, though, it might be fine that it gets re-used. E.g. map(sum, iter) ought to call sum before getting the next element, and thus have distinct values.
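
A small sketch of that point, using a hypothetical deterministic iterator `ReuseIter` in place of `RandIter` (an assumption made for reproducibility). Each buffer is reduced to a number before the next `iterate` call overwrites it, so the results stay distinct.

```julia
# Hypothetical buffer-reusing iterator; fills the shared buffer with the
# iteration counter so the sums are predictable.
struct ReuseIter
    buf::Vector{Int}
end

Base.length(::ReuseIter) = 5
Base.eltype(::Type{ReuseIter}) = Vector{Int}

function Base.iterate(it::ReuseIter, state = 0)
    state < 5 || return nothing
    state += 1
    fill!(it.buf, state)
    return (it.buf, state)
end

# sum consumes the buffer immediately and returns a plain Int,
# so no reference to the shared array is kept around.
sums = map(sum, ReuseIter(zeros(Int, 3)))
```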

mbaz (Dec 22 2022 at 20:33):

Sukera (Dec 22 2022 at 20:33):

if you want to copy, you need to copy explicitly - either by doing Iterators.map(copy, RandIter([1,2,3])) or by copying in your iterator
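
A minimal sketch of the first option, with a hypothetical deterministic `ReuseIter` standing in for `RandIter`. Note that this pays one allocation per element, which is the cost discussed earlier in the thread.

```julia
# Hypothetical buffer-reusing iterator (stand-in for RandIter).
struct ReuseIter
    buf::Vector{Int}
end

Base.length(::ReuseIter) = 5
Base.eltype(::Type{ReuseIter}) = Vector{Int}

function Base.iterate(it::ReuseIter, state = 0)
    state < 5 || return nothing
    state += 1
    fill!(it.buf, state)
    return (it.buf, state)
end

# Iterators.map is lazy: copy runs on each element as it is produced,
# before the next iterate call overwrites the shared buffer.
safe = Iterators.map(copy, ReuseIter(zeros(Int, 3)))
taken = collect(Iterators.take(safe, 4))
```

Each entry of `taken` is now its own array, so the values from each iteration survive.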

Sukera (Dec 22 2022 at 20:33):

again - the for doesn't copy. The println just looks at the elements of the array before you change them again.

mbaz (Dec 22 2022 at 20:34):

mbaz (Dec 22 2022 at 20:35):

I'm not complaining, by the way, I just was very confused by the behavior of collect(take(itr)). I assumed that, just like println uses the values before they change, collect would take the returned values and stick them in its own array.

Sukera (Dec 22 2022 at 20:36):

mbaz (Dec 22 2022 at 20:38):

Maybe I don't know what you mean by "same array". I understand the array (in the sense of the array pointer) is the same. However, the values of the array are different in each iteration.

Michael Abbott (Dec 22 2022 at 20:38):

Maybe this is like copy vs deepcopy. collect is making a new array to hold things, but those things themselves are just pointers to memory. If I do A = [[1,2], [3,4]]; B = copy(A), doing B[1] = [5,6] mutates B but not A. But B[2] .= 0 mutates not B but B[2], which is === A[2], the same memory.
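
Spelled out as a runnable sketch (the variable `C` and the `deepcopy` step are additions for contrast, not from the message above):

```julia
A = [[1, 2], [3, 4]]
B = copy(A)        # shallow copy: new outer vector, same inner arrays

B[1] = [5, 6]      # rebinds slot 1 of B only; A[1] is untouched
B[2] .= 0          # writes into the inner array that A[2] also refers to

shared = B[2] === A[2]

C = deepcopy(A)    # inner arrays are duplicated as well
C[2] .= 7          # does not affect A
```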

Sukera (Dec 22 2022 at 20:40):

By "same array" I mean that they are === (called egal) - they are literally the same exact array object! They not only have the same contents, they occupy the same memory because they are the exact same object.

Michael Abbott (Dec 22 2022 at 20:40):

I also had a mapslices example... internally it does exactly this re-use of one array for every slice. Each slice is written into that array and passed to f, and the result is saved elsewhere before the next slice overwrites it, so this is safe. But if you explicitly save the array elsewhere, it's always the same one:

julia> store = []; mapslices(rand(Int8,2,3); dims=1) do x
                       push!(store, x)
                       sum(x)
                   end
1×3 Matrix{Int64}:
 -225  90  -78

julia> store
3-element Vector{Any}:
 Int8[-64, -14]
 Int8[-64, -14]
 Int8[-64, -14]

Sukera (Dec 22 2022 at 20:41):

Whether the content of the array differs in each iteration is irrelevant - your iterator always returns the same object, so that's what collect sticks in the collected array. It doesn't go around checking whether the contents have changed - that'd be quite a lot of overhead (and hard to do in a generic way).

mbaz (Dec 22 2022 at 20:41):

Let's say A is collect's array and ri.x is what my iterator returns. I thought what would happen is A[:, n] = ri.x[:], where n is the iteration index.

Sukera (Dec 22 2022 at 20:41):

mbaz (Dec 22 2022 at 20:42):

Michael Abbott (Dec 22 2022 at 20:43):

That's what mapslices will do, it makes a matrix A to write slices into -- they are moved to new memory. But the collected array is not a matrix like A, it's a vector containing Vectors, each of which has its own pointer (and in this case they all agree)

Sukera (Dec 22 2022 at 20:43):

Michael Abbott (Dec 22 2022 at 20:44):

Making one big output array and writing into it while iterating is also what stack will do:

julia> using Compat

julia> stack(Iterators.take(ri, 4))
3×4 Matrix{Int64}:
 2  1  2  1
 3  1  3  2
 1  2  3  3

Sukera (Dec 22 2022 at 20:45):

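# simplified sketch of collect for an iterator with a known length
out = Vector{eltype(itr)}(undef, length(itr))
for (idx, obj) in enumerate(itr)
    out[idx] = obj   # stores a reference to obj, not a copy
end
out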

Sukera (Dec 22 2022 at 20:45):

there is no copy involved, and it has no knowledge of whether you return the exact same object in each iteration

Sukera (Dec 22 2022 at 20:46):

(there are some more cases with multidimensional iterators, but those share the non-copying semantics so I'll leave them out for simplicity)

Michael Abbott (Dec 22 2022 at 20:46):

And this is very different to having out[:, idx] .= obj in the loop, which copies the values of obj immediately (into out::Matrix instead of out::Vector{Vector{...}})
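
A sketch of that loop, under the assumption that every element has the same known length. `ReuseIter` is a hypothetical deterministic stand-in for `RandIter`. The output matrix is the only allocation for the results, and the iterator's buffer is still reused.

```julia
# Hypothetical buffer-reusing iterator (stand-in for RandIter).
struct ReuseIter
    buf::Vector{Int}
end

Base.length(::ReuseIter) = 5
Base.eltype(::Type{ReuseIter}) = Vector{Int}

function Base.iterate(it::ReuseIter, state = 0)
    state < 5 || return nothing
    state += 1
    fill!(it.buf, state)
    return (it.buf, state)
end

it = ReuseIter(zeros(Int, 3))
n = 4

# One matrix allocated up front; each column receives the values of one iteration.
out = Matrix{Int}(undef, length(it.buf), n)
for (idx, obj) in enumerate(Iterators.take(it, n))
    out[:, idx] .= obj   # copies the values now, before the buffer is overwritten
end
```

Because the values are written into `out` immediately, each column keeps the contents from its own iteration even though `obj` is always the same array.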
