I'm writing an infinite, random iterator. Something like this (simplified):
struct RandIter
    x :: Vector{Int}
end

Base.length(ri :: RandIter) = 5

function Base.iterate(i::RandIter, state = 0)
    if state < 5
        i.x[1] = rand((1,2,3))
        i.x[2] = rand((1,2,3))
        i.x[3] = rand((1,2,3))
        state += 1
        return (i.x, state)
    else
        return nothing
    end
end
and then
julia> ri = RandIter([0,0,0]);
julia> collect(Iterators.take(ri, 4))
4-element Vector{Any}:
[1, 3, 2]
[1, 3, 2]
[1, 3, 2]
[1, 3, 2]
Here, [1, 3, 2] is the last random value assigned to ri.x.
This is specific to take; for example, this works:
julia> for v in ri println(v); end
[3, 3, 1]
[2, 3, 1]
[3, 3, 3]
[1, 2, 2]
[1, 1, 2]
Ultimately, what I want to do is generate the random values without allocating; that's the idea behind reusing the memory allocated in ri.x. How can I do that and still have a working take iterator?
you're returning the exact same array in every iteration
if you want to create distinct arrays, you HAVE to copy
Hmm, I'm starting to think that this will be impossible... for take to work, at some point a copy will have to be made. Unfortunately, this makes the iterator almost 10x slower.
you could always return a tuple instead, if the size is limited
but yeah, at some point you will have to copy
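A minimal sketch of the tuple idea (the name TupleRandIter and the fixed size of 3 are made up for illustration, not from the thread):

struct TupleRandIter end

Base.length(::TupleRandIter) = 5

function Base.iterate(::TupleRandIter, state = 0)
    if state < 5
        # a fresh immutable NTuple{3,Int} each time: no shared buffer to mutate
        t = (rand((1,2,3)), rand((1,2,3)), rand((1,2,3)))
        return (t, state + 1)
    else
        return nothing
    end
end

With this, collect(Iterators.take(TupleRandIter(), 4)) yields four distinct tuples, because each element is a value rather than a reference to a mutated buffer.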
the only reason println works is because it prints the current state of the array, and doesn't keep a reference to it around for longer than one iteration
Yep -- the problem with tuples is that later on I'd like to use the random values in more flexible ways.
I guess what I was hoping would happen is that collect(take(...)) would allocate an array of the required size and stick the iterator values in there, requiring only one allocation.
If I need to copy the array, I end up doing one allocation per iteration, which is sad.
well it does only allocate once, the outer array
it's just that every entry returned is the exact same array (===), where you just changed its contents
it's the same reason why
julia> fill([], 2)
2-element Vector{Vector{Any}}:
[]
[]
julia> ans[1] === ans[2]
true
Yeah, but collect is not copying the values returned by my iterator.
why should it?
If you do something with the result, though, it might be fine that it gets re-used. E.g. map(sum, iter) ought to call sum before getting the next element, and thus have distinct values.
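A quick sketch of that, reusing the ri defined above (each sum is computed from the buffer before the next iteration overwrites it):

# four independent sums, even though the same array object is returned every time
sums = map(sum, Iterators.take(ri, 4))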
@Michael Abbott Yeah, in some cases (like the for loop above) it does work.
if you want to copy, you need to copy explicitly - either by doing Iterators.map(copy, RandIter([1,2,3])) or by copying in your iterator
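For instance, a sketch of the Iterators.map(copy, ...) route (one extra allocation per element, but collect and take then behave as expected):

ri = RandIter([0, 0, 0])
# every element is copied out of the shared buffer before collect stores it,
# so the collected entries are distinct arrays rather than one aliased object
collect(Iterators.take(Iterators.map(copy, ri), 4))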
mbaz said:
Michael Abbott Yeah, in some cases (like the for loop above) it does work.
again - the for doesn't copy. The println just looks at the elements of the array before you change them again.
I understand.
I'm not complaining, by the way, I just was very confused by the behavior of collect(take(itr)). I assumed that, just like println uses the values before they change, collect would take the returned values and stick them in its own array.
that is exactly what it's doing though! :D
you just always return the same array
so it always sticks the same exact array into the new location
Maybe I don't know what you mean by "same array". I understand the array (in the sense of the array pointer) is the same. However, the values of the array are different in each iteration.
Maybe this is like copy vs deepcopy. collect is making a new array to hold things, but those things themselves are just pointers to memory. If I do A = [[1,2], [3,4]]; B = copy(A), doing B[1] = [5,6] mutates B but not A. But B[2] .= 0 mutates not B but B[2], which is === A[2], the same memory.
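A quick REPL check of that reasoning (a sketch, not output from the thread):

julia> A = [[1,2], [3,4]]; B = copy(A);   # shallow copy: new outer vector, same inner arrays

julia> B[1] = [5,6]; A[1]                 # rebinding a slot of B leaves A untouched
2-element Vector{Int64}:
 1
 2

julia> B[2] .= 0; A[2]                    # mutating the shared inner array is visible through A
2-element Vector{Int64}:
 0
 0

julia> A[2] === B[2]                      # because they are the very same object
true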
By "same array" I mean that they are ===
(called egal) - they are literally the same exact array object! They not only have the same contents, they occupy the same memory because they are the exact same object.
I also had a mapslices example... internally it does exactly this re-use of one array for every slice. This is written into, passed to f, the result saved elsewhere, and repeat -- safe. But if you explicitly save the array elsewhere, it's always the same one:
julia> store = []; mapslices(rand(Int8,2,3); dims=1) do x
           push!(store, x)
           sum(x)
       end
1×3 Matrix{Int64}:
-225 90 -78
julia> store
3-element Vector{Any}:
Int8[-64, -14]
Int8[-64, -14]
Int8[-64, -14]
Whether the content of the array differs in each iteration is irrelevant - your iterator always returns the same object, so that's what collect sticks in the collected array. It doesn't go around checking whether the contents have changed - that'd be quite a lot of overhead (and hard to do in a generic way).
Let's say A is collect's array and ri.x is what my iterator returns. I thought what would happen is A[:, n] = ri.x[:], where n is the iteration index.
an iteration is not indexable - your iterator only returns a single object
not a collection of objects
No, but collect knows, since it iterates over the Take iterator.
That's what mapslices will do, it makes a matrix A to write slices into -- they are moved to new memory. But the collected array is not a matrix like A, it's a vector containing Vectors, each of which has its own pointer (and in this case they all agree).
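One way to see that, as a sketch using the ri defined earlier:

julia> c = collect(Iterators.take(ri, 4));

julia> all(x === c[1] for x in c)   # every entry is literally the same array object
true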
that is also not indexable!
Making one big output array and writing into it while iterating is also what stack will do:
julia> using Compat
julia> stack(Iterators.take(ri, 4))
3×4 Matrix{Int64}:
2 1 2 1
3 1 3 2
1 2 3 3
collect (roughly) does this, provided the array has some knowable size:
out = Array{eltype(itr)}(undef, length(itr))
for (idx, obj) in enumerate(itr)
    out[idx] = obj
end
out
there is no copy involved, and it has no knowledge of whether or not you return the same exact object in each iteration
(there are some more cases with multidimensional iterators, but those share the non-copying semantics so I'll leave them out for simplicity)
And this is very different to having out[:, idx] .= obj in the loop, which copies the values of obj immediately (into out::Matrix instead of out::Vector{Vector{...}}).
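Side by side, the two patterns might look like this (a sketch, not code from the thread; ri is the iterator defined earlier):

# non-copying, what collect does: each slot stores a reference to obj itself,
# so all slots end up aliasing ri.x
out1 = Vector{Vector{Int}}(undef, 4)
for (idx, obj) in enumerate(Iterators.take(ri, 4))
    out1[idx] = obj
end

# copying, what stack/mapslices do: the current values are written into fresh memory
out2 = Matrix{Int}(undef, 3, 4)
for (idx, obj) in enumerate(Iterators.take(ri, 4))
    out2[:, idx] .= obj
end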
@Sukera Agreed -- I see it now, but I didn't see it before.
@Michael Abbott stack looks like what I need here.
stack is superfast as well... almost zero overhead. I can't wait for v1.9.
@Sukera @Michael Abbott Thanks for your help!