Suppose I have a vector that may contain values of type Missing, Float64, or Vector{Float64}. As long as I don't have a missing value in such a vector, I can flatten it like this:
julia> Iterators.flatten([1.0, 2.0, [3.0, 4.0]]) |> collect
4-element Vector{Float64}:
1.0
2.0
3.0
4.0
However, if the vector contains a missing value, it throws an exception.
julia> Iterators.flatten([1.0, missing, [3.0, 4.0]]) |> collect
ERROR: MethodError: no method matching iterate(::Missing)
...
How can I get [1.0, missing, 3.0, 4.0] back?
Sounds like a bug in flatten, it shouldn't do that.
@Mason Protter: I'm too new to Julia to feel qualified to say whether that's a bug or not. It is kind of annoying that it didn't just do the seemingly obvious thing and let the missing value pass through. In the meantime, I came up with a good enough solution.
julia> foldl([1,missing,[missing,4]]; init=[]) do m,a
           if a isa Vector
               push!(m, a...)
           else
               push!(m, a)
           end
       end
4-element Vector{Any}:
1
missing
missing
4
Do you think I should report this as a bug in Iterators.flatten?
Yes
Okay. I'll do that now. Thanks for advising.
G Gundam has marked this topic as resolved.
https://github.com/JuliaLang/julia/issues/54682
"In the meantime, I came up with a good enough solution."
If you want this to be more efficient, here's how I'd write it:
julia> using BangBang: push!!, append!!
julia> foldl([1,missing,[missing,4]]; init=Union{}[]) do m, a
           if a isa Vector
               append!!(m, a)
           else
               push!!(m, a)
           end
       end
4-element Vector{Union{Missing, Int64}}:
1
missing
missing
4
I'm not familiar with BangBang, but I'll take a look.
This makes it so you get a concrete eltype array instead of a Vector{Any}, and it avoids doing push!(m, a...), which is slow.
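If the element type is known up front, the same two gains (a typed result and no splatting) are also available in plain Base Julia. The following is only a sketch: the function name flatten_one_level and the choice to pass the element type as an argument are made up for illustration and appear nowhere in this thread.

```julia
# Hypothetical helper: flatten one level of nesting into a Vector{T}.
# Assumption: the caller knows the element type T in advance.
function flatten_one_level(::Type{T}, xs) where {T}
    out = T[]                   # typed result instead of Vector{Any}
    for x in xs
        if x isa AbstractVector
            append!(out, x)     # appends all elements without splatting
        else
            push!(out, x)       # scalars (including missing) pass through
        end
    end
    return out
end

flat = flatten_one_level(Union{Missing, Int}, [1, missing, [missing, 4]])
```

append!(out, x) takes the whole vector as a single argument, whereas push!(m, a...) has to spread a vector whose length is only known at run time into separate arguments.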
Okay, I'll take easy speed ups like this.
(TBH, aside from startup speed, things are already way faster than I'm used to.)
The idea with BangBang is that when you write push!!(m, a), it basically checks if it's possible to do push!(m, a), and if it is possible, it does that. If it's not possible (because e.g. m has too narrow an eltype), then it'll widen the element type of m, e.g.
julia> push!(Int[], "hi")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64
versus
julia> push!!(Int[], "hi")
1-element Vector{Any}:
"hi"
This means it only sometimes mutates the array and sometimes creates a new one, so you need to be careful about that and always use the value it returns.
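The mutate-or-copy behaviour can be imitated in Base Julia to see why the return value matters. This is a rough sketch, not BangBang's implementation: the name widening_push is hypothetical, and widening to Union{T, typeof(x)} is an assumption chosen for simplicity (BangBang's own widening rule differs; as shown above, it produces Vector{Any} for an Int array plus a String).

```julia
# Hypothetical helper for illustration only; this is NOT BangBang's implementation.
function widening_push(xs::Vector{T}, x) where {T}
    if x isa T
        # x already fits the element type: mutate in place, return the same array.
        return push!(xs, x)
    else
        # x does not fit: build a NEW array with a wider element type.
        # Assumption: widen to Union{T, typeof(x)}.
        ys = Union{T, typeof(x)}[]
        append!(ys, xs)
        return push!(ys, x)
    end
end

a = Int[1]
b = widening_push(a, 2)        # mutates a; b is the very same array
c = widening_push(a, missing)  # a is left alone; c is a fresh, wider array
```

After these calls, a holds 1 and 2 but not the missing value, so code that ignores the return value and keeps using a would silently lose the last element.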
That last part is good to know. I'll keep that in mind. Thanks again.
I wouldn't say it's a bug. flatten([[missing]]) should go to [missing], but it's not clear to me what flatten([missing]) would do: how many values should it expand to? I think it's ambiguous and so error is the best option.
The behavior I expected was something similar to _.flatten(array) from underscore.js. It just works with what it sees, and undefined (the closest thing to missing in JS) is not a collection, so it resolves to one value while flattening.
With that said, I'm OK with whatever the maintainers decide. I already solved my problem without using Iterators.flatten, but its behavior did surprise me.
missing in particular is often difficult to deal with generically
the semantics of missing are a bit different from undefined, though in this case it really should just produce the missing
Wait. Is it really a bug? I thought Iterators.flatten was only supposed to flatten an iterator of iterators, and indeed that's what the docstring says. Iterators.flatten([1.0, 2.0, [3.0, 4.0]]) working is basically an accident because numbers happen to be iterable. What happens when a missing or any other non-iterable value is included is actually more reasonable.
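The point about numbers being iterable can be checked directly. A small sketch, where is_iterable is a hypothetical helper name (not a Base function) that simply asks whether iterate has a method for the value:

```julia
# Hypothetical helper: true if `iterate` has a method for x.
is_iterable(x) = applicable(iterate, x)

num_ok  = is_iterable(1.0)         # true: a number iterates as a single element
vec_ok  = is_iterable([3.0, 4.0])  # true: arrays are iterable
miss_ok = is_iterable(missing)     # false: Missing has no iterate method

first_step = iterate(1.0)          # (1.0, nothing): yields itself, then stops
```

So flatten treats 1.0 as a one-element collection, while missing offers nothing to iterate over, which matches the MethodError shown earlier.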
I think the (one-liner) way you're supposed to do something like this in just base Julia without third-party packages would look something like:
Iterators.flatmap(x -> ifelse(ismissing(x), (missing,), x), [1.0, missing, [3.0, 4.0]]) |> collect
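A variation on the same idea, in case Iterators.flatmap is not available (as far as I know it only exists in newer Julia releases, 1.9 and later) or the input may hold other non-iterable scalars. This is a sketch under assumptions: wrap_scalar and flatten_one are made-up names, and treating everything that is not an AbstractArray as a single value is a design choice, not something Base prescribes.

```julia
# Hypothetical helpers: wrap every non-array value in a 1-tuple so that
# Iterators.flatten really does receive an iterator of iterators.
wrap_scalar(x) = x isa AbstractArray ? x : (x,)
flatten_one(xs) = collect(Iterators.flatten(wrap_scalar(x) for x in xs))

result  = flatten_one([1.0, missing, [3.0, 4.0]])
strings = flatten_one(["ab", ["cd"]])   # strings are kept whole, not split into characters
```

Because scalars are wrapped explicitly, this no longer relies on numbers happening to be iterable, and missing passes through as a single value.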
Mhm, yeah, my bad. I had forgotten that flatten was only supposed to take iterators of iterators. I had thought it tried to be smart, but I misremembered.
Last updated: Nov 06 2024 at 04:40 UTC