Stream: helpdesk (published)

Topic: ✔ Flattening Vectors with Missing Values


G Gundam (Jun 05 2024 at 01:34):

Is there a nice way to flatten vectors that might have missing values?

Suppose I have a vector that may contain values of type Missing, Float64, or Vector{Float64}. As long as I don't have a missing value in such a vector, I can flatten it like this:

julia> Iterators.flatten([1.0, 2.0, [3.0, 4.0]]) |> collect
4-element Vector{Float64}:
 1.0
 2.0
 3.0
 4.0

However, if the vector contains a missing value, it throws an exception.

julia> Iterators.flatten([1.0, missing, [3.0, 4.0]]) |> collect
ERROR: MethodError: no method matching iterate(::Missing)

  ...

What do I have to do to get [1.0, missing, 3.0, 4.0] back?

Mason Protter (Jun 05 2024 at 03:42):

Sounds like a bug in flatten; it shouldn't do that.

G Gundam (Jun 05 2024 at 03:45):

@Mason Protter: I'm too new to Julia to feel qualified to say whether that's a bug or not. It is kind of annoying that it didn't just do the seemingly obvious thing and let the missing value pass through. In the meantime, I came up with a good enough solution.

julia> foldl([1,missing,[missing,4]]; init=[]) do m,a
           if a isa Vector
               push!(m, a...)
           else
               push!(m, a)
           end
       end
4-element Vector{Any}:
 1
  missing
  missing
 4

G Gundam (Jun 05 2024 at 03:55):

Do you think I should report this as a bug in Iterators.flatten?

Mason Protter (Jun 05 2024 at 03:55):

Yes

G Gundam (Jun 05 2024 at 03:56):

Okay. I'll do that now. Thanks for advising.

Notification Bot (Jun 05 2024 at 04:04):

G Gundam has marked this topic as resolved.

G Gundam (Jun 05 2024 at 04:14):

https://github.com/JuliaLang/julia/issues/54682

Mason Protter (Jun 05 2024 at 04:21):

> In the meantime, I came up with a good enough solution.

If you want this to be more efficient, here's how I'd write it:

julia> using BangBang: push!!, append!!

julia> foldl([1,missing,[missing,4]]; init=Union{}[]) do m, a
           if a isa Vector
               append!!(m, a)
           else
               push!!(m, a)
           end
       end
4-element Vector{Union{Missing, Int64}}:
 1
  missing
  missing
 4

G Gundam (Jun 05 2024 at 04:22):

I'm not familiar with BangBang, but I'll take a look.

Mason Protter (Jun 05 2024 at 04:22):

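This makes it so you get an array with a tight element type (here Vector{Union{Missing, Int64}}) instead of a Vector{Any}, and it avoids doing push!(m, a...), which is slow because it splats every element of a into a separate function argument.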
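If the element types are known in advance, the same two ideas (a tightly typed accumulator and no splatting) can be sketched in base Julia without BangBang. This is a minimal sketch assuming the inputs are only Float64 values, missing, or vectors of Float64; the function name flatten_known is hypothetical.

```julia
# Hypothetical helper: assumes every element is a Float64, `missing`,
# or an AbstractVector of Float64, so the output eltype can be fixed up front.
function flatten_known(v)
    out = Union{Missing, Float64}[]      # tightly typed accumulator, not Vector{Any}
    for x in v
        if x isa AbstractVector
            append!(out, x)              # copies the elements without splatting
        else
            push!(out, x)                # a scalar or `missing` is added as one value
        end
    end
    return out
end

result = flatten_known([1.0, missing, [3.0, 4.0]])
# result is a Vector{Union{Missing, Float64}} holding 1.0, missing, 3.0, 4.0
```

The trade-off compared with push!!/append!! is that an element of an unexpected type (say a String) makes push! throw a conversion error instead of widening the array.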

G Gundam (Jun 05 2024 at 04:22):

Okay, I'll take easy speedups like this.

G Gundam (Jun 05 2024 at 04:25):

(TBH, aside from startup speed, things are already way faster than I'm used to.)

Mason Protter (Jun 05 2024 at 04:27):

The idea with BangBang is that when you write push!!(m, a), it basically checks whether it's possible to do push!(m, a), and if it is, it does that. If it's not possible (e.g. because m has too narrow an eltype), then it returns a copy of m with a wider element type.

Mason Protter (Jun 05 2024 at 04:29):

e.g.

julia> push!(Int[], "hi")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type Int64

versus

julia> push!!(Int[], "hi")
1-element Vector{Any}:
 "hi"

This means it only sometimes mutates the array; other times it creates a new array instead, so you need to be careful about that and always use the returned value.
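To make the mutate-or-copy behavior concrete, here is a toy version of the idea in base Julia. It is a hypothetical helper (push_or_widen), not BangBang's actual implementation, and it assumes that widening to a Union of the old and new element types is acceptable; BangBang itself may choose a different type (it produced a Vector{Any} above).

```julia
# Hypothetical helper illustrating the push!! idea; not BangBang's real code.
# Returns either the same vector (mutated) or a new, wider vector.
function push_or_widen(v::Vector, x)
    if x isa eltype(v)
        return push!(v, x)               # fits: mutate in place and return v itself
    else
        T = Union{eltype(v), typeof(x)}  # assumption: widen to a Union of both types
        w = Vector{T}(v)                 # copy existing elements into a wider array
        return push!(w, x)               # v itself is left unchanged
    end
end

a = Int[]
b = push_or_widen(a, 1)        # same object as `a`
c = push_or_widen(b, "hi")     # a new Vector{Union{Int64, String}}
```

Because c is a different object from a, code that kept using a after the second call would silently miss the "hi", which is the pitfall described above.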

G Gundam (Jun 05 2024 at 04:31):

That last part is good to know. I'll keep that in mind. Thanks again.

jar (Jun 05 2024 at 08:04):

I wouldn't say it's a bug. flatten([[missing]]) should go to [missing], but it's not clear to me what flatten([missing]) should do: how many values should it expand to? I think it's ambiguous, so an error is the best option.
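jar's distinction can be checked directly. The sketch below relies only on the behavior already shown in the thread: a missing inside an inner collection is just an element, while a bare missing makes flatten throw (the error at the top of the thread is a MethodError for iterate(::Missing)).

```julia
# A `missing` inside an inner vector is just an element, so it passes through.
inner = collect(Iterators.flatten([[missing]]))

# A bare `missing` is not a collection, so there is nothing for flatten to iterate.
bare_missing_throws = try
    collect(Iterators.flatten([missing]))
    false
catch
    true    # the thread shows: MethodError: no method matching iterate(::Missing)
end
```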

G Gundam (Jun 05 2024 at 10:58):

The behavior I expected was something similar to _.flatten(array) from underscore.js. It just works with what it sees, and undefined (the closest thing to missing in JS) is not a collection, so it resolves to one value while flattening.

With that said, I'm OK with whatever the maintainers decide. I already solved my problem without using Iterators.flatten, but its behavior did surprise me.
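A one-level version of the underscore.js-style behavior described here (descend into collections, pass everything else through as a single value) can be sketched with a nested comprehension. This is only a sketch, not something Base provides; shallow_flatten is a hypothetical name, and it assumes "collection" means AbstractArray.

```julia
# Hypothetical helper: descend one level into AbstractArrays and wrap anything
# else (a number, `missing`, a string, ...) in a 1-tuple so it yields one value.
shallow_flatten(v) = [y for x in v for y in (x isa AbstractArray ? x : (x,))]

nums  = shallow_flatten([1.0, missing, [3.0, 4.0]])
mixed = shallow_flatten(["ab", [1, 2]])
```

Unlike Iterators.flatten, this keeps "ab" intact instead of splitting it into the characters 'a' and 'b', because a String is iterable but is not an AbstractArray.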

Sukera (Jun 05 2024 at 11:27):

missing in particular is often difficult to deal with generically

Sukera (Jun 05 2024 at 11:28):

the semantics of missing are a bit different from undefined, though in this case it really should just produce the missing

Adam non-jedi Beckmeyer (Jun 05 2024 at 13:02):

Wait. Is it really a bug? I thought Iterators.flatten was only supposed to flatten an iterator of iterators, and indeed that's what the docstring says. Iterators.flatten([1.0, 2.0, [3.0, 4.0]]) working is basically an accident, because numbers happen to be iterable. Throwing an error when a missing or any other non-iterable value is included is actually the more reasonable behavior.
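The point that numbers are iterable is easy to confirm: Base gives every Number an iterate method that yields the number exactly once, while missing has no iterate method at all. A small check:

```julia
# Numbers behave like one-element collections under the iteration protocol.
first_step  = iterate(1.0)            # (1.0, nothing): the value plus the state for the next call
second_step = iterate(1.0, nothing)   # nothing: iteration is finished

# `applicable` reports whether a method exists for these argument types.
number_iterable  = applicable(iterate, 1.0)
missing_iterable = applicable(iterate, missing)   # false, hence the MethodError from flatten
```

This is why [1.0, 2.0, [3.0, 4.0]] flattens at all: each Float64 is iterated as if it were a one-element collection.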

Adam non-jedi Beckmeyer (Jun 05 2024 at 13:10):

I think the one-liner way to do this in base Julia, without third-party packages, would look something like:

Iterators.flatmap(x -> ifelse(ismissing(x), (missing,), x), [1.0, missing, [3.0, 4.0]]) |> collect
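Spelled out with a named function, the flatmap one-liner might look like the sketch below. It assumes a Julia version that has Iterators.flatmap (as far as I know, it was added in Julia 1.9); wrap_missing is a hypothetical name. The sketch uses the ternary operator rather than ifelse, which evaluates both of its branch arguments; that is harmless here, but the ternary is the more usual way to write a conditional.

```julia
# Hypothetical helper: wrap `missing` in a 1-tuple so flatten sees an iterable
# holding one value; numbers and vectors are already iterable and pass through.
wrap_missing(x) = ismissing(x) ? (missing,) : x

v = [1.0, missing, [3.0, 4.0]]
flat = collect(Iterators.flatmap(wrap_missing, v))
```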

Mason Protter (Jun 05 2024 at 13:46):

Mhm, yeah, my bad. I had forgotten that flatten is only supposed to take iterators of iterators; I thought it tried to be smart, but I misremembered.


Last updated: Nov 22 2024 at 04:41 UTC