Stream: helpdesk (published)

Topic: ✔ Lazy `reduce` with `vcat`


view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 18:22):

Is there a AbstractVector implementation somewhere that takes a vector of vectors and returns a single reduce(vcat, vecs) without copying the data?

view this post on Zulip Michael Abbott (Nov 12 2024 at 18:32):

Are the vectors all the same size?

view this post on Zulip Michael Abbott (Nov 12 2024 at 18:33):

If so, any of these https://github.com/mcabbott/LazyStack.jl?tab=readme-ov-file#other-stack-like-packages combined with vec.

If not, then maybe I only know of CatViews.jl

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:13):

We are actually trying to reduce the vertices of geometries lazily here:

https://github.com/JuliaGeometry/Meshes.jl/issues/1127

But it seems that this is not possible?

view this post on Zulip Expanding Man (Nov 12 2024 at 19:18):

It seems like a really hard problem. As a practical matter I expect that the only solution is to have MultiPolytope store its vertices in this form (i.e. with the vertices already inlined but with a wrapper that gives it the interface of an array of arrays). I don't know if that's compatible with what you need the representation to do elsewhere, at the very least I'm guessing it's not compatible with how the individual primitives are stored. Perhaps a better approach would be trying to reduce the number of times this needs to be done?

view this post on Zulip Expanding Man (Nov 12 2024 at 19:19):

In case it's not already abundantly clear: Array{<:AbstractArray} simply does not store data this way, the elements are actually pointers, so avoiding copying is impossible in the general case. That pretty much leaves you with a wrapper type that makes an Array behave like an Array{<:AbstractArray}, but that'll inevitably affect your implementation of MultiPolytope.

view this post on Zulip Mason Protter (Nov 12 2024 at 19:20):

Does it need to be a Vector{Vector} or can it be a Tuple of vectors?

view this post on Zulip Mason Protter (Nov 12 2024 at 19:20):

i.e. is it really large, and is its size stable?

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:23):

The Multi geometry represents regions in the world like countries that are made of polygons in the continent plus polygons in the ocean (islands). It is common to have as many as a couple of 100s of polygons sometimes inside a Multi geom.

view this post on Zulip Expanding Man (Nov 12 2024 at 19:24):

Perhaps what you want is more like a lazy iterator over vertices? I think that would be pretty efficient even if you did it very naively. Collecting them efficiently, by contrast, is hard.

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:25):

We thought of lazy iterators, but it is not user-friendly. Perhaps we will need to introduce a new function to return interators.

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:25):

So vertices remains as is, and a new vertexiterator returns an iterator

view this post on Zulip Expanding Man (Nov 12 2024 at 19:27):

If the primitives in MultiPolytope were all of the same type, you could efficiently calculate the address of each vertex, in which case you could return a wrapper object that behaves exactly like your statement, but I'm guessing this is not the case. If that's not the case you could potentially index the primitive types and then do something like CatArray from LazyArrays.jl but that sounds like way more trouble than a lazy iterator.

view this post on Zulip Expanding Man (Nov 12 2024 at 19:28):

Might also be worth reminding yourself that it's really easy for users to get your explicit array out of a lazy iterator, i.e. collect(lazyiterator).

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:29):

All the comments are really helpful. Thank you. I think we will go with two functions in the API: vertices and vertexiterator.

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:29):

That way we introduce a new lazy function in a non-breaking way.

view this post on Zulip Notification Bot (Nov 12 2024 at 19:32):

Júlio Hoffimann has marked this topic as resolved.

view this post on Zulip Expanding Man (Nov 12 2024 at 19:33):

I like this problem because it seems like a paradigmatic example of one of maybe 5 problems that are all that seem to occur over and over again in software engineering: procedure aa is good for representation AA, but bad for representation BB, but procedure bb is good for representation BB but bad for representation AA, and converting between AA and BB is expensive, then how can I compute both aa and bb? I think it was weinberg or someone like that who said that the peculiar feature of reality is plurality of description.

view this post on Zulip Mason Protter (Nov 12 2024 at 19:38):

After this object is built, would you want to be able to make insertions into the individual sub-vectors later on, or would its structure be "frozen" after it's created?

view this post on Zulip Mason Protter (Nov 12 2024 at 19:40):

that is, do you ever need to push! / append!?

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:41):

These geometries are often "frozen" after creation. We can modify the underlying vectors of vertices, but that is not what happens in most algorithms. We tend to implement algorithms that avoid mutation (thinking of parallel jobs).

view this post on Zulip Mason Protter (Nov 12 2024 at 19:46):

If that's the case, you might be able to do something like

struct FlattenedVector{T} <: AbstractVector{T}
    vov::Vector{Vector{T}}
    inds::Vector{Tuple{Int, Int}}
    len::Int
    function FlattenedVector(vov::Vector{Vector{T}}) where {T}
        len = sum(length, vov)
        inds = Vector{Tuple{Int, Int}}(undef, len)
        ind = 1
        for i in eachindex(vov)
            for j  eachindex(vov[i])
                inds[ind] = (i, j)
                ind += 1
            end
        end
        new{T}(vov, inds, len)
    end
end
Base.size(v::FlattenedVector) = (v.len,)
function Base.getindex(v::FlattenedVector, ind::Integer)
    (i, j) = v.inds[ind]
    v.vov[i][j]
end
function Base.setindex!(v::FlattenedVector, x, ind::Integer)
    (i, j) = v.inds[ind]
    v.vov[i][j] = x
    v
end

view this post on Zulip Mason Protter (Nov 12 2024 at 19:48):

I don't know if this sort of construction is really worth it for you use-case but I think it's at least a possibility.

view this post on Zulip Júlio Hoffimann (Nov 12 2024 at 19:50):

The main challenge we face here is that we don't want to materialize the vov field as a vector of vector. We want to just consume the underlying geometries, querying their vectors. So the input of the function is a vector of geometries, and the output is a vector of concatenated vertices.

view this post on Zulip DrChainsaw (Nov 13 2024 at 09:34):

Just a hipshot suggestion for the initial question without having digested the conversation after: https://github.com/SciML/RecursiveArrayTools.jl

view this post on Zulip John Lapeyre (Nov 13 2024 at 15:00):

DrChainsaw said:

without having digested the conversation

That's covered in the link given above by @Michael Abbott

view this post on Zulip John Lapeyre (Nov 13 2024 at 15:03):

FWIW CatViews.jl is slow, as noted in the README of LazyStack.jl. The type instability of the registered version 1.0 was fixed five years ago in 1.0.1. But it appears that 1.0.1 was never registered.

view this post on Zulip Fredrik Bagge Carlson (Nov 15 2024 at 10:55):

Is the inner vector type of static size, e.g., similar to a static array? If so, reshape(reinterpret(Float64, vector_of_vectors), length(first(vector_of_vectors)), :) gives you a matrix without copying


Last updated: Nov 22 2024 at 04:41 UTC