✔ Lazy `reduce` with `vcat` · helpdesk (published)

Is there a AbstractVector implementation somewhere that takes a vector of vectors and returns a single reduce(vcat, vecs) without copying the data?

Michael Abbott (Nov 12 2024 at 18:32):

Michael Abbott (Nov 12 2024 at 18:33):

Júlio Hoffimann (Nov 12 2024 at 19:13):

Expanding Man (Nov 12 2024 at 19:18):

It seems like a really hard problem. As a practical matter I expect that the only solution is to have MultiPolytope store its vertices in this form (i.e. with the vertices already inlined but with a wrapper that gives it the interface of an array of arrays). I don't know if that's compatible with what you need the representation to do elsewhere, at the very least I'm guessing it's not compatible with how the individual primitives are stored. Perhaps a better approach would be trying to reduce the number of times this needs to be done?

Expanding Man (Nov 12 2024 at 19:19):

In case it's not already abundantly clear: Array{<:AbstractArray} simply does not store data this way, the elements are actually pointers, so avoiding copying is impossible in the general case. That pretty much leaves you with a wrapper type that makes an Array behave like an Array{<:AbstractArray}, but that'll inevitably affect your implementation of MultiPolytope.

Mason Protter (Nov 12 2024 at 19:20):

Júlio Hoffimann (Nov 12 2024 at 19:23):

The Multi geometry represents regions in the world like countries that are made of polygons in the continent plus polygons in the ocean (islands). It is common to have as many as a couple of 100s of polygons sometimes inside a Multi geom.

Expanding Man (Nov 12 2024 at 19:24):

Perhaps what you want is more like a lazy iterator over vertices? I think that would be pretty efficient even if you did it very naively. Collecting them efficiently, by contrast, is hard.

Júlio Hoffimann (Nov 12 2024 at 19:25):

We thought of lazy iterators, but it is not user-friendly. Perhaps we will need to introduce a new function to return interators.

Júlio Hoffimann (Nov 12 2024 at 19:25):

Expanding Man (Nov 12 2024 at 19:27):

If the primitives in MultiPolytope were all of the same type, you could efficiently calculate the address of each vertex, in which case you could return a wrapper object that behaves exactly like your statement, but I'm guessing this is not the case. If that's not the case you could potentially index the primitive types and then do something like CatArray from LazyArrays.jl but that sounds like way more trouble than a lazy iterator.

Expanding Man (Nov 12 2024 at 19:28):

Might also be worth reminding yourself that it's really easy for users to get your explicit array out of a lazy iterator, i.e. collect(lazyiterator).

Júlio Hoffimann (Nov 12 2024 at 19:29):

All the comments are really helpful. Thank you. I think we will go with two functions in the API: vertices and vertexiterator.

Júlio Hoffimann (Nov 12 2024 at 19:29):

Notification Bot (Nov 12 2024 at 19:32):

Expanding Man (Nov 12 2024 at 19:33):

I like this problem because it seems like a paradigmatic example of one of maybe 5 problems that are all that seem to occur over and over again in software engineering: procedure

a

is good for representation

A

, but bad for representation

B

, but procedure

b

is good for representation

B

but bad for representation

A

, and converting between

A

and

B

is expensive, then how can I compute both

a

and

b

? I think it was weinberg or someone like that who said that the peculiar feature of reality is plurality of description.

Mason Protter (Nov 12 2024 at 19:38):

After this object is built, would you want to be able to make insertions into the individual sub-vectors later on, or would its structure be "frozen" after it's created?

Mason Protter (Nov 12 2024 at 19:40):

Júlio Hoffimann (Nov 12 2024 at 19:41):

These geometries are often "frozen" after creation. We can modify the underlying vectors of vertices, but that is not what happens in most algorithms. We tend to implement algorithms that avoid mutation (thinking of parallel jobs).

Mason Protter (Nov 12 2024 at 19:46):

struct FlattenedVector{T} <: AbstractVector{T}
    vov::Vector{Vector{T}}
    inds::Vector{Tuple{Int, Int}}
    len::Int
    function FlattenedVector(vov::Vector{Vector{T}}) where {T}
        len = sum(length, vov)
        inds = Vector{Tuple{Int, Int}}(undef, len)
        ind = 1
        for i in eachindex(vov)
            for j ∈ eachindex(vov[i])
                inds[ind] = (i, j)
                ind += 1
            end
        end
        new{T}(vov, inds, len)
    end
end
Base.size(v::FlattenedVector) = (v.len,)
function Base.getindex(v::FlattenedVector, ind::Integer)
    (i, j) = v.inds[ind]
    v.vov[i][j]
end
function Base.setindex!(v::FlattenedVector, x, ind::Integer)
    (i, j) = v.inds[ind]
    v.vov[i][j] = x
    v
end

Mason Protter (Nov 12 2024 at 19:48):

I don't know if this sort of construction is really worth it for you use-case but I think it's at least a possibility.

Júlio Hoffimann (Nov 12 2024 at 19:50):

The main challenge we face here is that we don't want to materialize the vov field as a vector of vector. We want to just consume the underlying geometries, querying their vectors. So the input of the function is a vector of geometries, and the output is a vector of concatenated vertices.

DrChainsaw (Nov 13 2024 at 09:34):

John Lapeyre (Nov 13 2024 at 15:00):

John Lapeyre (Nov 13 2024 at 15:03):

FWIW CatViews.jl is slow, as noted in the README of LazyStack.jl. The type instability of the registered version 1.0 was fixed five years ago in 1.0.1. But it appears that 1.0.1 was never registered.

Fredrik Bagge Carlson (Nov 15 2024 at 10:55):

Is the inner vector type of static size, e.g., similar to a static array? If so, reshape(reinterpret(Float64, vector_of_vectors), length(first(vector_of_vectors)), :) gives you a matrix without copying

Stream: helpdesk (published)

Topic: ✔ Lazy `reduce` with `vcat`

Júlio Hoffimann (Nov 12 2024 at 18:22):

Michael Abbott (Nov 12 2024 at 18:32):

Michael Abbott (Nov 12 2024 at 18:33):

Júlio Hoffimann (Nov 12 2024 at 19:13):

Expanding Man (Nov 12 2024 at 19:18):

Expanding Man (Nov 12 2024 at 19:19):

Mason Protter (Nov 12 2024 at 19:20):

Mason Protter (Nov 12 2024 at 19:20):

Júlio Hoffimann (Nov 12 2024 at 19:23):

Expanding Man (Nov 12 2024 at 19:24):

Júlio Hoffimann (Nov 12 2024 at 19:25):

Júlio Hoffimann (Nov 12 2024 at 19:25):

Expanding Man (Nov 12 2024 at 19:27):

Expanding Man (Nov 12 2024 at 19:28):

Júlio Hoffimann (Nov 12 2024 at 19:29):

Júlio Hoffimann (Nov 12 2024 at 19:29):

Notification Bot (Nov 12 2024 at 19:32):

Expanding Man (Nov 12 2024 at 19:33):

Mason Protter (Nov 12 2024 at 19:38):

Mason Protter (Nov 12 2024 at 19:40):

Júlio Hoffimann (Nov 12 2024 at 19:41):

Mason Protter (Nov 12 2024 at 19:46):

Mason Protter (Nov 12 2024 at 19:48):

Júlio Hoffimann (Nov 12 2024 at 19:50):

DrChainsaw (Nov 13 2024 at 09:34):

John Lapeyre (Nov 13 2024 at 15:00):

John Lapeyre (Nov 13 2024 at 15:03):

Fredrik Bagge Carlson (Nov 15 2024 at 10:55):