Stream: helpdesk (published)

Topic: Expensive `length(::AbstractArray)`


view this post on Zulip Lilith Hafner (Nov 24 2023 at 14:29):

Does anyone know of an AbstractArray subtype that has an expensive length method? ref

view this post on Zulip Mason Protter (Nov 24 2023 at 14:37):

Here's one :troll:

struct MyArray{T, N} <: AbstractVector{T, N}
     A::Array{T, N}
end
Base.size(a::MyArray) = (sleep(10); size(a.A))
Base.getindex(a::MyArray, i...) = getindex(a.A, i...)

view this post on Zulip Mason Protter (Nov 24 2023 at 14:40):

More seriously, I doubt there are many cases where the length is expensive

view this post on Zulip Mason Protter (Nov 24 2023 at 14:40):

but it's not hard to dream up such cases

view this post on Zulip Mason Protter (Nov 24 2023 at 14:40):

so I don't really see the harm in trying to avoid calling it twice if possible

view this post on Zulip Timothy (Nov 25 2023 at 05:14):

A linked list type?

view this post on Zulip Mason Protter (Nov 25 2023 at 10:49):

it'd be kinda weird to make a linked list an AbstractArray, but people do crazy stuff

view this post on Zulip Sukera (Nov 25 2023 at 10:50):

there are no big-O requirements on the AbstractArray interface :shrugging:

view this post on Zulip Mason Protter (Nov 25 2023 at 10:51):

yep

view this post on Zulip Mason Protter (Nov 25 2023 at 10:53):

I guess some real examples out in the ecosystem would be stuff like https://github.com/JuliaArrays/LazyArrays.jl#concatenation and https://github.com/JuliaArrays/LazyArrays.jl#kronecker-products

view this post on Zulip Mason Protter (Nov 25 2023 at 10:53):

they're not particularly expensive to compute, but can become arbitrarily expensive through nesting

view this post on Zulip Sukera (Nov 25 2023 at 10:54):

there's also distributed arrays, with O(does your network like you today)

view this post on Zulip Mason Protter (Nov 25 2023 at 10:59):

julia> using LazyArrays

julia> let
           A = foldl(1:10, init=rand(3,3)) do A, _
               ApplyArray(kron, A, rand(3,3))
           end
           @benchmark length($A)
       end
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.143 μs …  2.572 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.195 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.211 μs ± 74.610 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▆█▆▄▂
  ▁▂████████▆▅▄▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  1.14 μs        Histogram: frequency by time        1.62 μs <

 Memory estimate: 1.73 KiB, allocs estimate: 46.

view this post on Zulip Mason Protter (Nov 25 2023 at 11:04):

though to be fair, this really doesn't need to be so slow and they could have made it less slow

view this post on Zulip Mason Protter (Nov 25 2023 at 11:07):

ah okay, well if you just construct it using a flat structure instead of nesting like a psychopath, it's just O(number of arrays), but still.

view this post on Zulip Mason Protter (Nov 25 2023 at 11:07):

julia> let
           A = ApplyArray(kron, (rand(i,i) for i  1:10)...)
           @benchmark length($A)
       end
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  6.261 ns … 14.106 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.272 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.352 ns ±  0.197 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆    ▁▅▅▅▁                               ▃▃               ▂
  ██▄▄▄▃█████▇▄▁▅█▇▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▆▁▁███▇▅▆▁▅▇█▆▁▁▁▁▄█ █
  6.26 ns      Histogram: log(frequency) by time     7.09 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> let
           A = ApplyArray(kron, (rand(i,i) for i  1:20)...)
           @benchmark length($A)
       end
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  11.854 ns … 15.595 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.995 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.084 ns ±  0.193 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▅█ ▁▆▄▂                                       ▂ ▃▁
  ▂▃▅████████▇▅▃▁▄▄▃▃▂▂▂▂▂▂▂▂▂▂▁▂▃▄▅▆▅▁▅▆▅▅▅▆▆▃▅▃▄▄▇█▄██▄▄▃▃▂ ▄
  11.9 ns         Histogram: frequency by time        12.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

view this post on Zulip Sukera (Nov 25 2023 at 11:18):

well, turns out DArrays aren't resizeable, which makes their length a trivial O(1)

view this post on Zulip Brian Chen (Nov 25 2023 at 16:21):

Mason Protter said:

it'd be kinda weird to make a linked list an AbstractArray, but people do crazy stuff

A real-world example of this is LinkedList implementing the List API in Java

view this post on Zulip Brian Chen (Nov 25 2023 at 16:21):

O(N) random access and all

view this post on Zulip Mason Protter (Nov 25 2023 at 18:30):

Sure, but is it an AbstractArray?

view this post on Zulip Brian Chen (Nov 25 2023 at 18:51):

java.util.List is pretty close to AbstractVector. And funnily enough the old Vector class was retrofitted to implement the List interface

view this post on Zulip Timothy (Nov 25 2023 at 19:05):

Mason Protter said:

it'd be kinda weird to make a linked list an AbstractArray, but people do crazy stuff

I don't see what aspect of the Array API would be violated.

view this post on Zulip Mason Protter (Nov 25 2023 at 20:45):

None of it

view this post on Zulip Eric Hanson (Nov 25 2023 at 22:59):

I don’t think abstract array is a very useful concept to write performant algorithms against if one isn’t assuming O(1) random access and length checks. If it’s easy to avoid such assumptions I don’t see why not but I don’t really think one needs to go out of their way too far for it.

view this post on Zulip Mason Protter (Nov 25 2023 at 23:13):

Yeah. Ultimately, when people write methods targeting AbstractArray, regular Array is their first consideration, and any different subtype comes secondary.

view this post on Zulip Mason Protter (Nov 25 2023 at 23:14):

Array is the defacto standard, but you're allowed to create subtypes that deviate from Array's behaviour in pretty wild ways, with the implicit understanding that the further you deviate, the more likely it is that something will go wrong

view this post on Zulip Brian Chen (Nov 25 2023 at 23:41):

I wish there was a better and more scalable way of nudging people away from writing code which promises to take AbstractArrays but only really works for Array. Even measures such as asking for functions to add guards like Base.require_one_based_indexing seem to be a bridge too far :(

view this post on Zulip jar (Nov 26 2023 at 00:10):

function f(::Array)?

view this post on Zulip Brenhin Keller (Nov 26 2023 at 13:50):

Brian Chen said:

I wish there was a better and more scalable way of nudging people away from writing code which promises to take AbstractArrays but only really works for Array. Even measures such as asking for functions to add guards like Base.require_one_based_indexing seem to be a bridge too far :(

I just write my methods for ::DenseArray rather than ::AbstractArray if I want to be able to assume one-based indexing, and have been happy with that

view this post on Zulip Lilith Hafner (Nov 26 2023 at 14:27):

Are DenseArrays guarunteed to have one-based indexing?

help?> DenseArray
search: DenseArray DenseMatrix

  DenseArray{T, N} <: AbstractArray{T,N}

  N-dimensional dense array with elements of type T. The elements of a dense array are stored contiguously in memory.

view this post on Zulip Sukera (Nov 26 2023 at 14:54):

they are not

view this post on Zulip Mason Protter (Nov 26 2023 at 14:56):

These days I mostly think that offset arrays were a bad idea. We should have just demanded that abstract arrays do 1-based indexing, and then purused offset index types, rather than offset array types.

view this post on Zulip Mason Protter (Nov 26 2023 at 14:56):

hindsight is 20/20 though

view this post on Zulip Brian Chen (Nov 26 2023 at 16:25):

I argued something similar on Slack the other day. That said, I think this goes beyond index bases. For example, functions assuming their array inputs are mutable or that setindex! will work over the entire index range of the array.

view this post on Zulip Mason Protter (Nov 26 2023 at 16:27):

yep

view this post on Zulip Mason Protter (Nov 26 2023 at 16:31):

I do think though that the whole offset-array thing is one of the biggest offenders for AbstractArray problems. In particular, I'm pretty sure it's by far the worst for causing people to get silently incorrect results rather than errors, either because @inbounds was misused, or because the whole array wasn't indexed over (e.g. arr[1:5] or whatever)

view this post on Zulip Mason Protter (Nov 26 2023 at 16:33):

Getting a MethodError because your program tried to do setindex! but that wasn't supported is pretty benign by comparison.

view this post on Zulip Mason Protter (Nov 26 2023 at 16:38):

It's a big mental drain because a lot of the time, I'll write a function, and it's a perfectly fine AbstractArray function, except for the fact that it assumes 1-based indexing, so then I need to decide. Do I

  1. fix it to use arbitrary indexing? that's sometimes easy and sometimes pretty annoying
  2. constraint it to only take Array? Fine, but what if I want to use views, or Adjoint or whatever
  3. add a Base.require_one_based_indexing, I guess, but that's also just a kinda ugly kludge.

It's not a big deal, and usually there is a good solution from that list, but I don't like that I need to make that decision every time I encounter this, when it really could have just been solved once and for all by demanding 1-based indexing when you pass an Integer or CartesianIndex.

view this post on Zulip Timothy (Nov 26 2023 at 16:39):

Isn't there some wrapper type that makes one based indexing apply? I have a vague recollection.

view this post on Zulip Mason Protter (Nov 26 2023 at 16:40):

Welcome to array wrapper hell

view this post on Zulip Mason Protter (Nov 26 2023 at 16:40):

I don't recommend it.

view this post on Zulip Mason Protter (Nov 26 2023 at 16:43):

The easiest way to wrap your way out of it would be to just use another OffsetArray, i.e.

OffsetArray(A, map(s -> 1:s, size(A)))

but that's yet another unpleasant kldugy solution

view this post on Zulip aplavin (Nov 26 2023 at 17:24):

Note that one shouldn't blindly turn offsetarrays into 1-indexed ones in the function, and claim "now my code supports offsetarrays". Different indices is the actual reason to use offsetarrays in the first place! For some functions ignoring indices is fine, for many it isn't.
For example, fft(offsetarray) is different from fft(1-array with the same data).
Or, cor(A::AbstractArray, B::AbstractArray) = cor(collect(A), collect(B)) method if it existed would be wrong for offsetarrays.

view this post on Zulip Brian Chen (Nov 26 2023 at 17:55):

Mason Protter said:

  1. constraint it to only take Array? Fine, but what if I want to use views, or Adjoint or whatever

This is what makes me wonder if Array shouldn't have been more like StrideArray and internalized some of this information instead of relying on wrappers and crazy unions like StridedArray

view this post on Zulip Brian Chen (Nov 26 2023 at 17:58):

Mason Protter said:

Getting a MethodError because your program tried to do setindex! but that wasn't supported is pretty benign by comparison.

Was thinking more of a case where e.g. an in-place function assumes it can re-use parts of the output as scratch space, but then it gets passed a wrapper type like Diagonal and throws an ArgumentError trying to setindex! where it's not allowed to. This example is a little contrived, but I feel like similar issues have shown up in the wild beofre.

view this post on Zulip Brenhin Keller (Nov 26 2023 at 19:27):

Lilith Hafner said:

Are DenseArrays guarunteed to have one-based indexing?

Oh I may be getting confused with column-major, which they are documented as being (despite that not being in the docstring): https://github.com/JuliaLang/julia/blob/c30d45dfdf63c83eb864ebe924a715fe283013b4/doc/src/manual/arrays.md?plain=1#L1124C1-L1131C1

IMO if they're not, it might be worth requiring them or a subtype to be one-based too, since it might be useful to have that information in the type system

view this post on Zulip Sukera (Nov 26 2023 at 19:30):

much of the array library is implemented in a generic manner that allows all custom arrays to behave similarly.

:rolling_on_the_floor_laughing:

view this post on Zulip Lilith Hafner (Nov 26 2023 at 22:14):

It would not be breaking to add a OneBasedAbstractArray <: AbstractArray, but then things like SubArray would need to either 1) not subtype OneBasedAbstractArray or 2) split into OneBasedSubArray <: SubArray.

#5 would be the best solution. Non-dispatchable traits would be sufficient in this case and wouldn't have any of the ambiguity issues

view this post on Zulip Mason Protter (Nov 26 2023 at 22:39):

Brian Chen said:

Mason Protter said:

  1. constraint it to only take Array? Fine, but what if I want to use views, or Adjoint or whatever

This is what makes me wonder if Array shouldn't have been more like StrideArray and internalized some of this information instead of relying on wrappers and crazy unions like StridedArray

It's a difficult trade off. StrideArray has a million different opaque, undocumented type parameters that mean different things about the array. It certainly has really nice properties, but it does come at a cost to comprehension

view this post on Zulip Mason Protter (Nov 26 2023 at 22:40):

But I really do think the current situation with array wrappers is borderline intolerable.


Last updated: Nov 22 2024 at 04:41 UTC