Expensive `length(::AbstractArray)` · helpdesk (published)

Stream: helpdesk (published)

Topic: Expensive `length(::AbstractArray)`

Lilith Hafner (Nov 24 2023 at 14:29):

Does anyone know of an AbstractArray subtype that has an expensive length method? ref

Mason Protter (Nov 24 2023 at 14:37):

Here's one

struct MyArray{T, N} <: AbstractVector{T, N}
     A::Array{T, N}
end
Base.size(a::MyArray) = (sleep(10); size(a.A))
Base.getindex(a::MyArray, i...) = getindex(a.A, i...)

Mason Protter (Nov 24 2023 at 14:40):

More seriously, I doubt there are many cases where the length is expensive

Mason Protter (Nov 24 2023 at 14:40):

but it's not hard to dream up such cases

Mason Protter (Nov 24 2023 at 14:40):

so I don't really see the harm in trying to avoid calling it twice if possible

Timothy (Nov 25 2023 at 05:14):

A linked list type?

Mason Protter (Nov 25 2023 at 10:49):

it'd be kinda weird to make a linked list an AbstractArray, but people do crazy stuff

Sukera (Nov 25 2023 at 10:50):

there are no big-O requirements on the AbstractArray interface

Mason Protter (Nov 25 2023 at 10:51):

yep

Mason Protter (Nov 25 2023 at 10:53):

I guess some real examples out in the ecosystem would be stuff like https://github.com/JuliaArrays/LazyArrays.jl#concatenation and https://github.com/JuliaArrays/LazyArrays.jl#kronecker-products

Mason Protter (Nov 25 2023 at 10:53):

they're not particularly expensive to compute, but can become arbitrarily expensive through nesting

Sukera (Nov 25 2023 at 10:54):

there's also distributed arrays, with O(does your network like you today)

Mason Protter (Nov 25 2023 at 10:59):

julia> using LazyArrays

julia> let
           A = foldl(1:10, init=rand(3,3)) do A, _
               ApplyArray(kron, A, rand(3,3))
           end
           @benchmark length($A)
       end
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.143 μs …  2.572 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.195 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.211 μs ± 74.610 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▆█▆▄▂
  ▁▂████████▆▅▄▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  1.14 μs        Histogram: frequency by time        1.62 μs <

 Memory estimate: 1.73 KiB, allocs estimate: 46.

Mason Protter (Nov 25 2023 at 11:04):

though to be fair, this really doesn't need to be so slow and they could have made it less slow

Mason Protter (Nov 25 2023 at 11:07):

ah okay, well if you just construct it using a flat structure instead of nesting like a psychopath, it's just O(number of arrays), but still.

Mason Protter (Nov 25 2023 at 11:07):

julia> let
           A = ApplyArray(kron, (rand(i,i) for i ∈ 1:10)...)
           @benchmark length($A)
       end
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  6.261 ns … 14.106 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.272 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.352 ns ±  0.197 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆    ▁▅▅▅▁                               ▃▃               ▂
  ██▄▄▄▃█████▇▄▁▅█▇▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▆▁▁███▇▅▆▁▅▇█▆▁▁▁▁▄█ █
  6.26 ns      Histogram: log(frequency) by time     7.09 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> let
           A = ApplyArray(kron, (rand(i,i) for i ∈ 1:20)...)
           @benchmark length($A)
       end
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  11.854 ns … 15.595 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     11.995 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   12.084 ns ±  0.193 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▅█ ▁▆▄▂                                       ▂ ▃▁
  ▂▃▅████████▇▅▃▁▄▄▃▃▂▂▂▂▂▂▂▂▂▂▁▂▃▄▅▆▅▁▅▆▅▅▅▆▆▃▅▃▄▄▇█▄██▄▄▃▃▂ ▄
  11.9 ns         Histogram: frequency by time        12.4 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

Sukera (Nov 25 2023 at 11:18):

well, turns out DArrays aren't resizeable, which makes their length a trivial O(1)

Brian Chen (Nov 25 2023 at 16:21):

Mason Protter said:

it'd be kinda weird to make a linked list an AbstractArray, but people do crazy stuff

A real-world example of this is LinkedList implementing the List API in Java

Brian Chen (Nov 25 2023 at 16:21):

O(N) random access and all

Mason Protter (Nov 25 2023 at 18:30):

Sure, but is it an AbstractArray?

Brian Chen (Nov 25 2023 at 18:51):

java.util.List is pretty close to AbstractVector. And funnily enough the old Vector class was retrofitted to implement the List interface

Timothy (Nov 25 2023 at 19:05):

Mason Protter said:

it'd be kinda weird to make a linked list an AbstractArray, but people do crazy stuff

I don't see what aspect of the Array API would be violated.

Mason Protter (Nov 25 2023 at 20:45):

None of it

Eric Hanson (Nov 25 2023 at 22:59):

I don’t think abstract array is a very useful concept to write performant algorithms against if one isn’t assuming O(1) random access and length checks. If it’s easy to avoid such assumptions I don’t see why not but I don’t really think one needs to go out of their way too far for it.

Mason Protter (Nov 25 2023 at 23:13):

Yeah. Ultimately, when people write methods targeting AbstractArray, regular Array is their first consideration, and any different subtype comes secondary.

Mason Protter (Nov 25 2023 at 23:14):

Array is the defacto standard, but you're allowed to create subtypes that deviate from Array's behaviour in pretty wild ways, with the implicit understanding that the further you deviate, the more likely it is that something will go wrong

Brian Chen (Nov 25 2023 at 23:41):

I wish there was a better and more scalable way of nudging people away from writing code which promises to take AbstractArrays but only really works for Array. Even measures such as asking for functions to add guards like Base.require_one_based_indexing seem to be a bridge too far :(

jar (Nov 26 2023 at 00:10):

function f(::Array)?

Brenhin Keller (Nov 26 2023 at 13:50):

Brian Chen said:

I wish there was a better and more scalable way of nudging people away from writing code which promises to take AbstractArrays but only really works for Array. Even measures such as asking for functions to add guards like Base.require_one_based_indexing seem to be a bridge too far :(

I just write my methods for ::DenseArray rather than ::AbstractArray if I want to be able to assume one-based indexing, and have been happy with that

Lilith Hafner (Nov 26 2023 at 14:27):

Are DenseArrays guarunteed to have one-based indexing?

help?> DenseArray
search: DenseArray DenseMatrix

  DenseArray{T, N} <: AbstractArray{T,N}

  N-dimensional dense array with elements of type T. The elements of a dense array are stored contiguously in memory.

Sukera (Nov 26 2023 at 14:54):

they are not

Mason Protter (Nov 26 2023 at 14:56):

These days I mostly think that offset arrays were a bad idea. We should have just demanded that abstract arrays do 1-based indexing, and then purused offset index types, rather than offset array types.

Mason Protter (Nov 26 2023 at 14:56):

hindsight is 20/20 though

Brian Chen (Nov 26 2023 at 16:25):

I argued something similar on Slack the other day. That said, I think this goes beyond index bases. For example, functions assuming their array inputs are mutable or that setindex! will work over the entire index range of the array.

Mason Protter (Nov 26 2023 at 16:27):

yep

Mason Protter (Nov 26 2023 at 16:31):

I do think though that the whole offset-array thing is one of the biggest offenders for AbstractArray problems. In particular, I'm pretty sure it's by far the worst for causing people to get silently incorrect results rather than errors, either because @inbounds was misused, or because the whole array wasn't indexed over (e.g. arr[1:5] or whatever)

Mason Protter (Nov 26 2023 at 16:33):

Getting a MethodError because your program tried to do setindex! but that wasn't supported is pretty benign by comparison.

Mason Protter (Nov 26 2023 at 16:38):

It's a big mental drain because a lot of the time, I'll write a function, and it's a perfectly fine AbstractArray function, except for the fact that it assumes 1-based indexing, so then I need to decide. Do I

fix it to use arbitrary indexing? that's sometimes easy and sometimes pretty annoying
constraint it to only take Array? Fine, but what if I want to use views, or Adjoint or whatever
add a Base.require_one_based_indexing, I guess, but that's also just a kinda ugly kludge.

It's not a big deal, and usually there is a good solution from that list, but I don't like that I need to make that decision every time I encounter this, when it really could have just been solved once and for all by demanding 1-based indexing when you pass an Integer or CartesianIndex.

Timothy (Nov 26 2023 at 16:39):

Isn't there some wrapper type that makes one based indexing apply? I have a vague recollection.

Mason Protter (Nov 26 2023 at 16:40):

Welcome to array wrapper hell

Mason Protter (Nov 26 2023 at 16:40):

I don't recommend it.

Mason Protter (Nov 26 2023 at 16:43):

The easiest way to wrap your way out of it would be to just use another OffsetArray, i.e.

OffsetArray(A, map(s -> 1:s, size(A)))

but that's yet another unpleasant kldugy solution

aplavin (Nov 26 2023 at 17:24):

Note that one shouldn't blindly turn offsetarrays into 1-indexed ones in the function, and claim "now my code supports offsetarrays". Different indices is the actual reason to use offsetarrays in the first place! For some functions ignoring indices is fine, for many it isn't.
For example, fft(offsetarray) is different from fft(1-array with the same data).
Or, cor(A::AbstractArray, B::AbstractArray) = cor(collect(A), collect(B)) method if it existed would be wrong for offsetarrays.

Brian Chen (Nov 26 2023 at 17:55):

Mason Protter said:

constraint it to only take Array? Fine, but what if I want to use views, or Adjoint or whatever

This is what makes me wonder if Array shouldn't have been more like StrideArray and internalized some of this information instead of relying on wrappers and crazy unions like StridedArray

Brian Chen (Nov 26 2023 at 17:58):

Mason Protter said:

Getting a MethodError because your program tried to do setindex! but that wasn't supported is pretty benign by comparison.

Was thinking more of a case where e.g. an in-place function assumes it can re-use parts of the output as scratch space, but then it gets passed a wrapper type like Diagonal and throws an ArgumentError trying to setindex! where it's not allowed to. This example is a little contrived, but I feel like similar issues have shown up in the wild beofre.

Brenhin Keller (Nov 26 2023 at 19:27):

Lilith Hafner said:

Are DenseArrays guarunteed to have one-based indexing?

Oh I may be getting confused with column-major, which they are documented as being (despite that not being in the docstring): https://github.com/JuliaLang/julia/blob/c30d45dfdf63c83eb864ebe924a715fe283013b4/doc/src/manual/arrays.md?plain=1#L1124C1-L1131C1

IMO if they're not, it might be worth requiring them or a subtype to be one-based too, since it might be useful to have that information in the type system

Sukera (Nov 26 2023 at 19:30):

much of the array library is implemented in a generic manner that allows all custom arrays to behave similarly.

:rolling_on_the_floor_laughing:

Lilith Hafner (Nov 26 2023 at 22:14):

It would not be breaking to add a OneBasedAbstractArray <: AbstractArray, but then things like SubArray would need to either 1) not subtype OneBasedAbstractArray or 2) split into OneBasedSubArray <: SubArray.

#5 would be the best solution. Non-dispatchable traits would be sufficient in this case and wouldn't have any of the ambiguity issues

Mason Protter (Nov 26 2023 at 22:39):

Brian Chen said:

Mason Protter said:

constraint it to only take Array? Fine, but what if I want to use views, or Adjoint or whatever

This is what makes me wonder if Array shouldn't have been more like StrideArray and internalized some of this information instead of relying on wrappers and crazy unions like StridedArray

It's a difficult trade off. StrideArray has a million different opaque, undocumented type parameters that mean different things about the array. It certainly has really nice properties, but it does come at a cost to comprehension

Mason Protter (Nov 26 2023 at 22:40):

But I really do think the current situation with array wrappers is borderline intolerable.

Last updated: Aug 14 2025 at 04:51 UTC