Does anyone know of an AbstractArray subtype that has an expensive length method? ref
Here's one
struct MyArray{T, N} <: AbstractVector{T, N}
A::Array{T, N}
end
Base.size(a::MyArray) = (sleep(10); size(a.A))
Base.getindex(a::MyArray, i...) = getindex(a.A, i...)
More seriously, I doubt there are many cases where the length is expensive
but it's not hard to dream up such cases
so I don't really see the harm in trying to avoid calling it twice if possible
A linked list type?
it'd be kinda weird to make a linked list an AbstractArray
, but people do crazy stuff
there are no big-O requirements on the AbstractArray
interface
yep
I guess some real examples out in the ecosystem would be stuff like https://github.com/JuliaArrays/LazyArrays.jl#concatenation and https://github.com/JuliaArrays/LazyArrays.jl#kronecker-products
they're not particularly expensive to compute, but can become arbitrarily expensive through nesting
there's also distributed arrays, with O(does your network like you today)
julia> using LazyArrays
julia> let
A = foldl(1:10, init=rand(3,3)) do A, _
ApplyArray(kron, A, rand(3,3))
end
@benchmark length($A)
end
BenchmarkTools.Trial: 10000 samples with 10 evaluations.
Range (min … max): 1.143 μs … 2.572 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 1.195 μs ┊ GC (median): 0.00%
Time (mean ± σ): 1.211 μs ± 74.610 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▁▆█▆▄▂
▁▂████████▆▅▄▃▃▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
1.14 μs Histogram: frequency by time 1.62 μs <
Memory estimate: 1.73 KiB, allocs estimate: 46.
though to be fair, this really doesn't need to be so slow and they could have made it less slow
ah okay, well if you just construct it using a flat structure instead of nesting like a psychopath, it's just O(number of arrays)
, but still.
julia> let
A = ApplyArray(kron, (rand(i,i) for i ∈ 1:10)...)
@benchmark length($A)
end
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max): 6.261 ns … 14.106 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 6.272 ns ┊ GC (median): 0.00%
Time (mean ± σ): 6.352 ns ± 0.197 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
█▆ ▁▅▅▅▁ ▃▃ ▂
██▄▄▄▃█████▇▄▁▅█▇▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▆▁▁███▇▅▆▁▅▇█▆▁▁▁▁▄█ █
6.26 ns Histogram: log(frequency) by time 7.09 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> let
A = ApplyArray(kron, (rand(i,i) for i ∈ 1:20)...)
@benchmark length($A)
end
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min … max): 11.854 ns … 15.595 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 11.995 ns ┊ GC (median): 0.00%
Time (mean ± σ): 12.084 ns ± 0.193 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█ ▁▆▄▂ ▂ ▃▁
▂▃▅████████▇▅▃▁▄▄▃▃▂▂▂▂▂▂▂▂▂▂▁▂▃▄▅▆▅▁▅▆▅▅▅▆▆▃▅▃▄▄▇█▄██▄▄▃▃▂ ▄
11.9 ns Histogram: frequency by time 12.4 ns <
Memory estimate: 0 bytes, allocs estimate: 0.
well, turns out DArray
s aren't resizeable, which makes their length
a trivial O(1)
Mason Protter said:
it'd be kinda weird to make a linked list an
AbstractArray
, but people do crazy stuff
A real-world example of this is LinkedList implementing the List API in Java
O(N) random access and all
Sure, but is it an AbstractArray?
java.util.List is pretty close to AbstractVector. And funnily enough the old Vector class was retrofitted to implement the List interface
Mason Protter said:
it'd be kinda weird to make a linked list an
AbstractArray
, but people do crazy stuff
I don't see what aspect of the Array API would be violated.
None of it
I don’t think abstract array is a very useful concept to write performant algorithms against if one isn’t assuming O(1) random access and length checks. If it’s easy to avoid such assumptions I don’t see why not but I don’t really think one needs to go out of their way too far for it.
Yeah. Ultimately, when people write methods targeting AbstractArray
, regular Array
is their first consideration, and any different subtype comes secondary.
Array
is the defacto standard, but you're allowed to create subtypes that deviate from Array
's behaviour in pretty wild ways, with the implicit understanding that the further you deviate, the more likely it is that something will go wrong
I wish there was a better and more scalable way of nudging people away from writing code which promises to take AbstractArrays but only really works for Array. Even measures such as asking for functions to add guards like Base.require_one_based_indexing
seem to be a bridge too far :(
function f(::Array)
?
Brian Chen said:
I wish there was a better and more scalable way of nudging people away from writing code which promises to take AbstractArrays but only really works for Array. Even measures such as asking for functions to add guards like
Base.require_one_based_indexing
seem to be a bridge too far :(
I just write my methods for ::DenseArray
rather than ::AbstractArray
if I want to be able to assume one-based indexing, and have been happy with that
Are DenseArray
s guarunteed to have one-based indexing?
help?> DenseArray
search: DenseArray DenseMatrix
DenseArray{T, N} <: AbstractArray{T,N}
N-dimensional dense array with elements of type T. The elements of a dense array are stored contiguously in memory.
they are not
These days I mostly think that offset arrays were a bad idea. We should have just demanded that abstract arrays do 1
-based indexing, and then purused offset index types, rather than offset array types.
hindsight is 20/20 though
I argued something similar on Slack the other day. That said, I think this goes beyond index bases. For example, functions assuming their array inputs are mutable or that setindex! will work over the entire index range of the array.
yep
I do think though that the whole offset-array thing is one of the biggest offenders for AbstractArray
problems. In particular, I'm pretty sure it's by far the worst for causing people to get silently incorrect results rather than errors, either because @inbounds
was misused, or because the whole array wasn't indexed over (e.g. arr[1:5]
or whatever)
Getting a MethodError
because your program tried to do setindex!
but that wasn't supported is pretty benign by comparison.
It's a big mental drain because a lot of the time, I'll write a function, and it's a perfectly fine AbstractArray
function, except for the fact that it assumes 1-based indexing, so then I need to decide. Do I
Array
? Fine, but what if I want to use views, or Adjoint
or whateverBase.require_one_based_indexing
, I guess, but that's also just a kinda ugly kludge.It's not a big deal, and usually there is a good solution from that list, but I don't like that I need to make that decision every time I encounter this, when it really could have just been solved once and for all by demanding 1-based indexing when you pass an Integer
or CartesianIndex
.
Isn't there some wrapper type that makes one based indexing apply? I have a vague recollection.
Welcome to array wrapper hell
I don't recommend it.
The easiest way to wrap your way out of it would be to just use another OffsetArray
, i.e.
OffsetArray(A, map(s -> 1:s, size(A)))
but that's yet another unpleasant kldugy solution
Note that one shouldn't blindly turn offsetarrays into 1-indexed ones in the function, and claim "now my code supports offsetarrays". Different indices is the actual reason to use offsetarrays in the first place! For some functions ignoring indices is fine, for many it isn't.
For example, fft(offsetarray)
is different from fft(1-array with the same data)
.
Or, cor(A::AbstractArray, B::AbstractArray) = cor(collect(A), collect(B))
method if it existed would be wrong for offsetarrays.
Mason Protter said:
- constraint it to only take
Array
? Fine, but what if I want to use views, orAdjoint
or whatever
This is what makes me wonder if Array
shouldn't have been more like StrideArray
and internalized some of this information instead of relying on wrappers and crazy unions like StridedArray
Mason Protter said:
Getting a
MethodError
because your program tried to dosetindex!
but that wasn't supported is pretty benign by comparison.
Was thinking more of a case where e.g. an in-place function assumes it can re-use parts of the output as scratch space, but then it gets passed a wrapper type like Diagonal
and throws an ArgumentError trying to setindex!
where it's not allowed to. This example is a little contrived, but I feel like similar issues have shown up in the wild beofre.
Lilith Hafner said:
Are
DenseArray
s guarunteed to have one-based indexing?
Oh I may be getting confused with column-major, which they are documented as being (despite that not being in the docstring): https://github.com/JuliaLang/julia/blob/c30d45dfdf63c83eb864ebe924a715fe283013b4/doc/src/manual/arrays.md?plain=1#L1124C1-L1131C1
IMO if they're not, it might be worth requiring them or a subtype to be one-based too, since it might be useful to have that information in the type system
much of the array library is implemented in a generic manner that allows all custom arrays to behave similarly.
:rolling_on_the_floor_laughing:
It would not be breaking to add a OneBasedAbstractArray <: AbstractArray
, but then things like SubArray
would need to either 1) not subtype OneBasedAbstractArray
or 2) split into OneBasedSubArray <: SubArray
.
#5 would be the best solution. Non-dispatchable traits would be sufficient in this case and wouldn't have any of the ambiguity issues
Brian Chen said:
Mason Protter said:
- constraint it to only take
Array
? Fine, but what if I want to use views, orAdjoint
or whateverThis is what makes me wonder if
Array
shouldn't have been more likeStrideArray
and internalized some of this information instead of relying on wrappers and crazy unions likeStridedArray
It's a difficult trade off. StrideArray
has a million different opaque, undocumented type parameters that mean different things about the array. It certainly has really nice properties, but it does come at a cost to comprehension
But I really do think the current situation with array wrappers is borderline intolerable.
Last updated: Nov 22 2024 at 04:41 UTC