Labelled/named arrays · helpdesk (published)

What's the current state of labelled arrays? I recall there were many approaches in early Julia; has one of them turned out to be the favourite?

To give some background, I'm considering moving my SynthControl.jl package over to some sort of named array. Essentially, the package solves problems of the following form:

We have outcomes Y::Matrix{Float64} of size N × T, where N is the number of observed units and T the number of time periods over which they are observed. One (or more, but let's go with one for simplicity) of these units is "treated" at some period 1 < T0 < T, and we want to find out how the treatment affected its outcome. The idea is to find a weighted combination of the other N-1 units that closely approximates the treated unit's outcome before T0. So essentially we solve

minimize(w -> sum(abs2, Y[i, 1:T0] - Y[Not(i), 1:T0]'*w))
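A runnable Base-Julia sketch of that objective, treated here as plain unconstrained least squares. `fit_weights` is a hypothetical helper, not part of SynthControl.jl, and any constraints the package imposes on w are not modelled. Julia's `\` on a non-square matrix returns the least-squares solution, which is exactly the minimiser of `sum(abs2, y - X*w)`.

```julia
using LinearAlgebra

# Hypothetical helper (not SynthControl.jl's API): unconstrained least-squares
# weights for treated unit `i`, using only pre-treatment periods 1:T0.
function fit_weights(Y::AbstractMatrix, i::Integer, T0::Integer)
    donors = setdiff(axes(Y, 1), [i])         # every unit except the treated one
    X = permutedims(Y[donors, 1:T0])          # T0 × (N-1) donor outcomes
    y = Y[i, 1:T0]                            # T0 pre-treatment outcomes of unit i
    w = X \ y                                 # minimises sum(abs2, y - X * w)
    return w, donors
end

# Toy data: unit 3 is exactly the average of units 1 and 2.
Y = rand(5, 10)
Y[3, :] = 0.5 .* Y[1, :] .+ 0.5 .* Y[2, :]
w, donors = fit_weights(Y, 3, 6)
```

Since unit 3 is an exact combination of its donors here, w comes out (up to rounding) as 0.5, 0.5, 0, 0 for donors 1, 2, 4, 5.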

Every row in Y represents an observed unit (e.g. a US state) and every column a time period (e.g. 1981). Likewise, w holds one weight per non-treated unit, so it would be nice to have, instead of

julia> s.treatment_panel.Y
39×31 Matrix{Float64}:
  89.8   95.4  101.1  102.9  108.2  (...)

julia> s.w
38-element Vector{Float64}:
 0.0
 0.0
 0.014810770450243859
 0.10908962424043207
 0.0
 0.0
(...)

something like

julia> s.treatment_panel.Y
39×31 SomeMatrix{Float64}:
        /  1980   1981   1982   1983   1984
Alabama |  89.8   95.4  101.1  102.9  108.2  (...)
Alaska  |  (...)

julia> s.w
38-element SomeVector{Float64}:
Alabama  | 0.0
Alaska   | 0.0
Arkansas | 0.014810770450243859
(...)

Ideally with as little overhead as possible for things like Y*w. Any suggestions?
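The overhead question mostly comes down to whether the wrapper lets the multiplication run on the plain Matrix. A minimal sketch of that idea, assuming nothing about any particular package: `LabelMat`, `LabelVec` and `transpose_mul` are made-up names for illustration. The only extra work is one comparison of the label vectors; the product itself is the ordinary `Matrix' * Vector` call.

```julia
# Hypothetical toy types (not from any package): labels are stored next to a
# plain Array, and arithmetic runs on the unwrapped data.
struct LabelMat{T,R,C}
    data::Matrix{T}
    rows::Vector{R}   # e.g. state ids
    cols::Vector{C}   # e.g. years
end

struct LabelVec{T,K}
    data::Vector{T}
    labels::Vector{K}
end

# Y' * w with label bookkeeping: check that the row labels of Y line up with
# the labels of w, multiply the plain arrays, and label the result by Y's columns.
function transpose_mul(Y::LabelMat, w::LabelVec)
    Y.rows == w.labels ||
        throw(DimensionMismatch("row labels of Y do not match the labels of w"))
    return LabelVec(Y.data' * w.data, Y.cols)
end

Y = LabelMat([1.0 2.0; 3.0 4.0; 5.0 6.0], [1, 2, 4], [1970, 1971])
w = LabelVec([0.5, 0.5, 0.0], [1, 2, 4])
r = transpose_mul(Y, w)   # data [2.0, 3.0], labels [1970, 1971]
```

The label check is O(N) while the product is O(N·T), so for a panel like the one above the labels add essentially nothing to the cost.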

Nils (Feb 08 2024 at 14:50):

julia> y_test = NamedArray(s.treatment_panel.Y, (s.treatment_panel.is, s.treatment_panel.ts), ("State", "Year"))
39×31 Named Matrix{Float64}
State ╲ Year │  1970   1971   1972   1973   1974   1975   1976   1977   1978   1979   1980   1981   1982  …   1988   1989   1990   1991   1992   1993   1994   1995   1996   1997   1998   1999   2000
─────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1            │  89.8   95.4  101.1  102.9  108.2  111.7  116.2  117.1  123.0  121.4  123.2  119.6  119.1  …  112.1  105.6  108.6  107.9  109.1  108.5  107.1  102.6  101.4  104.9  106.2  100.7   96.2
2            │ 100.3  104.1  103.9  108.0  109.7  114.8  119.1  122.6  127.3  126.5  131.8  128.7  127.4     121.5  118.3  113.1  116.8  126.0  113.8  108.8  113.0  110.7  108.7  109.5  104.8   99.4
3            │ 123.0  121.0  123.5  124.4  126.7  127.1  128.0  126.4  126.1  121.9  120.2  118.6  115.4      90.1   82.4   77.8   68.7   67.5   63.4   58.6   56.4   54.5   53.8   52.3   47.2   41.6
(...)

julia> w_test = NamedArray(s.w, filter(!=(3), s.treatment_panel.is), "State")
38-element Named Vector{Float64}
State  │
───────┼──────────
1      │       0.0
2      │       0.0
4      │ 0.0148108
5      │   0.10909
(...)

julia> y_test[Not(3), :]'*s.w
31-element Named Vector{Float64}
Year  │
──────┼────────
1970  │ 117.424
1971  │ 119.823
1972  │ 124.646
1973  │ 124.367
(...)
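One thing to watch in this example: the state ids are plain integers, so a name and a position are easy to confuse. For `y_test` they coincide, so `y_test[Not(3), :]` drops state 3 whether the 3 is read as a name or as a position, but they diverge for the 38 donor weights once state 3 is filtered out. A plain-Julia illustration, assuming `s.treatment_panel.is` is exactly `1:39` as the printed output suggests:

```julia
state_ids = 1:39                          # assumed: the ids shown for y_test
donor_ids = filter(!=(3), state_ids)      # labels of the 38 weights in w_test

donor_ids[3]                  # 4: the third weight belongs to state 4
findfirst(==(4), donor_ids)   # 3: position of state 4's weight
findfirst(==(3), donor_ids)   # nothing: state 3 has no weight
```

So with integer labels it's worth checking in the package docs whether an integer index is read as a position or as a name before relying on either.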

jar (Feb 08 2024 at 19:49):

aplavin (Feb 09 2024 at 02:02):

Yeah, AxisKeys has the most lightweight data structure of all the "keyed array" packages I've seen.

jar (Feb 09 2024 at 04:24):

AxisKeys also distinguishes between indexing by position along the axes and looking up by the axiskeys, which I think is important for a clean interface.
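To make the distinction concrete, here is a toy keyed vector in plain Julia that mimics the AxisKeys convention (square brackets index by position, calling the array looks up by key). `KeyedVec` is a made-up type, not AxisKeys' `KeyedArray`; it reuses integer state ids, which is exactly the case where mixing the two up would silently return the wrong weight.

```julia
# Toy keyed vector (hypothetical, NOT AxisKeys.KeyedArray): v[i] indexes by
# position, v(key) looks up by key, mirroring the AxisKeys convention.
struct KeyedVec{T,K} <: AbstractVector{T}
    data::Vector{T}
    ks::Vector{K}          # one key per entry, e.g. state ids
end

Base.size(v::KeyedVec) = size(v.data)
Base.getindex(v::KeyedVec, i::Int) = v.data[i]    # positional access

function (v::KeyedVec)(key)                         # lookup by key
    i = findfirst(isequal(key), v.ks)
    i === nothing && throw(KeyError(key))
    return v.data[i]
end

w = KeyedVec([0.0, 0.25, 0.75], [1, 2, 4])   # weights for states 1, 2 and 4
w[3]   # 0.75: the third entry
w(4)   # 0.75: the entry for state 4
# w(3) throws a KeyError: state 3 has no weight
```

Because the two kinds of access use different syntax, `w[3]` can never be mistaken for "the weight of state 3".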

Nils (Feb 09 2024 at 09:23):

Michael Abbott (Feb 09 2024 at 19:39):

AxisKeys.jl was my attempt, trying to be fairly lightweight and make few assumptions. It could still be much simpler (e.g. I think double-wrapping with NamedDims.jl ends up more complex than putting all the info in one struct). The ideal for me is something as natural & inevitable as Base's NamedTuple. But I don't use it much in the end & should probably hand over maintenance somehow.

DimensionalData.jl is probably the most actively developed package, and the largest; I think it builds in many things rather than farming them out to other packages. It is aimed at spatial data and gives special meaning to dimensions such as X and Y. In my ideal world all of this could be built on top of some minimal NamedTuple-esque package... but this is unlikely to happen.

AxisArrays.jl is older. It seemed abandoned for a while (when the above two were written) and has many undocumented features, but it is still in use, e.g. I think by the Images.jl ecosystem. It worked hard to be low-overhead, and some of that work was aimed at Julia < 1.0 and could now be simplified.

NamedArrays.jl is also older. It's generally much more mutable and, I think, doesn't try as hard to be type-stable. It has the best name, though!

Andy Dienes (Feb 09 2024 at 22:45):

There's also https://github.com/JuliaDataCubes/YAXArrays.jl. I don't know much about it, but it does seem to be actively developed.

jar (Feb 09 2024 at 23:00):

Michael Abbott (Feb 09 2024 at 23:54):

Alec (Feb 10 2024 at 05:49):

I use LabelledArrays.jl in places where I would normally use a named tuple but the interface requires an array.
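For context, here is the manual version of what that saves you, using only Base: a named tuple gives you names, but an interface that only accepts arrays needs a plain Vector, so the names have to be stripped and re-attached by hand. The parameter names and values are made up for illustration.

```julia
p = (alpha = 0.5, beta = 2.0)            # hypothetical named parameters

v = collect(p)                           # plain Vector{Float64}: [0.5, 2.0]
v .*= 2                                  # ...which an array-only API might update in place

p_new = NamedTuple{keys(p)}(Tuple(v))    # re-attach the names: (alpha = 1.0, beta = 4.0)
p_new.beta                               # 4.0
```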