What's the current state of labelled arrays? I recall there were many approaches in early Julia; has any one of them turned out to be the favourite?
To give some background, I'm considering moving my SynthControl.jl package over to some sort of named arrays. Essentially, the package solves a bunch of problems of the following form:
We have a set of outcomes Y::Matrix{Float64} of size N by T, where N is the number of observed units and T the number of time periods during which they are observed. Now one (or more, but let's go with one for simplicity) of these units is "treated" at some point 1 < T0 < T, and we want to find out how the outcome was affected by this treatment. The idea is to find a weighted combination of the other N-1 units that closely approximates the evolution of the treated unit's outcome before T0. So essentially we do
minimize(w -> sum(abs2, Y[i, 1:T0] - Y[Not(i), 1:T0]'*w))
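A minimal, dependency-free sketch of that fit, assuming the usual synthetic-control constraints (weights non-negative and summing to one), which the post itself doesn't spell out. `synth_weights`, `synth_gap` and `project_simplex` are hypothetical names, projected gradient descent is just one simple way to solve the problem (not necessarily what SynthControl.jl does), and `filter(!=(i), axes(Y, 1))` stands in for `Not(i)` so no extra packages are needed.

```julia
using LinearAlgebra

# Euclidean projection onto the probability simplex {w : w .>= 0, sum(w) == 1},
# using the standard sort-and-threshold algorithm.
function project_simplex(v::AbstractVector{<:Real})
    u = sort(v; rev = true)
    css = cumsum(u)
    # largest j whose sorted entry stays positive after the shift
    rho = findlast(j -> u[j] - (css[j] - 1) / j > 0, eachindex(u))
    θ = (css[rho] - 1) / rho
    return max.(v .- θ, 0)
end

# Hypothetical helper: weights for treated unit `i` fitted on periods 1:T0.
# Minimises sum(abs2, Y[i, 1:T0] - Y[donors, 1:T0]' * w) by projected gradient
# descent, ASSUMING the constraints w .>= 0 and sum(w) == 1.
function synth_weights(Y::AbstractMatrix{<:Real}, i::Integer, T0::Integer; iters::Integer = 5_000)
    donors = filter(!=(i), axes(Y, 1))   # every unit except the treated one
    y = Y[i, 1:T0]                       # treated unit, pre-treatment periods
    X = Y[donors, 1:T0]'                 # T0 × (N-1): one column per donor unit
    w = fill(1 / length(donors), length(donors))   # start from equal weights
    step = 1 / opnorm(X)^2               # 1/L, L = Lipschitz constant of the gradient
    for _ in 1:iters
        grad = X' * (X * w - y)          # gradient of 0.5 * ||X*w - y||^2
        w = project_simplex(w .- step .* grad)
    end
    return w
end

# Hypothetical helper: treated outcome minus its synthetic counterpart, all T periods
synth_gap(Y, i, w) = Y[i, :] .- Y[filter(!=(i), axes(Y, 1)), :]' * w
```

For example, `w = synth_weights(Y, i, T0)` followed by `synth_gap(Y, i, w)` gives the treated-minus-synthetic outcome for every period; the part of that gap after T0 can be read as the estimated effect of the treatment.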
Now every row in Y represents an observed unit (e.g. a US state) and every column a time period (e.g. 1981). Likewise, the w vector holds a weight for every non-treated unit, so it would be nice to have, instead of
julia> s.treatment_panel.Y
39×31 Matrix{Float64}:
89.8 95.4 101.1 102.9 108.2 (...)
julia> s.w
38-element Vector{Float64}:
0.0
0.0
0.014810770450243859
0.10908962424043207
0.0
0.0
(...)
something like
julia> s.treatment_panel.Y
39×31 SomeMatrix{Float64}:
/ 1980 1981 1982 1983 1984
Alabama | 89.8 95.4 101.1 102.9 108.2 (...)
Alaska |
julia> s.w
38-element Vector{Float64}:
Alabama | 0.0
Alaska | 0.0
Arkansas | 0.014810770450243859
(...)
Ideally with as little overhead as possible for things like Y*w. Any suggestions?
NamedArrays seems pretty good:
julia> y_test = NamedArray(s.treatment_panel.Y, (s.treatment_panel.is, s.treatment_panel.ts), ("State", "Year"))
39×31 Named Matrix{Float64}
State ╲ Year │ 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 … 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
─────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 89.8 95.4 101.1 102.9 108.2 111.7 116.2 117.1 123.0 121.4 123.2 119.6 119.1 … 112.1 105.6 108.6 107.9 109.1 108.5 107.1 102.6 101.4 104.9 106.2 100.7 96.2
2 │ 100.3 104.1 103.9 108.0 109.7 114.8 119.1 122.6 127.3 126.5 131.8 128.7 127.4 121.5 118.3 113.1 116.8 126.0 113.8 108.8 113.0 110.7 108.7 109.5 104.8 99.4
3 │ 123.0 121.0 123.5 124.4 126.7 127.1 128.0 126.4 126.1 121.9 120.2 118.6 115.4 90.1 82.4 77.8 68.7 67.5 63.4 58.6 56.4 54.5 53.8 52.3 47.2 41.6
(...)
julia> w_test = NamedArray(s.w, filter(!=(3), s.treatment_panel.is), "State")
38-element Named Vector{Float64}
State │
───────┼──────────
1 │ 0.0
2 │ 0.0
4 │ 0.0148108
5 │ 0.10909
(...)
julia> y_test[Not(3), :]'*s.w
31-element Named Vector{Float64}
Year │
──────┼────────
1970 │ 117.424
1971 │ 119.823
1972 │ 124.646
1973 │ 124.367
Any known downsides to this?
AxisKeys.jl is my favorite
Yeah, AxisKeys has the most lightweight data structure of all the "keyed array" packages I've seen.
AxisKeys also distinguishes between selecting by position along the axes and looking up by axis key, which I think is important for a clean interface.
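To make the position-versus-key distinction concrete, here is a toy keyed vector in plain Julia. `ToyKeyedVector` is hypothetical and is NOT how AxisKeys is implemented; it only mirrors the convention AxisKeys uses, with square brackets indexing by position and round brackets looking up by key. Keeping the plain data in a field and exposing it via `parent` is one way to let heavy linear algebra run on the underlying array.

```julia
# Hypothetical toy type, NOT AxisKeys' implementation: a plain vector plus one key per element
struct ToyKeyedVector{T, V<:AbstractVector{T}, K<:AbstractVector} <: AbstractVector{T}
    data::V
    keys::K
    function ToyKeyedVector(data::V, keys::K) where {T, V<:AbstractVector{T}, K<:AbstractVector}
        length(data) == length(keys) || throw(DimensionMismatch("need one key per element"))
        return new{T, V, K}(data, keys)
    end
end

Base.size(v::ToyKeyedVector) = size(v.data)
Base.IndexStyle(::Type{<:ToyKeyedVector}) = IndexLinear()
Base.getindex(v::ToyKeyedVector, i::Int) = v.data[i]   # square brackets: by position
Base.parent(v::ToyKeyedVector) = v.data                # unwrap to the plain array

# Round brackets: look up by key
function (v::ToyKeyedVector)(key)
    i = findfirst(isequal(key), v.keys)
    i === nothing && throw(KeyError(key))
    return v.data[i]
end
```

With `w = ToyKeyedVector([0.0, 0.0, 0.0148], ["Alabama", "Alaska", "Arkansas"])`, both `w[3]` and `w("Arkansas")` return `0.0148`, and a product such as `Y' * parent(w)` works on the unwrapped vector.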
Thanks, I'll try it out.
Sadly we never got the ecosystem to consolidate on one.
AxisKeys.jl was my attempt, trying to be fairly lightweight and make few assumptions. It could still be much simpler (e.g. I think double-wrapping with NamedDims.jl ends up more complex than putting all the info in one struct). The ideal for me is something as natural & inevitable as Base's NamedTuple. But I don't use it much in the end & should probably hand over maintenance somehow.
DimensionalData.jl is probably the most actively developed package, and the largest; I think it builds in many things rather than farming them out to other packages. It's aimed at spatial data and gives special meanings to X and Y. In my ideal world all of this could be built on top of some minimal NamedTuple-esque package... but this is unlikely to happen.
AxisArrays.jl is older and seemed abandoned for a while (when the above two were written); it has many undocumented features but is still in use, e.g. I think by the Images.jl ecosystem. It worked hard to be low-overhead, and some of that work was aimed at Julia < 1.0 and could now be simplified.
NamedArrays.jl is also older. It's generally much more mutable, and I think it doesn't try as hard to be type-stable etc. It got the best name though!
I know https://github.com/JuliaDataCubes/YAXArrays.jl exists as well. Though I don't know much about it, it does seem to be actively developed.
Oh right I forgot that one. It's built on top of DimensionalData.jl I think.
There's also at least one such thing built into JuMP.jl e.g. here https://jump.dev/JuMP.jl/stable/manual/containers/
I use LabelledArrays.jl in places where I would normally use a named tuple but the interface requires an array.
So I'm finally getting around to rewriting my TreatmentPanels.jl package using AxisKeys.jl, thanks to this thread. It's going reasonably well, although the documentation is a little... sparse.
The main issue I'm having is that a lot of the API seems to be based around kwargs, and I'm not sure how to handle cases where the dimension names are stored in variables. Generally my package will have an observation identifier and a time identifier, so a lookup looks something like data(id = "Peru", t = Date(2022)). But I want the dimension names to be chosen by users, i.e. I have a function make_data(matrix, id_var, t_var) which someone might call as make_data(some_data, "country", "year").
I've found the function rekey, but that only changes the actual keys of a dimension, not the name of the dimension. I tried poking around in the internals a bit, and it appears the dimension names are encoded in the type of the NamedDimsArray that's part of the KeyedArray, so I'm wondering whether what I'm trying to do just isn't supported?
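Why names stored in a type parameter can't be changed in place can be seen with a toy wrapper in plain Julia. `ToyNamedMatrix`, `dimnames_of` and `with_dimnames` are hypothetical and only imitate the idea described above (names living in the type, as in NamedDims.jl-style arrays); "renaming" necessarily builds a new object of a different type around the same data.

```julia
# Hypothetical toy wrapper: the dimension names are a type parameter, not a field
struct ToyNamedMatrix{names, T, M<:AbstractMatrix{T}}
    data::M
end

# Convenience constructor: ToyNamedMatrix{(:id, :t)}(matrix)
ToyNamedMatrix{names}(data::AbstractMatrix{T}) where {names, T} =
    ToyNamedMatrix{names, T, typeof(data)}(data)

# The names can be read back from the type
dimnames_of(::ToyNamedMatrix{names}) where {names} = names

# "Renaming" cannot mutate a type, so it returns a new wrapper around the same data.
# With names only known at runtime, the result type is only known at runtime too.
with_dimnames(A::ToyNamedMatrix, newnames::Tuple{Symbol, Symbol}) =
    ToyNamedMatrix{newnames}(A.data)
```

For example, `with_dimnames(ToyNamedMatrix{(:id, :t)}(zeros(2, 3)), (:country, :year))` shares the original matrix but has a different type, which is also why choosing names at runtime implies a dynamic dispatch.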
One workaround would be to splat kwarg tuples like data(; (id_var => "Peru", t_var => Date(2022))...), but that seems a bit hacky.
I'd consider splatting kwargs normal rather than hacky. But if you prefer, you can splat from a dictionary or any collection of two-element members. (The keys need to be symbols though.)
Fair enough, I'll go down that route then, thank you.
Ah sorry, I missed the bit about keys having to be symbols. Is that a general restriction of AxisKeys?
I don't know anything about AxisKeys but keyword argument names must be symbols.
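A small Base-only illustration of the Symbol requirement and of splatting from a Dict. `lookup` is a hypothetical stand-in for a keyword-based call like data(id = "Peru", ...) from this thread; it only echoes back the keywords it receives, so nothing here depends on AxisKeys.

```julia
# Hypothetical stand-in for a keyword-based lookup; it just echoes its keywords
lookup(; kwargs...) = values(kwargs)

# Dimension names as a user might pass them, e.g. make_data(some_data, "country", "year")
id_name, t_name = "country", "year"

# Keyword names must be Symbols, so convert the strings before splatting a Dict
opts = Dict(Symbol(id_name) => "Peru", Symbol(t_name) => 2022)
res = lookup(; opts...)

# With String keys the splat fails: lookup(; Dict("country" => "Peru")...) throws
```

A Dict has no guaranteed order, so access the result by name (`res.country`, `res.year`) rather than by position.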
Yes, splatting is the way to go in this scenario. If the names aren't known at compile time there'll be a dynamic dispatch, but this is fundamentally unavoidable: the type of the result differs anyway.
So, either data(; (id_var => "Peru", t_var => Date(2022))...) or data(; NamedTuple{(id_var, t_var)}(("Peru", Date(2022)))...), depending on what fits best.
Or data("Peru", Date(2022)) of course, if you can rely on the axis order.
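The two keyword forms can be checked in plain Julia. `lookup` is again a hypothetical stand-in that only echoes its keywords, and `id_var`/`t_var` are assumed to already be Symbols. Note that `NamedTuple{names}` takes its values as a single tuple.

```julia
using Dates

# Hypothetical stand-in for a keyword-based lookup; it just echoes its keywords
lookup(; kwargs...) = values(kwargs)

id_var, t_var = :country, :year   # dimension names only known at runtime

# 1. Splat a tuple of Symbol => value pairs
a = lookup(; (id_var => "Peru", t_var => Date(2022))...)

# 2. Build a NamedTuple from the runtime names (values passed as ONE tuple), then splat it
b = lookup(; NamedTuple{(id_var, t_var)}(("Peru", Date(2022)))...)
```

Both calls produce `(country = "Peru", year = Date(2022))`; which one reads better mostly depends on whether the names and values are already paired up.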
Ah yes, I can rely on the order, as I don't allow users to create the KeyedArray themselves, and as far as I understand this can't be changed after construction.
Yes, axis assignment is immutable
Last updated: Nov 07 2025 at 04:42 UTC