Suppose I have a dataframe like this:
julia> map(Iterators.product('a':'c', 1:3)) do (x, y)
           (;x, y, z=rand())
       end[:] |> DataFrame
9×3 DataFrame
 Row │ x     y      z
     │ Char  Int64  Float64
─────┼────────────────────────
   1 │ a         1  0.74997
   2 │ b         1  0.828512
   3 │ c         1  0.840793
   4 │ a         2  0.97316
   5 │ b         2  0.329049
   6 │ c         2  0.963159
   7 │ a         3  0.263909
   8 │ b         3  0.0101475
   9 │ c         3  0.162562
Is there a natural way to kinda disaggregate this back into a matrix? I.e. I want one dimension of the array to correspond to the unique values of the x column, another dimension to correspond to the unique values of the y column, and then entries of the matrix would be the corresponding entries in the dataframe
I could do this like so:
julia> map(Iterators.product(unique(df.x), unique(df.y))) do (x, y)
           row = only(filter(row -> row.x == x && row.y == y, df))
           row.z
       end
3×3 Matrix{Float64}:
 0.71842   0.992283   0.102177
 0.901056  0.481119   0.835783
 0.921512  0.0621665  0.822107
but this is very inefficient, and I suspect there's a smart way to do this that I'm not seeing
Here is one way:
julia> using DataFrames, AxisKeys
julia> df = map(Iterators.product('a':'c', 1:3)) do (x, y)
                  (;x, y, z=rand())
              end[:] |> DataFrame
9×3 DataFrame
 Row │ x     y      z
     │ Char  Int64  Float64
─────┼────────────────────────
   1 │ a         1  0.972967
   2 │ b         1  0.255974
   3 │ c         1  0.0945194
   4 │ a         2  0.621327
   5 │ b         2  0.0908171
   6 │ c         2  0.763769
   7 │ a         3  0.342196
   8 │ b         3  0.187913
   9 │ c         3  0.972685
julia> wrapdims(df, :z, :x, :y)
2-dimensional KeyedArray(NamedDimsArray(...)) with keys:
↓   x ∈ 3-element Vector{Char}
→   y ∈ 3-element Vector{Int64}
And data, 3×3 Matrix{Float64}:
         (1)          (2)          (3)
  ('a')    0.972967     0.621327     0.342196
  ('b')    0.255974     0.0908171    0.187913
  ('c')    0.0945194    0.763769     0.972685
But am far from an expert at these things. I don't know if this is efficient or not.
julia> (df = map(Iterators.product('a':'c', 1:3)) do (x, y)
                  (;x, y)
              end[:] |> DataFrame); df.z = 1:9;df
9×3 DataFrame
 Row │ x     y      z
     │ Char  Int64  Int64
─────┼────────────────────
   1 │ a         1      1
   2 │ b         1      2
   3 │ c         1      3
   4 │ a         2      4
   5 │ b         2      5
   6 │ c         2      6
   7 │ a         3      7
   8 │ b         3      8
   9 │ c         3      9
julia> select(unstack(df, :x,:y,:z), Not(:x))
3×3 DataFrame
 Row │ 1       2       3
     │ Int64?  Int64?  Int64?
─────┼────────────────────────
   1 │      1       4       7
   2 │      2       5       8
   3 │      3       6       9
Nice, thanks guys. AxisKeys is a very slick solution. It's about an order of magnitude slower than @jar's solution, but also an order of magnitude faster than my attempt with filter.
My problem with Jar's solution though is that it's not a dense matrix at the end, but rather a collection of vectors
The fastest method I've fouind so far is
let arr = Matrix{Float64}(undef, length(unique($df.x)), length(unique($df.y)))
    gdf = groupby($df, :x)
    for (i, sdf) ∈ enumerate(gdf)
        arr[i, :] .= sdf.z
    end
    arr
end
but it's ugly and not very generic
How does something like Matrix{Float64}(unstack(df, :x,:y,:z)[:, 2:end]) perform?
Seems about as fast as Jar's solution
Thanks guys!
Last updated: Oct 25 2025 at 04:39 UTC