Stream: helpdesk (published)

Topic: ✔ get rows of dataframes which are maximal in some value


Mason Protter (Jan 26 2022 at 00:12):

Nice, I like that. Thanks Nils and Andrey!

Notification Bot (Jan 26 2022 at 00:12):

Mason Protter has marked this topic as resolved.

Eric Hanson (Jan 28 2022 at 00:20):

Nils said:

(this will of course return more than one row for y groups in which the maximum value of x occurs multiple times)

That's not true, right? Since argmax always returns a single index. E.g. modifying the example a bit to be degenerate,

julia> df = DataFrame(x = [1, 2, 3, 1, 2, 3, 1, 2], y = [:foo, :foo, :foo, :bar, :bar, :bar, :baz, :baz], z = ones(8))
8×3 DataFrame
 Row │ x      y       z
     │ Int64  Symbol  Float64
─────┼────────────────────────
   1 │     1  foo         1.0
   2 │     2  foo         1.0
   3 │     3  foo         1.0
   4 │     1  bar         1.0
   5 │     2  bar         1.0
   6 │     3  bar         1.0
   7 │     1  baz         1.0
   8 │     2  baz         1.0

julia> combine(groupby(df, :y), sdf -> sdf[argmax(sdf.x), :])
3×3 DataFrame
 Row │ y       x      z
     │ Symbol  Int64  Float64
─────┼────────────────────────
   1 │ foo         3      1.0
   2 │ bar         3      1.0
   3 │ baz         2      1.0

Mason Protter (Jan 28 2022 at 00:46):

I think you're correct, Eric, but I don't think your example demonstrates it

Mason Protter (Jan 28 2022 at 00:47):

For a given y group, there are no repeated x values in your example

Mason Protter (Jan 28 2022 at 00:48):

Rather I think the demonstration would be

julia> df = DataFrame(x = [1, 2, 3, 3, 1, 2, 3, 1, 2], y = [:foo, :foo, :foo, :foo, :bar, :bar, :bar, :baz, :baz], z = rand(9))
9×3 DataFrame
 Row │ x      y       z
     │ Int64  Symbol  Float64
─────┼──────────────────────────
   1 │     1  foo     0.288488
   2 │     2  foo     0.722006
   3 │     3  foo     0.654092
   4 │     3  foo     0.262445
   5 │     1  bar     0.932314
   6 │     2  bar     0.0627638
   7 │     3  bar     0.856708
   8 │     1  baz     0.45854
   9 │     2  baz     0.233986

julia> combine(groupby(df, :y), sdf -> sdf[argmax(sdf.x), :])
3×3 DataFrame
 Row │ y       x      z
     │ Symbol  Int64  Float64
─────┼─────────────────────────
   1 │ foo         3  0.654092
   2 │ bar         3  0.856708
   3 │ baz         2  0.233986
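
The tie behaviour can also be checked without DataFrames. A minimal sketch in plain Julia, assuming only Base: argmax returns the index of the first maximal element, so a tie never yields more than one index. The vector x is just the x column of the foo group from the example above; the names first_max and all_max are illustrative.

```julia
x = [1, 2, 3, 3]                        # x values of the :foo group

# argmax returns a single index; with ties it is the first maximal element.
first_max = argmax(x)                   # 3

# To see every position holding the maximum, compare against the value itself.
all_max = findall(==(maximum(x)), x)    # [3, 4]
```

This matches the foo row in the output above: only the first of the two x == 3 rows (z = 0.654092) survives.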

Eric Hanson (Jan 28 2022 at 00:56):

Ah oops, thought we were maximizing z, not x

Nils (Jan 28 2022 at 12:34):

Ah yes of course, it would have been combine(groupby(df, :y), sdf -> sdf[sdf.x .== maximum(sdf.x), :]) to return multiple maxima. Might actually be the better way to write this, to guard against duplicates silently vanishing.
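
A rough sketch of the two variants side by side, assuming DataFrames.jl is installed. The row filter compares each x to the value maximum(sdf.x), since argmax returns an index rather than a value. The data reuses Mason's x and y columns; z = 1:9 is a substitution for rand(9) so the result is reproducible, and the names ties and first_only are illustrative.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3, 3, 1, 2, 3, 1, 2],
               y = [:foo, :foo, :foo, :foo, :bar, :bar, :bar, :baz, :baz],
               z = 1:9)  # deterministic stand-in for rand(9)

# Keep every row whose x equals its group's maximum, so ties survive.
ties = combine(groupby(df, :y), sdf -> sdf[sdf.x .== maximum(sdf.x), :])

# argmax keeps only the first maximal row of each group.
first_only = combine(groupby(df, :y), sdf -> sdf[argmax(sdf.x), :])
```

With this data, ties has four rows (both x == 3 rows of foo) while first_only has three.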


Last updated: Oct 02 2023 at 04:34 UTC