Stream: helpdesk (published)

Topic: ✔ Remove Rows in DataFrame Based on Mismatch


view this post on Zulip Dale Black (Jul 21 2022 at 18:44):

Is there a way to filter different sized DataFrames and keep only the rows of both DataFrames where there are matching values in a specific column? E.g.

Given

df1_test = DataFrame(
    :col1 => ["low", "med", "high"],
    :col2 => ["a1", "b1", "c1"]
)
df2_test = DataFrame(
    :col1 => ["med"],
    :col2 => ["b2"]
)

I would like to return only the "med" row of each DataFrame since only "med" of :col1 are matching

df1_new = DataFrame(
    :col1 => ["med"],
    :col2 => ["b1"]
)
df2_new = DataFrame(
    :col1 => ["med"],
    :col2 => ["b2"]
)

view this post on Zulip jar (Jul 21 2022 at 19:34):

subset(d, :col1 => ByRow(==("med")))

view this post on Zulip jar (Jul 21 2022 at 19:35):

or do you mean innerjoin(l,r; on=:col1)

view this post on Zulip Dale Black (Jul 21 2022 at 19:39):

I am looking for a more generic approach. One that will work if "low" and "med" are matching, for example. innerjoin might do that, but this gives me an error

innerjoin(df1_test, df2_test; on=:col1)
ArgumentError: Duplicate variable names: :col2. Pass makeunique=true to make them unique using a suffix automatically.

var"#make_unique!#1"(::Bool, ::typeof(DataFrames.make_unique!), ::Vector{Symbol}, ::Vector{Symbol})@utils.jl:83
#make_unique#2@utils.jl:105[inlined]
#Index#5@index.jl:27[inlined]
var"#DataFrame#188"(::Bool, ::Bool, ::Type{DataFrames.DataFrame}, ::Vector{AbstractVector}, ::Vector{Symbol})@dataframe.jl:339
compose_inner_table(::DataFrames.DataFrameJoiner, ::Bool, ::typeof(identity), ::typeof(identity))@composer.jl:135
var"#_join#572"(::Symbol, ::Symbol, ::Bool, ::Nothing, ::Tuple{Bool, Bool}, ::typeof(identity), ::typeof(identity), ::Symbol, ::typeof(DataFrames._join), ::DataFrames.DataFrame, ::DataFrames.DataFrame)@composer.jl:400
#innerjoin#573@composer.jl:576[inlined]
top-level scope@Local: 1[inlined]

view this post on Zulip Sundar R (Jul 21 2022 at 19:44):

You can do something like:

julia> matching = df1_test.col1  df2_test.col1
1-element Vector{String}:
 "med"
julia> df1_new = subset(df1_test, :col1 => ByRow(in(matching)))
1×2 DataFrame
 Row  col1    col2
      String  String
─────┼────────────────
   1  med     b1
julia> df2_new = subset(df2_test, :col1 => ByRow(in(matching)))
1×2 DataFrame
 Row  col1    col2
      String  String
─────┼────────────────
   1  med     b2

view this post on Zulip jar (Jul 21 2022 at 19:48):

You can pass makeunique=true like it says

view this post on Zulip Dale Black (Jul 21 2022 at 19:49):

Thank you! Both of those solutions work!

view this post on Zulip Notification Bot (Jul 21 2022 at 19:52):

Dale Black has marked this topic as resolved.


Last updated: Nov 22 2024 at 04:41 UTC