Stream: helpdesk (published)

Topic: Outlier Removal via @filter


view this post on Zulip QuBit (Mar 24 2021 at 21:40):

Fredrik Bagge Carlson

Thanks for sharing.

I wanted to extend the question a little
and was curious whether the detectOutliers()
method could be used to remove rows
from one DataFrame based on row values
of another DataFrame.

An initial approach:

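# Elements of y that are not in x, in their original order.
# Assumes every element of x occurs exactly once in y; otherwise the
# preallocated length of res is wrong.
function mysetdiff(y, x)
    res = Vector{eltype(y)}(undef, length(y) - length(x))
    i = 1
    @inbounds for el in y
        el ∈ x && continue
        res[i] = el
        i += 1
    end
    res
end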

Or if possible:

i = (Vector of Row Value Indices)
df2 = df1[Not(i), :]  # Not(i) keeps every row except those listed in i

How can I delete rows from DF1
based on rows of DF2 where the
elements of DF2 are outliers in
DF1?

I believe Query.jl has the least
computationally expensive method
to approach this with @filter or
@join, would you agree?

@Andrey Oskin

Best,

view this post on Zulip QuBit (Mar 25 2021 at 01:25):

(deleted)

view this post on Zulip Kwaku Oskin (Mar 25 2021 at 04:32):

Not quite sure what you want, but it looks like a typical use case for antijoin.

view this post on Zulip Kwaku Oskin (Mar 25 2021 at 04:33):

https://dataframes.juliadata.org/stable/man/joins/#Database-Style-Joins

view this post on Zulip Kwaku Oskin (Mar 25 2021 at 04:34):

But you need a key to join them, of course.
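
A minimal sketch of the antijoin suggestion, under some assumptions: both tables share a key column (called `id` here, a hypothetical name), and the outlier rows have already been collected into a second DataFrame. The data is made up for illustration, and how the outliers are detected is left out. If only row indices are available rather than a key, `Not` can drop them directly.

```julia
using DataFrames

# Hypothetical data: `id` is an assumed key column shared by both tables.
df1 = DataFrame(id = 1:6, value = [1.0, 1.2, 0.9, 25.0, 1.1, -30.0])

# Hypothetical table of rows flagged as outliers (for example, built from
# whatever an outlier detector reports).
df2 = DataFrame(id = [4, 6])

# antijoin keeps the rows of df1 whose key has no match in df2.
cleaned = antijoin(df1, df2, on = :id)

# Without a key, rows can be dropped by position instead.
outlier_rows = [4, 6]
cleaned_by_index = df1[Not(outlier_rows), :]
```

Both results contain the four rows of `df1` that were not flagged. The antijoin form is the one that still works when `df2` comes from elsewhere and row positions do not line up.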


Last updated: Nov 22 2024 at 04:41 UTC