Stream: helpdesk (published)

Topic: Outlier Removal


QuBit (Mar 15 2021 at 14:21):

Hello Everyone,

using GLM, Lathe, MLBase

I have a dataframe similar to:

DF.A = [1,2,3,4,5,6,7,8]
DF.B = [2,45,53,42,51,34,900,55,36]

I would like to remove the outliers from
DF.B and am using (for the 1st quartile):

    first_percentile = percentile(DF.B, 25)
    iqr_value = iqr(DF.B)
    FirstOut = DF[DF.B  .> (first_percentile - 1.5*iqr_value),:]

Similarly, I am using (for the 4th quartile):

    fourth_percentile = percentile(DF.B, 75)
    iqr_value = iqr(DF.B)
    FourthOut = DF[DF.B  .< (fourth_percentile + 1.5*iqr_value),:]

In both resulting dataframes, these approaches have not removed
all of the outliers. Could someone explain how I could adjust my
code to produce a dataframe without outliers in the specified
column (with the associated rows dropped as well)?

Thanks,

QuBit (Mar 18 2021 at 04:56):

qu bit

Apply filter together with the percentile method to
remove the outliers, using the IQR to set both bounds:

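    using DataFrames, StatsBase   # percentile and iqr come from StatsBase

    begin
        first_perc = percentile(DF.B, 25)   # 25th percentile (Q1)
        last_perc = percentile(DF.B, 75)    # 75th percentile (Q3)
        IQR_value = iqr(DF.B)               # interquartile range, Q3 - Q1
        # Keep rows whose B value lies inside both fences at once
        DF_NO = filter(x -> x.B > first_perc - 1.5*IQR_value &&
                            x.B < last_perc + 1.5*IQR_value, DF)
    end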

You can then check :std using the describe() method
to see what improvement the process(es) may have made.
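
For reference, here is a minimal self-contained sketch of the same idea run on the sample data from the question. It assumes DataFrames and StatsBase are installed, and it pads column A with a ninth value, which is an assumption because the A and B vectors as posted have different lengths. The names `lower` and `upper` are illustrative only.

```julia
using DataFrames, StatsBase

# Sample data from the question. A is extended to 9 entries (an assumption)
# so that both columns have the same length.
DF = DataFrame(A = 1:9, B = [2, 45, 53, 42, 51, 34, 900, 55, 36])

q1 = percentile(DF.B, 25)        # first quartile
q3 = percentile(DF.B, 75)        # third quartile
fence = 1.5 * iqr(DF.B)          # 1.5 times the interquartile range

lower = q1 - fence               # values at or below this are low outliers
upper = q3 + fence               # values at or above this are high outliers

# Apply both bounds in a single filter; each bound on its own
# leaves the outlier on the other side in place.
DF_NO = filter(:B => b -> lower < b < upper, DF)
```

With this data the rows holding 2 and 900 in B should be dropped, which is why filtering on only one bound at a time (as in the original question) leaves one outlier behind in each result. Calling `describe(DF, :std)` and `describe(DF_NO, :std)` gives a quick before and after comparison.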

Fredrik Bagge Carlson (Mar 18 2021 at 05:30):

Have a look at
https://github.com/jbytecode/LinRegOutliers

QuBit (Mar 25 2021 at 00:10):

Fredrik Bagge Carlson

Could you provide an example of how this could
be applied to the question I posted?

Thank you,


Last updated: Nov 06 2024 at 04:40 UTC