Hello Everyone,
using GLM, Lathe, MLBase
I have a dataframe similar to:
DF.A = [1,2,3,4,5,6,7,8]
DF.B = [2,45,53,42,51,34,900,55,36]
I would like to remove the outliers from
the DF.B and am using (for 1st quartile):
first_percentile = percentile(DF.B, 25)
iqr_value = iqr(DF.B)
FirstOut = DF[DF.B .> (first_percentile - 1.5*iqr_value),:]
Similarly, I am using (for the 4th quartile):
fourth_percentile = percentile(DF.B, 75)
iqr_value = iqr(DF.B)
FourthOut = DF[DF.B .< (fourth_percentile + 1.5*iqr_value),:]
For both dataframes, these approaches have not removed the
outliers. Could someone explain how I could adjust my code to
produce a resultant dataframe without outliers in the specified
column (and the associated rows)?
Thanks,
qu bit
Use filter together with percentile and iqr to keep only the
rows whose B value lies inside both IQR bounds:
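begin
    # 25th and 75th percentiles of column B (percentile and iqr are StatsBase functions)
    first_perc = percentile(DF.B, 25)
    last_perc = percentile(DF.B, 75)
    IQR_value = iqr(DF.B)
    # x is a single row, so plain scalar comparisons are enough (no broadcasting needed)
    DF_NO = filter(x -> first_perc - 1.5*IQR_value < x.B < last_perc + 1.5*IQR_value, DF)
end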
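For reference, here is a minimal self-contained sketch of the same idea using a boolean mask instead of filter. It assumes DataFrames and StatsBase are installed, and it assumes column A runs from 1 to 9 so that both columns have the same length (the A posted in the question has only 8 entries). The names lower, upper and keep are only illustrative. On this data each one-sided filter from the question drops one outlier but leaves the other (FirstOut still contains 900, FourthOut still contains 2), which is why both bounds are combined here.

```julia
using DataFrames, StatsBase

# Assumption: A is padded to 9 values to match the length of B.
DF = DataFrame(A = 1:9, B = [2, 45, 53, 42, 51, 34, 900, 55, 36])

q1 = percentile(DF.B, 25)          # first quartile of B
q3 = percentile(DF.B, 75)          # third quartile of B
iqr_value = iqr(DF.B)              # interquartile range, q3 - q1

lower = q1 - 1.5 * iqr_value       # lower bound
upper = q3 + 1.5 * iqr_value       # upper bound

# Combine both bounds in a single element-wise mask,
# then keep only the rows where the mask is true.
keep = (DF.B .> lower) .& (DF.B .< upper)
DF_NO = DF[keep, :]
```

With this data the rows containing 2 and 900 are dropped and the other seven rows are kept.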
You can then compare the :std statistic from describe() before
and after filtering to see how much the spread of the column
has changed.
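A rough sketch of that check, again assuming DataFrames and StatsBase are available and that A is padded to 9 values so the columns line up (both are assumptions, not part of the original post). The names std_before and std_after are only illustrative.

```julia
using DataFrames, StatsBase, Statistics

# Assumption: A is padded to 9 values to match the length of B.
DF = DataFrame(A = 1:9, B = [2, 45, 53, 42, 51, 34, 900, 55, 36])

lower = percentile(DF.B, 25) - 1.5 * iqr(DF.B)
upper = percentile(DF.B, 75) + 1.5 * iqr(DF.B)
DF_NO = filter(row -> lower < row.B < upper, DF)

# describe(df, :std) returns one row per column with its standard deviation.
println(describe(DF, :std))
println(describe(DF_NO, :std))

# The same comparison for column B only, as plain numbers.
std_before = std(DF.B)
std_after = std(DF_NO.B)
```

If the filtering removed extreme values, std_after should be noticeably smaller than std_before.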
Have a look at
https://github.com/jbytecode/LinRegOutliers
Fredrik Bagge Carlson
Could you provide an example of how this could
be applied to the question I posted?
Thank you,
Last updated: Nov 06 2024 at 04:40 UTC