Outlier Removal via @filter · helpdesk (published)

qu bit (Mar 25 2021 at 01:25):

(deleted)

Andrey Oskin (Mar 25 2021 at 04:32):

Not quite sure what you want, but it looks like a typical use case for antijoin

Andrey Oskin (Mar 25 2021 at 04:33):

https://dataframes.juliadata.org/stable/man/joins/#Database-Style-Joins

Andrey Oskin (Mar 25 2021 at 04:34):

But you need a key to join them of course
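
For readers following along, here is a minimal sketch of the antijoin idea with a single key column. The frames `all_rows` and `outliers` and the `id` column are hypothetical and do not come from this thread; the sketch only assumes DataFrames.jl is installed.

```julia
using DataFrames

# Hypothetical data: `id` is an assumed key column, not taken from the thread.
all_rows = DataFrame(id = 1:5, value = [10.0, 11.5, 98.0, 9.7, 120.0])
outliers = DataFrame(id = [3, 5])

# antijoin keeps the rows of the first frame whose key has no match in the second.
cleaned = antijoin(all_rows, outliers; on = :id)
```

The result is an ordinary DataFrame holding only the rows whose `id` is absent from `outliers`.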

qu bit (Mar 25 2021 at 10:21):

Andrey Oskin

Thank you - it was successful.

I was able to identify outliers from a regression
curve I generated for Cook's Distance. I made a
dataframe to show the outlier values across four
variables.

TeddyForce = DataFrame(
    Panda  = [20, 30, 40],
    Wombat = [100, 120, 140],
    Zebra  = [35, 66, 78],
    Lilo   = [12, 16, 19],
)

These values (treated as sets) came from a
dataframe, WookieDF, and were flagged as
outliers for it.

The goal was to remove the TeddyForce values
(and associated rows) from WookieDF.

I was experimenting with:

Output = [x for x ∈ eachrow(TeddyForce[!,[:Panda,:Wombat,:Zebra,:Lilo]]) if x ∉ eachrow(WookieDF)]

But the execution time was very long (more than 100 seconds), and the output was a
vector of DataFrames.DataFrameRow{DataFrames.DataFrame,DataFrames.Index} elements rather than a DataFrame.
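
A sketch of how the antijoin route might look for these two frames, using all four columns as a composite key. The contents of WookieDF are not shown in the thread, so the extra rows below are made up for illustration, and `key_cols` is a name introduced here. Two things are worth noting about the comprehension: `x ∉ eachrow(WookieDF)` most likely scans WookieDF row by row for every candidate, which would explain the long run time, and as written it collects rows of TeddyForce that are missing from WookieDF, which is the reverse of the stated goal.

```julia
using DataFrames

# Outlier rows as listed in the thread.
TeddyForce = DataFrame(
    Panda  = [20, 30, 40],
    Wombat = [100, 120, 140],
    Zebra  = [35, 66, 78],
    Lilo   = [12, 16, 19],
)

# Stand-in for WookieDF (assumption): the outlier rows plus two invented rows.
WookieDF = vcat(
    TeddyForce,
    DataFrame(Panda = [21, 33], Wombat = [101, 125], Zebra = [36, 70], Lilo = [13, 17]),
)

# Treat the four columns together as the join key.
key_cols = [:Panda, :Wombat, :Zebra, :Lilo]

# Rows of WookieDF that do not appear in TeddyForce, returned as a DataFrame.
Output = antijoin(WookieDF, TeddyForce; on = key_cols)
```

Because the join matches on the full combination of the four values, a row is dropped only when all four agree with an outlier row, and the result is a DataFrame rather than a vector of DataFrameRow objects.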

Last updated: Oct 02 2023 at 04:34 UTC