If I plot a histogram of an array, like the one below, what is the easiest way to remove values in that array that correspond to bins below a certain threshold? For example, once I look at the histogram, it's pretty easy to see that something like -160 is a good threshold for making the distribution look more normal, but I would like to do this automatically, without using a plot:
array_new = array[array .> -160]
Maybe you want to find the best-fitting Gaussian, and plot around the fitted mean and the fitted variance?
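One way to read the Gaussian suggestion, as a minimal sketch: treat the sample mean and standard deviation as the fitted parameters and keep only the values within `k` standard deviations of the mean. The helper name `keep_within_sigma`, the default `k = 2`, and the toy data are assumptions for illustration, not something from the thread. Strong outliers also pull the mean and inflate the standard deviation, so the cutoff may need tuning.

```julia
using Statistics

# Hypothetical helper: keep values within k standard deviations of the mean.
function keep_within_sigma(x::AbstractVector{<:Real}; k::Real = 2)
    mu = mean(x)
    sigma = std(x; corrected = false)  # maximum-likelihood estimate for a Gaussian
    return x[abs.(x .- mu) .<= k * sigma]
end

# Toy data: a well-behaved bulk plus two far-away values.
data = vcat(collect(-1.0:0.1:1.0), [-160.0, -170.0])
cleaned = keep_within_sigma(data)
```

Here the two values near -160 fall outside the 2-sigma band and are dropped, while the bulk between -1 and 1 is kept.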
Or an easier and cruder way might be to just throw out the lowest 10% of the data (and maybe also the top 1%?):
using Statistics
using Plots
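data = vcat(randn(1000), -10.0 .+ randn(100))  # 1000 standard-normal samples plus 100 outliers near -10
left = quantile(data, 0.1)    # non-mutating; quantile! would reorder data in place
right = quantile(data, 0.99)
histogram(data[left .< data .< right])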
Or for that matter you could estimate the mean and standard deviation of the Gaussian from more quantiles:
approx_mean = quantile(data, 0.5)  # the median
approx_stdev = (quantile(data, 0.8413) - quantile(data, 0.1587)) / 2  # half-width of the middle ~68%, about one standard deviation for a Gaussian
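To do the removal automatically without a plot, the quantile-based estimates can be turned into a threshold. This is only a sketch under stated assumptions: the helper name `robust_trim`, the default `k = 3`, and the toy data are hypothetical. It uses the median as the centre and half the distance between the 15.87% and 84.13% quantiles as the spread, which for a Gaussian equals one standard deviation.

```julia
using Statistics

# Hypothetical helper: estimate centre and spread from quantiles, then
# keep values within k estimated standard deviations of the median.
function robust_trim(x::AbstractVector{<:Real}; k::Real = 3)
    center = median(x)
    lo, hi = quantile(x, [0.1587, 0.8413])  # about -1 and +1 sigma for a Gaussian
    spread = (hi - lo) / 2
    return x[abs.(x .- center) .<= k * spread]
end

# Toy data: a well-behaved bulk plus two far-away values.
data = vcat(collect(-1.0:0.1:1.0), [-160.0, -170.0])
trimmed = robust_trim(data)
```

Because the median and the inner quantiles barely move when a few extreme values are present, this cutoff is less sensitive to the outliers than one built from the plain mean and standard deviation.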
Thank you! That works great
Dale Black has marked this topic as resolved.
Last updated: Nov 06 2024 at 04:40 UTC