Stream: helpdesk (published)

Topic: Remove histogram bins under certain quantity


Dale Black (Sep 10 2021 at 20:57):

If I plot a histogram of an array, like the one below, what is the easiest way to remove the values in that array that correspond to bins below a certain threshold? For example, once I look at the histogram, it's pretty easy to see that something like -160 is a good threshold for turning it into a more normal distribution, but I would like to do this automatically, without using a plot.

image.png

array_new = array[array .> -160]

image.png

Robbie Rosati (Sep 10 2021 at 23:31):

Maybe you want to find the best-fitting Gaussian and plot around the fitted mean and the fitted variance?
Or an easier and cruder way might be to just throw out the lowest 10% of the data (and maybe also the top 1%?):

using Statistics
using Plots

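# 1000 standard-normal samples plus a 100-point cluster centered at -10
data = vcat(randn(1000), -10.0 .+ randn(100))

# quantile (unlike quantile!) leaves data unmodified
left = quantile(data, 0.1)
right = quantile(data, 0.99)

# keep only the values strictly between the two cutoffs
histogram(data[left .< data .< right])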
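
For the literal version of the question (dropping values that fall in sparsely populated bins), a minimal sketch might look like the following. The helper name `drop_sparse_bins` and its `nbins` and `min_count` parameters are made up for illustration, and equal-width bins over the data range are an assumption; a plotting package may pick its bin edges differently.

```julia
using Random

# Hypothetical helper: drop values whose equal-width histogram bin
# holds fewer than `min_count` points.
function drop_sparse_bins(data::AbstractVector{<:Real}; nbins::Integer=30, min_count::Integer=5)
    lo, hi = extrema(data)
    lo == hi && return collect(data)   # all values identical: nothing to bin
    width = (hi - lo) / nbins
    # Bin index in 1:nbins for each value; the maximum is clamped into the last bin.
    idx = [clamp(floor(Int, (x - lo) / width) + 1, 1, nbins) for x in data]
    # Count how many values land in each bin.
    counts = zeros(Int, nbins)
    for i in idx
        counts[i] += 1
    end
    # Keep only the values whose bin is populated enough.
    return data[counts[idx] .>= min_count]
end

Random.seed!(1)
data = vcat(randn(1000), -10.0 .+ randn(100))
trimmed = drop_sparse_bins(data; nbins=30, min_count=20)
```

Note that this only removes thinly populated bins. A secondary cluster that is dense enough to exceed `min_count` in some bin would survive, so the result depends on both `nbins` and `min_count`.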

Robbie Rosati (Sep 10 2021 at 23:36):

Or, for that matter, you could estimate the mean and standard deviation of the Gaussian from a few more quantiles:

approx_mean = quantile(data, 0.5)   # the median
approx_stdev = (quantile(data, 0.841) - quantile(data, 0.159)) / 2   # middle ~68% spans about ±1 sigma
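
To turn quantile-based estimates like these into an automatic cutoff, one option is to keep only the values within a few estimated standard deviations of the estimated mean. The sketch below assumes the bulk of the data is roughly Gaussian, takes the median as the mean estimate, and takes half the distance between the ~15.9% and ~84.1% quantiles as the standard deviation estimate. The cutoff multiplier `k = 3` and the name `kept` are arbitrary illustrative choices, not something from the thread.

```julia
using Statistics, Random

Random.seed!(1)
# Same kind of test data as above: a main Gaussian plus a small cluster near -10.
data = vcat(randn(1000), -10.0 .+ randn(100))

# Median as a robust estimate of the mean.
approx_mean = quantile(data, 0.5)
# For a Gaussian, the 15.9% and 84.1% quantiles sit about one sigma either side of the mean.
approx_stdev = (quantile(data, 0.841) - quantile(data, 0.159)) / 2

# Keep values within k estimated standard deviations of the estimated mean.
k = 3
kept = data[abs.(data .- approx_mean) .<= k * approx_stdev]
```

Because the contaminating cluster shifts the quantiles somewhat, the estimates are biased, so `k` may need tuning for heavily contaminated data.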

Last updated: Oct 02 2023 at 04:34 UTC