How can you use a calculation to decide whether a data point is an outlier in a data set?
The exact definition of which points are considered to be
outliers is up to the experimenters.
A simple way to define an outlier is by using the lower quartile (LQ),
the upper quartile (UQ) and the interquartile range (IQR = UQ - LQ);
for example:
Define two pairs of boundaries, b1 and b2, with one value of each pair at each end of the data:
b1: LQ - 1.5 × IQR (lower end) and UQ + 1.5 × IQR (upper end)
b2: LQ - 3 × IQR (lower end) and UQ + 3 × IQR (upper end)
If a data point lies between b1 and b2 (at either end of the data) it
can be defined as a mild outlier.
If a data point lies beyond b2 it can be defined as an extreme
outlier.
The multipliers of the IQR for the boundaries, and the number of
boundaries, can be adjusted to suit whatever definitions are
required or make sense for the data.
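
The rule above can be sketched in Python as follows. This is only an illustration, not a standard implementation: the function name classify_outliers, the parameter names mild and extreme, and the sample data are made up for this example. It also assumes quartiles are computed with the "inclusive" method of Python's statistics.quantiles (other quartile conventions give slightly different boundaries), and that a point lying exactly on a boundary is not counted as beyond it.

```python
import statistics


def classify_outliers(data, mild=1.5, extreme=3.0):
    """Label each point as 'normal', 'mild' or 'extreme'.

    mild and extreme are the IQR multipliers for b1 and b2;
    they are hypothetical parameter names chosen for this sketch.
    """
    # Assumption: quartiles from the "inclusive" method; conventions vary.
    lq, _median, uq = statistics.quantiles(data, n=4, method="inclusive")
    iqr = uq - lq

    # b1 and b2 each have a lower-end and an upper-end value.
    b1_low, b1_high = lq - mild * iqr, uq + mild * iqr
    b2_low, b2_high = lq - extreme * iqr, uq + extreme * iqr

    labelled = []
    for x in data:
        if x < b2_low or x > b2_high:
            label = "extreme"   # beyond b2
        elif x < b1_low or x > b1_high:
            label = "mild"      # between b1 and b2
        else:
            label = "normal"    # inside b1 (boundary values included)
        labelled.append((x, label))
    return labelled


# Made-up sample data, purely for illustration.
sample = [10, 12, 13, 14, 15, 16, 18, 30, 60]
for value, label in classify_outliers(sample):
    print(value, label)
```

For this sample, LQ = 13 and UQ = 18, so IQR = 5, b1 is 5.5 and 25.5, and b2 is -2 and 33. The value 30 is therefore a mild outlier and 60 is an extreme outlier. Passing different values for mild and extreme changes the multipliers as described above.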