1 | \secA{Robust statistics for a Normal distribution} |
---|
2 | \label{app-madfm} |
---|
3 | |
---|
4 | The Normal, or Gaussian, distribution for mean $\mu$ and standard |
---|
5 | deviation $\sigma$ can be written as |
---|
6 | \[ |
---|
7 | f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}. |
---|
8 | \] |
---|
9 | |
---|
10 | When one has a purely Gaussian signal, it is straightforward to |
---|
11 | estimate $\sigma$ by calculating the standard deviation (or rms) of |
---|
12 | the data. However, if there is a small amount of signal present on top |
---|
13 | of Gaussian noise, and one wants to estimate the $\sigma$ for the |
---|
14 | noise, the presence of the large values from the signal can bias the |
---|
15 | estimator to higher values. |
---|
16 | |
---|
17 | An alternative way is to use the median ($m$) and median absolute |
---|
18 | deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The |
---|
19 | median is the middle of the distribution, defined for a continuous |
---|
20 | distribution by |
---|
21 | \[ |
---|
22 | \int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x. |
---|
23 | \] |
---|
24 | From symmetry, we quickly see that for the continuous Normal |
---|
25 | distribution, $m=\mu$. We consider the case henceforth of $\mu=0$, |
---|
26 | without loss of generality. |
---|
27 | |
---|
28 | To find $s$, we find the distribution of the absolute deviation from |
---|
29 | the median, and then find the median of that distribution. This |
---|
30 | distribution is given by |
---|
31 | \begin{eqnarray*} |
---|
32 | g(x) &= &{\mbox{\rm distribution of }} |x|\\ |
---|
33 | &= &f(x) + f(-x),\ x\ge0\\ |
---|
34 | &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0. |
---|
35 | \end{eqnarray*} |
---|
36 | So, the median absolute deviation from the median, $s$, is given by |
---|
37 | \[ |
---|
38 | \int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x. |
---|
39 | \] |
---|
40 | Now, |
---|
41 | $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, |
---|
42 | and so |
---|
43 | $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x = |
---|
44 | \sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}}\diff x$. |
---|
45 | Hence, to find $s$ we simply solve the following equation (setting |
---|
46 | $\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in |
---|
47 | units of $\sigma$): |
---|
48 | \[ |
---|
49 | \int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0. |
---|
50 | \] |
---|
51 | This is hard to solve analytically (no nice analytic solution exists |
---|
52 | for the finite integral that I'm aware of), but straightforward to |
---|
53 | solve numerically, yielding the value of $s=0.6744888$. Thus, to |
---|
54 | estimate $\sigma$ for a Normally distributed data set, one can |
---|
55 | calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to |
---|
56 | obtain the correct estimator. |
---|
57 | |
---|
58 | Note that this is different to solutions quoted elsewhere, |
---|
59 | specifically in \citet{meyer04:trunc}, where the same robust estimator |
---|
60 | is used but with an incorrect conversion to standard deviation -- they |
---|
61 | assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used |
---|
62 | to convert the \emph{mean} absolute deviation from the mean to the |
---|
63 | standard deviation. This means that the cube noise in the \hipass\ |
---|
64 | catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger |
---|
65 | than quoted. |
---|