[158] | 1 | \secA{Robust statistics for a Normal distribution} |
---|
| 2 | \label{app-madfm} |
---|
| 3 | |
---|
| 4 | The Normal, or Gaussian, distribution for mean $\mu$ and standard |
---|
| 5 | deviation $\sigma$ can be written as |
---|
| 6 | \[ |
---|
| 7 | f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}. |
---|
| 8 | \] |
---|
| 9 | |
---|
| 10 | When one has a purely Gaussian signal, it is straightforward to |
---|
| 11 | estimate $\sigma$ by calculating the standard deviation (or rms) of |
---|
| 12 | the data. However, if there is a small amount of signal present on top |
---|
| 13 | of Gaussian noise, and one wants to estimate the $\sigma$ for the |
---|
| 14 | noise, the presence of the large values from the signal can bias the |
---|
| 15 | estimator to higher values. |
---|
| 16 | |
---|
| 17 | An alternative way is to use the median ($m$) and median absolute |
---|
| 18 | deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The |
---|
| 19 | median is the middle of the distribution, defined for a continuous |
---|
| 20 | distribution by |
---|
| 21 | \[ |
---|
[231] | 22 | \int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx. |
---|
[158] | 23 | \] |
---|
| 24 | From symmetry, we quickly see that for the continuous Normal |
---|
| 25 | distribution, $m=\mu$. We consider the case henceforth of $\mu=0$, |
---|
| 26 | without loss of generality. |
---|
| 27 | |
---|
| 28 | To find $s$, we find the distribution of the absolute deviation from |
---|
| 29 | the median, and then find the median of that distribution. This |
---|
| 30 | distribution is given by |
---|
| 31 | \begin{eqnarray*} |
---|
[231] | 32 | g(x) &= &{\text{distribution of }} |x|\\ |
---|
[158] | 33 | &= &f(x) + f(-x),\ x\ge0\\ |
---|
| 34 | &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0. |
---|
| 35 | \end{eqnarray*} |
---|
| 36 | So, the median absolute deviation from the median, $s$, is given by |
---|
| 37 | \[ |
---|
[231] | 38 | \int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx. |
---|
[158] | 39 | \] |
---|
[231] | 40 | If we use the identity |
---|
| 41 | \[ |
---|
| 42 | \int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2} |
---|
| 43 | \] |
---|
| 44 | we find that |
---|
| 45 | \[ |
---|
| 46 | \int_{s}^{\infty} e^{-x^2/2\sigma^2} dx = |
---|
| 47 | \sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx. |
---|
| 48 | \] |
---|
[158] | 49 | Hence, to find $s$ we simply solve the following equation (setting |
---|
| 50 | $\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in |
---|
| 51 | units of $\sigma$): |
---|
| 52 | \[ |
---|
[231] | 53 | \int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0. |
---|
[158] | 54 | \] |
---|
| 55 | This is hard to solve analytically (no nice analytic solution exists |
---|
| 56 | for the finite integral that I'm aware of), but straightforward to |
---|
| 57 | solve numerically, yielding the value of $s=0.6744888$. Thus, to |
---|
| 58 | estimate $\sigma$ for a Normally distributed data set, one can |
---|
| 59 | calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to |
---|
| 60 | obtain the correct estimator. |
---|
| 61 | |
---|
| 62 | Note that this is different to solutions quoted elsewhere, |
---|
| 63 | specifically in \citet{meyer04:trunc}, where the same robust estimator |
---|
| 64 | is used but with an incorrect conversion to standard deviation -- they |
---|
| 65 | assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used |
---|
| 66 | to convert the \emph{mean} absolute deviation from the mean to the |
---|
[258] | 67 | standard deviation. This means that the cube noise in the \hipass |
---|
[158] | 68 | catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger |
---|
| 69 | than quoted. |
---|