1 | % ----------------------------------------------------------------------- |
---|
2 | % app-stats.tex: Section on how the robust statistics are calculated |
---|
3 | % for a Normal distribution. |
---|
4 | % ----------------------------------------------------------------------- |
---|
5 | % Copyright (C) 2006, Matthew Whiting, ATNF |
---|
6 | % |
---|
7 | % This program is free software; you can redistribute it and/or modify it |
---|
8 | % under the terms of the GNU General Public License as published by the |
---|
9 | % Free Software Foundation; either version 2 of the License, or (at your |
---|
10 | % option) any later version. |
---|
11 | % |
---|
12 | % Duchamp is distributed in the hope that it will be useful, but WITHOUT |
---|
13 | % ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
---|
14 | % FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
---|
15 | % for more details. |
---|
16 | % |
---|
17 | % You should have received a copy of the GNU General Public License |
---|
18 | % along with Duchamp; if not, write to the Free Software Foundation, |
---|
19 | % Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA |
---|
20 | % |
---|
21 | % Correspondence concerning Duchamp may be directed to: |
---|
22 | % Internet email: Matthew.Whiting [at] atnf.csiro.au |
---|
23 | % Postal address: Dr. Matthew Whiting |
---|
24 | % Australia Telescope National Facility, CSIRO |
---|
25 | % PO Box 76 |
---|
26 | % Epping NSW 1710 |
---|
27 | % AUSTRALIA |
---|
28 | % ----------------------------------------------------------------------- |
---|
29 | \secA{Robust statistics for a Normal distribution} |
---|
30 | \label{app-madfm} |
---|
31 | |
---|
32 | The Normal, or Gaussian, distribution for mean $\mu$ and standard |
---|
33 | deviation $\sigma$ can be written as |
---|
34 | \[ |
---|
35 | f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}. |
---|
36 | \] |
---|
37 | |
---|
38 | When one has a purely Gaussian signal, it is straightforward to |
---|
39 | estimate $\sigma$ by calculating the standard deviation (or rms) of |
---|
40 | the data. However, if there is a small amount of signal present on top |
---|
41 | of Gaussian noise, and one wants to estimate the $\sigma$ for the |
---|
42 | noise, the presence of the large values from the signal can bias the |
---|
43 | estimator to higher values. |
---|
44 | |
---|
45 | An alternative way is to use the median ($m$) and median absolute |
---|
46 | deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The |
---|
47 | median is the middle of the distribution, defined for a continuous |
---|
48 | distribution by |
---|
49 | \[ |
---|
50 | \int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx. |
---|
51 | \] |
---|
52 | From symmetry, we quickly see that for the continuous Normal |
---|
53 | distribution, $m=\mu$. We consider the case henceforth of $\mu=0$, |
---|
54 | without loss of generality. |
---|
55 | |
---|
56 | To find $s$, we find the distribution of the absolute deviation from |
---|
57 | the median, and then find the median of that distribution. This |
---|
58 | distribution is given by |
---|
59 | \begin{eqnarray*} |
---|
60 | g(x) &= &{\text{distribution of }} |x|\\ |
---|
61 | &= &f(x) + f(-x),\ x\ge0\\ |
---|
62 | &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0. |
---|
63 | \end{eqnarray*} |
---|
64 | So, the median absolute deviation from the median, $s$, is given by |
---|
65 | \[ |
---|
66 | \int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx. |
---|
67 | \] |
---|
68 | If we use the identity |
---|
69 | \[ |
---|
70 | \int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2} |
---|
71 | \] |
---|
72 | we find that |
---|
73 | \[ |
---|
74 | \int_{s}^{\infty} e^{-x^2/2\sigma^2} dx = |
---|
75 | \sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx. |
---|
76 | \] |
---|
77 | Hence, to find $s$ we simply solve the following equation (setting |
---|
78 | $\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in |
---|
79 | units of $\sigma$): |
---|
80 | \[ |
---|
81 | \int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0. |
---|
82 | \] |
---|
83 | This is hard to solve analytically (no nice analytic solution exists |
---|
84 | for the finite integral that I'm aware of), but straightforward to |
---|
85 | solve numerically, yielding the value of $s=0.6744888$. Thus, to |
---|
86 | estimate $\sigma$ for a Normally distributed data set, one can |
---|
87 | calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to |
---|
88 | obtain the correct estimator. |
---|
89 | |
---|
90 | Note that this is different to solutions quoted elsewhere, |
---|
91 | specifically in \citet{meyer04-alt}, where the same robust estimator |
---|
92 | is used but with an incorrect conversion to standard deviation -- they |
---|
93 | assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used |
---|
94 | to convert the \emph{mean} absolute deviation from the mean to the |
---|
95 | standard deviation. This means that the cube noise in the \hipass |
---|
96 | catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger |
---|
97 | than quoted. |
---|