source: tags/release-1.6.1/docs/app-stats.tex

Last change on this file was 964, checked in by MatthewWhiting, 12 years ago

Lots of updates to the user guide.

File size: 4.1 KB
Line 
1% -----------------------------------------------------------------------
2% app-stats.tex: Section on how the robust statistics are calculated
3%                for a Normal distribution.
4% -----------------------------------------------------------------------
5% Copyright (C) 2006, Matthew Whiting, ATNF
6%
7% This program is free software; you can redistribute it and/or modify it
8% under the terms of the GNU General Public License as published by the
9% Free Software Foundation; either version 2 of the License, or (at your
10% option) any later version.
11%
12% Duchamp is distributed in the hope that it will be useful, but WITHOUT
13% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
14% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
15% for more details.
16%
17% You should have received a copy of the GNU General Public License
18% along with Duchamp; if not, write to the Free Software Foundation,
19% Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA
20%
21% Correspondence concerning Duchamp may be directed to:
22%    Internet email: Matthew.Whiting [at] atnf.csiro.au
23%    Postal address: Dr. Matthew Whiting
24%                    Australia Telescope National Facility, CSIRO
25%                    PO Box 76
26%                    Epping NSW 1710
27%                    AUSTRALIA
28% -----------------------------------------------------------------------
29\secA{Robust statistics for a Normal distribution}
30\label{app-madfm}
31
32The Normal, or Gaussian, distribution for mean $\mu$ and standard
33deviation $\sigma$ can be written as
34\[
35f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
36 \]
37
38When one has a purely Gaussian signal, it is straightforward to
39estimate $\sigma$ by calculating the standard deviation (or rms) of
40the data. However, if there is a small amount of signal present on top
41of Gaussian noise, and one wants to estimate the $\sigma$ for the
42noise, the presence of the large values from the signal can bias the
43estimator to higher values.
44
45An alternative way is to use the median ($m$) and median absolute
46deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
47median is the middle of the distribution, defined for a continuous
48distribution by
49\[
50\int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx.
51\]
52From symmetry, we quickly see that for the continuous Normal
53distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
54without loss of generality.
55
56To find $s$, we find the distribution of the absolute deviation from
57the median, and then find the median of that distribution. This
58distribution is given by
59\begin{eqnarray*}
60g(x) &= &{\text{distribution of }} |x|\\
61     &= &f(x) + f(-x),\ x\ge0\\
62     &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
63\end{eqnarray*}
64So, the median absolute deviation from the median, $s$, is given by
65\[
66\int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx.
67\]
68If we use the identity
69\[
70\int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2}
71\]
72we find that
73\[
74\int_{s}^{\infty} e^{-x^2/2\sigma^2} dx =
75\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx.
76\]
77Hence, to find $s$ we simply solve the following equation (setting
78$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
79units of $\sigma$):
80\[
81\int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0.
82\]
83This is hard to solve analytically (no nice analytic solution exists
84for the finite integral that I'm aware of), but straightforward to
85solve numerically, yielding the value of $s=0.6744888$. Thus, to
86estimate $\sigma$ for a Normally distributed data set, one can
87calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
88obtain the correct estimator.
89
90Note that this is different to solutions quoted elsewhere,
91specifically in \citet{meyer04-alt}, where the same robust estimator
92is used but with an incorrect conversion to standard deviation -- they
93assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
94to convert the \emph{mean} absolute deviation from the mean to the
95standard deviation. This means that the cube noise in the \hipass
96catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
97than quoted.
Note: See TracBrowser for help on using the repository browser.