Context Navigation

← Previous Revision
Next Revision →
Blame
Revision Log

app-stats.tex

Visit:

Last change on this file was 964, checked in by MatthewWhiting, 12 years ago
Lots of updates to the user guide.
File size: 4.1 KB

Line
1	% -----------------------------------------------------------------------
2	% app-stats.tex: Section on how the robust statistics are calculated
3	% for a Normal distribution.
4	% -----------------------------------------------------------------------
5	% Copyright (C) 2006, Matthew Whiting, ATNF
6	%
7	% This program is free software; you can redistribute it and/or modify it
8	% under the terms of the GNU General Public License as published by the
9	% Free Software Foundation; either version 2 of the License, or (at your
10	% option) any later version.
11	%
12	% Duchamp is distributed in the hope that it will be useful, but WITHOUT
13	% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
14	% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
15	% for more details.
16	%
17	% You should have received a copy of the GNU General Public License
18	% along with Duchamp; if not, write to the Free Software Foundation,
19	% Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA
20	%
21	% Correspondence concerning Duchamp may be directed to:
22	% Internet email: Matthew.Whiting [at] atnf.csiro.au
23	% Postal address: Dr. Matthew Whiting
24	% Australia Telescope National Facility, CSIRO
25	% PO Box 76
26	% Epping NSW 1710
27	% AUSTRALIA
28	% -----------------------------------------------------------------------
29	\secA{Robust statistics for a Normal distribution}
30	\label{app-madfm}
31
32	The Normal, or Gaussian, distribution for mean $\mu$ and standard
33	deviation $\sigma$ can be written as
34	\[
35	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
36	\]
37
38	When one has a purely Gaussian signal, it is straightforward to
39	estimate $\sigma$ by calculating the standard deviation (or rms) of
40	the data. However, if there is a small amount of signal present on top
41	of Gaussian noise, and one wants to estimate the $\sigma$ for the
42	noise, the presence of the large values from the signal can bias the
43	estimator to higher values.
44
45	An alternative way is to use the median ($m$) and median absolute
46	deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
47	median is the middle of the distribution, defined for a continuous
48	distribution by
49	\[
50	\int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx.
51	\]
52	From symmetry, we quickly see that for the continuous Normal
53	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
54	without loss of generality.
55
56	To find $s$, we find the distribution of the absolute deviation from
57	the median, and then find the median of that distribution. This
58	distribution is given by
59	\begin{eqnarray*}
60	g(x) &= &{\text{distribution of }} \|x\|\\
61	&= &f(x) + f(-x),\ x\ge0\\
62	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
63	\end{eqnarray*}
64	So, the median absolute deviation from the median, $s$, is given by
65	\[
66	\int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx.
67	\]
68	If we use the identity
69	\[
70	\int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2}
71	\]
72	we find that
73	\[
74	\int_{s}^{\infty} e^{-x^2/2\sigma^2} dx =
75	\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx.
76	\]
77	Hence, to find $s$ we simply solve the following equation (setting
78	$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
79	units of $\sigma$):
80	\[
81	\int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0.
82	\]
83	This is hard to solve analytically (no nice analytic solution exists
84	for the finite integral that I'm aware of), but straightforward to
85	solve numerically, yielding the value of $s=0.6744888$. Thus, to
86	estimate $\sigma$ for a Normally distributed data set, one can
87	calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
88	obtain the correct estimator.
89
90	Note that this is different to solutions quoted elsewhere,
91	specifically in \citet{meyer04-alt}, where the same robust estimator
92	is used but with an incorrect conversion to standard deviation -- they
93	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
94	to convert the \emph{mean} absolute deviation from the mean to the
95	standard deviation. This means that the cube noise in the \hipass
96	catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
97	than quoted.

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: tags/release-1.6.1/docs/app-stats.tex

Download in other formats: