Context Navigation

source: tags/release-1.0.5/docs/app-stats.tex @ 1077

Visit:

Last change on this file since 1077 was 158, checked in by Matthew Whiting, 18 years ago
Changed the layout of the Guide, so that each section has its own .tex file. This makes it a bit easier to edit. Also changed the layout of the title page, and added some page breaks between sections. Bibliography is now a .bib file, controlled by mn2e.bst. Might look at using a different one that can incorporate hyperlinks...
File size: 2.7 KB

Line
1	\secA{Robust statistics for a Normal distribution}
2	\label{app-madfm}
3
4	The Normal, or Gaussian, distribution for mean $\mu$ and standard
5	deviation $\sigma$ can be written as
6	\[
7	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
8	\]
9
10	When one has a purely Gaussian signal, it is straightforward to
11	estimate $\sigma$ by calculating the standard deviation (or rms) of
12	the data. However, if there is a small amount of signal present on top
13	of Gaussian noise, and one wants to estimate the $\sigma$ for the
14	noise, the presence of the large values from the signal can bias the
15	estimator to higher values.
16
17	An alternative way is to use the median ($m$) and median absolute
18	deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
19	median is the middle of the distribution, defined for a continuous
20	distribution by
21	\[
22	\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
23	\]
24	From symmetry, we quickly see that for the continuous Normal
25	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
26	without loss of generality.
27
28	To find $s$, we find the distribution of the absolute deviation from
29	the median, and then find the median of that distribution. This
30	distribution is given by
31	\begin{eqnarray*}
32	g(x) &= &{\mbox{\rm distribution of }} \|x\|\\
33	&= &f(x) + f(-x),\ x\ge0\\
34	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
35	\end{eqnarray*}
36	So, the median absolute deviation from the median, $s$, is given by
37	\[
38	\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
39	\]
40	Now,
41	$\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$,
42	and so
43	$\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
44	\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}}\diff x$.
45	Hence, to find $s$ we simply solve the following equation (setting
46	$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
47	units of $\sigma$):
48	\[
49	\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
50	\]
51	This is hard to solve analytically (no nice analytic solution exists
52	for the finite integral that I'm aware of), but straightforward to
53	solve numerically, yielding the value of $s=0.6744888$. Thus, to
54	estimate $\sigma$ for a Normally distributed data set, one can
55	calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
56	obtain the correct estimator.
57
58	Note that this is different to solutions quoted elsewhere,
59	specifically in \citet{meyer04:trunc}, where the same robust estimator
60	is used but with an incorrect conversion to standard deviation -- they
61	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
62	to convert the \emph{mean} absolute deviation from the mean to the
63	standard deviation. This means that the cube noise in the \hipass\
64	catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
65	than quoted.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: