source: trunk/docs/app-stats.tex @ 201

Last change on this file since 201 was 158, checked in by Matthew Whiting, 18 years ago

Changed the layout of the Guide, so that each section has its own .tex file.
This makes it a bit easier to edit.
Also changed the layout of the title page, and added some page breaks between
sections.
Bibliography is now a .bib file, controlled by mn2e.bst. Might look at
using a different one that can incorporate hyperlinks...

File size: 2.7 KB
Line 
1\secA{Robust statistics for a Normal distribution}
2\label{app-madfm}
3
4The Normal, or Gaussian, distribution for mean $\mu$ and standard
5deviation $\sigma$ can be written as
6\[
7f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
8 \]
9
10When one has a purely Gaussian signal, it is straightforward to
11estimate $\sigma$ by calculating the standard deviation (or rms) of
12the data. However, if there is a small amount of signal present on top
13of Gaussian noise, and one wants to estimate the $\sigma$ for the
14noise, the presence of the large values from the signal can bias the
15estimator to higher values.
16
17An alternative way is to use the median ($m$) and median absolute
18deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
19median is the middle of the distribution, defined for a continuous
20distribution by
21\[
22\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
23\]
24From symmetry, we quickly see that for the continuous Normal
25distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
26without loss of generality.
27
28To find $s$, we find the distribution of the absolute deviation from
29the median, and then find the median of that distribution. This
30distribution is given by
31\begin{eqnarray*}
32g(x) &= &{\mbox{\rm distribution of }} |x|\\
33     &= &f(x) + f(-x),\ x\ge0\\
34     &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
35\end{eqnarray*}
36So, the median absolute deviation from the median, $s$, is given by
37\[
38\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
39\]
40Now,
41$\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$,
42and so
43$\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
44\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}}\diff x$.
45Hence, to find $s$ we simply solve the following equation (setting
46$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
47units of $\sigma$):
48\[
49\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
50\]
51This is hard to solve analytically (no nice analytic solution exists
52for the finite integral that I'm aware of), but straightforward to
53solve numerically, yielding the value of $s=0.6744888$. Thus, to
54estimate $\sigma$ for a Normally distributed data set, one can
55calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
56obtain the correct estimator.
57
58Note that this is different to solutions quoted elsewhere,
59specifically in \citet{meyer04:trunc}, where the same robust estimator
60is used but with an incorrect conversion to standard deviation -- they
61assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
62to convert the \emph{mean} absolute deviation from the mean to the
63standard deviation. This means that the cube noise in the \hipass\
64catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
65than quoted.
Note: See TracBrowser for help on using the repository browser.