source: trunk/docs/app-stats.tex @ 258

Last change on this file since 258 was 258, checked in by Matthew Whiting, 17 years ago

Merging pixel-map-branch revisions 236:257 back into trunk.
The use of the PixelMap? functions is sorted now, so we put everything back into a uniform location.

File size: 2.7 KB
Line 
1\secA{Robust statistics for a Normal distribution}
2\label{app-madfm}
3
4The Normal, or Gaussian, distribution for mean $\mu$ and standard
5deviation $\sigma$ can be written as
6\[
7f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
8 \]
9
10When one has a purely Gaussian signal, it is straightforward to
11estimate $\sigma$ by calculating the standard deviation (or rms) of
12the data. However, if there is a small amount of signal present on top
13of Gaussian noise, and one wants to estimate the $\sigma$ for the
14noise, the presence of the large values from the signal can bias the
15estimator to higher values.
16
17An alternative way is to use the median ($m$) and median absolute
18deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
19median is the middle of the distribution, defined for a continuous
20distribution by
21\[
22\int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx.
23\]
24From symmetry, we quickly see that for the continuous Normal
25distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
26without loss of generality.
27
28To find $s$, we find the distribution of the absolute deviation from
29the median, and then find the median of that distribution. This
30distribution is given by
31\begin{eqnarray*}
32g(x) &= &{\text{distribution of }} |x|\\
33     &= &f(x) + f(-x),\ x\ge0\\
34     &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
35\end{eqnarray*}
36So, the median absolute deviation from the median, $s$, is given by
37\[
38\int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx.
39\]
40If we use the identity
41\[
42\int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2}
43\]
44we find that
45\[
46\int_{s}^{\infty} e^{-x^2/2\sigma^2} dx =
47\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx.
48\]
49Hence, to find $s$ we simply solve the following equation (setting
50$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
51units of $\sigma$):
52\[
53\int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0.
54\]
55This is hard to solve analytically (no nice analytic solution exists
56for the finite integral that I'm aware of), but straightforward to
57solve numerically, yielding the value of $s=0.6744888$. Thus, to
58estimate $\sigma$ for a Normally distributed data set, one can
59calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
60obtain the correct estimator.
61
62Note that this is different to solutions quoted elsewhere,
63specifically in \citet{meyer04:trunc}, where the same robust estimator
64is used but with an incorrect conversion to standard deviation -- they
65assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
66to convert the \emph{mean} absolute deviation from the mean to the
67standard deviation. This means that the cube noise in the \hipass
68catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
69than quoted.
Note: See TracBrowser for help on using the repository browser.