source: trunk/docs/app-stats.tex @ 231

Last change on this file since 231 was 231, checked in by Matthew Whiting, 17 years ago

Mostly improvements to the Guide, with better formatting and descriptions of one of the changes below. The changes to the code are:

  • Addition of the option of specifying in the subsection string a number of pixels to be chopped from either end of an axis (as requested in #5).
  • Addition of a test in the reconstruction functions to see whether the number of non-BLANK pixels is zero. If so, the input array is returned as the output without having to do the reconstruction (and avoiding problems with the stats functions).
File size: 2.7 KB
Line 
1\secA{Robust statistics for a Normal distribution}
2\label{app-madfm}
3
4The Normal, or Gaussian, distribution for mean $\mu$ and standard
5deviation $\sigma$ can be written as
6\[
7f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
8 \]
9
10When one has a purely Gaussian signal, it is straightforward to
11estimate $\sigma$ by calculating the standard deviation (or rms) of
12the data. However, if there is a small amount of signal present on top
13of Gaussian noise, and one wants to estimate the $\sigma$ for the
14noise, the presence of the large values from the signal can bias the
15estimator to higher values.
16
17An alternative way is to use the median ($m$) and median absolute
18deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
19median is the middle of the distribution, defined for a continuous
20distribution by
21\[
22\int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx.
23\]
24From symmetry, we quickly see that for the continuous Normal
25distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
26without loss of generality.
27
28To find $s$, we find the distribution of the absolute deviation from
29the median, and then find the median of that distribution. This
30distribution is given by
31\begin{eqnarray*}
32g(x) &= &{\text{distribution of }} |x|\\
33     &= &f(x) + f(-x),\ x\ge0\\
34     &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
35\end{eqnarray*}
36So, the median absolute deviation from the median, $s$, is given by
37\[
38\int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx.
39\]
40If we use the identity
41\[
42\int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2}
43\]
44we find that
45\[
46\int_{s}^{\infty} e^{-x^2/2\sigma^2} dx =
47\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx.
48\]
49Hence, to find $s$ we simply solve the following equation (setting
50$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
51units of $\sigma$):
52\[
53\int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0.
54\]
55This is hard to solve analytically (no nice analytic solution exists
56for the finite integral that I'm aware of), but straightforward to
57solve numerically, yielding the value of $s=0.6744888$. Thus, to
58estimate $\sigma$ for a Normally distributed data set, one can
59calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
60obtain the correct estimator.
61
62Note that this is different to solutions quoted elsewhere,
63specifically in \citet{meyer04:trunc}, where the same robust estimator
64is used but with an incorrect conversion to standard deviation -- they
65assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
66to convert the \emph{mean} absolute deviation from the mean to the
67standard deviation. This means that the cube noise in the \hipass\
68catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
69than quoted.
Note: See TracBrowser for help on using the repository browser.