Context Navigation

source: trunk/docs/app-stats.tex @ 231

Visit:

Last change on this file since 231 was 231, checked in by Matthew Whiting, 17 years ago

Mostly improvements to the Guide, with better formatting and descriptions of one of the changes below. The changes to the code are:

Addition of the option of specifying in the subsection string a number of pixels to be chopped from either end of an axis (as requested in #5).
Addition of a test in the reconstruction functions to see whether the number of non-BLANK pixels is zero. If so, the input array is returned as the output without having to do the reconstruction (and avoiding problems with the stats functions).

File size: 2.7 KB

Line
1	\secA{Robust statistics for a Normal distribution}
2	\label{app-madfm}
3
4	The Normal, or Gaussian, distribution for mean $\mu$ and standard
5	deviation $\sigma$ can be written as
6	\[
7	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
8	\]
9
10	When one has a purely Gaussian signal, it is straightforward to
11	estimate $\sigma$ by calculating the standard deviation (or rms) of
12	the data. However, if there is a small amount of signal present on top
13	of Gaussian noise, and one wants to estimate the $\sigma$ for the
14	noise, the presence of the large values from the signal can bias the
15	estimator to higher values.
16
17	An alternative way is to use the median ($m$) and median absolute
18	deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
19	median is the middle of the distribution, defined for a continuous
20	distribution by
21	\[
22	\int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx.
23	\]
24	From symmetry, we quickly see that for the continuous Normal
25	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
26	without loss of generality.
27
28	To find $s$, we find the distribution of the absolute deviation from
29	the median, and then find the median of that distribution. This
30	distribution is given by
31	\begin{eqnarray*}
32	g(x) &= &{\text{distribution of }} \|x\|\\
33	&= &f(x) + f(-x),\ x\ge0\\
34	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
35	\end{eqnarray*}
36	So, the median absolute deviation from the median, $s$, is given by
37	\[
38	\int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx.
39	\]
40	If we use the identity
41	\[
42	\int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2}
43	\]
44	we find that
45	\[
46	\int_{s}^{\infty} e^{-x^2/2\sigma^2} dx =
47	\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx.
48	\]
49	Hence, to find $s$ we simply solve the following equation (setting
50	$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
51	units of $\sigma$):
52	\[
53	\int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0.
54	\]
55	This is hard to solve analytically (no nice analytic solution exists
56	for the finite integral that I'm aware of), but straightforward to
57	solve numerically, yielding the value of $s=0.6744888$. Thus, to
58	estimate $\sigma$ for a Normally distributed data set, one can
59	calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
60	obtain the correct estimator.
61
62	Note that this is different to solutions quoted elsewhere,
63	specifically in \citet{meyer04:trunc}, where the same robust estimator
64	is used but with an incorrect conversion to standard deviation -- they
65	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
66	to convert the \emph{mean} absolute deviation from the mean to the
67	standard deviation. This means that the cube noise in the \hipass\
68	catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
69	than quoted.

Note: See TracBrowser for help on using the repository browser.

Download in other formats: