% -----------------------------------------------------------------------
% app-stats.tex: Section on how the robust statistics are calculated
%                for a Normal distribution.
% -----------------------------------------------------------------------
% Copyright (C) 2006, Matthew Whiting, ATNF
%
% This program is free software; you can redistribute it and/or modify it
% under the terms of the GNU General Public License as published by the
% Free Software Foundation; either version 2 of the License, or (at your
% option) any later version.
%
% Duchamp is distributed in the hope that it will be useful, but WITHOUT
% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
% for more details.
%
% You should have received a copy of the GNU General Public License
% along with Duchamp; if not, write to the Free Software Foundation,
% Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA
%
% Correspondence concerning Duchamp may be directed to:
%    Internet email: Matthew.Whiting [at] atnf.csiro.au
%    Postal address: Dr. Matthew Whiting
%                    Australia Telescope National Facility, CSIRO
%                    PO Box 76
%                    Epping NSW 1710
%                    AUSTRALIA
% -----------------------------------------------------------------------
\secA{Robust statistics for a Normal distribution}
\label{app-madfm}

The Normal, or Gaussian, distribution for mean $\mu$ and standard
deviation $\sigma$ can be written as 
\[ 
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
 \]

When one has a purely Gaussian signal, it is straightforward to
estimate $\sigma$ by calculating the standard deviation (or rms) of
the data. However, if there is a small amount of signal present on top
of Gaussian noise, and one wants to estimate the $\sigma$ for the
noise, the presence of the large values from the signal can bias the
estimator to higher values.

An alternative way is to use the median ($m$) and median absolute
deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
median is the middle of the distribution, defined for a continuous
distribution by
\[
\int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx.
\]
From symmetry, we quickly see that for the continuous Normal
distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
without loss of generality.

To find $s$, we find the distribution of the absolute deviation from
the median, and then find the median of that distribution. This
distribution is given by
\begin{eqnarray*}
g(x) &= &{\text{distribution of }} |x|\\
     &= &f(x) + f(-x),\ x\ge0\\
     &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
\end{eqnarray*}
So, the median absolute deviation from the median, $s$, is given by
\[
\int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx.
\]
If we use the identity
\[
\int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2}
\] 
we find that 
\[
\int_{s}^{\infty} e^{-x^2/2\sigma^2} dx =
\sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx.
\]
Hence, to find $s$ we simply solve the following equation (setting
$\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in
units of $\sigma$):
\[
\int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0.
\]
This is hard to solve analytically (no nice analytic solution exists
for the finite integral that I'm aware of), but straightforward to
solve numerically, yielding the value of $s=0.6744888$. Thus, to
estimate $\sigma$ for a Normally distributed data set, one can
calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
obtain the correct estimator.

Note that this is different to solutions quoted elsewhere,
specifically in \citet{meyer04:trunc}, where the same robust estimator
is used but with an incorrect conversion to standard deviation -- they
assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
to convert the \emph{mean} absolute deviation from the mean to the
standard deviation. This means that the cube noise in the \hipass
catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
than quoted.