% ----------------------------------------------------------------------- % app-stats.tex: Section on how the robust statistics are calculated % for a Normal distribution. % ----------------------------------------------------------------------- % Copyright (C) 2006, Matthew Whiting, ATNF % % This program is free software; you can redistribute it and/or modify it % under the terms of the GNU General Public License as published by the % Free Software Foundation; either version 2 of the License, or (at your % option) any later version. % % Duchamp is distributed in the hope that it will be useful, but WITHOUT % ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or % FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License % for more details. % % You should have received a copy of the GNU General Public License % along with Duchamp; if not, write to the Free Software Foundation, % Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA % % Correspondence concerning Duchamp may be directed to: % Internet email: Matthew.Whiting [at] atnf.csiro.au % Postal address: Dr. Matthew Whiting % Australia Telescope National Facility, CSIRO % PO Box 76 % Epping NSW 1710 % AUSTRALIA % ----------------------------------------------------------------------- \secA{Robust statistics for a Normal distribution} \label{app-madfm} The Normal, or Gaussian, distribution for mean $\mu$ and standard deviation $\sigma$ can be written as \[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}. \] When one has a purely Gaussian signal, it is straightforward to estimate $\sigma$ by calculating the standard deviation (or rms) of the data. However, if there is a small amount of signal present on top of Gaussian noise, and one wants to estimate the $\sigma$ for the noise, the presence of the large values from the signal can bias the estimator to higher values. An alternative way is to use the median ($m$) and median absolute deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The median is the middle of the distribution, defined for a continuous distribution by \[ \int_{-\infty}^{m} f(x) dx = \int_{m}^{\infty} f(x) dx. \] From symmetry, we quickly see that for the continuous Normal distribution, $m=\mu$. We consider the case henceforth of $\mu=0$, without loss of generality. To find $s$, we find the distribution of the absolute deviation from the median, and then find the median of that distribution. This distribution is given by \begin{eqnarray*} g(x) &= &{\text{distribution of }} |x|\\ &= &f(x) + f(-x),\ x\ge0\\ &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0. \end{eqnarray*} So, the median absolute deviation from the median, $s$, is given by \[ \int_{0}^{s} g(x) dx = \int_{s}^{\infty} g(x) dx. \] If we use the identity \[ \int_{0}^{\infty}e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2} \] we find that \[ \int_{s}^{\infty} e^{-x^2/2\sigma^2} dx = \sqrt{\pi\sigma^2/2}-\int_{0}^{s} e^{-x^2/2\sigma^2}dx. \] Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$): \[ \int_{0}^{s}e^{-x^2/2} dx - \sqrt{\pi/8} = 0. \] This is hard to solve analytically (no nice analytic solution exists for the finite integral that I'm aware of), but straightforward to solve numerically, yielding the value of $s=0.6744888$. Thus, to estimate $\sigma$ for a Normally distributed data set, one can calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to obtain the correct estimator. Note that this is different to solutions quoted elsewhere, specifically in \citet{meyer04:trunc}, where the same robust estimator is used but with an incorrect conversion to standard deviation -- they assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used to convert the \emph{mean} absolute deviation from the mean to the standard deviation. This means that the cube noise in the \hipass catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger than quoted.