\documentclass[12pt,a4paper]{article} %Define a test for doing PDF format -- use different code below \newif\ifPDF \ifx\pdfoutput\undefined\PDFfalse \else\ifnum\pdfoutput > 0\PDFtrue \else\PDFfalse \fi \fi \textwidth=161 mm \textheight=245 mm \topmargin=-15 mm \oddsidemargin=0 mm \parindent=6 mm \usepackage[sort]{natbib} \usepackage{lscape} \bibpunct[,]{(}{)}{;}{a}{}{,} \newcommand{\eg}{e.g.\ } \newcommand{\ie}{i.e.\ } \newcommand{\hi}{H{\sc i}} \newcommand{\hipass}{{\sc hipass}} \newcommand{\progname}{{\tt Duchamp}} \newcommand{\diff}{{\rm d}} \newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil} \newenvironment{entry} {\begin{list}{}% {\renewcommand{\makelabel}{\entrylabel}% \setlength{\labelwidth}{30mm}% \setlength{\labelsep}{5pt}% \setlength{\itemsep}{2pt}% \setlength{\parsep}{2pt}% \setlength{\leftmargin}{35mm}% }% }% {\end{list}} \title{A Guide to the {\it Duchamp} Source Finding Software} \author{Matthew Whiting\\ %{\small \href{mailto:Matthew.Whiting@csiro.au}{Matthew.Whiting@csiro.au}}\\ Australia Telescope National Facility\\CSIRO} %\date{January 2006} \date{} % If we are creating a PDF, use different options for graphicx, hyperref. \ifPDF \usepackage[pdftex]{graphicx,color} \usepackage[pdftex]{hyperref} \hypersetup{colorlinks=true,% citecolor=red,% filecolor=red,% linkcolor=red,% urlcolor=red,% } \else \usepackage[dvips]{graphicx} \usepackage[dvips]{hyperref} \fi \pagestyle{headings} \begin{document} \maketitle \thispagestyle{empty} \begin{figure}[!h] \begin{center} \includegraphics[width=\textwidth]{cover_image} \end{center} \end{figure} \newpage \tableofcontents \newpage \section{Introduction and getting going quickly} This document gives details on the use of the program Duchamp. This has been designed to provide a source-detection facility for spectral-line data cubes. 
The basic execution of Duchamp is to read in a FITS data cube, find sources in the cube, and produce a text file of positions, velocities and fluxes of the detections, as well as a postscript file of the spectra of each detection. So, you have a FITS cube, and you want to find the sources in it. What do you do? The first step is to make an input file that contains the list of parameters. Brief and detailed examples are shown in Appendix~\ref{app-input}. This provides the input file name, the various output files, and defines various parameters that control the execution. The standard way to run Duchamp is by the command \begin{quote} {\tt Duchamp -p [parameter file]} \end{quote} replacing {\tt [parameter file]} with the name of the file you have just created/copied. Alternatively, you can use the syntax \begin{quote} {\tt Duchamp -f [FITS file]} \end{quote} where {\tt [FITS file]} is the file you wish to search. In the latter case, the rest of the parameters will take their default values detailed in Appendix~\ref{app-param}. In either case, the program will then work away and give you the list of detections and their spectra. The program execution is summarised below, and detailed in \S\ref{sec-flow}. Information on inputs is in \S\ref{sec-param} and Appendix~\ref{app-param}, and descriptions of the output are in \S\ref{sec-output}. \subsection{A summary of the execution steps} The basic flow of the program is summarised here. All these steps are discussed in more detail in the following sections, so read on if you have questions! \begin{enumerate} \item The parameter file given on the command line is read in, and the parameters absorbed. \item From the parameter file, the FITS image is located and read into memory. \item If requested, a FITS image with a previously reconstructed array is read in. \item If requested, blank pixels are trimmed from the edges, and channels corresponding to bright (\eg Galactic) emission are excised.
\item If requested, the baseline of each spectrum is removed. \item If the reconstruction method is requested, and the reconstructed array has not been read in at Step 3 above, the cube is reconstructed using the {\it {\`a} trous} wavelet method. \item Searching for objects then takes place, using the requested thresholding method. \item The list of objects is trimmed by merging neighbouring objects and removing those deemed unacceptable. \item The baselines and trimmed pixels are replaced prior to output. \item The details on the detections are written to screen and to the requested output file. \item Maps showing the spatial location of the detections are written. \item The integrated spectra of each detection are written to a postscript file. \item If requested, the reconstructed array can be written to a new FITS file. \end{enumerate} \subsection{Guide to terminology} First, a brief note on the use of terminology in this guide. Duchamp is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is the Flexible Image Transport System -- see \citet{hanisch01} or websites such as \href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html} for details.} image arrays with three dimensions -- they are assumed to have the following form: the first two dimensions (referred to as $x$ and $y$) are spatial directions (that is, relating to the position on the sky), while the third dimension, $z$, is the spectral direction, which can correspond to frequency, wavelength, or velocity. Each spatial pixel (a given $(x,y)$ coordinate) can be said to be a single spectrum, while a slice through the cube perpendicular to the spectral direction at a given $z$-value is a single channel (the 2-D image is a channel map). Features that are detected are assumed to be positive. The user can choose to search for negative features by setting an input parameter -- this inverts the cube prior to the search (see \S~\ref{sec-detection} for details).
Note that it is possible to run Duchamp on a two-dimensional image (\ie one with no frequency or velocity information), or indeed a one-dimensional array, and many of the features of the program will work fine. The focus, however, is on object detection in three dimensions. \subsection{Why ``Duchamp''?} Well, it's important for a program to have a name, and it certainly beats the initial working title of ``cubefind''. I had planned to call it ``Picasso'' (as in the father of cubism), but sadly this had already been used \citep{minchin99}. So I settled on naming it after Marcel Duchamp, another cubist, but also one of the first artists to work with ``found objects''. \section{User Inputs} \label{sec-param} Input to the program is provided by means of a parameter file. Parameters are listed in the file, followed by the value that should be assigned to them. The syntax used is {\tt paramName value}. The file is not case-sensitive, and lines in the input file that start with {\tt \#} are ignored. If a parameter is listed more than once, the last value given is used, but otherwise the order in which the parameters are listed in the input file is arbitrary. If a parameter is not listed, the default value is assumed. The defaults are chosen to provide a good result (using the reconstruction method), so the user doesn't need to specify many new parameters in the input file. Note that the image file {\bf must} be specified! The parameters that can be set are listed in Appendix~\ref{app-param}, with their default values in parentheses. The ``flag'' parameters are stored as {\tt bool} variables, and so are either {\tt true = 1} or {\tt false = 0}. Currently the program only reads them from the file as integers, and so they should be entered in the file as 0 or 1 (see example file in Appendix~\ref{app-input}).
The program is written in C/C++ and makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot} libraries. %\subsection{Parameter input} % %The user provides parameters that govern the selection of files and %the parameters used by the various subroutines in the program. This is %done via a parameter file, and the parameters are stored in a C++ %class for use throughout the program. The form of the parameter file is %discussed in \S\ref{sec-param}, and the parameters themselves are %listed in Appendix~\ref{app-param}. \subsection{Image input} The cube is read in using basic {\sc cfitsio} commands, and stored as an array in a special C++ class structure. This class keeps track of the list of detected objects, as well as any reconstructed arrays that are made (see \S\ref{sec-recon}). The World Coordinate System (WCS) information for the cube is also obtained from the FITS header by {\sc wcslib} functions \citep{greisen02, calabretta02}, and this information, in the form of a {\tt wcsprm} structure, is also stored in the same class. A sub-section of an image can be requested via the {\tt subsection} parameter in the parameter file -- this can be a good idea if the cube has very noisy edges, which may produce many spurious detections. The generalised form of the subsection that is used by {\sc cfitsio} is {\tt [x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run from {\tt x1} to {\tt x2} (inclusive), with steps of {\tt dx}. The step value can be omitted (so a subsection of the form {\tt [2:50,2:50,10:1000]} is still valid). Duchamp does not at this stage deal with the presence of steps in the subsection string, and any that are present are removed before the file is opened. If one wants the full range of a coordinate then replace the range with an asterisk, \eg {\tt [2:50,2:50,*]}. If one wants to use just a subsection, one must set {\tt flagSubsection = 1}. 
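To illustrate this convention, the removal of step values from a subsection string can be sketched as follows. This is our own minimal sketch of the behaviour described above, not Duchamp's internal code; the function name is ours.

```python
def strip_steps(section):
    """Remove step values from a cfitsio-style subsection string,
    mimicking (illustratively) Duchamp's removal of any steps present.
    Full-range axes given as '*' are passed through untouched."""
    cleaned = []
    for axis in section.strip("[]").split(","):
        fields = axis.split(":")
        # keep only start:end (or '*'); drop the optional third (step) field
        cleaned.append(":".join(fields[:2]))
    return "[" + ",".join(cleaned) + "]"

print(strip_steps("[2:50:2,2:50,10:1000:5]"))  # -> [2:50,2:50,10:1000]
print(strip_steps("[2:50,2:50,*]"))            # -> [2:50,2:50,*]
```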
A complete description of the section syntax can be found at the {\sc fitsio} web site \footnote{ \href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}% {http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}. \subsection{Image modification} \label{sec-modify} Several modifications to the cube can be made that improve the execution and efficiency of Duchamp (these are optional -- their use is indicated by the relevant flags set in the input parameter file). \subsubsection{Milky-Way removal} First, a single set of contiguous channels can be removed -- these may exhibit very strong emission, such as that from the Milky Way as seen in extragalactic \hi\ cubes (hence the references to ``Milky Way'' in relation to this task -- apologies to Galactic astronomers!). Such dominant channels will both produce many unnecessary, uninteresting and large (in size and hence in memory usage) detections, and will also affect any reconstruction that is performed (see next section). The use of this feature is controlled by the {\tt flagMW} parameter, and the exact channels concerned are able to be set by the user (using {\tt maxMW} and {\tt minMW}). When employed, the flux in these channels is set to zero. The information in those channels is not kept. \subsubsection{Blank pixel removal} Second, the cube is trimmed of any BLANK pixels that pad the image out to a rectangular shape. This is also optional, being determined by the {\tt flagBlankPix} parameter. The value for these pixels is read from the FITS header (using the BLANK, BSCALE and BZERO keywords), but if these are not present then the value can be specified by the user in the parameter file. If these blank pixels are stored as NaNs, then a normal number will be substituted (allowing these pixels to be accurately removed without adverse effects). [NOTE: this appears not to be working correctly at time of writing. 
If your data has unspecified BLANKs, be wary, or use the subsectioning option to trim the BLANKs.] This stage is particularly important for the reconstruction step, as lots of BLANK pixels on the edges will smooth out features in the wavelet calculation stage. The trimming will also reduce the size of the cube's array, speeding up the execution. The amount of trimming is recorded, and these pixels are added back in once the source-detection is completed (so that quoted pixel positions are applicable to the original cube). Rows and columns are trimmed one at a time until the first non-BLANK pixel is reached, so that the image remains rectangular. In practice, this means that there will be BLANK pixels left in the trimmed image (if the non-BLANK region is non-rectangular). However, these are ignored in all further calculations done on the cube. \subsubsection{Baseline removal} Finally, the user may request the removal of baselines from the spectra, via the parameter {\tt flagBaseline}. This may be necessary if there is a strong baseline ripple present, which can result in spurious detections on the high points of the ripple. The baseline is calculated from a wavelet reconstruction procedure (see \S\ref{sec-recon}) that keeps only the two largest scales. This is done separately for each spatial pixel (\ie for each spectrum in the cube), and the baselines are stored and added back in before any output is done. In this way the quoted fluxes and displayed spectra are as one would see from the input cube itself -- even though the detection (and reconstruction if applicable) is done on the baseline-removed cube. The presence of very strong signals (for instance, masers at several hundred Jy) can affect the determination of the baseline, leading to a large dip centred on the signal in the baseline-subtracted spectrum. To prevent this, the signal is trimmed prior to the reconstruction process at some standard threshold (at $8\sigma$ above the mean). 
The baseline determined should thus be representative of the true, signal-free baseline. Note that this trimming is only a temporary measure which does not affect the source-detection. \subsection{Image reconstruction} \label{sec-recon} This is an optional step, but one that greatly enhances the source-detection process. The user can direct Duchamp to reconstruct the data cube using the {\it {\`a} trous} wavelet procedure. A good description of the procedure can be found in \citet{starck02:book}. The reconstruction is an effective way of removing a lot of the noise in the image, allowing one to search reliably to fainter levels, and reducing the number of spurious detections. The trade-off is that it can be relatively time- and memory-intensive. The steps in the procedure are as follows: \begin{enumerate} \item Set the reconstructed array to 0 everywhere. \item The cube is discretely convolved with a given filter function. This is determined from the parameter file via the {\tt filterCode} parameter -- see Appendix~\ref{app-param} for details on the filters available. \item The wavelet coefficients are calculated by taking the difference between the convolved array and the input array. \item If the wavelet coefficients at a given point are above the threshold requested (given by {\tt snrRecon} as the number of $\sigma$ above the mean and adjusted to the current scale), add these to the reconstructed array. \item The separation of the filter coefficients is doubled. \item The procedure is repeated from step 2, using the convolved array as the input array. \item Continue until the required maximum number of scales is reached. \item Add the final smoothed (\ie convolved) array to the reconstructed array. This provides the ``DC offset'', as each of the wavelet coefficient arrays will have zero mean.
\end{enumerate} Note that any BLANK pixels that are still in the cube will not be altered by the reconstruction -- they will be left as BLANK so that the shape of the valid part of the cube is preserved. It is important to note that the {\it {\`a} trous} decomposition is an example of a ``redundant'' transformation. If no thresholding is performed, the sum of all the wavelet coefficient arrays and the final smoothed array is identical to the input array. The thresholding thus removes only the unwanted structure in the array. The statistics of the cube are estimated using robust methods, to avoid corruption by strong outlying points. The mean is actually estimated by the median, while the median absolute deviation from the median (MADFM) is calculated and corrected assuming Gaussianity to estimate the standard deviation $\sigma$. The Gaussianity (or Normality) assumption is critical, as the MADFM does not give the same value as the usual rms or standard deviation value -- for a normal distribution $N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. The difference between the MADFM and $\sigma$ is corrected for, so the user need only think in the usual multiples of $\sigma$ when setting {\tt snrRecon}. See Appendix~\ref{app-madfm} for a derivation of this value. When thresholding the different wavelet scales, the value of $\sigma$ as measured from the input array needs to be scaled to account for the increased amount of correlation between neighbouring pixels (due to the convolution). See Appendix~\ref{app-scaling} for details on this scaling. The user can also select the minimum scale to be used in the reconstruction -- the first scale exhibits the highest frequency variations, and so ignoring this one can sometimes be beneficial in removing excess noise. The default, however, is to use all scales ({\tt minscale = 1}). The reconstruction has at least two iterations. 
The first iteration makes a first pass at the wavelet reconstruction (the process outlined in the 8 stages above), but the residual array will inevitably have some structure still in it, so the wavelet filtering is done on the residual, and any significant wavelet terms are added to the final reconstruction. This step is repeated until the change in the $\sigma$ of the background is less than some fiducial amount. \subsection{Reconstruction I/O} The reconstruction stage can be relatively time-consuming, particularly for large cubes. Duchamp thus has a shortcut to allow users to quickly do multiple searches (\eg with different thresholds) on the same reconstruction. The first step is to select to save the reconstructed image as a FITS file -- at the moment this is just saved in the same directory as the input file, so it won't work if the user does not have write permissions on that directory. The name of the file will be derived from the input file, in the following manner: if the input file is {\tt image.fits}, the reconstructed array will be saved in {\tt image.RECON?.fits}, where {\tt ?} stands for the value of {\tt snrRecon} (for instance, if {\tt snrRecon}$=4$, it will be {\tt image.RECON4.fits}, and if {\tt snrRecon}$=4.5$, it will be {\tt image.RECON4.5.fits}). To save the reconstructed array, set {\tt flagOutputRecon = true}. Likewise, the residual image, defined as the difference between the input image and the reconstructed image, can also be saved in the same manner -- its filename will be {\tt image.RESID?.fits}. This is done by setting {\tt flagOutputResid = true}. If a reconstructed image has been saved, it can be read in and used instead of redoing the reconstruction. To do so, the user should set {\tt flagReconExists = true}. The user can indicate the name of the reconstructed FITS file using the {\tt reconFile} parameter, or, if this is not specified, Duchamp searches for the file {\tt image.RECON?.fits} (as defined above). 
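The naming convention for the saved reconstruction can be illustrated with a small sketch (ours, for illustration only, not Duchamp's code):

```python
def recon_filename(input_file, snr_recon):
    """Derive the reconstructed-cube name from the input FITS name,
    following the convention described above:
    image.fits with snrRecon=4 -> image.RECON4.fits."""
    base = input_file[:-5] if input_file.endswith(".fits") else input_file
    # '%g' keeps integer thresholds free of a trailing '.0'
    return "%s.RECON%g.fits" % (base, snr_recon)

print(recon_filename("image.fits", 4))    # image.RECON4.fits
print(recon_filename("image.fits", 4.5))  # image.RECON4.5.fits
```

The residual image follows the same pattern with {\tt RESID} in place of {\tt RECON}.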
If the file is not found, the reconstruction is performed as normal. Note that to do this, the user needs to set {\tt flagAtrous = true} (obviously, if this is {\tt false}, the reconstruction is not needed). \subsection{Searching the image} \label{sec-detection} The image is searched for detections in two ways: spectrally (a 1-dimensional search in the spectrum in each spatial pixel), and spatially (a 2-dimensional search in the spatial image in each channel). In both cases, the algorithm finds connected pixels that are above the user-specified threshold. In the case of the spatial image search, the algorithm of \citet{lutz80} is used to raster scan through the image and connect groups of pixels on neighbouring rows. Note that this algorithm cannot be applied directly to a 3-dimensional case, as it requires that objects are completely nested in a row: that is, if you are scanning along a row, and one object finishes and another starts, you know that you will not get back to the first one (if at all) until the second is finished for that row. Three-dimensional data does not have this property, which is why we break up the searching into 1- and 2-dimensional cases. The determination of the threshold is done in one of two ways. The first way is a simple sigma-clipping, where a threshold is set at $n\sigma$ above the mean and pixels above this threshold are flagged as detected. The value of $n$ is set with the parameter {\tt snrCut}. As before, the value for $\sigma$ is estimated by the MADFM, and corrected by the ratio derived in Appendix~\ref{app-madfm}. The second method uses the False Discovery Rate (FDR) technique \citep{miller01,hopkins02}, whose basis we briefly detail here. The false discovery rate (given by the number of false detections divided by the total number of detections) is fixed at a certain value $\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false positives). 
In practice, an $\alpha$ value is chosen, and the ensemble average FDR (\ie $\langle {\rm FDR} \rangle$) when the method is used will be less than $\alpha$. One calculates $p$ -- the probability, assuming the null hypothesis is true, of obtaining a test statistic as extreme as the pixel value (the observed test statistic) -- for each pixel, and sorts them in increasing order. One then calculates $d$ where \[ d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\}, \] and then rejects all hypotheses whose $p$-values are less than or equal to $P_d$. (So a pixel with $P_i \le P_d$ is flagged as detected even if $P_i \ge i\alpha/(c_N N)$.)

\newpage
\begin{landscape}
\section{Example results file output}
{\footnotesize
\begin{verbatim}
Detected sources and parameters from running the Duchamp source finder.
(... table truncated for clarity ...)
1 J0609-2200 92.410416-22.013390 48.50 39.42 213.061 65.957 17.572
2 J0608-2605 92.042633-26.085157 44.47 39.47 233.119 39.574 4.144
3 J0606-2724 91.637840-27.412022 52.48 47.57 302.213 39.574 17.066
\end{verbatim} } \end{landscape} \newpage \section{Example Karma Annotation File output} \label{app-karma} This is the format of the Karma Annotation file, showing the locations of the detected objects. This can be loaded by the plotting tools of the Karma package (for instance, {\tt kvis}) as an overlay on the FITS file. \begin{verbatim} # Duchamp Source Finder results for # cube /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits COLOR RED COORD W CIRCLE 92.3376 -21.9475 0.403992 TEXT 92.3376 -21.9475 1 CIRCLE 91.9676 -26.0193 0.37034 TEXT 91.9676 -26.0193 2 CIRCLE 91.5621 -27.3459 0.437109 TEXT 91.5621 -27.3459 3 CIRCLE 92.8285 -21.6344 0.269914 TEXT 92.8285 -21.6344 4 CIRCLE 90.1381 -28.9838 0.234179 TEXT 90.1381 -28.9838 5 CIRCLE 89.72 -26.6513 0.132743 TEXT 89.72 -26.6513 6 CIRCLE 94.2743 -27.4003 0.195175 TEXT 94.2743 -27.4003 7 CIRCLE 92.2739 -21.6941 0.134538 TEXT 92.2739 -21.6941 8 CIRCLE 89.7133 -25.4259 0.232252 TEXT 89.7133 -25.4259 9 CIRCLE 90.2206 -21.6993 0.266247 TEXT 90.2206 -21.6993 10 CIRCLE 93.8581 -26.5766 0.163153 TEXT 93.8581 -26.5766 11 CIRCLE 91.176 -26.1064 0.234356 TEXT 91.176 -26.1064 12 CIRCLE 90.2844 -23.6716 0.299509 TEXT 90.2844 -23.6716 13 CIRCLE 93.8774 -22.581 0.130925 TEXT 93.8774 -22.581 14 CIRCLE 94.3882 -23.0934 0.137108 TEXT 94.3882 -23.0934 15 CIRCLE 93.0491 -21.8223 0.202928 TEXT 93.0491 -21.8223 16 CIRCLE 94.0685 -21.5603 0.168456 TEXT 94.0685 -21.5603 17 CIRCLE 86.0568 -27.6095 0.101113 TEXT 86.0568 -27.6095 18 CIRCLE 88.7932 -29.9453 0.202624 TEXT 88.7932 -29.9453 19 \end{verbatim} \newpage \section{Installing Duchamp (README file)} \begin{verbatim} There is an executable (Duchamp) that has been compiled on a Debian Linux kernel 2.6.8-2-686, with gcc version 3.3.5 (Debian 1:3.3.5-13) If that is no good to you, you can compile it yourself using the Makefile included in this directory (sorry for not having a configure script or similar yet!). Duchamp uses three main external libraries: pgplot, cfitsio and wcslib. 
You will need to set the paths for the base directory and three libraries, as they are currently configured for my use and will not be of much use to you! These are: BASE --> the current directory PGDIR --> where the pgplot libraries (and header files) are located CFITSIODIR --> where the header file fitsio.h is CFITSIOLDIR --> where the cfitsio library is located (libcfitsio.a) WCSDIR --> where the wcslib header files are WCSLDIR --> where the wcslib library is located (libwcs.a) If you do not have the libraries, they can be downloaded from the following locations: PGPlot -- http://www.astro.caltech.edu/~tjp/pgplot/ cfitsio -- http://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html wcslib -- http://www.atnf.csiro.au/people/Mark.Calabretta/WCS/index.html Once you've set up the Makefile correctly, then simply typing > make duchamp will compile the program. To run it, you need to use the syntax > Duchamp -p parameterFile where parameterFile is a file with the input parameters, including the name of the cube you want to search. There are two example input files included with the distribution. The smaller one, InputExample, shows the typical parameters one might want to set. The large one, InputComplete, lists all parameters that can be entered, and a brief description of them. Refer to the documentation for further details. To get going quickly, just replace the "your-file-here" in InputExample with your image name, and type > Duchamp -p InputExample and you're off! \end{verbatim} \section{Robust statistics for a Normal distribution} \label{app-madfm} The Normal, or Gaussian, distribution for mean $\mu$ and standard deviation $\sigma$ can be written as \[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}. \] When one has a purely Gaussian signal, it is straightforward to estimate $\sigma$ by calculating the standard deviation (or rms) of the data. 
However, if there is a small amount of signal present on top of Gaussian noise, and one wants to estimate the $\sigma$ for the noise, the presence of the large values from the signal can bias the estimator to higher values. An alternative way is to use the median ($m$) and median absolute deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The median is the middle of the distribution, defined for a continuous distribution by \[ \int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x. \] From symmetry, we quickly see that for the continuous Normal distribution, $m=\mu$. We consider the case henceforth of $\mu=0$, without loss of generality. To find $s$, we find the distribution of the absolute deviation from the median, and then find the median of that distribution. This distribution is given by \begin{eqnarray*} g(x) &= &{\mbox{\rm distribution of }} |x|\\ &= &f(x) + f(-x),\ x\ge0\\ &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0. \end{eqnarray*} So, the median absolute deviation from the median, $s$, is given by \[ \int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x. \] Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}} \diff x $. Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$): \[ \int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0. \] This is hard to solve analytically (no nice analytic solution exists for the finite integral that I'm aware of), but straightforward to solve numerically, yielding the value of $s=0.6744888$. Thus, to estimate $\sigma$ for a Normally distributed data set, one can calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to obtain the correct estimator. 
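Since $\int_{0}^{s} e^{-x^2/2} \diff x = \sqrt{\pi/2}\;{\rm erf}(s/\sqrt{2})$, the equation above reduces to ${\rm erf}(s/\sqrt{2}) = 1/2$, which makes a quick numerical check straightforward (this sketch is ours, not part of Duchamp):

```python
import math

# Solve erf(s / sqrt(2)) = 1/2 for s by bisection: s is the median
# absolute deviation from the median of a unit normal distribution.
lo, hi = 0.0, 2.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if math.erf(mid / math.sqrt(2.0)) < 0.5:
        lo = mid
    else:
        hi = mid
s = 0.5 * (lo + hi)
print(round(s, 4), round(1.0 / s, 4))  # 0.6745 1.4826
```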
Note that this is different to solutions quoted elsewhere, specifically in \citet{meyer04:trunc}, where the same robust estimator is used but with an incorrect conversion to standard deviation -- they assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used to convert the {\it mean} absolute deviation from the mean to the standard deviation. This means that the cube noise in the \hipass\ catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger than quoted. \section{How Gaussian noise changes with wavelet scale} \label{app-scaling} The key element in the wavelet reconstruction of an array is the thresholding of the individual wavelet coefficient arrays. This is usually done by choosing a level to be some number of standard deviations above the mean value. However, since the wavelet arrays are produced by convolving the input array with an increasingly large filter, the pixels in the coefficient arrays become increasingly correlated as the scale of the filter increases. This results in the measured standard deviation from a given coefficient array decreasing with increasing scale. To calculate this, we need to take into account how many other pixels each pixel in the convolved array depends on. To demonstrate, suppose we have a 1-D array with $N$ pixel values given by $F_i,\ i=1,...,N$, and we convolve it with the B$_3$-spline filter, defined by the set of coefficients $\{1/16,1/4,3/8,1/4,1/16\}$.
The flux of the $i$th pixel in the convolved array will be \[ F'_i = \frac{1}{16}F_{i-2} + \frac{1}{4}F_{i-1} + \frac{3}{8}F_{i} + \frac{1}{4}F_{i+1} + \frac{1}{16}F_{i+2} \] and the flux of the corresponding pixel in the wavelet array will be \[ W_i = F_i - F'_i = -\frac{1}{16}F_{i-2} - \frac{1}{4}F_{i-1} + \frac{5}{8}F_{i} - \frac{1}{4}F_{i+1} - \frac{1}{16}F_{i+2}. \] Now, assuming each pixel has the same standard deviation $\sigma_i=\sigma$, we can work out the standard deviation for the coefficient array: \[ \sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{5}{8}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2} = 0.72349\ \sigma \] Thus, the first scale wavelet coefficient array will have a standard deviation of 72.3\% of that of the input array. This procedure can be followed to calculate the necessary values for all scales, dimensions and filters used by Duchamp. Calculating these values is, therefore, a critical step in performing the reconstruction. \citet{starck02:book} did so by simulating data sets with Gaussian noise, taking the wavelet transform, and measuring the value of $\sigma$ for each scale. We take a different approach, calculating the scaling factors directly from the filter coefficients: we take the wavelet transform of an array made up of a 1 in the central pixel and 0s everywhere else. The scaling value is then derived by adding in quadrature all the wavelet coefficient values at each scale. We give the scaling factors for the three filters available to Duchamp on the following page. These values are hard-coded into Duchamp, so no on-the-fly calculation of them is necessary. Memory limitations prevent us from calculating factors for large scales, particularly for the three-dimensional case (hence the -- symbols in the tables).
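This delta-function calculation can be sketched directly; the following is a one-dimensional illustration of the method (Duchamp itself uses pre-computed, hard-coded values, as noted above):

```python
def atrous_scaling_factors(filt, n_scales, size=1025):
    """Wavelet-transform a unit impulse with the 'a trous' algorithm and
    return, for each scale, the quadrature sum of that scale's wavelet
    coefficients -- the noise scaling factor at that scale."""
    c = [0.0] * size
    c[size // 2] = 1.0  # a 1 in the central pixel, 0s everywhere else
    half = len(filt) // 2
    factors, spacing = [], 1
    for _ in range(n_scales):
        smooth = [0.0] * size
        for i in range(size):
            total = 0.0
            for j, f in enumerate(filt):
                k = i + (j - half) * spacing
                if 0 <= k < size:
                    total += f * c[k]
            smooth[i] = total
        # quadrature sum of the wavelet coefficients (c - smooth)
        factors.append(sum((a - b) ** 2 for a, b in zip(c, smooth)) ** 0.5)
        c, spacing = smooth, 2 * spacing
    return factors

b3 = atrous_scaling_factors([1/16, 1/4, 3/8, 1/4, 1/16], 4)
print([round(f, 9) for f in b3])  # leading entries of the 1-D B3-spline column
```

The first two values reproduce the 1-D B$_3$-spline entries (0.723489806 and 0.285450405) in the tables that follow.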
To calculate factors for higher scales than those available, we note that the following relationships apply for large scales to a sufficient level of precision:
\begin{itemize}
\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
\item 3-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
\end{itemize}
\newpage
\begin{itemize}
\item {\bf B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$

\begin{tabular}{llll}
Scale & 1 dimension & 2 dimensions & 3 dimensions\\ \hline
1 & 0.723489806 & 0.890796310 & 0.956543592\\
2 & 0.285450405 & 0.200663851 & 0.120336499\\
3 & 0.177947535 & 0.0855075048 & 0.0349500154\\
4 & 0.122223156 & 0.0412474444 & 0.0118164242\\
5 & 0.0858113122 & 0.0204249666 & 0.00413233507\\
6 & 0.0605703043 & 0.0101897592 & 0.00145703714\\
7 & 0.0428107206 & 0.00509204670 & 0.000514791120\\
8 & 0.0302684024 & 0.00254566946 & --\\
9 & 0.0214024008 & 0.00127279050 & --\\
10 & 0.0151336781 & 0.000636389722 & --\\
11 & 0.0107011079 & 0.000318194170 & --\\
12 & 0.00756682272 & -- & --\\
13 & 0.00535055108 & -- & --\\
%14 & 0.00378341085 & -- & --\\
%15 & 0.00267527545 & -- & --\\
%16 & 0.00189170541 & -- & --\\
%17 & 0.00133763772 & -- & --\\
%18 & 0.000945852704 & -- & --
\end{tabular}

\item {\bf Triangle Function:} $\{1/4,1/2,1/4\}$

\begin{tabular}{llll}
Scale & 1 dimension & 2 dimensions & 3 dimensions\\ \hline
1 & 0.612372436 & 0.800390530 & 0.895954449 \\
2 & 0.330718914 & 0.272878894 & 0.192033014\\
3 & 0.211947812 & 0.119779282 & 0.0576484078\\
4 & 0.145740298 & 0.0577664785 & 0.0194912393\\
5 & 0.102310944 & 0.0286163283 & 0.00681278387\\
6 & 0.0722128185 & 0.0142747506 & 0.00240175885\\
7 & 0.0510388224 & 0.00713319703 & 0.000848538128 \\
8 & 0.0360857673 & 0.00356607618 & 0.000299949455 \\
9 & 0.0255157615 & 0.00178297280 & -- \\
10 & 0.0180422389 & 0.000891478237 & -- \\
11 & 0.0127577667 & 0.000445738098 & -- \\
12 & 0.00902109930 & 0.000222868922 & -- \\
13 & 0.00637887978 & -- & -- \\
%14 & 0.00451054902 & -- & -- \\
%15 & 0.00318942978 & -- & -- \\
%16 & 0.00225527449 & -- & -- \\
%17 & 0.00159471988 & -- & -- \\
%18 & 0.00112763724 & -- & --
\end{tabular}

\item {\bf Haar Wavelet:} $\{0,1/2,1/2\}$

\begin{tabular}{llll}
Scale & 1 dimension & 2 dimensions & 3 dimensions\\ \hline
1 & 0.707106781 & 0.433012702 & 0.935414347 \\
2 & 0.500000000 & 0.216506351 & 0.330718914\\
3 & 0.353553391 & 0.108253175 & 0.116926793\\
4 & 0.250000000 & 0.0541265877 & 0.0413398642\\
5 & 0.176776695 & 0.0270632939 & 0.0146158492\\
6 & 0.125000000 & 0.0135316469 & 0.00516748303
\end{tabular}
\end{itemize}
\end{document}