source: tags/release-0.9/docs/Guide.tex @ 813

Last change on this file since 813 was 37, checked in by Matthew Whiting, 18 years ago

Changed reference to the Meyer et al determination of sigma, better describing
how they did it and the reason for the difference in our conversions.

File size: 66.0 KB
Line 
1\documentclass[12pt,a4paper]{article}
2
3%Define a test for doing PDF format -- use different code below
4\newif\ifPDF
5\ifx\pdfoutput\undefined\PDFfalse
6\else\ifnum\pdfoutput > 0\PDFtrue
7     \else\PDFfalse
8     \fi
9\fi
10
11\textwidth=161 mm
12\textheight=240 mm
13\topmargin=-18 mm
14\oddsidemargin=0 mm
15\parindent=6 mm
16
17\usepackage[sort]{natbib}
18\usepackage{lscape}
19\bibpunct[,]{(}{)}{;}{a}{}{,}
20
21\newcommand{\eg}{e.g.\ }
22\newcommand{\ie}{i.e.\ }
23\newcommand{\hi}{H{\sc i}}
24\newcommand{\hipass}{{\sc hipass}}
25\newcommand{\progname}{{\tt Duchamp}}
26\newcommand{\diff}{{\rm d}}
27\newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil}
28\newenvironment{entry}
29        {\begin{list}{}%
30                {\renewcommand{\makelabel}{\entrylabel}%
31                        \setlength{\labelwidth}{30mm}%
32                        \setlength{\labelsep}{5pt}%
33                        \setlength{\itemsep}{2pt}%
34                        \setlength{\parsep}{2pt}%
35                        \setlength{\leftmargin}{35mm}%
36                }%
37        }%
38{\end{list}}
39
40
41\title{A Guide to the {\it Duchamp} Source Finding Software}
42\author{Matthew Whiting\\
43%{\small \href{mailto:Matthew.Whiting@csiro.au}{Matthew.Whiting@csiro.au}}\\
44Australia Telescope National Facility\\CSIRO}
45%\date{January 2006}
46\date{}
47
48% If we are creating a PDF, use different options for graphicx, hyperref.
49\ifPDF
50  \usepackage[pdftex]{graphicx,color}
51  \usepackage[pdftex]{hyperref}
52  \hypersetup{colorlinks=true,%             
53            % citecolor=black,%
54            % filecolor=black,%
55            % linkcolor=black,%
56            % urlcolor=black,%
57              }
58\else
59  \usepackage[dvips]{graphicx}
60  \usepackage[dvips]{hyperref}
61\fi
62
63\begin{document}
64
65\maketitle
66\tableofcontents
67
68\newpage
69\section{Introduction and getting going quickly}
70
71This document gives details on the use of the program Duchamp. This
72has been written to provide a source-detection facility for
73spectral-line data cubes. The basic execution of Duchamp is to read
74in a FITS data cube, find sources in the cube, and produce a text
75file of positions, velocities and fluxes of the detections, as well as
76a postscript file of the spectra of each detection.
77
78So, you have a FITS cube, and you want to find the sources in it. What
79do you do? The first step is to make an input file that contains the
80list of parameters. Brief and detailed examples are shown in
81Appendix~\ref{app-input}. This provides the input file name, the various
82output files, and defines various parameters that control the
83execution.
84
85The program is run by the command
86\begin{quote}
87{\tt Duchamp -p [parameter file]}
88\end{quote}
89replacing {\tt [parameter file]} with the name of the file you have
90just created/copied. The program will then work away and give you the
91list of detections and their spectra. The program execution is
92summarise below, and detailed in \S\ref{sec-flow}. Information on
93inputs is in \S\ref{sec-param} and Appendix~\ref{app-param}, and
94descriptions of the output is in \S\ref{sec-output}.
95
96\subsection{A summary of the execution steps}
97
98The basic flow of the program is summarised here. All these steps are
99discussed in more detail in the following sections, so read on if
100you have questions!
101\begin{enumerate}
102\item The parameter file given on the command line is read in, and the
103  parameters absorbed.
104\item From the parameter file, the FITS image is located and read in
105  to memory.
106\item If requested, blank pixels are trimmed from the edges, and
107  channels corresponding to bright (\eg Galactic) emission are
108  excised.
109\item If requested, the baseline of each spectrum is removed.
110\item If the reconstruction method is requested, the cube is
111  reconstructed using the {\it {\' a} trous} wavelet method.
112\item Searching for objects then takes place, using the requested
113  thresholding method.
114\item The list of objects is trimmed by merging neighbouring objects
115  and removing those deemed unacceptable.
116\item The baselines and trimmed pixels are replaced prior to output.
117\item The details on the detections are written to screen and to the
118  requested output file.
119\item Maps showing the spatial location of the detections are written.
120\item The integrated spectra of each detection are written to a
121  postscript file.
122\item If requested, the reconstructed array can be written to a new
123  FITS file.
124\end{enumerate}
125
126\subsection{Guide to terminology}
127
128First, a brief note on the use of terminology in this guide. Duchamp
129is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is
130the Flexible Image Transport System -- see \citet{hanisch01} or
131websites such as
132\href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html}
133for details.} image arrays with three dimensions -- they are assumed
134to have the following form: the first two dimensions (referred to as
135$x$ and $y$) are spatial directions (that is, relating to the position
136on the sky), while the third dimension, $z$, is the spectral
137direction, which can correspond to frequency, wavelength, or velocity.
138
139Each spatial pixel (a given $(x,y)$ coordinate) can be said to be a
140single spectrum, while a slice through the cube perpendicular to the
141spectral direction at a given $z$-value is a single channel (the 2-D
142image is a channel map).
143
144Features that are detected are assumed to be positive. If one wants to
145search for absorption (negative) features, try multiplying your cube
146by $-1$ before running Duchamp.
147
148Note that it is possible to run Duchamp on a two-dimensional image
149(\ie one with no frequency or velocity information), or indeed a
150one-dimensional array, and many of the features of the program will
151work fine. The focus, however, is on object detection in three
152dimensions.
153
154\subsection{Why ``Duchamp''?}
155
156Well, it's important for a program to have a name, and it certainly
157beats the initial version of ``cubefind''. I had planned to call it
158``Picasso'' (as in the father of cubism), but sadly this had already
159been used before \citep{minchin99}. So I settled on naming it after
160Marcel Duchamp, another cubist, but also one of the first artists to
161work with ``found objects''.
162
163\section{User Inputs}
164\label{sec-param}
165
166Input to the program is provided by means of a parameter file. Parameters
167are listed in the file, followed by the value that should be assigned
168to them. The syntax used is {\tt paramName value}. The file is not
169case-sensitive, and lines in the input file that start with {\tt \#} are
170ignored. If a parameter is listed more than once, the latter value is
171used, but otherwise the order in which the parameters are listed in the
172input file is arbitrary.
173
174If a parameter is not listed, the default value is assumed. The
175defaults are chosen to provide a good result (using the reconstruction
176method), so the user doesn't need to specify many new parameters in
177the input file. Note that the image file {\bf must} be specified! The
178parameters that can be set are listed in Appendix~\ref{app-param},
179with their default values in parentheses.
180
181The 'flag' parameters are stored as {\tt bool} variables, and so are
182either {\tt true = 1} or {\tt false = 0}. Currently the program only
183reads them from the file as integers, and so they should be entered in
184the file as 0 or 1 (see example file in Appendix~\ref{app-input}).
185
186\section{What the program is doing}
187\label{sec-flow}
188
189The execution flow of the program is detailed here, indicating the
190main algorithmic steps that are used. The program is written in C/C++
191and makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot}
192libraries.
193
194%\subsection{Parameter input}
195%
196%The user provides parameters that govern the selection of files and
197%the parameters used by the various subroutines in the program. This is
198%done via a parameter file, and the parameters are stored in a C++
199%class for use throughout the program. The form of the parameter file is
200%discussed in \S\ref{sec-param}, and the parameters themselves are
201%listed in Appendix~\ref{app-param}.
202
203\subsection{Image input}
204
205The cube is read in using basic {\sc cfitsio} commands, and stored as
206an array in a special C++ class structure. This class keeps track of
207the list of detected objects, as well as any reconstructed arrays that
208are made (see \S\ref{sec-recon}). The World Coordinate System (WCS)
209information for the cube is also obtained from the FITS header by {\sc
210wcslib} functions \citep{greisen02, calabretta02}, and this
211information, in the form of a {\tt wcsprm} structure, is also stored
212in the same class.
213
214A sub-section of an image can be requested. This is done via the {\tt
215subsection} parameter in the parameter file. The generalised form of
216the subsection that is used by {\sc cfitsio} is {\tt
217[x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run from
218{\tt x1} to {\tt x2} (inclusive), with steps of {\tt dx}. The step
219value can be omitted (so a subsection of the form {\tt
220[2:50,2:50,10:1000]} is still valid). Duchamp does not at this
221stage deal with the presence of steps in the subsection string, and
222any that are present are removed before the file is opened.
223
224If one wants the full range of a value then replace the range with an
225asterisk, \eg {\tt [2:50,2:50,*]}. If one wants to use just a
226subsection, one must set {\tt flagSubsection = 1}. A complete
227description of the section syntax can be found at the {\sc fitsio} web
228site
229\footnote{
230\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}%
231{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}.
232
233\subsection{Image modification}
234\label{sec-modify}
235
236Several modifications to the cube can be made that improve the
237execution and efficiency of Duchamp (these are optional -- their
238use is indicated by the relevant flags set in the input parameter
239file).
240
241\subsubsection{Milky-Way removal}
242
243First, a single set of contiguous channels can be removed -- these may
244exhibit very strong emission, such as that from the Milky Way as seen
245in extragalactic \hi\ cubes (hence the references to ``Milky Way'' in
246relation to this task -- apologies to Galactic astronomers!). Such
247dominant channels will both produce many unnecessary, uninteresting
248and large (in size and hence in memory usage) detections, and will
249also affect any reconstruction that is performed (see next
250section). The use of this feature is controlled by the {\tt flagMW}
251parameter, and the exact channels concerned are able to be set by the
252user (using {\tt maxMW} and {\tt minMW}). When employed, the flux in
253these channels is set to zero. The information in those channels is
254not kept.
255
256\subsubsection{Blank pixel removal}
257
258Second, the cube is trimmed of any BLANK pixels that pad the image
259out to a rectangular shape. This is also optional, being determined by
260the {\tt flagBlankPix} parameter. The value for these pixels is read from
261the FITS header (using the BLANK, BSCALE and BZERO keywords), but if
262these are not present then the value can be specified by the user in
263the parameter file. If these blank pixels are stored as NaNs, then a
264normal number will be substituted (allowing these pixels to be
265accurately removed without adverse effects). [NOTE: this appears not
266  to be working correctly at time of writing. If your data has
267  unspecified BLANKs, be wary...]
268
269This stage is particularly important for the reconstruction step, as
270lots of BLANK pixels on the edges will smooth out features in the
271wavelet calculation stage. The trimming will also reduce the size of
272the cube's array, speeding up the execution. The amount of trimming is
273recorded, and these pixels are added back in once the source-detection
274is completed (so that quoted pixel positions are applicable to the
275original cube).
276
277Rows and columns are trimmed one at a time until the first non-BLANK
278pixel is reached, so that the image remains rectangular. In practice,
279this means that there will be BLANK pixels left in the trimmed image
280(if the non-BLANK region is non-rectangular). However, these are
281ignored in all further calculations done on the cube.
282
283\subsubsection{Baseline removal}
284
285Finally, the user may request the removal of baselines from the
286spectra, via the parameter {\tt flagBaseline}. This may be necessary
287if there is a strong baseline ripple present, which can result in
288spurious detections on the high points of the ripple. The baseline is
289calculated from a wavelet reconstruction procedure (see
290\S\ref{sec-recon}) that keeps only the two largest scales. This is
291done separately for each spatial pixel (\ie for each spectrum in the
292cube), and the baselines are stored and added back in before any
293output is done. In this way the quoted fluxes and displayed spectra
294are as one would see from the input cube itself -- even though the
295detection (and reconstruction if applicable) is done on the
296baseline-removed cube.
297
298\subsection{Image reconstruction}
299\label{sec-recon}
300
301This is an optional step. The user can direct Duchamp to
302reconstruct the data cube using the {\it {\`a} trous} wavelet
303procedure. A good description of the procedure can be found in
304\citet{starck02:book}. This is an effective way of removing a
305lot of the noise in the image, but at this stage is relatively time-
306and memory-intensive. The steps in the procedure are as follows:
307\begin{enumerate}
308\item Set the reconstructed array to 0 everywhere.
309\item The cube is discretely convolved with a given filter
310  function. This is determined from the parameter file via the {\tt
311  filterCode} parameter -- see Appendix~\ref{app-param} for details on
312  the filters available.
313\item The wavelet coefficients are calculated by taking the difference
314  between the convolved array and the input array.
315\item If the wavelet coefficients at a given point are above the
316  threshold requested (given by {\tt snrRecon} as the number of
317  $\sigma$ above the mean and adjusted to the current scale), add
318  these to the reconstructed array.
319\item The separation of the filter coefficients is doubled.
320\item The procedure is repeated from step 2, using the convolved array
321  as the input array.
322\item Continue until the required maximum number of scales is reached.
323\item Add the final smoothed (\ie convolved) array to the
324  reconstructed array. This provides the ``DC offset'', as each of the
325  wavelet coefficient arrays will have zero mean.
326\end{enumerate}
327
328It is important to note that the {\it {\`a} trous} decomposition is
329an example of a ``redundant'' transformation. If no thresholding is
330performed, the sum of all the wavelet coefficient arrays and the final
331smoothed array is identical to the input array. The thresholding thus
332removes only the unwanted structure in the array.
333
334The statistics of the cube are estimated using robust methods, to
335avoid corruption by strong outlying points. The mean is actually
336estimated by the median, while the median absolute deviation from the
337median (MADFM) is calculated and corrected assuming Gaussianity to
338estimate the standard deviation $\sigma$. The Gaussianity (or
339Normality) assumption is critical, as the MADFM does not give the same
340value as the usual rms or standard deviation value -- for a normal
341distribution $N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. The
342difference between the MADFM and $\sigma$ is corrected for, so the
343user need only think in the usual multiples of $\sigma$ when setting
344{\tt snrRecon}. See Appendix~\ref{app-madfm} for a derivation of this
345value.
346
347When thresholding the different wavelet scales, the value of $\sigma$
348as measured from the input array needs to be scaled to account for the
349increased amount of correlation between neighbouring pixels (due to
350the convolution). See Appendix~\ref{app-madfm} for details on this
351scaling.
352
353The user can also select the minimum scale to be used in the
354reconstruction -- the first scale exhibits the highest frequency
355variations, and so ignoring this one can sometimes be beneficial in
356removing excess noise. The default, however, is to use all scales
357({\tt minscale = 1}).
358
359The reconstruction has at least two iterations. The first iteration
360makes a first pass at the wavelet reconstruction (the process outlined
361in the 8 stages above), but the residual array will inevitably have
362some structure still in it, so the wavelet filtering is done on the
363residual, and any significant wavelet terms are added to the final
364reconstruction. This step is repeated until the change in the $\sigma$
365of the background is less than some fiducial amount.
366
367The user can optionally select to save the reconstructed image as a
368FITS file -- at the moment this is just saved in the same directory as
369the input file, so it won't work if the user does not have write
370permissions on that directory. See Appendix~\ref{app-param} for details
371on the naming of the output image. The residual image, which is the
372difference between the input image and the reconstructed image, can
373also be saved in the same manner.
374
375Finally, note that any BLANK pixels that are still in the cube
376will not be altered by the reconstruction -- they will be left as
377BLANK so that the shape of the valid part of the cube is preserved.
378
379\subsection{Searching the image}
380\label{sec-detection}
381
382The image is searched for detections in two ways: spectrally (a
3831-dimensional search in the spectrum in each spatial pixel), and
384spatially (a 2-dimensional search in the spatial image in each
385channel). In both cases, the algorithm finds connected pixels that are
386above the user-specified threshold. In the case of the spatial image
387search, the algorithm of \citet{lutz80} is used to raster scan through
388the image and connect groups of pixels on neighbouring rows.
389
390Note that this algorithm cannot be applied directly to a 3-dimensional
391case, as it requires that objects are completely nested in a row: that
392is, if you are scanning along a row, and one object finishes and
393another starts, you know that you will not get back to the first one
394(if at all) until the second is finished for that
395row. Three-dimensional data does not have this property, which is why
396we break up the searching into 1- and 2-dimensional cases.
397
398The determination of the threshold is done in one of two ways. The
399first way is a simple sigma-clipping, where a threshold defined as
400$n\sigma$ above the mean is set and pixels above this threshold are
401flagged as detected. As before, the value for $\sigma$ is estimated by
402the MADFM, and corrected by the ratio derived in
403Appendix~\ref{app-madfm}.
404
405The second method uses the False Discovery Rate (FDR) technique
406\citep{miller01,hopkins02}, whose basis we briefly detail here. The
407false discovery rate (given by the number of false detections divided
408by the total number of detections) is fixed at a certain value
409$\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false
410positives). In practice, an $\alpha$ value is chosen, and the ensemble
411average FDR (\ie $<FDR>$) when the method is used will be less than
412$\alpha$.  One calculates $p$ -- the probability, assuming the null
413hypothesis is true, of obtaining a test statistic as extreme as the
414pixel value (the observed test statistic) -- for each pixel, and sorts
415them in increasing order. One then calculates $d$ where
416\[
417d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
418\]
419and then rejects all hypotheses whose $p$-values are less than or equal
420to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq
421j\alpha/c_N N$.) Note that ``reject hypothesis'' here means ``accept
422the pixel as an object pixel'' (\ie we are rejecting the null
423hypothesis that the pixel belongs to the background).
424
425The $c_N$ values here are normalisation constants that depend on the
426correlated nature of the pixel values. If all the pixels are
427uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their
428tests will be dependent on each other, and so $c_N = \sum_{i=1}^N
429i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels
430are correlated over the beam. In this case the sum is made over the
431$N$ pixels that make up the beam. The value of $N$ is calculated from
432the FITS header (if the correct keywords -- BMAJ, BMIN -- are not
433present, a default value of 10 pixels is assumed).
434
435If a reconstruction has been made, the residuals (defined as original
436$-$ reconstruction) are used to estimate the noise parameters of the
437cube. Otherwise they are estimated directly from the cube itself. In
438both cases, the median is used as a robust estimator of the mean
439value, although the $\sigma$ is estimated by the standard deviation
440(of the residual array, in the case of the reconstruction, but of the
441original array otherwise).
442
443Detections must have a minimum number of pixels to be counted. This
444minimum number is given by the input parameters {\tt minPix} (for
4452-dimensional searches) and {\tt minChannels} (for 1-dimensional
446searches). Note again that only positive thresholding is done --
447negative features are not searched for.
448
449\subsection{Merging detected objects}
450\label{sec-merger}
451
452The searching step produces a list of detections that will have many
453repeated detections of a given object -- for instance, spectral
454detections in adjacent pixels of the same object and/or spatial
455detections in neighbouring channels. These are then combined in an
456algorithm that matches all objects judged to be ``close''. This
457determination is made in one of two ways.
458
459One way is to define two thresholds -- one spatial and one in velocity
460-- and say that two objects should be merged if there is at least one
461pair of pixels that lie within these threshold distances of each
462other. These thresholds are specified by the parameters {\tt
463threshSpatial} and {\tt threshVelocity} (in units of pixels and
464channels respectively).
465
466Alternatively, the spatial requirement can be changed to say that
467there must be a pair of pixels that are {\it adjacent} -- a stricter,
468but more realistic requirement, particularly when the spatial pixels
469have a large angular size (as is the case for \hi\ surveys). This
470method can be selected by setting the parameter
471{\tt flagAdjacent} to 1 (\ie {\tt true}) in the parameter file. The
472velocity thresholding is done in the same way as the first option.
473
474Once the detections have been merged, they may be ``grown''. This is a
475process of increasing the size of the detection by adding adjacent
476pixels that are above some secondary threshold. This threshold is
477lower than the one used for the initial detection, but above the noise
478level, so that faint pixels are only detected when they are close to a
479bright pixel. The value of this threshold is a possible input
480parameter ({\tt growthCut}), with a default value of $1.5\sigma$. The
481use of the growth algorithm is controlled by the {\tt flagGrowth}
482parameter -- the default value of which is {\tt false}. If the
483detections are grown, they are sent through the merging algorithm a
484second time, to pick up any detections that now overlap or have grown
485over each other.
486
487Finally, to be accepted, the detections must span {\it both} a minimum
488number of channels (to remove any spurious single-channel spikes that
489may be present), and a minimum number of spatial pixels. These
490numbers, as for the original detection step, are set with the {\tt
491minChannels} and {\tt minPix} parameters. The channel requirement
492means there must be at least one set of this many consecutive channels
493in the source for it to be accepted.
494
495\section{Outputs}
496\label{sec-output}
497
498\subsection{During execution}
499
500Duchamp provides the user with feedback whilst it is running, to
501keep the user informed on the progress of the analysis. Most of this
502consists of self-explanatory messages about the particular stage the
503program is up to. The relevant parameters are printed to the screen at
504the start (once the file has been successfully read in), so the user
505is able to make a quick check that the setup is correct.
506
507If the cube is being trimmed (\S\ref{sec-modify}), the resulting
508dimensions are printed to indicate how much has been trimmed. If a
509reconstruction is being done, a continually updating message shows the
510current iteration and scale (compared to the maximum scale).
511
512During the searching algorithms, the progress through the 1D and 2D
513searches are shown. When the searches have completed,
514the number of objects found in both the 1D and 2D searches are
515reported (see \S\ref{sec-detection} for details).
516
517In the merging process (where multiple detections of the same object
518are combined -- see \S\ref{sec-merger}), two stages of output
519occur. The first is when each object in the list is compared with all
520others. The output shows two numbers: the first being how far through
521the list we are, and the second being the length of the list. As the
522algorithm proceeds, the first number should increase and the second
523should decrease (as objects are being combined). When the numbers
524meet, the second phase begins, of removing multiply-appearing pixels
525in each object and removing objects not meeting the minimum channels
526requirement. During this phase, the total number of accepted objects
527is shown, which should steadily increase until all have been accepted
528or rejected. Note that these steps can be very quick for small numbers
529of detections.
530
531Since this continual printing to screen has some overhead of time and
532CPU involved, the user can elect to not print this information by
533setting the parameter {\tt verbose = 0}. In this case, the user is
534still informed as to the steps being undertaken, but the details of
535the progress are not shown.
536
537\subsection{Results}
538
539Finally, we get to the results -- the reason for running Duchamp in
540the first place. Once the detection list is finalised, it is sorted by
541the mean velocity of the detections (or, if there is no good WCS with
542the cube, by the mean Z-pixel position). The results are then
543printed to the screen and to the output file, denoted by the {\tt
544OutFile} parameter. The results list, an example of which can be seen
545in Appendix~\ref{app-output}, contains the following columns:
546
547\begin{entry}
548\item[Obj\#] The ID number of the detection (simply the sequential
549  count for the list, which is ordered by increasing velocity).
550\item[Name] The IAU-format name of the detection (based on the RA \&
551Dec).
552\item[X] The average X-pixel position.
553\item[Y] The average Y-pixel position.
554\item[Z] The average Z-pixel position.
555\item[RA] The Right Ascension of the centre of the object.
556\item[DEC] The Declination of the centre of the object.
557\item[w\_RA] The width of the object in Right Ascension [arcmin].
558\item[w\_DEC] The width of the object in Declination [arcmin].
559\item[VEL] The mean velocity of the object [km/s].
560\item[w\_VEL] The full velocity width of the detection (max channel
561  $-$ min channel, in velocity units [km/s]).
562\item[X1, X2] The minimum and maximum X-pixel coordinates.
563\item[Y1, Y2] The minimum and maximum Y-pixel coordinates.
564\item[Z1, Z2] The minimum and maximum Z-pixel coordinates.
565\item[Npix] The number of pixels \& channels (\ie distinct $(x,y,z)$
566  coordinates) in the detection.
567\item[F\_tot] The integrated flux over the object, in the units of the
568  FITS file. %This is calculated
569\item[F\_peak] The peak flux over the object, in the units of the FITS
570  file.
571\end{entry}
572If the WCS is not valid (\ie is not present or does not have all the
573necessary information), the Name, RA, DEC, VEL and related columns are not
574printed, but the pixel coordinates are still provided.
575
576\begin{figure}[t]
577\begin{center}
578\includegraphics[width=\textwidth]{example_spectrum}
579\end{center}
580\caption{\footnotesize An example of the spectrum output. Note several
581  of the features discussed in the text: the removal of the Milky Way
582  emission around 0 km/s; the red lines indicating the reconstructed
583  spectrum; the blue dashed lines indicating the spectral extent of
584  the detection; and the blue border showing its spatial extent on the
585  0th moment map.}
586\label{fig-spect}
587\end{figure}
588
589Two alternative results files can also be requested. One option is a
590VOTable-format XML file, containing just the RA, Dec, Velocity and the
591corresponding widths of the detections, as well as the fluxes. The
592user should set {\tt flagVOT = 1}, and put the desired filename in the
593parameter {\tt votFile} -- note that the default is for it not to be
594produced. This file should be compatible with all Virtual Observatory
595tools (such as Aladin\footnote{ Aladin can be found on the web at
596\href{http://aladin.u-strasbg.fr/}{http://aladin.u-strasbg.fr/}}). The
597second option is an annotation file for use with the Karma toolkit of
598visualisation tools (in particular, with {\tt kvis}). This will draw a
599circle at the position of each detection, and number it according to
600the Obj\# given above. To use, the user should set {\tt flagKarma = 1},
601and put the desired filename in the parameter {\tt karmaFile} -- again,
602the default is for it not to be produced.
603
604As the program is running, it also (optionally) records the detections
605made in each individual spectrum or channel (see
606\S\ref{sec-detection} for details on this process). This is
607recorded in the file denoted by the parameter {\tt LogFile}. This file
608does not include the columns {\tt Name, RA, DEC, w\_RA, w\_DEC, VEL,
609w\_VEL}. This file is designed primarily for diagnostic purposes: \eg
610to see if a given set of pixels is detected in, say, one channel
611image, but does not survive the merging process. The list of pixels
612(and their fluxes) in the final detection list are also printed to
613this file, again for diagnostic purposes. This feature can be turned
614off by setting {\tt flagLog = false}. (This may be a good idea if you
615are not interested in its contents, as it can be a large file.)
616
617\begin{figure}[!t]
618\begin{center}
619\includegraphics[width=\textwidth]{example_moment_map}
620\end{center}
621\caption{\footnotesize An example of the moment map created by
622  Duchamp. The full extent of the cube is covered, and the 0th moment
623  of each object is shown (integrated individually over all the
624  detected channels).}
625\label{fig-moment}
626\end{figure}
627
628As well as the output data file, a postscript file is created that
629shows the integrated spectra of each detection, together with a small
630cutout image (0th moment) and basic information of the detection. If
631the cube was reconstructed, the spectrum from the reconstruction is
632shown in red, over the top of the original spectrum. The spectral
633extent of the detection is indicated with green lines, and a zoom is
634shown in a separate window. The cutout image can optionally include a
635border around the spatial pixels that are in the detection (turned on
636and off by the parameter {\tt drawBorders}). It also includes a scale
637bar in the bottom left corner to indicate size -- it is 30~arcmin
638long. An example detection can be seen below in Fig.~\ref{fig-spect}.
639
640Finally, a couple of images are optionally produced: a 0th moment map
641of the cube, combining just the detected channels in each object,
642showing the integrated flux in grey-scale; and a ``detection image'',
643a grey-scale image where the pixel values are the number of channels
644that spatial pixel is detected in. In both cases, if {\tt drawBorders =
645true}, a border is drawn around the spatial extent of each
646detection. An example moment map is shown in Fig.~\ref{fig-moment}.
647The production or otherwise of these images is governed by the {\tt
648flagMaps} parameter.
649
650The purpose of these images are to provide a visual guide to where the
651detections have been made, and, particularly in the case of the moment
652map, to provide an indication of the strength of the source. In both
653cases, the detections are numbered (in the same way as the output
654list), and the spatial borders are marked out as for the cutout images
655in the spectra file. Both these images are saved as postscript files
656(given by the parameters {\tt momentMap} and {\tt detectionMap}
657respectively), with the latter also displayed in a {\sc pgplot}
658window (regardless of the state of {\tt flagMaps}).
659
660\section{Notes and hints on the use of Duchamp}
661
662In using Duchamp, the user has to make a number of decisions about
663the way the program runs. This section is designed to give the user
664some idea about what the various selections do...
665
666The main choice is whether or not to use the wavelet
667reconstruction. The main benefits of this are the marked reduction in
668the noise level, leading to regularly-shaped detections, and good
669reliability for faint sources. The main drawback with its use is the
670long execution time: to reconstruct a $170\times160\times1024$
671(\hipass) cube often requires three iterations and takes about 20-25
672minutes. The searching part of the procedure is much quicker (although
673see the note on merging, below), so if one uses the FDR method on the
674un-reconstructed cube, the execution time is only a couple of minutes.
675
676A further drawback with the reconstruction is that it is susceptible
677to edge effects. If the valid area in the cube (\ie the part that is
678not BLANK) has very curved edges (such as the \hipass\ polar cap cube,
679H001, which has a roughly circular shape after gridding), the
680convolution can produce artefacts in the reconstruction that mimic
681the edges and lead to some spurious sources. Caution is advised with
682such data -- the user is advised to check carefully the reconstructed
683cube for the presence of such artefacts.
684
685If one chooses the reconstruction method, a further decision is
686required on the signal-to-noise cutoff used in determining acceptable
687wavelet coefficients. A larger value will remove more noise from the
688cube, at the expense of losing fainter sources, while a smaller value
689will include more noise, which may produce spurious detections, but
690will be more sensitive to faint sources. Values of less than about
691$3\sigma$ tend to not reduce the noise a great deal and can lead to
692many spurious sources.
693
694The FDR method certainly produces more reliable results than a simple
695sigma-clipping (\ie thresholding at some number of $\sigma$ above the
696mean). However, at this point it does not seem to be giving the
697sensitivity expected for the supplied value of {\tt alpha} (\ie it is
698not finding as many sources as expected). Work is
699being done to assess this, and to judge whether there is a real
700problem (such as with the determination of the statistics), or simply
701a result of working in 3 dimensions as opposed to 2.
702
703A further point to bear in mind is that the shape of the detections in
704a cube that has been reconstructed will be much more regular and
705smooth -- the ragged edges that objects in the raw cube possess are
706smoothed by the removal of most of the noise.
707
708Finally, as Duchamp is still undergoing development, there are some
709elements that are not fully developed. In particular, it is not as
710clever as I would like at avoiding interference. The ability to place
711requirements on the minimum number of channels and pixels partially
712circumvents this problem, but work is being done to make Duchamp
713smarter at rejecting signals that are clearly (to a human eye at
714least) interference. See the following section for further
715improvements that are planned.
716
717%\section{Drawbacks of the current program}
718%
719%The program currently has a few problems/drawbacks/things to be aware
720%of that will hopefully be fixed in the future:
721%\begin{itemize}
722%
723%\item Narrow interference spikes are still getting found, particularly
724%  if there is no reconstruction, or reconstruction with a relatively
725%  low {\tt snrRecon} (such as 2 or 3). Increasing the {\tt
726%  minChannels} parameter is one way to circumvent this, but making the
727%  algorithm a bit more clever would be preferable.
728%
729%\item Sources that have strong continuum ripple and/or artefacts often
730%  generate many spurious detections. This needs some work to avoid
731%  Duchamp doing this, and until then users are advised to be aware
732%  of the possibility. Strong continuum ripples may generate many
733%  sources on the same spatial pixel, and this will be apparent on the
734%  detection images.
735%
736%\item Spectra are integrated over every spatial pixel of the
737%  detection, and this may dilute the actual detection, making it
738%  harder to see \ie the apparent strength of the line as plotted may
739%  not give a true indication of how strong it really is.
740%
741%%\item A caution on the merging part of the procedure. This can be time
742%%  consuming if there are many detections that do not require merging
743%%  -- in this case, the time will go like $N^2$ ($N$ = number of
744%%  detections). If there are plenty of mergers, the size of the list
745%%  reduces quickly, so the execution time will be less.
746%
747%
748%\end{itemize}
749
750
751%\section{Comparison with other software (to be developed further...)}
752%
753%\subsection{fred, by Matt Howlett}
754%
755%This is the program used in the \hipass\ analysis. It smoothes the
756%data spectrally with a boxcar filter of a size that varies over a
757%user-specified range, and then thresholds the data.
758%
759%Works effectively, but generally doesn't find as many sources as
760%Duchamp, particularly when the reconstruction is used. Sensitive to
761%faint, broad features that fall below the reconstruction threshold.
762%
763%Execution takes a long time, depending on the range of filter widths
764%that are used.
765%
766%\subsection{sfind}
767%
768%Hard to evaluate, as it does not (as far as I can see) output the
769%channel number at which detections are made, and does not merge
770%detections made at adjacent channels (\ie it just works in 2
771%dimensions).
772%
773
774\section{Future Developments}
775
776This is both a list of planned improvements and a wish-list of
777features that would be nice to include (but are not planned in the
778immediate future):
779
780\begin{itemize}
781
782\item Ability to invert cube to search for absorption features. {\bf
783  Planned.}
784
785\item More varied output formats. {\bf Planned.}
786
787\item Better determination of the noise characteristics of
788  spectral-line cubes, including understanding how the noise is
789  generated and developing a model for it. {\bf Planned.}
790 
791\item Include more source analysis. Examples could be: shape
792  information; measurements of HI mass; better measurements of
793  velocity width and profile... {\bf Some planned.}
794
795\item Provide some indication of the significance of the detection
796  (\ie some S/N-like value). {\bf Planned.}
797
798\item Improved ability to reject interference, possibly on the
799  spectral shape of features. {\bf Planned.}
800
801\item Link to lists of possible counterparts (\eg via NED/SIMBAD/other
802  VO tools?). {\bf Wishlist.}
803
804\item Add ability to read in a reconstructed cube that has been
805  saved. In this case the residual array will also need to be read
806  in. The idea of this will be to avoid the extended time required for
807  the reconstruction if the same cube is being analysed multiple
808  times. {\bf Wishlist.}
809 
810\item At this point, the ``Milky Way'' channels are discarded and set
811  to zero. It may be that users would like to have those put back in
812  the final cube after the source detection is done, so at some point
813  this option may be added. {\bf Wishlist -- if needed.}
814
815\end{itemize}
816
817
818%\bibliographystyle{mn2e}
819%\bibliographystyle{abbrvnat}
820%\bibliography{mnrasmnemonic,sourceDetection}
821\begin{thebibliography}{}
822
823\bibitem[\protect\citeauthoryear{{Calabretta} \& {Greisen}}{{Calabretta} \&
824  {Greisen}}{2002}]{calabretta02}
825{Calabretta} M.,  {Greisen} E.,  2002, A\&A, 395, 1077
826
827\bibitem[\protect\citeauthoryear{{Greisen} \& {Calabretta}}{{Greisen} \&
828  {Calabretta}}{2002}]{greisen02}
829{Greisen} E.,  {Calabretta} M.,  2002, A\&A, 395, 1061
830
831\bibitem[\protect\citeauthoryear{{Hanisch}, {Farris}, {Greisen}, {Pence},
832  {Schlesinger}, {Teuben}, {Thompson} \& {Warnock}}{{Hanisch}
833  et~al.}{2001}]{hanisch01}
834{Hanisch} R.,  {Farris} A.,  {Greisen} E.,  {Pence} W.,  {Schlesinger} B.,
835  {Teuben} P.,  {Thompson} R.,    {Warnock} A.,  2001, A\&A, 376, 359
836
837\bibitem[\protect\citeauthoryear{{Hopkins}, {Miller}, {Connolly}, {Genovese},
838  {Nichol} \& {Wasserman}}{{Hopkins} et~al.}{2002}]{hopkins02}
839{Hopkins} A.,  {Miller} C.,  {Connolly} A.,  {Genovese} C.,  {Nichol} R.,
840  {Wasserman} L.,  2002, AJ, 123, 1086
841
842\bibitem[\protect\citeauthoryear{Lutz}{Lutz}{1980}]{lutz80}
843Lutz R.,  1980, The Computer Journal, 23, 262
844
845\bibitem[\protect\citeauthoryear{{Meyer} et~al.,}{{Meyer}
846  et~al.}{2004}]{meyer04:trunc}
847{Meyer} M.,  et~al., 2004, MNRAS, 350, 1195
848
849\bibitem[\protect\citeauthoryear{{Miller}, {Genovese}, {Nichol}, {Wasserman},
850  {Connolly}, {Reichart}, {Hopkins}, {Schneider} \& {Moore}}{{Miller}
851  et~al.}{2001}]{miller01}
852{Miller} C.,  {Genovese} C.,  {Nichol} R.,  {Wasserman} L.,  {Connolly} A.,
853  {Reichart} D.,  {Hopkins} A.,  {Schneider} J.,    {Moore} A.,  2001, AJ, 122,
854  3492
855
856\bibitem[\protect\citeauthoryear{Minchin}{Minchin}{1999}]{minchin99}
857Minchin R.,  1999, PASA, 16, 12
858
859\bibitem[\protect\citeauthoryear{Starck \& Murtagh}{Starck \&
860  Murtagh}{2002}]{starck02:book}
861Starck J.-L.,  Murtagh F.,  2002, {``Astronomical Image and Data Analysis''}.
862Springer
863
864\end{thebibliography}
865
866
867\appendix
868\newpage
869\section{Available parameters}
870\label{app-param}
871
872The full list of parameters that can be listed in the input file are
873given here. If not listed, they take the default value given in
874parentheses. Since the order of the parameters in the input file does
875not matter, they are grouped here in logical sections.
876
877\subsection*{Input-output related}
878\begin{entry}
879\item[ImageFile (no default assumed)] The filename of the
880  data cube to be analysed.
881\item[OutFile {\tt [./duchamp-Results]}] The file where the final
882  detections are to be recorded. This also records the list of input
883  parameters.
884\item[SpectraFile {\tt [./duchamp-Spectra.ps]}] The postscript file
885  containing the resulting integrated spectra and images of the
886  detections.
887\item[flagLog {\tt [true]}] A flag to indicate whether intermediate
888  detections should be logged.
889\item[LogFile {\tt [./duchamp-Logfile]}] The file in which intermediate
890  detections are logged. These are detections that have not been
891  merged. This is primarily for use in debugging and diagnostic
892  purposes -- normal use of the program will probably not require
893  this.
894\item[flagSubsection {\tt [false]}] A flag to indicate whether one
895  wants a subsection of the requested image.
896\item[Subsection {\tt [ [*,*,*] ]}] The requested subsection, which
897  should be specified in the format {\tt [x1:x2,y1:y2,z1:z2]}, where
898  the limits are inclusive. If the full range of a dimension is
899  required, use a {\tt *}, \eg if you want the full spectral range of
900  a subsection of the image, use {\tt [30:140,30:140,*]}.
901\item[flagOutputRecon {\tt [false]}] A flag to say whether or not to
902  save the reconstructed cube as a FITS file. The filename will be
903  derived from the ImageFile -- the reconstruction of {\tt image.fits}
904  will be saved as {\tt image.RECON?.fits}, where {\tt ?} stands for
905  the value of {\tt snrRecon} (see below).
906\item[flagOutputResid {\tt [false]}] As for {\tt flagOutputRecon}, but
907  for the residual array -- the difference between the original cube
908  and the reconstructed cube. The filename will be {\tt
909  image.RESID?.fits}.
910\item[flagVOT {\tt [false]}] A flag to say whether to create a VOTable
911  file corresponding to the information in {\tt outfile}. This will be
912  an XML file in the Virtual Observatory VOTable format.
913\item[votFile {\tt [./duchamp-Results.xml]}] The VOTable file with the
914  list of final detections. Some input parameters are also recorded.
915\item[flagKarma {\tt [false]}] A flag to say whether to create a Karma
916  annotation file corresponding to the information in {\tt
917  outfile}. This can be used as an overlay for the Karma programs such
918  as {\tt kvis}.
919\item[karmaFile {\tt [./duchamp-Results.ann]}] The Karma annotation
920  file showing the list of final detections.
921\item[flagMaps {\tt [true]}] A flag to say whether to save postscript
922  files showing the 0th moment map of the whole cube (parameter {\tt
923  momentMap}) and the detection image ({\tt detectionMap}).
924\item[momentMap {\tt [./latest-moment-map.ps]}] A postscript file
925  containing a map of the 0th moment of the detected sources, as well
926  as pixel and WCS coordinates.
927\item[detectionMap {\tt [./latest-detection-map.ps]}] A postscript
928  file showing each of the detected objects, coloured in greyscale by
929  the number of channels they span. Also shows pixel and WCS
930  coordinates.
931\end{entry}
932
933\subsection*{Modifying the cube}
934\begin{entry}
935\item[flagBlankPix {\tt [true]}] A flag to say whether to remove BLANK
936  pixels from the analysis -- these are pixels set to some particular
937  value because they fall outside the imaged area.
938\item[blankPixValue {\tt [-8.00061]}] The value of the BLANK pixels,
939  if this information is not contained in the FITS header (the usual
940  procedure is to obtain this value from the header information -- in
941  which case the value set by this parameter is ignored).
942\item[flagMW {\tt [true]}] A flag to say whether to remove channels
943  contaminated by Milky Way (or other) emission -- the flux in these
944  channels is currently just set to 0.
945\item[maxMW {\tt [112]}] The maximum channel for the Milky Way
946  emission.
947\item[minMW {\tt [75]}] The minimum channel for the Milky Way
948  emission. Note that the channels specified by {\tt maxMW} and {\tt
949  minMW} are assumed to be Milky Way channels (\ie the range is
950  inclusive).
951\item[flagBaseline {\tt [false]}] A flag to say whether to remove the
952  baseline from each spectrum in the cube for the purposes of
953  reconstruction and detection.
954\end{entry}
955
956\subsection*{Detection related}
957
958\subsubsection*{General detection}
959\begin{entry}
960\item[snrCut {\tt [3.]}] The cut-off value for thresholding, in terms
961  of number of $\sigma$ above the mean.
962\item[flagGrowth {\tt [true]}] A flag indicating whether or not to
963  grow the detected objects to a smaller threshold.
964\item[growthCut {\tt [1.5]}] The smaller threshold using in growing
965  detections. In units of $\sigma$ above the mean.
966\end{entry}
967
968\subsubsection*{{\` a} trous reconstruction}
969\begin{entry}
970\item [flagATrous {\tt [true]}] A flag indicating whether or not to
971  reconstruct the cube using the {\it {\`a} trous} wavelet
972  reconstruction. Currently does this in 3-dimensions. See
973  \S\ref{sec-recon} for details.
974\item[scaleMin {\tt [1]}] The minimum wavelet scale to be used in the
975  reconstruction. A value of 1 means ``use all scales''.
976\item[snrRecon {\tt [4]}] The thresholding cutoff used in the
977  reconstruction -- only wavelet coefficients this many $\sigma$ above
978  the mean (or greater) are included in the reconstruction.
979\item[filterCode {\tt [2]}] The code number of the filter to use in
980  the reconstruction. The options are:
981  \begin{itemize}
982  \item {\bf 1:} B$_3$-spline filter: coefficients =
983    $(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$
984  \item {\bf 2:} Triangle filter: coefficients = $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
985  \item {\bf 3:} Haar wavelet: coefficients = $(0, \frac{1}{2}, \frac{1}{2})$
986  \end{itemize}
987\end{entry}
988
989\subsubsection*{FDR method}
990\begin{entry}
991\item[flagFDR {\tt [false]}] A flag indicating whether or not to use
992  the False Discovery Rate method in thresholding the pixels.
993\item[alphaFDR {\tt [0.01]}] The $\alpha$ parameter used in the FDR
994analysis. The average number of false detections, as a fraction of the
995total number, will be less than $\alpha$ (see \S\ref{sec-detection}).
996\end{entry}
997
998\subsubsection*{Merging detections}
999\begin{entry}
1000\item[flagAdjacent {\tt [true]}] A flag indicating whether to use the
1001  ``adjacent pixel'' criterion to decide whether to merge objects. If
1002  not, the next two parameters are used to determine whether objects
1003  are within the necessary thresholds.
1004\item[minPix {\tt [2]}] The minimum number of spatial pixels for a single
1005  detection to be counted.
1006\item[minChannels {\tt [3]}] The minimum number of consecutive
1007  channels that must be present in the detection for it to be accepted
1008  by the Merging algorithm.
1009%The minimum number of channels that a
1010%  detection must span for it to be accepted by the Merging algorithm.
1011\item[threshSpatial {\tt [3.]}] The maximum allowed minimum spatial
1012  separation (in pixels) between two detections for them to be merged
1013  into one. Only used if {\tt flagAdjacent = false}.
1014\item[threshVelocity {\tt [7.]}] The maximum allowed minimum channel
1015  separation between two detections for them to be merged into
1016  one. %Only used if {\tt flagAdjacent = false}.
1017\end{entry}
1018
1019\subsubsection*{Other parameters}
1020\begin{entry}
1021\item[drawBorders {\tt [true]}] A flag indicating whether borders
1022  are to be drawn around the detected objects in the moment maps
1023  included in the output (see for example Fig.~\ref{fig-spect}).
1024\item[verbose {\tt [true]}] A flag indicating whether to print the
1025  progress of computationally-intensive algorithms (such as the
1026  searching and merging) to screen.
1027\end{entry}
1028
1029
1030\newpage
1031\section{Example parameter files}
1032\label{app-input}
1033
1034This is what a typical parameter file would look like.
1035
1036\begin{verbatim}
1037imageFile       /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1038logFile         temp-Logfile
1039outFile         temp-Results
1040spectraFile     spectra.ps
1041flagSubsection  0
1042flagOutputRecon 0
1043flagOutputResid 0
1044flagBlankPix    1
1045blankPixValue   -8.00061
1046flagMW          1
1047minMW           75
1048maxMW           112
1049minPix          3
1050flagGrowth      1
1051growthCut       1.5
1052flagATrous      0
1053scaleMin        1
1054snrRecon        4
1055flagFDR         1
1056alphaFDR        0.1
1057numPixPSF       20
1058snrCut          3
1059threshSpatial   3
1060threshVelocity  7
1061minChannels     4
1062\end{verbatim}
1063
1064Note that it is not necessary to include all these parameters in the
1065file, only those that need to be changed from the defaults (as listed
1066in Appendix~\ref{app-param}), which in this case would be very few. A
1067minimal parameter file might look like:
1068\begin{verbatim}
1069imageFile       /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1070flagLog         0
1071snrRecon        3
1072snrCut          2.5
1073minChannels     3
1074\end{verbatim}
1075This will reconstruct the cube with a lower SNR value than the
1076default, select objects at a lower threshold,  with a looser minimum
1077channel requirement, and not keep a log of the intermediate
1078detections.
1079
1080The following page demonstrates how the parameters are presented to
1081the user, both on the screen at execution time and in the output and
1082log files:
1083\newpage
1084\begin{landscape}
1085Presentation of parameters in output and log files: 
1086\begin{verbatim}
1087---- Parameters ----
1088Image to be analysed                    = /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1089Intermediate Logfile                    = logfile.txt
1090Final Results file                      = results.txt
1091Spectrum file                           = spectra.ps
1092VOTable file                            = results.xml
10930th Moment Map                          = latest-moment-map.ps
1094Detection Map                           = latest-detection-map.ps
1095Saving reconstructed cube?              = false
1096Saving residuals from reconstruction?   = false
1097------
1098Fixing Blank Pixels?                    = true
1099Blank Pixel Value                       = -8.00061
1100Removing Milky Way channels?            = true
1101Milky Way Channels                      = 75-112
1102Beam Size (pixels)                      = 10.1788
1103Removing baselines before search?       = false
1104Minimum # Pixels in a detection         = 2
1105Growing objects after detection?        = false
1106Using A Trous reconstruction?           = true
1107Minimum scale in reconstruction         = 1
1108SNR Threshold within reconstruction     = 4
1109Filter being used for reconstruction    = B3 spline function
1110Using FDR analysis?                     = false
1111SNR Threshold                           = 2.5
1112Using Adjacent-pixel criterion?         = true
1113Min. # channels for merging             = 4
1114--------------------
1115\end{verbatim}
1116
1117\newpage
1118\section{Example output file}
1119\label{app-output}
1120This the typical content of an output file, after running Duchamp
1121with the parameters illustrated on the previous page.
1122
1123{\scriptsize
1124  \begin{verbatim}
1125Results of the Duchamp source finder: Tue Mar 21 16:28:50 EST 2006
1126---- Parameters ----
1127
1128(... omitted for clarity -- see previous page for examples...)
1129
1130--------------------
1131Total number of detections = 23
1132--------------------
1133 Obj#          Name     X     Y      Z           RA          DEC    w_RA   w_DEC       VEL    w_VEL  X1  X2  Y1  Y2   Z1   Z2  Npix     F_tot  F_peak
1134-----------------------------------------------------------------------------------------------------------------------------------------------------
1135    1    J0609-2200  59.4 140.6  114.7  06:09:38.50 -22:00:48.20   48.50   39.42   213.061   65.957  55  66 136 145  113  118   185   17.5725  0.2125
1136    2    J0608-2605  65.2  79.6  116.2  06:08:10.23 -26:05:06.57   44.47   39.47   233.119   39.574  60  70  76  85  115  118    50    4.1441  0.1002
1137    3    J0606-2724  70.8  59.8  121.4  06:06:33.08 -27:24:43.28   52.48   47.57   302.213   39.574  65  77  53  64  120  123   213   17.0659  0.1497
1138    4    J0611-2142  52.5 145.1  162.5  06:11:36.34 -21:42:00.01   32.40   23.47   843.727  118.722  49  56 142 147  158  167   303   44.3940  0.4103
1139    5    J0600-2903  89.7  35.3  202.4  06:00:51.38 -29:03:02.51   23.94   28.09  1370.285  184.679  87  92  32  38  195  209   319   26.5725  0.1729
1140    6    J0559-2643  95.5  70.2  222.6  05:59:10.59 -26:43:05.94   15.94   12.09  1637.316  105.531  94  97  69  71  219  227    35    1.9253  0.0630
1141    7    J0617-2727  34.8  58.3  227.5  06:17:24.45 -27:27:53.89   20.77   23.41  1701.802  303.400  33  37  56  61  215  238   176   11.4138  0.0929
1142    8    J0609-2145  60.3 144.4  229.6  06:09:23.17 -21:45:36.06   16.15   11.81  1729.279  105.531  59  62 143 145  225  233    25    1.4760  0.0679
1143    9    J0559-2529  95.7  88.6  231.1  05:59:08.81 -25:29:34.50   27.88   24.14  1749.440  250.635  92  98  86  91  220  239   257   16.9297  0.1155
1144   10    J0601-2145  88.9 144.4  232.3  06:01:10.14 -21:45:58.59   31.96   24.13  1764.657  224.253  86  93 142 147  222  239   415   34.0304  0.1655
1145   11    J0615-2638  40.0  70.8  232.6  06:15:44.32 -26:38:29.42   16.56   19.57  1769.033   52.765  38  41  69  73  231  235    44    2.7565  0.0685
1146   12    J0605-2610  75.9  78.4  233.1  06:05:00.16 -26:10:21.68   28.14   23.84  1775.066  224.253  73  79  76  81  225  242   352   27.0587  0.1545
1147   13    J0601-2344  88.0 114.9  235.7  06:01:25.72 -23:44:18.18   35.96   32.07  1809.749  263.826  84  92 112 119  226  246   724   85.1317  0.2968
1148   14    J0615-2238  38.2 130.6  253.6  06:15:48.32 -22:38:45.75   12.39   15.70  2046.530  118.722  37  39 129 132  248  257    40    2.3169  0.0697
1149   15    J0617-2309  31.4 122.8  258.0  06:17:51.07 -23:09:29.22   16.46   15.53  2103.912   39.574  30  33 121 124  256  259    23    1.4243  0.0624
1150   16    J0612-2153  49.5 142.3  271.1  06:12:29.32 -21:53:16.05   24.36   19.56  2276.976  395.740  47  52 140 144  257  287   318   20.7117  0.1008
1151   17    J0616-2137  35.2 145.9  300.0  06:16:34.07 -21:37:30.95   20.22    7.46  2658.607  224.252  33  37 145 146  294  311    40    3.8507  0.1271
1152   18    J0544-2740 144.0  54.9  325.4  05:44:31.07 -27:40:42.30    3.58   12.13  2993.384   39.574 144 144  54  56  324  327     7    0.4362  0.0569
1153   19    J0555-3000 107.2  20.7  367.5  05:55:28.58 -30:00:46.76   19.67   24.31  3547.812   39.574 105 109  18  23  366  369    72    6.4819  0.1692
1154   20    J0559-2325  96.0 119.6  532.1  05:59:04.98 -23:25:19.04   11.92   16.08  5720.289   52.765  95  97 118 121  530  534    27    1.2865  0.0508
1155   21    J0616-2653  37.9  67.0  547.0  06:16:23.08 -26:53:11.73   12.36   11.67  5916.731   39.574  37  39  66  68  546  549    25    1.6374  0.0642
1156   22    J0619-2256  25.1 125.9  724.2  06:19:39.49 -22:56:06.15   12.38   11.60  8254.112   39.573  24  26 125 127  723  726    13    0.6982  0.0593
1157   23    J0552-2920 116.9  30.5  727.0  05:52:33.03 -29:20:54.43   11.60   20.25  8290.842  303.400 116 118  28  32  716  739   132   35.8343  0.4787
1158  \end{verbatim}
1159}
1160
1161%\end{landscape}
1162
1163\newpage
1164\section{Example VOTable output}
1165\label{app-votable}
1166This is part of the VOTable, in XML format, corresponding to the
1167output file in Appendix~\ref{app-output} (the indentation has been removed to make it fit on the page!).
1168
1169%\begin{landscape}
1170{\scriptsize
1171  \begin{verbatim}
1172<?xml version="1.0"?>
1173<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1174 xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
1175<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
1176<RESOURCE name="Duchamp Output">
1177<TABLE name="Detections">
1178<DESCRIPTION>Detected sources and parameters from running the Duchamp source finder.</DESCRIPTION>
1179<PARAM name="FITS file" datatype="char" ucd="meta.file;meta.fits" value="/DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits"/>
1180<FIELD name="ID" ID="col1" ucd="meta.id" datatype="int" width="4"/>
1181<FIELD name="Name" ID="col2" ucd="meta.id;meta.main" datatype="char" arraysize="14"/>
1182<FIELD name="RA" ID="col3" ucd="pos.eq.ra;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1183<FIELD name="Dec" ID="col4" ucd="pos.eq.dec;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1184<FIELD name="w_RA" ID="col3" ucd="phys.angSize;pos.eq.ra" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1185<FIELD name="w_Dec" ID="col4" ucd="phys.angSize;pos.eq.dec" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1186<FIELD name="Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc" datatype="float" width="9" precision="3" unit="km/s"/>
1187<FIELD name="w_Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc;spect.line.width" datatype="float" width="8" precision="3" unit="km/s"/>
1188<FIELD name="Integrated_Flux" ID="col4" ucd="phys.flux;spect.line.intensity" datatype="float" width="10" precision="3" unit="km/s"/>
1189<DATA>
1190<TABLEDATA>
1191<TR>
1192<TD>   1</TD><TD>    J0609-2200</TD><TD> 92.410416</TD><TD>-22.013390</TD><TD>  48.50</TD><TD>  39.42</TD><TD>  213.061</TD><TD>  65.957</TD><TD>    17.572</TD>
1193</TR>
1194<TR>
1195<TD>   2</TD><TD>    J0608-2605</TD><TD> 92.042633</TD><TD>-26.085157</TD><TD>  44.47</TD><TD>  39.47</TD><TD>  233.119</TD><TD>  39.574</TD><TD>     4.144</TD>
1196</TR>
1197<TR>
1198<TD>   3</TD><TD>    J0606-2724</TD><TD> 91.637840</TD><TD>-27.412022</TD><TD>  52.48</TD><TD>  47.57</TD><TD>  302.213</TD><TD>  39.574</TD><TD>    17.066</TD>
1199</TR>
1200<TR>
1201<TD>   4</TD><TD>    J0611-2142</TD><TD> 92.901421</TD><TD>-21.700003</TD><TD>  32.40</TD><TD>  23.47</TD><TD>  843.727</TD><TD> 118.722</TD><TD>    44.394</TD>
1202</TR>
1203<TR>
1204<TD>   5</TD><TD>    J0600-2903</TD><TD> 90.214081</TD><TD>-29.050697</TD><TD>  23.94</TD><TD>  28.09</TD><TD> 1370.285</TD><TD> 184.679</TD><TD>    26.573</TD>
1205</TR>
1206(... table truncated for clarity ...)
1207</TABLEDATA>
1208</DATA>
1209</TABLE>
1210</RESOURCE>
1211</VOTABLE>
1212  \end{verbatim}
1213}
1214\end{landscape}
1215
1216\section{Robust statistics for a Normal distribution}
1217\label{app-madfm}
1218
1219The Normal, or Gaussian, distribution for mean $\mu$ and standard
1220deviation $\sigma$ can be written as
1221\[
1222f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
1223 \]
1224
1225When one has a purely Gaussian signal, it is straightforward to
1226estimate $\sigma$ by calculating the standard deviation (or rms) of
1227the data. However, if there is a small amount of signal present on top
1228of Gaussian noise, and one wants to estimate the $\sigma$ for the
1229noise, the presence of the large values from the signal can bias the
1230estimator to higher values.
1231
1232An alternative way is to use the median ($m$) and median absolute deviation
1233from the median ($s$) to estimate $\mu$ and $\sigma$. The median is the
1234middle of the distribution, defined for a continuous distribution by
1235\[
1236\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
1237\]
1238From symmetry, we quickly see that for the continuous Normal
1239distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
1240without loss of generality.
1241
1242To find $s$, we find the distribution of the absolute deviation from
1243the median, and then find the median of that distribution. This
1244distribution is given by
1245\begin{eqnarray*}
1246g(x) &= &{\mbox{\rm distribution of }} |x|\\
1247     &= &f(x) + f(-x),\ x\ge0\\
1248     &= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
1249\end{eqnarray*}
1250So, the median absolute deviation from the median, $s$, is given by
1251\[
1252\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
1253\]
1254Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and
1255so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
1256\sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}} \diff x
1257$. Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for
1258simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$):
1259\[
1260\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
1261\]
1262This is hard to solve analytically (no nice analytic solution exists
1263for the finite integral that I'm aware of), but straightforward to
1264solve numerically, yielding the value of $s=0.6744888$. Thus, to
1265estimate $\sigma$ for a Normally distributed data set, one can calculate
1266$s$, then divide by 0.6744888 (or multiply by 1.4826042) to obtain the
1267correct estimator.
1268
1269Note that this is different to solutions quoted elsewhere,
1270specifically in \citet{meyer04:trunc}, where the same robust estimator
1271is used but with an incorrect conversion to standard deviation -- they
1272assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
1273to convert the {\it mean} absolute deviation from the mean to the
1274standard deviation. This means that the cube noise in the \hipass\
1275catalogue (parameter Rms$_{\rm cube}$) should be 18\% larger than
1276quoted.
1277
1278%a different value for $s/\sigma$ (of
1279%$\sqrt{2/\pi}\approx0.797885$) is quoted. It should thus be noted that
1280%this means the values quoted by \citet{meyer04:trunc} for the cube noise in
1281%the \hipass\ catalogue should be 18\% larger (since 0.1486042 is 18\%
1282%larger than $\sqrt{\pi/2}\approx1.253314$).
1283
1284\section{How Gaussian noise changes with wavelet scale.}
1285\label{app-scaling}
1286
1287The key element in the wavelet reconstruction of an array is the
1288thresholding of the individual wavelet coefficient arrays. This is
1289usually done by choosing a level to be some number of standard
1290deviations above the mean value.
1291
1292However, since the wavelet arrays are produced by convolving the input
1293array by an increasingly large filter, the pixels in the coefficient
1294arrays become increasingly correlated as the scale of the filter
1295increases. This results in the measured standard deviation from a
1296given coefficient array decreasing with increasing scale. To calculate
1297this, we need to take into account how many other pixels each pixel in
1298the convolved array depends on.
1299
1300To demonstrate, suppose we have a 1-D array with $N$ pixel values
1301given by $F_i,\ i=1,...,N$, and we convolve it with the B$_3$-spline
1302filter with coefficients $\{1/16,1/4,3/8,1/4,1/16\}$. The flux of the
1303$i$th pixel in the convolved array will be
1304\[
1305F'_i = \frac{1}{16}F_{i-2} + \frac{1}{16}F_{i-2} + \frac{3}{8}F_{i}
1306+ \frac{1}{4}F_{i-1} + \frac{1}{16}F_{i+2}
1307\]
1308and the flux of the corresponding pixel in the wavelet array will be
1309\[
1310W'_i = F_i - F'_i = \frac{1}{16}F_{i-2} + \frac{1}{16}F_{i-2} + \frac{5}{8}F_{i}
1311+ \frac{1}{4}F_{i-1} + \frac{1}{16}F_{i+2}
1312\]
1313Now, assuming each pixel has the same standard deviation
1314$\sigma_i=\sigma$, we can work out the standard deviation for the
1315coefficient array:
1316\[
1317\sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2 + \left(\frac{1}{4}\right)^2
1318  + \left(\frac{5}{8}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2}
1319          = 0.72349\ \sigma
1320\]
1321Thus, the first scale wavelet coefficient array will have a standard
1322deviation of 72.3\% of the input array. This procedure can be followed
1323to calculate the necessary values for all scales, dimensions and
1324filters used by Duchamp.
1325
1326Calculating these values is, therefore, a critical step in performing
1327the reconstruction. \citet{starck02:book} did so by simulating data sets
1328with Gaussian noise, taking the wavelet transform, and measuring the
1329value of $\sigma$ for each scale. We take a different approach, by
1330calculating the scaling factors directly from the filter coefficients
1331by taking the wavelet transform of an array made up of a 1 in the
1332central pixel and 0s everywhere else. The scaling value is then
1333derived by adding in quadrature all the wavelet coefficient values at
1334each scale. We give the scaling factors for the three filters
1335available to Duchamp on the following page. These values are
1336hard-coded into Duchamp, so no on-the-fly calculation of them is
1337necessary.
1338
1339Memory limitations prevent us from calculating factors for large
1340scales, particularly for the three-dimensional case (hence the --
1341symbols in the tables). To calculate factors for
1342higher scales than those available, we note the following
1343relationships apply for large scales to a sufficient level of precision:
1344\begin{itemize}
1345\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
1346\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
1347\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
1348\end{itemize}
1349
1350\newpage
1351\begin{itemize}
1352\item {\bf B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$
1353
1354\begin{tabular}{llll}
1355Scale & 1 dimension      & 2 dimension     & 3 dimension\\ \hline
13561     & 0.723489806      & 0.890796310     & 0.956543592\\
13572     & 0.285450405      & 0.200663851     & 0.120336499\\
13583     & 0.177947535      & 0.0855075048    & 0.0349500154\\
13594     & 0.122223156      & 0.0412474444    & 0.0118164242\\
13605     & 0.0858113122     & 0.0204249666    & 0.00413233507\\
13616     & 0.0605703043     & 0.0101897592    & 0.00145703714\\
13627     & 0.0428107206     & 0.00509204670   & 0.000514791120\\
13638     & 0.0302684024     & 0.00254566946   & --\\
13649     & 0.0214024008     & 0.00127279050   & --\\
136510    & 0.0151336781     & 0.000636389722  & --\\
136611    & 0.0107011079     & 0.000318194170  & --\\
136712    & 0.00756682272    & --              & --\\
136813    & 0.00535055108    & --              & --\\
1369%14    & 0.00378341085   & --              & --\\
1370%15    & 0.00267527545   & --              & --\\
1371%16    & 0.00189170541   & --              & --\\
1372%17    & 0.00133763772   & --              & --\\
1373%18    & 0.000945852704   & --             & --
1374\end{tabular}
1375
1376\item {\bf Triangle Function:} $\{1/4,1/2,1/4\}$
1377
1378\begin{tabular}{llll}
1379Scale & 1 dimension      & 2 dimension     & 3 dimension\\ \hline
13801     & 0.612372436      & 0.800390530     & 0.895954449  \\
13812     & 0.330718914      & 0.272878894     & 0.192033014\\
13823     & 0.211947812      & 0.119779282     & 0.0576484078\\
13834     & 0.145740298      & 0.0577664785    & 0.0194912393\\
13845     & 0.102310944      & 0.0286163283    & 0.00681278387\\
13856     & 0.0722128185     & 0.0142747506    & 0.00240175885\\
13867     & 0.0510388224     & 0.00713319703   & 0.000848538128 \\
13878     & 0.0360857673     & 0.00356607618   & 0.000299949455 \\
13889     & 0.0255157615     & 0.00178297280   & -- \\
138910    & 0.0180422389     & 0.000891478237  & --  \\
139011    & 0.0127577667     & 0.000445738098  & --  \\
139112    & 0.00902109930    & 0.000222868922  & --  \\
139213    & 0.00637887978    & --              & -- \\
1393%14   & 0.00451054902    & --              & -- \\
1394%15   & 0.00318942978    & --              & -- \\
1395%16   & 0.00225527449    & --              & -- \\
1396%17   & 0.00159471988    & --              & -- \\
1397%18   & 0.000112763724   & --              & --
1398
1399\end{tabular}
1400
1401\item {\bf Haar Wavelet:} $\{0,1/2,1/2\}$
1402
1403\begin{tabular}{llll}
1404Scale & 1 dimension      & 2 dimension     & 3 dimension\\ \hline
14051     & 0.707167810      & 0.433012702     & 0.935414347 \\
14062     & 0.500000000      & 0.216506351     & 0.330718914\\
14073     & 0.353553391      & 0.108253175     & 0.116926793\\
14084     & 0.250000000      & 0.0541265877    & 0.0413398642\\
14095     & 0.176776695      & 0.0270632939    & 0.0146158492\\
14106     & 0.125000000      & 0.0135316469    & 0.00516748303
1411
1412\end{tabular}
1413
1414
1415\end{itemize}
1416
1417\end{document}
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
Note: See TracBrowser for help on using the repository browser.