% source: tags/release-0.9.2/docs/Guide.tex @ 1455
% Last change on this file since 1455 was 88, checked in by Matthew Whiting, 18 years ago:
% Some minor fixes to the spectral plots to aid readability and consistency of presentation. Some minor edits to the Guide.
% File size: 75.8 KB
\documentclass[12pt,a4paper]{article}

%Define a test for doing PDF format -- use different code below
\newif\ifPDF
\ifx\pdfoutput\undefined\PDFfalse
\else\ifnum\pdfoutput > 0\PDFtrue
\else\PDFfalse
\fi
\fi

\textwidth=161 mm
\textheight=245 mm
\topmargin=-15 mm
\oddsidemargin=0 mm
\parindent=6 mm

\usepackage[sort]{natbib}
\usepackage{lscape}
\bibpunct[,]{(}{)}{;}{a}{}{,}

\newcommand{\eg}{e.g.\ }
\newcommand{\ie}{i.e.\ }
\newcommand{\hi}{H{\sc i}}
\newcommand{\hipass}{{\sc hipass}}
\newcommand{\progname}{{\tt Duchamp}}
\newcommand{\diff}{{\rm d}}
\newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil}
\newenvironment{entry}
{\begin{list}{}%
{\renewcommand{\makelabel}{\entrylabel}%
\setlength{\labelwidth}{30mm}%
\setlength{\labelsep}{5pt}%
\setlength{\itemsep}{2pt}%
\setlength{\parsep}{2pt}%
\setlength{\leftmargin}{35mm}%
}%
}%
{\end{list}}


\title{A Guide to the {\it Duchamp} Source Finding Software}
\author{Matthew Whiting\\
%{\small \href{mailto:Matthew.Whiting@csiro.au}{Matthew.Whiting@csiro.au}}\\
Australia Telescope National Facility\\CSIRO}
%\date{January 2006}
\date{}

% If we are creating a PDF, use different options for graphicx, hyperref.
\ifPDF
\usepackage[pdftex]{graphicx,color}
\usepackage[pdftex]{hyperref}
\hypersetup{colorlinks=true,%
citecolor=red,%
filecolor=red,%
linkcolor=red,%
urlcolor=red,%
}
\else
\usepackage[dvips]{graphicx}
\usepackage[dvips]{hyperref}
\fi

\pagestyle{headings}
\begin{document}

\maketitle
\thispagestyle{empty}
\begin{figure}[!h]
\begin{center}
\includegraphics[width=\textwidth]{cover_image}
\end{center}
\end{figure}

\newpage
\tableofcontents

\newpage
\section{Introduction and getting going quickly}

This document gives details on the use of the program Duchamp, which
has been designed to provide a source-detection facility for
spectral-line data cubes. In its basic execution, Duchamp reads in a
FITS data cube, finds sources in the cube, and produces a text file of
the positions, velocities and fluxes of the detections, as well as a
postscript file of the spectra of each detection.

So, you have a FITS cube, and you want to find the sources in it. What
do you do? The first step is to make an input file that contains the
list of parameters. Brief and detailed examples are shown in
Appendix~\ref{app-input}. The parameter file provides the name of the
input file and of the various output files, and defines the parameters
that control the execution.

The standard way to run Duchamp is by the command
\begin{quote}
{\tt Duchamp -p [parameter file]}
\end{quote}
replacing {\tt [parameter file]} with the name of the file you have
just created/copied. Alternatively, you can use the syntax
\begin{quote}
{\tt Duchamp -f [FITS file]}
\end{quote}
where {\tt [FITS file]} is the file you wish to search. In the latter
case, the rest of the parameters will take their default values
detailed in Appendix~\ref{app-param}. In either case, the program will
then work away and give you the list of detections and their
spectra. The program execution is summarised below, and detailed in
\S\ref{sec-flow}. Information on inputs is in \S\ref{sec-param} and
Appendix~\ref{app-param}, and descriptions of the outputs are in
\S\ref{sec-output}.

\subsection{A summary of the execution steps}

The basic flow of the program is summarised here. All these steps are
discussed in more detail in the following sections, so read on if
you have questions!
\begin{enumerate}
\item The parameter file given on the command line is read in, and the
parameters absorbed.
\item From the parameter file, the FITS image is located and read in
to memory.
\item If requested, a FITS image with a previously reconstructed array
is read in.
\item If requested, blank pixels are trimmed from the edges, and
channels corresponding to bright (\eg Galactic) emission are
excised.
\item If requested, the baseline of each spectrum is removed.
\item If the reconstruction method is requested, and the reconstructed
array has not been read in at Step 3 above, the cube is
reconstructed using the {\it {\`a} trous} wavelet method.
\item Searching for objects then takes place, using the requested
thresholding method.
\item The list of objects is trimmed by merging neighbouring objects
and removing those deemed unacceptable.
\item The baselines and trimmed pixels are replaced prior to output.
\item The details of the detections are written to screen and to the
requested output file.
\item Maps showing the spatial location of the detections are written.
\item The integrated spectra of each detection are written to a
postscript file.
\item If requested, the reconstructed array can be written to a new
FITS file.
\end{enumerate}

\subsection{Guide to terminology}

First, a brief note on the use of terminology in this guide. Duchamp
is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is
the Flexible Image Transport System -- see \citet{hanisch01} or
websites such as
\href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html}
for details.} image arrays with three dimensions -- they are assumed
to have the following form: the first two dimensions (referred to as
$x$ and $y$) are spatial directions (that is, relating to the position
on the sky), while the third dimension, $z$, is the spectral
direction, which can correspond to frequency, wavelength, or velocity.

Each spatial pixel (a given $(x,y)$ coordinate) contains a single
spectrum, while a slice through the cube perpendicular to the spectral
direction at a given $z$-value is a single channel (the corresponding
2-D image is a channel map).

Features that are detected are assumed to be positive. The user can
choose to search for negative features by setting an input parameter
({\tt flagNegative}) -- this inverts the cube prior to the search (see
\S\ref{sec-detection} for details).

Note that it is possible to run Duchamp on a two-dimensional image
(\ie one with no frequency or velocity information), or indeed a
one-dimensional array, and many of the features of the program will
work fine. The focus, however, is on object detection in three
dimensions.

\subsection{Why ``Duchamp''?}

Well, it's important for a program to have a name, and it certainly
beats the initial working title of ``cubefind''. I had planned to call
it ``Picasso'' (as in the father of cubism), but sadly this name had
already been used \citep{minchin99}. So I settled on naming it
after Marcel Duchamp, another cubist, but also one of the first
artists to work with ``found objects''.

\section{User Inputs}
\label{sec-param}

Input to the program is provided by means of a parameter file. Parameters
are listed in the file, followed by the value that should be assigned
to them. The syntax used is {\tt paramName value}. The file is not
case-sensitive, and lines in the input file that start with {\tt \#} are
ignored. If a parameter is listed more than once, the last value given is
used; otherwise, the order in which the parameters are listed in the
input file is arbitrary.

If a parameter is not listed, the default value is assumed. The
defaults are chosen to provide a good result (using the reconstruction
method), so the user doesn't need to specify many new parameters in
the input file. Note that the image file {\bf must} be specified! The
parameters that can be set are listed in Appendix~\ref{app-param},
with their default values in parentheses.

The `flag' parameters are stored as {\tt bool} variables, and so are
either {\tt true = 1} or {\tt false = 0}. Currently the program only
reads them from the file as integers, so they should be entered in
the file as 0 or 1 (see the example file in Appendix~\ref{app-input}).
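As an illustration of these rules, the following C++ sketch reads
{\tt paramName value} pairs from a stream, skips lines beginning with
{\tt \#}, lower-cases parameter names so that they match
case-insensitively, lets a later entry override an earlier one, and
reads flags as the integers 0 or 1. It is not Duchamp's own parsing
code: the type {\tt ParamMap} and the functions {\tt readParams} and
{\tt getFlag} are hypothetical, and the sketch assumes that only the
parameter names (not values such as file names) are case-insensitive.

\begin{verbatim}
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> ParamMap;

// Hypothetical sketch: read "paramName value" lines into a map.
// Assumption: names are case-insensitive, values are kept as written.
ParamMap readParams(std::istream &in) {
  ParamMap params;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream ss(line);
    std::string name, value;
    if (!(ss >> name)) continue;       // skip blank lines
    if (name[0] == '#') continue;      // skip comment lines
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    ss >> value;
    params[name] = value;              // later entries override earlier
  }
  return params;
}

// Flags are read as integers (0 = false, 1 = true); the supplied
// default is returned if the flag is not listed.
bool getFlag(const ParamMap &params, const std::string &lowerName,
             bool defaultValue) {
  ParamMap::const_iterator it = params.find(lowerName);
  if (it == params.end()) return defaultValue;
  return std::stoi(it->second) != 0;
}
\end{verbatim}

Under these assumptions, a file listing {\tt flagAtrous 0} and later
{\tt FLAGATROUS 1} would switch the reconstruction on.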

\section{What the program is doing}
\label{sec-flow}

The execution flow of the program is detailed here, indicating the
main algorithmic steps that are used. The program is written in C/C++
and makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot}
libraries.

%\subsection{Parameter input}
%
%The user provides parameters that govern the selection of files and
%the parameters used by the various subroutines in the program. This is
%done via a parameter file, and the parameters are stored in a C++
%class for use throughout the program. The form of the parameter file is
%discussed in \S\ref{sec-param}, and the parameters themselves are
%listed in Appendix~\ref{app-param}.

\subsection{Image input}

The cube is read in using basic {\sc cfitsio} commands, and stored as
an array in a dedicated C++ class. This class keeps track of
the list of detected objects, as well as any reconstructed arrays that
are made (see \S\ref{sec-recon}). The World Coordinate System (WCS)
information for the cube is also obtained from the FITS header by {\sc
wcslib} functions \citep{greisen02, calabretta02}, and this
information, in the form of a {\tt wcsprm} structure, is also stored
in the same class.

A sub-section of an image can be requested via the {\tt subsection}
parameter in the parameter file -- this can be a good idea if the cube
has very noisy edges, which may produce many spurious detections. The
generalised form of the subsection that is used by {\sc cfitsio} is
{\tt [x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run
from {\tt x1} to {\tt x2} (inclusive), with steps of {\tt dx}. The
step value can be omitted (so a subsection of the form {\tt
[2:50,2:50,10:1000]} is still valid). Duchamp does not currently
handle steps in the subsection string; any that are present are
removed before the file is opened.

To use the full range of a coordinate, replace the range with an
asterisk, \eg {\tt [2:50,2:50,*]}. To use a subsection at all, one
must set {\tt flagSubsection = 1}. A complete description of the
section syntax can be found at the {\sc fitsio} web
site\footnote{%
\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}%
{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}.
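To show what removing the steps involves, the C++ sketch below drops
any third ({\tt :dx}) field from each comma-separated range, so that
{\tt [1:100:2,1:100:2,*]} becomes {\tt [1:100,1:100,*]}. The helper
{\tt removeSteps} is hypothetical and is not the routine Duchamp
itself uses; it assumes a well-formed string enclosed in square
brackets.

\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: strip step values from a cfitsio-style section
// string such as "[x1:x2:dx,y1:y2:dy,z1:z2:dz]".
std::string removeSteps(const std::string &section) {
  assert(section.size() >= 2 && section.front() == '[' &&
         section.back() == ']');
  std::istringstream ranges(section.substr(1, section.size() - 2));
  std::string range, result = "[";
  bool first = true;
  while (std::getline(ranges, range, ',')) {
    // Keep everything before a second colon, i.e. drop ":dx".
    std::size_t c1 = range.find(':');
    if (c1 != std::string::npos) {
      std::size_t c2 = range.find(':', c1 + 1);
      if (c2 != std::string::npos) range.erase(c2);
    }
    if (!first) result += ',';
    result += range;
    first = false;
  }
  return result + "]";
}
\end{verbatim}

A section with no steps, such as {\tt [2:50,2:50,10:1000]}, passes
through unchanged.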

\subsection{Image modification}
\label{sec-modify}

Several optional modifications can be made to the cube to improve the
execution and efficiency of Duchamp; their use is indicated by the
relevant flags set in the input parameter file.

\subsubsection{Milky-Way removal}

First, a single set of contiguous channels can be removed -- these may
exhibit very strong emission, such as that from the Milky Way as seen
in extragalactic \hi\ cubes (hence the references to ``Milky Way'' in
relation to this task -- apologies to Galactic astronomers!). Such
dominant channels will both produce many unnecessary, uninteresting
and large (in size and hence in memory usage) detections, and will
also affect any reconstruction that is performed (see
\S\ref{sec-recon}). The use of this feature is controlled by the {\tt
flagMW} parameter, and the exact channels concerned can be set by the
user (using {\tt minMW} and {\tt maxMW}). When employed, the flux in
these channels is set to zero, and the original values in those
channels are not kept.

\subsubsection{Blank pixel removal}

Second, the cube is trimmed of any BLANK pixels that pad the image out
to a rectangular shape. This is also optional, being determined by the
{\tt flagBlankPix} parameter. The value for these pixels is read from
the FITS header (using the BLANK, BSCALE and BZERO keywords), but if
these are not present then the value can be specified by the user in
the parameter file. If these blank pixels are stored as NaNs, then a
normal number will be substituted (allowing these pixels to be
accurately removed without adverse effects). [NOTE: this appears not
to be working correctly at the time of writing. If your data has
unspecified BLANKs, be wary, or use the subsectioning option to trim
the BLANKs.]

This stage is particularly important for the reconstruction step, as
large numbers of BLANK pixels on the edges will smooth out features in
the wavelet calculation stage. The trimming will also reduce the size
of the cube's array, speeding up the execution. The amount of trimming
is recorded, and these pixels are added back in once the
source-detection is completed (so that quoted pixel positions are
applicable to the original cube).

Rows and columns are trimmed one at a time until the first non-BLANK
pixel is reached, so that the image remains rectangular. In practice,
this means that there will be BLANK pixels left in the trimmed image
(if the non-BLANK region is non-rectangular). However, these are
ignored in all further calculations done on the cube.
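The two ingredients of this step can be sketched in C++ as follows.
The FITS standard defines a pixel's physical value as
${\rm BZERO} + {\rm BSCALE}\times$(stored value), and for integer data
the BLANK keyword gives the stored value that flags undefined pixels,
so {\tt blankPhysicalValue} applies the same scaling to it.
{\tt findTrimBox} then locates the smallest rectangle containing every
non-BLANK pixel of a single 2-D image, which gives the same result as
stripping edge rows and columns one at a time. The names
{\tt blankPhysicalValue}, {\tt isBlank}, {\tt TrimBox} and
{\tt findTrimBox} are hypothetical, and the sketch assumes that NaN
pixels also count as BLANK and that $x$ varies fastest in memory; it
is not Duchamp's own code.

\begin{verbatim}
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// FITS scaling: physical = BZERO + BSCALE * stored. For integer data,
// BLANK is the stored value marking undefined pixels.
double blankPhysicalValue(long blank, double bscale, double bzero) {
  return bzero + bscale * static_cast<double>(blank);
}

// Assumption: NaN pixels are treated as BLANK as well.
bool isBlank(double pixel, double blankValue) {
  return std::isnan(pixel) || pixel == blankValue;
}

// Inclusive pixel limits of the trimmed region.
struct TrimBox { std::size_t xmin, xmax, ymin, ymax; };

// Find the smallest rectangle of an nx-by-ny image (x varying fastest)
// containing every non-BLANK pixel. Returns false if all are BLANK.
bool findTrimBox(const std::vector<double> &image, std::size_t nx,
                 std::size_t ny, double blankValue, TrimBox &box) {
  assert(image.size() == nx * ny);
  bool found = false;
  for (std::size_t y = 0; y < ny; ++y)
    for (std::size_t x = 0; x < nx; ++x) {
      if (isBlank(image[x + y * nx], blankValue)) continue;
      if (!found) { box = {x, x, y, y}; found = true; }
      if (x < box.xmin) box.xmin = x;
      if (x > box.xmax) box.xmax = x;
      if (y < box.ymin) box.ymin = y;
      if (y > box.ymax) box.ymax = y;
    }
  return found;
}
\end{verbatim}

The offsets {\tt box.xmin} and {\tt box.ymin} are what would need to
be recorded so that pixel positions can later be quoted in the frame
of the original cube.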

\subsubsection{Baseline removal}

Finally, the user may request the removal of baselines from the
spectra, via the parameter {\tt flagBaseline}. This may be necessary
if there is a strong baseline ripple present, which can result in
spurious detections on the high points of the ripple. The baseline is
calculated from a wavelet reconstruction procedure (see
\S\ref{sec-recon}) that keeps only the two largest scales. This is
done separately for each spatial pixel (\ie for each spectrum in the
cube), and the baselines are stored and added back in before any
output is done. In this way the quoted fluxes and displayed spectra
are as one would see from the input cube itself -- even though the
detection (and reconstruction if applicable) is done on the
baseline-removed cube.

The presence of very strong signals (for instance, masers at several
hundred Jy) can affect the determination of the baseline, leading to a
large dip centred on the signal in the baseline-subtracted
spectrum. To prevent this, the signal is trimmed at a fixed threshold
of $8\sigma$ above the mean prior to the reconstruction process. The
baseline determined should thus be representative of the true,
signal-free baseline. Note that this trimming is only a temporary
measure which does not affect the source-detection.

\subsection{Image reconstruction}
\label{sec-recon}

This is an optional step, but one that greatly enhances the
source-detection process. The user can direct Duchamp to reconstruct
the data cube using the {\it {\`a} trous} wavelet procedure. A good
description of the procedure can be found in
\citet{starck02:book}. The reconstruction is an effective way of
removing a lot of the noise in the image, allowing one to search
reliably to fainter levels, and reducing the number of spurious
detections. The trade-off is that it can be relatively time- and
memory-intensive. The steps in the procedure are as follows:
\begin{enumerate}
\item Set the reconstructed array to 0 everywhere.
\item The cube is discretely convolved with a given filter
function. This is determined from the parameter file via the {\tt
filterCode} parameter -- see Appendix~\ref{app-param} for details on
the filters available.
\item The wavelet coefficients are calculated by subtracting the
convolved array from the input array.
\item If the wavelet coefficients at a given point are above the
threshold requested (given by {\tt snrRecon} as the number of
$\sigma$ above the mean and adjusted to the current scale), add
these to the reconstructed array.
\item The separation of the filter coefficients is doubled.
\item The procedure is repeated from step 2, using the convolved array
as the input array.
\item Continue until the required maximum number of scales is reached.
\item Add the final smoothed (\ie convolved) array to the
reconstructed array. This provides the ``DC offset'', as each of the
wavelet coefficient arrays will have zero mean.
\end{enumerate}

Note that any BLANK pixels that are still in the cube will not be
altered by the reconstruction -- they will be left as BLANK so that
the shape of the valid part of the cube is preserved.

It is important to note that the {\it {\`a} trous} decomposition is
an example of a ``redundant'' transformation. If no thresholding is
performed, the sum of all the wavelet coefficient arrays and the final
smoothed array is identical to the input array. Any difference between
the reconstructed and input arrays is therefore due solely to the
thresholding, which removes only the unwanted (noise) structure in the
array.
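The steps above can be sketched in one dimension as follows. This C++
example assumes the B$_3$-spline filter $(1,4,6,4,1)/16$,
mirror-reflection at the array ends, and a single threshold applied to
every scale; Duchamp itself scales $\sigma$ for each scale, supports
several filters, iterates on the residual and works in up to three
dimensions. The functions {\tt reflectIndex} and {\tt atrous1D} are
hypothetical. Because each wavelet array is the difference between
successive smoothings, the sum of all of them plus the final smoothed
array telescopes back to the input, which is the redundancy property
described above.

\begin{verbatim}
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Mirror an index about the ends of an array of length n.
long reflectIndex(long i, long n) {
  if (n == 1) return 0;
  const long period = 2 * (n - 1);
  i %= period;
  if (i < 0) i += period;
  return (i < n) ? i : period - i;
}

// Hypothetical 1-D a trous reconstruction (single pass, no iteration).
// Coefficients greater than `threshold` are kept at every scale.
std::vector<double> atrous1D(const std::vector<double> &data,
                             int numScales, double threshold) {
  static const double filter[5] = {1.0 / 16, 4.0 / 16, 6.0 / 16,
                                   4.0 / 16, 1.0 / 16};
  assert(!data.empty() && numScales > 0);
  const long n = static_cast<long>(data.size());
  std::vector<double> recon(data.size(), 0.0);       // step 1
  std::vector<double> input = data, smoothed(data.size());
  long spacing = 1;
  for (int scale = 1; scale <= numScales; ++scale) { // step 7 (loop)
    for (long i = 0; i < n; ++i) {                   // step 2
      double sum = 0.0;
      for (int k = -2; k <= 2; ++k)
        sum += filter[k + 2] * input[reflectIndex(i + k * spacing, n)];
      smoothed[i] = sum;
    }
    for (long i = 0; i < n; ++i) {
      const double w = input[i] - smoothed[i];       // step 3
      if (w > threshold) recon[i] += w;              // step 4
    }
    spacing *= 2;                                    // step 5
    input = smoothed;                                // step 6
  }
  for (long i = 0; i < n; ++i) recon[i] += input[i]; // step 8
  return recon;
}
\end{verbatim}

Passing a threshold of $-\infty$ keeps every coefficient, and the
output then reproduces the input to within rounding error, while a
flat input is returned unchanged for any non-negative threshold.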

The statistics of the cube are estimated using robust methods, to
avoid corruption by strong outlying points. The mean is actually
estimated by the median, while the median absolute deviation from the
median (MADFM) is calculated and corrected assuming Gaussianity to
estimate the standard deviation $\sigma$. The Gaussianity (or
Normality) assumption is critical, as the MADFM does not give the same
value as the usual rms or standard deviation value -- for a normal
distribution $N(\mu,\sigma)$ we find ${\rm MADFM}=0.6744898\sigma$. The
difference between the MADFM and $\sigma$ is corrected for, so the
user need only think in the usual multiples of $\sigma$ when setting
{\tt snrRecon}. See Appendix~\ref{app-madfm} for a derivation of this
value.
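A minimal C++ sketch of these robust estimates follows. It assumes the
conversion factor 0.6744898, the upper quartile of the standard normal
distribution to seven decimal places, and the function names
{\tt median} and {\tt robustSigma} are hypothetical rather than taken
from Duchamp.

\begin{verbatim}
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Median of a copy of the data (mean of the two middle values when
// the number of elements is even).
double median(std::vector<double> v) {
  assert(!v.empty());
  const std::size_t n = v.size(), mid = n / 2;
  std::nth_element(v.begin(), v.begin() + mid, v.end());
  double m = v[mid];
  if (n % 2 == 0)  // lower middle value = largest of the lower half
    m = 0.5 * (m + *std::max_element(v.begin(), v.begin() + mid));
  return m;
}

// Robust sigma: the MADFM divided by the Gaussian factor 0.6744898.
double robustSigma(const std::vector<double> &data) {
  const double med = median(data);
  std::vector<double> absDev;
  absDev.reserve(data.size());
  for (double x : data) absDev.push_back(std::fabs(x - med));
  return median(absDev) / 0.6744898;
}
\end{verbatim}

For the values 1, 2, 3, 4 and 100, the median is 3 and the MADFM is 1,
giving $\sigma \approx 1.4826$; the result would be unchanged if the
outlying 100 were replaced by 5.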

When thresholding the different wavelet scales, the value of $\sigma$
as measured from the input array needs to be scaled to account for the
increased amount of correlation between neighbouring pixels (due to
the convolution). See Appendix~\ref{app-scaling} for details on this
scaling.

The user can also select the minimum scale to be used in the
reconstruction -- the first scale exhibits the highest frequency
variations, and so ignoring this one can sometimes be beneficial in
removing excess noise. The default, however, is to use all scales
({\tt minscale = 1}).

The reconstruction has at least two iterations. The first iteration
makes a first pass at the wavelet reconstruction (the process outlined
in the 8 stages above). The residual array (input minus
reconstruction) will inevitably still contain some structure, so the
wavelet filtering is then applied to the residual, and any significant
wavelet terms are added to the final reconstruction. This step is
repeated until the change in the $\sigma$ of the background is less
than some fiducial amount.

\subsection{Reconstruction I/O}

The reconstruction stage can be relatively time-consuming,
particularly for large cubes. Duchamp thus has a shortcut to allow
users to quickly do multiple searches (\eg with different thresholds)
on the same reconstruction.

The first step is to save the reconstructed image as a FITS file. At
the moment this is saved in the same directory as the input file, so
it will not work if the user does not have write permission for that
directory. The name of the file will be derived from the input file,
in the following manner: if the input file is {\tt image.fits}, the
reconstructed array will be saved in {\tt image.RECON?.fits}, where
{\tt ?} stands for the value of {\tt snrRecon} (for instance, if {\tt
snrRecon}$=4$, it will be {\tt image.RECON4.fits}, and if {\tt
snrRecon}$=4.5$, it will be {\tt image.RECON4.5.fits}). To save the
reconstructed array, set {\tt flagOutputRecon = true}.

Likewise, the residual image, defined as the difference between the
input image and the reconstructed image, can also be saved in the same
manner -- its filename will be {\tt image.RESID?.fits}. This is done
by setting {\tt flagOutputResid = true}.

If a reconstructed image has been saved, it can be read in and used
instead of redoing the reconstruction. To do so, the user should set
{\tt flagReconExists = true}. The user can indicate the name of the
reconstructed FITS file using the {\tt reconFile} parameter, or, if
this is not specified, Duchamp searches for the file {\tt
image.RECON?.fits} (as defined above). If the file is not found, the
reconstruction is performed as normal. Note that this shortcut also
requires {\tt flagAtrous = true} (obviously, if this is {\tt false},
the reconstruction is not needed).
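The naming rule can be expressed as a short C++ function.
{\tt reconFileName} is a hypothetical helper, not part of Duchamp; it
assumes the input name ends in {\tt .fits} and relies on the default
stream formatting, which writes 4.0 as {\tt 4} and 4.5 as {\tt 4.5}.

\begin{verbatim}
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: "image.fits" with snrRecon = 4.5 gives
// "image.RECON4.5.fits" (or "image.RESID4.5.fits" for residuals).
std::string reconFileName(const std::string &inputFile,
                          double snrRecon,
                          const std::string &type = "RECON") {
  const std::string ext = ".fits";
  assert(inputFile.size() > ext.size() &&
         inputFile.compare(inputFile.size() - ext.size(),
                           ext.size(), ext) == 0);
  std::ostringstream name;
  // Default formatting (precision 6) prints 4.0 as "4", 4.5 as "4.5".
  name << inputFile.substr(0, inputFile.size() - ext.size()) << '.'
       << type << snrRecon << ext;
  return name.str();
}
\end{verbatim}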

\subsection{Searching the image}
\label{sec-detection}

The image is searched for detections in two ways: spectrally (a
1-dimensional search in the spectrum in each spatial pixel), and
spatially (a 2-dimensional search in the spatial image in each
channel). In both cases, the algorithm finds connected pixels that are
above the user-specified threshold. In the case of the spatial image
search, the algorithm of \citet{lutz80} is used to raster scan through
the image and connect groups of pixels on neighbouring rows.

Note that this algorithm cannot be applied directly to a 3-dimensional
case, as it requires that objects are completely nested in a row: that
is, if you are scanning along a row, and one object finishes and
another starts, you know that you will not get back to the first one
(if at all) until the second is finished for that
row. Three-dimensional data does not have this property, which is why
we break up the searching into 1- and 2-dimensional cases.
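The one-dimensional (spectral) search reduces to finding runs of
consecutive channels above the threshold. The C++ sketch below, with
the hypothetical function {\tt findRuns}, returns the first and last
channel of each such run that is at least {\tt minLength} channels
long (playing the role of {\tt minChannels}). It is a simplified
illustration only; the two-dimensional search based on \citet{lutz80}
is not reproduced.

\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::size_t, std::size_t> > RunList;

// Hypothetical 1-D search: (first, last) channel of every run of
// consecutive pixels above threshold that is >= minLength long.
RunList findRuns(const std::vector<float> &spectrum, float threshold,
                 std::size_t minLength) {
  RunList runs;
  std::size_t start = 0;
  bool inRun = false;
  // Loop one past the end so that a run reaching the last channel
  // is closed off.
  for (std::size_t z = 0; z <= spectrum.size(); ++z) {
    const bool above = z < spectrum.size() && spectrum[z] > threshold;
    if (above && !inRun) { start = z; inRun = true; }
    if (!above && inRun) {               // run ended at channel z-1
      if (z - start >= minLength) runs.emplace_back(start, z - 1);
      inRun = false;
    }
  }
  return runs;
}
\end{verbatim}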

The determination of the threshold is done in one of two ways. The
first way is a simple sigma-clipping, where a threshold is set at
$n\sigma$ above the mean and pixels above this threshold are
flagged as detected. The value of $n$ is set with the parameter {\tt
snrCut}. As before, the value for $\sigma$ is estimated by
the MADFM, and corrected by the ratio derived in
Appendix~\ref{app-madfm}.

The second method uses the False Discovery Rate (FDR) technique
\citep{miller01,hopkins02}, whose basis we briefly detail here. The
false discovery rate (the number of false detections divided by the
total number of detections) is fixed at a certain value $\alpha$ (\eg
$\alpha=0.05$ implies 5\% of detections are false positives). In
practice, an $\alpha$ value is chosen, and the ensemble average FDR
(\ie $\langle{\rm FDR}\rangle$) when the method is used will not
exceed $\alpha$. For each pixel one calculates $p$, the probability,
assuming the null hypothesis is true, of obtaining a test statistic as
extreme as the pixel value (the observed test statistic). These
$p$-values are sorted in increasing order, giving $P_1 \leq P_2 \leq
\dots \leq P_N$ for the $N$ pixels being tested. One then calculates
$d$ where
\[
d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
\]
and then rejects all hypotheses whose $p$-values are less than or equal
to $P_d$. (So any $P_i<P_d$ will be rejected even if $P_i \geq
i\alpha/(c_N N)$.) Note that ``reject hypothesis'' here means ``accept
the pixel as an object pixel'' (\ie we are rejecting the null
hypothesis that the pixel belongs to the background).

The $c_N$ values here are normalisation constants that depend on the
correlated nature of the pixel values. If all the pixels are
uncorrelated, then $c_N=1$. If the pixels are correlated, their tests
will be dependent on each other, and $c_N = \sum_{i=1}^{N} i^{-1}$.
\citet{hopkins02} consider real radio data, where the pixels are
correlated over the beam. In this case the sum is made only over the
pixels that make up the beam, rather than over all $N$ pixels. The
number of pixels in the beam is calculated from the FITS header (if
the correct keywords, BMAJ and BMIN, are not present, a default value
of 10 pixels is assumed).
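The FDR threshold can be computed as in the C++ sketch below. It
assumes one-sided $p$-values for Gaussian noise of known mean and
$\sigma$, and takes $c_N$ as the harmonic sum over a user-supplied
number of correlated pixels (\eg the number of pixels in the beam).
The functions {\tt pValue}, {\tt harmonicSum} and {\tt fdrThreshold}
are hypothetical names, not Duchamp's.

\begin{verbatim}
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One-sided p-value: probability that Gaussian noise with this mean
// and sigma gives a value at least as large as the pixel value.
double pValue(double pixel, double mean, double sigma) {
  return 0.5 * std::erfc((pixel - mean) / (sigma * std::sqrt(2.0)));
}

// c_N for n correlated pixels: 1 + 1/2 + ... + 1/n.
double harmonicSum(int n) {
  double c = 0.0;
  for (int i = 1; i <= n; ++i) c += 1.0 / i;
  return c;
}

// Returns P_d, the largest sorted p-value with P_j < j*alpha/(cN*N);
// pixels with p <= P_d are accepted. Returns -1 if there is none.
double fdrThreshold(std::vector<double> p, double alpha, double cN) {
  std::sort(p.begin(), p.end());
  const double N = static_cast<double>(p.size());
  double pd = -1.0;
  for (std::size_t j = 1; j <= p.size(); ++j)
    if (p[j - 1] < j * alpha / (cN * N)) pd = p[j - 1];
  return pd;
}
\end{verbatim}

With $p$-values 0.5, 0.01, 0.07 and 0.06, $\alpha=0.1$ and $c_N=1$, the
sorted values are compared with 0.025, 0.05, 0.075 and 0.1, giving
$P_d=0.07$; the pixel with $p=0.06$ is therefore accepted even though
it exceeds its own limit of 0.05.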

If a reconstruction has been made, the residuals (defined as original
$-$ reconstruction) are used to estimate the noise parameters of the
cube. Otherwise they are estimated directly from the cube itself. In
both cases, the median is used as a robust estimator of the mean
value, while $\sigma$ is estimated by the standard deviation
(of the residual array, in the case of the reconstruction, but of the
original array otherwise).

Detections must have a minimum number of pixels to be counted. This
minimum number is given by the input parameters {\tt minPix} (for
2-dimensional searches) and {\tt minChannels} (for 1-dimensional
searches).

The search only looks for positive features. If one is interested
instead in negative features (such as absorption lines), set the
parameter {\tt flagNegative = true}. This will invert the cube (\ie
multiply all pixels by $-1$) prior to the search, and then re-invert
the cube (and the fluxes of any detections) after searching is
complete. All outputs are done in the same manner as normal, so that
fluxes of detections will be negative.

\subsection{Merging detected objects}
\label{sec-merger}

The searching step produces a list of detections that will have many
repeated detections of a given object -- for instance, spectral
detections in adjacent pixels of the same object and/or spatial
detections in neighbouring channels. These are then combined in an
algorithm that matches all objects judged to be ``close''. This
determination is made in one of two ways.

One way is to define two thresholds, one spatial and one in velocity,
and to merge two objects if there is at least one pair of pixels (one
from each object) that lie within these threshold distances of each
other. These thresholds are specified by the parameters {\tt
threshSpatial} and {\tt threshVelocity} (in units of pixels and
channels respectively).

Alternatively, the spatial requirement can be changed to say that
there must be a pair of pixels that are {\it adjacent} -- a stricter,
but more realistic requirement, particularly when the spatial pixels
have a large angular size (as is the case for \hi\ surveys). This
method can be selected by setting the parameter
{\tt flagAdjacent} to 1 (\ie {\tt true}) in the parameter file. The
velocity thresholding is done in the same way as the first option.
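The two merging criteria can be sketched in C++ as below. The distance
measures are assumptions: the spatial threshold is applied to the
Euclidean separation of two pixels, and ``adjacent'' is taken to
include diagonal neighbours. The {\tt Voxel} type and the function
{\tt areClose} are hypothetical. Checking every pair of pixels costs
time proportional to the product of the two object sizes.

\begin{verbatim}
#include <cassert>
#include <cstdlib>
#include <vector>

struct Voxel { int x, y, z; };

// Hypothetical test of whether two detections should be merged.
// Assumptions: spatial distance is Euclidean; with flagAdjacent the
// pixels must touch (diagonals included). In both cases the channel
// separation must be at most threshVelocity.
bool areClose(const std::vector<Voxel> &a, const std::vector<Voxel> &b,
              double threshSpatial, int threshVelocity,
              bool flagAdjacent) {
  for (const Voxel &p : a)
    for (const Voxel &q : b) {
      const int dx = p.x - q.x, dy = p.y - q.y;
      if (std::abs(p.z - q.z) > threshVelocity) continue;
      const bool spatialOK =
          flagAdjacent
              ? (std::abs(dx) <= 1 && std::abs(dy) <= 1)
              : (dx * dx + dy * dy <= threshSpatial * threshSpatial);
      if (spatialOK) return true;   // one close pair is enough
    }
  return false;
}
\end{verbatim}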

Once the detections have been merged, they may be ``grown''. This is a
process of increasing the size of the detection by adding adjacent
pixels that are above some secondary threshold. This threshold is
lower than the one used for the initial detection, but above the noise
level, so that faint pixels are only included when they are adjacent
to an existing detection. The value of this threshold can be set with
the input parameter {\tt growthCut}, with a default value of
$1.5\sigma$. The use of the growth algorithm is controlled by the
{\tt flagGrowth} parameter, whose default value is {\tt false}. If the
detections are grown, they are sent through the merging algorithm a
second time, to pick up any detections that now overlap or have grown
over each other.
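In one dimension, growing a detection amounts to extending it outwards
while the neighbouring pixels exceed the secondary threshold. The C++
sketch below, with the hypothetical function {\tt growRun}, takes that
threshold as a flux value (for instance, the mean plus {\tt growthCut}
times $\sigma$); Duchamp's own growth acts on three-dimensional
objects and is not reproduced here.

\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D growth: extend a detection spanning channels
// [first, last] while the next pixel outward exceeds the (lower)
// growth threshold.
void growRun(const std::vector<float> &spectrum, float growthThreshold,
             std::size_t &first, std::size_t &last) {
  assert(first <= last && last < spectrum.size());
  while (first > 0 && spectrum[first - 1] > growthThreshold) --first;
  while (last + 1 < spectrum.size() &&
         spectrum[last + 1] > growthThreshold)
    ++last;
}
\end{verbatim}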

Finally, to be accepted, the detections must span {\it both} a minimum
number of channels (to remove any spurious single-channel spikes that
may be present), and a minimum number of spatial pixels. These
numbers, as for the original detection step, are set with the {\tt
minChannels} and {\tt minPix} parameters. The channel requirement
means there must be at least one set of this many consecutive channels
in the source for it to be accepted.
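The channel requirement can be checked as in the following C++
sketch. {\tt hasConsecutiveChannels} is a hypothetical helper that
takes the list of channels occupied by a detection (duplicates
allowed, since many pixels share a channel) and reports whether it
contains a run of at least {\tt minChannels} consecutive values.

\begin{verbatim}
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: do the occupied channels include at least
// minChannels consecutive values?
bool hasConsecutiveChannels(std::vector<int> channels,
                            int minChannels) {
  std::sort(channels.begin(), channels.end());
  channels.erase(std::unique(channels.begin(), channels.end()),
                 channels.end());
  int run = 0, longest = 0;
  for (std::size_t i = 0; i < channels.size(); ++i) {
    run = (i > 0 && channels[i] == channels[i - 1] + 1) ? run + 1 : 1;
    longest = std::max(longest, run);
  }
  return longest >= minChannels;
}
\end{verbatim}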
562
563	\section{Outputs}
564	\label{sec-output}
565
566	\subsection{During execution}
567
568	Duchamp provides the user with feedback whilst it is running, to
569	keep the user informed on the progress of the analysis. Most of this
570	consists of self-explanatory messages about the particular stage the
571	program is up to. The relevant parameters are printed to the screen at
572	the start (once the file has been successfully read in), so the user
573	is able to make a quick check that the setup is correct.
574
575	If the cube is being trimmed (\S\ref{sec-modify}), the resulting
576	dimensions are printed to indicate how much has been trimmed. If a
577	reconstruction is being done, a continually updating message shows the
578	current iteration and scale (compared to the maximum scale).
579
580	During the searching algorithms, the progress through the 1D and 2D
581	searches are shown. When the searches have completed,
582	the number of objects found in both the 1D and 2D searches are
583	reported (see \S\ref{sec-detection} for details).
584
585	In the merging process (where multiple detections of the same object
586	are combined -- see \S\ref{sec-merger}), two stages of output
587	occur. The first is when each object in the list is compared with all
588	others. The output shows two numbers: the first being how far through
589	the list we are, and the second being the length of the list. As the
590	algorithm proceeds, the first number should increase and the second
591	should decrease (as objects are combined). When the numbers meet (\ie
592	the whole list has been compared), the second phase begins, in which
593	multiply-appearing pixels in each object are removed, as are objects
594	not meeting the minimum channels requirement. During this phase, the
595	total number of accepted objects is shown, which should steadily
596	increase until all have been accepted or rejected. Note that these
597	steps can be very quick for small numbers of detections.
598
599	Since this continual printing to screen has some overhead of time and
600	CPU involved, the user can elect to not print this information by
601	setting the parameter {\tt verbose = 0}. In this case, the user is
602	still informed as to the steps being undertaken, but the details of
603	the progress are not shown.
604
605	\subsection{Results}
606
607	Finally, we get to the results -- the reason for running Duchamp in
608	the first place. Once the detection list is finalised, it is sorted by
609	the mean velocity of the detections (or, if there is no good WCS
610	associated with the cube, by the mean Z-pixel position). The results
611	are then printed to the screen and to the output file, denoted by the
612	{\tt OutFile} parameter. The results list, an example of which can be
613	seen in Appendix~\ref{app-output}, contains the following columns
614	(note that the title of the columns depending on WCS information will
615	depend on the projection of the WCS):
616
\begin{entry}
\item[Obj\#] The ID number of the detection (simply the sequential
count for the list, which is ordered by increasing velocity, or by
Z-pixel position if there is no valid WCS).
\item[Name] The IAU-format name of the detection (based on the WCS
projection).
\item[X] The average X-pixel position.
\item[Y] The average Y-pixel position.
\item[Z] The average Z-pixel position.
\item[RA/GLON] The Right Ascension or Galactic Longitude of the centre
of the object.
\item[DEC/GLAT] The Declination or Galactic Latitude of the centre of
the object.
\item[w\_RA/w\_GLON] The width of the object in Right Ascension or
Galactic Longitude [arcmin].
\item[w\_DEC/w\_GLAT] The width of the object in Declination or
Galactic Latitude [arcmin].
\item[VEL] The mean velocity of the object [km/s].
\item[w\_VEL] The full velocity width of the detection (max channel
$-$ min channel, in velocity units [km/s]).
\item[F\_tot] The integrated flux over the object, in units of
flux times velocity (\eg Jy km/s).
\item[F\_peak] The peak flux over the object, in units of flux.
\item[X1, X2] The minimum and maximum X-pixel coordinates.
\item[Y1, Y2] The minimum and maximum Y-pixel coordinates.
\item[Z1, Z2] The minimum and maximum Z-pixel coordinates.
\item[Npix] The number of voxels (\ie distinct $(x,y,z)$
coordinates) in the detection.
\item[Flag] Whether the detection has any warning flags (see below).
\end{entry}
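
To make the pixel-based columns concrete, the following Python sketch
computes them for a single detection stored as a list of
$(x, y, z, {\rm flux})$ tuples, and orders a list of detections as
described above. It is an illustration only, not Duchamp's own code:
the function names are hypothetical, and the use of unweighted means
for X, Y and Z is an assumption.

\begin{verbatim}
def pixel_columns(voxels):
    """voxels: list of (x, y, z, flux) tuples making up one detection."""
    xs = [v[0] for v in voxels]
    ys = [v[1] for v in voxels]
    zs = [v[2] for v in voxels]
    n = len(voxels)
    return {
        "X": sum(xs) / n, "Y": sum(ys) / n, "Z": sum(zs) / n,  # unweighted means
        "F_peak": max(v[3] for v in voxels),
        "X1": min(xs), "X2": max(xs),
        "Y1": min(ys), "Y2": max(ys),
        "Z1": min(zs), "Z2": max(zs),
        "Npix": len({(v[0], v[1], v[2]) for v in voxels}),  # distinct voxels
    }

def order_detections(detections):
    """Sort by mean velocity when a valid WCS supplied one ("VEL"),
    otherwise by mean Z-pixel; Obj# is then the 1-based position."""
    ordered = sorted(detections, key=lambda d: d.get("VEL", d["Z"]))
    for num, det in enumerate(ordered, start=1):
        det["Obj#"] = num
    return ordered
\end{verbatim}
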
The Name is derived from the WCS position. For instance, the (RA,Dec)
position 12$^h$53$^m$45$^s$, $-$36$^\circ$24$'$12$''$ will be called
J1253$-$3624 (if the equinox is J2000) or B1253$-$3624 (if B1950). An
alternative form is used for Galactic coordinates: the position
($l$,$b$) = (323.1245, 5.4567) will be called G323.12$+$05.45. If the
WCS is not valid (\ie is not present or does not have all the
necessary information), the Name, RA, DEC, VEL and related columns are
not printed, but the pixel coordinates are still provided.
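
The naming convention can be reproduced with a short Python sketch.
The two worked examples above imply that coordinates are truncated
rather than rounded (5.4567 becomes 05.45), and the sketch assumes
this; the small tolerance guarding against floating-point error is
also an assumption, and the function names are hypothetical.

\begin{verbatim}
import math

EPS = 1e-9  # guards against e.g. 2183.9999999 being truncated to 2183

def equatorial_name(ra_deg, dec_deg, prefix="J"):
    """IAU-style name from RA/Dec in degrees, truncated to hhmm+ddmm."""
    ra_min = int(ra_deg / 15.0 * 60.0 + EPS)      # whole minutes of RA
    hh, mm = divmod(ra_min, 60)
    sign = "-" if dec_deg < 0 else "+"
    dec_min = int(abs(dec_deg) * 60.0 + EPS)      # whole arcminutes of Dec
    dd, dm = divmod(dec_min, 60)
    return f"{prefix}{hh:02d}{mm:02d}{sign}{dd:02d}{dm:02d}"

def galactic_name(l_deg, b_deg):
    """Galactic-style name, with l and |b| truncated to two decimals."""
    l_t = math.floor(l_deg * 100 + EPS) / 100
    b_t = math.floor(abs(b_deg) * 100 + EPS) / 100
    sign = "-" if b_deg < 0 else "+"
    return f"G{l_t:06.2f}{sign}{b_t:05.2f}"
\end{verbatim}

For the position quoted above, {\tt equatorial\_name(193.4375,
-36.40333)} gives {\tt J1253-3624}, and passing {\tt prefix="B"}
gives the B1950 form.
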

\begin{figure}[t]
\begin{center}
\includegraphics[width=\textwidth]{example_spectrum}
\end{center}
\caption{\footnotesize An example of the spectrum output. Note several
of the features discussed in the text: the removal of the Milky Way
emission around 0 km/s; the red lines indicating the reconstructed
spectrum; the blue dashed lines indicating the spectral extent of
the detection; the blue border showing its spatial extent on the
0th moment map; and the 15~arcmin-long scale bar.}
\label{fig-spect}
\end{figure}

The last column contains any warning flags about the detection. There
are currently two possible flags. An `E' is printed if the detection
is next to the edge of the image, meaning either the limit of the
pixels or the limit of the non-BLANK pixel region. An `N' is printed
if the total flux, summed over all the (non-BLANK) pixels in the
smallest box that completely encloses the detection, is negative. Note
that this sum may include pixels that are not part of the detection.
The flag is useful for pointing out detections that lie next to
strongly negative pixels, such as might arise due to interference --
the detected pixels might then also be due to the interference, so
caution is advised.
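
The two flag tests can be sketched in Python as follows, for a cube
stored as nested lists indexed {\tt [z][y][x]} with {\tt None} marking
BLANK pixels. The storage layout, the function name, and the reading
of ``next to the edge'' as ``some detected pixel has an 8-connected
spatial neighbour that is off the array or BLANK'' are assumptions
made for illustration.

\begin{verbatim}
def detection_flags(cube, voxels):
    """cube[z][y][x] holds a flux, or None for a BLANK pixel; voxels is a
    list of (x, y, z) in one detection. Returns "", "E", "N" or "EN"."""
    ny, nx = len(cube[0]), len(cube[0][0])

    def off_or_blank(x, y, z):
        return not (0 <= x < nx and 0 <= y < ny) or cube[z][y][x] is None

    flags = ""
    # 'E': a detected pixel has a spatial neighbour (same channel,
    # 8-connected) that lies off the array or is BLANK.
    if any(off_or_blank(x + dx, y + dy, z)
           for (x, y, z) in voxels for dx in (-1, 0, 1) for dy in (-1, 0, 1)):
        flags += "E"
    # 'N': the non-BLANK flux summed over the smallest enclosing box is negative.
    xs, ys, zs = zip(*voxels)
    box_sum = sum(cube[z][y][x]
                  for z in range(min(zs), max(zs) + 1)
                  for y in range(min(ys), max(ys) + 1)
                  for x in range(min(xs), max(xs) + 1)
                  if cube[z][y][x] is not None)
    if box_sum < 0:
        flags += "N"
    return flags
\end{verbatim}
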

Two alternative results files can also be requested. One option is a
VOTable-format XML file, containing just the RA, Dec, velocity and
corresponding widths of the detections, as well as the fluxes. To
produce it, the user should set {\tt flagVOT = 1} and put the desired
filename in the parameter {\tt votFile}; note that by default it is
not produced. This file should be readable by Virtual Observatory
tools (such as Aladin\footnote{ Aladin can be found on the web at
\href{http://aladin.u-strasbg.fr/}{http://aladin.u-strasbg.fr/}}). The
second option is an annotation file for use with the Karma toolkit of
visualisation tools (in particular, with {\tt kvis}). This will draw a
circle at the position of each detection, and number it according to
the Obj\# given above. To use it, the user should set {\tt flagKarma =
1} and put the desired filename in the parameter {\tt karmaFile};
again, by default it is not produced.
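
As an illustration of the kind of file produced, the following Python
sketch writes a minimal {\tt kvis} annotation file with one circle and
one number per detection. It assumes the Karma annotation keywords
{\tt COLOR}, {\tt COORD W} (world coordinates in degrees), {\tt CIRCLE}
and {\tt TEXT}; these should be checked against the Karma
documentation. The layout of Duchamp's own file may differ, the fixed
circle radius is a hypothetical choice, and the function name is
hypothetical.

\begin{verbatim}
def karma_annotation(detections, radius_deg=0.05, colour="RED"):
    """detections: list of (obj_num, ra_deg, dec_deg). Returns the text
    of a kvis annotation file marking and numbering each detection."""
    lines = [f"COLOR {colour}",
             "COORD W"]                      # world coordinates, in degrees
    for num, ra, dec in detections:
        lines.append(f"CIRCLE {ra:.6f} {dec:.6f} {radius_deg:.6f}")
        lines.append(f"TEXT {ra:.6f} {dec:.6f} {num}")
    return "\n".join(lines) + "\n"
\end{verbatim}

The returned text would simply be written to the file named by
{\tt karmaFile}.
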

As the program is running, it also (optionally) records the detections
made in each individual spectrum or channel (see
\S\ref{sec-detection} for details on this process). These are
recorded in the file denoted by the parameter {\tt LogFile}. This file
does not include the columns {\tt Name, RA, DEC, w\_RA, w\_DEC, VEL,
w\_VEL}. It is designed primarily for diagnostic purposes: \eg
to see if a given set of pixels is detected in, say, one channel
image, but does not survive the merging process. The list of pixels
(and their fluxes) in the final detection list is also printed to
this file, again for diagnostic purposes. This feature can be turned
off by setting {\tt flagLog = false}. (This may be a good idea if you
are not interested in its contents, as it can be a large file.)

\begin{figure}[!t]
\begin{center}
\includegraphics[width=\textwidth]{example_moment_map}
\end{center}
\caption{\footnotesize An example of the moment map created by
Duchamp. The full extent of the cube is covered, and the 0th moment
of each object is shown (integrated individually over all the
detected channels).}
\label{fig-moment}
\end{figure}

As well as the output data file, a postscript file (given by the {\tt
SpectraFile} parameter) is created that shows the spectrum for each
detection, together with a small cutout image (0th moment) and basic
information about the detection (note that any flags are printed
after the name of the detection, in the format {\tt [E]}). If the
cube was reconstructed, the spectrum from the reconstruction is shown
in red, over the top of the original spectrum. The spectrum that is
plotted is governed by the {\tt spectralMethod} parameter. It can be
either {\tt peak}, where the spectrum is from the spatial pixel
containing the detection's peak flux; or {\tt sum}, where the spectrum
is summed over all spatial pixels in the detection, and then corrected
for the beam size.
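
The two {\tt spectralMethod} options can be sketched in Python as
follows, for a cube stored as nested lists indexed {\tt [z][y][x]}
with {\tt None} for BLANK pixels. Treating BLANK pixels as zero, and
implementing the beam correction as a division by the beam area in
pixels (the ``Beam Size'' reported with the parameters), are
assumptions; the function name is hypothetical.

\begin{verbatim}
def detection_spectrum(cube, voxels, method="peak", beam_pixels=1.0):
    """cube[z][y][x] holds a flux (None = BLANK); voxels are (x, y, z, flux)
    tuples for one detection. Returns one value per channel."""
    nz = len(cube)

    def value(z, y, x):
        v = cube[z][y][x]
        return 0.0 if v is None else v          # treat BLANK as zero

    if method == "sum":
        # Sum over each distinct detected spatial pixel, then divide by
        # the beam area in pixels (assumed form of the beam correction).
        spatial = {(x, y) for (x, y, _, _) in voxels}
        return [sum(value(z, y, x) for (x, y) in spatial) / beam_pixels
                for z in range(nz)]
    # "peak" (and any unrecognised choice): the spectrum through the
    # spatial pixel holding the brightest detected voxel.
    x, y, _, _ = max(voxels, key=lambda v: v[3])
    return [value(z, y, x) for z in range(nz)]
\end{verbatim}
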

The spectral extent of the detection is indicated with blue lines, and
a zoom is shown in a separate window. The cutout image can optionally
include a border around the spatial pixels that are in the detection
(turned on and off by the parameter {\tt drawBorders}). It also
includes a scale bar in the bottom left corner to indicate size; it
is 15~arcmin long (note that due to projection effects it may be a
slightly different physical length from object to object). An example
detection can be seen in Fig.~\ref{fig-spect}.

Finally, a couple of images are optionally produced: a 0th moment map
of the cube, combining just the detected channels in each object,
showing the integrated flux in grey-scale; and a ``detection image'',
a grey-scale image where the pixel values are the number of channels
that spatial pixel is detected in. In both cases, if {\tt drawBorders =
true}, a border is drawn around the spatial extent of each
detection. An example moment map is shown in Fig.~\ref{fig-moment}.
Whether these images are produced is governed by the {\tt flagMaps}
parameter.

The purpose of these images is to provide a visual guide to where the
detections have been made, and, particularly in the case of the moment
map, to provide an indication of the strength of the source. In both
cases, the detections are numbered (in the same way as the output
list), and the spatial borders are marked out as for the cutout images
in the spectra file. Both these images are saved as postscript files
(given by the parameters {\tt momentMap} and {\tt detectionMap}
respectively), with the latter also displayed in a {\sc pgplot}
window (regardless of the state of {\tt flagMaps}).
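
Both images can be sketched in Python as below, with each detection a
list of $(x,y,z)$ voxels and the cube indexed {\tt [z][y][x]}. The
sketch assumes the moment map is the flux summed over the detected
channels only, multiplied by a channel width that defaults to 1; that
normalisation and the function name are assumptions.

\begin{verbatim}
def moment_and_detection_maps(nx, ny, detections, cube, channel_width=1.0):
    """Return (moment0, detmap) as [y][x] grids built from the detected
    voxels only; undetected pixels stay at zero."""
    moment0 = [[0.0] * nx for _ in range(ny)]
    detmap = [[0] * nx for _ in range(ny)]
    for voxels in detections:
        for (x, y, z) in set(voxels):          # count each voxel once
            moment0[y][x] += cube[z][y][x] * channel_width
            detmap[y][x] += 1                  # channels detected at (x, y)
    return moment0, detmap
\end{verbatim}
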

\section{Notes and hints on the use of Duchamp}

In using Duchamp, the user has to make a number of decisions about
the way the program runs. This section is designed to give the user
some idea about what to choose.

The main choice is whether or not to use the wavelet
reconstruction. The main benefits of this are the marked reduction in
the noise level, leading to regularly-shaped detections, and good
reliability for faint sources. The main drawback is the long
execution time: to reconstruct a $170\times160\times1024$
(\hipass) cube often requires three iterations and takes about 20--25
minutes. The searching part of the procedure is much quicker (although
the merging step can be slow when there are many detections that do
not need merging), so if one uses the FDR method on the
un-reconstructed cube, the execution time is only a couple of
minutes. Alternatively, the ability to save a reconstructed array and
read it back in on a later run (see {\tt flagOutputRecon} and {\tt
flagReconExists} in Appendix~\ref{app-param}) means the costly
reconstruction need only be done once, making repeated searches of
the same cube a more feasible prospect.

%A further drawback with the reconstruction is that it is susceptible
%to edge effects. If the valid area in the cube (\ie the part that is
%not BLANK) has very curved edges (such as the \hipass\ polar cap cube,
%H001, which has a roughly circular shape after gridding), the
%convolution can produce artefacts in the reconstruction that mimic the
%edges and can lead (depending on the selection threshold) to some
%spurious sources. Caution is advised with such data -- the user is
%advised to check carefully the reconstructed cube for the presence of
%such artefacts.

If one chooses the reconstruction method, a further decision is
required on the signal-to-noise cutoff used in determining acceptable
wavelet coefficients (the {\tt snrRecon} parameter). A larger value
will remove more noise from the cube, at the expense of losing fainter
sources, while a smaller value will include more noise, which may
produce spurious detections, but will be more sensitive to faint
sources. Values of less than about $3\sigma$ tend not to reduce the
noise a great deal and can lead to many spurious sources (although
this will depend on the nature of the cube).

The FDR method certainly produces more reliable results than a simple
sigma-clipping (\ie thresholding at some number of $\sigma$ above the
mean), particularly if no reconstruction is done. However, at this
point it does not seem to be giving the sensitivity expected for the
supplied value of {\tt alphaFDR} (\ie it is not finding as many
sources as expected). Work is being done to assess this, and to judge
whether there is a real problem (such as with the determination of the
statistics), or whether this is simply a result of working in three
dimensions as opposed to two.

A further point to bear in mind is that the shape of the detections in
a cube that has been reconstructed will be much more regular and
smooth -- the ragged edges that objects in the raw cube possess are
smoothed by the removal of most of the noise.

Finally, as Duchamp is still undergoing development, there are some
elements that are not fully developed. In particular, it is not as
clever as I would like at avoiding interference. The ability to place
requirements on the minimum number of channels and pixels partially
circumvents this problem, but work is being done to make Duchamp
smarter at rejecting signals that are clearly (to a human eye at
least) interference. See the following section for further
improvements that are planned.

%\section{Drawbacks of the current program}
%
%The program currently has a few problems/drawbacks/things to be aware
%of that will hopefully be fixed in the future:
%\begin{itemize}
%
%\item Narrow interference spikes are still getting found, particularly
% if there is no reconstruction, or reconstruction with a relatively
% low {\tt snrRecon} (such as 2 or 3). Increasing the {\tt
% minChannels} parameter is one way to circumvent this, but making the
% algorithm a bit more clever would be preferable.
%
%\item Sources that have strong continuum ripple and/or artefacts often
% generate many spurious detections. This needs some work to avoid
% Duchamp doing this, and until then users are advised to be aware
% of the possibility. Strong continuum ripples may generate many
% sources on the same spatial pixel, and this will be apparent on the
% detection images.
%
%\item Spectra are integrated over every spatial pixel of the
% detection, and this may dilute the actual detection, making it
% harder to see \ie the apparent strength of the line as plotted may
% not give a true indication of how strong it really is.
%
%%\item A caution on the merging part of the procedure. This can be time
%% consuming if there are many detections that do not require merging
%% -- in this case, the time will go like $N^2$ ($N$ = number of
%% detections). If there are plenty of mergers, the size of the list
%% reduces quickly, so the execution time will be less.
%
%
%\end{itemize}


%\section{Comparison with other software (to be developed further...)}
%
%\subsection{fred, by Matt Howlett}
%
%This is the program used in the \hipass\ analysis. It smoothes the
%data spectrally with a boxcar filter of a size that varies over a
%user-specified range, and then thresholds the data.
%
%Works effectively, but generally doesn't find as many sources as
%Duchamp, particularly when the reconstruction is used. Sensitive to
%faint, broad features that fall below the reconstruction threshold.
%
%Execution takes a long time, depending on the range of filter widths
%that are used.
%
%\subsection{sfind}
%
%Hard to evaluate, as it does not (as far as I can see) output the
%channel number at which detections are made, and does not merge
%detections made at adjacent channels (\ie it just works in 2
%dimensions).
%

\section{Future Developments}

This is both a list of planned improvements and a wish-list of
features that would be nice to include (but are not planned in the
immediate future). Let me know if there are items not on this list, or
items on the list you would like prioritised.

\begin{itemize}

\item More varied output formats. {\bf Planned.}

\item Better determination of the noise characteristics of
spectral-line cubes, including understanding how the noise is
generated and developing a model for it. {\bf Planned.}

\item Include more source analysis. Examples could be: shape
information; measurements of \hi\ mass; better measurements of
velocity width and profile. {\bf Some planned.}

\item Provide some indication of the significance of the detection
(\ie some S/N-like value). {\bf Planned.}

\item Improved ability to reject interference, possibly based on the
spectral shape of features. {\bf Planned.}

\item Ability to separate (de-blend) distinct sources that have been
merged. {\bf Planned.}

\item Link to lists of possible counterparts (\eg via NED/SIMBAD/other
VO tools?). {\bf Wishlist.}

\item At present, the ``Milky Way'' channels are discarded and set
to zero. It may be that users would like to have those put back in
the final cube after the source detection is done, so at some point
this option may be added. {\bf Wishlist -- if needed.}

\end{itemize}


%\bibliographystyle{mn2e}
%\bibliographystyle{abbrvnat}
%\bibliography{mnrasmnemonic,sourceDetection}
\begin{thebibliography}{}

\bibitem[\protect\citeauthoryear{{Calabretta} \& {Greisen}}{{Calabretta} \&
{Greisen}}{2002}]{calabretta02}
{Calabretta} M., {Greisen} E., 2002, A\&A, 395, 1077

\bibitem[\protect\citeauthoryear{{Greisen} \& {Calabretta}}{{Greisen} \&
{Calabretta}}{2002}]{greisen02}
{Greisen} E., {Calabretta} M., 2002, A\&A, 395, 1061

\bibitem[\protect\citeauthoryear{{Hanisch}, {Farris}, {Greisen}, {Pence},
{Schlesinger}, {Teuben}, {Thompson} \& {Warnock}}{{Hanisch}
et~al.}{2001}]{hanisch01}
{Hanisch} R., {Farris} A., {Greisen} E., {Pence} W., {Schlesinger} B.,
{Teuben} P., {Thompson} R., {Warnock} A., 2001, A\&A, 376, 359

\bibitem[\protect\citeauthoryear{{Hopkins}, {Miller}, {Connolly}, {Genovese},
{Nichol} \& {Wasserman}}{{Hopkins} et~al.}{2002}]{hopkins02}
{Hopkins} A., {Miller} C., {Connolly} A., {Genovese} C., {Nichol} R.,
{Wasserman} L., 2002, AJ, 123, 1086

\bibitem[\protect\citeauthoryear{Lutz}{Lutz}{1980}]{lutz80}
Lutz R., 1980, The Computer Journal, 23, 262

\bibitem[\protect\citeauthoryear{{Meyer} et~al.}{{Meyer}
et~al.}{2004}]{meyer04:trunc}
{Meyer} M., et~al., 2004, MNRAS, 350, 1195

\bibitem[\protect\citeauthoryear{{Miller}, {Genovese}, {Nichol}, {Wasserman},
{Connolly}, {Reichart}, {Hopkins}, {Schneider} \& {Moore}}{{Miller}
et~al.}{2001}]{miller01}
{Miller} C., {Genovese} C., {Nichol} R., {Wasserman} L., {Connolly} A.,
{Reichart} D., {Hopkins} A., {Schneider} J., {Moore} A., 2001, AJ, 122,
3492

\bibitem[\protect\citeauthoryear{Minchin}{Minchin}{1999}]{minchin99}
Minchin R., 1999, PASA, 16, 12

\bibitem[\protect\citeauthoryear{Starck \& Murtagh}{Starck \&
Murtagh}{2002}]{starck02:book}
Starck J.-L., Murtagh F., 2002, {``Astronomical Image and Data Analysis''},
Springer

\end{thebibliography}


\appendix
\newpage
\section{Available parameters}
\label{app-param}

The full list of parameters that can be listed in the input file is
given here. If not listed, they take the default value given in
square brackets. Since the order of the parameters in the input file
does not matter, they are grouped here in logical sections.

\subsection*{Input-output related}
\begin{entry}
\item[ImageFile (no default assumed)] The filename of the
data cube to be analysed.
\item[flagSubsection {\tt [false]}] A flag to indicate whether one
wants a subsection of the requested image.
\item[Subsection {\tt [ [,,*] ]}] The requested subsection, which
should be specified in the format {\tt [x1:x2,y1:y2,z1:z2]}, where
the limits are inclusive. If the full range of a dimension is
required, use a {\tt *}; \eg if you want the full spectral range of
a subsection of the image, use {\tt [30:140,30:140,*]}.
\item[flagReconExists {\tt [false]}] A flag to indicate whether the
reconstructed array has been saved by a previous run of Duchamp. If
set true, the reconstructed array will be read from the file given by
{\tt reconFile}, rather than calculated directly.
\item[reconFile (no default assumed)] The FITS file that contains the
reconstructed array. If {\tt flagReconExists} is true and this
parameter is not defined, the default file searched will be
determined by the {\`a} trous parameters (see \S\ref{sec-recon}).
\item[OutFile {\tt [duchamp-Results.txt]}] The file containing the
final list of detections. This also records the list of input
parameters.
\item[SpectraFile {\tt [duchamp-Spectra.ps]}] The postscript file
containing the resulting spectra and cutout images of the
detections.
\item[flagLog {\tt [true]}] A flag to indicate whether intermediate
detections should be logged.
\item[LogFile {\tt [duchamp-Logfile.txt]}] The file in which intermediate
detections are logged. These are detections that have not been
merged. This is primarily for use in debugging and diagnostic
purposes -- normal use of the program will probably not require
this.
\item[flagOutputRecon {\tt [false]}] A flag to say whether or not to
save the reconstructed cube as a FITS file. The filename will be
derived from the ImageFile -- the reconstruction of {\tt image.fits}
will be saved as {\tt image.RECON?.fits}, where {\tt ?} stands for
the value of {\tt snrRecon} (see below).
\item[flagOutputResid {\tt [false]}] As for {\tt flagOutputRecon}, but
for the residual array -- the difference between the original cube
and the reconstructed cube. The filename will be {\tt
image.RESID?.fits}.
\item[flagVOT {\tt [false]}] A flag to say whether to create a VOTable
file corresponding to the information in {\tt OutFile}. This will be
an XML file in the Virtual Observatory VOTable format.
\item[votFile {\tt [duchamp-Results.xml]}] The VOTable file with the
list of final detections. Some input parameters are also recorded.
\item[flagKarma {\tt [false]}] A flag to say whether to create a Karma
annotation file corresponding to the information in {\tt
OutFile}. This can be used as an overlay for the Karma programs such
as {\tt kvis}.
\item[karmaFile {\tt [duchamp-Results.ann]}] The Karma annotation
file showing the list of final detections.
\item[flagMaps {\tt [true]}] A flag to say whether to save postscript
files showing the 0th moment map of the whole cube (parameter {\tt
momentMap}) and the detection image ({\tt detectionMap}).
\item[momentMap {\tt [duchamp-MomentMap.ps]}] A postscript file
containing a map of the 0th moment of the detected sources, as well
as pixel and WCS coordinates.
\item[detectionMap {\tt [duchamp-DetectionMap.ps]}] A postscript
file showing each of the detected objects, coloured in greyscale by
the number of channels they span. Also shows pixel and WCS
coordinates.
\end{entry}
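
A subsection string of this form could be interpreted as in the
Python sketch below, which turns it into inclusive (low, high) pixel
ranges, one per axis. The function name is hypothetical, and counting
pixels from 1 (the FITS convention) is an assumption that should be
checked against Duchamp's behaviour.

\begin{verbatim}
def parse_subsection(spec, dims):
    """Turn e.g. "[30:140,30:140,*]" into inclusive (low, high) ranges.
    dims gives the size of each axis; '*' selects the whole axis."""
    fields = spec.strip().strip("[]").split(",")
    if len(fields) != len(dims):
        raise ValueError(f"expected {len(dims)} ranges, got {len(fields)}")
    ranges = []
    for field, size in zip(fields, dims):
        field = field.strip()
        if field == "*":
            ranges.append((1, size))             # full extent of this axis
        else:
            low, high = (int(s) for s in field.split(":"))
            if not 1 <= low <= high <= size:
                raise ValueError(f"range {field} outside 1..{size}")
            ranges.append((low, high))
    return ranges
\end{verbatim}
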

\subsection*{Modifying the cube}
\begin{entry}
\item[flagBlankPix {\tt [true]}] A flag to say whether to remove BLANK
pixels from the analysis -- these are pixels set to some particular
value because they fall outside the imaged area.
\item[blankPixValue {\tt [-8.00061]}] The value of the BLANK pixels,
if this information is not contained in the FITS header (the usual
procedure is to obtain this value from the header information -- in
which case the value set by this parameter is ignored).
\item[flagMW {\tt [false]}] A flag to say whether to remove channels
contaminated by Milky Way (or other) emission -- the flux in these
channels is currently just set to 0.
\item[maxMW {\tt [112]}] The maximum channel for the Milky Way
emission.
\item[minMW {\tt [75]}] The minimum channel for the Milky Way
emission. Note that the channels specified by {\tt maxMW} and {\tt
minMW} are themselves treated as Milky Way channels (\ie the range is
inclusive).
\item[flagBaseline {\tt [false]}] A flag to say whether to remove the
baseline from each spectrum in the cube for the purposes of
reconstruction and detection.
\end{entry}
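
The effect of the BLANK and Milky Way parameters can be sketched in
Python as follows: pixels equal to the BLANK value are marked as
missing, and the inclusive range of Milky Way channels is set to
zero. Channels are counted from 0 here, BLANK values are matched to
within a small tolerance, and the function name is hypothetical; all
three are assumptions made for illustration.

\begin{verbatim}
def prepare_cube(cube, blank_value=None, min_mw=None, max_mw=None, tol=1e-6):
    """Return a copy of cube[z][y][x] in which BLANK pixels become None
    and channels min_mw..max_mw (inclusive, counted from 0) are zeroed."""
    out = []
    for z, plane in enumerate(cube):
        in_mw = min_mw is not None and max_mw is not None and min_mw <= z <= max_mw
        new_plane = []
        for row in plane:
            new_row = []
            for v in row:
                if blank_value is not None and abs(v - blank_value) < tol:
                    new_row.append(None)      # flagBlankPix: ignore this pixel
                elif in_mw:
                    new_row.append(0.0)       # flagMW: channel set to zero
                else:
                    new_row.append(v)
            new_plane.append(new_row)
        out.append(new_plane)
    return out
\end{verbatim}
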

\subsection*{Detection related}

\subsubsection*{General detection}
\begin{entry}
\item[flagNegative {\tt [false]}] A flag to indicate that the features
being searched for are negative. The cube will be inverted prior to
searching.
\item[snrCut {\tt [3.]}] The cut-off value for thresholding, in terms
of number of $\sigma$ above the mean.
\item[flagGrowth {\tt [false]}] A flag indicating whether or not to
grow the detected objects to a smaller threshold.
\item[growthCut {\tt [2.]}] The smaller threshold used in growing
detections. In units of $\sigma$ above the mean.
\end{entry}
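
The thresholding and growth parameters can be illustrated on a single
spectrum with the Python sketch below. Duchamp works on the full cube;
restricting the example to one dimension, and growing a detection one
pixel at a time into neighbours above the lower threshold, are
simplifications assumed here. The function name is hypothetical.

\begin{verbatim}
def detect_1d(values, mean, sigma, snr_cut=3.0, growth_cut=None,
              negative=False):
    """Return inclusive (start, end) index runs of a 1-D spectrum lying
    more than snr_cut*sigma above the mean, optionally grown outwards
    into neighbouring pixels more than growth_cut*sigma above the mean."""
    if negative:                      # flagNegative: invert before searching
        values, mean = [-v for v in values], -mean
    n = len(values)
    above = [v > mean + snr_cut * sigma for v in values]
    if growth_cut is not None:        # flagGrowth
        grow_level = mean + growth_cut * sigma
        changed = True
        while changed:                # repeat until no run can grow further
            changed = False
            for i in range(n):
                touches = (i > 0 and above[i - 1]) or (i < n - 1 and above[i + 1])
                if not above[i] and values[i] > grow_level and touches:
                    above[i] = changed = True
    runs, start = [], None
    for i, flag in enumerate(above + [False]):   # sentinel closes a final run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i - 1))
            start = None
    return runs
\end{verbatim}
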

\subsubsection*{{\`a} trous reconstruction}
\begin{entry}
\item [flagATrous {\tt [true]}] A flag indicating whether or not to
reconstruct the cube using the {\it {\`a} trous} wavelet
reconstruction. Currently this is done in three dimensions. See
\S\ref{sec-recon} for details.
\item[scaleMin {\tt [1]}] The minimum wavelet scale to be used in the
reconstruction. A value of 1 means ``use all scales''.
\item[snrRecon {\tt [4]}] The thresholding cutoff used in the
reconstruction -- only wavelet coefficients this many $\sigma$ above
the mean (or greater) are included in the reconstruction.
\item[filterCode {\tt [2]}] The code number of the filter to use in
the reconstruction. The options are:
\begin{itemize}
\item {\bf 1:} B$_3$-spline filter: coefficients =
$(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$
\item {\bf 2:} Triangle filter: coefficients = $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
\item {\bf 3:} Haar wavelet: coefficients = $(0, \frac{1}{2}, \frac{1}{2})$
\end{itemize}
\end{entry}
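
For readers unfamiliar with the {\it {\`a} trous} scheme, the Python
sketch below applies one non-iterative pass to a one-dimensional
spectrum, using the filters listed above, discarding scales below
{\tt scaleMin}, and keeping only coefficients more than
{\tt snrRecon}$\,\sigma$ above the mean at each scale. It is not
Duchamp's three-dimensional, iterative implementation: the reflecting
boundary, the number of scales, the estimate of $\sigma$ from the
median absolute deviation, and the adding back of the final smoothed
array are all assumptions, and the function names are hypothetical.

\begin{verbatim}
import statistics

FILTERS = {1: [1/16, 1/4, 3/8, 1/4, 1/16],   # B3 spline
           2: [1/4, 1/2, 1/4],               # triangle
           3: [0.0, 1/2, 1/2]}               # Haar

def reflect(i, n):
    """Mirror an out-of-range index back into 0..n-1 (assumed boundary rule)."""
    period = 2 * (n - 1) if n > 1 else 1
    i %= period
    return period - i if i >= n else i

def atrous_1d(signal, filter_code=1, n_scales=3, scale_min=1, snr_recon=4.0):
    """Single-pass a trous reconstruction of a 1-D signal."""
    h = FILTERS[filter_code]
    half = len(h) // 2
    n = len(signal)
    smooth = [float(v) for v in signal]
    kept = [0.0] * n
    for scale in range(1, n_scales + 1):
        step = 2 ** (scale - 1)               # the "holes" double at each scale
        smoother = [sum(hk * smooth[reflect(i + step * (k - half), n)]
                        for k, hk in enumerate(h)) for i in range(n)]
        wavelet = [a - b for a, b in zip(smooth, smoother)]
        if scale >= scale_min:                # scales below scaleMin are dropped
            mean = statistics.mean(wavelet)
            med = statistics.median(wavelet)
            sigma = statistics.median([abs(w - med) for w in wavelet]) / 0.6745
            for i, w in enumerate(wavelet):
                if w - mean > snr_recon * sigma:   # significant coefficient
                    kept[i] += w
        smooth = smoother
    return [k + s for k, s in zip(kept, smooth)]   # add back the coarsest smooth
\end{verbatim}

A larger {\tt snr\_recon} keeps fewer coefficients, which is the
noise/sensitivity trade-off discussed for {\tt snrRecon} in the notes
on using Duchamp.
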

\subsubsection*{FDR method}
\begin{entry}
\item[flagFDR {\tt [false]}] A flag indicating whether or not to use
the False Discovery Rate method in thresholding the pixels.
\item[alphaFDR {\tt [0.01]}] The $\alpha$ parameter used in the FDR
analysis. The average number of false detections, as a fraction of the
total number of detections, will be less than $\alpha$ (see
\S\ref{sec-detection}).
\end{entry}
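
The FDR thresholding can be sketched in Python following the
procedure described by \citet{miller01} and \citet{hopkins02}: sort
the pixel $p$-values, find the largest $j$ with
$P_j \le j\alpha/(c_N N)$, and accept every pixel whose $p$-value is
no larger than $P_j$. Taking $c_N = \sum_{i=1}^{n} i^{-1}$ over a
chosen number $n$ of correlated pixels (1 for independent pixels),
and using one-sided Gaussian $p$-values, are assumptions of this
sketch; the function name is hypothetical.

\begin{verbatim}
import math

def fdr_threshold(values, mean, sigma, alpha=0.01, n_corr=1):
    """Return the lowest pixel value accepted by the FDR test, or None if
    no pixel passes."""
    def p_value(v):
        # One-sided p-value under the null hypothesis of Gaussian noise.
        return 0.5 * math.erfc((v - mean) / (sigma * math.sqrt(2.0)))

    n = len(values)
    c_n = sum(1.0 / i for i in range(1, n_corr + 1))   # 1 for independent pixels
    cut = None
    for j, p in enumerate(sorted(p_value(v) for v in values), start=1):
        if p <= j * alpha / (c_n * n):
            cut = p                  # remember the largest passing p-value
    if cut is None:
        return None
    return min(v for v in values if p_value(v) <= cut)
\end{verbatim}
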

\subsubsection*{Merging detections}
\begin{entry}
\item[minPix {\tt [2]}] The minimum number of spatial pixels for a single
detection to be counted.
\item[minChannels {\tt [3]}] The minimum number of consecutive
channels that must be present in the detection for it to be accepted
by the merging algorithm.
%The minimum number of channels that a
% detection must span for it to be accepted by the Merging algorithm.
\item[flagAdjacent {\tt [true]}] A flag indicating whether to use the
``adjacent pixel'' criterion to decide whether to merge objects. If
not, the next two parameters are used to determine whether objects
are within the necessary thresholds.
\item[threshSpatial {\tt [3.]}] The largest value that the minimum
spatial separation (in pixels) between two detections may take for
them to be merged into one. Only used if {\tt flagAdjacent = false}.
\item[threshVelocity {\tt [7.]}] The largest value that the minimum
channel separation between two detections may take for them to be
merged into one. %Only used if {\tt flagAdjacent = false}.
\end{entry}
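
The merging criteria and the minimum-size requirements can be
sketched in Python as below, with each object a list of $(x,y,z)$
voxels. The sketch is brute force and only illustrative: reading
``adjacent'' as spatially touching (8-connected), applying the
channel-separation test in both modes, and testing the spatial and
channel separations independently of one another are assumptions,
and the function names are hypothetical.

\begin{verbatim}
import math

def should_merge(a, b, flag_adjacent=True, thresh_spatial=3.0,
                 thresh_velocity=7.0):
    """a, b: lists of (x, y, z) voxels. True if the two objects meet the
    merging criteria (brute-force comparison of every voxel pair)."""
    # Smallest channel separation between the objects.
    dz = min(abs(za - zb) for (_, _, za) in a for (_, _, zb) in b)
    if flag_adjacent:
        # Spatially touching or overlapping (8-connected).
        close = any(abs(xa - xb) <= 1 and abs(ya - yb) <= 1
                    for (xa, ya, _) in a for (xb, yb, _) in b)
    else:
        close = min(math.hypot(xa - xb, ya - yb)
                    for (xa, ya, _) in a for (xb, yb, _) in b) <= thresh_spatial
    return close and dz <= thresh_velocity

def passes_size_cuts(voxels, min_pix=2, min_channels=3):
    """minPix counts distinct spatial pixels; minChannels asks for a run
    of that many consecutive channels."""
    if len({(x, y) for (x, y, _) in voxels}) < min_pix:
        return False
    chans = sorted({z for (_, _, z) in voxels})
    longest = run = 1
    for prev, cur in zip(chans, chans[1:]):
        run = run + 1 if cur == prev + 1 else 1
        longest = max(longest, run)
    return longest >= min_channels
\end{verbatim}
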

\subsubsection*{Other parameters}
\begin{entry}
\item[spectralMethod {\tt [peak]}] This indicates which method is used
to plot the output spectra: {\tt peak} means plot the spectrum through
the spatial pixel containing the detection's peak flux; {\tt sum}
means sum the spectra of each detected spatial pixel, and correct for
the beam size. Any other choice defaults to {\tt peak}.
\item[drawBorders {\tt [true]}] A flag indicating whether borders
are to be drawn around the detected objects in the moment maps
included in the output (see for example Fig.~\ref{fig-spect}).
\item[verbose {\tt [true]}] A flag indicating whether to print the
progress of computationally-intensive algorithms (such as the
searching and merging) to the screen.
\end{entry}


\newpage
\section{Example parameter files}
\label{app-input}

This is what a typical parameter file would look like.

\begin{verbatim}
imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
logFile logfile.txt
outFile results.txt
spectraFile spectra.ps
flagSubsection 0
flagOutputRecon 0
flagOutputResid 0
flagBlankPix 1
flagMW 1
minMW 75
maxMW 112
minPix 3
flagGrowth 1
growthCut 1.5
flagATrous 0
scaleMin 1
snrRecon 4
flagFDR 1
alphaFDR 0.1
numPixPSF 20
snrCut 3
threshSpatial 3
threshVelocity 7
\end{verbatim}

Note that it is not necessary to include all these parameters in the
file, only those that need to be changed from the defaults (as listed
in Appendix~\ref{app-param}); several of the values above are in fact
the defaults. A minimal parameter file might look like:
\begin{verbatim}
imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
flagLog 0
snrRecon 3
snrCut 2.5
minChannels 4
\end{verbatim}
This will reconstruct the cube with a lower SNR value than the
default, select objects at a lower threshold, impose a stricter
minimum channel requirement, and not keep a log of the intermediate
detections.
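
Reading such a file amounts to splitting each line into a name and a
value and overriding the defaults. The Python sketch below does this.
Matching names case-insensitively (the examples write {\tt imageFile}
while Appendix~\ref{app-param} lists {\tt ImageFile}), inferring each
value's type from its default, and the handful of defaults included
are assumptions; the function name is hypothetical.

\begin{verbatim}
DEFAULTS = {"imagefile": "", "flaglog": True, "flagatrous": True,
            "snrrecon": 4.0, "snrcut": 3.0, "minchannels": 3}

def read_parameters(text, defaults=DEFAULTS):
    """Parse 'name value' lines into a dict keyed by lower-case name."""
    params = dict(defaults)
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue                              # skip blank lines
        key = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        old = params.get(key)
        if isinstance(old, bool):                 # check bool before int
            params[key] = value.lower() in ("1", "true")
        elif isinstance(old, int):
            params[key] = int(value)
        elif isinstance(old, float):
            params[key] = float(value)
        else:
            params[key] = value                   # strings and unknown names
    return params
\end{verbatim}
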

The following page demonstrates how the parameters are presented to
the user, both on the screen at execution time and in the output and
log files:
\newpage
\begin{landscape}
Presentation of parameters in output and log files:
\begin{verbatim}
---- Parameters ----
Image to be analysed = /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
Intermediate Logfile = duchamp-Logfile.txt
Final Results file = duchamp-Results.txt
Spectrum file = duchamp-Spectra.ps
VOTable file = duchamp-Results.xml
0th Moment Map = duchamp-MomentMap.ps
Detection Map = duchamp-DetectionMap.ps
Saving reconstructed cube? = false
Saving residuals from reconstruction? = false
------
Searching for Negative features? = false
Fixing Blank Pixels? = true
Blank Pixel Value = -8.00061
Removing Milky Way channels? = true
Milky Way Channels = 75-112
Beam Size (pixels) = 10.1788
Removing baselines before search? = false
Minimum # Pixels in a detection = 2
Growing objects after detection? = false
Using A Trous reconstruction? = true
Minimum scale in reconstruction = 1
SNR Threshold within reconstruction = 4
Filter being used for reconstruction = B3 spline function
Using FDR analysis? = false
SNR Threshold = 2.5
Using Adjacent-pixel criterion? = true
Max. velocity separation for merging = 7
Min. # channels for merging = 4
Method of spectral plotting = peak
\end{verbatim}

\newpage
\section{Example output file}
\label{app-output}
This is the typical content of an output file, after running Duchamp
with the parameters illustrated on the previous page.

{\scriptsize
\begin{verbatim}
Results of the Duchamp source finder: Tue May 23 14:51:38 2006
---- Parameters ----

(... omitted for clarity -- see previous page for examples...)

--------------------
Total number of detections = 23
--------------------
Obj# Name           X     Y     Z          RA          DEC  w_RA w_DEC      VEL   w_VEL  F_tot F_peak  X1  X2  Y1  Y2  Z1  Z2 Npix Flag
-----------------------------------------------------------------------------------------------------------------------------------------------------------
   1 J0609-2156  59.4 140.6 114.7 06:09:21.03 -21:56:51.08 48.48 39.45  226.253  65.957 17.572  0.213  55  66 136 145 113 118  185
   2 J0607-2601  65.2  79.6 116.2 06:07:52.21 -26:01:09.34 44.44 39.50  246.310  39.574  4.144  0.100  60  70  76  85 115 118   50
   3 J0606-2720  70.8  59.8 121.4 06:06:14.90 -27:20:45.24 52.45 47.59  315.404  39.574 17.066  0.150  65  77  53  64 120 123  213
   4 J0611-2138  52.5 145.1 162.5 06:11:18.85 -21:38:03.71 32.39 23.49  856.919 118.722 44.394  0.410  49  56 142 147 158 167  303 E
   5 J0600-2859  89.7  35.3 202.4 06:00:33.13 -28:59:01.59 23.92 28.10 1383.476 184.678 26.573  0.173  87  92  32  38 195 209  319
   6 J0558-2639  95.5  70.2 222.6 05:58:52.79 -26:39:04.56 15.93 12.10 1650.508 105.531  1.925  0.063  94  97  69  71 219 227   35
   7 J0617-2724  34.8  58.3 227.5 06:17:05.84 -27:24:00.93 20.75 23.42 1714.993 303.400 11.414  0.093  33  37  56  61 215 238  176
   8 J0609-2141  60.3 144.4 229.6 06:09:05.74 -21:41:38.75 16.14 11.82 1742.470 105.531  1.476  0.068  59  62 143 145 225 233   25
   9 J0558-2525  95.7  88.6 231.1 05:58:51.19 -25:25:33.12 27.87 24.16 1762.632 250.635 16.930  0.115  92  98  86  91 220 239  257
  10 J0600-2141  88.9 144.4 232.3 06:00:52.94 -21:41:57.48 31.95 24.15 1777.848 224.252 34.030  0.166  86  93 142 147 222 239  415 E
  11 J0615-2634  40.0  70.8 232.6 06:15:25.93 -26:34:35.73 16.54 19.58 1782.224  52.765  2.757  0.068  38  41  69  73 231 235   44
  12 J0604-2606  75.9  78.4 233.1 06:04:42.24 -26:06:22.98 28.12 23.86 1788.258 224.252 27.059  0.155  73  79  76  81 225 242  352
  13 J0601-2340  88.0 114.9 235.7 06:01:08.27 -23:40:17.66 35.94 32.09 1822.941 263.826 85.132  0.297  84  92 112 119 226 246  724
  14 J0615-2234  38.2 130.6 253.6 06:15:30.57 -22:34:51.69 12.38 15.71 2059.721 118.722  2.317  0.070  37  39 129 132 248 257   40
  15 J0617-2305  31.4 122.8 258.0 06:17:33.18 -23:05:36.24 16.45 15.54 2117.104  39.574  1.424  0.062  30  33 121 124 256 259   23
  16 J0612-2149  49.5 142.3 271.1 06:12:11.78 -21:49:20.22 24.35 19.58 2290.167 395.740 20.712  0.101  47  52 140 144 257 287  318
  17 J0616-2133  35.2 145.9 300.0 06:16:16.44 -21:33:36.96 20.21  7.47 2671.799 224.252  3.851  0.127  33  37 145 146 294 311   40 E
  18 J0544-2736 144.0  54.9 325.4 05:44:13.62 -27:36:34.24  3.57 12.13 3006.575  39.574  0.436  0.057 144 144  54  56 324 327    7 E
  19 J0555-2956 107.2  20.7 367.5 05:55:10.37 -29:56:43.13 19.65 24.31 3561.004  39.574  6.482  0.169 105 109  18  23 366 369   72
  20 J0558-2321  96.0 119.6 532.1 05:58:47.64 -23:21:17.38 11.91 16.09 5733.479  52.765  1.287  0.051  95  97 118 121 530 534   27
  21 J0616-2649  37.9  67.0 547.0 06:16:04.62 -26:49:18.33 12.35 11.67 5929.923  39.574  1.637  0.064  37  39  66  68 546 549   25
  22 J0619-2252  25.1 125.9 724.2 06:19:21.57 -22:52:13.98 12.38 11.61 8267.304  39.573  0.698  0.059  24  26 125 127 723 726   13 E
  23 J0552-2916 116.9  30.5 727.0 05:52:15.05 -29:16:49.65 11.59 20.25 8304.033 303.400 35.834  0.479 116 118  28  32 716 739  132
\end{verbatim}
}
Note that the width of the table can make it hard to read. A good
trick for those using UNIX/Linux is to make use of the {\tt a2ps}
command. The following works well, producing a PostScript file
{\tt duchamp-Results.ps}:
\\\verb|a2ps -1 -r -f8 -o duchamp-Results.ps duchamp-Results.txt|

%\end{landscape}

\newpage
\section{Example VOTable output}
\label{app-votable}
This is part of the VOTable, in XML format, corresponding to the
output file in Appendix~\ref{app-output} (the indentation has been removed to make it fit on the page!).

%\begin{landscape}
{\scriptsize
\begin{verbatim}
<?xml version="1.0"?>
<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
<RESOURCE name="Duchamp Output">
<TABLE name="Detections">
<DESCRIPTION>Detected sources and parameters from running the Duchamp source finder.</DESCRIPTION>
<PARAM name="FITS file" datatype="char" ucd="meta.file;meta.fits" value="/DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits"/>
<PARAM name="Threshold" datatype="float" ucd="stat.snr" value="2.5"/>
<PARAM name="ATrous note" datatype="char" ucd="meta.note" value="The a trous reconstruction method was used, with the following parameters."/>
<PARAM name="ATrous Cut" datatype="float" ucd="stat.snr" value="4"/>
<PARAM name="ATrous Minimum Scale" datatype="int" ucd="stat.param" value="1"/>
<PARAM name="ATrous Filter" datatype="char" ucd="meta.code;stat" value="B3 spline function"/>
<FIELD name="ID" ID="col1" ucd="meta.id" datatype="int" width="4"/>
<FIELD name="Name" ID="col2" ucd="meta.id;meta.main" datatype="char" arraysize="14"/>
<FIELD name="RA" ID="col3" ucd="pos.eq.ra;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
<FIELD name="Dec" ID="col4" ucd="pos.eq.dec;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
<FIELD name="w_RA" ID="col5" ucd="phys.angSize;pos.eq.ra" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
<FIELD name="w_Dec" ID="col6" ucd="phys.angSize;pos.eq.dec" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
<FIELD name="Vel" ID="col7" ucd="phys.veloc;src.dopplerVeloc" datatype="float" width="9" precision="3" unit="km/s"/>
<FIELD name="w_Vel" ID="col8" ucd="phys.veloc;src.dopplerVeloc;spect.line.width" datatype="float" width="8" precision="3" unit="km/s"/>
<FIELD name="Integrated_Flux" ID="col9" ucd="phys.flux;spect.line.intensity" datatype="float" width="10" precision="3" unit="km/s"/>
<DATA>
<TABLEDATA>
<TR>
<TD> 1</TD><TD> J0609-2200</TD><TD> 92.410416</TD><TD>-22.013390</TD><TD> 48.50</TD><TD> 39.42</TD><TD> 213.061</TD><TD> 65.957</TD><TD> 17.572</TD>
</TR>
<TR>
<TD> 2</TD><TD> J0608-2605</TD><TD> 92.042633</TD><TD>-26.085157</TD><TD> 44.47</TD><TD> 39.47</TD><TD> 233.119</TD><TD> 39.574</TD><TD> 4.144</TD>
</TR>
<TR>
<TD> 3</TD><TD> J0606-2724</TD><TD> 91.637840</TD><TD>-27.412022</TD><TD> 52.48</TD><TD> 47.57</TD><TD> 302.213</TD><TD> 39.574</TD><TD> 17.066</TD>
</TR>
(... table truncated for clarity ...)
</TABLEDATA>
</DATA>
</TABLE>
</RESOURCE>
</VOTABLE>
\end{verbatim}
}
%\end{landscape}
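As an illustration of how this table is laid out, the following Python
sketch reads detections from a VOTable of this shape using only the
standard {\tt xml.etree.ElementTree} module: the {\tt FIELD} elements
give the column names and datatypes in order, and each {\tt TR} holds
one {\tt TD} per field. The function name {\tt read\_detections} and
the cut-down sample (three columns, with values copied from the first
row above) are illustrative only, and the sketch assumes, as in this
example, that the file declares no default XML namespace.

\begin{verbatim}
import xml.etree.ElementTree as ET

# Cut-down sample in the same layout as the example above
# (three columns, first row only; illustrative).
SAMPLE = """<VOTABLE version="1.1">
<RESOURCE name="Duchamp Output">
<TABLE name="Detections">
<FIELD name="ID" ID="col1" datatype="int" width="4"/>
<FIELD name="Name" ID="col2" datatype="char" arraysize="14"/>
<FIELD name="RA" ID="col3" datatype="float" unit="deg"/>
<DATA><TABLEDATA>
<TR><TD> 1</TD><TD> J0609-2200</TD><TD> 92.410416</TD></TR>
</TABLEDATA></DATA>
</TABLE>
</RESOURCE>
</VOTABLE>"""

# Converters for the VOTable datatypes used in this example.
CONVERTERS = {"int": int, "float": float, "char": str}


def read_detections(xml_text):
    """Return one dict per table row, keyed by the FIELD names."""
    root = ET.fromstring(xml_text)
    table = root.find("RESOURCE/TABLE")
    fields = table.findall("FIELD")
    names = [f.get("name") for f in fields]
    convert = [CONVERTERS.get(f.get("datatype"), str) for f in fields]
    rows = []
    for tr in table.findall("DATA/TABLEDATA/TR"):
        # TD values are padded with spaces, so strip before converting.
        cells = [(td.text or "").strip() for td in tr.findall("TD")]
        rows.append({n: c(v) for n, c, v in zip(names, convert, cells)})
    return rows


if __name__ == "__main__":
    for row in read_detections(SAMPLE):
        print(row["ID"], row["Name"], row["RA"])
\end{verbatim}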

\newpage
\section{Example Karma Annotation File output}
\label{app-karma}

This is an example of the Karma annotation file, which marks the
locations of the detected objects. It can be loaded by the plotting
tools of the Karma package (for instance, {\tt kvis}) as an overlay on
the FITS file.

\begin{verbatim}
# Duchamp Source Finder results for
# cube /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
COLOR RED
COORD W
CIRCLE 92.3376 -21.9475 0.403992
TEXT 92.3376 -21.9475 1
CIRCLE 91.9676 -26.0193 0.37034
TEXT 91.9676 -26.0193 2
CIRCLE 91.5621 -27.3459 0.437109
TEXT 91.5621 -27.3459 3
CIRCLE 92.8285 -21.6344 0.269914
TEXT 92.8285 -21.6344 4
CIRCLE 90.1381 -28.9838 0.234179
TEXT 90.1381 -28.9838 5
CIRCLE 89.72 -26.6513 0.132743
TEXT 89.72 -26.6513 6
CIRCLE 94.2743 -27.4003 0.195175
TEXT 94.2743 -27.4003 7
CIRCLE 92.2739 -21.6941 0.134538
TEXT 92.2739 -21.6941 8
CIRCLE 89.7133 -25.4259 0.232252
TEXT 89.7133 -25.4259 9
CIRCLE 90.2206 -21.6993 0.266247
TEXT 90.2206 -21.6993 10
CIRCLE 93.8581 -26.5766 0.163153
TEXT 93.8581 -26.5766 11
CIRCLE 91.176 -26.1064 0.234356
TEXT 91.176 -26.1064 12
CIRCLE 90.2844 -23.6716 0.299509
TEXT 90.2844 -23.6716 13
CIRCLE 93.8774 -22.581 0.130925
TEXT 93.8774 -22.581 14
CIRCLE 94.3882 -23.0934 0.137108
TEXT 94.3882 -23.0934 15
CIRCLE 93.0491 -21.8223 0.202928
TEXT 93.0491 -21.8223 16
CIRCLE 94.0685 -21.5603 0.168456
TEXT 94.0685 -21.5603 17
CIRCLE 86.0568 -27.6095 0.101113
TEXT 86.0568 -27.6095 18
CIRCLE 88.7932 -29.9453 0.202624
TEXT 88.7932 -29.9453 19
\end{verbatim}
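The format is simple enough to generate directly. The Python sketch
below writes text in the same layout: two comment lines, a {\tt COLOR}
line, {\tt COORD W} (positions given in world coordinates), and then
one {\tt CIRCLE} and one {\tt TEXT} line per object. The function
{\tt karma\_annotations} and its input, a list of hypothetical
(ID, RA, Dec, radius) tuples in degrees, are assumptions made for this
sketch; it is not taken from the \progname\ source and does not
reproduce how \progname\ chooses each circle's radius.

\begin{verbatim}
def karma_annotations(cube_name, detections, colour="RED"):
    """Build Karma annotation text in the layout shown above.

    detections: iterable of (obj_id, ra_deg, dec_deg, radius_deg)
    tuples (a hypothetical input format chosen for this sketch).
    """
    lines = [
        "# Duchamp Source Finder results for",
        "# cube " + cube_name,
        "COLOR " + colour,
        "COORD W",  # positions and radii in world coordinates
    ]
    for obj_id, ra, dec, radius in detections:
        # %g keeps six significant figures, like the example above.
        lines.append("CIRCLE %g %g %g" % (ra, dec, radius))
        lines.append("TEXT %g %g %s" % (ra, dec, obj_id))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    objects = [(1, 92.3376, -21.9475, 0.403992),
               (2, 91.9676, -26.0193, 0.37034)]
    print(karma_annotations("cube.fits", objects), end="")
\end{verbatim}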

\newpage
\section{Installing Duchamp (README file)}
\begin{verbatim}
There is an executable (Duchamp) that has been compiled on Debian
Linux (kernel 2.6.8-2-686) with gcc version 3.3.5 (Debian 1:3.3.5-13).

If that is no good to you, you can compile it yourself using the
Makefile included in this directory (sorry for not having a configure
script or similar yet!).

Duchamp uses three main external libraries: pgplot, cfitsio and
wcslib. You will need to set the paths in the Makefile for the base
directory and the three libraries, as they are currently configured
for my use and will not be of much use to you! These are:

BASE --> the current directory
PGDIR --> where the pgplot libraries (and header files) are located
CFITSIODIR --> where the header file fitsio.h is located
CFITSIOLDIR --> where the cfitsio library is located (libcfitsio.a)
WCSDIR --> where the wcslib header files are located
WCSLDIR --> where the wcslib library is located (libwcs.a)

If you do not have the libraries, they can be downloaded from the
following locations:
PGPlot -- http://www.astro.caltech.edu/~tjp/pgplot/
cfitsio -- http://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html
wcslib -- http://www.atnf.csiro.au/people/Mark.Calabretta/WCS/index.html

Once you have set up the Makefile correctly, simply typing
> make duchamp
will compile the program.

To run it, you need to use the syntax
> Duchamp -p parameterFile
where parameterFile is a file with the input parameters, including the
name of the cube you want to search.

There are two example input files included with the distribution. The
smaller one, InputExample, shows the typical parameters one might want
to set. The larger one, InputComplete, lists all parameters that can be
entered, with a brief description of each. Refer to the documentation
for further details.

To get going quickly, just replace the "your-file-here" in
InputExample with your image name, and type
> Duchamp -p InputExample
and you're off!
\end{verbatim}

\section{Robust statistics for a Normal distribution}
\label{app-madfm}

The Normal, or Gaussian, distribution for mean $\mu$ and standard
deviation $\sigma$ can be written as
\[
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
\]

When the data consist purely of Gaussian noise, it is straightforward
to estimate $\sigma$ by calculating the standard deviation (or rms) of
the data. However, if there is a small amount of signal present on top
of the Gaussian noise, and one wants to estimate $\sigma$ for the
noise alone, the large values from the signal can bias the estimator
to higher values.

An alternative is to use the median ($m$) and the median absolute
deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
median is the middle of the distribution, defined for a continuous
distribution by
\[
\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
\]
From symmetry, we quickly see that for the continuous Normal
distribution, $m=\mu$. Henceforth we consider the case $\mu=0$,
without loss of generality.

To find $s$, we find the distribution of the absolute deviation from
the median (here simply $|x|$, since $m=0$), and then find the median
of that distribution. This distribution is given by
\begin{eqnarray*}
g(x) &= &{\mbox{\rm distribution of }} |x|\\
&= &f(x) + f(-x),\ x\ge0\\
&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
\end{eqnarray*}
So, the median absolute deviation from the median, $s$, is given by
\[
\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
\]
Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and
so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
\sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-x^2/2\sigma^2} \diff x$.
Substituting this into the condition above (the constant factor in
$g(x)$ cancels from both sides) gives $2\int_{0}^{s}
e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$. Hence, to find $s$
we simply solve the following equation (setting $\sigma=1$ for
simplicity, which is equivalent to stating $x$ and $s$ in units of
$\sigma$):
\[
\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
\]
The integral cannot be written in terms of elementary functions (in
terms of the error function the condition is ${\rm erf}(s/\sqrt{2}) =
1/2$, so $s$ is the upper quartile of the standard Normal
distribution), but the equation is straightforward to solve
numerically, yielding $s=0.6744898$. Thus, to estimate $\sigma$ for a
Normally distributed data set, one can calculate $s$, then divide by
0.6744898 (or multiply by 1.4826022) to obtain the correct estimator.
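The numerical solution is easy to reproduce. The Python sketch below
solves the equation above by bisection, using the standard identity
$\int_0^s e^{-x^2/2} \diff x = \sqrt{\pi/2}\,{\rm erf}(s/\sqrt{2})$,
and then uses the result to turn the median absolute deviation from
the median into an estimate of $\sigma$. The names {\tt solve\_s} and
{\tt robust\_sigma}, the bisection bracket, and the test data
(Gaussian noise plus a few large values standing in for sources) are
illustrative choices, not part of \progname.

\begin{verbatim}
import math
import random
import statistics


def solve_s(tol=1e-12):
    """Solve int_0^s exp(-x^2/2) dx = sqrt(pi/8) for s by bisection.

    Uses int_0^s exp(-x^2/2) dx = sqrt(pi/2) * erf(s / sqrt(2)).
    """
    def residual(s):
        integral = math.sqrt(math.pi / 2) * math.erf(s / math.sqrt(2))
        return integral - math.sqrt(math.pi / 8)

    lo, hi = 0.0, 5.0  # residual(lo) < 0 < residual(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


S_NORMAL = solve_s()  # MADFM of a unit-sigma Normal distribution


def robust_sigma(values):
    """Estimate the noise sigma from the median absolute deviation
    from the median (MADFM), assuming Gaussian noise."""
    m = statistics.median(values)
    madfm = statistics.median([abs(v - m) for v in values])
    return madfm / S_NORMAL


if __name__ == "__main__":
    print("s = %.7f, 1/s = %.7f" % (S_NORMAL, 1 / S_NORMAL))
    random.seed(1)
    noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
    data = noise + [50.0] * 200  # illustrative bright "sources"
    print("plain standard deviation:", statistics.pstdev(data))
    print("MADFM-based estimate:    ", robust_sigma(data))
\end{verbatim}

The plain standard deviation of such data is inflated by the few large
values, while the MADFM-based estimate stays close to the true noise
level of 1.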

Note that this differs from solutions quoted elsewhere,
specifically in \citet{meyer04:trunc}, where the same robust estimator
is used but with an incorrect conversion to standard deviation: they
assume $\sigma = s\sqrt{\pi/2}$. This is, in fact, the conversion used
to convert the {\it mean} absolute deviation from the mean to the
standard deviation. As a result, the cube noise in the \hipass\
catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
than quoted.
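The size of this correction follows directly from the ratio of the two
conversion factors:
\[
\frac{1.4826}{\sqrt{\pi/2}} = \frac{1.4826}{1.2533} \approx 1.183.
\]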

\section{How Gaussian noise changes with wavelet scale}
\label{app-scaling}

The key element in the wavelet reconstruction of an array is the
thresholding of the individual wavelet coefficient arrays. This is
usually done by setting the threshold at some number of standard
deviations above the mean value.

However, since the wavelet arrays are produced by convolving the input
array with an increasingly large filter, the pixels in the coefficient
arrays become increasingly correlated as the scale of the filter
increases. As a result, the standard deviation measured from a given
coefficient array decreases with increasing scale. To calculate this,
we need to take into account which pixels of the input array each
pixel in the convolved array depends on, and with what weights.

To demonstrate, suppose we have a 1-D array with $N$ pixel values
given by $F_i,\ i=1,\ldots,N$, and we convolve it with the B$_3$-spline
filter, defined by the set of coefficients
$\{1/16,1/4,3/8,1/4,1/16\}$. The flux of the $i$th pixel in the
convolved array will be
\[
F'_i = \frac{1}{16}F_{i-2} + \frac{1}{4}F_{i-1} + \frac{3}{8}F_{i}
+ \frac{1}{4}F_{i+1} + \frac{1}{16}F_{i+2}
\]
and the flux of the corresponding pixel in the wavelet array will be
\[
W'_i = F_i - F'_i = -\frac{1}{16}F_{i-2} - \frac{1}{4}F_{i-1} + \frac{5}{8}F_{i}
- \frac{1}{4}F_{i+1} - \frac{1}{16}F_{i+2}.
\]
Now, assuming the pixel values are independent and each has the same
standard deviation $\sigma_i=\sigma$, the variances of the individual
terms add, and we can work out the standard deviation for the
coefficient array:
\[
\sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2 + \left(\frac{1}{4}\right)^2
+ \left(\frac{5}{8}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2}
= \sigma\sqrt{\frac{134}{256}} = 0.72349\ \sigma.
\]
Thus, the first-scale wavelet coefficient array will have a standard
deviation of 72.3\% of that of the input array. This procedure can be
followed to calculate the necessary values for all scales, dimensions
and filters used by Duchamp.

Calculating these values is, therefore, a critical step in performing
the reconstruction. \citet{starck02:book} did so by simulating data sets
with Gaussian noise, taking the wavelet transform, and measuring the
value of $\sigma$ for each scale. We take a different approach,
calculating the scaling factors directly from the filter coefficients:
we take the wavelet transform of an array that has a 1 in the central
pixel and 0s everywhere else, and the scaling factor for each scale is
then the sum in quadrature of all the wavelet coefficient values at
that scale. We give the scaling factors for the three filters
available to Duchamp on the following page. These values are
hard-coded into Duchamp, so no on-the-fly calculation of them is
necessary.
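The impulse method just described can be sketched in a few lines of
Python. The sketch assumes the usual {\it \`a trous} scheme (at scale
$j$ the filter taps are $2^{j-1}$ pixels apart, and the wavelet array
is the difference between successive smoothed arrays), treats pixels
beyond the array edge as zero, and makes the array wide enough that
the impulse never reaches the edge. It also assumes that the 2-D and
3-D filters are separable products of the 1-D filter. Then the
$D$-dimensional smoothed impulse is the outer product of $D$ copies of
the 1-D one, so the quadrature sum can be computed from 1-D arrays
alone. The function names are hypothetical, and this is not the
\progname\ source.

\begin{verbatim}
import math

# Filter coefficients as listed in this appendix.
B3_SPLINE = [1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16]
TRIANGLE = [1 / 4, 1 / 2, 1 / 4]
HAAR = [0.0, 1 / 2, 1 / 2]


def smooth(data, filt, spacing):
    """One a trous smoothing step: filter taps `spacing` pixels apart.

    Pixels outside the array are taken as zero. (This is a correlation;
    it equals convolution for symmetric filters, and mirroring the
    result does not change the sums used below.)
    """
    half = len(filt) // 2
    out = [0.0] * len(data)
    for i in range(len(data)):
        acc = 0.0
        for k, coeff in enumerate(filt):
            j = i + (k - half) * spacing
            if 0 <= j < len(data):
                acc += coeff * data[j]
        out[i] = acc
    return out


def scaling_factors(filt, num_scales, ndim=1):
    """Noise scaling factor for scales 1..num_scales from a unit impulse.

    With a separable filter, the ndim-D smoothed impulse is an outer
    product of 1-D arrays, so the quadrature sum of the wavelet array
    (before - after) equals
        (a.a)**ndim - 2*(a.b)**ndim + (b.b)**ndim
    where a and b are the 1-D arrays before and after smoothing.
    """
    half = len(filt) // 2
    reach = half * (2 ** num_scales - 1)  # widest spread of the impulse
    before = [0.0] * (2 * reach + 1)
    before[reach] = 1.0  # a 1 in the central pixel, 0 elsewhere
    factors = []
    for scale in range(1, num_scales + 1):
        after = smooth(before, filt, 2 ** (scale - 1))
        aa = sum(x * x for x in before)
        ab = sum(x * y for x, y in zip(before, after))
        bb = sum(y * y for y in after)
        factors.append(math.sqrt(aa ** ndim - 2 * ab ** ndim + bb ** ndim))
        before = after
    return factors


if __name__ == "__main__":
    for name, filt in (("B3", B3_SPLINE), ("Triangle", TRIANGLE),
                       ("Haar", HAAR)):
        for ndim in (1, 2, 3):
            print(name, ndim, scaling_factors(filt, 6, ndim))
\end{verbatim}

For example, {\tt scaling\_factors(B3\_SPLINE, 7, ndim=3)} returns
seven values that can be compared with the three-dimensional
B$_3$-spline column in the tables that follow.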

Memory limitations prevent us from calculating factors for large
scales, particularly for the three-dimensional case (hence the --
symbols in the tables). To calculate factors for higher scales than
those available, we note that the following relationships hold for
large scales to a sufficient level of precision:
\begin{itemize}
\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
\item 3-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
\end{itemize}
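Since $\sqrt{2}$, $2$ and $\sqrt{8}$ are $2^{D/2}$ for $D=1$, 2 and 3
dimensions, these rules can be wrapped in one small helper. The Python
sketch below (the name {\tt extrapolate\_factor} is hypothetical, not
part of \progname) divides the last known factor by $2^{D/2}$ once per
additional scale; like the rules themselves, it is only an
approximation, valid at large scales.

\begin{verbatim}
import math


def extrapolate_factor(factor, scale, target_scale, ndim):
    """Approximate the scaling factor at target_scale from a known one.

    Applies factor(i) = factor(i-1) / 2**(ndim/2) repeatedly, i.e.
    divides by sqrt(2), 2 or sqrt(8) per scale for 1, 2 or 3 dimensions.
    """
    if ndim not in (1, 2, 3):
        raise ValueError("ndim must be 1, 2 or 3")
    if target_scale < scale:
        raise ValueError("target_scale must be >= scale")
    return factor / 2.0 ** (ndim * (target_scale - scale) / 2.0)


if __name__ == "__main__":
    # Extend the 3-D B3-spline column past its last tabulated scale (7).
    print(extrapolate_factor(0.000514791120, 7, 9, 3))
\end{verbatim}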

\newpage
\begin{itemize}
\item {\bf B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$

\begin{tabular}{llll}
Scale & 1-D & 2-D & 3-D\\ \hline
1 & 0.723489806 & 0.890796310 & 0.956543592\\
2 & 0.285450405 & 0.200663851 & 0.120336499\\
3 & 0.177947535 & 0.0855075048 & 0.0349500154\\
4 & 0.122223156 & 0.0412474444 & 0.0118164242\\
5 & 0.0858113122 & 0.0204249666 & 0.00413233507\\
6 & 0.0605703043 & 0.0101897592 & 0.00145703714\\
7 & 0.0428107206 & 0.00509204670 & 0.000514791120\\
8 & 0.0302684024 & 0.00254566946 & --\\
9 & 0.0214024008 & 0.00127279050 & --\\
10 & 0.0151336781 & 0.000636389722 & --\\
11 & 0.0107011079 & 0.000318194170 & --\\
12 & 0.00756682272 & -- & --\\
13 & 0.00535055108 & -- & --\\
%14 & 0.00378341085 & -- & --\\
%15 & 0.00267527545 & -- & --\\
%16 & 0.00189170541 & -- & --\\
%17 & 0.00133763772 & -- & --\\
%18 & 0.000945852704 & -- & --
\end{tabular}

\item {\bf Triangle Function:} $\{1/4,1/2,1/4\}$

\begin{tabular}{llll}
Scale & 1-D & 2-D & 3-D\\ \hline
1 & 0.612372436 & 0.800390530 & 0.895954449 \\
2 & 0.330718914 & 0.272878894 & 0.192033014\\
3 & 0.211947812 & 0.119779282 & 0.0576484078\\
4 & 0.145740298 & 0.0577664785 & 0.0194912393\\
5 & 0.102310944 & 0.0286163283 & 0.00681278387\\
6 & 0.0722128185 & 0.0142747506 & 0.00240175885\\
7 & 0.0510388224 & 0.00713319703 & 0.000848538128 \\
8 & 0.0360857673 & 0.00356607618 & 0.000299949455 \\
9 & 0.0255157615 & 0.00178297280 & -- \\
10 & 0.0180422389 & 0.000891478237 & -- \\
11 & 0.0127577667 & 0.000445738098 & -- \\
12 & 0.00902109930 & 0.000222868922 & -- \\
13 & 0.00637887978 & -- & -- \\
%14 & 0.00451054902 & -- & -- \\
%15 & 0.00318942978 & -- & -- \\
%16 & 0.00225527449 & -- & -- \\
%17 & 0.00159471988 & -- & -- \\
%18 & 0.00112763724 & -- & --

\end{tabular}

\item {\bf Haar Wavelet:} $\{0,1/2,1/2\}$

\begin{tabular}{llll}
Scale & 1-D & 2-D & 3-D\\ \hline
1 & 0.707106781 & 0.866025404 & 0.935414347 \\
2 & 0.500000000 & 0.433012702 & 0.330718914\\
3 & 0.353553391 & 0.216506351 & 0.116926793\\
4 & 0.250000000 & 0.108253175 & 0.0413398642\\
5 & 0.176776695 & 0.0541265877 & 0.0146158492\\
6 & 0.125000000 & 0.0270632939 & 0.00516748303

\end{tabular}

\end{itemize}

\end{document}