Context Navigation

source: trunk/docs/Guide.tex @ 154

Visit:

Last change on this file since 154 was 154, checked in by Matthew Whiting, 18 years ago
Added the verification cube to the verification directory (rather than re-creating it each time) Added text to README and Guide about the verification process.
File size: 84.2 KB

Line
1	\documentclass[12pt,a4paper]{article}
2
3	%%%%%% LINE SPACING %%%%%%%%%%%%
4	\usepackage{setspace}
5	\singlespacing
6	%\onehalfspacing
7	%\doublespacing
8
9	%Define a test for doing PDF format -- use different code below
10	\newif\ifPDF
11	\ifx\pdfoutput\undefined\PDFfalse
12	\else\ifnum\pdfoutput > 0\PDFtrue
13	\else\PDFfalse
14	\fi
15	\fi
16
17	\textwidth=161 mm
18	\textheight=245 mm
19	\topmargin=-15 mm
20	\oddsidemargin=0 mm
21	\parindent=6 mm
22
23	\usepackage[sort]{natbib}
24	\usepackage{lscape}
25	\bibpunct[,]{(}{)}{;}{a}{}{,}
26
27	\newcommand{\eg}{e.g.\ }
28	\newcommand{\ie}{i.e.\ }
29	\newcommand{\hi}{H{\sc i}}
30	\newcommand{\hipass}{{\sc hipass}}
31	\newcommand{\duchamp}{\emph{Duchamp}}
32	\newcommand{\atrous}{\textit{{\`a} trous}}
33	\newcommand{\Atrous}{\textit{{\`A} trous}}
34	\newcommand{\diff}{{\rm d}}
35	\newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil}
36	\newenvironment{entry}
37	{\begin{list}{}%
38	{\renewcommand{\makelabel}{\entrylabel}%
39	\setlength{\labelwidth}{30mm}%
40	\setlength{\labelsep}{5pt}%
41	\setlength{\itemsep}{2pt}%
42	\setlength{\parsep}{2pt}%
43	\setlength{\leftmargin}{35mm}%
44	}%
45	}%
46	{\end{list}}
47
48
49	\title{Source Detection with \duchamp\ v1.0\\A User's Guide}
50	\author{Matthew Whiting\\
51	Australia Telescope National Facility\\CSIRO}
52	\date{}
53
54	% If we are creating a PDF, use different options for graphicx, hyperref.
55	\ifPDF
56	\usepackage[pdftex]{graphicx,color}
57	\usepackage[pdftex]{hyperref}
58	\hypersetup{colorlinks=true,%
59	citecolor=red,%
60	filecolor=red,%
61	linkcolor=red,%
62	urlcolor=red,%
63	}
64	\else
65	\usepackage[dvips]{graphicx}
66	\usepackage[dvips]{hyperref}
67	\fi
68
69	\pagestyle{headings}
70	\begin{document}
71
72	\maketitle
73	\thispagestyle{empty}
74	\begin{figure}[!h]
75	\begin{center}
76	\includegraphics[width=\textwidth]{cover_image}
77	\end{center}
78	\end{figure}
79
80	\newpage
81	\tableofcontents
82
83	\newpage
84	\section{Introduction and getting going quickly}
85
86	This document provides a user's guide to \duchamp, an object-finder
87	for use on spectral-line data cubes. The basic execution of \duchamp\
88	is to read in a FITS data cube, find sources in the cube, and produce
89	a text file of positions, velocities and fluxes of the detections, as
90	well as a postscript file of the spectra of each detection.
91
92	So, you have a FITS cube, and you want to find the sources in it. What
93	do you do? First, you need to get \duchamp: there are instructions in
94	Appendix~\ref{app-install} for obtaining and installing it. Once you
95	have it running, the first step is to make an input file that contains
96	the list of parameters. Brief and detailed examples are shown in
97	Appendix~\ref{app-input}. This file provides the input file name, the
98	various output files, and defines various parameters that control the
99	execution.
100
101	The standard way to run \duchamp\ is by the command
102	\begin{quote}
103	\texttt{Duchamp -p [parameter file]}
104	\end{quote}
105	replacing \texttt{[parameter file]} with the name of the file listing
106	the parameters.
107
108	An even easier way is to use the default values for all parameters
109	(these are given in Appendix~\ref{app-param} and in the file
110	InputComplete included in the distribution directory) and use the
111	syntax
112	\begin{quote}
113	\texttt{Duchamp -f [FITS file]}
114	\end{quote}
115	where \texttt{[FITS file]} is the file you wish to search.
116
117	In either case, the program will then work away and give you the list
118	of detections and their spectra. The program execution is summarised
119	below, and detailed in \S\ref{sec-flow}. Information on inputs is in
120	\S\ref{sec-param} and Appendix~\ref{app-param}, and descriptions of
121	the output is in \S\ref{sec-output}.
122
123	\subsection{A summary of the execution steps}
124
125	The basic flow of the program is summarised here -- all steps are
126	discussed in more detail in the following sections.
127	\begin{enumerate}
128	\item If the \texttt{-p} option is used, the parameter file given on
129	the command line is read in, and the parameters absorbed.
130	\item The FITS image is located and read in to memory.
131	\item If requested, a FITS image with a previously reconstructed array
132	is read in.
133	\item If requested, BLANK pixels are trimmed from the edges, and
134	the baseline of each spectrum is removed.
135	\item If the reconstruction method is requested, and the reconstructed
136	array has not been read in at Step 3 above, the cube is
137	reconstructed using the \atrous\ wavelet method.
138	\item Searching for objects then takes place, using the requested
139	thresholding method.
140	\item The list of objects is condensed by merging neighbouring objects
141	and removing those deemed unacceptable.
142	\item The baselines and trimmed pixels are replaced prior to output.
143	\item The details of the detections are written to screen and to the
144	requested output file.
145	\item Maps showing the spatial location of the detections are written.
146	\item The integrated spectra of each detection are written to a
147	postscript file.
148	\item If requested, the reconstructed array can be written to a new
149	FITS file.
150	\end{enumerate}
151
152	\subsection{Guide to terminology and conventions}
153
154	First, a brief note on the use of terminology in this guide. \duchamp\
155	is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is
156	the Flexible Image Transport System -- see \citet{hanisch01} or
157	websites such as
158	\href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html}
159	for details.} image arrays with three dimensions -- they are assumed
160	to have the following form: the first two dimensions (referred to as
161	$x$ and $y$) are spatial directions (that is, relating to the position
162	on the sky), while the third dimension, $z$, is the spectral
163	direction, which can correspond to frequency, wavelength, or
164	velocity. The three dimensional analogue of pixels are ``voxels'', or
165	volume cells -- a voxel is defined by a unique $(x,y,z)$ location and
166	has a unique flux or intensity value associated with it.
167
168	Note that it is possible for the FITS file to have more than three
169	dimensions (for instance, a fourth dimension representing Stokes
170	parameters). \duchamp\ assumes that these extra dimensions occur
171	\emph{after} the spatial and spectral ones -- the data will not be
172	read correctly if the order is different! Herein, we discuss the data
173	in terms of the three basic dimensions, but you should be aware it is
174	possible to have more than three.
175
176	Each spatial pixel (a given $(x,y)$ coordinate) can be said to be a
177	single spectrum, while a slice through the cube perpendicular to the
178	spectral direction at a given $z$-value is a single channel (the 2-D
179	image is a channel map).
180
181	Detection involves locating a contiguous group of voxels with fluxes
182	above a certain threshold. \duchamp\ makes no assumptions as to the
183	size or shape of the detected features, other than having
184	user-selected minimum size criteria.
185
186	Features that are detected are assumed to be positive. The user can
187	choose to search for negative features by setting an input parameter
188	-- this inverts the cube prior to the search (see
189	\S\ref{sec-detection} for details).
190
191	Note that it is possible to run \duchamp\ on a two-dimensional image
192	(\ie one with no frequency or velocity information), or indeed a
193	one-dimensional array, and many of the features of the program will
194	work fine. The focus, however, is on object detection in three
195	dimensions.
196
197	\subsection{Why \duchamp?}
198
199	Well, it's important for a program to have a name, and the initial
200	working title of \emph{cubefind} was somewhat uninspiring. I wanted to
201	avoid the classic astronomical approach of designing a cute acronym,
202	and since it is designed to work on cubes, I looked at naming it after
203	a cubist. \emph{Picasso}, sadly, was already taken \citep{minchin99},
204	so I settled on naming it after Marcel Duchamp, another cubist, but
205	also one of the first artists to work with ``found objects''.
206
207	\section{User Inputs}
208	\label{sec-param}
209
210	Input to the program is provided by means of a parameter
211	file. Parameters are listed in the file, followed by the value that
212	should be assigned to them. The syntax used is \texttt{paramName
213	value}. Parameter names are not case-sensitive, and lines in the input
214	file that start with \texttt{\#} are ignored. If a parameter is listed
215	more than once, the latter value is used, but otherwise the order in
216	which the parameters are listed in the input file is
217	arbitrary. Example input files can be seen in
218	Appendix~\ref{app-input}.
219
220	If a parameter is not listed, the default value is assumed. The
221	defaults are chosen to provide a good result (using the reconstruction
222	method), so the user doesn't need to specify many new parameters in
223	the input file. Note that the image file \textbf{must} be specified!
224	The parameters that can be set are listed in Appendix~\ref{app-param},
225	with their default values in parentheses.
226
227	The parameters with names starting with \texttt{flag} are stored as
228	\texttt{bool} variables, and so are either \texttt{true = 1} or
229	\texttt{false = 0}. They can be entered in the file either in text or
230	integer format -- \duchamp\ will read them correctly in either case.
231
232	\section{What \duchamp\ is doing}
233	\label{sec-flow}
234
235	The execution flow of \duchamp\ is detailed here, indicating the main
236	algorithmic steps that are used. The program is written in C/C++ and
237	makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot}
238	libraries.
239
240	\subsection{Image input}
241	\label{sec-input}
242
243	The cube is read in using basic {\sc cfitsio} commands, and stored as
244	an array in a special C++ class. This class keeps track of the list of
245	detected objects, as well as any reconstructed arrays that are made
246	(see \S\ref{sec-recon}). The World Coordinate System (WCS) information
247	for the cube is also obtained from the FITS header by {\sc wcslib}
248	functions \citep{greisen02, calabretta02}, and this information, in
249	the form of a \texttt{wcsprm} structure, is also stored in the same
250	class.
251
252	A sub-section of an image can be requested via the \texttt{subsection}
253	parameter in the parameter file -- this can be a good idea if the cube
254	has very noisy edges, which may produce many spurious detections. The
255	generalised form of the subsection that is used by {\sc cfitsio} is
256	\texttt{[x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run
257	from \texttt{x1} to \texttt{x2} (inclusive), with steps of
258	\texttt{dx}. The step value can be omitted (so a subsection of the
259	form \texttt{[2:50,2:50,10:1000]} is still valid). \duchamp\ does not
260	make use of any step value present in the subsection string, and any
261	that are present are removed before the file is opened.
262
263	If one wants the full range of a coordinate then replace the range
264	with an asterisk, \eg \texttt{[2:50,2:50,*]}. If one wants to use a
265	subsection, one must set \texttt{flagSubsection = 1}. A complete
266	description of the section syntax can be found at the {\sc fitsio} web
267	site
268	\footnote{
269	\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}%
270	{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}.
271
272	\subsection{Image modification}
273	\label{sec-modify}
274
275	Several modifications to the cube can be made that improve the
276	execution and efficiency of \duchamp\ (these are optional -- their
277	use is indicated by the relevant flags set in the input parameter
278	file).
279
280	\subsubsection{BLANK pixel removal}
281
282	If the imaged area of a cube is non-rectangular, BLANK pixels are used
283	to pad it out to a rectangular shape. The value of these pixels is
284	given by the FITS header keywords BLANK, BSCALE and BZERO. While these
285	pixels make the image a nice shape, they will unnecessarily interfere
286	with the processing (as well as taking up needless memory). The first
287	step, then, is to trim them from the edge. This is done when the
288	parameter \texttt{flagBlankPix=true}. If the above keywords are not
289	present, the user can specify the BLANK value by the parameter
290	\texttt{blankPixValue}.
291
292	Removing BLANK pixels is particularly important for the reconstruction
293	step, as lots of BLANK pixels on the edges will smooth out features in
294	the wavelet calculation stage. The trimming will also reduce the size
295	of the cube's array, speeding up the execution. The amount of trimming
296	is recorded, and these pixels are added back in once the
297	source-detection is completed (so that quoted pixel positions are
298	applicable to the original cube).
299
300	Rows and columns are trimmed one at a time until the first non-BLANK
301	pixel is reached, so that the image remains rectangular. In practice,
302	this means that there will be BLANK pixels left in the trimmed image
303	(if the non-BLANK region is non-rectangular). However, these are
304	ignored in all further calculations done on the cube.
305
306	\subsubsection{Baseline removal}
307
308	Second, the user may request the removal of baselines from the
309	spectra, via the parameter \texttt{flagBaseline}. This may be
310	necessary if there is a strong baseline ripple present, which can
311	result in spurious detections at the high points of the ripple. The
312	baseline is calculated from a wavelet reconstruction procedure (see
313	\S\ref{sec-recon}) that keeps only the two largest scales. This is
314	done separately for each spatial pixel (\ie for each spectrum in the
315	cube), and the baselines are stored and added back in before any
316	output is done. In this way the quoted fluxes and displayed spectra
317	are as one would see from the input cube itself -- even though the
318	detection (and reconstruction if applicable) is done on the
319	baseline-removed cube.
320
321	The presence of very strong signals (for instance, masers at several
322	hundred Jy) can affect the determination of the baseline, leading to a
323	large dip centred on the signal in the baseline-subtracted
324	spectrum. To prevent this, the signal is trimmed prior to the
325	reconstruction process at some standard threshold (at $8\sigma$ above
326	the mean). The baseline determined should thus be representative of
327	the true, signal-free baseline. Note that this trimming is only a
328	temporary measure which does not affect the source-detection.
329
330	\subsubsection{Ignoring bright Milky Way emission}
331
332	Finally, a single set of contiguous channels can be ignored -- these
333	may exhibit very strong emission, such as that from the Milky Way as
334	seen in extragalactic \hi\ cubes (hence the references to ``Milky
335	Way'' in relation to this task -- apologies to Galactic
336	astronomers!). Such dominant channels will produce many detections
337	that are unnecessary, uninteresting (if one is interested in
338	extragalactic \hi) and large (in size and hence in memory usage), and
339	so will slow the program down and detract from the interesting
340	detections.
341
342	The use of this feature is controlled by the \texttt{flagMW}
343	parameter, and the exact channels concerned are able to be set by the
344	user (using \texttt{maxMW} and \texttt{minMW} -- these give an
345	inclusive range of channels). When employed, these channels are
346	ignored for the searching, and the scaling of the spectral output (see
347	Fig.~\ref{fig-spect}) will not take them into account. They will be
348	present in the reconstructed array, however, and so will be included
349	in the saved FITS file (see \S\ref{sec-reconIO}). When the final
350	spectra are plotted, the range of channels covered by these parameters
351	is indicated by a green hashed box.
352
353	\subsection{Image reconstruction}
354	\label{sec-recon}
355
356	The user can direct \duchamp\ to reconstruct the data cube using the
357	\atrous\ wavelet procedure. A good description of the procedure can be
358	found in \citet{starck02:book}. The reconstruction is an effective way
359	of removing a lot of the noise in the image, allowing one to search
360	reliably to fainter levels, and reducing the number of spurious
361	detections. This is an optional step, but one that greatly enhances
362	the source-detection process, with the payoff that it can be
363	relatively time- and memory-intensive.
364
365	\subsubsection{Algorithm}
366
367	The steps in the \atrous\ reconstruction are as follows:
368	\begin{enumerate}
369	\item Set the reconstructed array to 0 everywhere.
370	\item The input array is discretely convolved with a given filter
371	function. This is determined from the parameter file via the
372	\texttt{filterCode} parameter -- see Appendix~\ref{app-param} for
373	details on the filters available.
374	\item The wavelet coefficients are calculated by taking the difference
375	between the convolved array and the input array.
376	\item If the wavelet coefficients at a given point are above the
377	requested threshold (given by \texttt{snrRecon} as the number of
378	$\sigma$ above the mean and adjusted to the current scale -- see
379	Appendix~\ref{app-scaling}), add these to the reconstructed array.
380	\item The separation of the filter coefficients is doubled. (Note that
381	this step provides the name of the procedure\footnote{\atrous\ means
382	``with holes'' in French.}, as gaps or holes are created in the
383	filter coverage.)
384	\item The procedure is repeated from step 2, using the convolved array
385	as the input array.
386	\item Continue until the required maximum number of scales is reached.
387	\item Add the final smoothed (\ie convolved) array to the
388	reconstructed array. This provides the ``DC offset'', as each of the
389	wavelet coefficient arrays will have zero mean.
390	\end{enumerate}
391
392	The reconstruction has at least two iterations. The first iteration
393	makes a first pass at the wavelet reconstruction (the process outlined
394	in the 8 stages above), but the residual array will inevitably have
395	some structure still in it, so the wavelet filtering is done on the
396	residual, and any significant wavelet terms are added to the final
397	reconstruction. This step is repeated until the change in the $\sigma$
398	of the background is less than some fiducial amount.
399
400	It is important to note that the \atrous\ decomposition is an example
401	of a ``redundant'' transformation. If no thresholding is performed,
402	the sum of all the wavelet coefficient arrays and the final smoothed
403	array is identical to the input array. The thresholding thus removes
404	only the unwanted structure in the array.
405
406	Note that any BLANK pixels that are still in the cube will not be
407	altered by the reconstruction -- they will be left as BLANK so that
408	the shape of the valid part of the cube is preserved.
409
410	\subsubsection{Note on Statistics}
411
412	The correct calculation of the reconstructed array needs good
413	estimation of the underlying mean and standard deviation of the
414	background noise distribution. These statistics are estimated using
415	robust methods, to avoid corruption by strong outlying points. The
416	mean of the distribution is actually estimated by the median, while
417	the median absolute deviation from the median (MADFM) is calculated
418	and corrected assuming Gaussianity to estimate the underlying standard
419	deviation $\sigma$. The Gaussianity (or Normality) assumption is
420	critical, as the MADFM does not give the same value as the usual rms
421	or standard deviation value -- for a normal distribution
422	$N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. The difference
423	between the MADFM and $\sigma$ is corrected for, so the user need only
424	think in the usual multiples of $\sigma$ when setting
425	\texttt{snrRecon}. See Appendix~\ref{app-madfm} for a derivation of
426	this value.
427
428	When thresholding the different wavelet scales, the value of $\sigma$
429	as measured from the wavelet array needs to be scaled to account for
430	the increased amount of correlation between neighbouring pixels (due
431	to the convolution). See Appendix~\ref{app-scaling} for details on
432	this scaling.
433
434	\subsubsection{User control of reconstruction parameters}
435
436	The most important parameter for the user to select in relation to the
437	reconstruction is the threshold for each wavelet array. This is set
438	using the \texttt{snrRecon} parameter, and is given as a multiple of
439	the rms (estimated by the MADFM) above the mean (which for the wavelet
440	arrays should be approximately zero). There are several other
441	parameters that can be altered as well that affect the outcome of the
442	reconstruction.
443
444	By default, the cube is reconstructed in three dimensions, using a
445	3-dimensional filter and 3-dimensional convolution. This can be
446	altered, however, using the parameter \texttt{reconDim}. If set to 1,
447	this means the cube is reconstructed by considering each spectrum
448	separately, whereas \texttt{reconDim=2} will mean the cube is
449	reconstructed by doing each channel map separately. The merits of
450	these choices are discussed in \S\ref{sec-notes}, but it should be
451	noted that a 2-dimensional reconstruction can be susceptible to edge
452	effects if the spatial shape is not rectangular.
453
454	The user can also select the minimum scale to be used in the
455	reconstruction -- the first scale exhibits the highest frequency
456	variations, and so ignoring this one can sometimes be beneficial in
457	removing excess noise. The default, however, is to use all scales
458	(\texttt{minscale = 1}).
459
460	Finally, the filter that is used for the convolution can be selected
461	by using \texttt{filterCode} and the relevant code number -- the
462	choices are listed in Appendix~\ref{app-param}. A larger filter will
463	give a better reconstruction, but take longer and use more memory when
464	executing. When multi-dimensional reconstruction is selected, this
465	filter is used to construct a 2- or 3-dimensional equivalent.
466
467	\subsection{Input/Output of reconstructed arrays}
468	\label{sec-reconIO}
469
470	The reconstruction stage can be relatively time-consuming,
471	particularly for large cubes and reconstructions in 3-D. To get around
472	this, \duchamp\ provides a shortcut to allow users to perform multiple
473	searches (\eg with different thresholds) on the same reconstruction
474	without calculating the reconstruction each time.
475
476	The first step is to choose to save the reconstructed array as a FITS
477	file by setting \texttt{flagOutputRecon = true}. The file will be
478	saved in the same directory as the input image, so the user needs to
479	have write permissions for that directory.
480
481	The filename will be derived from the input filename, with extra
482	information detailing the reconstruction that has been done. For
483	example, suppose \texttt{image.fits} has been reconstructed using a
484	3-dimensional reconstruction with filter 2, thresholded at $4\sigma$
485	using all scales. The output filename will then be
486	\texttt{image.RECON-3-2-4-1.fits} (\ie it uses the four parameters
487	relevant for the \atrous\ reconstruction as listed in
488	Appendix~\ref{app-param}). The new FITS file will also have these
489	parameters as header keywords. If a subsection of the input image has
490	been used (see \S\ref{sec-input}), the format of the output filename
491	will be \texttt{image.sub.RECON-3-2-4-1.fits}, and the subsection that
492	has been used is also stored in the FITS header.
493
494	Likewise, the residual image, defined as the difference between the
495	input and reconstructed arrays, can also be saved in the same manner
496	by setting \texttt{flagOutputResid = true}. Its filename will be the
497	same as above, with RESID replacing RECON.
498
499	If a reconstructed image has been saved, it can be read in and used
500	instead of redoing the reconstruction. To do so, the user should set
501	\texttt{flagReconExists = true}. The user can indicate the name of the
502	reconstructed FITS file using the \texttt{reconFile} parameter, or, if
503	this is not specified, \duchamp\ searches for the file with the name
504	as defined above. If the file is not found, the reconstruction is
505	performed as normal. Note that to do this, the user needs to set
506	\texttt{flagAtrous = true} (obviously, if this is \texttt{false}, the
507	reconstruction is not needed).
508
509	\subsection{Searching the image}
510	\label{sec-detection}
511
512	The image is searched for detections in two ways: spectrally (a
513	1-dimensional search in the spectrum in each spatial pixel), and
514	spatially (a 2-dimensional search in the spatial image in each
515	channel). In both cases, the algorithm finds connected pixels that are
516	above the user-specified threshold. In the case of the spatial image
517	search, the algorithm of \citet{lutz80} is used to raster scan through
518	the image and connect groups of pixels on neighbouring rows.
519
520	Note that this algorithm cannot be applied directly to a 3-dimensional
521	case, as it requires that objects are completely nested in a row: that
522	is, if you are scanning along a row, and one object finishes and
523	another starts, you know that you will not get back to the first one
524	(if at all) until the second is completely finished for that
525	row. Three-dimensional data does not have this property, which is why
526	we break up the searching into 1- and 2-dimensional cases.
527
528	The determination of the threshold is done in one of two ways. The
529	first way is a simple sigma-clipping, where a threshold is set at a
530	fixed number $n$ of standard deviations above the mean, and pixels
531	above this threshold are flagged as detected. The value of $n$ is set
532	with the parameter \texttt{snrCut}. As before, the value of the
533	standard deviation is estimated by the MADFM, and corrected by the
534	ratio derived in Appendix~\ref{app-madfm}.
535
536	The second method uses the False Discovery Rate (FDR) technique
537	\citep{miller01,hopkins02}, whose basis we briefly detail here. The
538	false discovery rate (given by the number of false detections divided
539	by the total number of detections) is fixed at a certain value
540	$\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false
541	positives). In practice, an $\alpha$ value is chosen, and the ensemble
542	average FDR (\ie $\langle FDR \rangle$) when the method is used will
543	be less than $\alpha$. One calculates $p$ -- the probability,
544	assuming the null hypothesis is true, of obtaining a test statistic as
545	extreme as the pixel value (the observed test statistic) -- for each
546	pixel, and sorts them in increasing order. One then calculates $d$
547	where
548	\[
549	d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
550	\]
551	and then rejects all hypotheses whose $p$-values are less than or
552	equal to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq
553	j\alpha/c_N N$.) Note that ``reject hypothesis'' here means ``accept
554	the pixel as an object pixel'' (\ie we are rejecting the null
555	hypothesis that the pixel belongs to the background).
556
557	The $c_N$ values here are normalisation constants that depend on the
558	correlated nature of the pixel values. If all the pixels are
559	uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their
560	tests will be dependent on each other, and so $c_N = \sum_{i=1}^N
561	i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels
562	are correlated over the beam. In this case the sum is made over the
563	$N$ pixels that make up the beam. The value of $N$ is calculated from
564	the FITS header (if the correct keywords -- BMAJ, BMIN -- are not
565	present, a default value of 10 pixels is assumed).
566
567	The theory behind the FDR method implies a direct connection between
568	the choice of $\alpha$ and the fraction of detections that will be
569	false positives. However, due to the merging process, this direct
570	connection is lost when looking at the final number of detections --
571	see discussion in \S\ref{sec-notes}. The effect is that the number of
572	false detections will be less than indicated by the $\alpha$ value
573	used.
574
575	If a reconstruction has been made, the residuals (defined in the sense
576	of original $-$ reconstruction) are used to estimate the noise
577	parameters of the cube. Otherwise they are estimated directly from the
578	cube itself. In both cases, robust estimators are used as described
579	above.
580
581	Detections must have a minimum number of pixels to be counted. This
582	minimum number is given by the input parameters \texttt{minPix} (for
583	2-dimensional searches) and \texttt{minChannels} (for 1-dimensional
584	searches).
585
586	The search only looks for positive features. If one is interested
587	instead in negative features (such as absorption lines), set the
588	parameter \texttt{flagNegative = true}. This will invert the cube (\ie
589	multiply all pixels by $-1$) prior to the search, and then re-invert
590	the cube (and the fluxes of any detections) after searching is
591	complete. All outputs are done in the same manner as normal, so that
592	fluxes of detections will be negative.
593
594	\subsection{Merging detected objects}
595	\label{sec-merger}
596
597	The searching step produces a list of detected objects that will have
598	many repeated detections of a given object -- for instance, spectral
599	detections in adjacent pixels of the same object and/or spatial
600	detections in neighbouring channels. These are then combined in an
601	algorithm that matches all objects judged to be ``close''. This
602	determination is made in one of two ways.
603
604	One way is to define two thresholds -- one spatial and one in velocity
605	-- and say that two objects should be merged if there is at least one
606	pair of pixels that lie within these threshold distances of each
607	other. These thresholds are specified by the parameters
608	\texttt{threshSpatial} and \texttt{threshVelocity} (in units of pixels
609	and channels respectively).
610
611	Alternatively, the spatial requirement can be changed to say that
612	there must be a pair of pixels that are \emph{adjacent} -- a stricter,
613	but perhaps more realistic requirement, particularly when the spatial
614	pixels have a large angular size (as is the case for \hi\
615	surveys). This method can be selected by setting the parameter
616	\texttt{flagAdjacent} to 1 (\ie \texttt{true}) in the parameter
617	file. The velocity thresholding is done in the same way as the first
618	option.
619
620	Once the detections have been merged, they may be ``grown''. This is a
621	process of increasing the size of the detection by adding adjacent
622	pixels that are above some secondary threshold. This threshold is
623	lower than the one used for the initial detection, but above the noise
624	level, so that faint pixels are only detected when they are close to a
625	bright pixel. The value of this threshold is a possible input
626	parameter (\texttt{growthCut}), with a default value of
627	$1.5\sigma$. The use of the growth algorithm is controlled by the
628	\texttt{flagGrowth} parameter -- the default value of which is
629	\texttt{false}. If the detections are grown, they are sent through the
630	merging algorithm a second time, to pick up any detections that now
631	overlap or have grown over each other.
632
633	Finally, to be accepted, the detections must span \emph{both} a
634	minimum number of channels (to remove any spurious single-channel
635	spikes that may be present), and a minimum number of spatial
636	pixels. These numbers, as for the original detection step, are set
637	with the \texttt{minChannels} and \texttt{minPix} parameters. The
638	channel requirement means there must be at least one set of
639	\texttt{minChannels} consecutive channels in the source for it to be
640	accepted.
641
642	\section{Outputs}
643	\label{sec-output}
644
645	\subsection{During execution}
646
647	\duchamp\ provides the user with feedback whilst it is running, to
648	keep the user informed on the progress of the analysis. Most of this
649	consists of self-explanatory messages about the particular stage the
650	program is up to. The relevant parameters are printed to the screen at
651	the start (once the file has been successfully read in), so the user
652	is able to make a quick check that the setup is correct (see
653	Appendix~{app-input} for an example).
654
655	If the cube is being trimmed (\S\ref{sec-modify}), the resulting
656	dimensions are printed to indicate how much has been trimmed. If a
657	reconstruction is being done, a continually updating message shows
658	either the current iteration and scale, compared to the maximum scale
659	(when \texttt{reconDim=3}), or a progress bar showing the amount of
660	the cube that has been reconstructed (for smaller values of
661	\texttt{reconDim}).
662
663	During the searching algorithms, the progress through the 1D and 2D
664	searches are shown. When the searches have completed, the number of
665	objects found in both the 1D and 2D searches are reported (see
666	\S\ref{sec-detection} for details).
667
668	In the merging process (where multiple detections of the same object
669	are combined -- see \S\ref{sec-merger}), two stages of output
670	occur. The first is when each object in the list is compared with all
671	others. The output shows two numbers: the first being how far through
672	the list the current object is, and the second being the length of the
673	list. As the algorithm proceeds, the first number should increase and
674	the second should decrease (as objects are combined). When the numbers
675	meet (\ie the whole list has been compared), the second phase begins,
676	in which multiply-appearing pixels in each object are removed, as are
677	objects not meeting the minimum channels requirement. During this
678	phase, the total number of accepted objects is shown, which should
679	steadily increase until all have been accepted or rejected. Note that
680	these steps can be very quick for small numbers of detections.
681
682	Since this continual printing to screen has some overhead of time and
683	CPU involved, the user can elect to not print this information by
684	setting the parameter \texttt{verbose = 0}. In this case, the user is
685	still informed as to the steps being undertaken, but the details of
686	the progress are not shown.
687
688	\subsection{Results}
689
690	\subsubsection{Table of results}
691
692	Finally, we get to the results -- the reason for running \duchamp\ in
693	the first place. Once the detection list is finalised, it is sorted by
694	the mean velocity of the detections (or, if there is no good WCS
695	associated with the cube, by the mean Z-pixel position). The results
696	are then printed to the screen and to the output file, given by the
697	\texttt{OutFile} parameter. The results list, an example of which can
698	be seen in Appendix~\ref{app-output}, contains the following columns
699	(note that the title of the columns depending on WCS information will
700	depend on the projection of the WCS):
701
702	\begin{entry}
703	\item[Obj\#] The ID number of the detection (simply the sequential
704	count for the list, which is ordered by increasing velocity, or
705	channel number, if the WCS is not good enough to find the velocity).
706	\item[Name] The ``IAU''-format name of the detection (derived from the
707	WCS position -- see below for a description of the format).
708	\item[X] The average X-pixel position (averaged over all detected
709	voxels).
710	\item[Y] The average Y-pixel position.
711	\item[Z] The average Z-pixel position.
712	\item[RA/GLON] The Right Ascension or Galactic Longitude of the centre
713	of the object.
714	\item[DEC/GLAT] The Declination or Galactic Latitude of the centre of
715	the object.
716	\item[VEL] The mean velocity of the object [units given by the
717	\texttt{spectralUnits} parameter].
718	\item[w\_RA/w\_GLON] The width of the object in Right Ascension or
719	Galactic Longitude [arcmin].
720	\item[w\_DEC/w\_GLAT] The width of the object in Declination Galactic
721	Latitude [arcmin].
722	\item[w\_VEL] The full velocity width of the detection (max channel
723	$-$ min channel, in velocity units [see note below]).
724	\item[F\_int] The integrated flux over the object, in the units of
725	flux times velocity, corrected for the beam if necessary.
726	\item[F\_peak] The peak flux over the object, in the units of flux.
727	\item[X1, X2] The minimum and maximum X-pixel coordinates.
728	\item[Y1, Y2] The minimum and maximum Y-pixel coordinates.
729	\item[Z1, Z2] The minimum and maximum Z-pixel coordinates.
730	\item[Npix] The number of voxels (\ie distinct $(x,y,z)$ coordinates)
731	in the detection.
732	\item[Flag] Whether the detection has any warning flags (see below).
733	\end{entry}
734	The Name is derived from the WCS position. For instance, a source
735	centred on the RA,Dec position 12$^h$53$^m$45$^s$,
736	-36$^\circ$24$'$12$''$ will be called J125345$-$362412 (if the epoch
737	is J2000) or B125345$-$362412 (if B1950). An alternative form is used
738	for Galactic coordinates: a source centred on the position ($l$,$b$) =
739	(323.1245, 5.4567) will be called G323.124$+$05.457. If the WCS is not
740	valid (\ie is not present or does not have all the necessary
741	information), the Name, RA, DEC, VEL and related columns are not
742	printed, but the pixel coordinates are still provided.
743
744	The velocity units can be specified by the user, using the parameter
745	\texttt{spectralUnits} (enter it as a single string). The default
746	value is km/s, which should be suitable for most users. These units
747	are also used to give the units of integrated flux. Note that if there
748	is no rest frequency specified in the FITS header, the \duchamp\
749	output will instead default to using Frequency, with units of MHz.
750
751	If the WCS is not specified sufficiently to be used, then all columns
752	from RA/GLON to w\_VEL will be left blank. Also, F\_int will be
753	replaced with the more simple F\_tot -- the total flux in the
754	detection, being the sum of all detected voxels.
755
756	The last column contains any warning flags about the detection. There
757	are currently two options here. An `E' is printed if the detection is
758	next to the edge of the image, meaning either the limit of the pixels,
759	or the limit of the non-BLANK pixel region. An `N' is printed if the
760	total flux, summed over all the (non-BLANK) pixels in the smallest box
761	that completely encloses the detection, is negative. Note that this
762	sum is likely to include non-detected pixels. It is of use in pointing
763	out detections that lie next to strongly negative pixels, such as
764	might arise due to interference -- the detected pixels might then also
765	be due to the interference, so caution is advised.
766
767	\subsubsection{Other results lists}
768
769	Two alternative results files can also be requested. One option is a
770	VOTable-format XML file, containing just the RA, Dec, Velocity and the
771	corresponding widths of the detections, as well as the fluxes. The
772	user should set \texttt{flagVOT = 1}, and put the desired filename in
773	the parameter \texttt{votFile} -- note that the default is for it not
774	to be produced. This file should be compatible with all Virtual
775	Observatory tools (such as Aladin\footnote{ Aladin can be found on the
776	web at
777	\href{http://aladin.u-strasbg.fr/}{http://aladin.u-strasbg.fr/}}). The
778	second option is an annotation file for use with the Karma toolkit of
779	visualisation tools (in particular, with \texttt{kvis}). This will
780	draw a circle at the position of each detection, and number it
781	according to the Obj\# given above. To make use of this option, the
782	user should set \texttt{flagKarma = 1}, and put the desired filename
783	in the parameter \texttt{karmaFile} -- again, the default is for it
784	not to be produced.
785
786	As the program is running, it also (optionally) records the detections
787	made in each individual spectrum or channel (see \S\ref{sec-detection}
788	for details on this process). This is recorded in the file given by
789	the parameter \texttt{LogFile}. This file does not include the columns
790	\texttt{Name, RA, DEC, w\_RA, w\_DEC, VEL, w\_VEL}. This file is
791	designed primarily for diagnostic purposes: \eg to see if a given set
792	of pixels is detected in, say, one channel image, but does not survive
793	the merging process. The list of pixels (and their fluxes) in the
794	final detection list are also printed to this file, again for
795	diagnostic purposes. The file also records the execution time, as well
796	as the command-line statement used to run \duchamp. The creation of
797	this log file can be prevented by setting \texttt{flagLog =
798	false}. (This may be a good idea if you are not interested in its
799	contents, as it can be a large file if many pixels are being
800	detected.)
801
802	\subsubsection{Graphical output -- spectra}
803
804	As well as the output data file, a postscript file is created that
805	shows the spectrum for each detection, together with a small cutout
806	image (the 0th moment) and basic information about the detection (note
807	that any flags are printed after the name of the detection, in the
808	format \texttt{[E]}). If the cube was reconstructed, the spectrum from
809	the reconstruction is shown in red, over the top of the original
810	spectrum. The spectral extent of the detected object is indicated by
811	two dashed blue lines, and the region covered by the ``Milky Way''
812	channels is shown by a green hashed box. An example detection can be
813	seen below in Fig.~\ref{fig-spect}.
814
815	The spectrum that is plotted is governed by the
816	\texttt{spectralMethod} parameter. It can be either \texttt{peak},
817	where the spectrum is from the spatial pixel containing the
818	detection's peak flux; or \texttt{sum}, where the spectrum is summed
819	over all spatial pixels, and then corrected for the beam size. The
820	spectral extent of the detection is indicated with blue lines, and a
821	zoom is shown in a separate window.
822
823	The cutout image can optionally include a border around the spatial
824	pixels that are in the detection (turned on and off by the parameter
825	\texttt{drawBorders} -- the default is \texttt{true}). It includes a
826	scale bar in the bottom left corner to indicate size -- its length is
827	indicated next to it (the choice of length depends on the size of the
828	image).
829
830	There may also be one or two extra lines on the image. A yellow line
831	shows the limits of the cube -- the detected object is obviously close
832	to the edge of the cube, and the box size extends outside the region
833	covered by the data. A purple line, however, shows the dividing line
834	between BLANK and non-BLANK pixels. The BLANK pixels will always be
835	shown in black. The first type of line is always drawn, while the
836	second is governed by the parameter \texttt{drawBlankEdges} (whose
837	default is \texttt{true}), and obviously whether there are any BLANK
838	pixel present.
839
840	\begin{figure}[t]
841	\begin{center}
842	\includegraphics[width=\textwidth]{example_spectrum}
843	\end{center}
844	\caption{\footnotesize An example of the spectrum output. Note several
845	of the features discussed in the text: the red lines indicating the
846	reconstructed spectrum; the blue dashed lines indicating the
847	spectral extent of the detection; the green hashed area indicating
848	the Milky Way channels that are ignored by the searching algorithm;
849	the blue border showing its spatial extent on the 0th moment map;
850	and the 15~arcmin-long scale bar.}
851	\label{fig-spect}
852	\end{figure}
853
854	\subsubsection{Graphical output -- maps}
855
856	\begin{figure}[!t]
857	\begin{center}
858	\includegraphics[width=\textwidth]{example_moment_map}
859	\end{center}
860	\caption{\footnotesize An example of the moment map created by
861	\duchamp. The full extent of the cube is covered, and the 0th moment
862	of each object is shown (integrated individually over all the
863	detected channels). The purple line indicates the limit of the
864	non-BLANK pixels.}
865	\label{fig-moment}
866	\end{figure}
867
868	Finally, a couple of images are optionally produced: a 0th moment map
869	of the cube, combining just the detected channels in each object,
870	showing the integrated flux in grey-scale; and a ``detection image'',
871	a grey-scale image where the pixel values are the number of channels
872	that spatial pixel is detected in. In both cases, if
873	\texttt{drawBorders = true}, a border is drawn around the spatial
874	extent of each detection, and if \texttt{drawBlankEdges = true}, the
875	purple line dividing BLANK and non-BLANK pixels (as described above)
876	is drawn. An example moment map is shown in Fig.~\ref{fig-moment}.
877	The production or otherwise of these images is governed by the
878	\texttt{flagMaps} parameter.
879
880	The purpose of these images are to provide a visual guide to where the
881	detections have been made, and, particularly in the case of the moment
882	map, to provide an indication of the strength of the source. In both
883	cases, the detections are numbered (in the same sense as the output
884	list and as the spectral plots), and the spatial borders are marked
885	out as for the cutout images in the spectra file. Both these images
886	are saved as postscript files (given by the parameters
887	\texttt{momentMap} and \texttt{detectionMap} respectively), with the
888	latter also displayed in a {\sc pgplot} window (regardless of the
889	state of \texttt{flagMaps}).
890
891	\section{Notes and hints on the use of \duchamp}
892	\label{sec-notes}
893
894	In using \duchamp, the user has to make a number of decisions about
895	the way the program runs. This section is designed to give the user
896	some idea about what to choose.
897
898	The main choice is whether or not to use the wavelet
899	reconstruction. The main benefits of this are the marked reduction in
900	the noise level, leading to regularly-shaped detections, and good
901	reliability for faint sources. The main drawback with its use is the
902	long execution time: to reconstruct a $170\times160\times1024$
903	(\hipass) cube often requires three iterations and takes about 20-25
904	minutes to run completely. Note that this is for the three-dimensional
905	reconstruction: using \texttt{reconDim=1} makes the reconstruction
906	quicker (the full program then takes about 6 minutes), but it is still
907	the largest part of the time.
908
909	The searching part of the procedure is much quicker: searching an
910	un-reconstructed cube leads to execution times of only a couple of
911	minutes. Alternatively, using the ability to read in previously-saved
912	reconstructed arrays makes running the reconstruction more than once a
913	more feasible prospect.
914
915	On the positive side, the shape of the detections in a cube that has
916	been reconstructed will be much more regular and smooth -- the ragged
917	edges that objects in the raw cube possess are smoothed by the removal
918	of most of the noise. This enables better determination of the shapes
919	and characteristics of objects.
920
921	A further point to consider when using the reconstruction is that if
922	the two-dimensional reconstruction is chosen (\texttt{reconDim=2}), it
923	can be susceptible to edge effects. If the valid area in the cube (\ie
924	the part that is not BLANK) has non-rectangular edges, the convolution
925	can produce artefacts in the reconstruction that mimic the edges and
926	can lead (depending on the selection threshold) to some spurious
927	sources. Caution is advised with such data -- the user is advised to
928	check carefully the reconstructed cube for the presence of such
929	artefacts. Note, however, that the 1- and 3-dimensional
930	reconstructions are \emph{not} susceptible in the same way, since the
931	spectral direction does not generally exhibit these BLANK edges, and
932	so we recommend the use of either of these.
933
934	If one chooses the reconstruction method, a further decision is
935	required on the signal-to-noise cutoff used in determining acceptable
936	wavelet coefficients. A larger value will remove more noise from the
937	cube, at the expense of losing fainter sources, while a smaller value
938	will include more noise, which may produce spurious detections, but
939	will be more sensitive to faint sources. Values of less than about
940	$3\sigma$ tend to not reduce the noise a great deal and can lead to
941	many spurious sources (although this will depend on the nature of the
942	cube).
943
944	When it comes to searching, the FDR method produces more reliable
945	results than simple sigma-clipping, particularly in the absence of
946	reconstruction. However, it does not work in exactly the way one
947	would expect for a given value of \texttt{alpha}. For instance,
948	setting fairly liberal values of \texttt{alpha} (say, 0.1) will often
949	lead to a much smaller fraction of false detections (\ie much less
950	than 10\%). This is the effect of the merging algorithms, that combine
951	the sources after the detection stage, and reject detections not
952	meeting the minimum pixel or channel requirements. It is thus better
953	to aim for larger \texttt{alpha} values than those derived from a
954	straight conversion of the desired false detection rate.
955
956	Finally, as \duchamp\ is still undergoing development, there are some
957	elements that are not fully developed. In particular, it is not as
958	clever as I would like at avoiding interference. The ability to place
959	requirements on the minimum number of channels and pixels partially
960	circumvents this problem, but work is being done to make \duchamp\
961	smarter at rejecting signals that are clearly (to a human eye at
962	least) interference. See the following section for further
963	improvements that are planned.
964
965	\section{Future developments}
966
967	This is both a list of planned improvements and a wish-list of
968	features that would be nice to include (but are not planned in the
969	immediate future). Let me know if there are items not on this list, or
970	items on the list you would like prioritised.
971
972	\begin{itemize}
973
974	\item Better determination of the noise characteristics of
975	spectral-line cubes, including understanding how the noise is
976	generated and developing a model for it. \textbf{Planned.}
977
978	\item Include more source analysis. Examples could be: shape
979	information; measurements of HI mass; more variety of measurements
980	of velocity width and profile. \textbf{Some planned.}
981
982	\item Provide some indication of the significance of the detection
983	(\ie some S/N-like value). \textbf{Planned.}
984
985	\item Improved ability to reject interference, possibly on the
986	spectral shape of features. \textbf{Planned.}
987
988	\item Ability to separate (de-blend) distinct sources that have been
989	merged. \textbf{Planned.}
990
991	\item Link to lists of possible counterparts (\eg via NED/SIMBAD/other
992	VO tools?). \textbf{Wish-list.}
993
994	\item On-line web service interface, so a user can upload a cube and
995	get back a source-list. \textbf{Wish-list}.
996
997	\item Embed \duchamp\ in a GUI, to move away from the text-based
998	interaction. \textbf{Wish-list}.
999	\end{itemize}
1000
1001
1002	%\bibliographystyle{mn2e}
1003	%\bibliographystyle{abbrvnat}
1004	%\bibliography{mnrasmnemonic,sourceDetection}
1005	\begin{thebibliography}{}
1006
1007	\bibitem[\protect\citeauthoryear{{Calabretta} \& {Greisen}}{{Calabretta} \&
1008	{Greisen}}{2002}]{calabretta02}
1009	{Calabretta} M., {Greisen} E., 2002, A\&A, 395, 1077
1010
1011	\bibitem[\protect\citeauthoryear{{Greisen} \& {Calabretta}}{{Greisen} \&
1012	{Calabretta}}{2002}]{greisen02}
1013	{Greisen} E., {Calabretta} M., 2002, A\&A, 395, 1061
1014
1015	\bibitem[\protect\citeauthoryear{{Hanisch}, {Farris}, {Greisen}, {Pence},
1016	{Schlesinger}, {Teuben}, {Thompson} \& {Warnock}}{{Hanisch}
1017	et~al.}{2001}]{hanisch01}
1018	{Hanisch} R., {Farris} A., {Greisen} E., {Pence} W., {Schlesinger} B.,
1019	{Teuben} P., {Thompson} R., {Warnock} A., 2001, A\&A, 376, 359
1020
1021	\bibitem[\protect\citeauthoryear{{Hopkins}, {Miller}, {Connolly}, {Genovese},
1022	{Nichol} \& {Wasserman}}{{Hopkins} et~al.}{2002}]{hopkins02}
1023	{Hopkins} A., {Miller} C., {Connolly} A., {Genovese} C., {Nichol} R.,
1024	{Wasserman} L., 2002, AJ, 123, 1086
1025
1026	\bibitem[\protect\citeauthoryear{Lutz}{Lutz}{1980}]{lutz80}
1027	Lutz R., 1980, The Computer Journal, 23, 262
1028
1029	\bibitem[\protect\citeauthoryear{{Meyer} et~al.,}{{Meyer}
1030	et~al.}{2004}]{meyer04:trunc}
1031	{Meyer} M., et~al., 2004, MNRAS, 350, 1195
1032
1033	\bibitem[\protect\citeauthoryear{{Miller}, {Genovese}, {Nichol}, {Wasserman},
1034	{Connolly}, {Reichart}, {Hopkins}, {Schneider} \& {Moore}}{{Miller}
1035	et~al.}{2001}]{miller01}
1036	{Miller} C., {Genovese} C., {Nichol} R., {Wasserman} L., {Connolly} A.,
1037	{Reichart} D., {Hopkins} A., {Schneider} J., {Moore} A., 2001, AJ, 122,
1038	3492
1039
1040	\bibitem[\protect\citeauthoryear{Minchin}{Minchin}{1999}]{minchin99}
1041	Minchin R., 1999, PASA, 16, 12
1042
1043	\bibitem[\protect\citeauthoryear{Starck \& Murtagh}{Starck \&
1044	Murtagh}{2002}]{starck02:book}
1045	Starck J.-L., Murtagh F., 2002, {``Astronomical Image and Data Analysis''}.
1046	Springer
1047
1048	\end{thebibliography}
1049
1050
1051	\appendix
1052	\newpage
1053	\section{Obtaining and installing \duchamp}
1054
1055	The \duchamp\ web page can be found at the following location:\\
1056	\href{http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp}%
1057	{http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp}\\
1058	Here you can find a gzipped tar archive of the source code that can be
1059	downloaded and extracted, as well as this User's Guide in postscript
1060	and hyperlinked PDF formats.
1061
1062	To build \duchamp, you will need three main external libraries:
1063	pgplot, cfitsio and wcslib. If these are not present on your system,
1064	you can download them from the following locations:
1065	\begin{itemize}
1066	\item PGPlot:
1067	\href{http://www.astro.caltech.edu/~tjp/pgplot/}%
1068	{http://www.astro.caltech.edu/~tjp/pgplot/}
1069	\item CFITSIO:
1070	\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html}%
1071	{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html}
1072	\item WCSLIB:
1073	\href{http://www.atnf.csiro.au/people/Mark.Calabretta/WCS/index.html}%
1074	{http://www.atnf.csiro.au/people/Mark.Calabretta/WCS/index.html}
1075	\end{itemize}
1076
1077	\duchamp\ can be built on Unix systems by typing (assuming that the
1078	prompt your terminal provides is a \texttt{> } -- don't type this
1079	character!):
1080	\begin{quote}
1081	\texttt{%
1082	> ./configure\\
1083	> make\\
1084	> make clean (optional -- to remove the object files)}
1085	\end{quote}
1086
1087	Run in this manner, \texttt{configure} should find all the necessary
1088	libraries, but if some libraries have been installed in non-standard
1089	locations, it may fail. In this case, you can specify additional
1090	directories to look in by giving extra command-line arguments. There
1091	are separate options for library files (eg. libcpgplot.a) and header
1092	files (eg. cpgplot.h).
1093
1094	For example, if \textsc{wcslib} had been installed in
1095	\texttt{/home/mduchamp/wcslib}, there are two libraries that are
1096	likely to be in separate subdirectories: \texttt{C/} and
1097	\texttt{pgsbox/}. Each subdirectory needs to be searched for library
1098	and header files, so one could build Duchamp by typing:
1099	\begin{quote}
1100	\texttt{%
1101	> ./configure $\backslash$ \\
1102	LIBDIRS="/home/mduchamp/wcslib/C /home/mduchamp/wcslib/pgsbox"
1103	$\backslash$\\
1104	INCDIRS="/home/mduchamp/wcslib/C /home/mduchamp/wcslib/pgsbox"}
1105	\end{quote}
1106	And then just run make in the usual fashion:
1107	\begin{quote}
1108	\texttt{> make}
1109	\end{quote}
1110
1111	This will produce the executable \texttt{Duchamp}.
1112	You can verify that it is running correctly by running the test
1113	script:
1114	\begin{quote}
1115	\texttt{> VerifyDuchamp.sh}
1116	\end{quote}
1117	This will compile and run a bit of
1118	C++ code in the directory \texttt{verification/} that creates a dummy FITS
1119	image. It then runs Duchamp on this image with three different sets of
1120	inputs, and compares to known results, looking for differences and
1121	reporting any. There should be none reported if everything is working
1122	correctly.
1123
1124	You can then run \duchamp\ on your own data. This can be done in one
1125	of two ways. The first is:
1126	\begin{quote}
1127	\texttt{> Duchamp -f [FITS file]}
1128	\end{quote}
1129	where \texttt{[FITS file]} is the file you wish to search. This method
1130	simply uses the default values of all parameters.
1131
1132	The second method allows some determination of the parameter values by
1133	the user. Type:
1134	\begin{quote}
1135	\texttt{> Duchamp -p [parameter file]}
1136	\end{quote}
1137	where \texttt{[parameterFile]} is a file with the input parameters,
1138	including the name of the cube you want to search. There are two
1139	example input files included with the distribution. The smaller one,
1140	\texttt{InputExample}, shows the typical parameters one might want to
1141	set. The large one, \texttt{InputComplete}, lists all possible
1142	parameters that can be entered, and a brief description of them. To
1143	get going quickly, just replace the "your-file-here" in
1144	\texttt{InputExample} with your image name, and type
1145	\begin{quote}
1146	\texttt{> Duchamp -p InputExample}
1147	\end{quote}
1148
1149	The following appendices provide details on the individual parameters,
1150	and show examples of the output files that \duchamp\ produces.
1151
1152	\newpage
1153	\section{Available parameters}
1154	\label{app-param}
1155
1156	The full list of parameters that can be listed in the input file are
1157	given here. If not listed, they take the default value given in
1158	parentheses. Since the order of the parameters in the input file does
1159	not matter, they are grouped here in logical sections.
1160
1161	\subsection*{Input-output related}
1162	\begin{entry}
1163	\item[ImageFile (no default assumed)] The filename of the
1164	data cube to be analysed.
1165	\item[flagSubsection \texttt{[false]}] A flag to indicate whether one
1166	wants a subsection of the requested image.
1167	\item[Subsection \texttt{[ [,,*] ]}] The requested subsection, which
1168	should be specified in the format \texttt{[x1:x2,y1:y2,z1:z2]},
1169	where the limits are inclusive. If the full range of a dimension is
1170	required, use a \texttt{*}, \eg if you want the full spectral range
1171	of a subsection of the image, use \texttt{[30:140,30:140,*]}.
1172	\item[flagReconExists \texttt{[false]}] A flag to indicate whether the
1173	reconstructed array has been saved by a previous run of \duchamp. If
1174	set true, the reconstructed array will be read from the file given
1175	by \texttt{reconFile}, rather than calculated directly.
1176	\item[reconFile (no default assumed)] The FITS file that contains the
1177	reconstructed array. If \texttt{flagReconExists} is true and this
1178	parameter is not defined, the default file searched will be
1179	determined by the \atrous\ parameters (see \S\ref{sec-recon}).
1180	\item[OutFile \texttt{[duchamp-Results.txt]}] The file containing the
1181	final list of detections. This also records the list of input
1182	parameters.
1183	\item[SpectraFile \texttt{[duchamp-Spectra.ps]}] The postscript file
1184	containing the resulting integrated spectra and images of the
1185	detections.
1186	\item[flagLog \texttt{[true]}] A flag to indicate whether intermediate
1187	detections should be logged.
1188	\item[LogFile \texttt{[duchamp-Logfile.txt]}] The file in which
1189	intermediate detections are logged. These are detections that have
1190	not been merged. This is primarily for use in debugging and
1191	diagnostic purposes -- normal use of the program will probably not
1192	require this.
1193	\item[flagOutputRecon \texttt{[false]}] A flag to say whether or not
1194	to save the reconstructed cube as a FITS file. The filename will be
1195	derived from the ImageFile -- the reconstruction of
1196	\texttt{image.fits} will be saved as \texttt{image.RECON?.fits},
1197	where \texttt{?} stands for the value of \texttt{snrRecon} (see
1198	below).
1199	\item[flagOutputResid \texttt{[false]}] As for
1200	\texttt{flagOutputRecon}, but for the residual array -- the
1201	difference between the original cube and the reconstructed cube. The
1202	filename will be \texttt{image.RESID?.fits}.
1203	\item[flagVOT \texttt{[false]}] A flag to say whether to create a
1204	VOTable file corresponding to the information in
1205	\texttt{outfile}. This will be an XML file in the Virtual
1206	Observatory VOTable format.
1207	\item[votFile \texttt{[duchamp-Results.xml]}] The VOTable file with
1208	the list of final detections. Some input parameters are also
1209	recorded.
1210	\item[flagKarma \texttt{[false]}] A flag to say whether to create a
1211	Karma annotation file corresponding to the information in
1212	\texttt{outfile}. This can be used as an overlay for the Karma
1213	programs such as \texttt{kvis}.
1214	\item[karmaFile \texttt{[duchamp-Results.ann]}] The Karma annotation
1215	file showing the list of final detections.
1216	\item[flagMaps \texttt{[true]}] A flag to say whether to save
1217	postscript files showing the 0th moment map of the whole cube
1218	(parameter \texttt{momentMap}) and the detection image
1219	(\texttt{detectionMap}).
1220	\item[momentMap \texttt{[duchamp-MomentMap.ps]}] A postscript file
1221	containing a map of the 0th moment of the detected sources, as well
1222	as pixel and WCS coordinates.
1223	\item[detectionMap \texttt{[duchamp-DetectionMap.ps]}] A postscript
1224	file showing each of the detected objects, coloured in greyscale by
1225	the number of channels spanned by each pixel. Also shows pixel and
1226	WCS coordinates.
1227	\end{entry}
1228
1229	\subsection*{Modifying the cube}
1230	\begin{entry}
1231	\item[flagBlankPix \texttt{[true]}] A flag to say whether to remove
1232	BLANK pixels from the analysis -- these are pixels set to some
1233	particular value because they fall outside the imaged area.
1234	\item[blankPixValue \texttt{[-8.00061]}] The value of the BLANK
1235	pixels, if this information is not contained in the FITS header (the
1236	usual procedure is to obtain this value from the header information
1237	-- in which case the value set by this parameter is ignored).
1238	\item[flagMW \texttt{[false]}] A flag to say whether to ignore
1239	channels contaminated by Milky Way (or other) emission -- the
1240	searching algorithms will not look at these channels.
1241	\item[maxMW \texttt{[112]}] The maximum channel number containing
1242	``Milky Way'' emission.
1243	\item[minMW \texttt{[75]}] The minimum channel number containing
1244	``Milky Way'' emission. Note that the range specified by
1245	\texttt{maxMW} and \texttt{minMW} is inclusive.
1246	\item[flagBaseline \texttt{[false]}] A flag to say whether to remove
1247	the baseline from each spectrum in the cube for the purposes of
1248	reconstruction and detection.
1249	\end{entry}
1250
1251	\subsection*{Detection related}
1252
1253	\subsubsection*{General detection}
1254	\begin{entry}
1255	\item[flagNegative \texttt{[false]}] A flag to indicate that the
1256	features being searched for are negative. The cube will be inverted
1257	prior to searching.
1258	\item[snrCut \texttt{[3.]}] The cut-off value for thresholding, in
1259	terms of number of $\sigma$ above the mean.
1260	\item[flagGrowth \texttt{[false]}] A flag indicating whether or not to
1261	grow the detected objects to a smaller threshold.
1262	\item[growthCut \texttt{[2.]}] The smaller threshold using in growing
1263	detections. In units of $\sigma$ above the mean.
1264	\end{entry}
1265
1266	\subsubsection*{\Atrous\ reconstruction}
1267	\begin{entry}
1268	\item [flagATrous \texttt{[true]}] A flag indicating whether or not to
1269	reconstruct the cube using the \atrous\ wavelet
1270	reconstruction. See \S\ref{sec-recon} for details.
1271	\item[reconDim \texttt{[3]}] The number of dimensions to use in the
1272	reconstruction. 1 means reconstruct each spectrum separately, 2
1273	means each channel map is done separately, and 3 means do the whole
1274	cube in one go.
1275	\item[scaleMin \texttt{[1]}] The minimum wavelet scale to be used in the
1276	reconstruction. A value of 1 means ``use all scales''.
1277	\item[snrRecon \texttt{[4]}] The thresholding cutoff used in the
1278	reconstruction -- only wavelet coefficients this many $\sigma$ above
1279	the mean (or greater) are included in the reconstruction.
1280	\item[filterCode \texttt{[1]}] The code number of the filter to use in
1281	the reconstruction. The options are:
1282	\begin{itemize}
1283	\item \textbf{1:} B$_3$-spline filter: coefficients =
1284	$(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$
1285	\item \textbf{2:} Triangle filter: coefficients =
1286	$(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
1287	\item \textbf{3:} Haar wavelet: coefficients =
1288	$(0, \frac{1}{2}, \frac{1}{2})$
1289	\end{itemize}
1290	\end{entry}
1291
1292	\subsubsection*{FDR method}
1293	\begin{entry}
1294	\item[flagFDR \texttt{[false]}] A flag indicating whether or not to use
1295	the False Discovery Rate method in thresholding the pixels.
1296	\item[alphaFDR \texttt{[0.01]}] The $\alpha$ parameter used in the FDR
1297	analysis. The average number of false detections, as a fraction of the
1298	total number, will be less than $\alpha$ (see \S\ref{sec-detection}).
1299	\end{entry}
1300
1301	\subsubsection*{Merging detections}
1302	\begin{entry}
1303	\item[minPix \texttt{[2]}] The minimum number of spatial pixels for a
1304	single detection to be counted.
1305	\item[minChannels \texttt{[3]}] The minimum number of consecutive
1306	channels that must be present in a detection.
1307	\item[flagAdjacent \texttt{[true]}] A flag indicating whether to use
1308	the ``adjacent pixel'' criterion to decide whether to merge
1309	objects. If not, the next two parameters are used to determine
1310	whether objects are within the necessary thresholds.
1311	\item[threshSpatial \texttt{[3.]}] The maximum allowed minimum spatial
1312	separation (in pixels) between two detections for them to be merged
1313	into one. Only used if \texttt{flagAdjacent = false}.
1314	\item[threshVelocity \texttt{[7.]}] The maximum allowed minimum channel
1315	separation between two detections for them to be merged into
1316	one.
1317	\end{entry}
1318
1319	\subsubsection*{Other parameters}
1320	\begin{entry}
1321	\item[spectralMethod \texttt{[peak]}] This indicates which method is used
1322	to plot the output spectra: \texttt{peak} means plot the spectrum
1323	containing the detection's peak pixel; \texttt{sum} means sum the
1324	spectra of each detected spatial pixel, and correct for the beam
1325	size. Any other choice defaults to \texttt{peak}.
1326	\item[spectralUnits \texttt{[km/s]}] The user can specify the units of
1327	the spectral axis. Assuming the WCS of the FITS file is valid, the
1328	spectral axis is transformed into velocity, and put into these units
1329	for all output and for calculations such as the integrated flux of a
1330	detection.
1331	\item[drawBorders \texttt{[true]}] A flag indicating whether borders
1332	are to be drawn around the detected objects in the moment maps
1333	included in the output (see for example Fig.~\ref{fig-spect}).
1334	\item[drawBlankEdges \texttt{[true]}] A flag indicating whether to
1335	draw the dividing line between BLANK and non-BLANK pixels on the
1336	2-dimensional images (see for example Fig.~\ref{fig-moment}).
1337	\item[verbose \texttt{[true]}] A flag indicating whether to print the
1338	progress of computationally-intensive algorithms (such as the
1339	searching and merging) to screen.
1340	\end{entry}
1341
1342
1343	\newpage
1344	\section{Example parameter files}
1345	\label{app-input}
1346
1347	This is what a typical parameter file would look like.
1348
1349	\begin{verbatim}
1350	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1351	logFile logfile.txt
1352	outFile results.txt
1353	spectraFile spectra.ps
1354	flagSubsection false
1355	flagOutputRecon false
1356	flagOutputResid 0
1357	flagBlankPix 1
1358	flagMW 1
1359	minMW 75
1360	maxMW 112
1361	minPix 3
1362	flagGrowth 1
1363	growthCut 1.5
1364	flagATrous 0
1365	scaleMin 1
1366	snrRecon 4
1367	flagFDR 1
1368	alphaFDR 0.1
1369	numPixPSF 20
1370	snrCut 3
1371	threshSpatial 3
1372	threshVelocity 7
1373	\end{verbatim}
1374
1375	Note that, as in this example, the flag parameters can be entered as
1376	strings (true/false) or integers (1/0). Also, note that it is not
1377	necessary to include all these parameters in the file, only those that
1378	need to be changed from the defaults (as listed in
1379	Appendix~\ref{app-param}), which in this case would be very few. A
1380	minimal parameter file might look like:
1381	\begin{verbatim}
1382	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1383	flagLog false
1384	snrRecon 3
1385	snrCut 2.5
1386	minChannels 4
1387	\end{verbatim}
1388	This will reconstruct the cube with a lower SNR value than the
1389	default, select objects at a lower threshold, with a looser minimum
1390	channel requirement, and not keep a log of the intermediate
1391	detections.
1392
1393	The following page demonstrates how the parameters are presented to
1394	the user, both on the screen at execution time, and in the output and
1395	log files. On each line, there is a description on the parameter, the
1396	relevant parameter name that is used in the input file (if there is
1397	one that the user can enter), and the value of the parameter being
1398	used.
1399	\newpage
1400	\begin{landscape}
1401	Typical presentation of parameters in output and log files:
1402	\begin{verbatim}
1403	---- Parameters ----
1404	Image to be analysed.........................[imageFile] = input.fits
1405	Intermediate Logfile...........................[logFile] = duchamp-Logfile.txt
1406	Final Results file.............................[outFile] = duchamp-Results.txt
1407	Spectrum file..............................[spectraFile] = duchamp-Spectra.ps
1408	0th Moment Map...............................[momentMap] = duchamp-MomentMap.ps
1409	Detection Map.............................[detectionMap] = duchamp-DetectionMap.ps
1410	Saving reconstructed cube?.............[flagoutputrecon] = false
1411	Saving residuals from reconstruction?..[flagoutputresid] = false
1412	------
1413	Searching for Negative features?..........[flagNegative] = false
1414	Fixing Blank Pixels?......................[flagBlankPix] = true
1415	Blank Pixel Value....................................... = -8.00061
1416	Removing Milky Way channels?....................[flagMW] = true
1417	Milky Way Channels.......................[minMW - maxMW] = 75-112
1418	Beam Size (pixels)...................................... = 10.1788
1419	Removing baselines before search?.........[flagBaseline] = false
1420	Minimum # Pixels in a detection.................[minPix] = 2
1421	Minimum # Channels in a detection..........[minChannels] = 3
1422	Growing objects after detection?............[flagGrowth] = false
1423	Using A Trous reconstruction?...............[flagATrous] = true
1424	Number of dimensions in reconstruction........[reconDim] = 3
1425	Minimum scale in reconstruction...............[scaleMin] = 1
1426	SNR Threshold within reconstruction...........[snrRecon] = 4
1427	Filter being used for reconstruction........[filterCode] = 1 (B3 spline function)
1428	Using FDR analysis?............................[flagFDR] = false
1429	SNR Threshold...................................[snrCut] = 3
1430	Using Adjacent-pixel criterion?...........[flagAdjacent] = true
1431	Max. velocity separation for merging....[threshVelocity] = 7
1432	Method of spectral plotting.............[spectralMethod] = peak
1433	\end{verbatim}
1434
1435	\newpage
1436	\section{Example results file}
1437	\label{app-output}
1438	This the typical content of an output file, after running \duchamp\
1439	with the parameters illustrated on the previous page.
1440
1441	{\scriptsize
1442	\begin{verbatim}
1443	Results of the \duchamp\ source finder: Tue May 23 14:51:38 2006
1444	---- Parameters ----
1445	(... omitted for clarity -- see previous page for examples...)
1446	--------------------
1447	Total number of detections = 25
1448	--------------------
1449	------------------------------------------------------------------------------------------------------------------------------------------------------
1450	Obj# Name X Y Z RA DEC VEL w_RA w_DEC w_VEL F_int F_peak X1 X2 Y1 Y2 Z1 Z2 Npix Flag
1451	[km/s] [arcmin] [arcmin] [km/s] [Jy km/s] [Jy/beam] [pix]
1452	------------------------------------------------------------------------------------------------------------------------------------------------------
1453	1 J0618-2532 30.2 86.0 113.3 06:18:12.54 -25:32:44.79 208.502 45.17 34.61 26.383 24.394 0.350 25 35 82 90 112 114 137 E
1454	2 J0609-2156 59.5 140.6 114.6 06:09:19.66 -21:56:31.20 225.572 44.39 31.47 65.957 16.128 0.213 55 65 137 144 113 118 153
1455	3 J0545-2143 141.2 143.2 114.8 05:45:51.71 -21:43:36.20 228.470 19.61 16.66 26.383 2.412 0.090 139 143 142 145 114 116 29
1456	4 J0617-2633 33.3 70.8 115.6 06:17:25.52 -26:33:33.83 238.736 65.02 30.10 26.383 9.776 0.117 26 41 68 75 115 117 104 E
1457	5 J0601-2500 86.2 94.9 117.9 06:01:39.54 -25:00:32.46 269.419 27.99 24.02 26.383 3.920 0.124 83 89 92 97 117 119 44
1458	6 J0602-2547 84.0 83.1 118.0 06:02:18.29 -25:47:31.69 270.319 20.01 19.99 26.383 2.999 0.118 82 86 81 85 117 119 34
1459	7 J0547-2448 133.0 97.2 118.7 05:47:52.53 -24:48:38.16 279.113 19.72 12.54 26.383 1.474 0.074 131 135 96 98 118 120 21
1460	8 J0606-2719 71.1 60.0 121.3 06:06:10.99 -27:19:48.61 314.090 52.36 39.59 39.574 14.268 0.150 65 77 55 64 120 123 154
1461	9 J0611-2137 52.4 145.3 162.5 06:11:20.92 -21:37:29.57 857.955 32.39 23.49 118.722 43.178 0.410 49 56 142 147 158 167 265 E
1462	10 J0600-2859 89.7 35.3 202.4 06:00:34.08 -28:59:00.43 1383.160 23.93 24.10 171.487 24.439 0.173 87 92 33 38 196 209 271
1463	11 J0558-2638 95.4 70.3 223.1 05:58:53.03 -26:38:45.91 1656.140 11.93 12.07 92.339 1.045 0.063 94 96 69 71 220 227 18
1464	12 J0617-2723 34.7 58.3 227.4 06:17:07.07 -27:23:50.65 1712.868 16.75 23.53 290.209 8.529 0.093 33 36 56 61 215 237 118
1465	13 J0558-2525 95.8 88.6 231.7 05:58:49.27 -25:25:33.60 1770.134 27.87 24.16 237.444 12.863 0.115 92 98 86 91 221 239 175
1466	14 J0600-2141 88.8 144.4 232.5 06:00:54.02 -21:41:57.06 1780.188 27.96 24.13 224.252 30.743 0.166 86 92 142 147 222 239 344 E
1467	15 J0615-2634 40.0 70.8 232.6 06:15:25.50 -26:34:20.04 1782.214 12.44 15.69 52.765 2.084 0.068 39 41 69 72 231 235 31
1468	16 J0604-2606 76.0 78.4 233.0 06:04:41.13 -26:06:21.19 1787.226 24.13 23.87 211.061 23.563 0.155 73 78 76 81 225 241 278
1469	17 J0601-2340 87.9 114.9 235.8 06:01:08.83 -23:40:19.37 1824.122 31.95 28.09 237.444 82.380 0.297 85 92 112 118 227 245 647
1470	18 J0615-2235 38.2 130.5 254.5 06:15:32.09 -22:35:37.24 2070.934 12.29 11.70 105.531 1.555 0.070 37 39 129 131 249 257 24
1471	19 J0617-2305 31.4 122.8 258.1 06:17:33.45 -23:05:28.94 2118.752 12.34 11.65 26.383 1.022 0.062 30 32 122 124 257 259 16
1472	20 J0612-2149 49.6 142.2 270.3 06:12:11.04 -21:49:29.72 2279.926 16.27 15.73 395.740 15.156 0.101 48 51 141 144 257 287 204
1473	21 J0616-2133 35.3 146.0 300.6 06:16:15.78 -21:33:09.69 2679.148 20.22 7.47 224.252 3.014 0.127 33 37 145 146 294 311 28 E
1474	22 J0555-2956 107.3 20.9 367.6 05:55:08.02 -29:56:09.08 3562.236 19.71 20.30 39.574 5.891 0.169 105 109 19 23 366 369 58
1475	23 J0557-2246 99.8 128.2 434.0 05:57:43.77 -22:46:42.95 4438.776 11.88 16.12 105.531 1.703 0.167 99 101 127 130 430 438 17 N
1476	24 J0616-2648 38.1 67.2 546.8 06:16:02.10 -26:48:35.49 5926.464 12.35 11.67 26.383 1.276 0.064 37 39 66 68 546 548 18
1477	25 J0552-2916 117.0 30.5 727.0 05:52:13.64 -29:16:58.02 8303.952 11.59 20.25 303.400 35.523 0.479 116 118 28 32 716 739 111
1478	\end{verbatim}
1479	}
1480	Note that the
1481	width of the table can make it hard to read. A good trick for those
1482	using UNIX/Linux is to make use of the \texttt{a2ps} command. The
1483	following works well, producing a postscript file \texttt{results.ps}:
1484	\\\verb\|a2ps -1 -r -f8 -o duchamp-Results.ps duchamp-Results.txt\|
1485
1486	%\end{landscape}
1487
1488	\newpage
1489	\section{Example VOTable output}
1490	\label{app-votable}
1491	This is part of the VOTable, in XML format, corresponding to the
1492	output file in Appendix~\ref{app-output} (the indentation has been
1493	removed to make it fit on the page).
1494
1495	%\begin{landscape}
1496	{\scriptsize
1497	\begin{verbatim}
1498	<?xml version="1.0"?>
1499	<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1500	xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
1501	<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
1502	<RESOURCE name="Duchamp Output">
1503	<TABLE name="Detections">
1504	<DESCRIPTION>Detected sources and parameters from running the Duchamp source finder.</DESCRIPTION>
1505	<PARAM name="FITS file" datatype="char" ucd="meta.file;meta.fits" value="/DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits"/>
1506	<PARAM name="Threshold" datatype="float" ucd="stat.snr" value="2.5">
1507	<PARAM name="ATrous note" datatype="char" ucd="meta.note" value="The a trous reconstruction method was used, with the following parameters.">
1508	<PARAM name="ATrous Dimension" datatype="int" ucd="meta.code;stat" value="3">
1509	<PARAM name="ATrous Cut" datatype="float" ucd="stat.snr" value="4">
1510	<PARAM name="ATrous Minimum Scale" datatype="int" ucd="stat.param" value="1">
1511	<PARAM name="ATrous Filter" datatype="char" ucd="meta.code;stat" value="B3 spline function">
1512	<FIELD name="ID" ID="col1" ucd="meta.id" datatype="int" width="4"/>
1513	<FIELD name="Name" ID="col2" ucd="meta.id;meta.main" datatype="char" arraysize="14"/>
1514	<FIELD name="RA" ID="col3" ucd="pos.eq.ra;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1515	<FIELD name="Dec" ID="col4" ucd="pos.eq.dec;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1516	<FIELD name="w_RA" ID="col3" ucd="phys.angSize;pos.eq.ra" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1517	<FIELD name="w_Dec" ID="col4" ucd="phys.angSize;pos.eq.dec" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1518	<FIELD name="Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc" datatype="float" width="9" precision="3" unit="km/s"/>
1519	<FIELD name="w_Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc;spect.line.width" datatype="float" width="8" precision="3" unit="km/s"/>
1520	<FIELD name="Integrated_Flux" ID="col4" ucd="phys.flux;spect.line.intensity" datatype="float" width="10" precision="3" unit="km/s"/>
1521	<DATA>
1522	<TABLEDATA>
1523	<TR>
1524	<TD> 1</TD><TD> J0609-2200</TD><TD> 92.410416</TD><TD>-22.013390</TD><TD> 48.50</TD><TD> 39.42</TD><TD> 213.061</TD><TD> 65.957</TD><TD> 17.572</TD>
1525	</TR>
1526	<TR>
1527	<TD> 2</TD><TD> J0608-2605</TD><TD> 92.042633</TD><TD>-26.085157</TD><TD> 44.47</TD><TD> 39.47</TD><TD> 233.119</TD><TD> 39.574</TD><TD> 4.144</TD>
1528	</TR>
1529	<TR>
1530	<TD> 3</TD><TD> J0606-2724</TD><TD> 91.637840</TD><TD>-27.412022</TD><TD> 52.48</TD><TD> 47.57</TD><TD> 302.213</TD><TD> 39.574</TD><TD> 17.066</TD>
1531	</TR>
1532	(... table truncated for clarity ...)
1533	</TABLEDATA>
1534	</DATA>
1535	</TABLE>
1536	</RESOURCE>
1537	</VOTABLE>
1538	\end{verbatim}
1539	}
1540	\end{landscape}
1541
1542	\newpage
1543	\section{Example Karma Annotation file output}
1544	\label{app-karma}
1545
1546	This is the format of the Karma Annotation file, showing the locations
1547	of the detected objects. This can be loaded by the plotting tools of
1548	the Karma package (for instance, \texttt{kvis}) as an overlay on the FITS
1549	file.
1550
1551	\begin{verbatim}
1552	# Duchamp Source Finder results for
1553	# cube /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1554	COLOR RED
1555	COORD W
1556	CIRCLE 92.3376 -21.9475 0.403992
1557	TEXT 92.3376 -21.9475 1
1558	CIRCLE 91.9676 -26.0193 0.37034
1559	TEXT 91.9676 -26.0193 2
1560	CIRCLE 91.5621 -27.3459 0.437109
1561	TEXT 91.5621 -27.3459 3
1562	CIRCLE 92.8285 -21.6344 0.269914
1563	TEXT 92.8285 -21.6344 4
1564	CIRCLE 90.1381 -28.9838 0.234179
1565	TEXT 90.1381 -28.9838 5
1566	CIRCLE 89.72 -26.6513 0.132743
1567	TEXT 89.72 -26.6513 6
1568	CIRCLE 94.2743 -27.4003 0.195175
1569	TEXT 94.2743 -27.4003 7
1570	CIRCLE 92.2739 -21.6941 0.134538
1571	TEXT 92.2739 -21.6941 8
1572	CIRCLE 89.7133 -25.4259 0.232252
1573	TEXT 89.7133 -25.4259 9
1574	CIRCLE 90.2206 -21.6993 0.266247
1575	TEXT 90.2206 -21.6993 10
1576	CIRCLE 93.8581 -26.5766 0.163153
1577	TEXT 93.8581 -26.5766 11
1578	CIRCLE 91.176 -26.1064 0.234356
1579	TEXT 91.176 -26.1064 12
1580	CIRCLE 90.2844 -23.6716 0.299509
1581	TEXT 90.2844 -23.6716 13
1582	CIRCLE 93.8774 -22.581 0.130925
1583	TEXT 93.8774 -22.581 14
1584	CIRCLE 94.3882 -23.0934 0.137108
1585	TEXT 94.3882 -23.0934 15
1586	CIRCLE 93.0491 -21.8223 0.202928
1587	TEXT 93.0491 -21.8223 16
1588	CIRCLE 94.0685 -21.5603 0.168456
1589	TEXT 94.0685 -21.5603 17
1590	CIRCLE 86.0568 -27.6095 0.101113
1591	TEXT 86.0568 -27.6095 18
1592	CIRCLE 88.7932 -29.9453 0.202624
1593	TEXT 88.7932 -29.9453 19
1594	\end{verbatim}
1595
1596	\newpage
1597	\section{Robust statistics for a Normal distribution}
1598	\label{app-madfm}
1599
1600	The Normal, or Gaussian, distribution for mean $\mu$ and standard
1601	deviation $\sigma$ can be written as
1602	\[
1603	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
1604	\]
1605
1606	When one has a purely Gaussian signal, it is straightforward to
1607	estimate $\sigma$ by calculating the standard deviation (or rms) of
1608	the data. However, if there is a small amount of signal present on top
1609	of Gaussian noise, and one wants to estimate the $\sigma$ for the
1610	noise, the presence of the large values from the signal can bias the
1611	estimator to higher values.
1612
1613	An alternative way is to use the median ($m$) and median absolute
1614	deviation from the median ($s$) to estimate $\mu$ and $\sigma$. The
1615	median is the middle of the distribution, defined for a continuous
1616	distribution by
1617	\[
1618	\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
1619	\]
1620	From symmetry, we quickly see that for the continuous Normal
1621	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
1622	without loss of generality.
1623
1624	To find $s$, we find the distribution of the absolute deviation from
1625	the median, and then find the median of that distribution. This
1626	distribution is given by
1627	\begin{eqnarray*}
1628	g(x) &= &{\mbox{\rm distribution of }} \|x\|\\
1629	&= &f(x) + f(-x),\ x\ge0\\
1630	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
1631	\end{eqnarray*}
1632	So, the median absolute deviation from the median, $s$, is given by
1633	\[
1634	\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
1635	\]
1636	Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and
1637	so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
1638	\sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}} \diff x
1639	$. Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for
1640	simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$):
1641	\[
1642	\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
1643	\]
1644	This is hard to solve analytically (no nice analytic solution exists
1645	for the finite integral that I'm aware of), but straightforward to
1646	solve numerically, yielding the value of $s=0.6744888$. Thus, to
1647	estimate $\sigma$ for a Normally distributed data set, one can
1648	calculate $s$, then divide by 0.6744888 (or multiply by 1.4826042) to
1649	obtain the correct estimator.
1650
1651	Note that this is different to solutions quoted elsewhere,
1652	specifically in \citet{meyer04:trunc}, where the same robust estimator
1653	is used but with an incorrect conversion to standard deviation -- they
1654	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
1655	to convert the \emph{mean} absolute deviation from the mean to the
1656	standard deviation. This means that the cube noise in the \hipass\
1657	catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
1658	than quoted.
1659
1660	\section{How Gaussian noise changes with wavelet scale}
1661	\label{app-scaling}
1662
1663	The key element in the wavelet reconstruction of an array is the
1664	thresholding of the individual wavelet coefficient arrays. This is
1665	usually done by choosing a level to be some number of standard
1666	deviations above the mean value.
1667
1668	However, since the wavelet arrays are produced by convolving the input
1669	array by an increasingly large filter, the pixels in the coefficient
1670	arrays become increasingly correlated as the scale of the filter
1671	increases. This results in the measured standard deviation from a
1672	given coefficient array decreasing with increasing scale. To calculate
1673	this, we need to take into account how many other pixels each pixel in
1674	the convolved array depends on.
1675
1676	To demonstrate, suppose we have a 1-D array with $N$ pixel values
1677	given by $F_i,\ i=1,...,N$, and we convolve it with the B$_3$-spline
1678	filter, defined by the set of coefficients
1679	$\{1/16,1/4,3/8,1/4,1/16\}$. The flux of the $i$th pixel in the
1680	convolved array will be
1681	\[
1682	F'_i = \frac{1}{16}F_{i-2} + \frac{1}{4}F_{i-1} + \frac{3}{8}F_{i}
1683	+ \frac{1}{4}F_{i+1} + \frac{1}{16}F_{i+2}
1684	\]
1685	and the flux of the corresponding pixel in the wavelet array will be
1686	\[
1687	W'_i = F_i - F'_i = \frac{-1}{16}F_{i-2} - \frac{1}{4}F_{i-1}
1688	+ \frac{5}{8}F_{i} - \frac{1}{4}F_{i+1} - \frac{1}{16}F_{i+2}
1689	\]
1690	Now, assuming each pixel has the same standard deviation
1691	$\sigma_i=\sigma$, we can work out the standard deviation for the
1692	wavelet array:
1693	\[
1694	\sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2
1695	+ \left(\frac{1}{4}\right)^2 + \left(\frac{5}{8}\right)^2
1696	+ \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2}
1697	= 0.72349\ \sigma
1698	\]
1699	Thus, the first scale wavelet coefficient array will have a standard
1700	deviation of 72.3\% of the input array. This procedure can be followed
1701	to calculate the necessary values for all scales, dimensions and
1702	filters used by \duchamp.
1703
1704	Calculating these values is clearly a critical step in performing the
1705	reconstruction. \citet{starck02:book} did so by simulating data sets
1706	with Gaussian noise, taking the wavelet transform, and measuring the
1707	value of $\sigma$ for each scale. We take a different approach, by
1708	calculating the scaling factors directly from the filter coefficients
1709	by taking the wavelet transform of an array made up of a 1 in the
1710	central pixel and 0s everywhere else. The scaling value is then
1711	derived by taking the square root of the sum (in quadrature) of all
1712	the wavelet coefficient values at each scale. We give the scaling
1713	factors for the three filters available to \duchamp\ on the following
1714	page. These values are hard-coded into \duchamp, so no on-the-fly
1715	calculation of them is necessary.
1716
1717	Memory limitations prevent us from calculating factors for large
1718	scales, particularly for the three-dimensional case (hence the --
1719	symbols in the tables). To calculate factors for higher scales than
1720	those available, we note the following relationships apply for large
1721	scales to a sufficient level of precision:
1722	\begin{itemize}
1723	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
1724	\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
1725	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
1726	\end{itemize}
1727
1728	\newpage
1729	\begin{itemize}
1730	\item \textbf{B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$
1731
1732	\begin{tabular}{llll}
1733	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1734	1 & 0.723489806 & 0.890796310 & 0.956543592\\
1735	2 & 0.285450405 & 0.200663851 & 0.120336499\\
1736	3 & 0.177947535 & 0.0855075048 & 0.0349500154\\
1737	4 & 0.122223156 & 0.0412474444 & 0.0118164242\\
1738	5 & 0.0858113122 & 0.0204249666 & 0.00413233507\\
1739	6 & 0.0605703043 & 0.0101897592 & 0.00145703714\\
1740	7 & 0.0428107206 & 0.00509204670 & 0.000514791120\\
1741	8 & 0.0302684024 & 0.00254566946 & --\\
1742	9 & 0.0214024008 & 0.00127279050 & --\\
1743	10 & 0.0151336781 & 0.000636389722 & --\\
1744	11 & 0.0107011079 & 0.000318194170 & --\\
1745	12 & 0.00756682272 & -- & --\\
1746	13 & 0.00535055108 & -- & --\\
1747	%14 & 0.00378341085 & -- & --\\
1748	%15 & 0.00267527545 & -- & --\\
1749	%16 & 0.00189170541 & -- & --\\
1750	%17 & 0.00133763772 & -- & --\\
1751	%18 & 0.000945852704 & -- & --
1752	\end{tabular}
1753
1754	\item \textbf{Triangle Function:} $\{1/4,1/2,1/4\}$
1755
1756	\begin{tabular}{llll}
1757	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1758	1 & 0.612372436 & 0.800390530 & 0.895954449 \\
1759	2 & 0.330718914 & 0.272878894 & 0.192033014\\
1760	3 & 0.211947812 & 0.119779282 & 0.0576484078\\
1761	4 & 0.145740298 & 0.0577664785 & 0.0194912393\\
1762	5 & 0.102310944 & 0.0286163283 & 0.00681278387\\
1763	6 & 0.0722128185 & 0.0142747506 & 0.00240175885\\
1764	7 & 0.0510388224 & 0.00713319703 & 0.000848538128 \\
1765	8 & 0.0360857673 & 0.00356607618 & 0.000299949455 \\
1766	9 & 0.0255157615 & 0.00178297280 & -- \\
1767	10 & 0.0180422389 & 0.000891478237 & -- \\
1768	11 & 0.0127577667 & 0.000445738098 & -- \\
1769	12 & 0.00902109930 & 0.000222868922 & -- \\
1770	13 & 0.00637887978 & -- & -- \\
1771	%14 & 0.00451054902 & -- & -- \\
1772	%15 & 0.00318942978 & -- & -- \\
1773	%16 & 0.00225527449 & -- & -- \\
1774	%17 & 0.00159471988 & -- & -- \\
1775	%18 & 0.000112763724 & -- & --
1776
1777	\end{tabular}
1778
1779	\item \textbf{Haar Wavelet:} $\{0,1/2,1/2\}$
1780
1781	\begin{tabular}{llll}
1782	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1783	1 & 0.707167810 & 0.433012702 & 0.935414347 \\
1784	2 & 0.500000000 & 0.216506351 & 0.330718914\\
1785	3 & 0.353553391 & 0.108253175 & 0.116926793\\
1786	4 & 0.250000000 & 0.0541265877 & 0.0413398642\\
1787	5 & 0.176776695 & 0.0270632939 & 0.0146158492\\
1788	6 & 0.125000000 & 0.0135316469 & 0.00516748303
1789
1790	\end{tabular}
1791
1792
1793	\end{itemize}
1794
1795	\end{document}

Note: See TracBrowser for help on using the repository browser.

Download in other formats: