% Last change on this file since 543 was 112, checked in by Matthew Whiting, 18 years ago.
%
% Several bug fixes:
%
% The determination of the blank pixel value was not working correctly, due
% to confusion between which out of par and head had the correct values.
% getCube now reads the header values into the FitsHeader class, and then these
% are copied into the Param class by the new function Param::copyHeaderInfo.
%
% The zoom box in the spectral output was scaling the flux scale by all data
% points, and this caused problems when the MW channels were in view. These
% channels are now omitted in the determination of the flux axis range.
%
% The precision in the implied position given by the IAU name has been
% increased -- it is now of the format J125345-362412 or G323.124+05.457.
%
% Also added a CHANGES file, as we want to go to v.1.0.1, and updated
% the version number in configure.ac.
1	\documentclass[12pt,a4paper]{article}
2
3	%%%%%% LINE SPACING %%%%%%%%%%%%
4	\usepackage{setspace}
5	\singlespacing
6	%\onehalfspacing
7	%\doublespacing
8
9	%Define a test for doing PDF format -- use different code below
10	\newif\ifPDF
11	\ifx\pdfoutput\undefined\PDFfalse
12	\else\ifnum\pdfoutput > 0\PDFtrue
13	\else\PDFfalse
14	\fi
15	\fi
16
17	\textwidth=161 mm
18	\textheight=245 mm
19	\topmargin=-15 mm
20	\oddsidemargin=0 mm
21	\parindent=6 mm
22
23	\usepackage[sort]{natbib}
24	\usepackage{lscape}
25	\bibpunct[,]{(}{)}{;}{a}{}{,}
26
27	\newcommand{\eg}{e.g.\ }
28	\newcommand{\ie}{i.e.\ }
29	\newcommand{\hi}{H{\sc i}}
30	\newcommand{\hipass}{{\sc hipass}}
31	\newcommand{\duchamp}{\emph{Duchamp}}
32	\newcommand{\atrous}{\textit{{\`a} trous}}
33	\newcommand{\Atrous}{\textit{{\`A} trous}}
34	\newcommand{\diff}{{\rm d}}
35	\newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil}
36	\newenvironment{entry}
37	{\begin{list}{}%
38	{\renewcommand{\makelabel}{\entrylabel}%
39	\setlength{\labelwidth}{30mm}%
40	\setlength{\labelsep}{5pt}%
41	\setlength{\itemsep}{2pt}%
42	\setlength{\parsep}{2pt}%
43	\setlength{\leftmargin}{35mm}%
44	}%
45	}%
46	{\end{list}}
47
48
49	\title{Source Detection with \duchamp\ v1.0\\A User's Guide}
50	\author{Matthew Whiting\\
51	%{\small \href{mailto:Matthew.Whiting@csiro.au}{Matthew.Whiting@csiro.au}}\\
52	Australia Telescope National Facility\\CSIRO}
53	%\date{January 2006}
54	\date{}
55
56	% If we are creating a PDF, use different options for graphicx, hyperref.
57	\ifPDF
58	\usepackage[pdftex]{graphicx,color}
59	\usepackage[pdftex]{hyperref}
60	\hypersetup{colorlinks=true,%
61	citecolor=red,%
62	filecolor=red,%
63	linkcolor=red,%
64	urlcolor=red,%
65	}
66	\else
67	\usepackage[dvips]{graphicx}
68	\usepackage[dvips]{hyperref}
69	\fi
70
71	\pagestyle{headings}
72	\begin{document}
73
74	\maketitle
75	\thispagestyle{empty}
76	\begin{figure}[!h]
77	\begin{center}
78	\includegraphics[width=\textwidth]{cover_image}
79	\end{center}
80	\end{figure}
81
82	\newpage
83	\tableofcontents
84
85	\newpage
86	\section{Introduction and getting going quickly}
87
88	This document provides a user's guide to \duchamp, an object-finder
89	for use on spectral-line data cubes. The basic execution of
90	\duchamp\ is to read in a FITS data cube, find sources in the cube,
91	and produce a text file of positions, velocities and fluxes of the
92	detections, as well as a postscript file of the spectra of each
93	detection.
94
95	So, you have a FITS cube, and you want to find the sources in it. What
96	do you do? The first step is to make an input file that contains the
97	list of parameters. Brief and detailed examples are shown in
98	Appendix~\ref{app-input}. This file provides the input file name, the various
99	output files, and defines various parameters that control the
100	execution.
101
102	The standard way to run \duchamp\ is by the command
103	\begin{quote}
104	\texttt{Duchamp -p [parameter file]}
105	\end{quote}
106	replacing \texttt{[parameter file]} with the name of the file listing
107	the parameters. Alternatively, you can use the syntax
108	\begin{quote}
109	\texttt{Duchamp -f [FITS file]}
110	\end{quote}
111	where \texttt{[FITS file]} is the file you wish to search. In the latter
112	case, all parameters will take their default values detailed in
113	Appendix~\ref{app-param}. In either case, the program will then work
114	away and give you the list of detections and their spectra. The
115	program execution is summarised below, and detailed in
\S\ref{sec-flow}. Information on inputs is in \S\ref{sec-param} and
Appendix~\ref{app-param}, and the outputs are described in
\S\ref{sec-output}.
119
120	\subsection{A summary of the execution steps}
121
122	The basic flow of the program is summarised here -- all steps are
123	discussed in more detail in the following sections.
124	\begin{enumerate}
\item If the \texttt{-p} option is used, the parameter file given on
the command line is read in, and the parameters are stored.
\item The FITS image is located and read into memory.
128	\item If requested, a FITS image with a previously reconstructed array
129	is read in.
130	\item If requested, blank pixels are trimmed from the edges, and
131	the baseline of each spectrum is removed.
132	\item If the reconstruction method is requested, and the reconstructed
133	array has not been read in at Step 3 above, the cube is
134	reconstructed using the \atrous\ wavelet method.
135	\item Searching for objects then takes place, using the requested
136	thresholding method.
137	\item The list of objects is condensed by merging neighbouring objects
138	and removing those deemed unacceptable.
139	\item The baselines and trimmed pixels are replaced prior to output.
140	\item The details of the detections are written to screen and to the
141	requested output file.
142	\item Maps showing the spatial location of the detections are written.
143	\item The integrated spectra of each detection are written to a
144	postscript file.
145	\item If requested, the reconstructed array can be written to a new
146	FITS file.
147	\end{enumerate}
148
149	\subsection{Guide to terminology}
150
151	First, a brief note on the use of terminology in this guide. \duchamp\
152	is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is
153	the Flexible Image Transport System -- see \citet{hanisch01} or
154	websites such as
155	\href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html}
156	for details.} image arrays with three dimensions -- they are assumed
157	to have the following form: the first two dimensions (referred to as
158	$x$ and $y$) are spatial directions (that is, relating to the position
159	on the sky), while the third dimension, $z$, is the spectral
160	direction, which can correspond to frequency, wavelength, or
velocity. The three-dimensional analogue of a pixel is a ``voxel'', or
volume cell -- a voxel is defined by a unique $(x,y,z)$ location and
has a single flux or intensity value associated with it.
164
165	Each spatial pixel (a given $(x,y)$ coordinate) can be said to be a
166	single spectrum, while a slice through the cube perpendicular to the
167	spectral direction at a given $z$-value is a single channel (the 2-D
168	image is a channel map).
169
170	Detection involves locating a contiguous group of voxels with fluxes
171	above a certain threshold. \duchamp\ makes no assumptions as to the
172	size or shape of the detected features, other than having
173	user-selected minimum size criteria.
174
175	Features that are detected are assumed to be positive. The user can
176	choose to search for negative features by setting an input parameter
177	-- this inverts the cube prior to the search (see
178	\S\ref{sec-detection} for details).
179
180	Note that it is possible to run \duchamp\ on a two-dimensional image
181	(\ie one with no frequency or velocity information), or indeed a
182	one-dimensional array, and many of the features of the program will
183	work fine. The focus, however, is on object detection in three
184	dimensions.
185
186	\subsection{Why \duchamp?}
187
188	Well, it's important for a program to have a name, and the initial
189	working title of \emph{cubefind} was somewhat uninspiring. I wanted to
190	avoid the classic astronomical approach of designing a cute acronym,
191	and since it is designed to work on cubes, I looked at naming it after
192	a cubist. \emph{Picasso}, sadly, was already taken \citep{minchin99},
193	so I settled on naming it after Marcel Duchamp, another cubist, but
194	also one of the first artists to work with ``found objects''.
195
196	\section{User Inputs}
197	\label{sec-param}
198
199	Input to the program is provided by means of a parameter
200	file. Parameters are listed in the file, followed by the value that
201	should be assigned to them. The syntax used is \texttt{paramName
202	value}. Parameter names are not case-sensitive, and lines in the input
203	file that start with \texttt{\#} are ignored. If a parameter is listed
204	more than once, the latter value is used, but otherwise the order in
205	which the parameters are listed in the input file is arbitrary.
206
207	If a parameter is not listed, the default value is assumed. The
208	defaults are chosen to provide a good result (using the reconstruction
209	method), so the user doesn't need to specify many new parameters in
210	the input file. Note that the image file \textbf{must} be specified! The
211	parameters that can be set are listed in Appendix~\ref{app-param},
212	with their default values in parentheses.
213
214	The parameters with names starting with \texttt{flag} are stored as
215	\texttt{bool} variables, and so are either \texttt{true = 1} or
216	\texttt{false = 0}. \duchamp\ will only read them from the file as
217	integers, and so they should be entered in the file as 0 or 1 (see
218	example file in Appendix~\ref{app-input}).
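
As a brief illustration of this syntax, a minimal parameter file might
look like the sketch below. The parameter names \texttt{flagBlankPix},
\texttt{flagBaseline}, \texttt{flagMW}, \texttt{minMW}, \texttt{maxMW}
and \texttt{snrCut} are those described in this guide; the name
\texttt{imageFile} for the input cube is an assumption here (check
Appendix~\ref{app-param} for the exact name), and all values are
arbitrary examples rather than recommended settings.
\begin{verbatim}
# Lines starting with a hash are ignored
imageFile     mycube.fits
flagBlankPix  1
flagBaseline  0
flagMW        1
minMW         75
maxMW         112
snrCut        4
snrCut        3
\end{verbatim}
Since \texttt{snrCut} is listed twice in this sketch, the later value
(3) is the one that would be used.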
219
220	\section{What \duchamp\ is doing}
221	\label{sec-flow}
222
223	The execution flow of \duchamp\ is detailed here, indicating the
224	main algorithmic steps that are used. The program is written in C/C++
225	and makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot}
226	libraries.
227
228	%\subsection{Parameter input}
229	%
230	%The user provides parameters that govern the selection of files and
231	%the parameters used by the various subroutines in the program. This is
232	%done via a parameter file, and the parameters are stored in a C++
233	%class for use throughout the program. The form of the parameter file is
234	%discussed in \S\ref{sec-param}, and the parameters themselves are
235	%listed in Appendix~\ref{app-param}.
236
237	\subsection{Image input}
238	\label{sec-input}
239
240	The cube is read in using basic {\sc cfitsio} commands, and stored as
241	an array in a special C++ class. This class keeps track of
242	the list of detected objects, as well as any reconstructed arrays that
243	are made (see \S\ref{sec-recon}). The World Coordinate System (WCS)
244	information for the cube is also obtained from the FITS header by {\sc
245	wcslib} functions \citep{greisen02, calabretta02}, and this
246	information, in the form of a \texttt{wcsprm} structure, is also stored
247	in the same class.
248
249	A sub-section of an image can be requested via the \texttt{subsection}
250	parameter in the parameter file -- this can be a good idea if the cube
251	has very noisy edges, which may produce many spurious detections. The
252	generalised form of the subsection that is used by {\sc cfitsio} is
253	\texttt{[x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run
254	from \texttt{x1} to \texttt{x2} (inclusive), with steps of
255	\texttt{dx}. The step value can be omitted (so a subsection of the
256	form \texttt{[2:50,2:50,10:1000]} is still valid). \duchamp\ does not
257	make use of any step value present in the subsection string, and any
258	that are present are removed before the file is opened.
259
To use the full range of a coordinate, replace its range
with an asterisk, \eg \texttt{[2:50,2:50,*]}. To use a
subsection, one must also set \texttt{flagSubsection} to 1. A complete
description of the section syntax can be found at the {\sc cfitsio} web
site\footnote{%
\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}%
{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}.
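
For example (an illustrative fragment, with arbitrary pixel ranges), the
following parameter-file lines would restrict the search to pixels 2 to
50 in both spatial directions and to all channels:
\begin{verbatim}
flagSubsection  1
subsection      [2:50,2:50,*]
\end{verbatim}
Following the description above, a string with step values such as
\texttt{[2:50:2,2:50:2,*]} would be treated in the same way, since
the steps are removed before the file is opened.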
268
269	\subsection{Image modification}
270	\label{sec-modify}
271
272	Several modifications to the cube can be made that improve the
273	execution and efficiency of \duchamp\ (these are optional -- their
274	use is indicated by the relevant flags set in the input parameter
275	file).
276
277	\subsubsection{Blank pixel removal}
278
279	First, the cube is trimmed of any BLANK pixels that pad the image out
to a rectangular shape. This step is optional; its use is determined by the
281	\texttt{flagBlankPix} parameter. The value for these pixels is read from
282	the FITS header (using the BLANK, BSCALE and BZERO keywords), but if
283	these are not present then the value can be specified by the user in
284	the parameter file using \texttt{blankPixValue}.
285
286	This stage is particularly important for the reconstruction step, as
287	lots of BLANK pixels on the edges will smooth out features in the
288	wavelet calculation stage. The trimming will also reduce the size of
289	the cube's array, speeding up the execution. The amount of trimming is
290	recorded, and these pixels are added back in once the source-detection
291	is completed (so that quoted pixel positions are applicable to the
292	original cube).
293
294	Rows and columns are trimmed one at a time until the first non-BLANK
295	pixel is reached, so that the image remains rectangular. In practice,
296	this means that there will be BLANK pixels left in the trimmed image
297	(if the non-BLANK region is non-rectangular). However, these are
298	ignored in all further calculations done on the cube.
299
300	\subsubsection{Baseline removal}
301
302	Second, the user may request the removal of baselines from the
303	spectra, via the parameter \texttt{flagBaseline}. This may be necessary
304	if there is a strong baseline ripple present, which can result in
305	spurious detections at the high points of the ripple. The baseline is
306	calculated from a wavelet reconstruction procedure (see
307	\S\ref{sec-recon}) that keeps only the two largest scales. This is
308	done separately for each spatial pixel (\ie for each spectrum in the
309	cube), and the baselines are stored and added back in before any
310	output is done. In this way the quoted fluxes and displayed spectra
311	are as one would see from the input cube itself -- even though the
312	detection (and reconstruction if applicable) is done on the
313	baseline-removed cube.
314
315	The presence of very strong signals (for instance, masers at several
316	hundred Jy) can affect the determination of the baseline, leading to a
317	large dip centred on the signal in the baseline-subtracted
318	spectrum. To prevent this, the signal is trimmed prior to the
319	reconstruction process at some standard threshold (at $8\sigma$ above
320	the mean). The baseline determined should thus be representative of
321	the true, signal-free baseline. Note that this trimming is only a
322	temporary measure which does not affect the source-detection.
323
324	\subsubsection{Ignoring bright Milky Way emission}
325
326	Finally, a single set of contiguous channels can be ignored -- these
327	may exhibit very strong emission, such as that from the Milky Way as
328	seen in extragalactic \hi\ cubes (hence the references to ``Milky
329	Way'' in relation to this task -- apologies to Galactic
330	astronomers!). Such dominant channels will produce many detections
331	that are unnecessary, uninteresting (if one is interested in
332	extragalactic \hi) and large (in size and hence in memory usage), and
333	so will slow the program down and detract from the interesting
detections. The use of this feature is controlled by the
\texttt{flagMW} parameter, and the channels concerned can be set by
the user (using \texttt{minMW} and \texttt{maxMW} --
these give an inclusive range of channels). When employed, these
338	channels are temporarily blanked out for the searching, and the
339	scaling of the spectral output (see Fig.~\ref{fig-spect}) will not
340	take them into account. They will be present in the reconstructed
341	array, however, and so will be included in the saved FITS file (see
342	\S\ref{sec-reconIO}). When the final spectra are plotted, the range of
343	channels covered by these parameters is indicated by a green hashed
344	box.
345
346	\subsection{Image reconstruction}
347	\label{sec-recon}
348
349	The user can direct \duchamp\ to reconstruct the data cube using the
350	\atrous\ wavelet procedure. A good description of the procedure can be
351	found in \citet{starck02:book}. The reconstruction is an effective way
352	of removing a lot of the noise in the image, allowing one to search
353	reliably to fainter levels, and reducing the number of spurious
detections. This is an optional step, but one that greatly enhances
the source-detection process, at the cost of being
relatively time- and memory-intensive.
357
358	\subsubsection{Algorithm}
359
360	The steps in the \atrous\ reconstruction are as follows:
361	\begin{enumerate}
362	\item Set the reconstructed array to 0 everywhere.
363	\item The input array is discretely convolved with a given filter
364	function. This is determined from the parameter file via the
365	\texttt{filterCode} parameter -- see Appendix~\ref{app-param} for
366	details on the filters available.
367	\item The wavelet coefficients are calculated by taking the difference
368	between the convolved array and the input array.
369	\item If the wavelet coefficients at a given point are above the
370	requested threshold (given by \texttt{snrRecon} as the number of
371	$\sigma$ above the mean and adjusted to the current scale -- see
372	Appendix~\ref{app-scaling}), add these to the reconstructed array.
373	\item The separation of the filter coefficients is doubled. (Note that
374	this step provides the name of the procedure\footnote{\atrous\ means
375	``with holes'' in French.}, as gaps or holes are created in the
376	filter coverage.)
377	\item The procedure is repeated from step 2, using the convolved array
378	as the input array.
379	\item Continue until the required maximum number of scales is reached.
380	\item Add the final smoothed (\ie convolved) array to the
381	reconstructed array. This provides the ``DC offset'', as each of the
382	wavelet coefficient arrays will have zero mean.
383	\end{enumerate}
384
385	The reconstruction has at least two iterations. The first iteration
386	makes a first pass at the wavelet reconstruction (the process outlined
387	in the 8 stages above), but the residual array will inevitably have
388	some structure still in it, so the wavelet filtering is done on the
389	residual, and any significant wavelet terms are added to the final
390	reconstruction. This step is repeated until the change in the $\sigma$
391	of the background is less than some fiducial amount.
392
393	It is important to note that the \atrous\ decomposition is an
394	example of a ``redundant'' transformation. If no thresholding is
395	performed, the sum of all the wavelet coefficient arrays and the final
396	smoothed array is identical to the input array. The thresholding thus
397	removes only the unwanted structure in the array.
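
To make this property explicit, the decomposition can be written as
follows. The notation is introduced here for illustration only:
$c_0$ denotes the input array, $c_j$ the array after the $j$th
convolution, $h_j$ the filter with its coefficients spread out for
scale $j$, and $J$ the maximum number of scales. The sign convention
for the wavelet coefficients $w_j$ (previous array minus smoothed
array) is an assumption, chosen so that the sum below recovers the
input.
\[
c_j = h_j \ast c_{j-1}, \qquad w_j = c_{j-1} - c_j, \qquad j=1,\ldots,J,
\]
\[
c_J + \sum_{j=1}^{J} w_j = c_J + (c_0 - c_1) + (c_1 - c_2) + \cdots
+ (c_{J-1} - c_J) = c_0 .
\]
The sum telescopes, so without thresholding the input is recovered
exactly; with thresholding, only those $w_j$ values above the
threshold are kept in the sum.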
398
399	Note that any BLANK pixels that are still in the cube will not be
400	altered by the reconstruction -- they will be left as BLANK so that
401	the shape of the valid part of the cube is preserved.
402
403	\subsubsection{Note on Statistics}
404
405	The correct calculation of the reconstructed array needs good
406	estimation of the underlying mean and standard deviation of the
407	background noise distribution. These statistics are estimated using
408	robust methods, to avoid corruption by strong outlying points. The
409	mean of the distribution is actually estimated by the median, while
410	the median absolute deviation from the median (MADFM) is calculated
411	and corrected assuming Gaussianity to estimate the underlying standard
412	deviation $\sigma$. The Gaussianity (or Normality) assumption is
413	critical, as the MADFM does not give the same value as the usual rms
414	or standard deviation value -- for a normal distribution
415	$N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. The difference
416	between the MADFM and $\sigma$ is corrected for, so the user need only
417	think in the usual multiples of $\sigma$ when setting
418	\texttt{snrRecon}. See Appendix~\ref{app-madfm} for a derivation of
419	this value.
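
As a small worked illustration (the numbers are invented for this
example), consider the five values $\{1, 2, 3, 4, 100\}$. The median is
3, the absolute deviations from the median are $\{2, 1, 0, 1, 97\}$,
and the median of these gives MADFM$=1$. The corresponding estimate of
the standard deviation is
\[
\sigma \approx \frac{\mathrm{MADFM}}{0.6744888} \approx 1.4826 \times 1 = 1.4826,
\]
which is barely affected by the outlying value of 100. In contrast,
the ordinary mean of these values is 22, and their rms deviation from
that mean is about 39.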
420
421	When thresholding the different wavelet scales, the value of $\sigma$
422	as measured from the wavelet array needs to be scaled to account for the
423	increased amount of correlation between neighbouring pixels (due to
424	the convolution). See Appendix~\ref{app-scaling} for details on this
425	scaling.
426
427	\subsubsection{User control of reconstruction parameters}
428
429	The most important parameter for the user to select in relation to the
430	reconstruction is the threshold for each wavelet array. This is set
431	using the \texttt{snrRecon} parameter, and is given as a multiple of the
432	rms (estimated by the MADFM) above the mean (which for the wavelet
433	arrays should be approximately zero). There are several other
434	parameters that can be altered as well that affect the outcome of the
435	reconstruction.
436
437	By default, the cube is reconstructed in three dimensions, using a
438	3-dimensional filter and 3-dimensional convolution. This can be
439	altered, however, using the parameter \texttt{reconDim}. If set to 1,
440	this means the cube is reconstructed by considering each spectrum
441	separately, whereas \texttt{reconDim=2} will mean the cube is
442	reconstructed by doing each channel map separately. The merits of
443	these choices are discussed in \S\ref{sec-notes}, but it should be
444	noted that a 2-dimensional reconstruction can be susceptible to edge
445	effects if the spatial shape is not rectangular.
446
447	The user can also select the minimum scale to be used in the
448	reconstruction -- the first scale exhibits the highest frequency
449	variations, and so ignoring this one can sometimes be beneficial in
450	removing excess noise. The default, however, is to use all scales
451	(\texttt{minscale = 1}).
452
453	Finally, the filter that is used for the convolution can be selected
454	by using \texttt{filterCode} and the relevant code number -- the
455	choices are listed in Appendix~\ref{app-param}. A larger filter will
456	give a better reconstruction, but take longer and use more memory when
457	executing. When multi-dimensional reconstruction is selected, this
458	filter is used to construct a 2- or 3-dimensional equivalent.
459
460	\subsection{Reconstruction I/O}
461	\label{sec-reconIO}
462
463	The reconstruction stage can be relatively time-consuming, particularly
464	for large cubes and reconstructions in 3-D. To get around this, \duchamp\
465	provides a shortcut to allow users to perform multiple searches (\eg with
466	different thresholds) on the same reconstruction without calculating the
467	reconstruction each time.
468
469	The first step is to choose to save the reconstructed array as a FITS
file by setting \texttt{flagOutputRecon} to 1. The file will be saved
471	in the same directory as the input image, so the user needs to have write
472	permissions for that directory.
473
474	The filename will be derived from the input filename, with extra
475	information detailing the reconstruction that has been done. For
476	example, suppose \texttt{image.fits} has been reconstructed using a
477	3-dimensional reconstruction with filter 2, thresholded at $4\sigma$
478	using all scales. The output filename will then be
479	\texttt{image.RECON-3-2-4-1.fits} (\ie it uses the four parameters
480	relevant for the \atrous\ reconstruction as listed in
481	Appendix~\ref{app-param}). The new FITS file will also have these
482	parameters as header keywords. If a subsection of the input image has
483	been used (see \S\ref{sec-input}), the format of the output filename
484	will be \texttt{image.sub.RECON-3-2-4-1.fits}, and the subsection that
485	has been used is also stored in the FITS header.
486
487	Likewise, the residual image, defined as the difference between the input
488	and reconstructed arrays, can also be saved in the same manner by setting
\texttt{flagOutputResid} to 1. Its filename will be the same as above,
490	with RESID replacing RECON.
491
492	If a reconstructed image has been saved, it can be read in and used
493	instead of redoing the reconstruction. To do so, the user should set
\texttt{flagReconExists} to 1. The user can indicate the name of the
reconstructed FITS file using the \texttt{reconFile} parameter, or, if
this is not specified, \duchamp\ searches for the file with the name
as defined above. If the file is not found, the reconstruction is
performed as normal. Note that this shortcut only applies when
\texttt{flagAtrous} is also set to 1 (if it is 0, no
reconstruction is used at all).
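
As an illustrative sketch of this workflow (the input parameter name
\texttt{imageFile} is an assumption, as is the capitalisation
\texttt{minScale}; see Appendix~\ref{app-param} for the definitive
names), a first run could compute and save the reconstruction used in
the filename example above:
\begin{verbatim}
imageFile        image.fits
flagAtrous       1
reconDim         3
filterCode       2
snrRecon         4
minScale         1
flagOutputRecon  1
\end{verbatim}
A later run, perhaps with a different \texttt{snrCut}, could then
re-use the saved array by keeping the same reconstruction parameters
and adding
\begin{verbatim}
flagReconExists  1
reconFile        image.RECON-3-2-4-1.fits
\end{verbatim}
The file name is given here without a directory for simplicity.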
501
502	\subsection{Searching the image}
503	\label{sec-detection}
504
505	The image is searched for detections in two ways: spectrally (a
506	1-dimensional search in the spectrum in each spatial pixel), and
507	spatially (a 2-dimensional search in the spatial image in each
508	channel). In both cases, the algorithm finds connected pixels that are
509	above the user-specified threshold. In the case of the spatial image
510	search, the algorithm of \citet{lutz80} is used to raster scan through
511	the image and connect groups of pixels on neighbouring rows.
512
513	Note that this algorithm cannot be applied directly to a 3-dimensional
514	case, as it requires that objects are completely nested in a row: that
515	is, if you are scanning along a row, and one object finishes and
516	another starts, you know that you will not get back to the first one
517	(if at all) until the second is completely finished for that
518	row. Three-dimensional data does not have this property, which is why
519	we break up the searching into 1- and 2-dimensional cases.
520
521	The determination of the threshold is done in one of two ways. The
522	first way is a simple sigma-clipping, where a threshold is set at a
523	fixed number $n$ of standard deviations above the mean, and pixels
524	above this threshold are flagged as detected. The value of $n$ is set
525	with the parameter \texttt{snrCut}. As before, the value of the
526	standard deviation is estimated by the MADFM, and corrected by the
527	ratio derived in Appendix~\ref{app-madfm}.
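
Written out, with $m$ denoting the median and $T$ the threshold (both
symbols are introduced here for illustration), this is
\[
T = m + n\,\frac{\mathrm{MADFM}}{0.6744888}.
\]
For instance, with invented values of $m = 0.002$ and
MADFM$=0.0027$ (in the flux units of the cube) and $n=3$, the noise
estimate is $\sigma \approx 0.0040$ and the threshold is
$T \approx 0.002 + 3 \times 0.0040 = 0.0140$.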
528
529	The second method uses the False Discovery Rate (FDR) technique
530	\citep{miller01,hopkins02}, whose basis we briefly detail here. The
531	false discovery rate (given by the number of false detections divided
532	by the total number of detections) is fixed at a certain value
533	$\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false
534	positives). In practice, an $\alpha$ value is chosen, and the ensemble
average FDR (\ie $\langle \mathrm{FDR} \rangle$) when the method is used will
536	be less than $\alpha$. One calculates $p$ -- the probability,
537	assuming the null hypothesis is true, of obtaining a test statistic as
538	extreme as the pixel value (the observed test statistic) -- for each
539	pixel, and sorts them in increasing order. One then calculates $d$
540	where
541	\[
542	d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
543	\]
544	and then rejects all hypotheses whose $p$-values are less than or equal
to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq
i\alpha/(c_N N)$.) Note that ``reject hypothesis'' here means ``accept
547	the pixel as an object pixel'' (\ie we are rejecting the null
548	hypothesis that the pixel belongs to the background).
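
A small worked example may help (the numbers are invented and far
smaller than any real image). Take $N=5$ pixels with uncorrelated
values ($c_N=1$), $\alpha=0.05$, and sorted $p$-values
$P_1,\ldots,P_5 = 0.001,\ 0.025,\ 0.028,\ 0.2,\ 0.5$. The comparison
values $j\alpha/(c_N N)$ are $0.01,\ 0.02,\ 0.03,\ 0.04,\ 0.05$, so the
inequality holds for $j=1$ and $j=3$ only, giving $d=3$ and
$P_d=0.028$. The three pixels with $p\leq 0.028$ are therefore accepted
as object pixels, including the second one, even though
$P_2=0.025$ exceeds its own comparison value of $0.02$.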
549
550	The $c_N$ values here are normalisation constants that depend on the
551	correlated nature of the pixel values. If all the pixels are
552	uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their
553	tests will be dependent on each other, and so $c_N = \sum_{i=1}^N
554	i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels
555	are correlated over the beam. In this case the sum is made over the
556	$N$ pixels that make up the beam. The value of $N$ is calculated from
557	the FITS header (if the correct keywords -- BMAJ, BMIN -- are not
558	present, a default value of 10 pixels is assumed).
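
For the default of $N=10$ pixels, for example, the normalisation
constant evaluates to
\[
c_{10} = \sum_{i=1}^{10} \frac{1}{i}
= 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{10} \approx 2.929,
\]
so, relative to the uncorrelated case ($c_N=1$), the factor
$\alpha/c_N$ is reduced by nearly a factor of three, making the test
correspondingly stricter.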
559
560	The theory behind the FDR method implies a direct connection between the
561	choice of $\alpha$ and the fraction of detections that will be false
562	positives. However, due to the merging process, this direct connection is
563	lost when looking at the final number of detections -- see discussion in
564	\S\ref{sec-notes}. The effect is that the number of false detections will
565	be less than indicated by the $\alpha$ value used.
566
567	If a reconstruction has been made, the residuals (defined as original
568	$-$ reconstruction) are used to estimate the noise parameters of the
569	cube. Otherwise they are estimated directly from the cube itself. In
570	both cases, robust estimators are used as described above.
571
572	Detections must have a minimum number of pixels to be counted. This
573	minimum number is given by the input parameters \texttt{minPix} (for
574	2-dimensional searches) and \texttt{minChannels} (for 1-dimensional
575	searches).
576
577	The search only looks for positive features. If one is interested
578	instead in negative features (such as absorption lines), set the
parameter \texttt{flagNegative} to 1. This will invert the cube (\ie
580	multiply all pixels by $-1$) prior to the search, and then re-invert
581	the cube (and the fluxes of any detections) after searching is
582	complete. All outputs are done in the same manner as normal, so that
583	fluxes of detections will be negative.
584
585	\subsection{Merging detected objects}
586	\label{sec-merger}
587
588	The searching step produces a list of detected objects that will have many
589	repeated detections of a given object -- for instance, spectral
590	detections in adjacent pixels of the same object and/or spatial
591	detections in neighbouring channels. These are then combined in an
592	algorithm that matches all objects judged to be ``close''. This
593	determination is made in one of two ways.
594
595	One way is to define two thresholds -- one spatial and one in velocity
596	-- and say that two objects should be merged if there is at least one
597	pair of pixels that lie within these threshold distances of each
598	other. These thresholds are specified by the parameters
599	\texttt{threshSpatial} and \texttt{threshVelocity} (in units of pixels
600	and channels respectively).
601
602	Alternatively, the spatial requirement can be changed to say that
603	there must be a pair of pixels that are \emph{adjacent} -- a stricter,
604	but perhaps more realistic requirement, particularly when the spatial pixels
605	have a large angular size (as is the case for \hi\ surveys). This
606	method can be selected by setting the parameter
607	\texttt{flagAdjacent} to 1 (\ie \texttt{true}) in the parameter file. The
608	velocity thresholding is done in the same way as the first option.
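
Expressed symbolically, for a pixel $(x_1,y_1,z_1)$ in one object and
a pixel $(x_2,y_2,z_2)$ in another, the criteria might be written as
below. This is a sketch only: the use of the Euclidean distance for
the spatial separation, and of touching pixels (including diagonals)
for adjacency, are assumptions made for illustration and are not
specified in this section. The spatial condition is either
\[
\sqrt{(x_1-x_2)^2 + (y_1-y_2)^2} \leq \texttt{threshSpatial}
\qquad \mbox{or} \qquad
\max\left(|x_1-x_2|,\,|y_1-y_2|\right) \leq 1 ,
\]
with the second form applying when \texttt{flagAdjacent} is 1, and
the velocity condition is
\[
|z_1 - z_2| \leq \texttt{threshVelocity}.
\]
The two objects are merged if at least one pair of pixels satisfies
both the relevant spatial condition and the velocity condition.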
609
610	Once the detections have been merged, they may be ``grown''. This is a
611	process of increasing the size of the detection by adding adjacent
612	pixels that are above some secondary threshold. This threshold is
613	lower than the one used for the initial detection, but above the noise
614	level, so that faint pixels are only detected when they are close to a
615	bright pixel. The value of this threshold is a possible input
616	parameter (\texttt{growthCut}), with a default value of $1.5\sigma$. The
617	use of the growth algorithm is controlled by the \texttt{flagGrowth}
618	parameter -- the default value of which is \texttt{false}. If the
619	detections are grown, they are sent through the merging algorithm a
620	second time, to pick up any detections that now overlap or have grown
621	over each other.
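
The growing criterion can be sketched as follows. Here $\mu$ and
$\sigma$ stand for whatever estimates of the mean and noise level are
used for thresholding; the symbols, and the use of an inclusive
inequality, are introduced for illustration only. A pixel with flux
$F$ that is not yet part of a detection is added to it if it is
adjacent to a pixel of that detection and
\[
F \ge \mu + \texttt{growthCut} \times \sigma ,
\]
so that the secondary threshold lies below the one used for the
initial detection.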
622
623	Finally, to be accepted, the detections must span \emph{both} a minimum
624	number of channels (to remove any spurious single-channel spikes that
625	may be present), and a minimum number of spatial pixels. These
626	numbers, as for the original detection step, are set with the
627	\texttt{minChannels} and \texttt{minPix} parameters. The channel
628	requirement means there must be at least one set of \texttt{minChannels}
629	consecutive channels in the source for it to be accepted.
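
As an illustration (the channel numbers are invented for the
example), take \texttt{minChannels} to be 3. A source detected in
channels 10, 11, 13, 14 and 15 is accepted, because channels 13 to 15
form a run of three consecutive channels. A source detected only in
channels 10, 11, 13 and 14 is rejected on this criterion: although it
contains four channels, its longest consecutive run has length two.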
630
631	\section{Outputs}
632	\label{sec-output}
633
634	\subsection{During execution}
635
636	\duchamp\ provides the user with feedback whilst it is running, to
637	keep the user informed on the progress of the analysis. Most of this
638	consists of self-explanatory messages about the particular stage the
639	program is up to. The relevant parameters are printed to the screen at
640	the start (once the file has been successfully read in), so the user
641	is able to make a quick check that the setup is correct (see
642	Appendix~\ref{app-input} for an example).
643
644	If the cube is being trimmed (\S\ref{sec-modify}), the resulting
645	dimensions are printed to indicate how much has been trimmed. If a
646	reconstruction is being done, a continually updating message shows
647	either the current iteration and scale, compared to the maximum scale
648	(when \texttt{reconDim=3}), or a progress bar showing the amount of
649	the cube that has been reconstructed (for smaller values of
650	\texttt{reconDim}).
651
652	During the searching algorithms, the progress through the 1D and 2D
653	searches is shown. When the searches have completed,
654	the numbers of objects found in the 1D and 2D searches are
655	reported (see \S\ref{sec-detection} for details).
656
657	In the merging process (where multiple detections of the same object
658	are combined -- see \S\ref{sec-merger}), two stages of output
659	occur. The first is when each object in the list is compared with all
660	others. The output shows two numbers: the first being how far through
661	the list the current object is, and the second being the length of the
662	list. As the algorithm proceeds, the first number should increase and
663	the second should decrease (as objects are combined). When the numbers
664	meet (\ie the whole list has been compared), the second phase begins,
665	in which multiply-appearing pixels in each object are removed, as are
666	objects not meeting the minimum channels requirement. During this
667	phase, the total number of accepted objects is shown, which should
668	steadily increase until all have been accepted or rejected. Note that
669	these steps can be very quick for small numbers of detections.
670
671	Since this continual printing to screen carries some overhead in time
672	and CPU usage, the user can elect not to print this information by
673	setting the parameter \texttt{verbose = 0}. In this case, the user is
674	still informed as to the steps being undertaken, but the details of
675	the progress are not shown.
676
677	\subsection{Results}
678
679	\subsubsection{Table of Results}
680
681	Finally, we get to the results -- the reason for running \duchamp\ in
682	the first place. Once the detection list is finalised, it is sorted by
683	the mean velocity of the detections (or, if there is no good WCS
684	associated with the cube, by the mean Z-pixel position). The results
685	are then printed to the screen and to the output file, given by the
686	\texttt{OutFile} parameter. The results list, an example of which can be
687	seen in Appendix~\ref{app-output}, contains the following columns
688	(note that the titles of the WCS-dependent columns depend on the
689	projection of the WCS):
690
691	\begin{entry}
692	\item[Obj\#] The ID number of the detection (simply the sequential
693	count for the list, which is ordered by increasing velocity).
694	\item[Name] The IAU-format name of the detection (derived from the WCS
695	projection).
696	\item[X] The average X-pixel position.
697	\item[Y] The average Y-pixel position.
698	\item[Z] The average Z-pixel position.
699	\item[RA/GLON] The Right Ascension or Galactic Longitude of the centre
700	of the object.
701	\item[DEC/GLAT] The Declination or Galactic Latitude of the centre of
702	the object.
703	\item[VEL] The mean velocity of the object [units given by the
704	\texttt{spectralUnits} parameter].
705	\item[w\_RA/w\_GLON] The width of the object in Right Ascension or
706	Galactic Longitude [arcmin].
707	\item[w\_DEC/w\_GLAT] The width of the object in Declination or
708	Galactic Latitude [arcmin].
709	\item[w\_VEL] The full velocity width of the detection (max channel
710	$-$ min channel, in velocity units [see note below]).
711	\item[F\_int] The integrated flux over the object, in the units of
712	flux times velocity, corrected for the beam if necessary.
713	\item[F\_peak] The peak flux over the object, in the units of flux.
714	\item[X1, X2] The minimum and maximum X-pixel coordinates.
715	\item[Y1, Y2] The minimum and maximum Y-pixel coordinates.
716	\item[Z1, Z2] The minimum and maximum Z-pixel coordinates.
717	\item[Npix] The number of voxels (\ie distinct $(x,y,z)$ coordinates)
718	in the detection.
719	\item[Flag] Whether the detection has any warning flags (see below).
720	\end{entry}
721	The Name is derived from the WCS position. For instance, a source
722	centred on the RA,Dec position 12$^h$53$^m$45$^s$,
723	$-$36$^\circ$24$'$12$''$ will be called J125345$-$362412 (if the epoch
724	is J2000) or B125345$-$362412 (if B1950). An alternative form is used
725	for Galactic coordinates: a source centred on the position ($l$,$b$) =
726	(323.1245, 5.4567) will be called G323.124$+$05.457. If the WCS is not
727	valid (\ie is not present or does not have all the necessary
728	information), the Name, RA, DEC, VEL and related columns are not
729	printed, but the pixel coordinates are still provided.
730
731	The velocity units can be specified by the user, using the parameter
732	\texttt{spectralUnits} (enter it as a single string). The default value
733	is km/s, which should be suitable for most users. These units are also
734	used to give the units of integrated flux.
735
736	The last column contains any warning flags about the detection. There
737	are currently two options here. An `E' is printed if the detection is
738	next to the edge of the image, meaning either the limit of the pixels,
739	or the limit of the non-BLANK pixel region. An `N' is printed if the
740	total flux, summed over all the (non-BLANK) pixels in the smallest box
741	that completely encloses the detection, is negative. Note that this
742	sum is likely to include non-detected pixels. It is of use in
743	pointing out detections that lie next to strongly negative pixels,
744	such as might arise due to interference -- the detected pixels might
745	then also be due to the interference, so caution is advised.
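
In symbols (the notation is introduced here purely for illustration),
if $F(x,y,z)$ is the flux of a voxel and the bounding box of the
detection is given by the X1, X2, Y1, Y2, Z1 and Z2 columns of the
results table, the `N' flag is set when
\[
\sum_{x={\rm X1}}^{{\rm X2}} \ \sum_{y={\rm Y1}}^{{\rm Y2}} \
\sum_{z={\rm Z1}}^{{\rm Z2}} F(x,y,z) < 0 ,
\]
with BLANK voxels omitted from the sum.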
746
747	\subsubsection{Other results lists}
748
749	Two alternative results files can also be requested. One option is a
750	VOTable-format XML file, containing just the RA, Dec, Velocity and the
751	corresponding widths of the detections, as well as the fluxes. The
752	user should set \texttt{flagVOT = 1}, and put the desired filename in the
753	parameter \texttt{votFile} -- note that the default is for it not to be
754	produced. This file should be compatible with all Virtual Observatory
755	tools (such as Aladin\footnote{ Aladin can be found on the web at
756	\href{http://aladin.u-strasbg.fr/}{http://aladin.u-strasbg.fr/}}). The
757	second option is an annotation file for use with the Karma toolkit of
758	visualisation tools (in particular, with \texttt{kvis}). This will draw a
759	circle at the position of each detection, and number it according to
760	the Obj\# given above. To make use of this option, the user should
761	set \texttt{flagKarma = 1}, and put the desired filename in the parameter
762	\texttt{karmaFile} -- again, the default is for it not to be produced.
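
A parameter-file fragment requesting both files might look like the
following. The filenames are the defaults listed in
Appendix~\ref{app-param}; the layout shown (one parameter per line,
with the name and value separated by whitespace) is an assumption and
should be checked against the \texttt{InputExample} file supplied
with the distribution.
\begin{verbatim}
flagVOT     1
votFile     duchamp-Results.xml
flagKarma   1
karmaFile   duchamp-Results.ann
\end{verbatim}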
763
764	As the program is running, it also (optionally) records the detections
765	made in each individual spectrum or channel (see
766	\S\ref{sec-detection} for details on this process). This is
767	recorded in the file given by the parameter \texttt{LogFile}. This file
768	does not include the columns \texttt{Name, RA, DEC, w\_RA, w\_DEC, VEL,
769	w\_VEL}. This file is designed primarily for diagnostic purposes: \eg
770	to see if a given set of pixels is detected in, say, one channel
771	image, but does not survive the merging process. The list of pixels
772	(and their fluxes) in the final detection list are also printed to
773	this file, again for diagnostic purposes. This feature can be turned
774	off by setting \texttt{flagLog = false}. (This may be a good idea if you
775	are not interested in its contents, as it can be a large file.)
776
777	\begin{figure}[t]
778	\begin{center}
779	\includegraphics[width=\textwidth]{example_spectrum}
780	\end{center}
781	\caption{\footnotesize An example of the spectrum output. Note several
782	of the features discussed in the text: the red lines indicating the
783	reconstructed spectrum; the blue dashed lines indicating the
784	spectral extent of the detection; the green hashed area indicating
785	the Milky Way channels that are ignored by the searching algorithm;
786	the blue border showing its spatial extent on the 0th moment map;
787	and the 15~arcmin-long scale bar.}
788	\label{fig-spect}
789	\end{figure}
790
791	\begin{figure}[!t]
792	\begin{center}
793	\includegraphics[width=\textwidth]{example_moment_map}
794	\end{center}
795	\caption{\footnotesize An example of the moment map created by
796	\duchamp. The full extent of the cube is covered, and the 0th moment
797	of each object is shown (integrated individually over all the
798	detected channels).}
799	\label{fig-moment}
800	\end{figure}
801
802	\subsubsection{Graphical output -- spectra}
803
804	As well as the output data file, a postscript file is created that
805	shows the spectrum for each detection, together with a small cutout
806	image (the 0th moment) and basic information about the detection (note
807	that any flags are printed after the name of the detection, in the
808	format \texttt{[E]}). If the cube was reconstructed, the spectrum from
809	the reconstruction is shown in red, over the top of the original
810	spectrum. The spectral extent of the detected object is indicated by
811	two dashed blue lines, and the region covered by the ``Milky Way''
812	channels is shown by a green hashed box.
813
814	The spectrum that is plotted is governed by the
815	\texttt{spectralMethod} parameter. It can be either \texttt{peak},
816	where the spectrum is from the spatial pixel containing the
817	detection's peak flux; or \texttt{sum}, where the spectrum is summed
818	over all spatial pixels, and then corrected for the beam size.
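
The two options can be written as follows. The notation is introduced
here for illustration, and the form of the beam correction (division
by the beam area expressed in pixels, $A_{\rm beam}$) is an assumption
rather than something stated above. With $F(x,y,z)$ the flux in the
cube and $(x_p,y_p)$ the spatial pixel containing the peak flux of the
detection,
\[
S_{\rm peak}(z) = F(x_p,y_p,z), \qquad
S_{\rm sum}(z) = \frac{1}{A_{\rm beam}} \sum_{(x,y)} F(x,y,z),
\]
where the sum runs over the spatial pixels of the detection.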
819
820	A zoomed-in view of the spectral range around the detection is shown
821	in a separate window. The cutout image can optionally
822	include a border around the spatial pixels that are in the detection
823	(turned on and off by the parameter \texttt{drawBorders} -- the
824	default is \texttt{true}). It also includes a scale bar in the bottom
825	left corner to indicate size -- it is 15~arcmin long (note that due to
826	projection effects it may be a slightly different physical length from
827	object to object). An example detection can be seen in
828	Fig.~\ref{fig-spect}.
829
830	\subsubsection{Graphical output -- maps}
831
832	Finally, a couple of images are optionally produced: a 0th moment map
833	of the cube, combining just the detected channels in each object,
834	showing the integrated flux in grey-scale; and a ``detection image'',
835	a grey-scale image where the pixel values are the number of channels
836	that spatial pixel is detected in. In both cases, if
837	\texttt{drawBorders = true}, a border is drawn around the spatial
838	extent of each detection. An example moment map is shown in
839	Fig.~\ref{fig-moment}. The production or otherwise of these images is
840	governed by the \texttt{flagMaps} parameter.
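
As a sketch of what the two images contain (the notation is
introduced here for illustration), let $Z(x,y)$ be the set of channels
in which the spatial pixel $(x,y)$ is part of a detection, $F(x,y,z)$
the flux, and $\Delta v$ the channel width in velocity. Then
\[
M_0(x,y) = \sum_{z \in Z(x,y)} F(x,y,z)\,\Delta v , \qquad
D(x,y) = |Z(x,y)| ,
\]
where $M_0$ is the 0th moment map and $D$ is the detection image. The
factor $\Delta v$, and any beam correction applied to $M_0$, are
assumptions: the description above specifies only that the detected
channels are combined.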
841
842	The purpose of these images is to provide a visual guide to where the
843	detections have been made, and, particularly in the case of the moment
844	map, to provide an indication of the strength of the source. In both
845	cases, the detections are numbered (in the same sense as the output
846	list), and the spatial borders are marked out as for the cutout images
847	in the spectra file. Both these images are saved as postscript files
848	(given by the parameters \texttt{momentMap} and \texttt{detectionMap}
849	respectively), with the latter also displayed in a {\sc pgplot}
850	window (regardless of the state of \texttt{flagMaps}).
851
852	\section{Notes and hints on the use of \duchamp}
853	\label{sec-notes}
854
855	In using \duchamp, the user has to make a number of decisions about
856	the way the program runs. This section is designed to give the user
857	some idea about what to choose.
858
859	The main choice is whether or not to use the wavelet
860	reconstruction. The main benefits of this are the marked reduction in
861	the noise level, leading to regularly-shaped detections, and good
862	reliability for faint sources. The main drawback with its use is the
863	long execution time: to reconstruct a $170\times160\times1024$
864	(\hipass) cube often requires three iterations and takes about 20--25
865	minutes to run completely. Note that this is for the three-dimensional
866	reconstruction: using \texttt{reconDim=1} makes the reconstruction
867	quicker (the full program then takes about 6 minutes), but it is still
868	the largest part of the time.
869
870	The searching part of the procedure is much quicker: searching an
871	un-reconstructed cube leads to execution times of only a couple of
872	minutes. Alternatively, using the ability to read in previously-saved
873	reconstructed arrays makes running the reconstruction more than once a
874	more feasible prospect.
875
876	On the positive side, the shape of the detections in a cube that has
877	been reconstructed will be much more regular and smooth -- the ragged
878	edges that objects in the raw cube possess are smoothed by the removal
879	of most of the noise. This enables better determination of the shapes
880	and characteristics of objects.
881
882	A further point to consider when using the reconstruction is that if
883	the two-dimensional reconstruction is chosen (\texttt{reconDim=2}), it
884	can be susceptible to edge effects. If the valid area in the cube (\ie
885	the part that is not BLANK) has non-rectangular edges, the convolution
886	can produce artefacts in the reconstruction that mimic the edges and
887	can lead (depending on the selection threshold) to some spurious
888	sources. Caution is advised with such data -- the user is advised to
889	check carefully the reconstructed cube for the presence of such
890	artefacts. Note, however, that the 1- and 3-dimensional
891	reconstructions are \emph{not} susceptible in the same way, since the
892	spectral direction does not generally exhibit these BLANK edges, and
893	so we recommend the use of either of these.
894
895	If one chooses the reconstruction method, a further decision is
896	required on the signal-to-noise cutoff used in determining acceptable
897	wavelet coefficients. A larger value will remove more noise from the
898	cube, at the expense of losing fainter sources, while a smaller value
899	will include more noise, which may produce spurious detections, but
900	will be more sensitive to faint sources. Values of less than about
901	$3\sigma$ tend to not reduce the noise a great deal and can lead to
902	many spurious sources (although this will depend on the nature of the
903	cube).
904
905	When it comes to searching, the FDR method produces more reliable results
906	than simple sigma-clipping, particularly in the absence of reconstruction.
907	However, it does not work in exactly the way one would expect for a
908	given value of \texttt{alpha}. For instance, setting fairly liberal values
909	of \texttt{alpha} (say, 0.1) will often lead to a much smaller fraction
910	of false detections (\ie much less than 10\%). This is the effect of the
911	merging algorithms, which combine the sources after the detection stage,
912	and reject detections not meeting the minimum pixel or channel requirements.
913	It is thus better to aim for larger \texttt{alpha} values than those derived
914	from a straight conversion of the desired false detection rate.
915
916	Finally, as \duchamp\ is still undergoing development, there are some
917	elements that are not fully developed. In particular, it is not as
918	clever as I would like at avoiding interference. The ability to place
919	requirements on the minimum number of channels and pixels partially
920	circumvents this problem, but work is being done to make \duchamp\
921	smarter at rejecting signals that are clearly (to a human eye at
922	least) interference. See the following section for further
923	improvements that are planned.
924
925	%\section{Drawbacks of the current program}
926	%
927	%The program currently has a few problems/drawbacks/things to be aware
928	%of that will hopefully be fixed in the future:
929	%\begin{itemize}
930	%
931	%\item Narrow interference spikes are still getting found, particularly
932	% if there is no reconstruction, or reconstruction with a relatively
933	% low \texttt{snrRecon} (such as 2 or 3). Increasing the
934	% \texttt{minChannels} parameter is one way to circumvent this, but
935	% making the algorithm a bit more clever would be preferable.
936	%
937	%\item Sources that have strong continuum ripple and/or artefacts often
938	% generate many spurious detections. This needs some work to avoid
939	% \duchamp\ doing this, and until then users are advised to be aware
940	% of the possibility. Strong continuum ripples may generate many
941	% sources on the same spatial pixel, and this will be apparent on the
942	% detection images.
943	%
944	%\item Spectra are integrated over every spatial pixel of the
945	% detection, and this may dilute the actual detection, making it
946	% harder to see \ie the apparent strength of the line as plotted may
947	% not give a true indication of how strong it really is.
948	%
949	%%\item A caution on the merging part of the procedure. This can be time
950	%% consuming if there are many detections that do not require merging
951	%% -- in this case, the time will go like $N^2$ ($N$ = number of
952	%% detections). If there are plenty of mergers, the size of the list
953	%% reduces quickly, so the execution time will be less.
954	%
955	%
956	%\end{itemize}
957
958
959	%\section{Comparison with other software (to be developed further...)}
960	%
961	%\subsection{fred, by Matt Howlett}
962	%
963	%This is the program used in the \hipass\ analysis. It smoothes the
964	%data spectrally with a boxcar filter of a size that varies over a
965	%user-specified range, and then thresholds the data.
966	%
967	%Works effectively, but generally doesn't find as many sources as
968	%\duchamp, particularly when the reconstruction is used. Sensitive to
969	%faint, broad features that fall below the reconstruction threshold.
970	%
971	%Execution takes a long time, depending on the range of filter widths
972	%that are used.
973	%
974	%\subsection{sfind}
975	%
976	%Hard to evaluate, as it does not (as far as I can see) output the
977	%channel number at which detections are made, and does not merge
978	%detections made at adjacent channels (\ie it just works in 2
979	%dimensions).
980	%
981
982	\section{Future Developments}
983
984	This is both a list of planned improvements and a wish-list of
985	features that would be nice to include (but are not planned in the
986	immediate future). Let me know if there are items not on this list, or
987	items on the list you would like prioritised.
988
989	\begin{itemize}
990
991	\item Better determination of the noise characteristics of
992	spectral-line cubes, including understanding how the noise is
993	generated and developing a model for it. \textbf{Planned.}
994
995	\item Include more source analysis. Examples could be: shape
996	information; measurements of \hi\ mass; more variety of measurements
997	of velocity width and profile. \textbf{Some planned.}
998
999	\item Provide some indication of the significance of the detection
1000	(\ie some S/N-like value). \textbf{Planned.}
1001
1002	\item Improved ability to reject interference, possibly on the
1003	spectral shape of features. \textbf{Planned.}
1004
1005	\item Ability to separate (de-blend) distinct sources that have been
1006	merged. \textbf{Planned.}
1007
1008	\item Link to lists of possible counterparts (\eg via NED/SIMBAD/other
1009	VO tools?). \textbf{Wish-list.}
1010
1011	\item On-line web service interface, so a user can upload a cube and
1012	get back a source-list. \textbf{Wish-list.}
1013
1014	\item Embed \duchamp\ in a GUI, to move away from the text-based
1015	interaction. \textbf{Wish-list.}
1016	\end{itemize}
1017
1018
1019	%\bibliographystyle{mn2e}
1020	%\bibliographystyle{abbrvnat}
1021	%\bibliography{mnrasmnemonic,sourceDetection}
1022	\begin{thebibliography}{}
1023
1024	\bibitem[\protect\citeauthoryear{{Calabretta} \& {Greisen}}{{Calabretta} \&
1025	{Greisen}}{2002}]{calabretta02}
1026	{Calabretta} M., {Greisen} E., 2002, A\&A, 395, 1077
1027
1028	\bibitem[\protect\citeauthoryear{{Greisen} \& {Calabretta}}{{Greisen} \&
1029	{Calabretta}}{2002}]{greisen02}
1030	{Greisen} E., {Calabretta} M., 2002, A\&A, 395, 1061
1031
1032	\bibitem[\protect\citeauthoryear{{Hanisch}, {Farris}, {Greisen}, {Pence},
1033	{Schlesinger}, {Teuben}, {Thompson} \& {Warnock}}{{Hanisch}
1034	et~al.}{2001}]{hanisch01}
1035	{Hanisch} R., {Farris} A., {Greisen} E., {Pence} W., {Schlesinger} B.,
1036	{Teuben} P., {Thompson} R., {Warnock} A., 2001, A\&A, 376, 359
1037
1038	\bibitem[\protect\citeauthoryear{{Hopkins}, {Miller}, {Connolly}, {Genovese},
1039	{Nichol} \& {Wasserman}}{{Hopkins} et~al.}{2002}]{hopkins02}
1040	{Hopkins} A., {Miller} C., {Connolly} A., {Genovese} C., {Nichol} R.,
1041	{Wasserman} L., 2002, AJ, 123, 1086
1042
1043	\bibitem[\protect\citeauthoryear{Lutz}{Lutz}{1980}]{lutz80}
1044	Lutz R., 1980, The Computer Journal, 23, 262
1045
1046	\bibitem[\protect\citeauthoryear{{Meyer} et~al.,}{{Meyer}
1047	et~al.}{2004}]{meyer04:trunc}
1048	{Meyer} M., et~al., 2004, MNRAS, 350, 1195
1049
1050	\bibitem[\protect\citeauthoryear{{Miller}, {Genovese}, {Nichol}, {Wasserman},
1051	{Connolly}, {Reichart}, {Hopkins}, {Schneider} \& {Moore}}{{Miller}
1052	et~al.}{2001}]{miller01}
1053	{Miller} C., {Genovese} C., {Nichol} R., {Wasserman} L., {Connolly} A.,
1054	{Reichart} D., {Hopkins} A., {Schneider} J., {Moore} A., 2001, AJ, 122,
1055	3492
1056
1057	\bibitem[\protect\citeauthoryear{Minchin}{Minchin}{1999}]{minchin99}
1058	Minchin R., 1999, PASA, 16, 12
1059
1060	\bibitem[\protect\citeauthoryear{Starck \& Murtagh}{Starck \&
1061	Murtagh}{2002}]{starck02:book}
1062	Starck J.-L., Murtagh F., 2002, {``Astronomical Image and Data Analysis''}.
1063	Springer
1064
1065	\end{thebibliography}
1066
1067
1068	\appendix
1069	\newpage
1070	\section{Obtaining and Installing \duchamp}
1071
1072	The \duchamp\ web page can be found at the following location:\\
1073	\href{http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp}%
1074	{http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp}\\
1075	Here you can find a gzipped tar archive of the source code that can be
1076	downloaded and extracted, as well as this User's Guide in postscript
1077	and hyperlinked PDF formats.
1078
1079	\duchamp\ can be built on Unix systems by typing:
1080	\begin{quote}
1081	\texttt{%
1082	> ./configure\\
1083	> make\\
1084	> make clean (optional -- to remove the object files)}
1085	\end{quote}
1086
1087	Run in this manner, \texttt{configure} should find all the necessary
1088	libraries, but if the above-mentioned libraries have been installed in
1089	non-standard locations, you can specify additional directories to look
1090	in. There are separate options for library files (\eg \texttt{libcpgplot.a})
1091	and header files (\eg \texttt{cpgplot.h}).
1092
1093	For example, if \textsc{wcslib} had been installed in
1094	\texttt{/home/mduchamp/wcslib}, there are two libraries that are
1095	likely to be in separate subdirectories: \texttt{C/} and
1096	\texttt{pgsbox/}. Each subdirectory needs to be searched for library
1097	and header files, so one could build \duchamp\ by typing:
1098	\begin{quote}
1099	\texttt{%
1100	> ./configure $\backslash$ \\
1101	LIBDIRS="/home/mduchamp/wcslib/C /home/mduchamp/wcslib/pgsbox"
1102	$\backslash$\\
1103	INCDIRS="/home/mduchamp/wcslib/C /home/mduchamp/wcslib/pgsbox"}
1104	\end{quote}
1105	And then just run \texttt{make} in the usual fashion:
1106	\begin{quote}
1107	\texttt{> make}
1108	\end{quote}
1109
1110	This will produce the executable \texttt{Duchamp}. There are two
1111	possible ways to run it. The first is:
1112	\begin{quote}
1113	\texttt{> Duchamp -f [FITS file]}
1114	\end{quote}
1115	where \texttt{[FITS file]} is the file you wish to search. This method
1116	simply uses the default values of all parameters.
1117
1118	The second method allows some determination of the parameter values by
1119	the user. Type:
1120	\begin{quote}
1121	\texttt{> Duchamp -p [parameter file]}
1122	\end{quote}
1123	where \texttt{[parameter file]} is a file with the input parameters,
1124	including the name of the cube you want to search. There are two
1125	example input files included with the distribution. The smaller one,
1126	\texttt{InputExample}, shows the typical parameters one might want to
1127	set. The larger one, \texttt{InputComplete}, lists all possible
1128	parameters that can be entered, with a brief description of each. To
1129	get going quickly, just replace the ``your-file-here'' in
1130	\texttt{InputExample} with your image name, and type
1131	\begin{quote}
1132	\texttt{> Duchamp -p InputExample}
1133	\end{quote}
1134
1135	The following appendices provide details on the individual parameters,
1136	and show examples of the output files that \duchamp\ produces.
1137
1138	\newpage
1139	\section{Available parameters}
1140	\label{app-param}
1141
1142	The full list of parameters that can be set in the input file is
1143	given here. Any parameter not set there takes the default value given in
1144	square brackets. Since the order of the parameters in the input file does
1145	not matter, they are grouped here in logical sections.
1146
1147	\subsection*{Input-output related}
1148	\begin{entry}
1149	\item[ImageFile (no default assumed)] The filename of the
1150	data cube to be analysed.
1151	\item[flagSubsection \texttt{[false]}] A flag to indicate whether one
1152	wants a subsection of the requested image.
1153	\item[Subsection \texttt{[ [*,*,*] ]}] The requested subsection, which
1154	should be specified in the format \texttt{[x1:x2,y1:y2,z1:z2]}, where
1155	the limits are inclusive. If the full range of a dimension is
1156	required, use a \texttt{*}, \eg if you want the full spectral range of
1157	a subsection of the image, use \texttt{[30:140,30:140,*]}.
1158	\item[flagReconExists \texttt{[false]}] A flag to indicate whether the
1159	reconstructed array has been saved by a previous run of \duchamp. If
1160	set true, the reconstructed array will be read from the file given by
1161	\texttt{reconFile}, rather than calculated directly.
1162	\item[reconFile (no default assumed)] The FITS file that contains the
1163	reconstructed array. If \texttt{flagReconExists} is true and this
1164	parameter is not defined, the default file searched will be
1165	determined by the \atrous\ parameters (see \S\ref{sec-recon}).
1166	\item[OutFile \texttt{[duchamp-Results.txt]}] The file containing the
1167	final list of detections. This also records the list of input
1168	parameters.
1169	\item[SpectraFile \texttt{[duchamp-Spectra.ps]}] The postscript file
1170	containing the resulting integrated spectra and images of the
1171	detections.
1172	\item[flagLog \texttt{[true]}] A flag to indicate whether intermediate
1173	detections should be logged.
1174	\item[LogFile \texttt{[duchamp-Logfile.txt]}] The file in which intermediate
1175	detections are logged. These are detections that have not been
1176	merged. This is primarily for use in debugging and diagnostic
1177	purposes -- normal use of the program will probably not require
1178	this.
1179	\item[flagOutputRecon \texttt{[false]}] A flag to say whether or not to
1180	save the reconstructed cube as a FITS file. The filename will be
1181	derived from the \texttt{ImageFile} -- the reconstruction of \texttt{image.fits}
1182	will be saved as \texttt{image.RECON?.fits}, where \texttt{?} stands for
1183	the value of \texttt{snrRecon} (see below).
1184	\item[flagOutputResid \texttt{[false]}] As for \texttt{flagOutputRecon}, but
1185	for the residual array -- the difference between the original cube
1186	and the reconstructed cube. The filename will be \texttt{image.RESID?.fits}.
1187	\item[flagVOT \texttt{[false]}] A flag to say whether to create a VOTable
1188	file corresponding to the information in \texttt{OutFile}. This will be
1189	an XML file in the Virtual Observatory VOTable format.
1190	\item[votFile \texttt{[duchamp-Results.xml]}] The VOTable file with the
1191	list of final detections. Some input parameters are also recorded.
1192	\item[flagKarma \texttt{[false]}] A flag to say whether to create a
1193	Karma annotation file corresponding to the information in
1194	\texttt{OutFile}. This can be used as an overlay for the Karma
1195	programs such as \texttt{kvis}.
1196	\item[karmaFile \texttt{[duchamp-Results.ann]}] The Karma annotation
1197	file showing the list of final detections.
1198	\item[flagMaps \texttt{[true]}] A flag to say whether to save
1199	postscript files showing the 0th moment map of the whole cube
1200	(parameter \texttt{momentMap}) and the detection image
1201	(\texttt{detectionMap}).
1202	\item[momentMap \texttt{[duchamp-MomentMap.ps]}] A postscript file
1203	containing a map of the 0th moment of the detected sources, as well
1204	as pixel and WCS coordinates.
1205	\item[detectionMap \texttt{[duchamp-DetectionMap.ps]}] A postscript
1206	file showing each of the detected objects, coloured in greyscale by
1207	the number of channels spanned by each pixel. Also shows pixel and WCS
1208	coordinates.
1209	\end{entry}
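
Because the limits of a subsection are inclusive, a range
\texttt{a:b} selects $b-a+1$ pixels. The example subsection
\texttt{[30:140,30:140,*]} therefore covers $140-30+1=111$ pixels in
each spatial dimension and every channel. Applied, purely as an
illustration, to the $170\times160\times1024$ \hipass\ cube mentioned
in \S\ref{sec-notes}, it would yield a $111\times111\times1024$ array.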
1210
1211	\subsection*{Modifying the cube}
1212	\begin{entry}
1213	\item[flagBlankPix \texttt{[true]}] A flag to say whether to remove BLANK
1214	pixels from the analysis -- these are pixels set to some particular
1215	value because they fall outside the imaged area.
1216	\item[blankPixValue \texttt{[-8.00061]}] The value of the BLANK pixels,
1217	if this information is not contained in the FITS header (the usual
1218	procedure is to obtain this value from the header information -- in
1219	which case the value set by this parameter is ignored).
1220	\item[flagMW \texttt{[false]}] A flag to say whether to ignore channels
1221	contaminated by Milky Way (or other) emission -- the searching
1222	algorithms will not look at these channels.
1223	\item[maxMW \texttt{[112]}] The maximum channel number containing
1224	``Milky Way'' emission.
1225	\item[minMW \texttt{[75]}] The minimum channel number containing
1226	``Milky Way'' emission. Note that the range specified by
1227	\texttt{maxMW} and \texttt{minMW} is inclusive.
1228	\item[flagBaseline \texttt{[false]}] A flag to say whether to remove the
1229	baseline from each spectrum in the cube for the purposes of
1230	reconstruction and detection.
1231	\end{entry}
1232
1233	\subsection*{Detection related}
1234
1235	\subsubsection*{General detection}
1236	\begin{entry}
1237	\item[flagNegative \texttt{[false]}] A flag to indicate that the features
1238	being searched for are negative. The cube will be inverted prior to
1239	searching.
1240	\item[snrCut \texttt{[3.]}] The cut-off value for thresholding, in terms
1241	of number of $\sigma$ above the mean.
1242	\item[flagGrowth \texttt{[false]}] A flag indicating whether or not to
1243	grow the detected objects to a smaller threshold.
\item[growthCut \texttt{[2.]}] The smaller threshold used in growing
detections, in units of $\sigma$ above the mean.
1246	\end{entry}
1247
1248	\subsubsection*{\Atrous\ reconstruction}
1249	\begin{entry}
1250	\item [flagATrous \texttt{[true]}] A flag indicating whether or not to
1251	reconstruct the cube using the \atrous\ wavelet
1252	reconstruction. See \S\ref{sec-recon} for details.
1253	\item[reconDim \texttt{[3]}] The number of dimensions to use in the
1254	reconstruction. 1 means reconstruct each spectrum separately, 2
1255	means each channel map is done separately, and 3 means do the whole
1256	cube in one go.
1257	\item[scaleMin \texttt{[1]}] The minimum wavelet scale to be used in the
1258	reconstruction. A value of 1 means ``use all scales''.
1259	\item[snrRecon \texttt{[4]}] The thresholding cutoff used in the
1260	reconstruction -- only wavelet coefficients this many $\sigma$ above
1261	the mean (or greater) are included in the reconstruction.
1262	\item[filterCode \texttt{[1]}] The code number of the filter to use in
1263	the reconstruction. The options are:
1264	\begin{itemize}
1265	\item \textbf{1:} B$_3$-spline filter: coefficients =
1266	$(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$
1267	\item \textbf{2:} Triangle filter: coefficients = $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
1268	\item \textbf{3:} Haar wavelet: coefficients = $(0, \frac{1}{2}, \frac{1}{2})$
1269	\end{itemize}
1270	\end{entry}
1271
1272	\subsubsection*{FDR method}
1273	\begin{entry}
1274	\item[flagFDR \texttt{[false]}] A flag indicating whether or not to use
1275	the False Discovery Rate method in thresholding the pixels.
1276	\item[alphaFDR \texttt{[0.01]}] The $\alpha$ parameter used in the FDR
1277	analysis. The average number of false detections, as a fraction of the
1278	total number, will be less than $\alpha$ (see \S\ref{sec-detection}).
1279	\end{entry}
1280
1281	\subsubsection*{Merging detections}
1282	\begin{entry}
1283	\item[minPix \texttt{[2]}] The minimum number of spatial pixels for a single
1284	detection to be counted.
1285	\item[minChannels \texttt{[3]}] The minimum number of consecutive
1286	channels that must be present in a detection.
1287	\item[flagAdjacent \texttt{[true]}] A flag indicating whether to use the
1288	``adjacent pixel'' criterion to decide whether to merge objects. If
1289	not, the next two parameters are used to determine whether objects
1290	are within the necessary thresholds.
1291	\item[threshSpatial \texttt{[3.]}] The maximum allowed minimum spatial
1292	separation (in pixels) between two detections for them to be merged
1293	into one. Only used if \texttt{flagAdjacent = false}.
1294	\item[threshVelocity \texttt{[7.]}] The maximum allowed minimum channel
1295	separation between two detections for them to be merged into
1296	one.
1297	\end{entry}
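
The merging test described above can be sketched in a few lines of
Python. This is only an illustration, not the \duchamp\ code: the
function name \texttt{can\_merge}, the representation of a detection
as a list of $(x,y,z)$ voxels, the reading of ``adjacent'' as at most
one pixel apart in both $x$ and $y$, and the requirement that a single
pair of voxels satisfy both the spatial and the channel criterion are
all assumptions made for the example.

```python
import math


def can_merge(voxels_a, voxels_b, flag_adjacent=True,
              thresh_spatial=3.0, thresh_velocity=7.0):
    """Return True if some pair of voxels satisfies both merging criteria.

    Each voxel is an (x, y, z) tuple of pixel and channel indices.
    The keyword defaults mirror the parameter defaults listed above.
    """
    for xa, ya, za in voxels_a:
        for xb, yb, zb in voxels_b:
            if flag_adjacent:
                # Assumed meaning of "adjacent": at most one pixel apart
                # in both x and y.
                spatial_ok = abs(xa - xb) <= 1 and abs(ya - yb) <= 1
            else:
                # Otherwise compare the spatial separation with threshSpatial.
                spatial_ok = math.hypot(xa - xb, ya - yb) <= thresh_spatial
            # The channel separation is compared with threshVelocity.
            if spatial_ok and abs(za - zb) <= thresh_velocity:
                return True
    return False
```

With these assumptions, two detections whose closest pixels are three
pixels apart are merged only when \texttt{flagAdjacent} is false and
their channel separation is no more than \texttt{threshVelocity}.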
1298
1299	\subsubsection*{Other parameters}
1300	\begin{entry}
1301	\item[spectralMethod \texttt{[peak]}] This indicates which method is used
1302	to plot the output spectra: \texttt{peak} means plot the spectrum
1303	containing the detection's peak pixel; \texttt{sum} means sum the
1304	spectra of each detected spatial pixel, and correct for the beam
1305	size. Any other choice defaults to \texttt{peak}.
1306	\item[spectralUnits \texttt{[km/s]}] The user can specify the units of
1307	the spectral axis. Assuming the WCS of the FITS file is valid, the
1308	spectral axis is transformed into velocity, and put into these units
1309	for all output and for calculations such as the integrated flux of a
1310	detection.
1311	\item[drawBorders \texttt{[true]}] A flag indicating whether borders
1312	are to be drawn around the detected objects in the moment maps
1313	included in the output (see for example Fig.~\ref{fig-spect}).
1314	\item[verbose \texttt{[true]}] A flag indicating whether to print the
1315	progress of computationally-intensive algorithms (such as the
1316	searching and merging) to screen.
1317	\end{entry}
1318
1319
1320	\newpage
1321	\section{Example parameter files}
1322	\label{app-input}
1323
1324	This is what a typical parameter file would look like.
1325
1326	\begin{verbatim}
1327	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1328	logFile logfile.txt
1329	outFile results.txt
1330	spectraFile spectra.ps
1331	flagSubsection 0
1332	flagOutputRecon 0
1333	flagOutputResid 0
1334	flagBlankPix 1
1335	flagMW 1
1336	minMW 75
1337	maxMW 112
1338	minPix 3
1339	flagGrowth 1
1340	growthCut 1.5
1341	flagATrous 0
1342	scaleMin 1
1343	snrRecon 4
1344	flagFDR 1
1345	alphaFDR 0.1
1346	numPixPSF 20
1347	snrCut 3
1348	threshSpatial 3
1349	threshVelocity 7
1350	\end{verbatim}
1351
1352	Note that it is not necessary to include all these parameters in the
1353	file, only those that need to be changed from the defaults (as listed
1354	in Appendix~\ref{app-param}), which in this case would be very few. A
1355	minimal parameter file might look like:
1356	\begin{verbatim}
1357	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1358	flagLog 0
1359	snrRecon 3
1360	snrCut 2.5
1361	minChannels 4
1362	\end{verbatim}
This will reconstruct the cube with a lower SNR value than the
default, select objects at a lower threshold, with a stricter minimum
channel requirement (four channels rather than the default three), and
not keep a log of the intermediate detections.
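
To illustrate how a short parameter file combines with the defaults,
here is a minimal Python sketch. It is not the parser used by
\duchamp: the function name \texttt{parse\_parameters} is hypothetical,
values are kept as plain strings, and only a handful of the defaults
from Appendix~\ref{app-param} are used in the usage example.

```python
def parse_parameters(text, defaults=None):
    """Overlay 'name value' lines from a parameter file onto the defaults."""
    params = dict(defaults or {})
    for line in text.splitlines():
        parts = line.split(None, 1)  # parameter name, then the rest of the line
        if len(parts) != 2:
            continue  # skip blank lines and lines without a value
        name, value = parts
        params[name] = value.strip()
    return params


if __name__ == "__main__":
    # A few defaults copied from the parameter list; the rest are omitted.
    defaults = {"snrCut": "3.", "snrRecon": "4", "minChannels": "3",
                "flagATrous": "true"}
    minimal_file = """\
imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
flagLog 0
snrRecon 3
snrCut 2.5
minChannels 4
"""
    for name, value in sorted(parse_parameters(minimal_file, defaults).items()):
        print(name, value)
```

Parameters that are absent from the file, such as \texttt{flagATrous}
here, simply keep their default values.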
1367
1368	The following page demonstrates how the parameters are presented to the
1369	user, both on the screen at execution time, and in the output and log
files. On each line, there is a description of the parameter, the relevant
1371	parameter name that is used in the input file (if there is one that the
1372	user can enter), and the value of the parameter being used.
1373	\newpage
1374	\begin{landscape}
1375	Typical presentation of parameters in output and log files:
1376	\begin{verbatim}
1377	---- Parameters ----
1378	Image to be analysed.........................[imageFile] = input.fits
1379	Intermediate Logfile...........................[logFile] = duchamp-Logfile.txt
1380	Final Results file.............................[outFile] = duchamp-Results.txt
1381	Spectrum file..............................[spectraFile] = duchamp-Spectra.ps
1382	0th Moment Map...............................[momentMap] = duchamp-MomentMap.ps
1383	Detection Map.............................[detectionMap] = duchamp-DetectionMap.ps
1384	Saving reconstructed cube?.............[flagoutputrecon] = false
1385	Saving residuals from reconstruction?..[flagoutputresid] = false
1386	------
1387	Searching for Negative features?..........[flagNegative] = false
1388	Fixing Blank Pixels?......................[flagBlankPix] = true
1389	Blank Pixel Value....................................... = -8.00061
1390	Removing Milky Way channels?....................[flagMW] = true
1391	Milky Way Channels.......................[minMW - maxMW] = 75-112
1392	Beam Size (pixels)...................................... = 10.1788
1393	Removing baselines before search?.........[flagBaseline] = false
1394	Minimum # Pixels in a detection.................[minPix] = 2
1395	Minimum # Channels in a detection..........[minChannels] = 3
1396	Growing objects after detection?............[flagGrowth] = false
1397	Using A Trous reconstruction?...............[flagATrous] = true
1398	Number of dimensions in reconstruction........[reconDim] = 3
1399	Minimum scale in reconstruction...............[scaleMin] = 1
1400	SNR Threshold within reconstruction...........[snrRecon] = 4
1401	Filter being used for reconstruction........[filterCode] = 1 (B3 spline function)
1402	Using FDR analysis?............................[flagFDR] = false
1403	SNR Threshold...................................[snrCut] = 3
1404	Using Adjacent-pixel criterion?...........[flagAdjacent] = true
1405	Max. velocity separation for merging....[threshVelocity] = 7
1406	Method of spectral plotting.............[spectralMethod] = peak
1407	\end{verbatim}
1408
1409	\newpage
1410	\section{Example results file}
1411	\label{app-output}
This is the typical content of an output file, after running \duchamp\
1413	with the parameters illustrated on the previous page.
1414
1415	{\scriptsize
1416	\begin{verbatim}
Results of the Duchamp source finder: Tue May 23 14:51:38 2006
1418	---- Parameters ----
1419	(... omitted for clarity -- see previous page for examples...)
1420	--------------------
1421	Total number of detections = 25
1422	--------------------
1423	------------------------------------------------------------------------------------------------------------------------------------------------------
1424	Obj# Name X Y Z RA DEC VEL w_RA w_DEC w_VEL F_int F_peak X1 X2 Y1 Y2 Z1 Z2 Npix Flag
1425	[km/s] [arcmin] [arcmin] [km/s] [Jy km/s] [Jy/beam] [pix]
1426	------------------------------------------------------------------------------------------------------------------------------------------------------
1427	1 J0618-2532 30.2 86.0 113.3 06:18:12.54 -25:32:44.79 208.502 45.17 34.61 26.383 24.394 0.350 25 35 82 90 112 114 137 E
1428	2 J0609-2156 59.5 140.6 114.6 06:09:19.66 -21:56:31.20 225.572 44.39 31.47 65.957 16.128 0.213 55 65 137 144 113 118 153
1429	3 J0545-2143 141.2 143.2 114.8 05:45:51.71 -21:43:36.20 228.470 19.61 16.66 26.383 2.412 0.090 139 143 142 145 114 116 29
1430	4 J0617-2633 33.3 70.8 115.6 06:17:25.52 -26:33:33.83 238.736 65.02 30.10 26.383 9.776 0.117 26 41 68 75 115 117 104 E
1431	5 J0601-2500 86.2 94.9 117.9 06:01:39.54 -25:00:32.46 269.419 27.99 24.02 26.383 3.920 0.124 83 89 92 97 117 119 44
1432	6 J0602-2547 84.0 83.1 118.0 06:02:18.29 -25:47:31.69 270.319 20.01 19.99 26.383 2.999 0.118 82 86 81 85 117 119 34
1433	7 J0547-2448 133.0 97.2 118.7 05:47:52.53 -24:48:38.16 279.113 19.72 12.54 26.383 1.474 0.074 131 135 96 98 118 120 21
1434	8 J0606-2719 71.1 60.0 121.3 06:06:10.99 -27:19:48.61 314.090 52.36 39.59 39.574 14.268 0.150 65 77 55 64 120 123 154
1435	9 J0611-2137 52.4 145.3 162.5 06:11:20.92 -21:37:29.57 857.955 32.39 23.49 118.722 43.178 0.410 49 56 142 147 158 167 265 E
1436	10 J0600-2859 89.7 35.3 202.4 06:00:34.08 -28:59:00.43 1383.160 23.93 24.10 171.487 24.439 0.173 87 92 33 38 196 209 271
1437	11 J0558-2638 95.4 70.3 223.1 05:58:53.03 -26:38:45.91 1656.140 11.93 12.07 92.339 1.045 0.063 94 96 69 71 220 227 18
1438	12 J0617-2723 34.7 58.3 227.4 06:17:07.07 -27:23:50.65 1712.868 16.75 23.53 290.209 8.529 0.093 33 36 56 61 215 237 118
1439	13 J0558-2525 95.8 88.6 231.7 05:58:49.27 -25:25:33.60 1770.134 27.87 24.16 237.444 12.863 0.115 92 98 86 91 221 239 175
1440	14 J0600-2141 88.8 144.4 232.5 06:00:54.02 -21:41:57.06 1780.188 27.96 24.13 224.252 30.743 0.166 86 92 142 147 222 239 344 E
1441	15 J0615-2634 40.0 70.8 232.6 06:15:25.50 -26:34:20.04 1782.214 12.44 15.69 52.765 2.084 0.068 39 41 69 72 231 235 31
1442	16 J0604-2606 76.0 78.4 233.0 06:04:41.13 -26:06:21.19 1787.226 24.13 23.87 211.061 23.563 0.155 73 78 76 81 225 241 278
1443	17 J0601-2340 87.9 114.9 235.8 06:01:08.83 -23:40:19.37 1824.122 31.95 28.09 237.444 82.380 0.297 85 92 112 118 227 245 647
1444	18 J0615-2235 38.2 130.5 254.5 06:15:32.09 -22:35:37.24 2070.934 12.29 11.70 105.531 1.555 0.070 37 39 129 131 249 257 24
1445	19 J0617-2305 31.4 122.8 258.1 06:17:33.45 -23:05:28.94 2118.752 12.34 11.65 26.383 1.022 0.062 30 32 122 124 257 259 16
1446	20 J0612-2149 49.6 142.2 270.3 06:12:11.04 -21:49:29.72 2279.926 16.27 15.73 395.740 15.156 0.101 48 51 141 144 257 287 204
1447	21 J0616-2133 35.3 146.0 300.6 06:16:15.78 -21:33:09.69 2679.148 20.22 7.47 224.252 3.014 0.127 33 37 145 146 294 311 28 E
1448	22 J0555-2956 107.3 20.9 367.6 05:55:08.02 -29:56:09.08 3562.236 19.71 20.30 39.574 5.891 0.169 105 109 19 23 366 369 58
1449	23 J0557-2246 99.8 128.2 434.0 05:57:43.77 -22:46:42.95 4438.776 11.88 16.12 105.531 1.703 0.167 99 101 127 130 430 438 17 N
1450	24 J0616-2648 38.1 67.2 546.8 06:16:02.10 -26:48:35.49 5926.464 12.35 11.67 26.383 1.276 0.064 37 39 66 68 546 548 18
1451	25 J0552-2916 117.0 30.5 727.0 05:52:13.64 -29:16:58.02 8303.952 11.59 20.25 303.400 35.523 0.479 116 118 28 32 716 739 111
1452	\end{verbatim}
1453	}
Note that the
width of the table can make it hard to read. A good trick for those
using UNIX/Linux is to make use of the \texttt{a2ps} command. The
following works well, producing a postscript file
\texttt{duchamp-Results.ps}:
\\\verb|a2ps -1 -r -f8 -o duchamp-Results.ps duchamp-Results.txt|
1459
1460	%\end{landscape}
1461
1462	\newpage
1463	\section{Example VOTable output}
1464	\label{app-votable}
1465	This is part of the VOTable, in XML format, corresponding to the
1466	output file in Appendix~\ref{app-output} (the indentation has been
1467	removed to make it fit on the page).
1468
1469	%\begin{landscape}
1470	{\scriptsize
1471	\begin{verbatim}
1472	<?xml version="1.0"?>
1473	<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1474	xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
1475	<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
1476	<RESOURCE name="Duchamp Output">
1477	<TABLE name="Detections">
1478	<DESCRIPTION>Detected sources and parameters from running the Duchamp source finder.</DESCRIPTION>
1479	<PARAM name="FITS file" datatype="char" ucd="meta.file;meta.fits" value="/DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits"/>
1480	<PARAM name="Threshold" datatype="float" ucd="stat.snr" value="2.5">
1481	<PARAM name="ATrous note" datatype="char" ucd="meta.note" value="The a trous reconstruction method was used, with the following parameters.">
1482	<PARAM name="ATrous Dimension" datatype="int" ucd="meta.code;stat" value="3">
1483	<PARAM name="ATrous Cut" datatype="float" ucd="stat.snr" value="4">
1484	<PARAM name="ATrous Minimum Scale" datatype="int" ucd="stat.param" value="1">
1485	<PARAM name="ATrous Filter" datatype="char" ucd="meta.code;stat" value="B3 spline function">
1486	<FIELD name="ID" ID="col1" ucd="meta.id" datatype="int" width="4"/>
1487	<FIELD name="Name" ID="col2" ucd="meta.id;meta.main" datatype="char" arraysize="14"/>
1488	<FIELD name="RA" ID="col3" ucd="pos.eq.ra;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1489	<FIELD name="Dec" ID="col4" ucd="pos.eq.dec;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1490	<FIELD name="w_RA" ID="col3" ucd="phys.angSize;pos.eq.ra" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1491	<FIELD name="w_Dec" ID="col4" ucd="phys.angSize;pos.eq.dec" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1492	<FIELD name="Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc" datatype="float" width="9" precision="3" unit="km/s"/>
1493	<FIELD name="w_Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc;spect.line.width" datatype="float" width="8" precision="3" unit="km/s"/>
1494	<FIELD name="Integrated_Flux" ID="col4" ucd="phys.flux;spect.line.intensity" datatype="float" width="10" precision="3" unit="km/s"/>
1495	<DATA>
1496	<TABLEDATA>
1497	<TR>
1498	<TD> 1</TD><TD> J0609-2200</TD><TD> 92.410416</TD><TD>-22.013390</TD><TD> 48.50</TD><TD> 39.42</TD><TD> 213.061</TD><TD> 65.957</TD><TD> 17.572</TD>
1499	</TR>
1500	<TR>
1501	<TD> 2</TD><TD> J0608-2605</TD><TD> 92.042633</TD><TD>-26.085157</TD><TD> 44.47</TD><TD> 39.47</TD><TD> 233.119</TD><TD> 39.574</TD><TD> 4.144</TD>
1502	</TR>
1503	<TR>
1504	<TD> 3</TD><TD> J0606-2724</TD><TD> 91.637840</TD><TD>-27.412022</TD><TD> 52.48</TD><TD> 47.57</TD><TD> 302.213</TD><TD> 39.574</TD><TD> 17.066</TD>
1505	</TR>
1506	(... table truncated for clarity ...)
1507	</TABLEDATA>
1508	</DATA>
1509	</TABLE>
1510	</RESOURCE>
1511	</VOTABLE>
1512	\end{verbatim}
1513	}
1514	\end{landscape}
1515
1516	\newpage
1517	\section{Example Karma Annotation File output}
1518	\label{app-karma}
1519
1520	This is the format of the Karma Annotation file, showing the locations
1521	of the detected objects. This can be loaded by the plotting tools of
1522	the Karma package (for instance, \texttt{kvis}) as an overlay on the FITS
1523	file.
1524
1525	\begin{verbatim}
1526	# Duchamp Source Finder results for
1527	# cube /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1528	COLOR RED
1529	COORD W
1530	CIRCLE 92.3376 -21.9475 0.403992
1531	TEXT 92.3376 -21.9475 1
1532	CIRCLE 91.9676 -26.0193 0.37034
1533	TEXT 91.9676 -26.0193 2
1534	CIRCLE 91.5621 -27.3459 0.437109
1535	TEXT 91.5621 -27.3459 3
1536	CIRCLE 92.8285 -21.6344 0.269914
1537	TEXT 92.8285 -21.6344 4
1538	CIRCLE 90.1381 -28.9838 0.234179
1539	TEXT 90.1381 -28.9838 5
1540	CIRCLE 89.72 -26.6513 0.132743
1541	TEXT 89.72 -26.6513 6
1542	CIRCLE 94.2743 -27.4003 0.195175
1543	TEXT 94.2743 -27.4003 7
1544	CIRCLE 92.2739 -21.6941 0.134538
1545	TEXT 92.2739 -21.6941 8
1546	CIRCLE 89.7133 -25.4259 0.232252
1547	TEXT 89.7133 -25.4259 9
1548	CIRCLE 90.2206 -21.6993 0.266247
1549	TEXT 90.2206 -21.6993 10
1550	CIRCLE 93.8581 -26.5766 0.163153
1551	TEXT 93.8581 -26.5766 11
1552	CIRCLE 91.176 -26.1064 0.234356
1553	TEXT 91.176 -26.1064 12
1554	CIRCLE 90.2844 -23.6716 0.299509
1555	TEXT 90.2844 -23.6716 13
1556	CIRCLE 93.8774 -22.581 0.130925
1557	TEXT 93.8774 -22.581 14
1558	CIRCLE 94.3882 -23.0934 0.137108
1559	TEXT 94.3882 -23.0934 15
1560	CIRCLE 93.0491 -21.8223 0.202928
1561	TEXT 93.0491 -21.8223 16
1562	CIRCLE 94.0685 -21.5603 0.168456
1563	TEXT 94.0685 -21.5603 17
1564	CIRCLE 86.0568 -27.6095 0.101113
1565	TEXT 86.0568 -27.6095 18
1566	CIRCLE 88.7932 -29.9453 0.202624
1567	TEXT 88.7932 -29.9453 19
1568	\end{verbatim}
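
A file in this layout is simple to generate. The following Python
sketch writes the same four kinds of line seen above; the function
name \texttt{karma\_annotation} is hypothetical, and the third number
on each \texttt{CIRCLE} line is assumed to be a radius in the same
world-coordinate units as the position. It is meant only to show the
structure of the file, not how \duchamp\ itself writes it.

```python
def karma_annotation(cube_name, detections, colour="RED"):
    """Build Karma annotation text from (ra, dec, radius) tuples."""
    lines = [
        "# Duchamp Source Finder results for",
        f"# cube {cube_name}",
        f"COLOR {colour}",
        "COORD W",  # positions are given in world coordinates
    ]
    for number, (ra, dec, radius) in enumerate(detections, start=1):
        # Six significant figures, as in the example file above.
        lines.append(f"CIRCLE {ra:.6g} {dec:.6g} {radius:.6g}")
        lines.append(f"TEXT {ra:.6g} {dec:.6g} {number}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [(92.3376, -21.9475, 0.403992), (91.9676, -26.0193, 0.37034)]
    print(karma_annotation("cube.fits", example), end="")
```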
1569
1570	\newpage
1571	\section{Robust statistics for a Normal distribution}
1572	\label{app-madfm}
1573
1574	The Normal, or Gaussian, distribution for mean $\mu$ and standard
1575	deviation $\sigma$ can be written as
1576	\[
1577	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
1578	\]
1579
1580	When one has a purely Gaussian signal, it is straightforward to
1581	estimate $\sigma$ by calculating the standard deviation (or rms) of
1582	the data. However, if there is a small amount of signal present on top
1583	of Gaussian noise, and one wants to estimate the $\sigma$ for the
1584	noise, the presence of the large values from the signal can bias the
1585	estimator to higher values.
1586
1587	An alternative way is to use the median ($m$) and median absolute deviation
1588	from the median ($s$) to estimate $\mu$ and $\sigma$. The median is the
1589	middle of the distribution, defined for a continuous distribution by
1590	\[
1591	\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
1592	\]
1593	From symmetry, we quickly see that for the continuous Normal
1594	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
1595	without loss of generality.
1596
1597	To find $s$, we find the distribution of the absolute deviation from
1598	the median, and then find the median of that distribution. This
1599	distribution is given by
1600	\begin{eqnarray*}
g(x) &= &{\mbox{\rm distribution of }} |x|\\
1602	&= &f(x) + f(-x),\ x\ge0\\
1603	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
1604	\end{eqnarray*}
1605	So, the median absolute deviation from the median, $s$, is given by
1606	\[
1607	\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
1608	\]
1609	Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and
1610	so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
1611	\sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}} \diff x
1612	$. Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for
1613	simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$):
1614	\[
1615	\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
1616	\]
1617	This is hard to solve analytically (no nice analytic solution exists
1618	for the finite integral that I'm aware of), but straightforward to
1619	solve numerically, yielding the value of $s=0.6744888$. Thus, to
1620	estimate $\sigma$ for a Normally distributed data set, one can calculate
1621	$s$, then divide by 0.6744888 (or multiply by 1.4826042) to obtain the
1622	correct estimator.
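
The numerical solution is easy to reproduce. The sketch below uses the
identity $\int_0^s e^{-x^2/2}\,\diff x =
\sqrt{\pi/2}\,{\rm erf}(s/\sqrt{2})$ and a simple bisection; the
function names are hypothetical and this is not the code used in
\duchamp. The root it finds is about $0.67449$, in agreement with the
value quoted above to roughly one part in $10^6$.

```python
import math
import statistics


def madfm_of_unit_normal(tol=1e-12):
    """Solve int_0^s exp(-x^2/2) dx = sqrt(pi/8) for s by bisection."""
    target = math.sqrt(math.pi / 8.0)

    def integral(s):
        # Closed form of the integral in terms of the error function.
        return math.sqrt(math.pi / 2.0) * math.erf(s / math.sqrt(2.0))

    lo, hi = 0.0, 5.0  # the integral is increasing, and the root lies between
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if integral(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def robust_sigma(data):
    """Estimate sigma from the median absolute deviation from the median."""
    values = list(data)
    m = statistics.median(values)
    s = statistics.median(abs(x - m) for x in values)
    return s / madfm_of_unit_normal()
```

Applied to Gaussian noise contaminated by a small number of bright
pixels, \texttt{robust\_sigma} stays close to the true noise level,
whereas the ordinary standard deviation is pulled upwards by the
outliers.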
1623
1624	Note that this is different to solutions quoted elsewhere,
1625	specifically in \citet{meyer04:trunc}, where the same robust estimator
1626	is used but with an incorrect conversion to standard deviation -- they
1627	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
1628	to convert the \emph{mean} absolute deviation from the mean to the
1629	standard deviation. This means that the cube noise in the \hipass\
1630	catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
1631	than quoted.
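
As a quick check of the 18\% figure, the ratio of the two conversion
factors is
\[
\frac{1/0.6744888}{\sqrt{\pi/2}} = \frac{1.4826042}{1.2533141} \approx 1.183,
\]
so a noise level derived with the $\sqrt{\pi/2}$ factor underestimates
$\sigma$ by roughly 18\%.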
1632
\section{How Gaussian noise changes with wavelet scale}
1634	\label{app-scaling}
1635
1636	The key element in the wavelet reconstruction of an array is the
1637	thresholding of the individual wavelet coefficient arrays. This is
1638	usually done by choosing a level to be some number of standard
1639	deviations above the mean value.
1640
1641	However, since the wavelet arrays are produced by convolving the input
array with an increasingly large filter, the pixels in the coefficient
1643	arrays become increasingly correlated as the scale of the filter
1644	increases. This results in the measured standard deviation from a
1645	given coefficient array decreasing with increasing scale. To calculate
1646	this, we need to take into account how many other pixels each pixel in
1647	the convolved array depends on.
1648
1649	To demonstrate, suppose we have a 1-D array with $N$ pixel values
1650	given by $F_i,\ i=1,...,N$, and we convolve it with the B$_3$-spline
1651	filter, defined by the set of coefficients
1652	$\{1/16,1/4,3/8,1/4,1/16\}$. The flux of the $i$th pixel in the
1653	convolved array will be
1654	\[
1655	F'_i = \frac{1}{16}F_{i-2} + \frac{1}{4}F_{i-1} + \frac{3}{8}F_{i}
1656	+ \frac{1}{4}F_{i+1} + \frac{1}{16}F_{i+2}
1657	\]
1658	and the flux of the corresponding pixel in the wavelet array will be
1659	\[
1660	W'_i = F_i - F'_i = \frac{-1}{16}F_{i-2} - \frac{1}{4}F_{i-1} + \frac{5}{8}F_{i}
1661	- \frac{1}{4}F_{i+1} - \frac{1}{16}F_{i+2}
1662	\]
1663	Now, assuming each pixel has the same standard deviation
1664	$\sigma_i=\sigma$, we can work out the standard deviation for the
1665	wavelet array:
1666	\[
1667	\sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2 + \left(\frac{1}{4}\right)^2
1668	+ \left(\frac{5}{8}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2}
1669	= 0.72349\ \sigma
1670	\]
1671	Thus, the first scale wavelet coefficient array will have a standard
1672	deviation of 72.3\% of the input array. This procedure can be followed
1673	to calculate the necessary values for all scales, dimensions and
1674	filters used by \duchamp.
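
The same calculation for the first scale in one, two or three
dimensions can be written compactly. The Python sketch below assumes
that the multi-dimensional filter is the separable product of the 1-D
coefficients; the function name \texttt{first\_scale\_factor} is
hypothetical, and the code is an illustration rather than the routine
used in \duchamp.

```python
import itertools
import math


def first_scale_factor(filt, ndim):
    """Noise scaling of the first wavelet scale for a separable filter."""
    centre = len(filt) // 2
    total = 0.0
    for idx in itertools.product(range(len(filt)), repeat=ndim):
        # Smoothed-array weight of this pixel: product of 1-D coefficients.
        smooth = math.prod(filt[i] for i in idx)
        # The wavelet array is (input - smoothed), so the central pixel
        # carries an extra weight of 1.
        delta = 1.0 if all(i == centre for i in idx) else 0.0
        total += (delta - smooth) ** 2
    return math.sqrt(total)


if __name__ == "__main__":
    b3_spline = [1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16]
    for ndim in (1, 2, 3):
        print(ndim, first_scale_factor(b3_spline, ndim))
```

For the B$_3$-spline filter this gives approximately 0.7235, 0.8908
and 0.9565 in one, two and three dimensions respectively.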
1675
1676	Calculating these values is clearly a critical step in performing the
1677	reconstruction. \citet{starck02:book} did so by simulating data sets
1678	with Gaussian noise, taking the wavelet transform, and measuring the
1679	value of $\sigma$ for each scale. We take a different approach, by
1680	calculating the scaling factors directly from the filter coefficients
1681	by taking the wavelet transform of an array made up of a 1 in the
1682	central pixel and 0s everywhere else. The scaling value is then
derived by taking the square root of the sum of the squares of all
the wavelet coefficient values at each scale. We give the scaling
1685	factors for the three filters available to \duchamp\ on the following
1686	page. These values are hard-coded into \duchamp, so no on-the-fly
1687	calculation of them is necessary.
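
The approach just described can be sketched for the 1-D case as
follows. The code assumes the usual \atrous\ scheme, in which the
filter used at scale $j$ has its coefficients spaced $2^{j-1}$ pixels
apart and the wavelet array is the difference between successive
smoothed arrays; the function name \texttt{atrous\_noise\_factors} is
hypothetical, and this is an illustration rather than the \duchamp\
implementation.

```python
import math


def atrous_noise_factors(filt, n_scales):
    """1-D noise scaling factors from the a trous transform of a unit spike."""
    half = len(filt) // 2
    reach = half * (2 ** n_scales)  # padding wide enough for every scale
    size = 2 * reach + 1
    c_prev = [0.0] * size
    c_prev[reach] = 1.0  # a 1 in the central pixel, 0s everywhere else
    factors = []
    for j in range(n_scales):
        step = 2 ** j  # spacing between filter coefficients at this scale
        c_next = [0.0] * size
        for i in range(size):
            acc = 0.0
            for k, h in enumerate(filt):
                src = i + (k - half) * step
                if 0 <= src < size:
                    acc += h * c_prev[src]
            c_next[i] = acc
        # Wavelet coefficients are the difference of successive smoothings.
        wavelet = [a - b for a, b in zip(c_prev, c_next)]
        factors.append(math.sqrt(sum(w * w for w in wavelet)))
        c_prev = c_next
    return factors


if __name__ == "__main__":
    print(atrous_noise_factors([1 / 16, 1 / 4, 3 / 8, 1 / 4, 1 / 16], 5))
```

For the B$_3$-spline filter the first three values are approximately
0.7235, 0.2855 and 0.1779.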
1688
1689	Memory limitations prevent us from calculating factors for large
1690	scales, particularly for the three-dimensional case (hence the --
1691	symbols in the tables). To calculate factors for
1692	higher scales than those available, we note the following
1693	relationships apply for large scales to a sufficient level of precision:
1694	\begin{itemize}
1695	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
1696	\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
\item 3-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
1698	\end{itemize}
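
These relationships amount to dividing by $\sqrt{2^D}$ per scale in
$D$ dimensions ($\sqrt{2}$, $2$ and $\sqrt{8}$ for $D=1,2,3$). A
minimal Python sketch of the extrapolation, with the hypothetical
function name \texttt{extrapolate\_factor}, is:

```python
import math


def extrapolate_factor(known_factor, known_scale, target_scale, ndim):
    """Extrapolate a noise scaling factor to a larger wavelet scale."""
    if target_scale < known_scale:
        raise ValueError("target_scale must not be below known_scale")
    # Each additional scale divides the factor by sqrt(2**ndim).
    per_scale = math.sqrt(2.0 ** ndim)
    return known_factor / per_scale ** (target_scale - known_scale)


if __name__ == "__main__":
    # 1-D B3-spline factor at scale 12, extrapolated one scale further.
    print(extrapolate_factor(0.00756682272, 12, 13, 1))
```

Starting from the last-but-one tabulated B$_3$-spline value in each
dimension, this reproduces the final tabulated value to better than
0.1\%.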
1699
1700	\newpage
1701	\begin{itemize}
1702	\item \textbf{B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$
1703
1704	\begin{tabular}{llll}
1705	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1706	1 & 0.723489806 & 0.890796310 & 0.956543592\\
1707	2 & 0.285450405 & 0.200663851 & 0.120336499\\
1708	3 & 0.177947535 & 0.0855075048 & 0.0349500154\\
1709	4 & 0.122223156 & 0.0412474444 & 0.0118164242\\
1710	5 & 0.0858113122 & 0.0204249666 & 0.00413233507\\
1711	6 & 0.0605703043 & 0.0101897592 & 0.00145703714\\
1712	7 & 0.0428107206 & 0.00509204670 & 0.000514791120\\
1713	8 & 0.0302684024 & 0.00254566946 & --\\
1714	9 & 0.0214024008 & 0.00127279050 & --\\
1715	10 & 0.0151336781 & 0.000636389722 & --\\
1716	11 & 0.0107011079 & 0.000318194170 & --\\
1717	12 & 0.00756682272 & -- & --\\
1718	13 & 0.00535055108 & -- & --\\
1719	%14 & 0.00378341085 & -- & --\\
1720	%15 & 0.00267527545 & -- & --\\
1721	%16 & 0.00189170541 & -- & --\\
1722	%17 & 0.00133763772 & -- & --\\
1723	%18 & 0.000945852704 & -- & --
1724	\end{tabular}
1725
1726	\item \textbf{Triangle Function:} $\{1/4,1/2,1/4\}$
1727
1728	\begin{tabular}{llll}
1729	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1730	1 & 0.612372436 & 0.800390530 & 0.895954449 \\
1731	2 & 0.330718914 & 0.272878894 & 0.192033014\\
1732	3 & 0.211947812 & 0.119779282 & 0.0576484078\\
1733	4 & 0.145740298 & 0.0577664785 & 0.0194912393\\
1734	5 & 0.102310944 & 0.0286163283 & 0.00681278387\\
1735	6 & 0.0722128185 & 0.0142747506 & 0.00240175885\\
1736	7 & 0.0510388224 & 0.00713319703 & 0.000848538128 \\
1737	8 & 0.0360857673 & 0.00356607618 & 0.000299949455 \\
1738	9 & 0.0255157615 & 0.00178297280 & -- \\
1739	10 & 0.0180422389 & 0.000891478237 & -- \\
1740	11 & 0.0127577667 & 0.000445738098 & -- \\
1741	12 & 0.00902109930 & 0.000222868922 & -- \\
1742	13 & 0.00637887978 & -- & -- \\
1743	%14 & 0.00451054902 & -- & -- \\
1744	%15 & 0.00318942978 & -- & -- \\
1745	%16 & 0.00225527449 & -- & -- \\
1746	%17 & 0.00159471988 & -- & -- \\
1747	%18 & 0.000112763724 & -- & --
1748
1749	\end{tabular}
1750
1751	\item \textbf{Haar Wavelet:} $\{0,1/2,1/2\}$
1752
1753	\begin{tabular}{llll}
1754	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1 & 0.707106781 & 0.433012702 & 0.935414347 \\
1756	2 & 0.500000000 & 0.216506351 & 0.330718914\\
1757	3 & 0.353553391 & 0.108253175 & 0.116926793\\
1758	4 & 0.250000000 & 0.0541265877 & 0.0413398642\\
1759	5 & 0.176776695 & 0.0270632939 & 0.0146158492\\
1760	6 & 0.125000000 & 0.0135316469 & 0.00516748303
1761
1762	\end{tabular}
1763
1764
1765	\end{itemize}
1766
1767	\end{document}
