Context Navigation

Guide.tex @ 1112

Visit:

Last change on this file since 1112 was 120, checked in by Matthew Whiting, 18 years ago

Introduced error and warning reporting functions, to normalise the format
of errors and warnings. Definitions of functions go in duchamp.cc.
Changed all code that reports errors/warnings to use these new functions.

Fixed a couple of bugs that affected the way 2D images were dealt with:
ReconSearch? now looks at how many dimensions there are in the data array
before choosing the dimension of the reconstruction, and the minChannels
test was improved for the case of minChannels=0.

Some minor additions made to the Guide as well.

File size: 80.8 KB

Line
1	\documentclass[12pt,a4paper]{article}
2
3	%%%%%% LINE SPACING %%%%%%%%%%%%
4	\usepackage{setspace}
5	\singlespacing
6	%\onehalfspacing
7	%\doublespacing
8
9	%Define a test for doing PDF format -- use different code below
10	\newif\ifPDF
11	\ifx\pdfoutput\undefined\PDFfalse
12	\else\ifnum\pdfoutput > 0\PDFtrue
13	\else\PDFfalse
14	\fi
15	\fi
16
17	\textwidth=161 mm
18	\textheight=245 mm
19	\topmargin=-15 mm
20	\oddsidemargin=0 mm
21	\parindent=6 mm
22
23	\usepackage[sort]{natbib}
24	\usepackage{lscape}
25	\bibpunct[,]{(}{)}{;}{a}{}{,}
26
27	\newcommand{\eg}{e.g.\ }
28	\newcommand{\ie}{i.e.\ }
29	\newcommand{\hi}{H{\sc i}}
30	\newcommand{\hipass}{{\sc hipass}}
31	\newcommand{\duchamp}{\emph{Duchamp}}
32	\newcommand{\atrous}{\textit{{\`a} trous}}
33	\newcommand{\Atrous}{\textit{{\`A} trous}}
34	\newcommand{\diff}{{\rm d}}
35	\newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil}
36	\newenvironment{entry}
37	{\begin{list}{}%
38	{\renewcommand{\makelabel}{\entrylabel}%
39	\setlength{\labelwidth}{30mm}%
40	\setlength{\labelsep}{5pt}%
41	\setlength{\itemsep}{2pt}%
42	\setlength{\parsep}{2pt}%
43	\setlength{\leftmargin}{35mm}%
44	}%
45	}%
46	{\end{list}}
47
48
49	\title{Source Detection with \duchamp\ v1.0\\A User's Guide}
50	\author{Matthew Whiting\\
51	Australia Telescope National Facility\\CSIRO}
52	\date{}
53
54	% If we are creating a PDF, use different options for graphicx, hyperref.
55	\ifPDF
56	\usepackage[pdftex]{graphicx,color}
57	\usepackage[pdftex]{hyperref}
58	\hypersetup{colorlinks=true,%
59	citecolor=red,%
60	filecolor=red,%
61	linkcolor=red,%
62	urlcolor=red,%
63	}
64	\else
65	\usepackage[dvips]{graphicx}
66	\usepackage[dvips]{hyperref}
67	\fi
68
69	\pagestyle{headings}
70	\begin{document}
71
72	\maketitle
73	\thispagestyle{empty}
74	\begin{figure}[!h]
75	\begin{center}
76	\includegraphics[width=\textwidth]{cover_image}
77	\end{center}
78	\end{figure}
79
80	\newpage
81	\tableofcontents
82
83	\newpage
84	\section{Introduction and getting going quickly}
85
86	This document provides a user's guide to \duchamp, an object-finder
87	for use on spectral-line data cubes. The basic execution of
88	\duchamp\ is to read in a FITS data cube, find sources in the cube,
89	and produce a text file of positions, velocities and fluxes of the
90	detections, as well as a postscript file of the spectra of each
91	detection.
92
93	So, you have a FITS cube, and you want to find the sources in it. What
94	do you do? The first step is to make an input file that contains the
95	list of parameters. Brief and detailed examples are shown in
96	Appendix~\ref{app-input}. This file provides the input file name, the various
97	output files, and defines various parameters that control the
98	execution.
99
100	The standard way to run \duchamp\ is by the command
101	\begin{quote}
102	\texttt{Duchamp -p [parameter file]}
103	\end{quote}
104	replacing \texttt{[parameter file]} with the name of the file listing
105	the parameters. Alternatively, you can use the syntax
106	\begin{quote}
107	\texttt{Duchamp -f [FITS file]}
108	\end{quote}
109	where \texttt{[FITS file]} is the file you wish to search. In the latter
110	case, all parameters will take their default values detailed in
111	Appendix~\ref{app-param}. In either case, the program will then work
112	away and give you the list of detections and their spectra. The
113	program execution is summarised below, and detailed in
114	\S\ref{sec-flow}. Information on inputs is in \S\ref{sec-param} and
115	Appendix~\ref{app-param}, and descriptions of the output is in
116	\S\ref{sec-output}.
117
118	\subsection{A summary of the execution steps}
119
120	The basic flow of the program is summarised here -- all steps are
121	discussed in more detail in the following sections.
122	\begin{enumerate}
123	\item If the \texttt{-p} option is used, the parameter file given on
124	the command line is read in, and the parameters absorbed.
125	\item The FITS image is located and read in to memory.
126	\item If requested, a FITS image with a previously reconstructed array
127	is read in.
128	\item If requested, blank pixels are trimmed from the edges, and
129	the baseline of each spectrum is removed.
130	\item If the reconstruction method is requested, and the reconstructed
131	array has not been read in at Step 3 above, the cube is
132	reconstructed using the \atrous\ wavelet method.
133	\item Searching for objects then takes place, using the requested
134	thresholding method.
135	\item The list of objects is condensed by merging neighbouring objects
136	and removing those deemed unacceptable.
137	\item The baselines and trimmed pixels are replaced prior to output.
138	\item The details of the detections are written to screen and to the
139	requested output file.
140	\item Maps showing the spatial location of the detections are written.
141	\item The integrated spectra of each detection are written to a
142	postscript file.
143	\item If requested, the reconstructed array can be written to a new
144	FITS file.
145	\end{enumerate}
146
147	\subsection{Guide to terminology}
148
149	First, a brief note on the use of terminology in this guide. \duchamp\
150	is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is
151	the Flexible Image Transport System -- see \citet{hanisch01} or
152	websites such as
153	\href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html}
154	for details.} image arrays with three dimensions -- they are assumed
155	to have the following form: the first two dimensions (referred to as
156	$x$ and $y$) are spatial directions (that is, relating to the position
157	on the sky), while the third dimension, $z$, is the spectral
158	direction, which can correspond to frequency, wavelength, or
159	velocity. The three dimensional analogue of pixels are ``voxels'', or
160	volume cells -- a voxel is defined by a unique $(x,y,z)$ location and
161	has a unique flux or intensity value associated with it.
162
163	Each spatial pixel (a given $(x,y)$ coordinate) can be said to be a
164	single spectrum, while a slice through the cube perpendicular to the
165	spectral direction at a given $z$-value is a single channel (the 2-D
166	image is a channel map).
167
168	Detection involves locating a contiguous group of voxels with fluxes
169	above a certain threshold. \duchamp\ makes no assumptions as to the
170	size or shape of the detected features, other than having
171	user-selected minimum size criteria.
172
173	Features that are detected are assumed to be positive. The user can
174	choose to search for negative features by setting an input parameter
175	-- this inverts the cube prior to the search (see
176	\S\ref{sec-detection} for details).
177
178	Note that it is possible to run \duchamp\ on a two-dimensional image
179	(\ie one with no frequency or velocity information), or indeed a
180	one-dimensional array, and many of the features of the program will
181	work fine. The focus, however, is on object detection in three
182	dimensions.
183
184	\subsection{Why \duchamp?}
185
186	Well, it's important for a program to have a name, and the initial
187	working title of \emph{cubefind} was somewhat uninspiring. I wanted to
188	avoid the classic astronomical approach of designing a cute acronym,
189	and since it is designed to work on cubes, I looked at naming it after
190	a cubist. \emph{Picasso}, sadly, was already taken \citep{minchin99},
191	so I settled on naming it after Marcel Duchamp, another cubist, but
192	also one of the first artists to work with ``found objects''.
193
194	\section{User Inputs}
195	\label{sec-param}
196
197	Input to the program is provided by means of a parameter
198	file. Parameters are listed in the file, followed by the value that
199	should be assigned to them. The syntax used is \texttt{paramName
200	value}. Parameter names are not case-sensitive, and lines in the input
201	file that start with \texttt{\#} are ignored. If a parameter is listed
202	more than once, the latter value is used, but otherwise the order in
203	which the parameters are listed in the input file is arbitrary.
204
205	If a parameter is not listed, the default value is assumed. The
206	defaults are chosen to provide a good result (using the reconstruction
207	method), so the user doesn't need to specify many new parameters in
208	the input file. Note that the image file \textbf{must} be specified! The
209	parameters that can be set are listed in Appendix~\ref{app-param},
210	with their default values in parentheses.
211
212	The parameters with names starting with \texttt{flag} are stored as
213	\texttt{bool} variables, and so are either \texttt{true = 1} or
214	\texttt{false = 0}. \duchamp\ will only read them from the file as
215	integers, and so they should be entered in the file as 0 or 1 (see
216	example file in Appendix~\ref{app-input}).
217
218	\section{What \duchamp\ is doing}
219	\label{sec-flow}
220
221	The execution flow of \duchamp\ is detailed here, indicating the
222	main algorithmic steps that are used. The program is written in C/C++
223	and makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot}
224	libraries.
225
226	\subsection{Image input}
227	\label{sec-input}
228
229	The cube is read in using basic {\sc cfitsio} commands, and stored as
230	an array in a special C++ class. This class keeps track of
231	the list of detected objects, as well as any reconstructed arrays that
232	are made (see \S\ref{sec-recon}). The World Coordinate System (WCS)
233	information for the cube is also obtained from the FITS header by {\sc
234	wcslib} functions \citep{greisen02, calabretta02}, and this
235	information, in the form of a \texttt{wcsprm} structure, is also stored
236	in the same class.
237
238	A sub-section of an image can be requested via the \texttt{subsection}
239	parameter in the parameter file -- this can be a good idea if the cube
240	has very noisy edges, which may produce many spurious detections. The
241	generalised form of the subsection that is used by {\sc cfitsio} is
242	\texttt{[x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run
243	from \texttt{x1} to \texttt{x2} (inclusive), with steps of
244	\texttt{dx}. The step value can be omitted (so a subsection of the
245	form \texttt{[2:50,2:50,10:1000]} is still valid). \duchamp\ does not
246	make use of any step value present in the subsection string, and any
247	that are present are removed before the file is opened.
248
249	If one wants the full range of a coordinate then replace the range
250	with an asterisk, \eg \texttt{[2:50,2:50,*]}. If one wants to use a
251	subsection, one must set \texttt{flagSubsection = 1}. A complete
252	description of the section syntax can be found at the {\sc fitsio} web
253	site
254	\footnote{
255	\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}%
256	{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}.
257
258	\subsection{Image modification}
259	\label{sec-modify}
260
261	Several modifications to the cube can be made that improve the
262	execution and efficiency of \duchamp\ (these are optional -- their
263	use is indicated by the relevant flags set in the input parameter
264	file).
265
266	\subsubsection{Blank pixel removal}
267
268	First, the cube is trimmed of any BLANK pixels that pad the image out
269	to a rectangular shape. This is optional, its use determined by the
270	\texttt{flagBlankPix} parameter. The value for these pixels is read from
271	the FITS header (using the BLANK, BSCALE and BZERO keywords), but if
272	these are not present then the value can be specified by the user in
273	the parameter file using \texttt{blankPixValue}.
274
275	This stage is particularly important for the reconstruction step, as
276	lots of BLANK pixels on the edges will smooth out features in the
277	wavelet calculation stage. The trimming will also reduce the size of
278	the cube's array, speeding up the execution. The amount of trimming is
279	recorded, and these pixels are added back in once the source-detection
280	is completed (so that quoted pixel positions are applicable to the
281	original cube).
282
283	Rows and columns are trimmed one at a time until the first non-BLANK
284	pixel is reached, so that the image remains rectangular. In practice,
285	this means that there will be BLANK pixels left in the trimmed image
286	(if the non-BLANK region is non-rectangular). However, these are
287	ignored in all further calculations done on the cube.
288
289	\subsubsection{Baseline removal}
290
291	Second, the user may request the removal of baselines from the
292	spectra, via the parameter \texttt{flagBaseline}. This may be necessary
293	if there is a strong baseline ripple present, which can result in
294	spurious detections at the high points of the ripple. The baseline is
295	calculated from a wavelet reconstruction procedure (see
296	\S\ref{sec-recon}) that keeps only the two largest scales. This is
297	done separately for each spatial pixel (\ie for each spectrum in the
298	cube), and the baselines are stored and added back in before any
299	output is done. In this way the quoted fluxes and displayed spectra
300	are as one would see from the input cube itself -- even though the
301	detection (and reconstruction if applicable) is done on the
302	baseline-removed cube.
303
304	The presence of very strong signals (for instance, masers at several
305	hundred Jy) can affect the determination of the baseline, leading to a
306	large dip centred on the signal in the baseline-subtracted
307	spectrum. To prevent this, the signal is trimmed prior to the
308	reconstruction process at some standard threshold (at $8\sigma$ above
309	the mean). The baseline determined should thus be representative of
310	the true, signal-free baseline. Note that this trimming is only a
311	temporary measure which does not affect the source-detection.
312
313	\subsubsection{Ignoring bright Milky Way emission}
314
315	Finally, a single set of contiguous channels can be ignored -- these
316	may exhibit very strong emission, such as that from the Milky Way as
317	seen in extragalactic \hi\ cubes (hence the references to ``Milky
318	Way'' in relation to this task -- apologies to Galactic
319	astronomers!). Such dominant channels will produce many detections
320	that are unnecessary, uninteresting (if one is interested in
321	extragalactic \hi) and large (in size and hence in memory usage), and
322	so will slow the program down and detract from the interesting
323	detections. The use of this feature is controlled by the
324	\texttt{flagMW} parameter, and the exact channels concerned are able
325	to be set by the user (using \texttt{maxMW} and \texttt{minMW} --
326	these give an inclusive range of channels). When employed, these
327	channels are temporarily blanked out for the searching, and the
328	scaling of the spectral output (see Fig.~\ref{fig-spect}) will not
329	take them into account. They will be present in the reconstructed
330	array, however, and so will be included in the saved FITS file (see
331	\S\ref{sec-reconIO}). When the final spectra are plotted, the range of
332	channels covered by these parameters is indicated by a green hashed
333	box.
334
335	\subsection{Image reconstruction}
336	\label{sec-recon}
337
338	The user can direct \duchamp\ to reconstruct the data cube using the
339	\atrous\ wavelet procedure. A good description of the procedure can be
340	found in \citet{starck02:book}. The reconstruction is an effective way
341	of removing a lot of the noise in the image, allowing one to search
342	reliably to fainter levels, and reducing the number of spurious
343	detections. This is an optional step, but one that greatly enhances
344	the source-detection process, with the payoff that it can be
345	relatively time- and memory-intensive.
346
347	\subsubsection{Algorithm}
348
349	The steps in the \atrous\ reconstruction are as follows:
350	\begin{enumerate}
351	\item Set the reconstructed array to 0 everywhere.
352	\item The input array is discretely convolved with a given filter
353	function. This is determined from the parameter file via the
354	\texttt{filterCode} parameter -- see Appendix~\ref{app-param} for
355	details on the filters available.
356	\item The wavelet coefficients are calculated by taking the difference
357	between the convolved array and the input array.
358	\item If the wavelet coefficients at a given point are above the
359	requested threshold (given by \texttt{snrRecon} as the number of
360	$\sigma$ above the mean and adjusted to the current scale -- see
361	Appendix~\ref{app-scaling}), add these to the reconstructed array.
362	\item The separation of the filter coefficients is doubled. (Note that
363	this step provides the name of the procedure\footnote{\atrous\ means
364	``with holes'' in French.}, as gaps or holes are created in the
365	filter coverage.)
366	\item The procedure is repeated from step 2, using the convolved array
367	as the input array.
368	\item Continue until the required maximum number of scales is reached.
369	\item Add the final smoothed (\ie convolved) array to the
370	reconstructed array. This provides the ``DC offset'', as each of the
371	wavelet coefficient arrays will have zero mean.
372	\end{enumerate}
373
374	The reconstruction has at least two iterations. The first iteration
375	makes a first pass at the wavelet reconstruction (the process outlined
376	in the 8 stages above), but the residual array will inevitably have
377	some structure still in it, so the wavelet filtering is done on the
378	residual, and any significant wavelet terms are added to the final
379	reconstruction. This step is repeated until the change in the $\sigma$
380	of the background is less than some fiducial amount.
381
382	It is important to note that the \atrous\ decomposition is an
383	example of a ``redundant'' transformation. If no thresholding is
384	performed, the sum of all the wavelet coefficient arrays and the final
385	smoothed array is identical to the input array. The thresholding thus
386	removes only the unwanted structure in the array.
387
388	Note that any BLANK pixels that are still in the cube will not be
389	altered by the reconstruction -- they will be left as BLANK so that
390	the shape of the valid part of the cube is preserved.
391
392	\subsubsection{Note on Statistics}
393
394	The correct calculation of the reconstructed array needs good
395	estimation of the underlying mean and standard deviation of the
396	background noise distribution. These statistics are estimated using
397	robust methods, to avoid corruption by strong outlying points. The
398	mean of the distribution is actually estimated by the median, while
399	the median absolute deviation from the median (MADFM) is calculated
400	and corrected assuming Gaussianity to estimate the underlying standard
401	deviation $\sigma$. The Gaussianity (or Normality) assumption is
402	critical, as the MADFM does not give the same value as the usual rms
403	or standard deviation value -- for a normal distribution
404	$N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. The difference
405	between the MADFM and $\sigma$ is corrected for, so the user need only
406	think in the usual multiples of $\sigma$ when setting
407	\texttt{snrRecon}. See Appendix~\ref{app-madfm} for a derivation of
408	this value.
409
410	When thresholding the different wavelet scales, the value of $\sigma$
411	as measured from the wavelet array needs to be scaled to account for the
412	increased amount of correlation between neighbouring pixels (due to
413	the convolution). See Appendix~\ref{app-scaling} for details on this
414	scaling.
415
416	\subsubsection{User control of reconstruction parameters}
417
418	The most important parameter for the user to select in relation to the
419	reconstruction is the threshold for each wavelet array. This is set
420	using the \texttt{snrRecon} parameter, and is given as a multiple of the
421	rms (estimated by the MADFM) above the mean (which for the wavelet
422	arrays should be approximately zero). There are several other
423	parameters that can be altered as well that affect the outcome of the
424	reconstruction.
425
426	By default, the cube is reconstructed in three dimensions, using a
427	3-dimensional filter and 3-dimensional convolution. This can be
428	altered, however, using the parameter \texttt{reconDim}. If set to 1,
429	this means the cube is reconstructed by considering each spectrum
430	separately, whereas \texttt{reconDim=2} will mean the cube is
431	reconstructed by doing each channel map separately. The merits of
432	these choices are discussed in \S\ref{sec-notes}, but it should be
433	noted that a 2-dimensional reconstruction can be susceptible to edge
434	effects if the spatial shape is not rectangular.
435
436	The user can also select the minimum scale to be used in the
437	reconstruction -- the first scale exhibits the highest frequency
438	variations, and so ignoring this one can sometimes be beneficial in
439	removing excess noise. The default, however, is to use all scales
440	(\texttt{minscale = 1}).
441
442	Finally, the filter that is used for the convolution can be selected
443	by using \texttt{filterCode} and the relevant code number -- the
444	choices are listed in Appendix~\ref{app-param}. A larger filter will
445	give a better reconstruction, but take longer and use more memory when
446	executing. When multi-dimensional reconstruction is selected, this
447	filter is used to construct a 2- or 3-dimensional equivalent.
448
449	\subsection{Reconstruction I/O}
450	\label{sec-reconIO}
451
452	The reconstruction stage can be relatively time-consuming, particularly
453	for large cubes and reconstructions in 3-D. To get around this, \duchamp\
454	provides a shortcut to allow users to perform multiple searches (\eg with
455	different thresholds) on the same reconstruction without calculating the
456	reconstruction each time.
457
458	The first step is to choose to save the reconstructed array as a FITS
459	file by setting \texttt{flagOutputRecon = true}. The file will be saved
460	in the same directory as the input image, so the user needs to have write
461	permissions for that directory.
462
463	The filename will be derived from the input filename, with extra
464	information detailing the reconstruction that has been done. For
465	example, suppose \texttt{image.fits} has been reconstructed using a
466	3-dimensional reconstruction with filter 2, thresholded at $4\sigma$
467	using all scales. The output filename will then be
468	\texttt{image.RECON-3-2-4-1.fits} (\ie it uses the four parameters
469	relevant for the \atrous\ reconstruction as listed in
470	Appendix~\ref{app-param}). The new FITS file will also have these
471	parameters as header keywords. If a subsection of the input image has
472	been used (see \S\ref{sec-input}), the format of the output filename
473	will be \texttt{image.sub.RECON-3-2-4-1.fits}, and the subsection that
474	has been used is also stored in the FITS header.
475
476	Likewise, the residual image, defined as the difference between the input
477	and reconstructed arrays, can also be saved in the same manner by setting
478	\texttt{flagOutputResid = true}. Its filename will be the same as above,
479	with RESID replacing RECON.
480
481	If a reconstructed image has been saved, it can be read in and used
482	instead of redoing the reconstruction. To do so, the user should set
483	\texttt{flagReconExists = true}. The user can indicate the name of the
484	reconstructed FITS file using the \texttt{reconFile} parameter, or, if
485	this is not specified, \duchamp\ searches for the file with the name
486	as defined above. If the file is not found, the reconstruction is
487	performed as normal. Note that to do this, the user needs to set
488	\texttt{flagAtrous = true} (obviously, if this is \texttt{false}, the
489	reconstruction is not needed).
490
491	\subsection{Searching the image}
492	\label{sec-detection}
493
494	The image is searched for detections in two ways: spectrally (a
495	1-dimensional search in the spectrum in each spatial pixel), and
496	spatially (a 2-dimensional search in the spatial image in each
497	channel). In both cases, the algorithm finds connected pixels that are
498	above the user-specified threshold. In the case of the spatial image
499	search, the algorithm of \citet{lutz80} is used to raster scan through
500	the image and connect groups of pixels on neighbouring rows.
501
502	Note that this algorithm cannot be applied directly to a 3-dimensional
503	case, as it requires that objects are completely nested in a row: that
504	is, if you are scanning along a row, and one object finishes and
505	another starts, you know that you will not get back to the first one
506	(if at all) until the second is completely finished for that
507	row. Three-dimensional data does not have this property, which is why
508	we break up the searching into 1- and 2-dimensional cases.
509
510	The determination of the threshold is done in one of two ways. The
511	first way is a simple sigma-clipping, where a threshold is set at a
512	fixed number $n$ of standard deviations above the mean, and pixels
513	above this threshold are flagged as detected. The value of $n$ is set
514	with the parameter \texttt{snrCut}. As before, the value of the
515	standard deviation is estimated by the MADFM, and corrected by the
516	ratio derived in Appendix~\ref{app-madfm}.
517
518	The second method uses the False Discovery Rate (FDR) technique
519	\citep{miller01,hopkins02}, whose basis we briefly detail here. The
520	false discovery rate (given by the number of false detections divided
521	by the total number of detections) is fixed at a certain value
522	$\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false
523	positives). In practice, an $\alpha$ value is chosen, and the ensemble
524	average FDR (\ie $\langle FDR \rangle$) when the method is used will
525	be less than $\alpha$. One calculates $p$ -- the probability,
526	assuming the null hypothesis is true, of obtaining a test statistic as
527	extreme as the pixel value (the observed test statistic) -- for each
528	pixel, and sorts them in increasing order. One then calculates $d$
529	where
530	\[
531	d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
532	\]
533	and then rejects all hypotheses whose $p$-values are less than or equal
534	to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq
535	j\alpha/c_N N$.) Note that ``reject hypothesis'' here means ``accept
536	the pixel as an object pixel'' (\ie we are rejecting the null
537	hypothesis that the pixel belongs to the background).
538
539	The $c_N$ values here are normalisation constants that depend on the
540	correlated nature of the pixel values. If all the pixels are
541	uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their
542	tests will be dependent on each other, and so $c_N = \sum_{i=1}^N
543	i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels
544	are correlated over the beam. In this case the sum is made over the
545	$N$ pixels that make up the beam. The value of $N$ is calculated from
546	the FITS header (if the correct keywords -- BMAJ, BMIN -- are not
547	present, a default value of 10 pixels is assumed).
548
549	The theory behind the FDR method implies a direct connection between the
550	choice of $\alpha$ and the fraction of detections that will be false
551	positives. However, due to the merging process, this direct connection is
552	lost when looking at the final number of detections -- see discussion in
553	\S\ref{sec-notes}. The effect is that the number of false detections will
554	be less than indicated by the $\alpha$ value used.
555
556	If a reconstruction has been made, the residuals (defined as original
557	$-$ reconstruction) are used to estimate the noise parameters of the
558	cube. Otherwise they are estimated directly from the cube itself. In
559	both cases, robust estimators are used as described above.
560
561	Detections must have a minimum number of pixels to be counted. This
562	minimum number is given by the input parameters \texttt{minPix} (for
563	2-dimensional searches) and \texttt{minChannels} (for 1-dimensional
564	searches).
565
566	The search only looks for positive features. If one is interested
567	instead in negative features (such as absorption lines), set the
568	parameter \texttt{flagNegative = true}. This will invert the cube (\ie
569	multiply all pixels by $-1$) prior to the search, and then re-invert
570	the cube (and the fluxes of any detections) after searching is
571	complete. All outputs are done in the same manner as normal, so that
572	fluxes of detections will be negative.
573
574	\subsection{Merging detected objects}
575	\label{sec-merger}
576
577	The searching step produces a list of detected objects that will have many
578	repeated detections of a given object -- for instance, spectral
579	detections in adjacent pixels of the same object and/or spatial
580	detections in neighbouring channels. These are then combined in an
581	algorithm that matches all objects judged to be ``close''. This
582	determination is made in one of two ways.
583
584	One way is to define two thresholds -- one spatial and one in velocity
585	-- and say that two objects should be merged if there is at least one
586	pair of pixels that lie within these threshold distances of each
587	other. These thresholds are specified by the parameters
588	\texttt{threshSpatial} and \texttt{threshVelocity} (in units of pixels
589	and channels respectively).
590
591	Alternatively, the spatial requirement can be changed to say that
592	there must be a pair of pixels that are \emph{adjacent} -- a stricter,
593	but perhaps more realistic requirement, particularly when the spatial pixels
594	have a large angular size (as is the case for \hi\ surveys). This
595	method can be selected by setting the parameter
596	\texttt{flagAdjacent} to 1 (\ie \texttt{true}) in the parameter file. The
597	velocity thresholding is done in the same way as the first option.
598
599	Once the detections have been merged, they may be ``grown''. This is a
600	process of increasing the size of the detection by adding adjacent
601	pixels that are above some secondary threshold. This threshold is
602	lower than the one used for the initial detection, but above the noise
603	level, so that faint pixels are only detected when they are close to a
604	bright pixel. The value of this threshold is a possible input
605	parameter (\texttt{growthCut}), with a default value of $1.5\sigma$. The
606	use of the growth algorithm is controlled by the \texttt{flagGrowth}
607	parameter -- the default value of which is \texttt{false}. If the
608	detections are grown, they are sent through the merging algorithm a
609	second time, to pick up any detections that now overlap or have grown
610	over each other.
611
612	Finally, to be accepted, the detections must span \emph{both} a minimum
613	number of channels (to remove any spurious single-channel spikes that
614	may be present), and a minimum number of spatial pixels. These
615	numbers, as for the original detection step, are set with the
616	\texttt{minChannels} and \texttt{minPix} parameters. The channel
617	requirement means there must be at least one set of \texttt{minChannels}
618	consecutive channels in the source for it to be accepted.
619
620	\section{Outputs}
621	\label{sec-output}
622
623	\subsection{During execution}
624
625	\duchamp\ provides the user with feedback whilst it is running, to
626	keep the user informed on the progress of the analysis. Most of this
627	consists of self-explanatory messages about the particular stage the
628	program is up to. The relevant parameters are printed to the screen at
629	the start (once the file has been successfully read in), so the user
630	is able to make a quick check that the setup is correct (see
631	Appendix~{app-input} for an example).
632
633	If the cube is being trimmed (\S\ref{sec-modify}), the resulting
634	dimensions are printed to indicate how much has been trimmed. If a
635	reconstruction is being done, a continually updating message shows
636	either the current iteration and scale, compared to the maximum scale
637	(when \texttt{reconDim=3}), or a progress bar showing the amount of
638	the cube that has been reconstructed (for smaller values of
639	\texttt{reconDim}).
640
641	During the searching algorithms, the progress through the 1D and 2D
642	searches are shown. When the searches have completed,
643	the number of objects found in both the 1D and 2D searches are
644	reported (see \S\ref{sec-detection} for details).
645
646	In the merging process (where multiple detections of the same object
647	are combined -- see \S\ref{sec-merger}), two stages of output
648	occur. The first is when each object in the list is compared with all
649	others. The output shows two numbers: the first being how far through
650	the list the current object is, and the second being the length of the
651	list. As the algorithm proceeds, the first number should increase and
652	the second should decrease (as objects are combined). When the numbers
653	meet (\ie the whole list has been compared), the second phase begins,
654	in which multiply-appearing pixels in each object are removed, as are
655	objects not meeting the minimum channels requirement. During this
656	phase, the total number of accepted objects is shown, which should
657	steadily increase until all have been accepted or rejected. Note that
658	these steps can be very quick for small numbers of detections.
659
660	Since this continual printing to screen has some overhead of time and
661	CPU involved, the user can elect to not print this information by
662	setting the parameter \texttt{verbose = 0}. In this case, the user is
663	still informed as to the steps being undertaken, but the details of
664	the progress are not shown.
665
666	\subsection{Results}
667
668	\subsubsection{Table of Results}
669
670	Finally, we get to the results -- the reason for running \duchamp\ in
671	the first place. Once the detection list is finalised, it is sorted by
672	the mean velocity of the detections (or, if there is no good WCS
673	associated with the cube, by the mean Z-pixel position). The results
674	are then printed to the screen and to the output file, given by the
675	\texttt{OutFile} parameter. The results list, an example of which can be
676	seen in Appendix~\ref{app-output}, contains the following columns
677	(note that the title of the columns depending on WCS information will
678	depend on the projection of the WCS):
679
680	\begin{entry}
681	\item[Obj\#] The ID number of the detection (simply the sequential
682	count for the list, which is ordered by increasing velocity).
683	\item[Name] The IAU-format name of the detection (derived from the WCS
684	position -- see below for a description of the format).
685	\item[X] The average X-pixel position.
686	\item[Y] The average Y-pixel position.
687	\item[Z] The average Z-pixel position.
688	\item[RA/GLON] The Right Ascension or Galactic Longitude of the centre
689	of the object.
690	\item[DEC/GLAT] The Declination or Galactic Latitude of the centre of
691	the object.
692	\item[VEL] The mean velocity of the object [units given by the
693	\texttt{spectralUnits} parameter].
694	\item[w\_RA/w\_GLON] The width of the object in Right Ascension or
695	Galactic Longitude [arcmin].
696	\item[w\_DEC/w\_GLAT] The width of the object in Declination Galactic
697	Latitude [arcmin].
698	\item[w\_VEL] The full velocity width of the detection (max channel
699	$-$ min channel, in velocity units [see note below]).
700	\item[F\_int] The integrated flux over the object, in the units of
701	flux times velocity, corrected for the beam if necessary.
702	\item[F\_peak] The peak flux over the object, in the units of flux.
703	\item[X1, X2] The minimum and maximum X-pixel coordinates.
704	\item[Y1, Y2] The minimum and maximum Y-pixel coordinates.
705	\item[Z1, Z2] The minimum and maximum Z-pixel coordinates.
706	\item[Npix] The number of voxels (\ie distinct $(x,y,z)$ coordinates)
707	in the detection.
708	\item[Flag] Whether the detection has any warning flags (see below).
709	\end{entry}
710	The Name is derived from the WCS position. For instance, a source
711	centred on the RA,Dec position 12$^h$53$^m$45$^s$,
712	-36$^\circ$24$'$12$''$ will be called J125345$-$362412 (if the epoch
713	is J2000) or B125345$-$362412 (if B1950). An alternative form is used
714	for Galactic coordinates: a source centred on the position ($l$,$b$) =
715	(323.1245, 5.4567) will be called G323.124$+$05.457. If the WCS is not
716	valid (\ie is not present or does not have all the necessary
717	information), the Name, RA, DEC, VEL and related columns are not
718	printed, but the pixel coordinates are still provided.
719
720	The velocity units can be specified by the user, using the parameter
721	\texttt{spectralUnits} (enter it as a single string). The default value
722	is km/s, which should be suitable for most users. These units are also
723	used to give the units of integrated flux.
724
725	The last column contains any warning flags about the detection. There
726	are currently two options here. An `E' is printed if the detection is
727	next to the edge of the image, meaning either the limit of the pixels,
728	or the limit of the non-BLANK pixel region. An `N' is printed if the
729	total flux, summed over all the (non-BLANK) pixels in the smallest box
730	that completely encloses the detection, is negative. Note that this
731	sum is likely to include non-detected pixels. It is of use in
732	pointing out detections that lie next to strongly negative pixels,
733	such as might arise due to interference -- the detected pixels might
734	then also be due to the interference, so caution is advised.
735
736	\subsubsection{Other results lists}
737
738	Two alternative results files can also be requested. One option is a
739	VOTable-format XML file, containing just the RA, Dec, Velocity and the
740	corresponding widths of the detections, as well as the fluxes. The
741	user should set \texttt{flagVOT = 1}, and put the desired filename in the
742	parameter \texttt{votFile} -- note that the default is for it not to be
743	produced. This file should be compatible with all Virtual Observatory
744	tools (such as Aladin\footnote{ Aladin can be found on the web at
745	\href{http://aladin.u-strasbg.fr/}{http://aladin.u-strasbg.fr/}}). The
746	second option is an annotation file for use with the Karma toolkit of
747	visualisation tools (in particular, with \texttt{kvis}). This will draw a
748	circle at the position of each detection, and number it according to
749	the Obj\# given above. To make use of this option, the user should
750	set \texttt{flagKarma = 1}, and put the desired filename in the parameter
751	\texttt{karmaFile} -- again, the default is for it not to be produced.
752
753	As the program is running, it also (optionally) records the detections
754	made in each individual spectrum or channel (see \S\ref{sec-detection}
755	for details on this process). This is recorded in the file given by
756	the parameter \texttt{LogFile}. This file does not include the columns
757	\texttt{Name, RA, DEC, w\_RA, w\_DEC, VEL, w\_VEL}. This file is
758	designed primarily for diagnostic purposes: \eg to see if a given set
759	of pixels is detected in, say, one channel image, but does not survive
760	the merging process. The list of pixels (and their fluxes) in the
761	final detection list are also printed to this file, again for
762	diagnostic purposes. The file also records the execution time, as well
763	as the command-line statement used to run \duchamp. The creation of
764	this log file can be prevented by setting \texttt{flagLog =
765	false}. (This may be a good idea if you are not interested in its
766	contents, as it can be a large file if many pixels are being
767	detected.)
768
769	\subsubsection{Graphical output -- spectra}
770
771	As well as the output data file, a postscript file is created that
772	shows the spectrum for each detection, together with a small cutout
773	image (the 0th moment) and basic information about the detection (note
774	that any flags are printed after the name of the detection, in the
775	format \texttt{[E]}). If the cube was reconstructed, the spectrum from
776	the reconstruction is shown in red, over the top of the original
777	spectrum. The spectral extent of the detected object is indicated by
778	two dashed blue lines, and the region covered by the ``Milky Way''
779	channels is shown by a green hashed box.
780
781	The spectrum that is plotted is governed by the
782	\texttt{spectralMethod} parameter. It can be either \texttt{peak},
783	where the spectrum is from the spatial pixel containing the
784	detection's peak flux; or \texttt{sum}, where the spectrum is summed
785	over all spatial pixels, and then corrected for the beam size.
786
787	The spectral extent of the detection is indicated with blue lines, and
788	a zoom is shown in a separate window. The cutout image can optionally
789	include a border around the spatial pixels that are in the detection
790	(turned on and off by the parameter \texttt{drawBorders} -- the
791	default is \texttt{true}). It also includes a scale bar in the bottom
792	left corner to indicate size -- it is 15~arcmin long (note that due to
793	projection effects it may be a slightly different physical length from
794	object to object). An example detection can be seen below in
795	Fig.~\ref{fig-spect}.
796
797	\begin{figure}[t]
798	\begin{center}
799	\includegraphics[width=\textwidth]{example_spectrum}
800	\end{center}
801	\caption{\footnotesize An example of the spectrum output. Note several
802	of the features discussed in the text: the red lines indicating the
803	reconstructed spectrum; the blue dashed lines indicating the
804	spectral extent of the detection; the green hashed area indicating
805	the Milky Way channels that are ignored by the searching algorithm;
806	the blue border showing its spatial extent on the 0th moment map;
807	and the 15~arcmin-long scale bar.}
808	\label{fig-spect}
809	\end{figure}
810
811	\subsubsection{Graphical output -- maps}
812
813	\begin{figure}[!t]
814	\begin{center}
815	\includegraphics[width=\textwidth]{example_moment_map}
816	\end{center}
817	\caption{\footnotesize An example of the moment map created by
818	\duchamp. The full extent of the cube is covered, and the 0th moment
819	of each object is shown (integrated individually over all the
820	detected channels).}
821	\label{fig-moment}
822	\end{figure}
823
824	Finally, a couple of images are optionally produced: a 0th moment map
825	of the cube, combining just the detected channels in each object,
826	showing the integrated flux in grey-scale; and a ``detection image'',
827	a grey-scale image where the pixel values are the number of channels
828	that spatial pixel is detected in. In both cases, if
829	\texttt{drawBorders = true}, a border is drawn around the spatial
830	extent of each detection. An example moment map is shown in
831	Fig.~\ref{fig-moment}. The production or otherwise of these images is
832	governed by the \texttt{flagMaps} parameter.
833
834	The purpose of these images are to provide a visual guide to where the
835	detections have been made, and, particularly in the case of the moment
836	map, to provide an indication of the strength of the source. In both
837	cases, the detections are numbered (in the same sense as the output
838	list), and the spatial borders are marked out as for the cutout images
839	in the spectra file. Both these images are saved as postscript files
840	(given by the parameters \texttt{momentMap} and \texttt{detectionMap}
841	respectively), with the latter also displayed in a {\sc pgplot}
842	window (regardless of the state of \texttt{flagMaps}).
843
844	\section{Notes and hints on the use of \duchamp}
845	\label{sec-notes}
846
847	In using \duchamp, the user has to make a number of decisions about
848	the way the program runs. This section is designed to give the user
849	some idea about what to choose.
850
851	The main choice is whether or not to use the wavelet
852	reconstruction. The main benefits of this are the marked reduction in
853	the noise level, leading to regularly-shaped detections, and good
854	reliability for faint sources. The main drawback with its use is the
855	long execution time: to reconstruct a $170\times160\times1024$
856	(\hipass) cube often requires three iterations and takes about 20-25
857	minutes to run completely. Note that this is for the three-dimensional
858	reconstruction: using \texttt{reconDim=1} makes the reconstruction
859	quicker (the full program then takes about 6 minutes), but it is still
860	the largest part of the time.
861
862	The searching part of the procedure is much quicker: searching an
863	un-reconstructed cube leads to execution times of only a couple of
864	minutes. Alternatively, using the ability to read in previously-saved
865	reconstructed arrays makes running the reconstruction more than once a
866	more feasible prospect.
867
868	On the positive side, the shape of the detections in a cube that has
869	been reconstructed will be much more regular and smooth -- the ragged
870	edges that objects in the raw cube possess are smoothed by the removal
871	of most of the noise. This enables better determination of the shapes
872	and characteristics of objects.
873
874	A further point to consider when using the reconstruction is that if
875	the two-dimensional reconstruction is chosen (\texttt{reconDim=2}), it
876	can be susceptible to edge effects. If the valid area in the cube (\ie
877	the part that is not BLANK) has non-rectangular edges, the convolution
878	can produce artefacts in the reconstruction that mimic the edges and
879	can lead (depending on the selection threshold) to some spurious
880	sources. Caution is advised with such data -- the user is advised to
881	check carefully the reconstructed cube for the presence of such
882	artefacts. Note, however, that the 1- and 3-dimensional
883	reconstructions are \emph{not} susceptible in the same way, since the
884	spectral direction does not generally exhibit these BLANK edges, and
885	so we recommend the use of either of these.
886
887	If one chooses the reconstruction method, a further decision is
888	required on the signal-to-noise cutoff used in determining acceptable
889	wavelet coefficients. A larger value will remove more noise from the
890	cube, at the expense of losing fainter sources, while a smaller value
891	will include more noise, which may produce spurious detections, but
892	will be more sensitive to faint sources. Values of less than about
893	$3\sigma$ tend to not reduce the noise a great deal and can lead to
894	many spurious sources (although this will depend on the nature of the
895	cube).
896
897	When it comes to searching, the FDR method produces more reliable results
898	than simple sigma-clipping, particularly in the absence of reconstruction.
899	However, it does not work in exactly the way one would expect for a
900	given value of \texttt{alpha}. For instance, setting fairly liberal values
901	of \texttt{alpha} (say, 0.1) will often lead to a much smaller fraction
902	of false detections (\ie much less than 10\%). This is the effect of the
903	merging algorithms, that combine the sources after the detection stage,
904	and reject detections not meeting the minimum pixel or channel requirements.
905	It is thus better to aim for larger \texttt{alpha} values than those derived
906	from a straight conversion of the desired false detection rate.
907
908	Finally, as \duchamp\ is still undergoing development, there are some
909	elements that are not fully developed. In particular, it is not as
910	clever as I would like at avoiding interference. The ability to place
911	requirements on the minimum number of channels and pixels partially
912	circumvents this problem, but work is being done to make \duchamp\
913	smarter at rejecting signals that are clearly (to a human eye at
914	least) interference. See the following section for further
915	improvements that are planned.
916
917	\section{Future Developments}
918
919	This is both a list of planned improvements and a wish-list of
920	features that would be nice to include (but are not planned in the
921	immediate future). Let me know if there are items not on this list, or
922	items on the list you would like prioritised.
923
924	\begin{itemize}
925
926	\item Better determination of the noise characteristics of
927	spectral-line cubes, including understanding how the noise is
928	generated and developing a model for it. \textbf{Planned.}
929
930	\item Include more source analysis. Examples could be: shape
931	information; measurements of HI mass; more variety of measurements
932	of velocity width and profile. \textbf{Some planned.}
933
934	\item Provide some indication of the significance of the detection
935	(\ie some S/N-like value). \textbf{Planned.}
936
937	\item Improved ability to reject interference, possibly on the
938	spectral shape of features. \textbf{Planned.}
939
940	\item Ability to separate (de-blend) distinct sources that have been
941	merged. \textbf{Planned.}
942
943	\item Link to lists of possible counterparts (\eg via NED/SIMBAD/other
944	VO tools?). \textbf{Wish-list.}
945
946	\item On-line web service interface, so a user can upload a cube and
947	get back a source-list. \textbf{Wish-list}.
948
949	\item Embed \duchamp\ in a GUI, to move away from the text-based
950	interaction. \textbf{Wish-list}.
951	\end{itemize}
952
953
954	%\bibliographystyle{mn2e}
955	%\bibliographystyle{abbrvnat}
956	%\bibliography{mnrasmnemonic,sourceDetection}
957	\begin{thebibliography}{}
958
959	\bibitem[\protect\citeauthoryear{{Calabretta} \& {Greisen}}{{Calabretta} \&
960	{Greisen}}{2002}]{calabretta02}
961	{Calabretta} M., {Greisen} E., 2002, A\&A, 395, 1077
962
963	\bibitem[\protect\citeauthoryear{{Greisen} \& {Calabretta}}{{Greisen} \&
964	{Calabretta}}{2002}]{greisen02}
965	{Greisen} E., {Calabretta} M., 2002, A\&A, 395, 1061
966
967	\bibitem[\protect\citeauthoryear{{Hanisch}, {Farris}, {Greisen}, {Pence},
968	{Schlesinger}, {Teuben}, {Thompson} \& {Warnock}}{{Hanisch}
969	et~al.}{2001}]{hanisch01}
970	{Hanisch} R., {Farris} A., {Greisen} E., {Pence} W., {Schlesinger} B.,
971	{Teuben} P., {Thompson} R., {Warnock} A., 2001, A\&A, 376, 359
972
973	\bibitem[\protect\citeauthoryear{{Hopkins}, {Miller}, {Connolly}, {Genovese},
974	{Nichol} \& {Wasserman}}{{Hopkins} et~al.}{2002}]{hopkins02}
975	{Hopkins} A., {Miller} C., {Connolly} A., {Genovese} C., {Nichol} R.,
976	{Wasserman} L., 2002, AJ, 123, 1086
977
978	\bibitem[\protect\citeauthoryear{Lutz}{Lutz}{1980}]{lutz80}
979	Lutz R., 1980, The Computer Journal, 23, 262
980
981	\bibitem[\protect\citeauthoryear{{Meyer} et~al.,}{{Meyer}
982	et~al.}{2004}]{meyer04:trunc}
983	{Meyer} M., et~al., 2004, MNRAS, 350, 1195
984
985	\bibitem[\protect\citeauthoryear{{Miller}, {Genovese}, {Nichol}, {Wasserman},
986	{Connolly}, {Reichart}, {Hopkins}, {Schneider} \& {Moore}}{{Miller}
987	et~al.}{2001}]{miller01}
988	{Miller} C., {Genovese} C., {Nichol} R., {Wasserman} L., {Connolly} A.,
989	{Reichart} D., {Hopkins} A., {Schneider} J., {Moore} A., 2001, AJ, 122,
990	3492
991
992	\bibitem[\protect\citeauthoryear{Minchin}{Minchin}{1999}]{minchin99}
993	Minchin R., 1999, PASA, 16, 12
994
995	\bibitem[\protect\citeauthoryear{Starck \& Murtagh}{Starck \&
996	Murtagh}{2002}]{starck02:book}
997	Starck J.-L., Murtagh F., 2002, {``Astronomical Image and Data Analysis''}.
998	Springer
999
1000	\end{thebibliography}
1001
1002
1003	\appendix
1004	\newpage
1005	\section{Obtaining and Installing \duchamp}
1006
1007	The \duchamp\ web page can be found at the following location:\\
1008	\href{http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp}%
1009	{http://www.atnf.csiro.au/people/Matthew.Whiting/Duchamp}\\
1010	Here you can find a gzipped tar archive of the source code that can be
1011	downloaded and extracted, as well as this User's Guide in postscript
1012	and hyperlinked PDF formats.
1013
1014	\duchamp\ can be built on Unix systems by typing (assuming that the
1015	prompt your terminal provides is a \texttt{> } -- don't type this
1016	character!):
1017	\begin{quote}
1018	\texttt{%
1019	> ./configure\\
1020	> make\\
1021	> make clean (optional -- to remove the object files)}
1022	\end{quote}
1023
1024	Run in this manner, \texttt{configure} should find all the necessary
1025	libraries, but if some libraries have been installed in non-standard
1026	locations, it may fail. In this case, you can specify additional
1027	directories to look in by giving extra command-line arguments. There
1028	are separate options for library files (eg. libcpgplot.a) and header
1029	files (eg. cpgplot.h).
1030
1031	For example, if \textsc{wcslib} had been installed in
1032	\texttt{/home/mduchamp/wcslib}, there are two libraries that are
1033	likely to be in separate subdirectories: \texttt{C/} and
1034	\texttt{pgsbox/}. Each subdirectory needs to be searched for library
1035	and header files, so one could build Duchamp by typing:
1036	\begin{quote}
1037	\texttt{%
1038	> ./configure $\backslash$ \\
1039	LIBDIRS="/home/mduchamp/wcslib/C /home/mduchamp/wcslib/pgsbox"
1040	$\backslash$\\
1041	INCDIRS="/home/mduchamp/wcslib/C /home/mduchamp/wcslib/pgsbox"}
1042	\end{quote}
1043	And then just run make in the usual fashion:
1044	\begin{quote}
1045	\texttt{> make}
1046	\end{quote}
1047
1048	This will produce the executable \texttt{Duchamp}. There are two
1049	possible ways to run it. The first is:
1050	\begin{quote}
1051	\texttt{> Duchamp -f [FITS file]}
1052	\end{quote}
1053	where \texttt{[FITS file]} is the file you wish to search. This method
1054	simply uses the default values of all parameters.
1055
1056	The second method allows some determination of the parameter values by
1057	the user. Type:
1058	\begin{quote}
1059	\texttt{> Duchamp -p [parameter file]}
1060	\end{quote}
1061	where \texttt{[parameterFile]} is a file with the input parameters,
1062	including the name of the cube you want to search. There are two
1063	example input files included with the distribution. The smaller one,
1064	\texttt{InputExample}, shows the typical parameters one might want to
1065	set. The large one, \texttt{InputComplete}, lists all possible
1066	parameters that can be entered, and a brief description of them. To
1067	get going quickly, just replace the "your-file-here" in
1068	\texttt{InputExample} with your image name, and type
1069	\begin{quote}
1070	\texttt{> Duchamp -p InputExample}
1071	\end{quote}
1072
1073	The following appendices provide details on the individual parameters,
1074	and show examples of the output files that \duchamp\ produces.
1075
1076	\newpage
1077	\section{Available parameters}
1078	\label{app-param}
1079
1080	The full list of parameters that can be listed in the input file are
1081	given here. If not listed, they take the default value given in
1082	parentheses. Since the order of the parameters in the input file does
1083	not matter, they are grouped here in logical sections.
1084
1085	\subsection*{Input-output related}
1086	\begin{entry}
1087	\item[ImageFile (no default assumed)] The filename of the
1088	data cube to be analysed.
1089	\item[flagSubsection \texttt{[false]}] A flag to indicate whether one
1090	wants a subsection of the requested image.
1091	\item[Subsection \texttt{[ [,,*] ]}] The requested subsection, which
1092	should be specified in the format \texttt{[x1:x2,y1:y2,z1:z2]}, where
1093	the limits are inclusive. If the full range of a dimension is
1094	required, use a \texttt{*}, \eg if you want the full spectral range of
1095	a subsection of the image, use \texttt{[30:140,30:140,*]}.
1096	\item[flagReconExists \texttt{[false]}] A flag to indicate whether the
1097	reconstructed array has been saved by a previous run of \duchamp. If
1098	set true, the reconstructed array will be read from the file given by
1099	\texttt{reconFile}, rather than calculated directly.
1100	\item[reconFile (no default assumed)] The FITS file that contains the
1101	reconstructed array. If \texttt{flagReconExists} is true and this
1102	parameter is not defined, the default file searched will be
1103	determined by the \atrous\ parameters (see \S\ref{sec-recon}).
1104	\item[OutFile \texttt{[duchamp-Results.txt]}] The file containing the
1105	final list of detections. This also records the list of input
1106	parameters.
1107	\item[SpectraFile \texttt{[duchamp-Spectra.ps]}] The postscript file
1108	containing the resulting integrated spectra and images of the
1109	detections.
1110	\item[flagLog \texttt{[true]}] A flag to indicate whether intermediate
1111	detections should be logged.
1112	\item[LogFile \texttt{[duchamp-Logfile.txt]}] The file in which intermediate
1113	detections are logged. These are detections that have not been
1114	merged. This is primarily for use in debugging and diagnostic
1115	purposes -- normal use of the program will probably not require
1116	this.
1117	\item[flagOutputRecon \texttt{[false]}] A flag to say whether or not to
1118	save the reconstructed cube as a FITS file. The filename will be
1119	derived from the ImageFile -- the reconstruction of \texttt{image.fits}
1120	will be saved as \texttt{image.RECON?.fits}, where \texttt{?} stands for
1121	the value of \texttt{snrRecon} (see below).
1122	\item[flagOutputResid \texttt{[false]}] As for \texttt{flagOutputRecon}, but
1123	for the residual array -- the difference between the original cube
1124	and the reconstructed cube. The filename will be \texttt{image.RESID?.fits}.
1125	\item[flagVOT \texttt{[false]}] A flag to say whether to create a VOTable
1126	file corresponding to the information in \texttt{outfile}. This will be
1127	an XML file in the Virtual Observatory VOTable format.
1128	\item[votFile \texttt{[duchamp-Results.xml]}] The VOTable file with the
1129	list of final detections. Some input parameters are also recorded.
1130	\item[flagKarma \texttt{[false]}] A flag to say whether to create a
1131	Karma annotation file corresponding to the information in
1132	\texttt{outfile}. This can be used as an overlay for the Karma
1133	programs such as \texttt{kvis}.
1134	\item[karmaFile \texttt{[duchamp-Results.ann]}] The Karma annotation
1135	file showing the list of final detections.
1136	\item[flagMaps \texttt{[true]}] A flag to say whether to save
1137	postscript files showing the 0th moment map of the whole cube
1138	(parameter \texttt{momentMap}) and the detection image
1139	(\texttt{detectionMap}).
1140	\item[momentMap \texttt{[duchamp-MomentMap.ps]}] A postscript file
1141	containing a map of the 0th moment of the detected sources, as well
1142	as pixel and WCS coordinates.
1143	\item[detectionMap \texttt{[duchamp-DetectionMap.ps]}] A postscript
1144	file showing each of the detected objects, coloured in greyscale by
1145	the number of channels spanned by each pixel. Also shows pixel and WCS
1146	coordinates.
1147	\end{entry}
1148
1149	\subsection*{Modifying the cube}
1150	\begin{entry}
1151	\item[flagBlankPix \texttt{[true]}] A flag to say whether to remove BLANK
1152	pixels from the analysis -- these are pixels set to some particular
1153	value because they fall outside the imaged area.
1154	\item[blankPixValue \texttt{[-8.00061]}] The value of the BLANK pixels,
1155	if this information is not contained in the FITS header (the usual
1156	procedure is to obtain this value from the header information -- in
1157	which case the value set by this parameter is ignored).
1158	\item[flagMW \texttt{[false]}] A flag to say whether to ignore channels
1159	contaminated by Milky Way (or other) emission -- the searching
1160	algorithms will not look at these channels.
1161	\item[maxMW \texttt{[112]}] The maximum channel number containing
1162	``Milky Way'' emission.
1163	\item[minMW \texttt{[75]}] The minimum channel number containing
1164	``Milky Way'' emission. Note that the range specified by
1165	\texttt{maxMW} and \texttt{minMW} is inclusive.
1166	\item[flagBaseline \texttt{[false]}] A flag to say whether to remove the
1167	baseline from each spectrum in the cube for the purposes of
1168	reconstruction and detection.
1169	\end{entry}
1170
1171	\subsection*{Detection related}
1172
1173	\subsubsection*{General detection}
1174	\begin{entry}
1175	\item[flagNegative \texttt{[false]}] A flag to indicate that the features
1176	being searched for are negative. The cube will be inverted prior to
1177	searching.
1178	\item[snrCut \texttt{[3.]}] The cut-off value for thresholding, in terms
1179	of number of $\sigma$ above the mean.
1180	\item[flagGrowth \texttt{[false]}] A flag indicating whether or not to
1181	grow the detected objects to a smaller threshold.
1182	\item[growthCut \texttt{[2.]}] The smaller threshold using in growing
1183	detections. In units of $\sigma$ above the mean.
1184	\end{entry}
1185
1186	\subsubsection*{\Atrous\ reconstruction}
1187	\begin{entry}
1188	\item [flagATrous \texttt{[true]}] A flag indicating whether or not to
1189	reconstruct the cube using the \atrous\ wavelet
1190	reconstruction. See \S\ref{sec-recon} for details.
1191	\item[reconDim \texttt{[3]}] The number of dimensions to use in the
1192	reconstruction. 1 means reconstruct each spectrum separately, 2
1193	means each channel map is done separately, and 3 means do the whole
1194	cube in one go.
1195	\item[scaleMin \texttt{[1]}] The minimum wavelet scale to be used in the
1196	reconstruction. A value of 1 means ``use all scales''.
1197	\item[snrRecon \texttt{[4]}] The thresholding cutoff used in the
1198	reconstruction -- only wavelet coefficients this many $\sigma$ above
1199	the mean (or greater) are included in the reconstruction.
1200	\item[filterCode \texttt{[1]}] The code number of the filter to use in
1201	the reconstruction. The options are:
1202	\begin{itemize}
1203	\item \textbf{1:} B$_3$-spline filter: coefficients =
1204	$(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$
1205	\item \textbf{2:} Triangle filter: coefficients =
1206	$(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
1207	\item \textbf{3:} Haar wavelet: coefficients =
1208	$(0, \frac{1}{2}, \frac{1}{2})$
1209	\end{itemize}
1210	\end{entry}
1211
1212	\subsubsection*{FDR method}
1213	\begin{entry}
1214	\item[flagFDR \texttt{[false]}] A flag indicating whether or not to use
1215	the False Discovery Rate method in thresholding the pixels.
1216	\item[alphaFDR \texttt{[0.01]}] The $\alpha$ parameter used in the FDR
1217	analysis. The average number of false detections, as a fraction of the
1218	total number, will be less than $\alpha$ (see \S\ref{sec-detection}).
1219	\end{entry}
1220
1221	\subsubsection*{Merging detections}
1222	\begin{entry}
1223	\item[minPix \texttt{[2]}] The minimum number of spatial pixels for a single
1224	detection to be counted.
1225	\item[minChannels \texttt{[3]}] The minimum number of consecutive
1226	channels that must be present in a detection.
1227	\item[flagAdjacent \texttt{[true]}] A flag indicating whether to use the
1228	``adjacent pixel'' criterion to decide whether to merge objects. If
1229	not, the next two parameters are used to determine whether objects
1230	are within the necessary thresholds.
1231	\item[threshSpatial \texttt{[3.]}] The maximum allowed minimum spatial
1232	separation (in pixels) between two detections for them to be merged
1233	into one. Only used if \texttt{flagAdjacent = false}.
1234	\item[threshVelocity \texttt{[7.]}] The maximum allowed minimum channel
1235	separation between two detections for them to be merged into
1236	one.
1237	\end{entry}
1238
1239	\subsubsection*{Other parameters}
1240	\begin{entry}
1241	\item[spectralMethod \texttt{[peak]}] This indicates which method is used
1242	to plot the output spectra: \texttt{peak} means plot the spectrum
1243	containing the detection's peak pixel; \texttt{sum} means sum the
1244	spectra of each detected spatial pixel, and correct for the beam
1245	size. Any other choice defaults to \texttt{peak}.
1246	\item[spectralUnits \texttt{[km/s]}] The user can specify the units of
1247	the spectral axis. Assuming the WCS of the FITS file is valid, the
1248	spectral axis is transformed into velocity, and put into these units
1249	for all output and for calculations such as the integrated flux of a
1250	detection.
1251	\item[drawBorders \texttt{[true]}] A flag indicating whether borders
1252	are to be drawn around the detected objects in the moment maps
1253	included in the output (see for example Fig.~\ref{fig-spect}).
1254	\item[verbose \texttt{[true]}] A flag indicating whether to print the
1255	progress of computationally-intensive algorithms (such as the
1256	searching and merging) to screen.
1257	\end{entry}
1258
1259
1260	\newpage
1261	\section{Example parameter files}
1262	\label{app-input}
1263
1264	This is what a typical parameter file would look like.
1265
1266	\begin{verbatim}
1267	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1268	logFile logfile.txt
1269	outFile results.txt
1270	spectraFile spectra.ps
1271	flagSubsection 0
1272	flagOutputRecon 0
1273	flagOutputResid 0
1274	flagBlankPix 1
1275	flagMW 1
1276	minMW 75
1277	maxMW 112
1278	minPix 3
1279	flagGrowth 1
1280	growthCut 1.5
1281	flagATrous 0
1282	scaleMin 1
1283	snrRecon 4
1284	flagFDR 1
1285	alphaFDR 0.1
1286	numPixPSF 20
1287	snrCut 3
1288	threshSpatial 3
1289	threshVelocity 7
1290	\end{verbatim}
1291
1292	Note that it is not necessary to include all these parameters in the
1293	file, only those that need to be changed from the defaults (as listed
1294	in Appendix~\ref{app-param}), which in this case would be very few. A
1295	minimal parameter file might look like:
1296	\begin{verbatim}
1297	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1298	flagLog 0
1299	snrRecon 3
1300	snrCut 2.5
1301	minChannels 4
1302	\end{verbatim}
1303	This will reconstruct the cube with a lower SNR value than the
1304	default, select objects at a lower threshold, with a looser minimum
1305	channel requirement, and not keep a log of the intermediate
1306	detections.
1307
1308	The following page demonstrates how the parameters are presented to the
1309	user, both on the screen at execution time, and in the output and log
1310	files. On each line, there is a description on the parameter, the relevant
1311	parameter name that is used in the input file (if there is one that the
1312	user can enter), and the value of the parameter being used.
1313	\newpage
1314	\begin{landscape}
1315	Typical presentation of parameters in output and log files:
1316	\begin{verbatim}
1317	---- Parameters ----
1318	Image to be analysed.........................[imageFile] = input.fits
1319	Intermediate Logfile...........................[logFile] = duchamp-Logfile.txt
1320	Final Results file.............................[outFile] = duchamp-Results.txt
1321	Spectrum file..............................[spectraFile] = duchamp-Spectra.ps
1322	0th Moment Map...............................[momentMap] = duchamp-MomentMap.ps
1323	Detection Map.............................[detectionMap] = duchamp-DetectionMap.ps
1324	Saving reconstructed cube?.............[flagoutputrecon] = false
1325	Saving residuals from reconstruction?..[flagoutputresid] = false
1326	------
1327	Searching for Negative features?..........[flagNegative] = false
1328	Fixing Blank Pixels?......................[flagBlankPix] = true
1329	Blank Pixel Value....................................... = -8.00061
1330	Removing Milky Way channels?....................[flagMW] = true
1331	Milky Way Channels.......................[minMW - maxMW] = 75-112
1332	Beam Size (pixels)...................................... = 10.1788
1333	Removing baselines before search?.........[flagBaseline] = false
1334	Minimum # Pixels in a detection.................[minPix] = 2
1335	Minimum # Channels in a detection..........[minChannels] = 3
1336	Growing objects after detection?............[flagGrowth] = false
1337	Using A Trous reconstruction?...............[flagATrous] = true
1338	Number of dimensions in reconstruction........[reconDim] = 3
1339	Minimum scale in reconstruction...............[scaleMin] = 1
1340	SNR Threshold within reconstruction...........[snrRecon] = 4
1341	Filter being used for reconstruction........[filterCode] = 1 (B3 spline function)
1342	Using FDR analysis?............................[flagFDR] = false
1343	SNR Threshold...................................[snrCut] = 3
1344	Using Adjacent-pixel criterion?...........[flagAdjacent] = true
1345	Max. velocity separation for merging....[threshVelocity] = 7
1346	Method of spectral plotting.............[spectralMethod] = peak
1347	\end{verbatim}
1348
1349	\newpage
1350	\section{Example results file}
1351	\label{app-output}
1352	This the typical content of an output file, after running \duchamp\
1353	with the parameters illustrated on the previous page.
1354
1355	{\scriptsize
1356	\begin{verbatim}
1357	Results of the \duchamp\ source finder: Tue May 23 14:51:38 2006
1358	---- Parameters ----
1359	(... omitted for clarity -- see previous page for examples...)
1360	--------------------
1361	Total number of detections = 25
1362	--------------------
1363	------------------------------------------------------------------------------------------------------------------------------------------------------
1364	Obj# Name X Y Z RA DEC VEL w_RA w_DEC w_VEL F_int F_peak X1 X2 Y1 Y2 Z1 Z2 Npix Flag
1365	[km/s] [arcmin] [arcmin] [km/s] [Jy km/s] [Jy/beam] [pix]
1366	------------------------------------------------------------------------------------------------------------------------------------------------------
1367	1 J0618-2532 30.2 86.0 113.3 06:18:12.54 -25:32:44.79 208.502 45.17 34.61 26.383 24.394 0.350 25 35 82 90 112 114 137 E
1368	2 J0609-2156 59.5 140.6 114.6 06:09:19.66 -21:56:31.20 225.572 44.39 31.47 65.957 16.128 0.213 55 65 137 144 113 118 153
1369	3 J0545-2143 141.2 143.2 114.8 05:45:51.71 -21:43:36.20 228.470 19.61 16.66 26.383 2.412 0.090 139 143 142 145 114 116 29
1370	4 J0617-2633 33.3 70.8 115.6 06:17:25.52 -26:33:33.83 238.736 65.02 30.10 26.383 9.776 0.117 26 41 68 75 115 117 104 E
1371	5 J0601-2500 86.2 94.9 117.9 06:01:39.54 -25:00:32.46 269.419 27.99 24.02 26.383 3.920 0.124 83 89 92 97 117 119 44
1372	6 J0602-2547 84.0 83.1 118.0 06:02:18.29 -25:47:31.69 270.319 20.01 19.99 26.383 2.999 0.118 82 86 81 85 117 119 34
1373	7 J0547-2448 133.0 97.2 118.7 05:47:52.53 -24:48:38.16 279.113 19.72 12.54 26.383 1.474 0.074 131 135 96 98 118 120 21
1374	8 J0606-2719 71.1 60.0 121.3 06:06:10.99 -27:19:48.61 314.090 52.36 39.59 39.574 14.268 0.150 65 77 55 64 120 123 154
1375	9 J0611-2137 52.4 145.3 162.5 06:11:20.92 -21:37:29.57 857.955 32.39 23.49 118.722 43.178 0.410 49 56 142 147 158 167 265 E
1376	10 J0600-2859 89.7 35.3 202.4 06:00:34.08 -28:59:00.43 1383.160 23.93 24.10 171.487 24.439 0.173 87 92 33 38 196 209 271
1377	11 J0558-2638 95.4 70.3 223.1 05:58:53.03 -26:38:45.91 1656.140 11.93 12.07 92.339 1.045 0.063 94 96 69 71 220 227 18
1378	12 J0617-2723 34.7 58.3 227.4 06:17:07.07 -27:23:50.65 1712.868 16.75 23.53 290.209 8.529 0.093 33 36 56 61 215 237 118
1379	13 J0558-2525 95.8 88.6 231.7 05:58:49.27 -25:25:33.60 1770.134 27.87 24.16 237.444 12.863 0.115 92 98 86 91 221 239 175
1380	14 J0600-2141 88.8 144.4 232.5 06:00:54.02 -21:41:57.06 1780.188 27.96 24.13 224.252 30.743 0.166 86 92 142 147 222 239 344 E
1381	15 J0615-2634 40.0 70.8 232.6 06:15:25.50 -26:34:20.04 1782.214 12.44 15.69 52.765 2.084 0.068 39 41 69 72 231 235 31
1382	16 J0604-2606 76.0 78.4 233.0 06:04:41.13 -26:06:21.19 1787.226 24.13 23.87 211.061 23.563 0.155 73 78 76 81 225 241 278
1383	17 J0601-2340 87.9 114.9 235.8 06:01:08.83 -23:40:19.37 1824.122 31.95 28.09 237.444 82.380 0.297 85 92 112 118 227 245 647
1384	18 J0615-2235 38.2 130.5 254.5 06:15:32.09 -22:35:37.24 2070.934 12.29 11.70 105.531 1.555 0.070 37 39 129 131 249 257 24
1385	19 J0617-2305 31.4 122.8 258.1 06:17:33.45 -23:05:28.94 2118.752 12.34 11.65 26.383 1.022 0.062 30 32 122 124 257 259 16
1386	20 J0612-2149 49.6 142.2 270.3 06:12:11.04 -21:49:29.72 2279.926 16.27 15.73 395.740 15.156 0.101 48 51 141 144 257 287 204
1387	21 J0616-2133 35.3 146.0 300.6 06:16:15.78 -21:33:09.69 2679.148 20.22 7.47 224.252 3.014 0.127 33 37 145 146 294 311 28 E
1388	22 J0555-2956 107.3 20.9 367.6 05:55:08.02 -29:56:09.08 3562.236 19.71 20.30 39.574 5.891 0.169 105 109 19 23 366 369 58
1389	23 J0557-2246 99.8 128.2 434.0 05:57:43.77 -22:46:42.95 4438.776 11.88 16.12 105.531 1.703 0.167 99 101 127 130 430 438 17 N
1390	24 J0616-2648 38.1 67.2 546.8 06:16:02.10 -26:48:35.49 5926.464 12.35 11.67 26.383 1.276 0.064 37 39 66 68 546 548 18
1391	25 J0552-2916 117.0 30.5 727.0 05:52:13.64 -29:16:58.02 8303.952 11.59 20.25 303.400 35.523 0.479 116 118 28 32 716 739 111
1392	\end{verbatim}
1393	}
1394	Note that the
1395	width of the table can make it hard to read. A good trick for those
1396	using UNIX/Linux is to make use of the \texttt{a2ps} command. The
1397	following works well, producing a postscript file \texttt{results.ps}:
1398	\\\verb\|a2ps -1 -r -f8 -o duchamp-Results.ps duchamp-Results.txt\|
1399
1400	%\end{landscape}
1401
1402	\newpage
1403	\section{Example VOTable output}
1404	\label{app-votable}
1405	This is part of the VOTable, in XML format, corresponding to the
1406	output file in Appendix~\ref{app-output} (the indentation has been
1407	removed to make it fit on the page).
1408
1409	%\begin{landscape}
1410	{\scriptsize
1411	\begin{verbatim}
1412	<?xml version="1.0"?>
1413	<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1414	xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
1415	<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
1416	<RESOURCE name="Duchamp Output">
1417	<TABLE name="Detections">
1418	<DESCRIPTION>Detected sources and parameters from running the Duchamp source finder.</DESCRIPTION>
1419	<PARAM name="FITS file" datatype="char" ucd="meta.file;meta.fits" value="/DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits"/>
1420	<PARAM name="Threshold" datatype="float" ucd="stat.snr" value="2.5">
1421	<PARAM name="ATrous note" datatype="char" ucd="meta.note" value="The a trous reconstruction method was used, with the following parameters.">
1422	<PARAM name="ATrous Dimension" datatype="int" ucd="meta.code;stat" value="3">
1423	<PARAM name="ATrous Cut" datatype="float" ucd="stat.snr" value="4">
1424	<PARAM name="ATrous Minimum Scale" datatype="int" ucd="stat.param" value="1">
1425	<PARAM name="ATrous Filter" datatype="char" ucd="meta.code;stat" value="B3 spline function">
1426	<FIELD name="ID" ID="col1" ucd="meta.id" datatype="int" width="4"/>
1427	<FIELD name="Name" ID="col2" ucd="meta.id;meta.main" datatype="char" arraysize="14"/>
1428	<FIELD name="RA" ID="col3" ucd="pos.eq.ra;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1429	<FIELD name="Dec" ID="col4" ucd="pos.eq.dec;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1430	<FIELD name="w_RA" ID="col3" ucd="phys.angSize;pos.eq.ra" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1431	<FIELD name="w_Dec" ID="col4" ucd="phys.angSize;pos.eq.dec" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1432	<FIELD name="Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc" datatype="float" width="9" precision="3" unit="km/s"/>
1433	<FIELD name="w_Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc;spect.line.width" datatype="float" width="8" precision="3" unit="km/s"/>
1434	<FIELD name="Integrated_Flux" ID="col4" ucd="phys.flux;spect.line.intensity" datatype="float" width="10" precision="3" unit="km/s"/>
1435	<DATA>
1436	<TABLEDATA>
1437	<TR>
1438	<TD> 1</TD><TD> J0609-2200</TD><TD> 92.410416</TD><TD>-22.013390</TD><TD> 48.50</TD><TD> 39.42</TD><TD> 213.061</TD><TD> 65.957</TD><TD> 17.572</TD>
1439	</TR>
1440	<TR>
1441	<TD> 2</TD><TD> J0608-2605</TD><TD> 92.042633</TD><TD>-26.085157</TD><TD> 44.47</TD><TD> 39.47</TD><TD> 233.119</TD><TD> 39.574</TD><TD> 4.144</TD>
1442	</TR>
1443	<TR>
1444	<TD> 3</TD><TD> J0606-2724</TD><TD> 91.637840</TD><TD>-27.412022</TD><TD> 52.48</TD><TD> 47.57</TD><TD> 302.213</TD><TD> 39.574</TD><TD> 17.066</TD>
1445	</TR>
1446	(... table truncated for clarity ...)
1447	</TABLEDATA>
1448	</DATA>
1449	</TABLE>
1450	</RESOURCE>
1451	</VOTABLE>
1452	\end{verbatim}
1453	}
1454	\end{landscape}
1455
1456	\newpage
1457	\section{Example Karma Annotation File output}
1458	\label{app-karma}
1459
1460	This is the format of the Karma Annotation file, showing the locations
1461	of the detected objects. This can be loaded by the plotting tools of
1462	the Karma package (for instance, \texttt{kvis}) as an overlay on the FITS
1463	file.
1464
1465	\begin{verbatim}
1466	# Duchamp Source Finder results for
1467	# cube /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1468	COLOR RED
1469	COORD W
1470	CIRCLE 92.3376 -21.9475 0.403992
1471	TEXT 92.3376 -21.9475 1
1472	CIRCLE 91.9676 -26.0193 0.37034
1473	TEXT 91.9676 -26.0193 2
1474	CIRCLE 91.5621 -27.3459 0.437109
1475	TEXT 91.5621 -27.3459 3
1476	CIRCLE 92.8285 -21.6344 0.269914
1477	TEXT 92.8285 -21.6344 4
1478	CIRCLE 90.1381 -28.9838 0.234179
1479	TEXT 90.1381 -28.9838 5
1480	CIRCLE 89.72 -26.6513 0.132743
1481	TEXT 89.72 -26.6513 6
1482	CIRCLE 94.2743 -27.4003 0.195175
1483	TEXT 94.2743 -27.4003 7
1484	CIRCLE 92.2739 -21.6941 0.134538
1485	TEXT 92.2739 -21.6941 8
1486	CIRCLE 89.7133 -25.4259 0.232252
1487	TEXT 89.7133 -25.4259 9
1488	CIRCLE 90.2206 -21.6993 0.266247
1489	TEXT 90.2206 -21.6993 10
1490	CIRCLE 93.8581 -26.5766 0.163153
1491	TEXT 93.8581 -26.5766 11
1492	CIRCLE 91.176 -26.1064 0.234356
1493	TEXT 91.176 -26.1064 12
1494	CIRCLE 90.2844 -23.6716 0.299509
1495	TEXT 90.2844 -23.6716 13
1496	CIRCLE 93.8774 -22.581 0.130925
1497	TEXT 93.8774 -22.581 14
1498	CIRCLE 94.3882 -23.0934 0.137108
1499	TEXT 94.3882 -23.0934 15
1500	CIRCLE 93.0491 -21.8223 0.202928
1501	TEXT 93.0491 -21.8223 16
1502	CIRCLE 94.0685 -21.5603 0.168456
1503	TEXT 94.0685 -21.5603 17
1504	CIRCLE 86.0568 -27.6095 0.101113
1505	TEXT 86.0568 -27.6095 18
1506	CIRCLE 88.7932 -29.9453 0.202624
1507	TEXT 88.7932 -29.9453 19
1508	\end{verbatim}
1509
1510	\newpage
1511	\section{Robust statistics for a Normal distribution}
1512	\label{app-madfm}
1513
1514	The Normal, or Gaussian, distribution for mean $\mu$ and standard
1515	deviation $\sigma$ can be written as
1516	\[
1517	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
1518	\]
1519
1520	When one has a purely Gaussian signal, it is straightforward to
1521	estimate $\sigma$ by calculating the standard deviation (or rms) of
1522	the data. However, if there is a small amount of signal present on top
1523	of Gaussian noise, and one wants to estimate the $\sigma$ for the
1524	noise, the presence of the large values from the signal can bias the
1525	estimator to higher values.
1526
1527	An alternative way is to use the median ($m$) and median absolute deviation
1528	from the median ($s$) to estimate $\mu$ and $\sigma$. The median is the
1529	middle of the distribution, defined for a continuous distribution by
1530	\[
1531	\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
1532	\]
1533	From symmetry, we quickly see that for the continuous Normal
1534	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
1535	without loss of generality.
1536
1537	To find $s$, we find the distribution of the absolute deviation from
1538	the median, and then find the median of that distribution. This
1539	distribution is given by
1540	\begin{eqnarray*}
1541	g(x) &= &{\mbox{\rm distribution of }} \|x\|\\
1542	&= &f(x) + f(-x),\ x\ge0\\
1543	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
1544	\end{eqnarray*}
1545	So, the median absolute deviation from the median, $s$, is given by
1546	\[
1547	\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
1548	\]
1549	Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and
1550	so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
1551	\sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}} \diff x
1552	$. Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for
1553	simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$):
1554	\[
1555	\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
1556	\]
1557	This is hard to solve analytically (no nice analytic solution exists
1558	for the finite integral that I'm aware of), but straightforward to
1559	solve numerically, yielding the value of $s=0.6744888$. Thus, to
1560	estimate $\sigma$ for a Normally distributed data set, one can calculate
1561	$s$, then divide by 0.6744888 (or multiply by 1.4826042) to obtain the
1562	correct estimator.
1563
1564	Note that this is different to solutions quoted elsewhere,
1565	specifically in \citet{meyer04:trunc}, where the same robust estimator
1566	is used but with an incorrect conversion to standard deviation -- they
1567	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
1568	to convert the \emph{mean} absolute deviation from the mean to the
1569	standard deviation. This means that the cube noise in the \hipass\
1570	catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
1571	than quoted.
1572
1573	\section{How Gaussian noise changes with wavelet scale.}
1574	\label{app-scaling}
1575
1576	The key element in the wavelet reconstruction of an array is the
1577	thresholding of the individual wavelet coefficient arrays. This is
1578	usually done by choosing a level to be some number of standard
1579	deviations above the mean value.
1580
1581	However, since the wavelet arrays are produced by convolving the input
1582	array by an increasingly large filter, the pixels in the coefficient
1583	arrays become increasingly correlated as the scale of the filter
1584	increases. This results in the measured standard deviation from a
1585	given coefficient array decreasing with increasing scale. To calculate
1586	this, we need to take into account how many other pixels each pixel in
1587	the convolved array depends on.
1588
1589	To demonstrate, suppose we have a 1-D array with $N$ pixel values
1590	given by $F_i,\ i=1,...,N$, and we convolve it with the B$_3$-spline
1591	filter, defined by the set of coefficients
1592	$\{1/16,1/4,3/8,1/4,1/16\}$. The flux of the $i$th pixel in the
1593	convolved array will be
1594	\[
1595	F'_i = \frac{1}{16}F_{i-2} + \frac{1}{4}F_{i-1} + \frac{3}{8}F_{i}
1596	+ \frac{1}{4}F_{i+1} + \frac{1}{16}F_{i+2}
1597	\]
1598	and the flux of the corresponding pixel in the wavelet array will be
1599	\[
1600	W'_i = F_i - F'_i = \frac{-1}{16}F_{i-2} - \frac{1}{4}F_{i-1} + \frac{5}{8}F_{i}
1601	- \frac{1}{4}F_{i+1} - \frac{1}{16}F_{i+2}
1602	\]
1603	Now, assuming each pixel has the same standard deviation
1604	$\sigma_i=\sigma$, we can work out the standard deviation for the
1605	wavelet array:
1606	\[
1607	\sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2 + \left(\frac{1}{4}\right)^2
1608	+ \left(\frac{5}{8}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2}
1609	= 0.72349\ \sigma
1610	\]
1611	Thus, the first scale wavelet coefficient array will have a standard
1612	deviation of 72.3\% of the input array. This procedure can be followed
1613	to calculate the necessary values for all scales, dimensions and
1614	filters used by \duchamp.
1615
1616	Calculating these values is clearly a critical step in performing the
1617	reconstruction. \citet{starck02:book} did so by simulating data sets
1618	with Gaussian noise, taking the wavelet transform, and measuring the
1619	value of $\sigma$ for each scale. We take a different approach, by
1620	calculating the scaling factors directly from the filter coefficients
1621	by taking the wavelet transform of an array made up of a 1 in the
1622	central pixel and 0s everywhere else. The scaling value is then
1623	derived by taking the square root of the sum (in quadrature) of all
1624	the wavelet coefficient values at each scale. We give the scaling
1625	factors for the three filters available to \duchamp\ on the following
1626	page. These values are hard-coded into \duchamp, so no on-the-fly
1627	calculation of them is necessary.
1628
1629	Memory limitations prevent us from calculating factors for large
1630	scales, particularly for the three-dimensional case (hence the --
1631	symbols in the tables). To calculate factors for
1632	higher scales than those available, we note the following
1633	relationships apply for large scales to a sufficient level of precision:
1634	\begin{itemize}
1635	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
1636	\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
1637	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
1638	\end{itemize}
1639
1640	\newpage
1641	\begin{itemize}
1642	\item \textbf{B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$
1643
1644	\begin{tabular}{llll}
1645	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1646	1 & 0.723489806 & 0.890796310 & 0.956543592\\
1647	2 & 0.285450405 & 0.200663851 & 0.120336499\\
1648	3 & 0.177947535 & 0.0855075048 & 0.0349500154\\
1649	4 & 0.122223156 & 0.0412474444 & 0.0118164242\\
1650	5 & 0.0858113122 & 0.0204249666 & 0.00413233507\\
1651	6 & 0.0605703043 & 0.0101897592 & 0.00145703714\\
1652	7 & 0.0428107206 & 0.00509204670 & 0.000514791120\\
1653	8 & 0.0302684024 & 0.00254566946 & --\\
1654	9 & 0.0214024008 & 0.00127279050 & --\\
1655	10 & 0.0151336781 & 0.000636389722 & --\\
1656	11 & 0.0107011079 & 0.000318194170 & --\\
1657	12 & 0.00756682272 & -- & --\\
1658	13 & 0.00535055108 & -- & --\\
1659	%14 & 0.00378341085 & -- & --\\
1660	%15 & 0.00267527545 & -- & --\\
1661	%16 & 0.00189170541 & -- & --\\
1662	%17 & 0.00133763772 & -- & --\\
1663	%18 & 0.000945852704 & -- & --
1664	\end{tabular}
1665
1666	\item \textbf{Triangle Function:} $\{1/4,1/2,1/4\}$
1667
1668	\begin{tabular}{llll}
1669	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1670	1 & 0.612372436 & 0.800390530 & 0.895954449 \\
1671	2 & 0.330718914 & 0.272878894 & 0.192033014\\
1672	3 & 0.211947812 & 0.119779282 & 0.0576484078\\
1673	4 & 0.145740298 & 0.0577664785 & 0.0194912393\\
1674	5 & 0.102310944 & 0.0286163283 & 0.00681278387\\
1675	6 & 0.0722128185 & 0.0142747506 & 0.00240175885\\
1676	7 & 0.0510388224 & 0.00713319703 & 0.000848538128 \\
1677	8 & 0.0360857673 & 0.00356607618 & 0.000299949455 \\
1678	9 & 0.0255157615 & 0.00178297280 & -- \\
1679	10 & 0.0180422389 & 0.000891478237 & -- \\
1680	11 & 0.0127577667 & 0.000445738098 & -- \\
1681	12 & 0.00902109930 & 0.000222868922 & -- \\
1682	13 & 0.00637887978 & -- & -- \\
1683	%14 & 0.00451054902 & -- & -- \\
1684	%15 & 0.00318942978 & -- & -- \\
1685	%16 & 0.00225527449 & -- & -- \\
1686	%17 & 0.00159471988 & -- & -- \\
1687	%18 & 0.000112763724 & -- & --
1688
1689	\end{tabular}
1690
1691	\item \textbf{Haar Wavelet:} $\{0,1/2,1/2\}$
1692
1693	\begin{tabular}{llll}
1694	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1695	1 & 0.707167810 & 0.433012702 & 0.935414347 \\
1696	2 & 0.500000000 & 0.216506351 & 0.330718914\\
1697	3 & 0.353553391 & 0.108253175 & 0.116926793\\
1698	4 & 0.250000000 & 0.0541265877 & 0.0413398642\\
1699	5 & 0.176776695 & 0.0270632939 & 0.0146158492\\
1700	6 & 0.125000000 & 0.0135316469 & 0.00516748303
1701
1702	\end{tabular}
1703
1704
1705	\end{itemize}
1706
1707	\end{document}

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: tags/release-1.0.2/docs/Guide.tex @ 1112

Download in other formats: