Context Navigation

Guide.tex @ 1441

Visit:

Last change on this file since 1441 was 101, checked in by Matthew Whiting, 18 years ago
Updated version of the documentation and images to go with it. Close to finalised for v1.0 -- only README/Install section not updated.
File size: 82.0 KB

Line
1	\documentclass[12pt,a4paper]{article}
2
3	%%%%%% LINE SPACING %%%%%%%%%%%%
4	\usepackage{setspace}
5	\singlespacing
6	%\onehalfspacing
7	%\doublespacing
8
9	%Define a test for doing PDF format -- use different code below
10	\newif\ifPDF
11	\ifx\pdfoutput\undefined\PDFfalse
12	\else\ifnum\pdfoutput > 0\PDFtrue
13	\else\PDFfalse
14	\fi
15	\fi
16
17	\textwidth=161 mm
18	\textheight=245 mm
19	\topmargin=-15 mm
20	\oddsidemargin=0 mm
21	\parindent=6 mm
22
23	\usepackage[sort]{natbib}
24	\usepackage{lscape}
25	\bibpunct[,]{(}{)}{;}{a}{}{,}
26
27	\newcommand{\eg}{e.g.\ }
28	\newcommand{\ie}{i.e.\ }
29	\newcommand{\hi}{H{\sc i}}
30	\newcommand{\hipass}{{\sc hipass}}
31	\newcommand{\duchamp}{\emph{Duchamp}}
32	\newcommand{\atrous}{\textit{{\`a} trous}}
33	\newcommand{\Atrous}{\textit{{\`A} trous}}
34	\newcommand{\diff}{{\rm d}}
35	\newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil}
36	\newenvironment{entry}
37	{\begin{list}{}%
38	{\renewcommand{\makelabel}{\entrylabel}%
39	\setlength{\labelwidth}{30mm}%
40	\setlength{\labelsep}{5pt}%
41	\setlength{\itemsep}{2pt}%
42	\setlength{\parsep}{2pt}%
43	\setlength{\leftmargin}{35mm}%
44	}%
45	}%
46	{\end{list}}
47
48
49	\title{A Guide to the \duchamp\ Source Finding Software}
50	\author{Matthew Whiting\\
51	%{\small \href{mailto:Matthew.Whiting@csiro.au}{Matthew.Whiting@csiro.au}}\\
52	Australia Telescope National Facility\\CSIRO}
53	%\date{January 2006}
54	\date{}
55
56	% If we are creating a PDF, use different options for graphicx, hyperref.
57	\ifPDF
58	\usepackage[pdftex]{graphicx,color}
59	\usepackage[pdftex]{hyperref}
60	\hypersetup{colorlinks=true,%
61	citecolor=red,%
62	filecolor=red,%
63	linkcolor=red,%
64	urlcolor=red,%
65	}
66	\else
67	\usepackage[dvips]{graphicx}
68	\usepackage[dvips]{hyperref}
69	\fi
70
71	\pagestyle{headings}
72	\begin{document}
73
74	\maketitle
75	\thispagestyle{empty}
76	\begin{figure}[!h]
77	\begin{center}
78	\includegraphics[width=\textwidth]{cover_image}
79	\end{center}
80	\end{figure}
81
82	\newpage
83	\tableofcontents
84
85	\newpage
86	\section{Introduction and getting going quickly}
87
88	This document gives details on the program \duchamp: how to use it and
89	what it does. This has been designed to provide a source-detection
90	facility for spectral-line data cubes. The basic execution of
91	\duchamp\ is to read in a FITS data cube, find sources in the cube,
92	and produce a text file of positions, velocities and fluxes of the
93	detections, as well as a postscript file of the spectra of each
94	detection.
95
96	So, you have a FITS cube, and you want to find the sources in it. What
97	do you do? The first step is to make an input file that contains the
98	list of parameters. Brief and detailed examples are shown in
99	Appendix~\ref{app-input}. This provides the input file name, the various
100	output files, and defines various parameters that control the
101	execution.
102
103	The standard way to run \duchamp\ is by the command
104	\begin{quote}
105	\texttt{Duchamp -p [parameter file]}
106	\end{quote}
107	replacing \texttt{[parameter file]} with the name of the file you have
108	just created/copied. Alternatively, you can use the syntax
109	\begin{quote}
110	\texttt{Duchamp -f [FITS file]}
111	\end{quote}
112	where \texttt{[FITS file]} is the file you wish to search. In the latter
113	case, all parameters will take their default values detailed in
114	Appendix~\ref{app-param}. In either case, the program will then work
115	away and give you the list of detections and their spectra. The
116	program execution is summarised below, and detailed in
117	\S\ref{sec-flow}. Information on inputs is in \S\ref{sec-param} and
118	Appendix~\ref{app-param}, and descriptions of the output is in
119	\S\ref{sec-output}.
120
121	\subsection{A summary of the execution steps}
122
123	The basic flow of the program is summarised here -- all steps are
124	discussed in more detail in the following sections.
125	\begin{enumerate}
126	\item If the \texttt{-p} option is used, the parameter file given on
127	the command line is read in, and the parameters absorbed.
128	\item The FITS image is located and read in to memory.
129	\item If requested, a FITS image with a previously reconstructed array
130	is read in.
131	\item If requested, blank pixels are trimmed from the edges, and
132	the baseline of each spectrum is removed.
133	\item If the reconstruction method is requested, and the reconstructed
134	array has not been read in at Step 3 above, the cube is
135	reconstructed using the \atrous\ wavelet method.
136	\item Searching for objects then takes place, using the requested
137	thresholding method.
138	\item The list of objects is condensed by merging neighbouring objects
139	and removing those deemed unacceptable.
140	\item The baselines and trimmed pixels are replaced prior to output.
141	\item The details of the detections are written to screen and to the
142	requested output file.
143	\item Maps showing the spatial location of the detections are written.
144	\item The integrated spectra of each detection are written to a
145	postscript file.
146	\item If requested, the reconstructed array can be written to a new
147	FITS file.
148	\end{enumerate}
149
150	\subsection{Guide to terminology}
151
152	First, a brief note on the use of terminology in this guide. \duchamp\
153	is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is
154	the Flexible Image Transport System -- see \citet{hanisch01} or
155	websites such as
156	\href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html}
157	for details.} image arrays with three dimensions -- they are assumed
158	to have the following form: the first two dimensions (referred to as
159	$x$ and $y$) are spatial directions (that is, relating to the position
160	on the sky), while the third dimension, $z$, is the spectral
161	direction, which can correspond to frequency, wavelength, or
162	velocity. The three dimensional analogue of pixels are ``voxels'', or
163	volume cells -- a voxel is defined by a unique $(x,y,z)$ location and
164	has a unique flux or intensity value associated with it.
165
166	Each spatial pixel (a given $(x,y)$ coordinate) can be said to be a
167	single spectrum, while a slice through the cube perpendicular to the
168	spectral direction at a given $z$-value is a single channel (the 2-D
169	image is a channel map).
170
171	Detection involves locating a contiguous group of voxels with fluxes
172	above a certain threshold. \duchamp\ makes no assumptions as to the
173	size or shape of the detected features, other than having
174	user-selected minimum size criteria.
175
176	Features that are detected are assumed to be positive. The user can
177	choose to search for negative features by setting an input parameter
178	-- this inverts the cube prior to the search (see
179	\S\ref{sec-detection} for details).
180
181	Note that it is possible to run \duchamp\ on a two-dimensional image
182	(\ie one with no frequency or velocity information), or indeed a
183	one-dimensional array, and many of the features of the program will
184	work fine. The focus, however, is on object detection in three
185	dimensions.
186
187	\subsection{Why \duchamp?}
188
189	Well, it's important for a program to have a name, and the initial
190	working title of \emph{cubefind} was somewhat uninspiring. I wanted to
191	avoid the classic astronomical approach of designing a cute acronym,
192	and since it is designed to work on cubes, I looked at naming it after
193	a cubist. \emph{Picasso}, sadly, was already taken \citep{minchin99},
194	so I settled on naming it after Marcel Duchamp, another cubist, but
195	also one of the first artists to work with ``found objects''.
196
197	\section{User Inputs}
198	\label{sec-param}
199
200	Input to the program is provided by means of a parameter
201	file. Parameters are listed in the file, followed by the value that
202	should be assigned to them. The syntax used is \texttt{paramName
203	value}. Parameter names are not case-sensitive, and lines in the input
204	file that start with \texttt{\#} are ignored. If a parameter is listed
205	more than once, the latter value is used, but otherwise the order in
206	which the parameters are listed in the input file is arbitrary.
207
208	If a parameter is not listed, the default value is assumed. The
209	defaults are chosen to provide a good result (using the reconstruction
210	method), so the user doesn't need to specify many new parameters in
211	the input file. Note that the image file \textbf{must} be specified! The
212	parameters that can be set are listed in Appendix~\ref{app-param},
213	with their default values in parentheses.
214
215	The 'flag' parameters are stored as \texttt{bool} variables, and so are
216	either \texttt{true = 1} or \texttt{false = 0}. \duchamp\ will only
217	read them from the file as integers, and so they should be entered in
218	the file as 0 or 1 (see example file in Appendix~\ref{app-input}).
219
220	\section{What \duchamp\ is doing}
221	\label{sec-flow}
222
223	The execution flow of \duchamp\ is detailed here, indicating the
224	main algorithmic steps that are used. The program is written in C/C++
225	and makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot}
226	libraries.
227
228	%\subsection{Parameter input}
229	%
230	%The user provides parameters that govern the selection of files and
231	%the parameters used by the various subroutines in the program. This is
232	%done via a parameter file, and the parameters are stored in a C++
233	%class for use throughout the program. The form of the parameter file is
234	%discussed in \S\ref{sec-param}, and the parameters themselves are
235	%listed in Appendix~\ref{app-param}.
236
237	\subsection{Image input}
238
239	The cube is read in using basic {\sc cfitsio} commands, and stored as
240	an array in a special C++ class. This class keeps track of
241	the list of detected objects, as well as any reconstructed arrays that
242	are made (see \S\ref{sec-recon}). The World Coordinate System (WCS)
243	information for the cube is also obtained from the FITS header by {\sc
244	wcslib} functions \citep{greisen02, calabretta02}, and this
245	information, in the form of a \texttt{wcsprm} structure, is also stored
246	in the same class.
247
248	A sub-section of an image can be requested via the \texttt{subsection}
249	parameter in the parameter file -- this can be a good idea if the cube
250	has very noisy edges, which may produce many spurious detections. The
251	generalised form of the subsection that is used by {\sc cfitsio} is
252	\texttt{[x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run
253	from \texttt{x1} to \texttt{x2} (inclusive), with steps of
254	\texttt{dx}. The step value can be omitted (so a subsection of the
255	form \texttt{[2:50,2:50,10:1000]} is still valid). \duchamp\ does not
256	at this stage deal with the presence of steps in the subsection
257	string, and any that are present are removed before the file is
258	opened.
259
260	If one wants the full range of a coordinate then replace the range
261	with an asterisk, \eg \texttt{[2:50,2:50,*]}. If one wants to use a
262	subsection, one must set \texttt{flagSubsection = 1}. A complete
263	description of the section syntax can be found at the {\sc fitsio} web
264	site
265	\footnote{
266	\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}%
267	{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}.
268
269	\subsection{Image modification}
270	\label{sec-modify}
271
272	Several modifications to the cube can be made that improve the
273	execution and efficiency of \duchamp\ (these are optional -- their
274	use is indicated by the relevant flags set in the input parameter
275	file).
276
277	\subsubsection{Blank pixel removal}
278
279	First, the cube is trimmed of any BLANK pixels that pad the image out
280	to a rectangular shape. This is optional, its use determined by the
281	\texttt{flagBlankPix} parameter. The value for these pixels is read from
282	the FITS header (using the BLANK, BSCALE and BZERO keywords), but if
283	these are not present then the value can be specified by the user in
284	the parameter file using \texttt{blankPixValue}.
285
286	This stage is particularly important for the reconstruction step, as
287	lots of BLANK pixels on the edges will smooth out features in the
288	wavelet calculation stage. The trimming will also reduce the size of
289	the cube's array, speeding up the execution. The amount of trimming is
290	recorded, and these pixels are added back in once the source-detection
291	is completed (so that quoted pixel positions are applicable to the
292	original cube).
293
294	Rows and columns are trimmed one at a time until the first non-BLANK
295	pixel is reached, so that the image remains rectangular. In practice,
296	this means that there will be BLANK pixels left in the trimmed image
297	(if the non-BLANK region is non-rectangular). However, these are
298	ignored in all further calculations done on the cube.
299
300	\subsubsection{Baseline removal}
301
302	Second, the user may request the removal of baselines from the
303	spectra, via the parameter \texttt{flagBaseline}. This may be necessary
304	if there is a strong baseline ripple present, which can result in
305	spurious detections on the high points of the ripple. The baseline is
306	calculated from a wavelet reconstruction procedure (see
307	\S\ref{sec-recon}) that keeps only the two largest scales. This is
308	done separately for each spatial pixel (\ie for each spectrum in the
309	cube), and the baselines are stored and added back in before any
310	output is done. In this way the quoted fluxes and displayed spectra
311	are as one would see from the input cube itself -- even though the
312	detection (and reconstruction if applicable) is done on the
313	baseline-removed cube.
314
315	The presence of very strong signals (for instance, masers at several
316	hundred Jy) can affect the determination of the baseline, leading to a
317	large dip centred on the signal in the baseline-subtracted
318	spectrum. To prevent this, the signal is trimmed prior to the
319	reconstruction process at some standard threshold (at $8\sigma$ above
320	the mean). The baseline determined should thus be representative of
321	the true, signal-free baseline. Note that this trimming is only a
322	temporary measure which does not affect the source-detection.
323
324	\subsubsection{Ignoring bright Milky Way emission}
325
326	Also, a single set of contiguous channels can be ignored -- these may
327	exhibit very strong emission, such as that from the Milky Way as seen
328	in extragalactic \hi\ cubes (hence the references to ``Milky Way'' in
329	relation to this task -- apologies to Galactic astronomers!). Such
330	dominant channels will both produce many unnecessary, uninteresting
331	and large (in size and hence in memory usage) detections, and so will
332	slow the program down and detract from the interesting detections. The
333	use of this feature is controlled by the \texttt{flagMW} parameter, and
334	the exact channels concerned are able to be set by the user (using
335	\texttt{maxMW} and \texttt{minMW} -- these give an inclusive range of
336	channels). When employed, these channels are temporarily blanked out
337	for the searching, and the scaling of the spectral output (see
338	Fig.~\ref{fig-spect}) will not take them into account. They will be
339	present in the reconstructed array, however, and so will be included
340	in the saved FITS file (see \S\ref{sec-reconIO}). When the final
341	spectra are plotted, the range of channels covered by these parameters
342	is indicated by a green hashed box.
343
344	\subsection{Image reconstruction}
345	\label{sec-recon}
346
347	This is an optional step, but one that greatly enhances the
348	source-detection process. The user can direct \duchamp\ to reconstruct
349	the data cube using the \atrous\ wavelet procedure. A good
350	description of the procedure can be found in
351	\citet{starck02:book}. The reconstruction is an effective way of
352	removing a lot of the noise in the image, allowing one to search
353	reliably to fainter levels, and reducing the number of spurious
354	detections. The payoff is that it can be relatively time- and
355	memory-intensive.
356
357	\subsubsection{Algorithm}
358
359	The steps in the \atrous\ reconstruction are as follows:
360	\begin{enumerate}
361	\item Set the reconstructed array to 0 everywhere.
362	\item The input array is discretely convolved with a given filter
363	function. This is determined from the parameter file via the
364	\texttt{filterCode} parameter -- see Appendix~\ref{app-param} for
365	details on the filters available.
366	\item The wavelet coefficients are calculated by taking the difference
367	between the convolved array and the input array.
368	\item If the wavelet coefficients at a given point are above the
369	requested threshold (given by \texttt{snrRecon} as the number of
370	$\sigma$ above the mean and adjusted to the current scale -- see
371	Appendix~\ref{app-scaling}), add these to the reconstructed array.
372	\item The separation of the filter coefficients is doubled.
373	\item The procedure is repeated from step 2, using the convolved array
374	as the input array.
375	\item Continue until the required maximum number of scales is reached.
376	\item Add the final smoothed (\ie convolved) array to the
377	reconstructed array. This provides the ``DC offset'', as each of the
378	wavelet coefficient arrays will have zero mean.
379	\end{enumerate}
380
381	The reconstruction has at least two iterations. The first iteration
382	makes a first pass at the wavelet reconstruction (the process outlined
383	in the 8 stages above), but the residual array will inevitably have
384	some structure still in it, so the wavelet filtering is done on the
385	residual, and any significant wavelet terms are added to the final
386	reconstruction. This step is repeated until the change in the $\sigma$
387	of the background is less than some fiducial amount.
388
389	It is important to note that the \atrous\ decomposition is an
390	example of a ``redundant'' transformation. If no thresholding is
391	performed, the sum of all the wavelet coefficient arrays and the final
392	smoothed array is identical to the input array. The thresholding thus
393	removes only the unwanted structure in the array.
394
395	Note that any BLANK pixels that are still in the cube will not be
396	altered by the reconstruction -- they will be left as BLANK so that
397	the shape of the valid part of the cube is preserved.
398
399	\subsubsection{Statistics}
400
401	The correct calculation of the reconstructed array needs good
402	estimation of the underlying mean and standard deviation of the
403	background noise distribution. These statistics are estimated using
404	robust methods, to avoid corruption by strong outlying points. The
405	mean of the distribution is actually estimated by the median, while
406	the median absolute deviation from the median (MADFM) is calculated
407	and corrected assuming Gaussianity to estimate the underlying standard
408	deviation $\sigma$. The Gaussianity (or Normality) assumption is
409	critical, as the MADFM does not give the same value as the usual rms
410	or standard deviation value -- for a normal distribution
411	$N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. The difference
412	between the MADFM and $\sigma$ is corrected for, so the user need only
413	think in the usual multiples of $\sigma$ when setting
414	\texttt{snrRecon}. See Appendix~\ref{app-madfm} for a derivation of
415	this value.
416
417	When thresholding the different wavelet scales, the value of $\sigma$
418	as measured from the wavelet array needs to be scaled to account for the
419	increased amount of correlation between neighbouring pixels (due to
420	the convolution). See Appendix~\ref{app-scaling} for details on this
421	scaling.
422
423	\subsubsection{User control of reconstruction parameters}
424
425	The most important parameter for the user to select in relation to the
426	reconstruction is the threshold for each wavelet array. This is set
427	using the \texttt{snrRecon} parameter, and is given as a multiple of the
428	rms (estimated by the MADFM) above the mean (which for the wavelet
429	arrays should be approximately zero). There are several other
430	parameters that can be altered as well that affect the outcome of the
431	reconstruction.
432
433	By default, the cube is reconstructed in three dimensions, using a
434	3-dimensional filter and 3-dimensional convolution. This can be
435	altered, however, using the parameter \texttt{reconDim}. If set to 1,
436	this means the cube is reconstructed by considering each spectrum
437	separately, whereas \texttt{reconDim=2} will mean the cube is
438	reconstructed by doing each channel map separately. The merits of
439	these choices are discussed in \S\ref{sec-notes}, but it should be
440	noted that a 2-dimensional reconstruction can be susceptible to edge
441	effects if the spatial shape is not rectangular.
442
443	The user can also select the minimum scale to be used in the
444	reconstruction -- the first scale exhibits the highest frequency
445	variations, and so ignoring this one can sometimes be beneficial in
446	removing excess noise. The default, however, is to use all scales
447	(\texttt{minscale = 1}).
448
449	Finally, the filter that is used for the convolution can be selected
450	by using \texttt{filterCode} and the relevant code number -- the choices
451	are listed in Appendix~\ref{app-param}. When multi-dimensional
452	reconstruction is selected, this filter is used to construct a 2- or
453	3-dimensional equivalent.
454
455	\subsection{Reconstruction I/O}
456	\label{sec-reconIO}
457
458	The reconstruction stage can be relatively time-consuming, particularly
459	for large cubes and reconstructions in 3-D. To get around this, \duchamp\
460	provides a shortcut to allow users to perform multiple searches (\eg with
461	different thresholds) on the same reconstruction without calculating the
462	reconstruction each time.
463
464	The first step is to choose to save the reconstructed array as a FITS
465	file by setting \texttt{flagOutputRecon = true}. The file will be saved
466	in the same directory as the input image, so the user needs to have write
467	permissions for that directory.
468
469	The filename will be derived from the input filename, with extra information
470	detailing the reconstruction that has been done. For example, suppose
471	\texttt{image.fits} has been reconstructed using a 3-dimensional
472	reconstruction with filter 2, thresholded at $4\sigma$ using all scales. The
473	output filename will then be \texttt{image.RECON-3-2-4-1.fits} (\ie it uses
474	the four parameters relevant for the \atrous\ reconstruction as listed in
475	Appendix~\ref{app-param}). The new FITS file will also have these
476	parameters as header keywords.
477
478	Likewise, the residual image, defined as the difference between the input
479	and reconstructed arrays, can also be saved in the same manner by setting
480	\texttt{flagOutputResid = true}. Its filename will be the same as above,
481	with RESID replacing RECON.
482
483	If a reconstructed image has been saved, it can be read in and used
484	instead of redoing the reconstruction. To do so, the user should set
485	\texttt{flagReconExists = true}. The user can indicate the name of the
486	reconstructed FITS file using the \texttt{reconFile} parameter, or, if
487	this is not specified, \duchamp\ searches for the file with the name
488	as defined above. If the file is not found, the reconstruction is
489	performed as normal. Note that to do this, the user needs to set
490	\texttt{flagAtrous = true} (obviously, if this is \texttt{false}, the
491	reconstruction is not needed).
492
493	\subsection{Searching the image}
494	\label{sec-detection}
495
496	The image is searched for detections in two ways: spectrally (a
497	1-dimensional search in the spectrum in each spatial pixel), and
498	spatially (a 2-dimensional search in the spatial image in each
499	channel). In both cases, the algorithm finds connected pixels that are
500	above the user-specified threshold. In the case of the spatial image
501	search, the algorithm of \citet{lutz80} is used to raster scan through
502	the image and connect groups of pixels on neighbouring rows.
503
504	Note that this algorithm cannot be applied directly to a 3-dimensional
505	case, as it requires that objects are completely nested in a row: that
506	is, if you are scanning along a row, and one object finishes and
507	another starts, you know that you will not get back to the first one
508	(if at all) until the second is completely finished for that
509	row. Three-dimensional data does not have this property, which is why
510	we break up the searching into 1- and 2-dimensional cases.
511
512	The determination of the threshold is done in one of two ways. The
513	first way is a simple sigma-clipping, where a threshold is set at a
514	fixed number $n$ of standard deviations above the mean, and pixels
515	above this threshold are flagged as detected. The value of $n$ is set
516	with the parameter \texttt{snrCut}. As before, the value of the
517	standard deviation is estimated by the MADFM, and corrected by the
518	ratio derived in Appendix~\ref{app-madfm}.
519
520	The second method uses the False Discovery Rate (FDR) technique
521	\citep{miller01,hopkins02}, whose basis we briefly detail here. The
522	false discovery rate (given by the number of false detections divided
523	by the total number of detections) is fixed at a certain value
524	$\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false
525	positives). In practice, an $\alpha$ value is chosen, and the ensemble
526	average FDR (\ie $\langle FDR \rangle$) when the method is used will
527	be less than $\alpha$. One calculates $p$ -- the probability,
528	assuming the null hypothesis is true, of obtaining a test statistic as
529	extreme as the pixel value (the observed test statistic) -- for each
530	pixel, and sorts them in increasing order. One then calculates $d$
531	where
532	\[
533	d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
534	\]
535	and then rejects all hypotheses whose $p$-values are less than or equal
536	to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq
537	j\alpha/c_N N$.) Note that ``reject hypothesis'' here means ``accept
538	the pixel as an object pixel'' (\ie we are rejecting the null
539	hypothesis that the pixel belongs to the background).
540
541	The $c_N$ values here are normalisation constants that depend on the
542	correlated nature of the pixel values. If all the pixels are
543	uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their
544	tests will be dependent on each other, and so $c_N = \sum_{i=1}^N
545	i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels
546	are correlated over the beam. In this case the sum is made over the
547	$N$ pixels that make up the beam. The value of $N$ is calculated from
548	the FITS header (if the correct keywords -- BMAJ, BMIN -- are not
549	present, a default value of 10 pixels is assumed).
550
551	The theory behind the FDR method implies a direct connection between the
552	choice of $\alpha$ and the fraction of detections that will be false
553	positives. However, due to the merging process, this direct connection is
554	lost when looking at the final number of detections -- see discussion in
555	\S\ref{sec-notes}. The effect is that the number of false detections will
556	be less than indicated by the $\alpha$ value used.
557
558	If a reconstruction has been made, the residuals (defined as original
559	$-$ reconstruction) are used to estimate the noise parameters of the
560	cube. Otherwise they are estimated directly from the cube itself. In
561	both cases, robust estimators are used as described above.
562
563	Detections must have a minimum number of pixels to be counted. This
564	minimum number is given by the input parameters \texttt{minPix} (for
565	2-dimensional searches) and \texttt{minChannels} (for 1-dimensional
566	searches).
567
568	The search only looks for positive features. If one is interested
569	instead in negative features (such as absorption lines), set the
570	parameter \texttt{flagNegative = true}. This will invert the cube (\ie
571	multiply all pixels by $-1$) prior to the search, and then re-invert
572	the cube (and the fluxes of any detections) after searching is
573	complete. All outputs are done in the same manner as normal, so that
574	fluxes of detections will be negative.
575
576	\subsection{Merging detected objects}
577	\label{sec-merger}
578
579	The searching step produces a list of detected objects that will have many
580	repeated detections of a given object -- for instance, spectral
581	detections in adjacent pixels of the same object and/or spatial
582	detections in neighbouring channels. These are then combined in an
583	algorithm that matches all objects judged to be ``close''. This
584	determination is made in one of two ways.
585
586	One way is to define two thresholds -- one spatial and one in velocity
587	-- and say that two objects should be merged if there is at least one
588	pair of pixels that lie within these threshold distances of each
589	other. These thresholds are specified by the parameters
590	\texttt{threshSpatial} and \texttt{threshVelocity} (in units of pixels
591	and channels respectively).
592
593	Alternatively, the spatial requirement can be changed to say that
594	there must be a pair of pixels that are \emph{adjacent} -- a stricter,
595	but perhaps more realistic requirement, particularly when the spatial pixels
596	have a large angular size (as is the case for \hi\ surveys). This
597	method can be selected by setting the parameter
598	\texttt{flagAdjacent} to 1 (\ie \texttt{true}) in the parameter file. The
599	velocity thresholding is done in the same way as the first option.
600
601	Once the detections have been merged, they may be ``grown''. This is a
602	process of increasing the size of the detection by adding adjacent
603	pixels that are above some secondary threshold. This threshold is
604	lower than the one used for the initial detection, but above the noise
605	level, so that faint pixels are only detected when they are close to a
606	bright pixel. The value of this threshold is a possible input
607	parameter (\texttt{growthCut}), with a default value of $1.5\sigma$. The
608	use of the growth algorithm is controlled by the \texttt{flagGrowth}
609	parameter -- the default value of which is \texttt{false}. If the
610	detections are grown, they are sent through the merging algorithm a
611	second time, to pick up any detections that now overlap or have grown
612	over each other.
613
614	Finally, to be accepted, the detections must span \emph{both} a minimum
615	number of channels (to remove any spurious single-channel spikes that
616	may be present), and a minimum number of spatial pixels. These
617	numbers, as for the original detection step, are set with the
618	\texttt{minChannels} and \texttt{minPix} parameters. The channel
619	requirement means there must be at least one set of \texttt{minChannels}
620	consecutive channels in the source for it to be accepted.
621
622	\section{Outputs}
623	\label{sec-output}
624
625	\subsection{During execution}
626
627	\duchamp\ provides the user with feedback whilst it is running, to
628	keep the user informed on the progress of the analysis. Most of this
629	consists of self-explanatory messages about the particular stage the
630	program is up to. The relevant parameters are printed to the screen at
631	the start (once the file has been successfully read in), so the user
632	is able to make a quick check that the setup is correct (see
633	Appendix~{app-input} for an example).
634
635	If the cube is being trimmed (\S\ref{sec-modify}), the resulting
636	dimensions are printed to indicate how much has been trimmed. If a
637	reconstruction is being done, a continually updating message shows
638	either the current iteration and scale, compared to the maximum scale
639	(when \texttt{reconDim=3}), or a progress bar showing the amount of
640	the cube that has been reconstructed (for smaller values of
641	\texttt{reconDim}).
642
643	During the searching algorithms, the progress through the 1D and 2D
644	searches are shown. When the searches have completed,
645	the number of objects found in both the 1D and 2D searches are
646	reported (see \S\ref{sec-detection} for details).
647
648	In the merging process (where multiple detections of the same object
649	are combined -- see \S\ref{sec-merger}), two stages of output
650	occur. The first is when each object in the list is compared with all
651	others. The output shows two numbers: the first being how far through
652	the list the current object is, and the second being the length of the
653	list. As the algorithm proceeds, the first number should increase and
654	the second should decrease (as objects are combined). When the numbers
655	meet (\ie the whole list has been compared), the second phase begins,
656	in which multiply-appearing pixels in each object are removed, as are
657	objects not meeting the minimum channels requirement. During this
658	phase, the total number of accepted objects is shown, which should
659	steadily increase until all have been accepted or rejected. Note that
660	these steps can be very quick for small numbers of detections.
661
662	Since this continual printing to screen has some overhead of time and
663	CPU involved, the user can elect to not print this information by
664	setting the parameter \texttt{verbose = 0}. In this case, the user is
665	still informed as to the steps being undertaken, but the details of
666	the progress are not shown.
667
668	\subsection{Results}
669
670	\subsubsection{Table of Results}
671
672	Finally, we get to the results -- the reason for running \duchamp\ in
673	the first place. Once the detection list is finalised, it is sorted by
674	the mean velocity of the detections (or, if there is no good WCS
675	associated with the cube, by the mean Z-pixel position). The results
676	are then printed to the screen and to the output file, given by the
677	\texttt{OutFile} parameter. The results list, an example of which can be
678	seen in Appendix~\ref{app-output}, contains the following columns
679	(note that the title of the columns depending on WCS information will
680	depend on the projection of the WCS):
681
682	\begin{entry}
683	\item[Obj\#] The ID number of the detection (simply the sequential
684	count for the list, which is ordered by increasing velocity).
685	\item[Name] The IAU-format name of the detection (derived from the WCS
686	projection).
687	\item[X] The average X-pixel position.
688	\item[Y] The average Y-pixel position.
689	\item[Z] The average Z-pixel position.
690	\item[RA/GLON] The Right Ascension or Galactic Longitude of the centre
691	of the object.
692	\item[DEC/GLAT] The Declination or Galactic Latitude of the centre of
693	the object.
694	\item[VEL] The mean velocity of the object [units given by the
695	\texttt{spectralUnits} parameter].
696	\item[w\_RA/w\_GLON] The width of the object in Right Ascension or
697	Galactic Longitude [arcmin].
698	\item[w\_DEC/w\_GLAT] The width of the object in Declination Galactic
699	Latitude [arcmin].
700	\item[w\_VEL] The full velocity width of the detection (max channel
701	$-$ min channel, in velocity units [see note below]).
702	\item[F\_int] The integrated flux over the object, in the units of
703	flux times velocity, corrected for the beam if necessary.
704	\item[F\_peak] The peak flux over the object, in the units of flux.
705	\item[X1, X2] The minimum and maximum X-pixel coordinates.
706	\item[Y1, Y2] The minimum and maximum Y-pixel coordinates.
707	\item[Z1, Z2] The minimum and maximum Z-pixel coordinates.
708	\item[Npix] The number of voxels (\ie distinct $(x,y,z)$ coordinates)
709	in the detection.
710	\item[Flag] Whether the detection has any warning flags (see below).
711	\end{entry}
712	The Name is derived from the WCS position. For instance, a source
713	centred on the RA,Dec position 12$^h$53$^m$45$^s$,
714	-36$^\circ$24$'$12$''$ will be called J1253$-$3624 (if the epoch is
715	J2000) or B1253$-$3624 (if B1950). An alternative form is used for
716	Galactic coordinates: a source centred on the position ($l$,$b$) =
717	(323.1245, 5.4567) will be called G323.12$+$05.45. If the WCS is not
718	valid (\ie is not present or does not have all the necessary
719	information), the Name, RA, DEC, VEL and related columns are not
720	printed, but the pixel coordinates are still provided.
721
722	The velocity units can be specified by the user, using the parameter
723	\texttt{spectralUnits} (enter it as a single string). The default value
724	is km/s, which should be suitable for most users. These units are also
725	used to give the units of integrated flux.
726
727	The last column contains any warning flags about the detection. There
728	are currently two options here. An `E' is printed if the detection is
729	next to the edge of the image, meaning either the limit of the pixels,
730	or the limit of the non-BLANK pixel region. An `N' is printed if the
731	total flux, summed over all the (non-BLANK) pixels in the smallest box
732	that completely encloses the detection, is negative. Note that this
733	sum is likely to include non-detected pixels. It is of use in
734	pointing out detections that lie next to strongly negative pixels,
735	such as might arise due to interference -- the detected pixels might
736	then also be due to the interference, so caution is advised.
737
738	\subsubsection{Other results lists}
739
740	Two alternative results files can also be requested. One option is a
741	VOTable-format XML file, containing just the RA, Dec, Velocity and the
742	corresponding widths of the detections, as well as the fluxes. The
743	user should set \texttt{flagVOT = 1}, and put the desired filename in the
744	parameter \texttt{votFile} -- note that the default is for it not to be
745	produced. This file should be compatible with all Virtual Observatory
746	tools (such as Aladin\footnote{ Aladin can be found on the web at
747	\href{http://aladin.u-strasbg.fr/}{http://aladin.u-strasbg.fr/}}). The
748	second option is an annotation file for use with the Karma toolkit of
749	visualisation tools (in particular, with \texttt{kvis}). This will draw a
750	circle at the position of each detection, and number it according to
751	the Obj\# given above. To make use of this option, the user should
752	set \texttt{flagKarma = 1}, and put the desired filename in the parameter
753	\texttt{karmaFile} -- again, the default is for it not to be produced.
754
755	As the program is running, it also (optionally) records the detections
756	made in each individual spectrum or channel (see
757	\S\ref{sec-detection} for details on this process). This is
758	recorded in the file given by the parameter \texttt{LogFile}. This file
759	does not include the columns \texttt{Name, RA, DEC, w\_RA, w\_DEC, VEL,
760	w\_VEL}. This file is designed primarily for diagnostic purposes: \eg
761	to see if a given set of pixels is detected in, say, one channel
762	image, but does not survive the merging process. The list of pixels
763	(and their fluxes) in the final detection list are also printed to
764	this file, again for diagnostic purposes. This feature can be turned
765	off by setting \texttt{flagLog = false}. (This may be a good idea if you
766	are not interested in its contents, as it can be a large file.)
767
768	\begin{figure}[t]
769	\begin{center}
770	\includegraphics[width=\textwidth]{example_spectrum}
771	\end{center}
772	\caption{\footnotesize An example of the spectrum output. Note several
773	of the features discussed in the text: the red lines indicating the
774	reconstructed spectrum; the blue dashed lines indicating the
775	spectral extent of the detection; the green hashed area indicating
776	the Milky Way channels that are ignored by the searching algorithm;
777	the blue border showing its spatial extent on the 0th moment map;
778	and the 15~arcmin-long scale bar.}
779	\label{fig-spect}
780	\end{figure}
781
782	\begin{figure}[!t]
783	\begin{center}
784	\includegraphics[width=\textwidth]{example_moment_map}
785	\end{center}
786	\caption{\footnotesize An example of the moment map created by
787	\duchamp. The full extent of the cube is covered, and the 0th moment
788	of each object is shown (integrated individually over all the
789	detected channels).}
790	\label{fig-moment}
791	\end{figure}
792
793	\subsubsection{Graphical output -- spectra}
794
795	As well as the output data file, a postscript file is created that
796	shows the spectrum for each detection, together with a small cutout
797	image (the 0th moment) and basic information about the detection (note
798	that any flags are printed after the name of the detection, in the
799	format \texttt{[E]}). If the cube was reconstructed, the spectrum from
800	the reconstruction is shown in red, over the top of the original
801	spectrum. The spectral extent of the detected object is indicated by
802	two dashed blue lines, and the region covered by the ``Milky Way''
803	channels is shown by a green hashed box.
804
805	The spectrum that is plotted is governed by the
806	\texttt{spectralMethod} parameter. It can be either \texttt{peak},
807	where the spectrum is from the spatial pixel containing the
808	detection's peak flux; or \texttt{sum}, where the spectrum is summed
809	over all spatial pixels, and then corrected for the beam size.
810
811	The spectral extent of the detection is indicated with blue lines, and
812	a zoom is shown in a separate window. The cutout image can optionally
813	include a border around the spatial pixels that are in the detection
814	(turned on and off by the parameter \texttt{drawBorders} -- the
815	default is \texttt{true}). It also includes a scale bar in the bottom
816	left corner to indicate size -- it is 15~arcmin long (note that due to
817	projection effects it may be a slightly different physical length from
818	object to object). An example detection can be seen below in
819	Fig.~\ref{fig-spect}.
820
821	\subsubsection{Graphical output -- maps}
822
823	Finally, a couple of images are optionally produced: a 0th moment map
824	of the cube, combining just the detected channels in each object,
825	showing the integrated flux in grey-scale; and a ``detection image'',
826	a grey-scale image where the pixel values are the number of channels
827	that spatial pixel is detected in. In both cases, if
828	\texttt{drawBorders = true}, a border is drawn around the spatial
829	extent of each detection. An example moment map is shown in
830	Fig.~\ref{fig-moment}. The production or otherwise of these images is
831	governed by the \texttt{flagMaps} parameter.
832
833	The purpose of these images are to provide a visual guide to where the
834	detections have been made, and, particularly in the case of the moment
835	map, to provide an indication of the strength of the source. In both
836	cases, the detections are numbered (in the same sense as the output
837	list), and the spatial borders are marked out as for the cutout images
838	in the spectra file. Both these images are saved as postscript files
839	(given by the parameters \texttt{momentMap} and \texttt{detectionMap}
840	respectively), with the latter also displayed in a {\sc pgplot}
841	window (regardless of the state of \texttt{flagMaps}).
842
843	\section{Notes and hints on the use of \duchamp}
844	\label{sec-notes}
845
846	In using \duchamp, the user has to make a number of decisions about
847	the way the program runs. This section is designed to give the user
848	some idea about what to choose.
849
850	The main choice is whether or not to use the wavelet
851	reconstruction. The main benefits of this are the marked reduction in
852	the noise level, leading to regularly-shaped detections, and good
853	reliability for faint sources. The main drawback with its use is the
854	long execution time: to reconstruct a $170\times160\times1024$
855	(\hipass) cube often requires three iterations and takes about 20-25
856	minutes to run completely. Note that this is for the three-dimensional
857	reconstruction: using \texttt{reconDim=1} makes the reconstruction
858	quicker (the full program then takes about 6 minutes), but it is still
859	the largest part of the time.
860
861	The searching part of the procedure is much quicker: searching an
862	un-reconstructed cube leads to execution times of only a couple of
863	minutes. Alternatively, using the ability to read in previously-saved
864	reconstructed arrays makes running the reconstruction more than once a
865	more feasible prospect.
866
867	On the positive side, the shape of the detections in a cube that has
868	been reconstructed will be much more regular and smooth -- the ragged
869	edges that objects in the raw cube possess are smoothed by the removal
870	of most of the noise. This enables better determination of the shapes
871	and characteristics of objects.
872
873	A further point to consider when using the reconstruction is that if
874	the two-dimensional reconstruction is chosen (\texttt{reconDim=2}), it
875	can be susceptible to edge effects. If the valid area in the cube (\ie
876	the part that is not BLANK) has non-rectangular edges, the convolution
877	can produce artefacts in the reconstruction that mimic the edges and
878	can lead (depending on the selection threshold) to some spurious
879	sources. Caution is advised with such data -- the user is advised to
880	check carefully the reconstructed cube for the presence of such
881	artefacts. Note, however, that the 1- and 3-dimensional
882	reconstructions are \emph{not} susceptible in the same way, since the
883	spectral direction does not generally exhibit these BLANK edges, and
884	so we recommend the use of either of these.
885
886	If one chooses the reconstruction method, a further decision is
887	required on the signal-to-noise cutoff used in determining acceptable
888	wavelet coefficients. A larger value will remove more noise from the
889	cube, at the expense of losing fainter sources, while a smaller value
890	will include more noise, which may produce spurious detections, but
891	will be more sensitive to faint sources. Values of less than about
892	$3\sigma$ tend to not reduce the noise a great deal and can lead to
893	many spurious sources (although this will depend on the nature of the
894	cube).
895
896	When it comes to searching, the FDR method produces more reliable results
897	than simple sigma-clipping, particularly in the absence of reconstruction.
898	However, it does not work in exactly the way one would expect for a
899	given value of \texttt{alpha}. For instance, setting fairly liberal values
900	of \texttt{alpha} (say, 0.1) will often lead to a much smaller fraction
901	of false detections (\ie much less than 10\%). This is the effect of the
902	merging algorithms, that combine the sources after the detection stage,
903	and reject detections not meeting the minimum pixel or channel requirements.
904	It is thus better to aim for larger \texttt{alpha} values than those derived
905	from a straight conversion of the desired false detection rate.
906
907	Finally, as \duchamp\ is still undergoing development, there are some
908	elements that are not fully developed. In particular, it is not as
909	clever as I would like at avoiding interference. The ability to place
910	requirements on the minimum number of channels and pixels partially
911	circumvents this problem, but work is being done to make \duchamp\
912	smarter at rejecting signals that are clearly (to a human eye at
913	least) interference. See the following section for further
914	improvements that are planned.
915
916	%\section{Drawbacks of the current program}
917	%
918	%The program currently has a few problems/drawbacks/things to be aware
919	%of that will hopefully be fixed in the future:
920	%\begin{itemize}
921	%
922	%\item Narrow interference spikes are still getting found, particularly
923	% if there is no reconstruction, or reconstruction with a relatively
924	% low \texttt{snrRecon} (such as 2 or 3). Increasing the
925	% \texttt{minChannels} parameter is one way to circumvent this, but
926	% making the algorithm a bit more clever would be preferable.
927	%
928	%\item Sources that have strong continuum ripple and/or artefacts often
929	% generate many spurious detections. This needs some work to avoid
930	% \duchamp\ doing this, and until then users are advised to be aware
931	% of the possibility. Strong continuum ripples may generate many
932	% sources on the same spatial pixel, and this will be apparent on the
933	% detection images.
934	%
935	%\item Spectra are integrated over every spatial pixel of the
936	% detection, and this may dilute the actual detection, making it
937	% harder to see \ie the apparent strength of the line as plotted may
938	% not give a true indication of how strong it really is.
939	%
940	%%\item A caution on the merging part of the procedure. This can be time
941	%% consuming if there are many detections that do not require merging
942	%% -- in this case, the time will go like $N^2$ ($N$ = number of
943	%% detections). If there are plenty of mergers, the size of the list
944	%% reduces quickly, so the execution time will be less.
945	%
946	%
947	%\end{itemize}
948
949
950	%\section{Comparison with other software (to be developed further...)}
951	%
952	%\subsection{fred, by Matt Howlett}
953	%
954	%This is the program used in the \hipass\ analysis. It smoothes the
955	%data spectrally with a boxcar filter of a size that varies over a
956	%user-specified range, and then thresholds the data.
957	%
958	%Works effectively, but generally doesn't find as many sources as
959	%\duchamp, particularly when the reconstruction is used. Sensitive to
960	%faint, broad features that fall below the reconstruction threshold.
961	%
962	%Execution takes a long time, depending on the range of filter widths
963	%that are used.
964	%
965	%\subsection{sfind}
966	%
967	%Hard to evaluate, as it does not (as far as I can see) output the
968	%channel number at which detections are made, and does not merge
969	%detections made at adjacent channels (\ie it just works in 2
970	%dimensions).
971	%
972
973	\section{Future Developments}
974
975	This is both a list of planned improvements and a wish-list of
976	features that would be nice to include (but are not planned in the
977	immediate future). Let me know if there are items not on this list, or
978	items on the list you would like prioritised.
979
980	\begin{itemize}
981
982	\item Better determination of the noise characteristics of
983	spectral-line cubes, including understanding how the noise is
984	generated and developing a model for it. \textbf{Planned.}
985
986	\item Include more source analysis. Examples could be: shape
987	information; measurements of HI mass; more variety of measurements
988	of velocity width and profile. \textbf{Some planned.}
989
990	\item Provide some indication of the significance of the detection
991	(\ie some S/N-like value). \textbf{Planned.}
992
993	\item Improved ability to reject interference, possibly on the
994	spectral shape of features. \textbf{Planned.}
995
996	\item Ability to separate (de-blend) distinct sources that have been
997	merged. \textbf{Planned.}
998
999	\item Link to lists of possible counterparts (\eg via NED/SIMBAD/other
1000	VO tools?). \textbf{Wish-list.}
1001
1002	\item On-line web service interface, so a user can upload a cube and
1003	get back a source-list. \textbf{Wish-list}.
1004
1005	\item Embed \duchamp\ in a GUI, to move away from the text-based
1006	interaction. \textbf{Wish-list}.
1007	\end{itemize}
1008
1009
1010	%\bibliographystyle{mn2e}
1011	%\bibliographystyle{abbrvnat}
1012	%\bibliography{mnrasmnemonic,sourceDetection}
1013	\begin{thebibliography}{}
1014
1015	\bibitem[\protect\citeauthoryear{{Calabretta} \& {Greisen}}{{Calabretta} \&
1016	{Greisen}}{2002}]{calabretta02}
1017	{Calabretta} M., {Greisen} E., 2002, A\&A, 395, 1077
1018
1019	\bibitem[\protect\citeauthoryear{{Greisen} \& {Calabretta}}{{Greisen} \&
1020	{Calabretta}}{2002}]{greisen02}
1021	{Greisen} E., {Calabretta} M., 2002, A\&A, 395, 1061
1022
1023	\bibitem[\protect\citeauthoryear{{Hanisch}, {Farris}, {Greisen}, {Pence},
1024	{Schlesinger}, {Teuben}, {Thompson} \& {Warnock}}{{Hanisch}
1025	et~al.}{2001}]{hanisch01}
1026	{Hanisch} R., {Farris} A., {Greisen} E., {Pence} W., {Schlesinger} B.,
1027	{Teuben} P., {Thompson} R., {Warnock} A., 2001, A\&A, 376, 359
1028
1029	\bibitem[\protect\citeauthoryear{{Hopkins}, {Miller}, {Connolly}, {Genovese},
1030	{Nichol} \& {Wasserman}}{{Hopkins} et~al.}{2002}]{hopkins02}
1031	{Hopkins} A., {Miller} C., {Connolly} A., {Genovese} C., {Nichol} R.,
1032	{Wasserman} L., 2002, AJ, 123, 1086
1033
1034	\bibitem[\protect\citeauthoryear{Lutz}{Lutz}{1980}]{lutz80}
1035	Lutz R., 1980, The Computer Journal, 23, 262
1036
1037	\bibitem[\protect\citeauthoryear{{Meyer} et~al.,}{{Meyer}
1038	et~al.}{2004}]{meyer04:trunc}
1039	{Meyer} M., et~al., 2004, MNRAS, 350, 1195
1040
1041	\bibitem[\protect\citeauthoryear{{Miller}, {Genovese}, {Nichol}, {Wasserman},
1042	{Connolly}, {Reichart}, {Hopkins}, {Schneider} \& {Moore}}{{Miller}
1043	et~al.}{2001}]{miller01}
1044	{Miller} C., {Genovese} C., {Nichol} R., {Wasserman} L., {Connolly} A.,
1045	{Reichart} D., {Hopkins} A., {Schneider} J., {Moore} A., 2001, AJ, 122,
1046	3492
1047
1048	\bibitem[\protect\citeauthoryear{Minchin}{Minchin}{1999}]{minchin99}
1049	Minchin R., 1999, PASA, 16, 12
1050
1051	\bibitem[\protect\citeauthoryear{Starck \& Murtagh}{Starck \&
1052	Murtagh}{2002}]{starck02:book}
1053	Starck J.-L., Murtagh F., 2002, {``Astronomical Image and Data Analysis''}.
1054	Springer
1055
1056	\end{thebibliography}
1057
1058
1059	\appendix
1060	\newpage
1061	\section{Available parameters}
1062	\label{app-param}
1063
1064	The full list of parameters that can be listed in the input file are
1065	given here. If not listed, they take the default value given in
1066	parentheses. Since the order of the parameters in the input file does
1067	not matter, they are grouped here in logical sections.
1068
1069	\subsection*{Input-output related}
1070	\begin{entry}
1071	\item[ImageFile (no default assumed)] The filename of the
1072	data cube to be analysed.
1073	\item[flagSubsection \texttt{[false]}] A flag to indicate whether one
1074	wants a subsection of the requested image.
1075	\item[Subsection \texttt{[ [,,*] ]}] The requested subsection, which
1076	should be specified in the format \texttt{[x1:x2,y1:y2,z1:z2]}, where
1077	the limits are inclusive. If the full range of a dimension is
1078	required, use a \texttt{*}, \eg if you want the full spectral range of
1079	a subsection of the image, use \texttt{[30:140,30:140,*]}.
1080	\item[flagReconExists \texttt{[false]}] A flag to indicate whether the
1081	reconstructed array has been saved by a previous run of \duchamp. If
1082	set true, the reconstructed array will be read from the file given by
1083	\texttt{reconFile}, rather than calculated directly.
1084	\item[reconFile (no default assumed)] The FITS file that contains the
1085	reconstructed array. If \texttt{flagReconExists} is true and this
1086	parameter is not defined, the default file searched will be
1087	determined by the \atrous\ parameters (see \S\ref{sec-recon}).
1088	\item[OutFile \texttt{[duchamp-Results.txt]}] The file containing the
1089	final list of detections. This also records the list of input
1090	parameters.
1091	\item[SpectraFile \texttt{[duchamp-Spectra.ps]}] The postscript file
1092	containing the resulting integrated spectra and images of the
1093	detections.
1094	\item[flagLog \texttt{[true]}] A flag to indicate whether intermediate
1095	detections should be logged.
1096	\item[LogFile \texttt{[duchamp-Logfile.txt]}] The file in which intermediate
1097	detections are logged. These are detections that have not been
1098	merged. This is primarily for use in debugging and diagnostic
1099	purposes -- normal use of the program will probably not require
1100	this.
1101	\item[flagOutputRecon \texttt{[false]}] A flag to say whether or not to
1102	save the reconstructed cube as a FITS file. The filename will be
1103	derived from the ImageFile -- the reconstruction of \texttt{image.fits}
1104	will be saved as \texttt{image.RECON?.fits}, where \texttt{?} stands for
1105	the value of \texttt{snrRecon} (see below).
1106	\item[flagOutputResid \texttt{[false]}] As for \texttt{flagOutputRecon}, but
1107	for the residual array -- the difference between the original cube
1108	and the reconstructed cube. The filename will be \texttt{image.RESID?.fits}.
1109	\item[flagVOT \texttt{[false]}] A flag to say whether to create a VOTable
1110	file corresponding to the information in \texttt{outfile}. This will be
1111	an XML file in the Virtual Observatory VOTable format.
1112	\item[votFile \texttt{[duchamp-Results.xml]}] The VOTable file with the
1113	list of final detections. Some input parameters are also recorded.
1114	\item[flagKarma \texttt{[false]}] A flag to say whether to create a
1115	Karma annotation file corresponding to the information in
1116	\texttt{outfile}. This can be used as an overlay for the Karma
1117	programs such as \texttt{kvis}.
1118	\item[karmaFile \texttt{[duchamp-Results.ann]}] The Karma annotation
1119	file showing the list of final detections.
1120	\item[flagMaps \texttt{[true]}] A flag to say whether to save
1121	postscript files showing the 0th moment map of the whole cube
1122	(parameter \texttt{momentMap}) and the detection image
1123	(\texttt{detectionMap}).
1124	\item[momentMap \texttt{[duchamp-MomentMap.ps]}] A postscript file
1125	containing a map of the 0th moment of the detected sources, as well
1126	as pixel and WCS coordinates.
1127	\item[detectionMap \texttt{[duchamp-DetectionMap.ps]}] A postscript
1128	file showing each of the detected objects, coloured in greyscale by
1129	the number of channels spanned by each pixel. Also shows pixel and WCS
1130	coordinates.
1131	\end{entry}
1132
1133	\subsection*{Modifying the cube}
1134	\begin{entry}
1135	\item[flagBlankPix \texttt{[true]}] A flag to say whether to remove BLANK
1136	pixels from the analysis -- these are pixels set to some particular
1137	value because they fall outside the imaged area.
1138	\item[blankPixValue \texttt{[-8.00061]}] The value of the BLANK pixels,
1139	if this information is not contained in the FITS header (the usual
1140	procedure is to obtain this value from the header information -- in
1141	which case the value set by this parameter is ignored).
1142	\item[flagMW \texttt{[false]}] A flag to say whether to ignore channels
1143	contaminated by Milky Way (or other) emission -- the searching
1144	algorithms will not look at these channels.
1145	\item[maxMW \texttt{[112]}] The maximum channel number containing
1146	``Milky Way'' emission.
1147	\item[minMW \texttt{[75]}] The minimum channel number containing
1148	``Milky Way'' emission. Note that the range specified by
1149	\texttt{maxMW} and \texttt{minMW} is inclusive.
1150	\item[flagBaseline \texttt{[false]}] A flag to say whether to remove the
1151	baseline from each spectrum in the cube for the purposes of
1152	reconstruction and detection.
1153	\end{entry}
1154
1155	\subsection*{Detection related}
1156
1157	\subsubsection*{General detection}
1158	\begin{entry}
1159	\item[flagNegative \texttt{[false]}] A flag to indicate that the features
1160	being searched for are negative. The cube will be inverted prior to
1161	searching.
1162	\item[snrCut \texttt{[3.]}] The cut-off value for thresholding, in terms
1163	of number of $\sigma$ above the mean.
1164	\item[flagGrowth \texttt{[false]}] A flag indicating whether or not to
1165	grow the detected objects to a smaller threshold.
1166	\item[growthCut \texttt{[2.]}] The smaller threshold using in growing
1167	detections. In units of $\sigma$ above the mean.
1168	\end{entry}
1169
1170	\subsubsection*{\Atrous\ reconstruction}
1171	\begin{entry}
1172	\item [flagATrous \texttt{[true]}] A flag indicating whether or not to
1173	reconstruct the cube using the \atrous\ wavelet
1174	reconstruction. See \S\ref{sec-recon} for details.
1175	\item[reconDim \texttt{[3]}] The number of dimensions to use in the
1176	reconstruction. 1 means reconstruct each spectrum separately, 2
1177	means each channel map is done separately, and 3 means do the whole
1178	cube in one go.
1179	\item[scaleMin \texttt{[1]}] The minimum wavelet scale to be used in the
1180	reconstruction. A value of 1 means ``use all scales''.
1181	\item[snrRecon \texttt{[4]}] The thresholding cutoff used in the
1182	reconstruction -- only wavelet coefficients this many $\sigma$ above
1183	the mean (or greater) are included in the reconstruction.
1184	\item[filterCode \texttt{[2]}] The code number of the filter to use in
1185	the reconstruction. The options are:
1186	\begin{itemize}
1187	\item \textbf{1:} B$_3$-spline filter: coefficients =
1188	$(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$
1189	\item \textbf{2:} Triangle filter: coefficients = $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
1190	\item \textbf{3:} Haar wavelet: coefficients = $(0, \frac{1}{2}, \frac{1}{2})$
1191	\end{itemize}
1192	\end{entry}
1193
1194	\subsubsection*{FDR method}
1195	\begin{entry}
1196	\item[flagFDR \texttt{[false]}] A flag indicating whether or not to use
1197	the False Discovery Rate method in thresholding the pixels.
1198	\item[alphaFDR \texttt{[0.01]}] The $\alpha$ parameter used in the FDR
1199	analysis. The average number of false detections, as a fraction of the
1200	total number, will be less than $\alpha$ (see \S\ref{sec-detection}).
1201	\end{entry}
1202
1203	\subsubsection*{Merging detections}
1204	\begin{entry}
1205	\item[minPix \texttt{[2]}] The minimum number of spatial pixels for a single
1206	detection to be counted.
1207	\item[minChannels \texttt{[3]}] The minimum number of consecutive
1208	channels that must be present in a detection.
1209	\item[flagAdjacent \texttt{[true]}] A flag indicating whether to use the
1210	``adjacent pixel'' criterion to decide whether to merge objects. If
1211	not, the next two parameters are used to determine whether objects
1212	are within the necessary thresholds.
1213	\item[threshSpatial \texttt{[3.]}] The maximum allowed minimum spatial
1214	separation (in pixels) between two detections for them to be merged
1215	into one. Only used if \texttt{flagAdjacent = false}.
1216	\item[threshVelocity \texttt{[7.]}] The maximum allowed minimum channel
1217	separation between two detections for them to be merged into
1218	one.
1219	\end{entry}
1220
1221	\subsubsection*{Other parameters}
1222	\begin{entry}
1223	\item[spectralMethod \texttt{[peak]}] This indicates which method is used
1224	to plot the output spectra: \texttt{peak} means plot the spectrum
1225	containing the detection's peak pixel; \texttt{sum} means sum the
1226	spectra of each detected spatial pixel, and correct for the beam
1227	size. Any other choice defaults to \texttt{peak}.
1228	\item[spectralUnits \texttt{[km/s]}] The user can specify the units of
1229	the spectral axis. Assuming the WCS of the FITS file is valid, the
1230	spectral axis is transformed into velocity, and put into these units
1231	for all output and for calculations such as the integrated flux of a
1232	detection.
1233	\item[drawBorders \texttt{[true]}] A flag indicating whether borders
1234	are to be drawn around the detected objects in the moment maps
1235	included in the output (see for example Fig.~\ref{fig-spect}).
1236	\item[verbose \texttt{[true]}] A flag indicating whether to print the
1237	progress of computationally-intensive algorithms (such as the
1238	searching and merging) to screen.
1239	\end{entry}
1240
1241
1242	\newpage
1243	\section{Example parameter files}
1244	\label{app-input}
1245
1246	This is what a typical parameter file would look like.
1247
1248	\begin{verbatim}
1249	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1250	logFile logfile.txt
1251	outFile results.txt
1252	spectraFile spectra.ps
1253	flagSubsection 0
1254	flagOutputRecon 0
1255	flagOutputResid 0
1256	flagBlankPix 1
1257	flagMW 1
1258	minMW 75
1259	maxMW 112
1260	minPix 3
1261	flagGrowth 1
1262	growthCut 1.5
1263	flagATrous 0
1264	scaleMin 1
1265	snrRecon 4
1266	flagFDR 1
1267	alphaFDR 0.1
1268	numPixPSF 20
1269	snrCut 3
1270	threshSpatial 3
1271	threshVelocity 7
1272	\end{verbatim}
1273
1274	Note that it is not necessary to include all these parameters in the
1275	file, only those that need to be changed from the defaults (as listed
1276	in Appendix~\ref{app-param}), which in this case would be very few. A
1277	minimal parameter file might look like:
1278	\begin{verbatim}
1279	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1280	flagLog 0
1281	snrRecon 3
1282	snrCut 2.5
1283	minChannels 4
1284	\end{verbatim}
1285	This will reconstruct the cube with a lower SNR value than the
1286	default, select objects at a lower threshold, with a looser minimum
1287	channel requirement, and not keep a log of the intermediate
1288	detections.
1289
1290	The following page demonstrates how the parameters are presented to the
1291	user, both on the screen at execution time, and in the output and log
1292	files. On each line, there is a description on the parameter, the relevant
1293	parameter name that is used in the input file (if there is one that the
1294	user can enter), and the value of the parameter being used.
1295	\newpage
1296	\begin{landscape}
1297	Typical presentation of parameters in output and log files:
1298	\begin{verbatim}
1299	---- Parameters ----
1300	Image to be analysed.........................[imageFile] = input.fits
1301	Intermediate Logfile...........................[logFile] = duchamp-Logfile.txt
1302	Final Results file.............................[outFile] = duchamp-Results.txt
1303	Spectrum file..............................[spectraFile] = duchamp-Spectra.ps
1304	0th Moment Map...............................[momentMap] = duchamp-MomentMap.ps
1305	Detection Map.............................[detectionMap] = duchamp-DetectionMap.ps
1306	Saving reconstructed cube?.............[flagoutputrecon] = false
1307	Saving residuals from reconstruction?..[flagoutputresid] = false
1308	------
1309	Searching for Negative features?..........[flagNegative] = false
1310	Fixing Blank Pixels?......................[flagBlankPix] = true
1311	Blank Pixel Value....................................... = -8.00061
1312	Removing Milky Way channels?....................[flagMW] = true
1313	Milky Way Channels.......................[minMW - maxMW] = 75-112
1314	Beam Size (pixels)...................................... = 10.1788
1315	Removing baselines before search?.........[flagBaseline] = false
1316	Minimum # Pixels in a detection.................[minPix] = 2
1317	Minimum # Channels in a detection..........[minChannels] = 3
1318	Growing objects after detection?............[flagGrowth] = false
1319	Using A Trous reconstruction?...............[flagATrous] = true
1320	Number of dimensions in reconstruction........[reconDim] = 3
1321	Minimum scale in reconstruction...............[scaleMin] = 1
1322	SNR Threshold within reconstruction...........[snrRecon] = 4
1323	Filter being used for reconstruction........[filterCode] = 1 (B3 spline function)
1324	Using FDR analysis?............................[flagFDR] = false
1325	SNR Threshold...................................[snrCut] = 3
1326	Using Adjacent-pixel criterion?...........[flagAdjacent] = true
1327	Max. velocity separation for merging....[threshVelocity] = 7
1328	Method of spectral plotting.............[spectralMethod] = peak
1329	\end{verbatim}
1330
1331	\newpage
1332	\section{Example results file}
1333	\label{app-output}
1334	This the typical content of an output file, after running \duchamp\
1335	with the parameters illustrated on the previous page.
1336
1337	{\scriptsize
1338	\begin{verbatim}
1339	Results of the \duchamp\ source finder: Tue May 23 14:51:38 2006
1340	---- Parameters ----
1341	(... omitted for clarity -- see previous page for examples...)
1342	--------------------
1343	Total number of detections = 25
1344	--------------------
1345	------------------------------------------------------------------------------------------------------------------------------------------------------
1346	Obj# Name X Y Z RA DEC VEL w_RA w_DEC w_VEL F_int F_peak X1 X2 Y1 Y2 Z1 Z2 Npix Flag
1347	[km/s] [arcmin] [arcmin] [km/s] [Jy km/s] [Jy/beam] [pix]
1348	------------------------------------------------------------------------------------------------------------------------------------------------------
1349	1 J0618-2532 30.2 86.0 113.3 06:18:12.54 -25:32:44.79 208.502 45.17 34.61 26.383 24.394 0.350 25 35 82 90 112 114 137 E
1350	2 J0609-2156 59.5 140.6 114.6 06:09:19.66 -21:56:31.20 225.572 44.39 31.47 65.957 16.128 0.213 55 65 137 144 113 118 153
1351	3 J0545-2143 141.2 143.2 114.8 05:45:51.71 -21:43:36.20 228.470 19.61 16.66 26.383 2.412 0.090 139 143 142 145 114 116 29
1352	4 J0617-2633 33.3 70.8 115.6 06:17:25.52 -26:33:33.83 238.736 65.02 30.10 26.383 9.776 0.117 26 41 68 75 115 117 104 E
1353	5 J0601-2500 86.2 94.9 117.9 06:01:39.54 -25:00:32.46 269.419 27.99 24.02 26.383 3.920 0.124 83 89 92 97 117 119 44
1354	6 J0602-2547 84.0 83.1 118.0 06:02:18.29 -25:47:31.69 270.319 20.01 19.99 26.383 2.999 0.118 82 86 81 85 117 119 34
1355	7 J0547-2448 133.0 97.2 118.7 05:47:52.53 -24:48:38.16 279.113 19.72 12.54 26.383 1.474 0.074 131 135 96 98 118 120 21
1356	8 J0606-2719 71.1 60.0 121.3 06:06:10.99 -27:19:48.61 314.090 52.36 39.59 39.574 14.268 0.150 65 77 55 64 120 123 154
1357	9 J0611-2137 52.4 145.3 162.5 06:11:20.92 -21:37:29.57 857.955 32.39 23.49 118.722 43.178 0.410 49 56 142 147 158 167 265 E
1358	10 J0600-2859 89.7 35.3 202.4 06:00:34.08 -28:59:00.43 1383.160 23.93 24.10 171.487 24.439 0.173 87 92 33 38 196 209 271
1359	11 J0558-2638 95.4 70.3 223.1 05:58:53.03 -26:38:45.91 1656.140 11.93 12.07 92.339 1.045 0.063 94 96 69 71 220 227 18
1360	12 J0617-2723 34.7 58.3 227.4 06:17:07.07 -27:23:50.65 1712.868 16.75 23.53 290.209 8.529 0.093 33 36 56 61 215 237 118
1361	13 J0558-2525 95.8 88.6 231.7 05:58:49.27 -25:25:33.60 1770.134 27.87 24.16 237.444 12.863 0.115 92 98 86 91 221 239 175
1362	14 J0600-2141 88.8 144.4 232.5 06:00:54.02 -21:41:57.06 1780.188 27.96 24.13 224.252 30.743 0.166 86 92 142 147 222 239 344 E
1363	15 J0615-2634 40.0 70.8 232.6 06:15:25.50 -26:34:20.04 1782.214 12.44 15.69 52.765 2.084 0.068 39 41 69 72 231 235 31
1364	16 J0604-2606 76.0 78.4 233.0 06:04:41.13 -26:06:21.19 1787.226 24.13 23.87 211.061 23.563 0.155 73 78 76 81 225 241 278
1365	17 J0601-2340 87.9 114.9 235.8 06:01:08.83 -23:40:19.37 1824.122 31.95 28.09 237.444 82.380 0.297 85 92 112 118 227 245 647
1366	18 J0615-2235 38.2 130.5 254.5 06:15:32.09 -22:35:37.24 2070.934 12.29 11.70 105.531 1.555 0.070 37 39 129 131 249 257 24
1367	19 J0617-2305 31.4 122.8 258.1 06:17:33.45 -23:05:28.94 2118.752 12.34 11.65 26.383 1.022 0.062 30 32 122 124 257 259 16
1368	20 J0612-2149 49.6 142.2 270.3 06:12:11.04 -21:49:29.72 2279.926 16.27 15.73 395.740 15.156 0.101 48 51 141 144 257 287 204
1369	21 J0616-2133 35.3 146.0 300.6 06:16:15.78 -21:33:09.69 2679.148 20.22 7.47 224.252 3.014 0.127 33 37 145 146 294 311 28 E
1370	22 J0555-2956 107.3 20.9 367.6 05:55:08.02 -29:56:09.08 3562.236 19.71 20.30 39.574 5.891 0.169 105 109 19 23 366 369 58
1371	23 J0557-2246 99.8 128.2 434.0 05:57:43.77 -22:46:42.95 4438.776 11.88 16.12 105.531 1.703 0.167 99 101 127 130 430 438 17 N
1372	24 J0616-2648 38.1 67.2 546.8 06:16:02.10 -26:48:35.49 5926.464 12.35 11.67 26.383 1.276 0.064 37 39 66 68 546 548 18
1373	25 J0552-2916 117.0 30.5 727.0 05:52:13.64 -29:16:58.02 8303.952 11.59 20.25 303.400 35.523 0.479 116 118 28 32 716 739 111
1374	\end{verbatim}
1375	}
1376	Note that the
1377	width of the table can make it hard to read. A good trick for those
1378	using UNIX/Linux is to make use of the \texttt{a2ps} command. The
1379	following works well, producing a postscript file \texttt{results.ps}:
1380	\\\verb\|a2ps -1 -r -f8 -o duchamp-Results.ps duchamp-Results.txt\|
1381
1382	%\end{landscape}
1383
1384	\newpage
1385	\section{Example VOTable output}
1386	\label{app-votable}
1387	This is part of the VOTable, in XML format, corresponding to the
1388	output file in Appendix~\ref{app-output} (the indentation has been
1389	removed to make it fit on the page).
1390
1391	%\begin{landscape}
1392	{\scriptsize
1393	\begin{verbatim}
1394	<?xml version="1.0"?>
1395	<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1396	xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
1397	<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
1398	<RESOURCE name="Duchamp Output">
1399	<TABLE name="Detections">
1400	<DESCRIPTION>Detected sources and parameters from running the Duchamp source finder.</DESCRIPTION>
1401	<PARAM name="FITS file" datatype="char" ucd="meta.file;meta.fits" value="/DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits"/>
1402	<PARAM name="Threshold" datatype="float" ucd="stat.snr" value="2.5">
1403	<PARAM name="ATrous note" datatype="char" ucd="meta.note" value="The a trous reconstruction method was used, with the following parameters.">
1404	<PARAM name="ATrous Dimension" datatype="int" ucd="meta.code;stat" value="3">
1405	<PARAM name="ATrous Cut" datatype="float" ucd="stat.snr" value="4">
1406	<PARAM name="ATrous Minimum Scale" datatype="int" ucd="stat.param" value="1">
1407	<PARAM name="ATrous Filter" datatype="char" ucd="meta.code;stat" value="B3 spline function">
1408	<FIELD name="ID" ID="col1" ucd="meta.id" datatype="int" width="4"/>
1409	<FIELD name="Name" ID="col2" ucd="meta.id;meta.main" datatype="char" arraysize="14"/>
1410	<FIELD name="RA" ID="col3" ucd="pos.eq.ra;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1411	<FIELD name="Dec" ID="col4" ucd="pos.eq.dec;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1412	<FIELD name="w_RA" ID="col3" ucd="phys.angSize;pos.eq.ra" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1413	<FIELD name="w_Dec" ID="col4" ucd="phys.angSize;pos.eq.dec" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1414	<FIELD name="Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc" datatype="float" width="9" precision="3" unit="km/s"/>
1415	<FIELD name="w_Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc;spect.line.width" datatype="float" width="8" precision="3" unit="km/s"/>
1416	<FIELD name="Integrated_Flux" ID="col4" ucd="phys.flux;spect.line.intensity" datatype="float" width="10" precision="3" unit="km/s"/>
1417	<DATA>
1418	<TABLEDATA>
1419	<TR>
1420	<TD> 1</TD><TD> J0609-2200</TD><TD> 92.410416</TD><TD>-22.013390</TD><TD> 48.50</TD><TD> 39.42</TD><TD> 213.061</TD><TD> 65.957</TD><TD> 17.572</TD>
1421	</TR>
1422	<TR>
1423	<TD> 2</TD><TD> J0608-2605</TD><TD> 92.042633</TD><TD>-26.085157</TD><TD> 44.47</TD><TD> 39.47</TD><TD> 233.119</TD><TD> 39.574</TD><TD> 4.144</TD>
1424	</TR>
1425	<TR>
1426	<TD> 3</TD><TD> J0606-2724</TD><TD> 91.637840</TD><TD>-27.412022</TD><TD> 52.48</TD><TD> 47.57</TD><TD> 302.213</TD><TD> 39.574</TD><TD> 17.066</TD>
1427	</TR>
1428	(... table truncated for clarity ...)
1429	</TABLEDATA>
1430	</DATA>
1431	</TABLE>
1432	</RESOURCE>
1433	</VOTABLE>
1434	\end{verbatim}
1435	}
1436	\end{landscape}
1437
1438	\newpage
1439	\section{Example Karma Annotation File output}
1440	\label{app-karma}
1441
1442	This is the format of the Karma Annotation file, showing the locations
1443	of the detected objects. This can be loaded by the plotting tools of
1444	the Karma package (for instance, \texttt{kvis}) as an overlay on the FITS
1445	file.
1446
1447	\begin{verbatim}
1448	# Duchamp Source Finder results for
1449	# cube /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1450	COLOR RED
1451	COORD W
1452	CIRCLE 92.3376 -21.9475 0.403992
1453	TEXT 92.3376 -21.9475 1
1454	CIRCLE 91.9676 -26.0193 0.37034
1455	TEXT 91.9676 -26.0193 2
1456	CIRCLE 91.5621 -27.3459 0.437109
1457	TEXT 91.5621 -27.3459 3
1458	CIRCLE 92.8285 -21.6344 0.269914
1459	TEXT 92.8285 -21.6344 4
1460	CIRCLE 90.1381 -28.9838 0.234179
1461	TEXT 90.1381 -28.9838 5
1462	CIRCLE 89.72 -26.6513 0.132743
1463	TEXT 89.72 -26.6513 6
1464	CIRCLE 94.2743 -27.4003 0.195175
1465	TEXT 94.2743 -27.4003 7
1466	CIRCLE 92.2739 -21.6941 0.134538
1467	TEXT 92.2739 -21.6941 8
1468	CIRCLE 89.7133 -25.4259 0.232252
1469	TEXT 89.7133 -25.4259 9
1470	CIRCLE 90.2206 -21.6993 0.266247
1471	TEXT 90.2206 -21.6993 10
1472	CIRCLE 93.8581 -26.5766 0.163153
1473	TEXT 93.8581 -26.5766 11
1474	CIRCLE 91.176 -26.1064 0.234356
1475	TEXT 91.176 -26.1064 12
1476	CIRCLE 90.2844 -23.6716 0.299509
1477	TEXT 90.2844 -23.6716 13
1478	CIRCLE 93.8774 -22.581 0.130925
1479	TEXT 93.8774 -22.581 14
1480	CIRCLE 94.3882 -23.0934 0.137108
1481	TEXT 94.3882 -23.0934 15
1482	CIRCLE 93.0491 -21.8223 0.202928
1483	TEXT 93.0491 -21.8223 16
1484	CIRCLE 94.0685 -21.5603 0.168456
1485	TEXT 94.0685 -21.5603 17
1486	CIRCLE 86.0568 -27.6095 0.101113
1487	TEXT 86.0568 -27.6095 18
1488	CIRCLE 88.7932 -29.9453 0.202624
1489	TEXT 88.7932 -29.9453 19
1490	\end{verbatim}
1491
1492	\newpage
1493	\section{Installing \duchamp\ (README file)}
1494	\begin{verbatim}
1495	There is an executable (Duchamp) that has been compiled on a Debian
1496	Linux kernel 2.6.8-2-686, with gcc version 3.3.5 (Debian 1:3.3.5-13)
1497
1498	If that is no good to you, you can compile it yourself using the
1499	Makefile included in this directory (sorry for not having a configure
1500	script or similar yet!).
1501
1502	Duchamp uses three main external libraries: pgplot, cfitsio and
1503	wcslib. You will need to set the paths for the base directory and
1504	three libraries, as they are currently configured for my use and will
1505	not be of much use to you. These are:
1506
1507	BASE --> the current directory
1508	PGDIR --> where the pgplot libraries (and header files) are located
1509	CFITSIODIR --> where the header file fitsio.h is
1510	CFITSIOLDIR --> where the cfitsio library is located (libcfitsio.a)
1511	WCSDIR --> where the wcslib header files are
1512	WCSLDIR --> where the wcslib library is located (libwcs.a)
1513
1514	If you do not have the libraries, they can be downloaded from the
1515	following locations:
1516	PGPlot -- http://www.astro.caltech.edu/~tjp/pgplot/
1517	cfitsio -- http://heasarc.gsfc.nasa.gov/docs/software/fitsio/fitsio.html
1518	wcslib -- http://www.atnf.csiro.au/people/Mark.Calabretta/WCS/index.html
1519
1520	Once you've set up the Makefile correctly, then simply typing
1521	> make duchamp
1522	will compile the program.
1523
1524	To run it, you need to use the syntax
1525	> Duchamp -p parameterFile
1526	where parameterFile is a file with the input parameters, including the
1527	name of the cube you want to search.
1528
1529	There are two example input files included with the distribution. The
1530	smaller one, InputExample, shows the typical parameters one might want
1531	to set. The large one, InputComplete, lists all parameters that can be
1532	entered, and a brief description of them. Refer to the documentation
1533	for further details.
1534
1535	To get going quickly, just replace the "your-file-here" in
1536	InputExample with your image name, and type
1537	> Duchamp -p InputExample
1538	and you're off!
1539	\end{verbatim}
1540
1541	\section{Robust statistics for a Normal distribution}
1542	\label{app-madfm}
1543
1544	The Normal, or Gaussian, distribution for mean $\mu$ and standard
1545	deviation $\sigma$ can be written as
1546	\[
1547	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
1548	\]
1549
1550	When one has a purely Gaussian signal, it is straightforward to
1551	estimate $\sigma$ by calculating the standard deviation (or rms) of
1552	the data. However, if there is a small amount of signal present on top
1553	of Gaussian noise, and one wants to estimate the $\sigma$ for the
1554	noise, the presence of the large values from the signal can bias the
1555	estimator to higher values.
1556
1557	An alternative way is to use the median ($m$) and median absolute deviation
1558	from the median ($s$) to estimate $\mu$ and $\sigma$. The median is the
1559	middle of the distribution, defined for a continuous distribution by
1560	\[
1561	\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
1562	\]
1563	From symmetry, we quickly see that for the continuous Normal
1564	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
1565	without loss of generality.
1566
1567	To find $s$, we find the distribution of the absolute deviation from
1568	the median, and then find the median of that distribution. This
1569	distribution is given by
1570	\begin{eqnarray*}
1571	g(x) &= &{\mbox{\rm distribution of }} \|x\|\\
1572	&= &f(x) + f(-x),\ x\ge0\\
1573	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
1574	\end{eqnarray*}
1575	So, the median absolute deviation from the median, $s$, is given by
1576	\[
1577	\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
1578	\]
1579	Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and
1580	so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
1581	\sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}} \diff x
1582	$. Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for
1583	simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$):
1584	\[
1585	\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
1586	\]
1587	This is hard to solve analytically (no nice analytic solution exists
1588	for the finite integral that I'm aware of), but straightforward to
1589	solve numerically, yielding the value of $s=0.6744888$. Thus, to
1590	estimate $\sigma$ for a Normally distributed data set, one can calculate
1591	$s$, then divide by 0.6744888 (or multiply by 1.4826042) to obtain the
1592	correct estimator.
1593
1594	Note that this is different to solutions quoted elsewhere,
1595	specifically in \citet{meyer04:trunc}, where the same robust estimator
1596	is used but with an incorrect conversion to standard deviation -- they
1597	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
1598	to convert the \emph{mean} absolute deviation from the mean to the
1599	standard deviation. This means that the cube noise in the \hipass\
1600	catalogue (their parameter Rms$_{\rm cube}$) should be 18\% larger
1601	than quoted.
1602
1603	\section{How Gaussian noise changes with wavelet scale.}
1604	\label{app-scaling}
1605
1606	The key element in the wavelet reconstruction of an array is the
1607	thresholding of the individual wavelet coefficient arrays. This is
1608	usually done by choosing a level to be some number of standard
1609	deviations above the mean value.
1610
1611	However, since the wavelet arrays are produced by convolving the input
1612	array by an increasingly large filter, the pixels in the coefficient
1613	arrays become increasingly correlated as the scale of the filter
1614	increases. This results in the measured standard deviation from a
1615	given coefficient array decreasing with increasing scale. To calculate
1616	this, we need to take into account how many other pixels each pixel in
1617	the convolved array depends on.
1618
1619	To demonstrate, suppose we have a 1-D array with $N$ pixel values
1620	given by $F_i,\ i=1,...,N$, and we convolve it with the B$_3$-spline
1621	filter, defined by the set of coefficients
1622	$\{1/16,1/4,3/8,1/4,1/16\}$. The flux of the $i$th pixel in the
1623	convolved array will be
1624	\[
1625	F'_i = \frac{1}{16}F_{i-2} + \frac{1}{4}F_{i-1} + \frac{3}{8}F_{i}
1626	+ \frac{1}{4}F_{i+1} + \frac{1}{16}F_{i+2}
1627	\]
1628	and the flux of the corresponding pixel in the wavelet array will be
1629	\[
1630	W'_i = F_i - F'_i = \frac{-1}{16}F_{i-2} - \frac{1}{4}F_{i-1} + \frac{5}{8}F_{i}
1631	- \frac{1}{4}F_{i+1} - \frac{1}{16}F_{i+2}
1632	\]
1633	Now, assuming each pixel has the same standard deviation
1634	$\sigma_i=\sigma$, we can work out the standard deviation for the
1635	wavelet array:
1636	\[
1637	\sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2 + \left(\frac{1}{4}\right)^2
1638	+ \left(\frac{5}{8}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2}
1639	= 0.72349\ \sigma
1640	\]
1641	Thus, the first scale wavelet coefficient array will have a standard
1642	deviation of 72.3\% of the input array. This procedure can be followed
1643	to calculate the necessary values for all scales, dimensions and
1644	filters used by \duchamp.
1645
1646	Calculating these values is clearly a critical step in performing the
1647	reconstruction. \citet{starck02:book} did so by simulating data sets
1648	with Gaussian noise, taking the wavelet transform, and measuring the
1649	value of $\sigma$ for each scale. We take a different approach, by
1650	calculating the scaling factors directly from the filter coefficients
1651	by taking the wavelet transform of an array made up of a 1 in the
1652	central pixel and 0s everywhere else. The scaling value is then
1653	derived by taking the square root of the sum (in quadrature) of all
1654	the wavelet coefficient values at each scale. We give the scaling
1655	factors for the three filters available to \duchamp\ on the following
1656	page. These values are hard-coded into \duchamp, so no on-the-fly
1657	calculation of them is necessary.
1658
1659	Memory limitations prevent us from calculating factors for large
1660	scales, particularly for the three-dimensional case (hence the --
1661	symbols in the tables). To calculate factors for
1662	higher scales than those available, we note the following
1663	relationships apply for large scales to a sufficient level of precision:
1664	\begin{itemize}
1665	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
1666	\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
1667	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
1668	\end{itemize}
1669
1670	\newpage
1671	\begin{itemize}
1672	\item \textbf{B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$
1673
1674	\begin{tabular}{llll}
1675	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1676	1 & 0.723489806 & 0.890796310 & 0.956543592\\
1677	2 & 0.285450405 & 0.200663851 & 0.120336499\\
1678	3 & 0.177947535 & 0.0855075048 & 0.0349500154\\
1679	4 & 0.122223156 & 0.0412474444 & 0.0118164242\\
1680	5 & 0.0858113122 & 0.0204249666 & 0.00413233507\\
1681	6 & 0.0605703043 & 0.0101897592 & 0.00145703714\\
1682	7 & 0.0428107206 & 0.00509204670 & 0.000514791120\\
1683	8 & 0.0302684024 & 0.00254566946 & --\\
1684	9 & 0.0214024008 & 0.00127279050 & --\\
1685	10 & 0.0151336781 & 0.000636389722 & --\\
1686	11 & 0.0107011079 & 0.000318194170 & --\\
1687	12 & 0.00756682272 & -- & --\\
1688	13 & 0.00535055108 & -- & --\\
1689	%14 & 0.00378341085 & -- & --\\
1690	%15 & 0.00267527545 & -- & --\\
1691	%16 & 0.00189170541 & -- & --\\
1692	%17 & 0.00133763772 & -- & --\\
1693	%18 & 0.000945852704 & -- & --
1694	\end{tabular}
1695
1696	\item \textbf{Triangle Function:} $\{1/4,1/2,1/4\}$
1697
1698	\begin{tabular}{llll}
1699	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1700	1 & 0.612372436 & 0.800390530 & 0.895954449 \\
1701	2 & 0.330718914 & 0.272878894 & 0.192033014\\
1702	3 & 0.211947812 & 0.119779282 & 0.0576484078\\
1703	4 & 0.145740298 & 0.0577664785 & 0.0194912393\\
1704	5 & 0.102310944 & 0.0286163283 & 0.00681278387\\
1705	6 & 0.0722128185 & 0.0142747506 & 0.00240175885\\
1706	7 & 0.0510388224 & 0.00713319703 & 0.000848538128 \\
1707	8 & 0.0360857673 & 0.00356607618 & 0.000299949455 \\
1708	9 & 0.0255157615 & 0.00178297280 & -- \\
1709	10 & 0.0180422389 & 0.000891478237 & -- \\
1710	11 & 0.0127577667 & 0.000445738098 & -- \\
1711	12 & 0.00902109930 & 0.000222868922 & -- \\
1712	13 & 0.00637887978 & -- & -- \\
1713	%14 & 0.00451054902 & -- & -- \\
1714	%15 & 0.00318942978 & -- & -- \\
1715	%16 & 0.00225527449 & -- & -- \\
1716	%17 & 0.00159471988 & -- & -- \\
1717	%18 & 0.000112763724 & -- & --
1718
1719	\end{tabular}
1720
1721	\item \textbf{Haar Wavelet:} $\{0,1/2,1/2\}$
1722
1723	\begin{tabular}{llll}
1724	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1725	1 & 0.707167810 & 0.433012702 & 0.935414347 \\
1726	2 & 0.500000000 & 0.216506351 & 0.330718914\\
1727	3 & 0.353553391 & 0.108253175 & 0.116926793\\
1728	4 & 0.250000000 & 0.0541265877 & 0.0413398642\\
1729	5 & 0.176776695 & 0.0270632939 & 0.0146158492\\
1730	6 & 0.125000000 & 0.0135316469 & 0.00516748303
1731
1732	\end{tabular}
1733
1734
1735	\end{itemize}
1736
1737	\end{document}

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: branches/fitshead-branch/docs/Guide.tex @ 1441

Download in other formats: