Context Navigation

source: tags/release-0.9/docs/Guide.tex

Visit:

Last change on this file was 37, checked in by Matthew Whiting, 18 years ago
Changed reference to the Meyer et al determination of sigma, better describing how they did it and the reason for the difference in our conversions.
File size: 66.0 KB

Line
1	\documentclass[12pt,a4paper]{article}
2
3	%Define a test for doing PDF format -- use different code below
4	\newif\ifPDF
5	\ifx\pdfoutput\undefined\PDFfalse
6	\else\ifnum\pdfoutput > 0\PDFtrue
7	\else\PDFfalse
8	\fi
9	\fi
10
11	\textwidth=161 mm
12	\textheight=240 mm
13	\topmargin=-18 mm
14	\oddsidemargin=0 mm
15	\parindent=6 mm
16
17	\usepackage[sort]{natbib}
18	\usepackage{lscape}
19	\bibpunct[,]{(}{)}{;}{a}{}{,}
20
21	\newcommand{\eg}{e.g.\ }
22	\newcommand{\ie}{i.e.\ }
23	\newcommand{\hi}{H{\sc i}}
24	\newcommand{\hipass}{{\sc hipass}}
25	\newcommand{\progname}{{\tt Duchamp}}
26	\newcommand{\diff}{{\rm d}}
27	\newcommand{\entrylabel}[1]{\mbox{\textsf{\bf{#1:}}}\hfil}
28	\newenvironment{entry}
29	{\begin{list}{}%
30	{\renewcommand{\makelabel}{\entrylabel}%
31	\setlength{\labelwidth}{30mm}%
32	\setlength{\labelsep}{5pt}%
33	\setlength{\itemsep}{2pt}%
34	\setlength{\parsep}{2pt}%
35	\setlength{\leftmargin}{35mm}%
36	}%
37	}%
38	{\end{list}}
39
40
41	\title{A Guide to the {\it Duchamp} Source Finding Software}
42	\author{Matthew Whiting\\
43	%{\small \href{mailto:Matthew.Whiting@csiro.au}{Matthew.Whiting@csiro.au}}\\
44	Australia Telescope National Facility\\CSIRO}
45	%\date{January 2006}
46	\date{}
47
48	% If we are creating a PDF, use different options for graphicx, hyperref.
49	\ifPDF
50	\usepackage[pdftex]{graphicx,color}
51	\usepackage[pdftex]{hyperref}
52	\hypersetup{colorlinks=true,%
53	% citecolor=black,%
54	% filecolor=black,%
55	% linkcolor=black,%
56	% urlcolor=black,%
57	}
58	\else
59	\usepackage[dvips]{graphicx}
60	\usepackage[dvips]{hyperref}
61	\fi
62
63	\begin{document}
64
65	\maketitle
66	\tableofcontents
67
68	\newpage
69	\section{Introduction and getting going quickly}
70
71	This document gives details on the use of the program Duchamp. This
72	has been written to provide a source-detection facility for
73	spectral-line data cubes. The basic execution of Duchamp is to read
74	in a FITS data cube, find sources in the cube, and produce a text
75	file of positions, velocities and fluxes of the detections, as well as
76	a postscript file of the spectra of each detection.
77
78	So, you have a FITS cube, and you want to find the sources in it. What
79	do you do? The first step is to make an input file that contains the
80	list of parameters. Brief and detailed examples are shown in
81	Appendix~\ref{app-input}. This provides the input file name, the various
82	output files, and defines various parameters that control the
83	execution.
84
85	The program is run by the command
86	\begin{quote}
87	{\tt Duchamp -p [parameter file]}
88	\end{quote}
89	replacing {\tt [parameter file]} with the name of the file you have
90	just created/copied. The program will then work away and give you the
91	list of detections and their spectra. The program execution is
92	summarise below, and detailed in \S\ref{sec-flow}. Information on
93	inputs is in \S\ref{sec-param} and Appendix~\ref{app-param}, and
94	descriptions of the output is in \S\ref{sec-output}.
95
96	\subsection{A summary of the execution steps}
97
98	The basic flow of the program is summarised here. All these steps are
99	discussed in more detail in the following sections, so read on if
100	you have questions!
101	\begin{enumerate}
102	\item The parameter file given on the command line is read in, and the
103	parameters absorbed.
104	\item From the parameter file, the FITS image is located and read in
105	to memory.
106	\item If requested, blank pixels are trimmed from the edges, and
107	channels corresponding to bright (\eg Galactic) emission are
108	excised.
109	\item If requested, the baseline of each spectrum is removed.
110	\item If the reconstruction method is requested, the cube is
111	reconstructed using the {\it {\' a} trous} wavelet method.
112	\item Searching for objects then takes place, using the requested
113	thresholding method.
114	\item The list of objects is trimmed by merging neighbouring objects
115	and removing those deemed unacceptable.
116	\item The baselines and trimmed pixels are replaced prior to output.
117	\item The details on the detections are written to screen and to the
118	requested output file.
119	\item Maps showing the spatial location of the detections are written.
120	\item The integrated spectra of each detection are written to a
121	postscript file.
122	\item If requested, the reconstructed array can be written to a new
123	FITS file.
124	\end{enumerate}
125
126	\subsection{Guide to terminology}
127
128	First, a brief note on the use of terminology in this guide. Duchamp
129	is designed to work on FITS ``cubes''. These are FITS\footnote{FITS is
130	the Flexible Image Transport System -- see \citet{hanisch01} or
131	websites such as
132	\href{http://fits.cv.nrao.edu/FITS.html}{http://fits.cv.nrao.edu/FITS.html}
133	for details.} image arrays with three dimensions -- they are assumed
134	to have the following form: the first two dimensions (referred to as
135	$x$ and $y$) are spatial directions (that is, relating to the position
136	on the sky), while the third dimension, $z$, is the spectral
137	direction, which can correspond to frequency, wavelength, or velocity.
138
139	Each spatial pixel (a given $(x,y)$ coordinate) can be said to be a
140	single spectrum, while a slice through the cube perpendicular to the
141	spectral direction at a given $z$-value is a single channel (the 2-D
142	image is a channel map).
143
144	Features that are detected are assumed to be positive. If one wants to
145	search for absorption (negative) features, try multiplying your cube
146	by $-1$ before running Duchamp.
147
148	Note that it is possible to run Duchamp on a two-dimensional image
149	(\ie one with no frequency or velocity information), or indeed a
150	one-dimensional array, and many of the features of the program will
151	work fine. The focus, however, is on object detection in three
152	dimensions.
153
154	\subsection{Why ``Duchamp''?}
155
156	Well, it's important for a program to have a name, and it certainly
157	beats the initial version of ``cubefind''. I had planned to call it
158	``Picasso'' (as in the father of cubism), but sadly this had already
159	been used before \citep{minchin99}. So I settled on naming it after
160	Marcel Duchamp, another cubist, but also one of the first artists to
161	work with ``found objects''.
162
163	\section{User Inputs}
164	\label{sec-param}
165
166	Input to the program is provided by means of a parameter file. Parameters
167	are listed in the file, followed by the value that should be assigned
168	to them. The syntax used is {\tt paramName value}. The file is not
169	case-sensitive, and lines in the input file that start with {\tt \#} are
170	ignored. If a parameter is listed more than once, the latter value is
171	used, but otherwise the order in which the parameters are listed in the
172	input file is arbitrary.
173
174	If a parameter is not listed, the default value is assumed. The
175	defaults are chosen to provide a good result (using the reconstruction
176	method), so the user doesn't need to specify many new parameters in
177	the input file. Note that the image file {\bf must} be specified! The
178	parameters that can be set are listed in Appendix~\ref{app-param},
179	with their default values in parentheses.
180
181	The 'flag' parameters are stored as {\tt bool} variables, and so are
182	either {\tt true = 1} or {\tt false = 0}. Currently the program only
183	reads them from the file as integers, and so they should be entered in
184	the file as 0 or 1 (see example file in Appendix~\ref{app-input}).
185
186	\section{What the program is doing}
187	\label{sec-flow}
188
189	The execution flow of the program is detailed here, indicating the
190	main algorithmic steps that are used. The program is written in C/C++
191	and makes use of the {\sc cfitsio}, {\sc wcslib} and {\sc pgplot}
192	libraries.
193
194	%\subsection{Parameter input}
195	%
196	%The user provides parameters that govern the selection of files and
197	%the parameters used by the various subroutines in the program. This is
198	%done via a parameter file, and the parameters are stored in a C++
199	%class for use throughout the program. The form of the parameter file is
200	%discussed in \S\ref{sec-param}, and the parameters themselves are
201	%listed in Appendix~\ref{app-param}.
202
203	\subsection{Image input}
204
205	The cube is read in using basic {\sc cfitsio} commands, and stored as
206	an array in a special C++ class structure. This class keeps track of
207	the list of detected objects, as well as any reconstructed arrays that
208	are made (see \S\ref{sec-recon}). The World Coordinate System (WCS)
209	information for the cube is also obtained from the FITS header by {\sc
210	wcslib} functions \citep{greisen02, calabretta02}, and this
211	information, in the form of a {\tt wcsprm} structure, is also stored
212	in the same class.
213
214	A sub-section of an image can be requested. This is done via the {\tt
215	subsection} parameter in the parameter file. The generalised form of
216	the subsection that is used by {\sc cfitsio} is {\tt
217	[x1:x2:dx,y1:y2:dy,z1:z2:dz]}, such that the x-coordinates run from
218	{\tt x1} to {\tt x2} (inclusive), with steps of {\tt dx}. The step
219	value can be omitted (so a subsection of the form {\tt
220	[2:50,2:50,10:1000]} is still valid). Duchamp does not at this
221	stage deal with the presence of steps in the subsection string, and
222	any that are present are removed before the file is opened.
223
224	If one wants the full range of a value then replace the range with an
225	asterisk, \eg {\tt [2:50,2:50,*]}. If one wants to use just a
226	subsection, one must set {\tt flagSubsection = 1}. A complete
227	description of the section syntax can be found at the {\sc fitsio} web
228	site
229	\footnote{
230	\href{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}%
231	{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node90.html}}.
232
233	\subsection{Image modification}
234	\label{sec-modify}
235
236	Several modifications to the cube can be made that improve the
237	execution and efficiency of Duchamp (these are optional -- their
238	use is indicated by the relevant flags set in the input parameter
239	file).
240
241	\subsubsection{Milky-Way removal}
242
243	First, a single set of contiguous channels can be removed -- these may
244	exhibit very strong emission, such as that from the Milky Way as seen
245	in extragalactic \hi\ cubes (hence the references to ``Milky Way'' in
246	relation to this task -- apologies to Galactic astronomers!). Such
247	dominant channels will both produce many unnecessary, uninteresting
248	and large (in size and hence in memory usage) detections, and will
249	also affect any reconstruction that is performed (see next
250	section). The use of this feature is controlled by the {\tt flagMW}
251	parameter, and the exact channels concerned are able to be set by the
252	user (using {\tt maxMW} and {\tt minMW}). When employed, the flux in
253	these channels is set to zero. The information in those channels is
254	not kept.
255
256	\subsubsection{Blank pixel removal}
257
258	Second, the cube is trimmed of any BLANK pixels that pad the image
259	out to a rectangular shape. This is also optional, being determined by
260	the {\tt flagBlankPix} parameter. The value for these pixels is read from
261	the FITS header (using the BLANK, BSCALE and BZERO keywords), but if
262	these are not present then the value can be specified by the user in
263	the parameter file. If these blank pixels are stored as NaNs, then a
264	normal number will be substituted (allowing these pixels to be
265	accurately removed without adverse effects). [NOTE: this appears not
266	to be working correctly at time of writing. If your data has
267	unspecified BLANKs, be wary...]
268
269	This stage is particularly important for the reconstruction step, as
270	lots of BLANK pixels on the edges will smooth out features in the
271	wavelet calculation stage. The trimming will also reduce the size of
272	the cube's array, speeding up the execution. The amount of trimming is
273	recorded, and these pixels are added back in once the source-detection
274	is completed (so that quoted pixel positions are applicable to the
275	original cube).
276
277	Rows and columns are trimmed one at a time until the first non-BLANK
278	pixel is reached, so that the image remains rectangular. In practice,
279	this means that there will be BLANK pixels left in the trimmed image
280	(if the non-BLANK region is non-rectangular). However, these are
281	ignored in all further calculations done on the cube.
282
283	\subsubsection{Baseline removal}
284
285	Finally, the user may request the removal of baselines from the
286	spectra, via the parameter {\tt flagBaseline}. This may be necessary
287	if there is a strong baseline ripple present, which can result in
288	spurious detections on the high points of the ripple. The baseline is
289	calculated from a wavelet reconstruction procedure (see
290	\S\ref{sec-recon}) that keeps only the two largest scales. This is
291	done separately for each spatial pixel (\ie for each spectrum in the
292	cube), and the baselines are stored and added back in before any
293	output is done. In this way the quoted fluxes and displayed spectra
294	are as one would see from the input cube itself -- even though the
295	detection (and reconstruction if applicable) is done on the
296	baseline-removed cube.
297
298	\subsection{Image reconstruction}
299	\label{sec-recon}
300
301	This is an optional step. The user can direct Duchamp to
302	reconstruct the data cube using the {\it {\`a} trous} wavelet
303	procedure. A good description of the procedure can be found in
304	\citet{starck02:book}. This is an effective way of removing a
305	lot of the noise in the image, but at this stage is relatively time-
306	and memory-intensive. The steps in the procedure are as follows:
307	\begin{enumerate}
308	\item Set the reconstructed array to 0 everywhere.
309	\item The cube is discretely convolved with a given filter
310	function. This is determined from the parameter file via the {\tt
311	filterCode} parameter -- see Appendix~\ref{app-param} for details on
312	the filters available.
313	\item The wavelet coefficients are calculated by taking the difference
314	between the convolved array and the input array.
315	\item If the wavelet coefficients at a given point are above the
316	threshold requested (given by {\tt snrRecon} as the number of
317	$\sigma$ above the mean and adjusted to the current scale), add
318	these to the reconstructed array.
319	\item The separation of the filter coefficients is doubled.
320	\item The procedure is repeated from step 2, using the convolved array
321	as the input array.
322	\item Continue until the required maximum number of scales is reached.
323	\item Add the final smoothed (\ie convolved) array to the
324	reconstructed array. This provides the ``DC offset'', as each of the
325	wavelet coefficient arrays will have zero mean.
326	\end{enumerate}
327
328	It is important to note that the {\it {\`a} trous} decomposition is
329	an example of a ``redundant'' transformation. If no thresholding is
330	performed, the sum of all the wavelet coefficient arrays and the final
331	smoothed array is identical to the input array. The thresholding thus
332	removes only the unwanted structure in the array.
333
334	The statistics of the cube are estimated using robust methods, to
335	avoid corruption by strong outlying points. The mean is actually
336	estimated by the median, while the median absolute deviation from the
337	median (MADFM) is calculated and corrected assuming Gaussianity to
338	estimate the standard deviation $\sigma$. The Gaussianity (or
339	Normality) assumption is critical, as the MADFM does not give the same
340	value as the usual rms or standard deviation value -- for a normal
341	distribution $N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. The
342	difference between the MADFM and $\sigma$ is corrected for, so the
343	user need only think in the usual multiples of $\sigma$ when setting
344	{\tt snrRecon}. See Appendix~\ref{app-madfm} for a derivation of this
345	value.
346
347	When thresholding the different wavelet scales, the value of $\sigma$
348	as measured from the input array needs to be scaled to account for the
349	increased amount of correlation between neighbouring pixels (due to
350	the convolution). See Appendix~\ref{app-madfm} for details on this
351	scaling.
352
353	The user can also select the minimum scale to be used in the
354	reconstruction -- the first scale exhibits the highest frequency
355	variations, and so ignoring this one can sometimes be beneficial in
356	removing excess noise. The default, however, is to use all scales
357	({\tt minscale = 1}).
358
359	The reconstruction has at least two iterations. The first iteration
360	makes a first pass at the wavelet reconstruction (the process outlined
361	in the 8 stages above), but the residual array will inevitably have
362	some structure still in it, so the wavelet filtering is done on the
363	residual, and any significant wavelet terms are added to the final
364	reconstruction. This step is repeated until the change in the $\sigma$
365	of the background is less than some fiducial amount.
366
367	The user can optionally select to save the reconstructed image as a
368	FITS file -- at the moment this is just saved in the same directory as
369	the input file, so it won't work if the user does not have write
370	permissions on that directory. See Appendix~\ref{app-param} for details
371	on the naming of the output image. The residual image, which is the
372	difference between the input image and the reconstructed image, can
373	also be saved in the same manner.
374
375	Finally, note that any BLANK pixels that are still in the cube
376	will not be altered by the reconstruction -- they will be left as
377	BLANK so that the shape of the valid part of the cube is preserved.
378
379	\subsection{Searching the image}
380	\label{sec-detection}
381
382	The image is searched for detections in two ways: spectrally (a
383	1-dimensional search in the spectrum in each spatial pixel), and
384	spatially (a 2-dimensional search in the spatial image in each
385	channel). In both cases, the algorithm finds connected pixels that are
386	above the user-specified threshold. In the case of the spatial image
387	search, the algorithm of \citet{lutz80} is used to raster scan through
388	the image and connect groups of pixels on neighbouring rows.
389
390	Note that this algorithm cannot be applied directly to a 3-dimensional
391	case, as it requires that objects are completely nested in a row: that
392	is, if you are scanning along a row, and one object finishes and
393	another starts, you know that you will not get back to the first one
394	(if at all) until the second is finished for that
395	row. Three-dimensional data does not have this property, which is why
396	we break up the searching into 1- and 2-dimensional cases.
397
398	The determination of the threshold is done in one of two ways. The
399	first way is a simple sigma-clipping, where a threshold defined as
400	$n\sigma$ above the mean is set and pixels above this threshold are
401	flagged as detected. As before, the value for $\sigma$ is estimated by
402	the MADFM, and corrected by the ratio derived in
403	Appendix~\ref{app-madfm}.
404
405	The second method uses the False Discovery Rate (FDR) technique
406	\citep{miller01,hopkins02}, whose basis we briefly detail here. The
407	false discovery rate (given by the number of false detections divided
408	by the total number of detections) is fixed at a certain value
409	$\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false
410	positives). In practice, an $\alpha$ value is chosen, and the ensemble
411	average FDR (\ie $<FDR>$) when the method is used will be less than
412	$\alpha$. One calculates $p$ -- the probability, assuming the null
413	hypothesis is true, of obtaining a test statistic as extreme as the
414	pixel value (the observed test statistic) -- for each pixel, and sorts
415	them in increasing order. One then calculates $d$ where
416	\[
417	d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
418	\]
419	and then rejects all hypotheses whose $p$-values are less than or equal
420	to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq
421	j\alpha/c_N N$.) Note that ``reject hypothesis'' here means ``accept
422	the pixel as an object pixel'' (\ie we are rejecting the null
423	hypothesis that the pixel belongs to the background).
424
425	The $c_N$ values here are normalisation constants that depend on the
426	correlated nature of the pixel values. If all the pixels are
427	uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their
428	tests will be dependent on each other, and so $c_N = \sum_{i=1}^N
429	i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels
430	are correlated over the beam. In this case the sum is made over the
431	$N$ pixels that make up the beam. The value of $N$ is calculated from
432	the FITS header (if the correct keywords -- BMAJ, BMIN -- are not
433	present, a default value of 10 pixels is assumed).
434
435	If a reconstruction has been made, the residuals (defined as original
436	$-$ reconstruction) are used to estimate the noise parameters of the
437	cube. Otherwise they are estimated directly from the cube itself. In
438	both cases, the median is used as a robust estimator of the mean
439	value, although the $\sigma$ is estimated by the standard deviation
440	(of the residual array, in the case of the reconstruction, but of the
441	original array otherwise).
442
443	Detections must have a minimum number of pixels to be counted. This
444	minimum number is given by the input parameters {\tt minPix} (for
445	2-dimensional searches) and {\tt minChannels} (for 1-dimensional
446	searches). Note again that only positive thresholding is done --
447	negative features are not searched for.
448
449	\subsection{Merging detected objects}
450	\label{sec-merger}
451
452	The searching step produces a list of detections that will have many
453	repeated detections of a given object -- for instance, spectral
454	detections in adjacent pixels of the same object and/or spatial
455	detections in neighbouring channels. These are then combined in an
456	algorithm that matches all objects judged to be ``close''. This
457	determination is made in one of two ways.
458
459	One way is to define two thresholds -- one spatial and one in velocity
460	-- and say that two objects should be merged if there is at least one
461	pair of pixels that lie within these threshold distances of each
462	other. These thresholds are specified by the parameters {\tt
463	threshSpatial} and {\tt threshVelocity} (in units of pixels and
464	channels respectively).
465
466	Alternatively, the spatial requirement can be changed to say that
467	there must be a pair of pixels that are {\it adjacent} -- a stricter,
468	but more realistic requirement, particularly when the spatial pixels
469	have a large angular size (as is the case for \hi\ surveys). This
470	method can be selected by setting the parameter
471	{\tt flagAdjacent} to 1 (\ie {\tt true}) in the parameter file. The
472	velocity thresholding is done in the same way as the first option.
473
474	Once the detections have been merged, they may be ``grown''. This is a
475	process of increasing the size of the detection by adding adjacent
476	pixels that are above some secondary threshold. This threshold is
477	lower than the one used for the initial detection, but above the noise
478	level, so that faint pixels are only detected when they are close to a
479	bright pixel. The value of this threshold is a possible input
480	parameter ({\tt growthCut}), with a default value of $1.5\sigma$. The
481	use of the growth algorithm is controlled by the {\tt flagGrowth}
482	parameter -- the default value of which is {\tt false}. If the
483	detections are grown, they are sent through the merging algorithm a
484	second time, to pick up any detections that now overlap or have grown
485	over each other.
486
487	Finally, to be accepted, the detections must span {\it both} a minimum
488	number of channels (to remove any spurious single-channel spikes that
489	may be present), and a minimum number of spatial pixels. These
490	numbers, as for the original detection step, are set with the {\tt
491	minChannels} and {\tt minPix} parameters. The channel requirement
492	means there must be at least one set of this many consecutive channels
493	in the source for it to be accepted.
494
495	\section{Outputs}
496	\label{sec-output}
497
498	\subsection{During execution}
499
500	Duchamp provides the user with feedback whilst it is running, to
501	keep the user informed on the progress of the analysis. Most of this
502	consists of self-explanatory messages about the particular stage the
503	program is up to. The relevant parameters are printed to the screen at
504	the start (once the file has been successfully read in), so the user
505	is able to make a quick check that the setup is correct.
506
507	If the cube is being trimmed (\S\ref{sec-modify}), the resulting
508	dimensions are printed to indicate how much has been trimmed. If a
509	reconstruction is being done, a continually updating message shows the
510	current iteration and scale (compared to the maximum scale).
511
512	During the searching algorithms, the progress through the 1D and 2D
513	searches are shown. When the searches have completed,
514	the number of objects found in both the 1D and 2D searches are
515	reported (see \S\ref{sec-detection} for details).
516
517	In the merging process (where multiple detections of the same object
518	are combined -- see \S\ref{sec-merger}), two stages of output
519	occur. The first is when each object in the list is compared with all
520	others. The output shows two numbers: the first being how far through
521	the list we are, and the second being the length of the list. As the
522	algorithm proceeds, the first number should increase and the second
523	should decrease (as objects are being combined). When the numbers
524	meet, the second phase begins, of removing multiply-appearing pixels
525	in each object and removing objects not meeting the minimum channels
526	requirement. During this phase, the total number of accepted objects
527	is shown, which should steadily increase until all have been accepted
528	or rejected. Note that these steps can be very quick for small numbers
529	of detections.
530
531	Since this continual printing to screen has some overhead of time and
532	CPU involved, the user can elect to not print this information by
533	setting the parameter {\tt verbose = 0}. In this case, the user is
534	still informed as to the steps being undertaken, but the details of
535	the progress are not shown.
536
537	\subsection{Results}
538
539	Finally, we get to the results -- the reason for running Duchamp in
540	the first place. Once the detection list is finalised, it is sorted by
541	the mean velocity of the detections (or, if there is no good WCS with
542	the cube, by the mean Z-pixel position). The results are then
543	printed to the screen and to the output file, denoted by the {\tt
544	OutFile} parameter. The results list, an example of which can be seen
545	in Appendix~\ref{app-output}, contains the following columns:
546
547	\begin{entry}
548	\item[Obj\#] The ID number of the detection (simply the sequential
549	count for the list, which is ordered by increasing velocity).
550	\item[Name] The IAU-format name of the detection (based on the RA \&
551	Dec).
552	\item[X] The average X-pixel position.
553	\item[Y] The average Y-pixel position.
554	\item[Z] The average Z-pixel position.
555	\item[RA] The Right Ascension of the centre of the object.
556	\item[DEC] The Declination of the centre of the object.
557	\item[w\_RA] The width of the object in Right Ascension [arcmin].
558	\item[w\_DEC] The width of the object in Declination [arcmin].
559	\item[VEL] The mean velocity of the object [km/s].
560	\item[w\_VEL] The full velocity width of the detection (max channel
561	$-$ min channel, in velocity units [km/s]).
562	\item[X1, X2] The minimum and maximum X-pixel coordinates.
563	\item[Y1, Y2] The minimum and maximum Y-pixel coordinates.
564	\item[Z1, Z2] The minimum and maximum Z-pixel coordinates.
565	\item[Npix] The number of pixels \& channels (\ie distinct $(x,y,z)$
566	coordinates) in the detection.
567	\item[F\_tot] The integrated flux over the object, in the units of the
568	FITS file. %This is calculated
569	\item[F\_peak] The peak flux over the object, in the units of the FITS
570	file.
571	\end{entry}
572	If the WCS is not valid (\ie is not present or does not have all the
573	necessary information), the Name, RA, DEC, VEL and related columns are not
574	printed, but the pixel coordinates are still provided.
575
576	\begin{figure}[t]
577	\begin{center}
578	\includegraphics[width=\textwidth]{example_spectrum}
579	\end{center}
580	\caption{\footnotesize An example of the spectrum output. Note several
581	of the features discussed in the text: the removal of the Milky Way
582	emission around 0 km/s; the red lines indicating the reconstructed
583	spectrum; the blue dashed lines indicating the spectral extent of
584	the detection; and the blue border showing its spatial extent on the
585	0th moment map.}
586	\label{fig-spect}
587	\end{figure}
588
589	Two alternative results files can also be requested. One option is a
590	VOTable-format XML file, containing just the RA, Dec, Velocity and the
591	corresponding widths of the detections, as well as the fluxes. The
592	user should set {\tt flagVOT = 1}, and put the desired filename in the
593	parameter {\tt votFile} -- note that the default is for it not to be
594	produced. This file should be compatible with all Virtual Observatory
595	tools (such as Aladin\footnote{ Aladin can be found on the web at
596	\href{http://aladin.u-strasbg.fr/}{http://aladin.u-strasbg.fr/}}). The
597	second option is an annotation file for use with the Karma toolkit of
598	visualisation tools (in particular, with {\tt kvis}). This will draw a
599	circle at the position of each detection, and number it according to
600	the Obj\# given above. To use, the user should set {\tt flagKarma = 1},
601	and put the desired filename in the parameter {\tt karmaFile} -- again,
602	the default is for it not to be produced.
603
604	As the program is running, it also (optionally) records the detections
605	made in each individual spectrum or channel (see
606	\S\ref{sec-detection} for details on this process). This is
607	recorded in the file denoted by the parameter {\tt LogFile}. This file
608	does not include the columns {\tt Name, RA, DEC, w\_RA, w\_DEC, VEL,
609	w\_VEL}. This file is designed primarily for diagnostic purposes: \eg
610	to see if a given set of pixels is detected in, say, one channel
611	image, but does not survive the merging process. The list of pixels
612	(and their fluxes) in the final detection list are also printed to
613	this file, again for diagnostic purposes. This feature can be turned
614	off by setting {\tt flagLog = false}. (This may be a good idea if you
615	are not interested in its contents, as it can be a large file.)
616
617	\begin{figure}[!t]
618	\begin{center}
619	\includegraphics[width=\textwidth]{example_moment_map}
620	\end{center}
621	\caption{\footnotesize An example of the moment map created by
622	Duchamp. The full extent of the cube is covered, and the 0th moment
623	of each object is shown (integrated individually over all the
624	detected channels).}
625	\label{fig-moment}
626	\end{figure}
627
628	As well as the output data file, a postscript file is created that
629	shows the integrated spectra of each detection, together with a small
630	cutout image (0th moment) and basic information of the detection. If
631	the cube was reconstructed, the spectrum from the reconstruction is
632	shown in red, over the top of the original spectrum. The spectral
633	extent of the detection is indicated with green lines, and a zoom is
634	shown in a separate window. The cutout image can optionally include a
635	border around the spatial pixels that are in the detection (turned on
636	and off by the parameter {\tt drawBorders}). It also includes a scale
637	bar in the bottom left corner to indicate size -- it is 30~arcmin
638	long. An example detection can be seen below in Fig.~\ref{fig-spect}.
639
640	Finally, a couple of images are optionally produced: a 0th moment map
641	of the cube, combining just the detected channels in each object,
642	showing the integrated flux in grey-scale; and a ``detection image'',
643	a grey-scale image where the pixel values are the number of channels
644	that spatial pixel is detected in. In both cases, if {\tt drawBorders =
645	true}, a border is drawn around the spatial extent of each
646	detection. An example moment map is shown in Fig.~\ref{fig-moment}.
647	The production or otherwise of these images is governed by the {\tt
648	flagMaps} parameter.
649
650	The purpose of these images are to provide a visual guide to where the
651	detections have been made, and, particularly in the case of the moment
652	map, to provide an indication of the strength of the source. In both
653	cases, the detections are numbered (in the same way as the output
654	list), and the spatial borders are marked out as for the cutout images
655	in the spectra file. Both these images are saved as postscript files
656	(given by the parameters {\tt momentMap} and {\tt detectionMap}
657	respectively), with the latter also displayed in a {\sc pgplot}
658	window (regardless of the state of {\tt flagMaps}).
659
660	\section{Notes and hints on the use of Duchamp}
661
662	In using Duchamp, the user has to make a number of decisions about
663	the way the program runs. This section is designed to give the user
664	some idea about what the various selections do...
665
666	The main choice is whether or not to use the wavelet
667	reconstruction. The main benefits of this are the marked reduction in
668	the noise level, leading to regularly-shaped detections, and good
669	reliability for faint sources. The main drawback with its use is the
670	long execution time: to reconstruct a $170\times160\times1024$
671	(\hipass) cube often requires three iterations and takes about 20-25
672	minutes. The searching part of the procedure is much quicker (although
673	see the note on merging, below), so if one uses the FDR method on the
674	un-reconstructed cube, the execution time is only a couple of minutes.
675
676	A further drawback with the reconstruction is that it is susceptible
677	to edge effects. If the valid area in the cube (\ie the part that is
678	not BLANK) has very curved edges (such as the \hipass\ polar cap cube,
679	H001, which has a roughly circular shape after gridding), the
680	convolution can produce artefacts in the reconstruction that mimic
681	the edges and lead to some spurious sources. Caution is advised with
682	such data -- the user is advised to check carefully the reconstructed
683	cube for the presence of such artefacts.
684
685	If one chooses the reconstruction method, a further decision is
686	required on the signal-to-noise cutoff used in determining acceptable
687	wavelet coefficients. A larger value will remove more noise from the
688	cube, at the expense of losing fainter sources, while a smaller value
689	will include more noise, which may produce spurious detections, but
690	will be more sensitive to faint sources. Values of less than about
691	$3\sigma$ tend to not reduce the noise a great deal and can lead to
692	many spurious sources.
693
694	The FDR method certainly produces more reliable results than a simple
695	sigma-clipping (\ie thresholding at some number of $\sigma$ above the
696	mean). However, at this point it does not seem to be giving the
697	sensitivity expected for the supplied value of {\tt alpha} (\ie it is
698	not finding as many sources as expected). Work is
699	being done to assess this, and to judge whether there is a real
700	problem (such as with the determination of the statistics), or simply
701	a result of working in 3 dimensions as opposed to 2.
702
703	A further point to bear in mind is that the shape of the detections in
704	a cube that has been reconstructed will be much more regular and
705	smooth -- the ragged edges that objects in the raw cube possess are
706	smoothed by the removal of most of the noise.
707
708	Finally, as Duchamp is still undergoing development, there are some
709	elements that are not fully developed. In particular, it is not as
710	clever as I would like at avoiding interference. The ability to place
711	requirements on the minimum number of channels and pixels partially
712	circumvents this problem, but work is being done to make Duchamp
713	smarter at rejecting signals that are clearly (to a human eye at
714	least) interference. See the following section for further
715	improvements that are planned.
716
717	%\section{Drawbacks of the current program}
718	%
719	%The program currently has a few problems/drawbacks/things to be aware
720	%of that will hopefully be fixed in the future:
721	%\begin{itemize}
722	%
723	%\item Narrow interference spikes are still getting found, particularly
724	% if there is no reconstruction, or reconstruction with a relatively
725	% low {\tt snrRecon} (such as 2 or 3). Increasing the {\tt
726	% minChannels} parameter is one way to circumvent this, but making the
727	% algorithm a bit more clever would be preferable.
728	%
729	%\item Sources that have strong continuum ripple and/or artefacts often
730	% generate many spurious detections. This needs some work to avoid
731	% Duchamp doing this, and until then users are advised to be aware
732	% of the possibility. Strong continuum ripples may generate many
733	% sources on the same spatial pixel, and this will be apparent on the
734	% detection images.
735	%
736	%\item Spectra are integrated over every spatial pixel of the
737	% detection, and this may dilute the actual detection, making it
738	% harder to see \ie the apparent strength of the line as plotted may
739	% not give a true indication of how strong it really is.
740	%
741	%%\item A caution on the merging part of the procedure. This can be time
742	%% consuming if there are many detections that do not require merging
743	%% -- in this case, the time will go like $N^2$ ($N$ = number of
744	%% detections). If there are plenty of mergers, the size of the list
745	%% reduces quickly, so the execution time will be less.
746	%
747	%
748	%\end{itemize}
749
750
751	%\section{Comparison with other software (to be developed further...)}
752	%
753	%\subsection{fred, by Matt Howlett}
754	%
755	%This is the program used in the \hipass\ analysis. It smoothes the
756	%data spectrally with a boxcar filter of a size that varies over a
757	%user-specified range, and then thresholds the data.
758	%
759	%Works effectively, but generally doesn't find as many sources as
760	%Duchamp, particularly when the reconstruction is used. Sensitive to
761	%faint, broad features that fall below the reconstruction threshold.
762	%
763	%Execution takes a long time, depending on the range of filter widths
764	%that are used.
765	%
766	%\subsection{sfind}
767	%
768	%Hard to evaluate, as it does not (as far as I can see) output the
769	%channel number at which detections are made, and does not merge
770	%detections made at adjacent channels (\ie it just works in 2
771	%dimensions).
772	%
773
774	\section{Future Developments}
775
776	This is both a list of planned improvements and a wish-list of
777	features that would be nice to include (but are not planned in the
778	immediate future):
779
780	\begin{itemize}
781
782	\item Ability to invert cube to search for absorption features. {\bf
783	Planned.}
784
785	\item More varied output formats. {\bf Planned.}
786
787	\item Better determination of the noise characteristics of
788	spectral-line cubes, including understanding how the noise is
789	generated and developing a model for it. {\bf Planned.}
790
791	\item Include more source analysis. Examples could be: shape
792	information; measurements of HI mass; better measurements of
793	velocity width and profile... {\bf Some planned.}
794
795	\item Provide some indication of the significance of the detection
796	(\ie some S/N-like value). {\bf Planned.}
797
798	\item Improved ability to reject interference, possibly on the
799	spectral shape of features. {\bf Planned.}
800
801	\item Link to lists of possible counterparts (\eg via NED/SIMBAD/other
802	VO tools?). {\bf Wishlist.}
803
804	\item Add ability to read in a reconstructed cube that has been
805	saved. In this case the residual array will also need to be read
806	in. The idea of this will be to avoid the extended time required for
807	the reconstruction if the same cube is being analysed multiple
808	times. {\bf Wishlist.}
809
810	\item At this point, the ``Milky Way'' channels are discarded and set
811	to zero. It may be that users would like to have those put back in
812	the final cube after the source detection is done, so at some point
813	this option may be added. {\bf Wishlist -- if needed.}
814
815	\end{itemize}
816
817
818	%\bibliographystyle{mn2e}
819	%\bibliographystyle{abbrvnat}
820	%\bibliography{mnrasmnemonic,sourceDetection}
821	\begin{thebibliography}{}
822
823	\bibitem[\protect\citeauthoryear{{Calabretta} \& {Greisen}}{{Calabretta} \&
824	{Greisen}}{2002}]{calabretta02}
825	{Calabretta} M., {Greisen} E., 2002, A\&A, 395, 1077
826
827	\bibitem[\protect\citeauthoryear{{Greisen} \& {Calabretta}}{{Greisen} \&
828	{Calabretta}}{2002}]{greisen02}
829	{Greisen} E., {Calabretta} M., 2002, A\&A, 395, 1061
830
831	\bibitem[\protect\citeauthoryear{{Hanisch}, {Farris}, {Greisen}, {Pence},
832	{Schlesinger}, {Teuben}, {Thompson} \& {Warnock}}{{Hanisch}
833	et~al.}{2001}]{hanisch01}
834	{Hanisch} R., {Farris} A., {Greisen} E., {Pence} W., {Schlesinger} B.,
835	{Teuben} P., {Thompson} R., {Warnock} A., 2001, A\&A, 376, 359
836
837	\bibitem[\protect\citeauthoryear{{Hopkins}, {Miller}, {Connolly}, {Genovese},
838	{Nichol} \& {Wasserman}}{{Hopkins} et~al.}{2002}]{hopkins02}
839	{Hopkins} A., {Miller} C., {Connolly} A., {Genovese} C., {Nichol} R.,
840	{Wasserman} L., 2002, AJ, 123, 1086
841
842	\bibitem[\protect\citeauthoryear{Lutz}{Lutz}{1980}]{lutz80}
843	Lutz R., 1980, The Computer Journal, 23, 262
844
845	\bibitem[\protect\citeauthoryear{{Meyer} et~al.,}{{Meyer}
846	et~al.}{2004}]{meyer04:trunc}
847	{Meyer} M., et~al., 2004, MNRAS, 350, 1195
848
849	\bibitem[\protect\citeauthoryear{{Miller}, {Genovese}, {Nichol}, {Wasserman},
850	{Connolly}, {Reichart}, {Hopkins}, {Schneider} \& {Moore}}{{Miller}
851	et~al.}{2001}]{miller01}
852	{Miller} C., {Genovese} C., {Nichol} R., {Wasserman} L., {Connolly} A.,
853	{Reichart} D., {Hopkins} A., {Schneider} J., {Moore} A., 2001, AJ, 122,
854	3492
855
856	\bibitem[\protect\citeauthoryear{Minchin}{Minchin}{1999}]{minchin99}
857	Minchin R., 1999, PASA, 16, 12
858
859	\bibitem[\protect\citeauthoryear{Starck \& Murtagh}{Starck \&
860	Murtagh}{2002}]{starck02:book}
861	Starck J.-L., Murtagh F., 2002, {``Astronomical Image and Data Analysis''}.
862	Springer
863
864	\end{thebibliography}
865
866
867	\appendix
868	\newpage
869	\section{Available parameters}
870	\label{app-param}
871
872	The full list of parameters that can be listed in the input file are
873	given here. If not listed, they take the default value given in
874	parentheses. Since the order of the parameters in the input file does
875	not matter, they are grouped here in logical sections.
876
877	\subsection*{Input-output related}
878	\begin{entry}
879	\item[ImageFile (no default assumed)] The filename of the
880	data cube to be analysed.
881	\item[OutFile {\tt [./duchamp-Results]}] The file where the final
882	detections are to be recorded. This also records the list of input
883	parameters.
884	\item[SpectraFile {\tt [./duchamp-Spectra.ps]}] The postscript file
885	containing the resulting integrated spectra and images of the
886	detections.
887	\item[flagLog {\tt [true]}] A flag to indicate whether intermediate
888	detections should be logged.
889	\item[LogFile {\tt [./duchamp-Logfile]}] The file in which intermediate
890	detections are logged. These are detections that have not been
891	merged. This is primarily for use in debugging and diagnostic
892	purposes -- normal use of the program will probably not require
893	this.
894	\item[flagSubsection {\tt [false]}] A flag to indicate whether one
895	wants a subsection of the requested image.
896	\item[Subsection {\tt [ [,,*] ]}] The requested subsection, which
897	should be specified in the format {\tt [x1:x2,y1:y2,z1:z2]}, where
898	the limits are inclusive. If the full range of a dimension is
899	required, use a {\tt *}, \eg if you want the full spectral range of
900	a subsection of the image, use {\tt [30:140,30:140,*]}.
901	\item[flagOutputRecon {\tt [false]}] A flag to say whether or not to
902	save the reconstructed cube as a FITS file. The filename will be
903	derived from the ImageFile -- the reconstruction of {\tt image.fits}
904	will be saved as {\tt image.RECON?.fits}, where {\tt ?} stands for
905	the value of {\tt snrRecon} (see below).
906	\item[flagOutputResid {\tt [false]}] As for {\tt flagOutputRecon}, but
907	for the residual array -- the difference between the original cube
908	and the reconstructed cube. The filename will be {\tt
909	image.RESID?.fits}.
910	\item[flagVOT {\tt [false]}] A flag to say whether to create a VOTable
911	file corresponding to the information in {\tt outfile}. This will be
912	an XML file in the Virtual Observatory VOTable format.
913	\item[votFile {\tt [./duchamp-Results.xml]}] The VOTable file with the
914	list of final detections. Some input parameters are also recorded.
915	\item[flagKarma {\tt [false]}] A flag to say whether to create a Karma
916	annotation file corresponding to the information in {\tt
917	outfile}. This can be used as an overlay for the Karma programs such
918	as {\tt kvis}.
919	\item[karmaFile {\tt [./duchamp-Results.ann]}] The Karma annotation
920	file showing the list of final detections.
921	\item[flagMaps {\tt [true]}] A flag to say whether to save postscript
922	files showing the 0th moment map of the whole cube (parameter {\tt
923	momentMap}) and the detection image ({\tt detectionMap}).
924	\item[momentMap {\tt [./latest-moment-map.ps]}] A postscript file
925	containing a map of the 0th moment of the detected sources, as well
926	as pixel and WCS coordinates.
927	\item[detectionMap {\tt [./latest-detection-map.ps]}] A postscript
928	file showing each of the detected objects, coloured in greyscale by
929	the number of channels they span. Also shows pixel and WCS
930	coordinates.
931	\end{entry}
932
933	\subsection*{Modifying the cube}
934	\begin{entry}
935	\item[flagBlankPix {\tt [true]}] A flag to say whether to remove BLANK
936	pixels from the analysis -- these are pixels set to some particular
937	value because they fall outside the imaged area.
938	\item[blankPixValue {\tt [-8.00061]}] The value of the BLANK pixels,
939	if this information is not contained in the FITS header (the usual
940	procedure is to obtain this value from the header information -- in
941	which case the value set by this parameter is ignored).
942	\item[flagMW {\tt [true]}] A flag to say whether to remove channels
943	contaminated by Milky Way (or other) emission -- the flux in these
944	channels is currently just set to 0.
945	\item[maxMW {\tt [112]}] The maximum channel for the Milky Way
946	emission.
947	\item[minMW {\tt [75]}] The minimum channel for the Milky Way
948	emission. Note that the channels specified by {\tt maxMW} and {\tt
949	minMW} are assumed to be Milky Way channels (\ie the range is
950	inclusive).
951	\item[flagBaseline {\tt [false]}] A flag to say whether to remove the
952	baseline from each spectrum in the cube for the purposes of
953	reconstruction and detection.
954	\end{entry}
955
956	\subsection*{Detection related}
957
958	\subsubsection*{General detection}
959	\begin{entry}
960	\item[snrCut {\tt [3.]}] The cut-off value for thresholding, in terms
961	of number of $\sigma$ above the mean.
962	\item[flagGrowth {\tt [true]}] A flag indicating whether or not to
963	grow the detected objects to a smaller threshold.
964	\item[growthCut {\tt [1.5]}] The smaller threshold using in growing
965	detections. In units of $\sigma$ above the mean.
966	\end{entry}
967
968	\subsubsection*{{\` a} trous reconstruction}
969	\begin{entry}
970	\item [flagATrous {\tt [true]}] A flag indicating whether or not to
971	reconstruct the cube using the {\it {\`a} trous} wavelet
972	reconstruction. Currently does this in 3-dimensions. See
973	\S\ref{sec-recon} for details.
974	\item[scaleMin {\tt [1]}] The minimum wavelet scale to be used in the
975	reconstruction. A value of 1 means ``use all scales''.
976	\item[snrRecon {\tt [4]}] The thresholding cutoff used in the
977	reconstruction -- only wavelet coefficients this many $\sigma$ above
978	the mean (or greater) are included in the reconstruction.
979	\item[filterCode {\tt [2]}] The code number of the filter to use in
980	the reconstruction. The options are:
981	\begin{itemize}
982	\item {\bf 1:} B$_3$-spline filter: coefficients =
983	$(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16})$
984	\item {\bf 2:} Triangle filter: coefficients = $(\frac{1}{4}, \frac{1}{2}, \frac{1}{4})$
985	\item {\bf 3:} Haar wavelet: coefficients = $(0, \frac{1}{2}, \frac{1}{2})$
986	\end{itemize}
987	\end{entry}
988
989	\subsubsection*{FDR method}
990	\begin{entry}
991	\item[flagFDR {\tt [false]}] A flag indicating whether or not to use
992	the False Discovery Rate method in thresholding the pixels.
993	\item[alphaFDR {\tt [0.01]}] The $\alpha$ parameter used in the FDR
994	analysis. The average number of false detections, as a fraction of the
995	total number, will be less than $\alpha$ (see \S\ref{sec-detection}).
996	\end{entry}
997
998	\subsubsection*{Merging detections}
999	\begin{entry}
1000	\item[flagAdjacent {\tt [true]}] A flag indicating whether to use the
1001	``adjacent pixel'' criterion to decide whether to merge objects. If
1002	not, the next two parameters are used to determine whether objects
1003	are within the necessary thresholds.
1004	\item[minPix {\tt [2]}] The minimum number of spatial pixels for a single
1005	detection to be counted.
1006	\item[minChannels {\tt [3]}] The minimum number of consecutive
1007	channels that must be present in the detection for it to be accepted
1008	by the Merging algorithm.
1009	%The minimum number of channels that a
1010	% detection must span for it to be accepted by the Merging algorithm.
1011	\item[threshSpatial {\tt [3.]}] The maximum allowed minimum spatial
1012	separation (in pixels) between two detections for them to be merged
1013	into one. Only used if {\tt flagAdjacent = false}.
1014	\item[threshVelocity {\tt [7.]}] The maximum allowed minimum channel
1015	separation between two detections for them to be merged into
1016	one. %Only used if {\tt flagAdjacent = false}.
1017	\end{entry}
1018
1019	\subsubsection*{Other parameters}
1020	\begin{entry}
1021	\item[drawBorders {\tt [true]}] A flag indicating whether borders
1022	are to be drawn around the detected objects in the moment maps
1023	included in the output (see for example Fig.~\ref{fig-spect}).
1024	\item[verbose {\tt [true]}] A flag indicating whether to print the
1025	progress of computationally-intensive algorithms (such as the
1026	searching and merging) to screen.
1027	\end{entry}
1028
1029
1030	\newpage
1031	\section{Example parameter files}
1032	\label{app-input}
1033
1034	This is what a typical parameter file would look like.
1035
1036	\begin{verbatim}
1037	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1038	logFile temp-Logfile
1039	outFile temp-Results
1040	spectraFile spectra.ps
1041	flagSubsection 0
1042	flagOutputRecon 0
1043	flagOutputResid 0
1044	flagBlankPix 1
1045	blankPixValue -8.00061
1046	flagMW 1
1047	minMW 75
1048	maxMW 112
1049	minPix 3
1050	flagGrowth 1
1051	growthCut 1.5
1052	flagATrous 0
1053	scaleMin 1
1054	snrRecon 4
1055	flagFDR 1
1056	alphaFDR 0.1
1057	numPixPSF 20
1058	snrCut 3
1059	threshSpatial 3
1060	threshVelocity 7
1061	minChannels 4
1062	\end{verbatim}
1063
1064	Note that it is not necessary to include all these parameters in the
1065	file, only those that need to be changed from the defaults (as listed
1066	in Appendix~\ref{app-param}), which in this case would be very few. A
1067	minimal parameter file might look like:
1068	\begin{verbatim}
1069	imageFile /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1070	flagLog 0
1071	snrRecon 3
1072	snrCut 2.5
1073	minChannels 3
1074	\end{verbatim}
1075	This will reconstruct the cube with a lower SNR value than the
1076	default, select objects at a lower threshold, with a looser minimum
1077	channel requirement, and not keep a log of the intermediate
1078	detections.
1079
1080	The following page demonstrates how the parameters are presented to
1081	the user, both on the screen at execution time and in the output and
1082	log files:
1083	\newpage
1084	\begin{landscape}
1085	Presentation of parameters in output and log files:
1086	\begin{verbatim}
1087	---- Parameters ----
1088	Image to be analysed = /DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits
1089	Intermediate Logfile = logfile.txt
1090	Final Results file = results.txt
1091	Spectrum file = spectra.ps
1092	VOTable file = results.xml
1093	0th Moment Map = latest-moment-map.ps
1094	Detection Map = latest-detection-map.ps
1095	Saving reconstructed cube? = false
1096	Saving residuals from reconstruction? = false
1097	------
1098	Fixing Blank Pixels? = true
1099	Blank Pixel Value = -8.00061
1100	Removing Milky Way channels? = true
1101	Milky Way Channels = 75-112
1102	Beam Size (pixels) = 10.1788
1103	Removing baselines before search? = false
1104	Minimum # Pixels in a detection = 2
1105	Growing objects after detection? = false
1106	Using A Trous reconstruction? = true
1107	Minimum scale in reconstruction = 1
1108	SNR Threshold within reconstruction = 4
1109	Filter being used for reconstruction = B3 spline function
1110	Using FDR analysis? = false
1111	SNR Threshold = 2.5
1112	Using Adjacent-pixel criterion? = true
1113	Min. # channels for merging = 4
1114	--------------------
1115	\end{verbatim}
1116
1117	\newpage
1118	\section{Example output file}
1119	\label{app-output}
1120	This the typical content of an output file, after running Duchamp
1121	with the parameters illustrated on the previous page.
1122
1123	{\scriptsize
1124	\begin{verbatim}
1125	Results of the Duchamp source finder: Tue Mar 21 16:28:50 EST 2006
1126	---- Parameters ----
1127
1128	(... omitted for clarity -- see previous page for examples...)
1129
1130	--------------------
1131	Total number of detections = 23
1132	--------------------
1133	Obj# Name X Y Z RA DEC w_RA w_DEC VEL w_VEL X1 X2 Y1 Y2 Z1 Z2 Npix F_tot F_peak
1134	-----------------------------------------------------------------------------------------------------------------------------------------------------
1135	1 J0609-2200 59.4 140.6 114.7 06:09:38.50 -22:00:48.20 48.50 39.42 213.061 65.957 55 66 136 145 113 118 185 17.5725 0.2125
1136	2 J0608-2605 65.2 79.6 116.2 06:08:10.23 -26:05:06.57 44.47 39.47 233.119 39.574 60 70 76 85 115 118 50 4.1441 0.1002
1137	3 J0606-2724 70.8 59.8 121.4 06:06:33.08 -27:24:43.28 52.48 47.57 302.213 39.574 65 77 53 64 120 123 213 17.0659 0.1497
1138	4 J0611-2142 52.5 145.1 162.5 06:11:36.34 -21:42:00.01 32.40 23.47 843.727 118.722 49 56 142 147 158 167 303 44.3940 0.4103
1139	5 J0600-2903 89.7 35.3 202.4 06:00:51.38 -29:03:02.51 23.94 28.09 1370.285 184.679 87 92 32 38 195 209 319 26.5725 0.1729
1140	6 J0559-2643 95.5 70.2 222.6 05:59:10.59 -26:43:05.94 15.94 12.09 1637.316 105.531 94 97 69 71 219 227 35 1.9253 0.0630
1141	7 J0617-2727 34.8 58.3 227.5 06:17:24.45 -27:27:53.89 20.77 23.41 1701.802 303.400 33 37 56 61 215 238 176 11.4138 0.0929
1142	8 J0609-2145 60.3 144.4 229.6 06:09:23.17 -21:45:36.06 16.15 11.81 1729.279 105.531 59 62 143 145 225 233 25 1.4760 0.0679
1143	9 J0559-2529 95.7 88.6 231.1 05:59:08.81 -25:29:34.50 27.88 24.14 1749.440 250.635 92 98 86 91 220 239 257 16.9297 0.1155
1144	10 J0601-2145 88.9 144.4 232.3 06:01:10.14 -21:45:58.59 31.96 24.13 1764.657 224.253 86 93 142 147 222 239 415 34.0304 0.1655
1145	11 J0615-2638 40.0 70.8 232.6 06:15:44.32 -26:38:29.42 16.56 19.57 1769.033 52.765 38 41 69 73 231 235 44 2.7565 0.0685
1146	12 J0605-2610 75.9 78.4 233.1 06:05:00.16 -26:10:21.68 28.14 23.84 1775.066 224.253 73 79 76 81 225 242 352 27.0587 0.1545
1147	13 J0601-2344 88.0 114.9 235.7 06:01:25.72 -23:44:18.18 35.96 32.07 1809.749 263.826 84 92 112 119 226 246 724 85.1317 0.2968
1148	14 J0615-2238 38.2 130.6 253.6 06:15:48.32 -22:38:45.75 12.39 15.70 2046.530 118.722 37 39 129 132 248 257 40 2.3169 0.0697
1149	15 J0617-2309 31.4 122.8 258.0 06:17:51.07 -23:09:29.22 16.46 15.53 2103.912 39.574 30 33 121 124 256 259 23 1.4243 0.0624
1150	16 J0612-2153 49.5 142.3 271.1 06:12:29.32 -21:53:16.05 24.36 19.56 2276.976 395.740 47 52 140 144 257 287 318 20.7117 0.1008
1151	17 J0616-2137 35.2 145.9 300.0 06:16:34.07 -21:37:30.95 20.22 7.46 2658.607 224.252 33 37 145 146 294 311 40 3.8507 0.1271
1152	18 J0544-2740 144.0 54.9 325.4 05:44:31.07 -27:40:42.30 3.58 12.13 2993.384 39.574 144 144 54 56 324 327 7 0.4362 0.0569
1153	19 J0555-3000 107.2 20.7 367.5 05:55:28.58 -30:00:46.76 19.67 24.31 3547.812 39.574 105 109 18 23 366 369 72 6.4819 0.1692
1154	20 J0559-2325 96.0 119.6 532.1 05:59:04.98 -23:25:19.04 11.92 16.08 5720.289 52.765 95 97 118 121 530 534 27 1.2865 0.0508
1155	21 J0616-2653 37.9 67.0 547.0 06:16:23.08 -26:53:11.73 12.36 11.67 5916.731 39.574 37 39 66 68 546 549 25 1.6374 0.0642
1156	22 J0619-2256 25.1 125.9 724.2 06:19:39.49 -22:56:06.15 12.38 11.60 8254.112 39.573 24 26 125 127 723 726 13 0.6982 0.0593
1157	23 J0552-2920 116.9 30.5 727.0 05:52:33.03 -29:20:54.43 11.60 20.25 8290.842 303.400 116 118 28 32 716 739 132 35.8343 0.4787
1158	\end{verbatim}
1159	}
1160
1161	%\end{landscape}
1162
1163	\newpage
1164	\section{Example VOTable output}
1165	\label{app-votable}
1166	This is part of the VOTable, in XML format, corresponding to the
1167	output file in Appendix~\ref{app-output} (the indentation has been removed to make it fit on the page!).
1168
1169	%\begin{landscape}
1170	{\scriptsize
1171	\begin{verbatim}
1172	<?xml version="1.0"?>
1173	<VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
1174	xsi:noNamespaceSchemaLocation="http://www.ivoa.net/xml/VOTable/VOTable/v1.1">
1175	<COOSYS ID="J2000" equinox="J2000." epoch="J2000." system="eq_FK5"/>
1176	<RESOURCE name="Duchamp Output">
1177	<TABLE name="Detections">
1178	<DESCRIPTION>Detected sources and parameters from running the Duchamp source finder.</DESCRIPTION>
1179	<PARAM name="FITS file" datatype="char" ucd="meta.file;meta.fits" value="/DATA/SITAR_1/whi550/cubes/H201_abcde_luther_chop.fits"/>
1180	<FIELD name="ID" ID="col1" ucd="meta.id" datatype="int" width="4"/>
1181	<FIELD name="Name" ID="col2" ucd="meta.id;meta.main" datatype="char" arraysize="14"/>
1182	<FIELD name="RA" ID="col3" ucd="pos.eq.ra;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1183	<FIELD name="Dec" ID="col4" ucd="pos.eq.dec;meta.main" ref="J2000" datatype="float" width="10" precision="6" unit="deg"/>
1184	<FIELD name="w_RA" ID="col3" ucd="phys.angSize;pos.eq.ra" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1185	<FIELD name="w_Dec" ID="col4" ucd="phys.angSize;pos.eq.dec" ref="J2000" datatype="float" width="7" precision="2" unit="arcmin"/>
1186	<FIELD name="Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc" datatype="float" width="9" precision="3" unit="km/s"/>
1187	<FIELD name="w_Vel" ID="col4" ucd="phys.veloc;src.dopplerVeloc;spect.line.width" datatype="float" width="8" precision="3" unit="km/s"/>
1188	<FIELD name="Integrated_Flux" ID="col4" ucd="phys.flux;spect.line.intensity" datatype="float" width="10" precision="3" unit="km/s"/>
1189	<DATA>
1190	<TABLEDATA>
1191	<TR>
1192	<TD> 1</TD><TD> J0609-2200</TD><TD> 92.410416</TD><TD>-22.013390</TD><TD> 48.50</TD><TD> 39.42</TD><TD> 213.061</TD><TD> 65.957</TD><TD> 17.572</TD>
1193	</TR>
1194	<TR>
1195	<TD> 2</TD><TD> J0608-2605</TD><TD> 92.042633</TD><TD>-26.085157</TD><TD> 44.47</TD><TD> 39.47</TD><TD> 233.119</TD><TD> 39.574</TD><TD> 4.144</TD>
1196	</TR>
1197	<TR>
1198	<TD> 3</TD><TD> J0606-2724</TD><TD> 91.637840</TD><TD>-27.412022</TD><TD> 52.48</TD><TD> 47.57</TD><TD> 302.213</TD><TD> 39.574</TD><TD> 17.066</TD>
1199	</TR>
1200	<TR>
1201	<TD> 4</TD><TD> J0611-2142</TD><TD> 92.901421</TD><TD>-21.700003</TD><TD> 32.40</TD><TD> 23.47</TD><TD> 843.727</TD><TD> 118.722</TD><TD> 44.394</TD>
1202	</TR>
1203	<TR>
1204	<TD> 5</TD><TD> J0600-2903</TD><TD> 90.214081</TD><TD>-29.050697</TD><TD> 23.94</TD><TD> 28.09</TD><TD> 1370.285</TD><TD> 184.679</TD><TD> 26.573</TD>
1205	</TR>
1206	(... table truncated for clarity ...)
1207	</TABLEDATA>
1208	</DATA>
1209	</TABLE>
1210	</RESOURCE>
1211	</VOTABLE>
1212	\end{verbatim}
1213	}
1214	\end{landscape}
1215
1216	\section{Robust statistics for a Normal distribution}
1217	\label{app-madfm}
1218
1219	The Normal, or Gaussian, distribution for mean $\mu$ and standard
1220	deviation $\sigma$ can be written as
1221	\[
1222	f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\ e^{-(x-\mu)^2/2\sigma^2}.
1223	\]
1224
1225	When one has a purely Gaussian signal, it is straightforward to
1226	estimate $\sigma$ by calculating the standard deviation (or rms) of
1227	the data. However, if there is a small amount of signal present on top
1228	of Gaussian noise, and one wants to estimate the $\sigma$ for the
1229	noise, the presence of the large values from the signal can bias the
1230	estimator to higher values.
1231
1232	An alternative way is to use the median ($m$) and median absolute deviation
1233	from the median ($s$) to estimate $\mu$ and $\sigma$. The median is the
1234	middle of the distribution, defined for a continuous distribution by
1235	\[
1236	\int_{-\infty}^{m} f(x) \diff x = \int_{m}^{\infty} f(x) \diff x.
1237	\]
1238	From symmetry, we quickly see that for the continuous Normal
1239	distribution, $m=\mu$. We consider the case henceforth of $\mu=0$,
1240	without loss of generality.
1241
1242	To find $s$, we find the distribution of the absolute deviation from
1243	the median, and then find the median of that distribution. This
1244	distribution is given by
1245	\begin{eqnarray*}
1246	g(x) &= &{\mbox{\rm distribution of }} \|x\|\\
1247	&= &f(x) + f(-x),\ x\ge0\\
1248	&= &\sqrt{\frac{2}{\pi\sigma^2}}\, e^{-x^2/2\sigma^2},\ x\ge0.
1249	\end{eqnarray*}
1250	So, the median absolute deviation from the median, $s$, is given by
1251	\[
1252	\int_{0}^{s} g(x) \diff x = \int_{s}^{\infty} g(x) \diff x.
1253	\]
1254	Now, $\int_{0}^{\infty}e^{-x^2/2\sigma^2} \diff x = \sqrt{\pi\sigma^2/2}$, and
1255	so $\int_{s}^{\infty} e^{-x^2/2\sigma^2} \diff x =
1256	\sqrt{\pi\sigma^2/2} - \int_{0}^{s} e^{-\frac{x^2}{2\sigma^2}} \diff x
1257	$. Hence, to find $s$ we simply solve the following equation (setting $\sigma=1$ for
1258	simplicity -- equivalent to stating $x$ and $s$ in units of $\sigma$):
1259	\[
1260	\int_{0}^{s}e^{-x^2/2} \diff x - \sqrt{\pi/8} = 0.
1261	\]
1262	This is hard to solve analytically (no nice analytic solution exists
1263	for the finite integral that I'm aware of), but straightforward to
1264	solve numerically, yielding the value of $s=0.6744888$. Thus, to
1265	estimate $\sigma$ for a Normally distributed data set, one can calculate
1266	$s$, then divide by 0.6744888 (or multiply by 1.4826042) to obtain the
1267	correct estimator.
1268
1269	Note that this is different to solutions quoted elsewhere,
1270	specifically in \citet{meyer04:trunc}, where the same robust estimator
1271	is used but with an incorrect conversion to standard deviation -- they
1272	assume $\sigma = s\sqrt{\pi/2}$. This, in fact, is the conversion used
1273	to convert the {\it mean} absolute deviation from the mean to the
1274	standard deviation. This means that the cube noise in the \hipass\
1275	catalogue (parameter Rms$_{\rm cube}$) should be 18\% larger than
1276	quoted.
1277
1278	%a different value for $s/\sigma$ (of
1279	%$\sqrt{2/\pi}\approx0.797885$) is quoted. It should thus be noted that
1280	%this means the values quoted by \citet{meyer04:trunc} for the cube noise in
1281	%the \hipass\ catalogue should be 18\% larger (since 0.1486042 is 18\%
1282	%larger than $\sqrt{\pi/2}\approx1.253314$).
1283
1284	\section{How Gaussian noise changes with wavelet scale.}
1285	\label{app-scaling}
1286
1287	The key element in the wavelet reconstruction of an array is the
1288	thresholding of the individual wavelet coefficient arrays. This is
1289	usually done by choosing a level to be some number of standard
1290	deviations above the mean value.
1291
1292	However, since the wavelet arrays are produced by convolving the input
1293	array by an increasingly large filter, the pixels in the coefficient
1294	arrays become increasingly correlated as the scale of the filter
1295	increases. This results in the measured standard deviation from a
1296	given coefficient array decreasing with increasing scale. To calculate
1297	this, we need to take into account how many other pixels each pixel in
1298	the convolved array depends on.
1299
1300	To demonstrate, suppose we have a 1-D array with $N$ pixel values
1301	given by $F_i,\ i=1,...,N$, and we convolve it with the B$_3$-spline
1302	filter with coefficients $\{1/16,1/4,3/8,1/4,1/16\}$. The flux of the
1303	$i$th pixel in the convolved array will be
1304	\[
1305	F'_i = \frac{1}{16}F_{i-2} + \frac{1}{16}F_{i-2} + \frac{3}{8}F_{i}
1306	+ \frac{1}{4}F_{i-1} + \frac{1}{16}F_{i+2}
1307	\]
1308	and the flux of the corresponding pixel in the wavelet array will be
1309	\[
1310	W'_i = F_i - F'_i = \frac{1}{16}F_{i-2} + \frac{1}{16}F_{i-2} + \frac{5}{8}F_{i}
1311	+ \frac{1}{4}F_{i-1} + \frac{1}{16}F_{i+2}
1312	\]
1313	Now, assuming each pixel has the same standard deviation
1314	$\sigma_i=\sigma$, we can work out the standard deviation for the
1315	coefficient array:
1316	\[
1317	\sigma'_i = \sigma \sqrt{\left(\frac{1}{16}\right)^2 + \left(\frac{1}{4}\right)^2
1318	+ \left(\frac{5}{8}\right)^2 + \left(\frac{1}{4}\right)^2 + \left(\frac{1}{16}\right)^2}
1319	= 0.72349\ \sigma
1320	\]
1321	Thus, the first scale wavelet coefficient array will have a standard
1322	deviation of 72.3\% of the input array. This procedure can be followed
1323	to calculate the necessary values for all scales, dimensions and
1324	filters used by Duchamp.
1325
1326	Calculating these values is, therefore, a critical step in performing
1327	the reconstruction. \citet{starck02:book} did so by simulating data sets
1328	with Gaussian noise, taking the wavelet transform, and measuring the
1329	value of $\sigma$ for each scale. We take a different approach, by
1330	calculating the scaling factors directly from the filter coefficients
1331	by taking the wavelet transform of an array made up of a 1 in the
1332	central pixel and 0s everywhere else. The scaling value is then
1333	derived by adding in quadrature all the wavelet coefficient values at
1334	each scale. We give the scaling factors for the three filters
1335	available to Duchamp on the following page. These values are
1336	hard-coded into Duchamp, so no on-the-fly calculation of them is
1337	necessary.
1338
1339	Memory limitations prevent us from calculating factors for large
1340	scales, particularly for the three-dimensional case (hence the --
1341	symbols in the tables). To calculate factors for
1342	higher scales than those available, we note the following
1343	relationships apply for large scales to a sufficient level of precision:
1344	\begin{itemize}
1345	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{2}$.
1346	\item 2-D: factor(scale $i$) = factor(scale $i-1$)$/2$.
1347	\item 1-D: factor(scale $i$) = factor(scale $i-1$)$/\sqrt{8}$.
1348	\end{itemize}
1349
1350	\newpage
1351	\begin{itemize}
1352	\item {\bf B$_3$-Spline Function:} $\{1/16,1/4,3/8,1/4,1/16\}$
1353
1354	\begin{tabular}{llll}
1355	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1356	1 & 0.723489806 & 0.890796310 & 0.956543592\\
1357	2 & 0.285450405 & 0.200663851 & 0.120336499\\
1358	3 & 0.177947535 & 0.0855075048 & 0.0349500154\\
1359	4 & 0.122223156 & 0.0412474444 & 0.0118164242\\
1360	5 & 0.0858113122 & 0.0204249666 & 0.00413233507\\
1361	6 & 0.0605703043 & 0.0101897592 & 0.00145703714\\
1362	7 & 0.0428107206 & 0.00509204670 & 0.000514791120\\
1363	8 & 0.0302684024 & 0.00254566946 & --\\
1364	9 & 0.0214024008 & 0.00127279050 & --\\
1365	10 & 0.0151336781 & 0.000636389722 & --\\
1366	11 & 0.0107011079 & 0.000318194170 & --\\
1367	12 & 0.00756682272 & -- & --\\
1368	13 & 0.00535055108 & -- & --\\
1369	%14 & 0.00378341085 & -- & --\\
1370	%15 & 0.00267527545 & -- & --\\
1371	%16 & 0.00189170541 & -- & --\\
1372	%17 & 0.00133763772 & -- & --\\
1373	%18 & 0.000945852704 & -- & --
1374	\end{tabular}
1375
1376	\item {\bf Triangle Function:} $\{1/4,1/2,1/4\}$
1377
1378	\begin{tabular}{llll}
1379	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1380	1 & 0.612372436 & 0.800390530 & 0.895954449 \\
1381	2 & 0.330718914 & 0.272878894 & 0.192033014\\
1382	3 & 0.211947812 & 0.119779282 & 0.0576484078\\
1383	4 & 0.145740298 & 0.0577664785 & 0.0194912393\\
1384	5 & 0.102310944 & 0.0286163283 & 0.00681278387\\
1385	6 & 0.0722128185 & 0.0142747506 & 0.00240175885\\
1386	7 & 0.0510388224 & 0.00713319703 & 0.000848538128 \\
1387	8 & 0.0360857673 & 0.00356607618 & 0.000299949455 \\
1388	9 & 0.0255157615 & 0.00178297280 & -- \\
1389	10 & 0.0180422389 & 0.000891478237 & -- \\
1390	11 & 0.0127577667 & 0.000445738098 & -- \\
1391	12 & 0.00902109930 & 0.000222868922 & -- \\
1392	13 & 0.00637887978 & -- & -- \\
1393	%14 & 0.00451054902 & -- & -- \\
1394	%15 & 0.00318942978 & -- & -- \\
1395	%16 & 0.00225527449 & -- & -- \\
1396	%17 & 0.00159471988 & -- & -- \\
1397	%18 & 0.000112763724 & -- & --
1398
1399	\end{tabular}
1400
1401	\item {\bf Haar Wavelet:} $\{0,1/2,1/2\}$
1402
1403	\begin{tabular}{llll}
1404	Scale & 1 dimension & 2 dimension & 3 dimension\\ \hline
1405	1 & 0.707167810 & 0.433012702 & 0.935414347 \\
1406	2 & 0.500000000 & 0.216506351 & 0.330718914\\
1407	3 & 0.353553391 & 0.108253175 & 0.116926793\\
1408	4 & 0.250000000 & 0.0541265877 & 0.0413398642\\
1409	5 & 0.176776695 & 0.0270632939 & 0.0146158492\\
1410	6 & 0.125000000 & 0.0135316469 & 0.00516748303
1411
1412	\end{tabular}
1413
1414
1415	\end{itemize}
1416
1417	\end{document}
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450

Note: See TracBrowser for help on using the repository browser.

Download in other formats: