% -----------------------------------------------------------------------
% hints.tex: Section giving some tips & hints on how Duchamp is best
%            used.
% -----------------------------------------------------------------------
% Copyright (C) 2006, Matthew Whiting, ATNF
%
% This program is free software; you can redistribute it and/or modify it
% under the terms of the GNU General Public License as published by the
% Free Software Foundation; either version 2 of the License, or (at your
% option) any later version.
%
% Duchamp is distributed in the hope that it will be useful, but WITHOUT
% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
% FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
% for more details.
%
% You should have received a copy of the GNU General Public License
% along with Duchamp; if not, write to the Free Software Foundation,
% Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA
%
% Correspondence concerning Duchamp may be directed to:
%    Internet email: Matthew.Whiting [at] atnf.csiro.au
%    Postal address: Dr. Matthew Whiting
%                    Australia Telescope National Facility, CSIRO
%                    PO Box 76
%                    Epping NSW 1710
%                    AUSTRALIA
% -----------------------------------------------------------------------
\secA{Notes and hints on the use of \duchamp}
\label{sec-notes}

In using \duchamp, the user has to make a number of decisions about
the way the program runs. This section is designed to give the user
some guidance in making those choices.

\secB{Memory usage}

A lot of attention has been paid to the memory usage in \duchamp,
recognising that data cubes are going to be increasing in size with
new generation correlators and wider fields of view. However, users
with large cubes should be aware of the likely usage for different
modes of operation and plan their \duchamp execution carefully.

At the start of the program, memory is allocated sufficient for:
\begin{itemize}
\item The entire pixel array (as requested, subject to any
subsection).
\item The spatial extent, which holds the map of detected pixels (for
output into the detection map).
\item If smoothing or reconstruction has been selected, another array
of the same size as the pixel array. This will hold the
smoothed/reconstructed array (the original needs to be kept to do the
correct parameterisation of detected sources).
\item If baseline-subtraction has been selected, a further array of
the same size as the pixel array. This holds the baseline values,
which need to be added back in prior to parameterisation.
\end{itemize}
All of these will be float type, except for the detection map, which
is short.

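To give a feel for the numbers involved, consider (purely as an
illustration -- the dimensions are arbitrary) a
$500\times500\times2048$ cube. The float pixel array occupies
$500\times500\times2048\times4\,$bytes $\simeq2\,$GB, the
short-integer detection map only $500\times500\times2\,$bytes
$\simeq0.5\,$MB, and the smoothed/reconstructed and baseline arrays,
if requested, a further $\sim2\,$GB each.
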
There will, of course, be additional allocation during the course of
the program. The detection list will progressively grow, with each
detection having a memory footprint as described in
Section~\ref{sec-scan}. But of greater importance, and with a larger
impact, is the temporary space allocated for various algorithms.

The largest of these will be the wavelet reconstruction. This will
require an additional allocation of twice the size of the array being
reconstructed, one for the coefficients and one for the wavelets --
each scale will overwrite the previous one. So, for the 1D case, this
means an additional allocation of twice the spectral dimension (since
we only reconstruct one spectrum at a time), but the 3D case will
require an additional allocation of twice the cube size. This means
that at least four times the size of the input cube must be available
for 3D reconstruction, plus the additional overheads of detections and
so forth.

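Continuing the illustrative $500\times500\times2048$ example above, a
full three-dimensional reconstruction would need a further
$2\times2\,$GB of temporary workspace on top of the $\sim2\,$GB input
array and the $\sim2\,$GB array holding the reconstruction -- that is,
of order $8\,$GB, or four times the input cube, before the detection
list and other overheads are counted.
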
The smoothing has less of an impact, since it only operates on the
lower dimensions, but it will make an additional allocation of twice
the relevant size (spectral dimension for spectral smoothing, or
spatial image size for the spatial Gaussian smoothing).

The other large allocation of temporary space will be for calculating
robust statistics. The median-based calculations require at least
partial sorting of the data, and so cannot be done in place on the
original image cube -- a temporary copy is sorted instead. Since this
copy covers the entire cube, the temporary memory increase can be
large.

\secB{Preprocessing}

\secC{Should I do any preprocessing?}

The main choice is whether to alter the cube to try to enhance the
detectability of objects, by either smoothing or reconstructing via
the \atrous method. The main benefits of both methods are the marked
reduction in the noise level, leading to regularly-shaped detections,
and good reliability for faint sources.

The main drawback of the \atrous method is the long execution time:
reconstructing a $170\times160\times1024$ (\hipass) cube often
requires three iterations and takes about 20-25 minutes to run
completely. Note that this is for the more complete three-dimensional
reconstruction: using \texttt{reconDim = 1} makes the reconstruction
quicker (the full program then takes less than 5 minutes), but it
still dominates the execution time.

The smoothing procedure is computationally simpler, and thus quicker,
than the reconstruction. The spectral Hanning method adds only a very
small overhead to the execution time, and the spatial Gaussian method,
while taking longer, will be done (for the above example) in less than
2 minutes. Note that these times will depend on the size of the
filter/kernel used: a larger filter means more calculations.

The searching part of the procedure is much quicker: searching an
un-reconstructed cube leads to execution times of less than a
minute. Alternatively, the ability to read in a previously-saved
reconstructed array means the reconstruction need only be performed
once, making repeated \duchamp runs on the same cube far more
feasible.

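As a sketch of how this might look in the input parameter file (the
parameter names and the output file name below are indicative only --
check the parameter descriptions in this guide for the exact names and
defaults):
\begin{verbatim}
# First run: reconstruct the cube and save the result to a FITS file.
flagATrous       1
flagOutputRecon  1

# Later runs: read the saved reconstruction rather than redoing it.
# (The file name is a placeholder.)
flagReconExists  1
reconFile        yourcube.RECON.fits
\end{verbatim}
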
On the positive side, the shape of the detections in a cube that has
been reconstructed or smoothed will be much more regular and smooth --
the ragged edges that objects in the raw cube possess are smoothed by
the removal of most of the noise. This enables better determination of
the shapes and characteristics of objects.

\secC{Reconstruction vs Smoothing}

While the time overhead is larger for the reconstruction case, it will
potentially provide a better recovery of real sources than the
smoothing case. This is because it probes the full range of scales
present in the cube (or spectral domain), rather than the specific
scale determined by the Hanning filter or Gaussian kernel used in the
smoothing.

When considering the reconstruction method, note that the 2D
reconstruction (\texttt{reconDim = 2}) can be susceptible to edge
effects. If the valid area in the cube (\ie the part that is not
BLANK) has non-rectangular edges, the convolution can produce
artefacts in the reconstruction that mimic the edges and can lead
(depending on the selection threshold) to some spurious
sources. Caution is needed with such data -- the user is advised to
check the reconstructed cube carefully for the presence of such
artefacts.

A more significant effect for 2D reconstructions is the fact that
pixels in the spatial domain typically exhibit some correlation due to
the beam. Since each channel is reconstructed independently,
beam-sized noise fluctuations can rise above the reconstruction
threshold more frequently than in the 1D case, producing a greater
number of spurious single-channel spikes in a given reconstructed
spectrum. This effect will also be present in 3D reconstructions,
although to a lesser degree, since information in the spectral
direction is also taken into account.

If one chooses the reconstruction method, a further decision is
required on the signal-to-noise cutoff used in determining acceptable
wavelet coefficients. A larger value will remove more noise from the
cube, at the expense of losing fainter sources, while a smaller value
will include more noise, which may produce spurious detections, but
will be more sensitive to faint sources. Values of less than about
$3\sigma$ tend not to reduce the noise a great deal and can lead to
many spurious sources (this depends, of course, on the cube itself).

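For reference, the reconstruction-related parameters might be set as
follows. This is only an indicative sketch -- the parameter names
(other than \texttt{reconDim}) and the values shown should be checked
against the parameter descriptions in this guide:
\begin{verbatim}
# Indicative values only: request the a trous reconstruction,
# in full 3D, with a 4-sigma cutoff on the wavelet coefficients.
flagATrous   1
reconDim     3
snrRecon     4.
\end{verbatim}
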
The smoothing options have fewer parameters to consider: basically
just the size of the smoothing function or kernel. Spectrally
smoothing with a Hanning filter of width 3 (the smallest possible) is
very efficient at removing spurious one-channel objects that may
result just from statistical fluctuations of the noise. One may want
to use larger filter widths or kernels to look for features of a
particular scale in the cube.

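As an indicative sketch (again, check the parameter list for the exact
names, values and defaults), spectral Hanning smoothing of width 3
might be requested with:
\begin{verbatim}
# Indicative values only.
flagSmooth    1
smoothType    spectral
hanningWidth  3
\end{verbatim}
while spatial Gaussian smoothing would use \texttt{smoothType =
spatial} together with the parameters specifying the kernel size.
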
When it comes to searching, the FDR method produces more reliable
results than simple sigma-clipping, particularly in the absence of
reconstruction. However, it does not work in exactly the way one
would expect for a given value of \texttt{alpha}. For instance,
setting fairly liberal values of \texttt{alpha} (say, 0.1) will often
lead to a much smaller fraction of false detections (\ie much less
than 10\%). This is the effect of the merging algorithms, which
combine the sources after the detection stage and reject detections
not meeting the minimum pixel or channel requirements. It is thus
better to aim for larger \texttt{alpha} values than those derived from
a straight conversion of the desired false detection rate.

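An indicative set-up might look like the following; the parameter
names here (in particular \texttt{alphaFDR}, taken to correspond to
the \texttt{alpha} value discussed above) and the values chosen are
assumptions to be checked against the parameter list:
\begin{verbatim}
# Indicative values only.
flagFDR      1
alphaFDR     0.10
minPix       2
minChannels  3
\end{verbatim}
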
If the FDR method is not used, caution is required when choosing the
S/N cutoff. Typical cubes have very large numbers of pixels, so even
an apparently large cutoff will still result in a non-negligible
number of detections simply due to random fluctuations of the noise
background. For instance, a $4\sigma$ threshold on a cube of Gaussian
noise of size $100\times100\times1024$ will result in $\sim340$
single-pixel detections. This is where the minimum channel and pixel
requirements are important in rejecting spurious detections.

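To see where this number comes from: such a cube contains
$100\times100\times1024 \simeq 10^{7}$ pixels, and the probability of
a Gaussian deviate exceeding $4\sigma$ is about $3\times10^{-5}$, so
several hundred pixels ($\sim10^{7}\times3\times10^{-5}$) will exceed
the threshold purely by chance.
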
%Finally, as \duchamp is still undergoing development, there are some
%elements that are not fully developed. In particular, it is not as
%clever as I would like at avoiding interference. The ability to place
%requirements on the minimum number of channels and pixels partially
%circumvents this problem, but work is being done to make \duchamp
%smarter at rejecting signals that are clearly (to a human eye at
%least) interference. See the following section for further
%improvements that are planned.