1 | % ----------------------------------------------------------------------- |
---|
2 | % executionFlow.tex: Section detailing each of the main algorithms |
---|
3 | % used by Duchamp. |
---|
4 | % ----------------------------------------------------------------------- |
---|
5 | % Copyright (C) 2006, Matthew Whiting, ATNF |
---|
6 | % |
---|
7 | % This program is free software; you can redistribute it and/or modify it |
---|
8 | % under the terms of the GNU General Public License as published by the |
---|
9 | % Free Software Foundation; either version 2 of the License, or (at your |
---|
10 | % option) any later version. |
---|
11 | % |
---|
12 | % Duchamp is distributed in the hope that it will be useful, but WITHOUT |
---|
13 | % ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
---|
14 | % FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
---|
15 | % for more details. |
---|
16 | % |
---|
17 | % You should have received a copy of the GNU General Public License |
---|
18 | % along with Duchamp; if not, write to the Free Software Foundation, |
---|
19 | % Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA |
---|
20 | % |
---|
21 | % Correspondence concerning Duchamp may be directed to: |
---|
22 | % Internet email: Matthew.Whiting [at] atnf.csiro.au |
---|
23 | % Postal address: Dr. Matthew Whiting |
---|
24 | % Australia Telescope National Facility, CSIRO |
---|
25 | % PO Box 76 |
---|
26 | % Epping NSW 1710 |
---|
27 | % AUSTRALIA |
---|
28 | % ----------------------------------------------------------------------- |
---|
29 | \secA{What \duchamp is doing} |
---|
30 | \label{sec-flow} |
---|
31 | |
---|
32 | Each of the steps that \duchamp goes through in the course of its |
---|
33 | execution are discussed here in more detail. This should provide |
---|
34 | enough background information to fully understand what \duchamp is |
---|
35 | doing and what all the output information is. For those interested in |
---|
36 | the programming side of things, \duchamp is written in C/C++ and makes |
---|
37 | use of the \textsc{cfitsio}, \textsc{wcslib} and \textsc{pgplot} |
---|
38 | libraries. |
---|
39 | |
---|
40 | \secB{Image input} |
---|
41 | \label{sec-input} |
---|
42 | |
---|
43 | The cube is read in using basic \textsc{cfitsio} commands, and stored |
---|
44 | as an array in a special C++ class. This class keeps track of the list |
---|
45 | of detected objects, as well as any reconstructed arrays that are made |
---|
46 | (see \S\ref{sec-recon}). The World Coordinate System |
---|
47 | (WCS)\footnote{This is the information necessary for translating the |
---|
48 | pixel locations to quantities such as position on the sky, |
---|
49 | frequency, velocity, and so on.} information for the cube is also |
---|
50 | obtained from the FITS header by \textsc{wcslib} functions |
---|
51 | \citep{greisen02, calabretta02,greisen06}, and this information, in |
---|
52 | the form of a \texttt{wcsprm} structure, is also stored in the same |
---|
53 | class. See \S\ref{sec-wcs} for more details. |
---|
54 | |
---|
55 | A sub-section of a cube can be requested by defining the subsection |
---|
56 | with the \texttt{subsection} parameter and setting |
---|
57 | \texttt{flagSubsection = true} -- this can be a good idea if the cube |
---|
58 | has very noisy edges, which may produce many spurious detections. |
---|
59 | |
---|
60 | There are two ways of specifying the \texttt{subsection} string. The |
---|
61 | first is the generalised form |
---|
62 | \texttt{[x1:x2:dx,y1:y2:dy,z1:z2:dz,...]}, as used by the |
---|
63 | \textsc{cfitsio} library. This has one set of colon-separated numbers |
---|
64 | for each axis in the FITS file. In this manner, the x-coordinates run |
---|
65 | from \texttt{x1} to \texttt{x2} (inclusive), with steps of |
---|
66 | \texttt{dx}. The step value can be omitted, so a subsection of the |
---|
67 | form \texttt{[2:50,2:50,10:1000]} is still valid. In fact, \duchamp |
---|
68 | does not make use of any step value present in the subsection string, |
---|
69 | and any that are present are removed before the file is opened. |
---|
70 | |
---|
71 | If the entire range of a coordinate is required, one can replace the |
---|
72 | range with a single asterisk, \eg \texttt{[2:50,2:50,*]}. Thus, the |
---|
73 | subsection string \texttt{[*,*,*]} is simply the entire cube. Note |
---|
74 | that the pixel ranges for each axis start at 1, so the full pixel |
---|
75 | range of a 100-pixel axis would be expressed as 1:100. A complete |
---|
76 | description of this section syntax can be found at the |
---|
77 | \textsc{fitsio} web site% |
---|
78 | \footnote{% |
---|
79 | \href% |
---|
80 | {http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node91.html}% |
---|
81 | {http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node91.html}}. |
---|
82 | |
---|
83 | |
---|
84 | Making full use of the subsection requires knowledge of the size of |
---|
85 | each of the dimensions. If one wants to, for instance, trim a certain |
---|
86 | number of pixels off the edges of the cube, without examining the cube |
---|
87 | to obtain the actual size, one can use the second form of the |
---|
88 | subsection string. This just gives a number for each axis, \eg |
---|
89 | \texttt{[5,5,5]} (which would trim 5 pixels from the start \emph{and} |
---|
90 | end of each axis). |
---|
91 | |
---|
92 | All types of subsections can be combined \eg \texttt{[5,2:98,*]}. |
---|
93 | |
---|
94 | Typically, the units of pixel brightness are given by the FITS file's |
---|
95 | BUNIT keyword. However, this may often be unwieldy (for instance, the |
---|
96 | units are Jy/beam, but the values are around a few mJy/beam). It is |
---|
97 | therefore possible to nominate new units, to which the pixel values |
---|
98 | will be converted, by using the \texttt{newFluxUnits} input |
---|
99 | parameter. The units must be directly translatable from the existing |
---|
100 | ones -- for instance, if BUNIT is Jy/beam, you cannot specify mJy, it |
---|
101 | must be mJy/beam. If an incompatible unit is given, the BUNIT value is |
---|
102 | used instead. |
---|
103 | |
---|
104 | \secB{World Coordinate System} |
---|
105 | \label{sec-wcs} |
---|
106 | |
---|
107 | \duchamp uses the \textsc{wcslib} package to handle the conversions |
---|
108 | between pixel and world coordinates. This package uses the |
---|
109 | transformations described in the WCS papers |
---|
110 | \citep{greisen02,calabretta02,greisen06}. The same package handles the |
---|
111 | WCS axes in the spatial plots. The conversions used are governed by |
---|
112 | the information in the FITS header -- this is parsed by |
---|
113 | \textsc{wcslib} to create the appropriate transformations. |
---|
114 | |
---|
115 | For the spectral axis, however, \duchamp provides the ability to change the |
---|
116 | type of transformation used, so that different spectral quantities can |
---|
117 | be calculated. By using the parameter \texttt{spectralType}, the user |
---|
118 | can change from the type given in the FITS header. This should be done |
---|
119 | in line with the conventions outlined in \citet{greisen06}. The |
---|
120 | spectral type can be either a full 8-character string (\eg |
---|
121 | 'VELO-F2V'), or simply the 4-character ``S-type'' (\eg 'VELO'), in |
---|
122 | which case \textsc{wcslib} will handle the conversion. |
---|
123 | |
---|
124 | The rest frequency can be provided as well. This may be necessary, if |
---|
125 | the FITS header does not specify one and you wish to transform to |
---|
126 | velocity. Alternatively, you may want to make your measurements based |
---|
127 | on a different spectral line (\eg OH1665 instead of |
---|
128 | H\textsc{i}-21cm). The input parameter \texttt{restFrequency} is used, |
---|
129 | and this will override the FITS header value. |
---|
130 | |
---|
131 | Finally, the user may also request different spectral units from those |
---|
132 | in the FITS file, or from the defaults arising from the |
---|
133 | \textsc{wcslib} transformation. The input parameter |
---|
134 | \texttt{spectralUnits} should be used, and \citet{greisen02} should be |
---|
135 | consulted to ensure the syntax is appropriate. |
---|
136 | |
---|
137 | \secB{Image modification} |
---|
138 | \label{sec-modify} |
---|
139 | |
---|
140 | Several modifications to the cube can be made that improve the |
---|
141 | execution and efficiency of \duchamp (their use is optional, governed |
---|
142 | by the relevant flags in the parameter file). |
---|
143 | |
---|
144 | \secC{BLANK pixel removal} |
---|
145 | \label{sec-blank} |
---|
146 | |
---|
147 | If the imaged area of a cube is non-rectangular (see the example in |
---|
148 | Fig.~\ref{fig-moment}, a cube from the HIPASS survey), BLANK pixels |
---|
149 | are used to pad it out to a rectangular shape. The value of these |
---|
150 | pixels is given by the FITS header keywords BLANK, BSCALE and |
---|
151 | BZERO. While these pixels make the image a nice shape, they will take |
---|
152 | up unnecessary space in memory, and so to potentially speed up the |
---|
153 | processing we can trim them from the edge. This is done when the |
---|
154 | parameter \texttt{flagTrim = true}. If the above keywords are not |
---|
155 | present, the trimming will not be done (in this case, a similar effect |
---|
156 | can be accomplished, if one knows where the ``blank'' pixels are, by |
---|
157 | using the subsection option). |
---|
158 | |
---|
159 | The amount of trimming is recorded, and these pixels are added back in |
---|
160 | once the source-detection is completed (so that quoted pixel positions |
---|
161 | are applicable to the original cube). Rows and columns are trimmed one |
---|
162 | at a time until the first non-BLANK pixel is reached, so that the |
---|
163 | image remains rectangular. In practice, this means that there will be |
---|
164 | some BLANK pixels left in the trimmed image (if the non-BLANK region |
---|
165 | is non-rectangular). However, these are ignored in all further |
---|
166 | calculations done on the cube. |
---|
167 | |
---|
168 | \secC{Baseline removal} |
---|
169 | \label{sec-baseline} |
---|
170 | |
---|
171 | Second, the user may request the removal of baselines from the |
---|
172 | spectra, via the parameter \texttt{flagBaseline}. This may be |
---|
173 | necessary if there is a strong baseline ripple present, which can |
---|
174 | result in spurious detections at the high points of the ripple. |
---|
175 | |
---|
176 | There are two ways in which the baseline can be calculated. The first |
---|
177 | makes use of the \atrous wavelet reconstruction procedure (see |
---|
178 | \S\ref{sec-recon}), where only the two largest scales are kept. |
---|
179 | |
---|
180 | The second method uses a median filter combined with a sliding box of some |
---|
181 | specified width. For each pixel, the baseline is determined to be the |
---|
182 | median value of all pixels in a region of that width centred on the |
---|
183 | pixel of interest. The box is truncated at the edges of the spectrum |
---|
184 | (so that fewer pixels are included), but there will always be at least |
---|
185 | half the width of the box present. |
---|
186 | |
---|
187 | The choice between these methods is made using the |
---|
188 | \texttt{baselineType} parameter -- only values of \texttt{atrous} (the |
---|
189 | default) or \texttt{median} are accepted. If the \texttt{median} |
---|
190 | method is being used, the full width of the box is specified with |
---|
191 | \texttt{baselineBoxWidth}. |
---|
192 | |
---|
193 | The baseline calculation is done separately for each spatial pixel |
---|
194 | (\ie for each spectrum in the cube), and the baselines are stored and |
---|
195 | added back in before any output is done. In this way the quoted fluxes |
---|
196 | and displayed spectra are as one would see from the input cube itself |
---|
197 | -- even though the detection (and reconstruction if applicable) is |
---|
198 | done on the baseline-removed cube. |
---|
199 | |
---|
200 | When the \texttt{atrous} method is used, the presence of very strong |
---|
201 | signals (for instance, masers at several hundred Jy) could affect the |
---|
202 | determination of the baseline, and would lead to a large dip centred |
---|
203 | on the signal in the baseline-subtracted spectrum. To prevent this, |
---|
204 | the signal is trimmed prior to the reconstruction process at some |
---|
205 | standard threshold (at $5\sigma$ above the mean). The baseline |
---|
206 | determined should thus be representative of the true, signal-free |
---|
207 | baseline. Note that this trimming is only a temporary measure which |
---|
208 | does not affect the source-detection. |
---|
209 | |
---|
210 | The baseline values can be saved to a FITS file for later |
---|
211 | examination. See \S\ref{sec-baselineOut} for details. |
---|
212 | |
---|
213 | \secC{Flagging channels} |
---|
214 | \label{sec-flagging} |
---|
215 | |
---|
216 | Finally, it is possible to flag particular channels so that they are |
---|
217 | not included in the search. This is an extension of the old ``Milky |
---|
218 | Way'' channel range. That allowed the specification of a single |
---|
219 | contiguous block of channels, and was aimed at excluding Galactic |
---|
220 | emission in extragalactic \hi cubes. |
---|
221 | |
---|
222 | The new flagging approach allows the specification of a series of |
---|
223 | channels and channel ranges. This allows the user to block the |
---|
224 | detection of known regions of RFI, or known but uninteresting emission |
---|
225 | (\eg Galactic \hi emission if you are searching for extragalactic |
---|
226 | sources). |
---|
227 | |
---|
228 | Flagged channels are specified using the \texttt{flaggedChannels} |
---|
229 | parameter, and can be given by a comma-separated list of single values |
---|
230 | or ranges. For instance: \texttt{flaggedChannels 5,6,12-20,87}. These |
---|
231 | channels refer to channel numbers in the \textbf{the full cube}, |
---|
232 | before any subsection is applied. Also note that \textbf{the channel |
---|
233 | numbering starts at zero}, that is, channel 0 is the first channel |
---|
234 | of the cube. |
---|
235 | |
---|
236 | The effect is to ignore detections that lie within these channels. If |
---|
237 | a spatial search is being conducted (\ie one channel map at a time), |
---|
238 | these channels are simply not searched. If a spectral search is being |
---|
239 | conducted, those channels will be flagged so that no detection is made |
---|
240 | within them. The spectral output (see Fig.~\ref{fig-spect}) will |
---|
241 | ignore them as far as scaling the plot goes, and the channel range |
---|
242 | will be indicated by a green hatched box. |
---|
243 | |
---|
244 | Note that these channels will be included in any smoothing or |
---|
245 | reconstruction that is done on the array, and so will be included in |
---|
246 | any saved FITS file (see \S\ref{sec-reconIO}). |
---|
247 | |
---|
248 | \secB{Image reconstruction} |
---|
249 | \label{sec-recon} |
---|
250 | |
---|
251 | The user can direct \duchamp to reconstruct the data cube using the |
---|
252 | multi-resolution \atrous wavelet algorithm. A good description of the |
---|
253 | procedure can be found in \citet{starck02a}. The reconstruction is an |
---|
254 | effective way of removing a lot of the noise in the image, allowing |
---|
255 | one to search reliably to fainter levels, and reducing the number of |
---|
256 | spurious detections. This is an optional step, but one that greatly |
---|
257 | enhances the reliability of the resulting catalogue, at the cost of |
---|
258 | additional CPU and memory usage (see \S\ref{sec-notes} for |
---|
259 | discussion). |
---|
260 | |
---|
261 | \secC{Algorithm} |
---|
262 | |
---|
263 | The steps in the \atrous reconstruction are as follows: |
---|
264 | \begin{enumerate} |
---|
265 | \item The reconstructed array is set to 0 everywhere. |
---|
266 | \item The input array is discretely convolved with a given filter |
---|
267 | function. This is determined from the parameter file via the |
---|
268 | \texttt{filterCode} parameter -- see Appendix~\ref{app-param} for |
---|
269 | details on the filters available. Edges are dealt with by assuming |
---|
270 | reflection at the boundary. |
---|
271 | \item The wavelet coefficients are calculated by taking the difference |
---|
272 | between the convolved array and the input array. |
---|
273 | \item If the wavelet coefficients at a given point are above the |
---|
274 | requested reconstruction threshold (given by \texttt{snrRecon} as |
---|
275 | the number of $\sigma$ above the mean and adjusted to the current |
---|
276 | scale -- see Appendix~\ref{app-scaling}), add these to the |
---|
277 | reconstructed array. |
---|
278 | \item The separation between the filter coefficients is doubled. (Note |
---|
279 | that this step provides the name of the procedure\footnote{\atrous |
---|
280 | means ``with holes'' in French.}, as gaps or holes are created in |
---|
281 | the filter coverage.) |
---|
282 | \item The procedure is repeated from step 2, using the convolved array |
---|
283 | as the input array. |
---|
284 | \item Continue until the required maximum number of scales is reached. |
---|
285 | \item Add the final smoothed (\ie convolved) array to the |
---|
286 | reconstructed array. This provides the ``DC offset'', as each of the |
---|
287 | wavelet coefficient arrays will have zero mean. |
---|
288 | \end{enumerate} |
---|
289 | |
---|
290 | The range of scales at which the selection of wavelet coefficients is |
---|
291 | made is governed by the \texttt{scaleMin} and \texttt{scaleMax} |
---|
292 | parameters. The minimum scale used is given by \texttt{scaleMin}, |
---|
293 | where the default value is 1 (the first scale). This parameter is |
---|
294 | useful if you want to ignore the highest-frequency features |
---|
295 | (e.g. high-frequency noise that might be present). Normally the |
---|
296 | maximum scale is calculated from the size of the input array, but it |
---|
297 | can be specified by using \texttt{scaleMax}. A value $\le0$ will |
---|
298 | result in the use of the calculated value, as will a value of |
---|
299 | \texttt{scaleMax} greater than the calculated value. Use of these two |
---|
300 | parameters can allow searching for features of a particular scale |
---|
301 | size, for instance searching for narrow absorption features. |
---|
302 | |
---|
303 | The reconstruction has at least two iterations. The first iteration |
---|
304 | makes a first pass at the wavelet reconstruction (the process outlined |
---|
305 | in the 8 stages above), but the residual array will likely have some |
---|
306 | structure still in it, so the wavelet filtering is done on the |
---|
307 | residual, and any significant wavelet terms are added to the final |
---|
308 | reconstruction. This step is repeated until the relative change in the |
---|
309 | measured standard deviation of the residual (see note below on the |
---|
310 | evaluation of this quantity) is less than some value, given by the |
---|
311 | \texttt{reconConvergence} parameter. |
---|
312 | |
---|
313 | It is important to note that the \atrous decomposition is an example |
---|
314 | of a ``redundant'' transformation. If no thresholding is performed, |
---|
315 | the sum of all the wavelet coefficient arrays and the final smoothed |
---|
316 | array is identical to the input array. The thresholding thus removes |
---|
317 | only the unwanted structure in the array. |
---|
318 | |
---|
319 | Note that any BLANK pixels that are still in the cube will not be |
---|
320 | altered by the reconstruction -- they will be left as BLANK so that |
---|
321 | the shape of the valid part of the cube is preserved. |
---|
322 | |
---|
323 | \secC{Note on Statistics} |
---|
324 | |
---|
325 | The correct calculation of the reconstructed array needs good |
---|
326 | estimators of the underlying mean and standard deviation (or rms) of |
---|
327 | the background noise distribution. The methods used to estimate these |
---|
328 | quantities are detailed in \S\ref{sec-stats} -- the default behaviour |
---|
329 | is to use robust estimators, to avoid biasing due to bright pixels. |
---|
330 | |
---|
331 | When thresholding the different wavelet scales, the value of the rms |
---|
332 | as measured from the wavelet array needs to be scaled to account for |
---|
333 | the increased amount of correlation between neighbouring pixels (due |
---|
334 | to the convolution). See Appendix~\ref{app-scaling} for details on |
---|
335 | this scaling. |
---|
336 | |
---|
337 | \secC{User control of reconstruction parameters} |
---|
338 | |
---|
339 | The most important parameter for the user to select in relation to the |
---|
340 | reconstruction is the threshold for each wavelet array. This is set |
---|
341 | using the \texttt{snrRecon} parameter, and is given as a multiple of |
---|
342 | the rms (estimated by the MADFM) above the mean (which for the wavelet |
---|
343 | arrays should be approximately zero). There are several other |
---|
344 | parameters that can be altered as well that affect the outcome of the |
---|
345 | reconstruction. |
---|
346 | |
---|
347 | By default, the cube is reconstructed in three dimensions, using a |
---|
348 | three-dimensional filter and three-dimensional convolution. This can be |
---|
349 | altered, however, using the parameter \texttt{reconDim}. If set to 1, |
---|
350 | this means the cube is reconstructed by considering each spectrum |
---|
351 | separately, whereas \texttt{reconDim=2} will mean the cube is |
---|
352 | reconstructed by doing each channel map separately. The merits of |
---|
353 | these choices are discussed in \S\ref{sec-notes}, but it should be |
---|
354 | noted that a 2-dimensional reconstruction can be susceptible to edge |
---|
355 | effects if the spatial shape of the pixel array is not rectangular. |
---|
356 | |
---|
357 | The user can also select the minimum and maximum scales to be used in |
---|
358 | the reconstruction. The first scale exhibits the highest frequency |
---|
359 | variations, and so ignoring this one can sometimes be beneficial in |
---|
360 | removing excess noise. The default is to use all scales |
---|
361 | (\texttt{minscale = 1}). |
---|
362 | |
---|
363 | The convergence of the \atrous iterations is governed by the |
---|
364 | \texttt{reconConvergence} parameter, which is the fractional decrease |
---|
365 | in the standard deviation of the residuals from one iteration to the |
---|
366 | next. \duchamp will do at least two iterations, and then continue |
---|
367 | until the decrease is less than the value of this parameter. |
---|
368 | |
---|
369 | Finally, the filter that is used for the convolution can be selected |
---|
370 | by using \texttt{filterCode} and the relevant code number -- the |
---|
371 | choices are listed in Appendix~\ref{app-param}. A larger filter will |
---|
372 | give a better reconstruction, but take longer and use more memory when |
---|
373 | executing. When multi-dimensional reconstruction is selected, this |
---|
374 | filter is used to construct a 2- or 3-dimensional equivalent. |
---|
375 | |
---|
376 | \secB{Smoothing the cube} |
---|
377 | \label{sec-smoothing} |
---|
378 | |
---|
379 | An alternative to doing the wavelet reconstruction is to smooth the |
---|
380 | cube. This technique can be useful in reducing the noise level (at |
---|
381 | the cost of making neighbouring pixels correlated and blurring any |
---|
382 | signal present), and is particularly well suited to the case where a |
---|
383 | particular signal size (\ie a certain channel width or spatial size) |
---|
384 | is believed to be present in the data. |
---|
385 | |
---|
386 | There are two alternative methods that can be used: spectral |
---|
387 | smoothing, using the Hanning filter; or spatial smoothing, using a 2D |
---|
388 | Gaussian kernel. These alternatives are outlined below. To utilise the |
---|
389 | smoothing option, set the parameter \texttt{flagSmooth=true} and set |
---|
390 | \texttt{smoothType} to either \texttt{spectral} or \texttt{spatial}. |
---|
391 | |
---|
392 | \secC{Spectral smoothing} |
---|
393 | |
---|
394 | When \texttt{smoothType = spectral} is selected, the cube is smoothed |
---|
395 | only in the spectral domain. Each spectrum is independently smoothed |
---|
396 | by a Hanning filter, and then put back together to form the smoothed |
---|
397 | cube, which is then used by the searching algorithm (see below). Note |
---|
398 | that in the case of both the reconstruction and the smoothing options |
---|
399 | being requested, the reconstruction will take precedence and the |
---|
400 | smoothing will \emph{not} be done. |
---|
401 | |
---|
402 | There is only one parameter necessary to define the degree of |
---|
403 | smoothing -- the Hanning width $a$ (given by the user parameter |
---|
404 | \texttt{hanningWidth}). The coefficients $c(x)$ of the Hanning filter |
---|
405 | are defined by |
---|
406 | \[ |
---|
407 | c(x) = |
---|
408 | \begin{cases} |
---|
409 | \frac{1}{2}\left(1+\cos(\frac{\pi x}{a})\right) &|x| < (a+1)/2\\ |
---|
410 | 0 &|x| \geq (a+1)/2. |
---|
411 | \end{cases},\ a,x \in \mathbb{Z} |
---|
412 | \] |
---|
413 | Note that the width specified must be an |
---|
414 | odd integer (if the parameter provided is even, it is incremented by |
---|
415 | one). |
---|
416 | |
---|
417 | \secC{Spatial smoothing} |
---|
418 | |
---|
419 | When \texttt{smoothType = spatial} is selected, the cube is smoothed |
---|
420 | only in the spatial domain. Each channel map is independently smoothed |
---|
421 | by a two-dimensional Gaussian kernel, put back together to form the |
---|
422 | smoothed cube, and used in the searching algorithm (see below). Again, |
---|
423 | reconstruction is always done by preference if both techniques are |
---|
424 | requested. |
---|
425 | |
---|
426 | The two-dimensional Gaussian has three parameters to define it, |
---|
427 | governed by the elliptical cross-sectional shape of the Gaussian |
---|
428 | function: the FWHM (full-width at half-maximum) of the major and minor |
---|
429 | axes, and the position angle of the major axis. These are given by the |
---|
430 | user parameters \texttt{kernMaj, kernMin} \& \texttt{kernPA}. If a |
---|
431 | circular Gaussian is required, the user need only provide the |
---|
432 | \texttt{kernMaj} parameter. The \texttt{kernMin} parameter will then |
---|
433 | be set to the same value, and \texttt{kernPA} to zero. If we define |
---|
434 | these parameters as $a,b,\theta$ respectively, we can define the |
---|
435 | kernel by the function |
---|
436 | \[ |
---|
437 | k(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y} \exp\left[-0.5 |
---|
438 | \left(\frac{X^2}{\sigma_X^2} + \frac{Y^2}{\sigma_Y^2} \right) |
---|
439 | \right] |
---|
440 | \] |
---|
441 | where $(x,y)$ are the offsets from the central pixel of the gaussian |
---|
442 | function, and |
---|
443 | \begin{align*} |
---|
444 | X& = x\sin\theta - y\cos\theta& |
---|
445 | Y&= x\cos\theta + y\sin\theta\\ |
---|
446 | \sigma_X^2& = \frac{(a/2)^2}{2\ln2}& |
---|
447 | \sigma_Y^2& = \frac{(b/2)^2}{2\ln2}\\ |
---|
448 | \end{align*} |
---|
449 | |
---|
450 | The size of the kernel is determined both by these FWHM parameters and |
---|
451 | the cutoff parameter \texttt{spatialSmoothCutoff}. The width is |
---|
452 | determined by |
---|
453 | \[ |
---|
454 | W = \sigma_\text{maj} \sqrt{(2\log(C))} |
---|
455 | \] |
---|
456 | where $C$ is the cutoff value. The value of $W$ is rounded up to get |
---|
457 | the half-width of the kernel in pixels. The default value of |
---|
458 | \texttt{spatialSmoothCutoff} is 1.e-10, which gives full kernel widths |
---|
459 | of 31 pix for \texttt{kernMaj=5} and 19 pix for \texttt{kernMaj=3}. |
---|
460 | |
---|
461 | The algorithm provides different ways, controlled by the input |
---|
462 | parameter \texttt{smoothEdgeMethod}, to handle those pixels that lie |
---|
463 | within a kernel half-width from the edge of the image and so cannot |
---|
464 | have the full kernel placed on top of them. The default behaviour, |
---|
465 | with \texttt{smoothEdgeMethod=equal}, simply treats these pixels in |
---|
466 | the same way, adding up the product of the kernel pixels with the |
---|
467 | image pixels for all cases that lie within the image boundary. A more |
---|
468 | drastic alternative, \texttt{smoothEdgeMethod=truncate}, is to simply |
---|
469 | not evaluate the convolution at these pixels -- they will be set at |
---|
470 | some null value that will not contribute to any detection. Thirdly, |
---|
471 | setting \texttt{smoothEdgeMethod=scale} will evaluate the convolution |
---|
472 | as for the \texttt{'equal'} case, but scale down the edge pixels by |
---|
473 | summing over only those kernel pixels contributing. |
---|
474 | |
---|
475 | \secB{Input/Output of reconstructed/smoothed arrays} |
---|
476 | \label{sec-reconIO} |
---|
477 | |
---|
478 | The smoothing and reconstruction stages can be relatively |
---|
479 | time-consuming, particularly for large cubes and reconstructions in |
---|
480 | 3-D (or even spatial smoothing). To get around this, \duchamp provides |
---|
481 | a shortcut to allow users to perform multiple searches (\eg with |
---|
482 | different thresholds) on the same reconstruction/smoothing setup |
---|
483 | without re-doing the calculations each time. |
---|
484 | |
---|
485 | To save the reconstructed array as a FITS file, set |
---|
486 | \texttt{flagOutputRecon = true}. The file will be saved in the same |
---|
487 | directory as the input image, so the user needs to have write |
---|
488 | permissions for that directory. |
---|
489 | |
---|
490 | The name of the file can given by the \texttt{fileOutputRecon} |
---|
491 | parameter, but this can be ignored and \duchamp will present a name |
---|
492 | based on the reconstruction parameters. The filename will be derived |
---|
493 | from the input filename, with extra information detailing the |
---|
494 | reconstruction that has been done. For example, suppose |
---|
495 | \texttt{image.fits} has been reconstructed using a 3-dimensional |
---|
496 | reconstruction with filter \#2, thresholded at $4\sigma$ using all |
---|
497 | scales from 1 to 5, with a convergence criterion of 0.005. The output |
---|
498 | filename will then be \texttt{image.RECON-3-2-4-1-5-0.005.fits} (\ie |
---|
499 | it uses the six parameters relevant for the \atrous reconstruction as |
---|
500 | listed in Appendix~\ref{app-param}). The new FITS file will also have |
---|
501 | these parameters as header keywords. If a subsection of the input |
---|
502 | image has been used (see \S\ref{sec-input}), the format of the output |
---|
503 | filename will be \texttt{image.sub.RECON-3-2-4-1-5-0.005.fits}, and the |
---|
504 | subsection that has been used is also stored in the FITS header. |
---|
505 | |
---|
506 | Likewise, the residual image, defined as the difference between the |
---|
507 | input and reconstructed arrays, can also be saved in the same manner |
---|
508 | by setting \texttt{flagOutputResid = true}. Its filename will be the |
---|
509 | same as above, with \texttt{RESID} replacing \texttt{RECON}. |
---|
510 | |
---|
511 | If a reconstructed image has been saved, it can be read in and used |
---|
512 | instead of redoing the reconstruction. To do so, the user should set |
---|
513 | the parameter \texttt{flagReconExists = true}. The user can indicate |
---|
514 | the name of the reconstructed FITS file using the \texttt{reconFile} |
---|
515 | parameter, or, if this is not specified, \duchamp searches for the |
---|
516 | file with the name as defined above. If the file is not found, the |
---|
517 | reconstruction is performed as normal. Note that to do this, the user |
---|
518 | needs to set \texttt{flagAtrous = true} (obviously, if this is |
---|
519 | \texttt{false}, the reconstruction is not needed). |
---|
520 | |
---|
521 | To save the smoothed array, set \texttt{flagOutputSmooth = true}. As |
---|
522 | for the reconstructed/residual arrays, the name of the file can given |
---|
523 | by the parameter \texttt{fileOutputSmooth}, but this can be ignored |
---|
524 | and \duchamp will present a name that indicates the both the type and |
---|
525 | the details of the smoothing method used. It will be either |
---|
526 | \texttt{image.SMOOTH-1D-a.fits}, where a is replaced by the Hanning |
---|
527 | width used, or \texttt{image.SMOOTH-2D-a-b-c-X-f.fits}, where a,b,c |
---|
528 | are the Gaussian kernel parameters, X represents the edge method |
---|
529 | (E=equal, T=truncate, S=scale), and f is |
---|
530 | $-\log_{10}(\text{cutoff})$. Similarly to the reconstruction case, a |
---|
531 | saved file can be read in by setting \texttt{flagSmoothExists = true} |
---|
532 | and either specifying a file to be read with the \texttt{smoothFile} |
---|
533 | parameter or relying on \duchamp to find the file with the name as |
---|
534 | given above. |
---|
535 | |
---|
536 | |
---|
537 | \secB{Searching the image} |
---|
538 | \label{sec-detection} |
---|
539 | |
---|
540 | \secC{Representation of detected objects} |
---|
541 | \label{sec-scan} |
---|
542 | |
---|
543 | \begin{figure}[t] |
---|
544 | \includegraphics[width=\textwidth]{exampleObject} |
---|
545 | \caption{An example of the run-length encoding method of storing |
---|
546 | pixel information. The scans used to encode the image are listed |
---|
547 | alongside the relevant row. The pixels are colour-coded by |
---|
548 | nominal pixel values, but note that the pixel values themselves |
---|
549 | do not form part of the encoding and are not kept as part of the |
---|
550 | object class. } |
---|
551 | \label{fig-objExample} |
---|
552 | \end{figure} |
---|
553 | |
---|
554 | The principle aim of \duchamp is to provide a catalogue of sources |
---|
555 | located in the image. While running, \duchamp needs to maintain for |
---|
556 | each source several data structures that will contribute to the memory |
---|
557 | footprint: a record of which pixels contribute to the source; a set of |
---|
558 | measured parameters that will go into the catalogue; and a separate |
---|
559 | two-dimensional map showing the spatial location of detected pixels |
---|
560 | (carrying this around makes the computation of detection maps easier |
---|
561 | -- see \S\ref{sec-spatialmaps}). |
---|
562 | |
---|
563 | To keep track of the set of detected pixels, \duchamp |
---|
564 | employs specialised techniques that keep the memory usage |
---|
565 | manageable. A naive method could be to store each single pixel, but |
---|
566 | this results in a lot of redundant information being stored in memory. |
---|
567 | |
---|
568 | To reduce the storage requirements, the run-length encoding method is |
---|
569 | used for storing the spatial information. In this fashion, an object |
---|
570 | in 2D is stored as a series of ``runs'', encoded by a row number (the |
---|
571 | $y$-value), the starting column (the minimum $x$-value) and the run |
---|
572 | length ($\ell_x$: the number of contiguous pixels in that row |
---|
573 | connected to the starting pixel). A single set of $(y,x,\ell_x)$ |
---|
574 | values is called a ``scan''. A two-dimensional image is therefore made |
---|
575 | up of a set of scans. An example can be seen in |
---|
576 | Fig.~\ref{fig-objExample}. Note that the object shown has fourteen |
---|
577 | pixels, and so would require 28 integers to record the positions of |
---|
578 | all pixels. The run-length encoding uses just 18 integers to record |
---|
579 | the same information. The longer the runs are in each scan, the |
---|
580 | greater the saving of storage over the naive method. |
---|
581 | |
---|
582 | A 3D object is stored as a set of channel maps, with a channel map |
---|
583 | being a 2D plane with constant $z$-value. Each channel map is itself a |
---|
584 | set of scans showing the $(x,y)$ position of the pixels. The |
---|
585 | additional detection map is stored as a separate channel map, also |
---|
586 | made up of scans. |
---|
587 | |
---|
588 | Note that these pixel map representations do not carry the flux |
---|
589 | information with them. They store just the pixel locations and need to |
---|
590 | be combined with an array of flux values to provide parameters such as |
---|
591 | integrated flux. The advantage of this approach is that the pixel |
---|
592 | locations can be easily applied to different flux arrays as the need |
---|
593 | permits (for instance, defining them using the reconstructed array, |
---|
594 | yet evaluating parameters on the original array). |
---|
595 | |
---|
596 | This scan-based run-length encoding is how the individual detections |
---|
597 | are stored in the binary catalogue described in \S\ref{sec-bincat}. |
---|
598 | |
---|
599 | \secC{Technique} |
---|
600 | \label{sec-searchTechnique} |
---|
601 | |
---|
602 | The basic idea behind detection in \duchamp is to locate sets of |
---|
603 | contiguous voxels that lie above some threshold. No size or shape |
---|
604 | requirement is imposed upon the detections, and no fitting (for |
---|
605 | instance, fitting Gaussian profiles) is done on the sources. All |
---|
606 | \duchamp does is find connected groups of bright voxels and report |
---|
607 | their locations and basic parameters. |
---|
608 | |
---|
609 | One threshold is calculated for the entire cube, enabling calculation |
---|
610 | of signal-to-noise ratios for each source (see |
---|
611 | \S\ref{sec-output} for details). The user can manually specify a |
---|
612 | value (using the parameter \texttt{threshold}) for the threshold, |
---|
613 | which will override the calculated value. Note that this option |
---|
614 | overrides any settings of \texttt{snrCut} or FDR options (see below). |
---|
615 | |
---|
616 | The cube can be searched in one of two ways, governed by the input |
---|
617 | parameter \texttt{searchType}. If \texttt{searchType=spatial}, the |
---|
618 | cube is searched one channel map at a time, using the 2-dimensional |
---|
619 | raster-scanning algorithm of \citet{lutz80} that connects groups of |
---|
620 | neighbouring pixels. Such an algorithm cannot be applied directly to a |
---|
621 | 3-dimensional case, as it requires that objects are completely nested |
---|
622 | in a row (when scanning along a row, if an object finishes and other |
---|
623 | starts, you won't get back to the first until the second is completely |
---|
624 | finished for the row). Three-dimensional data does not have this |
---|
625 | property, hence the need to treat the data on a 2-dimensional basis at |
---|
626 | most. |
---|
627 | |
---|
628 | Alternatively, if \texttt{searchType=spectral}, the searching is done |
---|
629 | in one dimension on each individual spatial pixel's spectrum. This is |
---|
630 | a simpler search, but there are potentially many more of them. |
---|
631 | |
---|
632 | Although there are parameters that govern the minimum number of pixels |
---|
633 | in a spatial, spectral and total sense that an object must have |
---|
634 | (\texttt{minPix}, \texttt{minChannels} and \texttt{minVoxels} |
---|
635 | respectively), these criteria are not applied at this point - see |
---|
636 | \S\ref{sec-reject} for details. |
---|
637 | |
---|
638 | Finally, the search only looks for positive features. If one is |
---|
639 | interested instead in negative features (such as absorption lines), |
---|
640 | set the parameter \texttt{flagNegative = true}. This will invert the |
---|
641 | cube (\ie multiply all pixels by $-1$) prior to the search, and then |
---|
642 | re-invert the cube (and the fluxes of any detections) after searching |
---|
643 | is complete. If the reconstructed or smoothed array has been read in |
---|
644 | from disk, this will also be inverted at the same time. All outputs |
---|
645 | are done in the same manner as normal, so that fluxes of detections |
---|
646 | will be negative. |
---|
647 | |
---|
648 | \secC{Calculating statistics} |
---|
649 | \label{sec-stats} |
---|
650 | |
---|
651 | A crucial part of the detection process (as well as the wavelet |
---|
652 | reconstruction: \S\ref{sec-recon}) is estimating the statistics that |
---|
653 | define the detection threshold. To determine a threshold, we need to |
---|
654 | estimate from the data two parameters: the middle of the noise |
---|
655 | distribution (the ``noise level''), and the width of the distribution |
---|
656 | (the ``noise spread''). The noise level is estimated by either the |
---|
657 | mean or the median, and the noise spread by the rms (or the standard |
---|
658 | deviation) or the median absolute deviation from the median |
---|
659 | (MADFM). The median and MADFM are robust statistics, in that they are |
---|
660 | not biased by the presence of a few pixels much brighter than the |
---|
661 | noise. |
---|
662 | |
---|
663 | All four statistics are calculated automatically, but the choice of |
---|
664 | parameters that will be used is governed by the input parameter |
---|
665 | \texttt{flagRobustStats}. This has the default value \texttt{true}, |
---|
666 | meaning the underlying mean of the noise distribution is estimated by |
---|
667 | the median, and the underlying standard deviation is estimated by the |
---|
668 | MADFM. In the latter case, the value is corrected, under the |
---|
669 | assumption that the underlying distribution is Normal (Gaussian), by |
---|
670 | dividing by 0.6744888 -- see Appendix~\ref{app-madfm} for details. If |
---|
671 | \texttt{flagRobustStats=false}, the mean and rms are used instead. |
---|
672 | |
---|
673 | The choice of pixels to be used depend on the analysis method. If the |
---|
674 | wavelet reconstruction has been done, the residuals (defined |
---|
675 | in the sense of original $-$ reconstruction) are used to estimate the |
---|
676 | noise spread of the cube, since the reconstruction should pick out |
---|
677 | all significant structure. The noise level (the middle of the |
---|
678 | distribution) is taken from the original array. |
---|
679 | |
---|
680 | If smoothing of the cube has been done instead, all noise parameters |
---|
681 | are measured from the smoothed array, and detections are made with |
---|
682 | these parameters. When the signal-to-noise level is quoted for each |
---|
683 | detection (see \S\ref{sec-output}), the noise parameters of the |
---|
684 | original array are used, since the smoothing process correlates |
---|
685 | neighbouring pixels, reducing the noise level. |
---|
686 | |
---|
687 | If neither reconstruction nor smoothing has been done, then the |
---|
688 | statistics are calculated from the original, input array. |
---|
689 | |
---|
690 | The parameters that are estimated should be representative of the |
---|
691 | noise in the cube. For the case of small objects embedded in many |
---|
692 | noise pixels (\eg the case of \hi surveys), using the full cube will |
---|
693 | provide good estimators. It is possible, however, to use only a |
---|
694 | subsection of the cube by setting the parameter \texttt{flagStatSec = |
---|
695 | true} and providing the desired subsection to the \texttt{StatSec} |
---|
696 | parameter. This subsection works in exactly the same way as the pixel |
---|
697 | subsection discussed in \S\ref{sec-input}. The \texttt{StatSec} will |
---|
698 | be trimmed if necessary so that it lies wholly within the image |
---|
699 | subsection being used (\ie that given by the \texttt{subsection} |
---|
700 | parameter - this governs what pixels are read in and so are able to be |
---|
701 | used in the calculations). |
---|
702 | |
---|
703 | Note that \texttt{StatSec} applies only to the statistics used to |
---|
704 | determine the threshold. It does not affect the calculation of |
---|
705 | statistics in the case of the wavelet reconstruction. Note also that |
---|
706 | pixels identified as BLANK or as flagged via the |
---|
707 | \texttt{flaggedChannels} parameter are ignored in the statistics |
---|
708 | calculations. |
---|
709 | |
---|
710 | \secC{Determining the threshold} |
---|
711 | |
---|
712 | Once the statistics have been calculated, the threshold is determined |
---|
713 | in one of two ways. The first way is a simple sigma-clipping, where a |
---|
714 | threshold is set at a fixed number $n$ of standard deviations above |
---|
715 | the mean, and pixels above this threshold are flagged as detected. The |
---|
716 | value of $n$ is set with the parameter \texttt{snrCut}. The ``mean'' |
---|
717 | and ``standard deviation'' here are estimated according to |
---|
718 | \texttt{flagRobustStats}, as discussed in \S\ref{sec-stats}. In this |
---|
719 | first case only, if the user specifies a threshold, using the |
---|
720 | \texttt{threshold} parameter, the sigma-clipped value is ignored. |
---|
721 | |
---|
722 | The second method uses the False Discovery Rate (FDR) technique |
---|
723 | \citep{miller01,hopkins02}, whose basis we briefly detail here. The |
---|
724 | false discovery rate (given by the number of false detections divided |
---|
725 | by the total number of detections) is fixed at a certain value |
---|
726 | $\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false |
---|
727 | positives). In practice, an $\alpha$ value is chosen, and the ensemble |
---|
728 | average FDR (\ie $\langle FDR \rangle$) when the method is used will |
---|
729 | be less than $\alpha$. One calculates $p$ -- the probability, |
---|
730 | assuming the null hypothesis is true, of obtaining a test statistic as |
---|
731 | extreme as the pixel value (the observed test statistic) -- for each |
---|
732 | pixel, and sorts them in increasing order. One then calculates $d$ |
---|
733 | where |
---|
734 | \[ |
---|
735 | d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\}, |
---|
736 | \] |
---|
737 | and then rejects all hypotheses whose $p$-values are less than or |
---|
738 | equal to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq |
---|
739 | j\alpha/c_N N$.) Note that ``reject hypothesis'' here means ``accept |
---|
740 | the pixel as an object pixel'' (\ie we are rejecting the null |
---|
741 | hypothesis that the pixel belongs to the background). |
---|
742 | |
---|
743 | The $c_N$ value here is a normalisation constant that depends on the |
---|
744 | correlated nature of the pixel values. If all the pixels are |
---|
745 | uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their |
---|
746 | tests will be dependent on each other, and so $c_N = \sum_{i=1}^N |
---|
747 | i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels |
---|
748 | are correlated over the beam. For the calculations done in \duchamp, |
---|
749 | $N = B \times C$, where $B$ is the beam area in pixels, calculated |
---|
750 | from the FITS header (if the correct keywords -- BMAJ, BMIN -- are not |
---|
751 | present, the size of the beam is taken from the input parameters - see |
---|
752 | discussion in \S\ref{sec-results}, and if these parameters are not |
---|
753 | given, $B=1$), and $C$ is the number of neighbouring channels that can |
---|
754 | be considered to be correlated. |
---|
755 | |
---|
756 | The use of the FDR method is governed by the \texttt{flagFDR} flag, |
---|
757 | which is \texttt{false} by default. To set the relevant parameters, |
---|
758 | use \texttt{alphaFDR} to set the $\alpha$ value, and |
---|
759 | \texttt{FDRnumCorChan} to set the $C$ value discussed above. These |
---|
760 | have default values of 0.01 and 2 respectively. |
---|
761 | |
---|
762 | The theory behind the FDR method implies a direct connection between |
---|
763 | the choice of $\alpha$ and the fraction of detections that will be |
---|
764 | false positives. These detections, however, are individual pixels, |
---|
765 | which undergo a process of merging and rejection (\S\ref{sec-merger}), |
---|
766 | and so the fraction of the final list of detected objects that are |
---|
767 | false positives will be much smaller than $\alpha$. See the discussion |
---|
768 | in \S\ref{sec-notes}. |
---|
769 | |
---|
770 | %\secC{Storage of detected objects in memory} |
---|
771 | % |
---|
772 | %It is useful to understand how \duchamp stores the detected objects in |
---|
773 | %memory while it is running. This makes use of nested C++ classes, so |
---|
774 | %that an object is stored as a class that includes the set of detected |
---|
775 | %pixels, plus all the various calculated parameters (fluxes, WCS |
---|
776 | %coordinates, pixel centres and extrema, flags,...). The set of pixels |
---|
777 | %are stored using another class, that stores 3-dimensional objects as a |
---|
778 | %set of channel maps, each consisting of a $z$-value and a |
---|
779 | %2-dimensional object (a spatial map if you like). This 2-dimensional |
---|
780 | %object is recorded using ``run-length'' encoding, where each row (a |
---|
781 | %fixed $y$ value) is stored by the starting $x$-value and the length |
---|
782 | |
---|
783 | \secB{Merging, growing and rejecting detected objects} |
---|
784 | \label{sec-merger} |
---|
785 | |
---|
786 | \secC{Merging} |
---|
787 | |
---|
788 | The searches described above are either 1- or 2-dimensional only. They |
---|
789 | do not know anything about the third dimension that is likely to be |
---|
790 | present. To build up 3D sources, merging of detections must be |
---|
791 | done. This is done via an algorithm that matches objects judged to be |
---|
792 | ``close'', according to one of two criteria. |
---|
793 | |
---|
794 | One criterion is to define two thresholds -- one spatial and one in |
---|
795 | velocity -- and say that two objects should be merged if there is at |
---|
796 | least one pair of pixels that lie within these threshold distances of |
---|
797 | each other. These thresholds are specified by the parameters |
---|
798 | \texttt{threshSpatial} and \texttt{threshVelocity} (in units of pixels |
---|
799 | and channels respectively). |
---|
800 | |
---|
801 | Alternatively, the spatial requirement can be changed to say that |
---|
802 | there must be a pair of pixels that are \emph{adjacent} -- a stricter, |
---|
803 | but perhaps more realistic requirement, particularly when the spatial |
---|
804 | pixels have a large angular size (as is the case for \hi |
---|
805 | surveys). This method can be selected by setting the parameter |
---|
806 | \texttt{flagAdjacent=true} in the parameter file. The velocity |
---|
807 | thresholding is always done with the \texttt{threshVelocity} test. |
---|
808 | |
---|
809 | |
---|
810 | \secC{Stages of merging} |
---|
811 | |
---|
812 | This merging can be done in two stages. The default behaviour is for |
---|
813 | each new detection to be compared with those sources already detected, |
---|
814 | and for it to be merged with the first one judged to be close. No |
---|
815 | other examination of the list is done at this point. |
---|
816 | |
---|
817 | This step can be turned off by setting |
---|
818 | \texttt{flagTwoStageMerging=false}, so that new detections are simply |
---|
819 | added to the end of the list, leaving all merging to be done in the |
---|
820 | second stage. |
---|
821 | |
---|
822 | The second, main stage of merging is more thorough, Once the searching |
---|
823 | is completed, the list is iterated through, looking at each pair of |
---|
824 | objects, and merging appropriately. The merged objects are then |
---|
825 | included in the examination, to see if a merged pair is suitably close |
---|
826 | to a third. |
---|
827 | |
---|
828 | \secC{Growing} |
---|
829 | |
---|
830 | Once the detections have been merged, they may be ``grown'' (this is |
---|
831 | essentially the process known elsewhere as ``floodfill''). This is a |
---|
832 | process of increasing the size of the detection by adding nearby |
---|
833 | pixels (according to the \texttt{threshSpatial} and |
---|
834 | \texttt{threshVelocity} parameters) that are above some secondary |
---|
835 | threshold and not already part of a detected object. This threshold |
---|
836 | should be lower than the one used for the initial detection, but above |
---|
837 | the noise level, so that faint pixels are only detected when they are |
---|
838 | close to a bright pixel. This threshold is specified via one of two |
---|
839 | input parameters. It can be given in terms of the noise statistics via |
---|
840 | \texttt{growthCut} (which has a default value of $3\sigma$), or it can |
---|
841 | be directly given via \texttt{growthThreshold}. Note that if you have |
---|
842 | given the detection threshold with the \texttt{threshold} parameter, |
---|
843 | the growth threshold \textbf{must} be given with |
---|
844 | \texttt{growthThreshold}. If \texttt{growthThreshold} is not provided |
---|
845 | in this situation, the growing will not be done. |
---|
846 | |
---|
847 | The use of the growth algorithm is controlled by the |
---|
848 | \texttt{flagGrowth} parameter -- the default value of which is |
---|
849 | \texttt{false}. If the detections are grown, they are sent through the |
---|
850 | merging algorithm a second time, to pick up any detections that should |
---|
851 | be merged at the new lower threshold (\ie they have grown into each |
---|
852 | other). |
---|
853 | |
---|
854 | \secC{Rejecting} |
---|
855 | \label{sec-reject} |
---|
856 | |
---|
857 | Finally, to be accepted, the detections must satisfy minimum (and, |
---|
858 | optionally, maximum) size criteria, relating to the number of |
---|
859 | channels, spatial pixels and voxels occupied by the object. These |
---|
860 | criteria are set using the \texttt{minChannels}, \texttt{minPix} and |
---|
861 | \texttt{minVoxels} parameters respectively. The channel requirement |
---|
862 | means a source must have at least one set of \texttt{minChannels} |
---|
863 | consecutive channels to be accepted. The spatial pixels |
---|
864 | (\texttt{minPix}) requirement refers to distinct spatial pixels (which |
---|
865 | are possibly in different channels), while the voxels requirement |
---|
866 | refers to the total number of voxels detected. If the |
---|
867 | \texttt{minVoxels} parameter is not provided, it defaults to |
---|
868 | \texttt{minPix}$+$\texttt{minChannels}-1. |
---|
869 | |
---|
870 | It is possible to also place upper limits on the size of detections in |
---|
871 | a similar fashion. The corresponding parameters are \texttt{maxPix}, |
---|
872 | \texttt{maxChannels} and \texttt{maxVoxels}. These will all default to |
---|
873 | a value of -1 -- if they are negative the check is \textbf{not} |
---|
874 | made. If they are provided by the user, an object will be rejected if |
---|
875 | one of the metrics exceeds the limit (in each case, the value is the |
---|
876 | highest allowed value for an object to be accepted). The channel |
---|
877 | requirement differs slightly from the minimum check -- the limit |
---|
878 | applies to the total number of channels in the object. |
---|
879 | |
---|
880 | It is possible to do this rejection stage before the main merging and |
---|
881 | growing stage. This could be done to remove narrow (hopefully |
---|
882 | spurious) sources from the list before growing them, to reduce the |
---|
883 | number of false positives in the final list. This mode can be selected |
---|
884 | by setting the input parameter \texttt{flagRejectBeforeMerge=true} -- |
---|
885 | caution is urged if you use this in conjunction with |
---|
886 | \texttt{flagTwoStageMerging=false}, as you can throw away parts of |
---|
887 | objects that you may otherwise wish to keep. |
---|
888 | |
---|
889 | %%% Local Variables: |
---|
890 | %%% mode: latex |
---|
891 | %%% TeX-master: "Guide" |
---|
892 | %%% End: |
---|