% -----------------------------------------------------------------------
% hints.tex: Section giving some tips & hints on how Duchamp is best
% used.
% -----------------------------------------------------------------------
% Copyright (C) 2006, Matthew Whiting, ATNF
%
% This program is free software; you can redistribute it and/or modify it
% under the terms of the GNU General Public License as published by the
% Free Software Foundation; either version 2 of the License, or (at your
% option) any later version.
%
% Duchamp is distributed in the hope that it will be useful, but WITHOUT
% ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
% FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
% for more details.
%
% You should have received a copy of the GNU General Public License
% along with Duchamp; if not, write to the Free Software Foundation,
% Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA
%
% Correspondence concerning Duchamp may be directed to:
%    Internet email: Matthew.Whiting [at] atnf.csiro.au
%    Postal address: Dr. Matthew Whiting
%                    Australia Telescope National Facility, CSIRO
%                    PO Box 76
%                    Epping NSW 1710
%                    AUSTRALIA
% -----------------------------------------------------------------------
\secA{Notes and hints on the use of \duchamp}
\label{sec-notes}

In using \duchamp, the user has to make a number of decisions about
the way the program runs. This section is designed to give the user
some idea about what to choose.

\secB{Memory usage}

A lot of attention has been paid to the memory usage in \duchamp,
recognising that data cubes are going to be increasing in size with
new generation correlators and wider fields of view. However, users
with large cubes should be aware of the likely usage for different
modes of operation and plan their \duchamp execution carefully.

At the start of the program, memory is allocated sufficient for:
\begin{itemize}
\item The entire pixel array (as requested, subject to any
subsection).
\item The spatial extent, which holds the map of detected pixels (for
output into the detection map).
\item If smoothing or reconstruction has been selected, another array
of the same size as the pixel array. This will hold the
smoothed/reconstructed array (the original needs to be kept to do the
correct parameterisation of detected sources).
\item If baseline-subtraction has been selected, a further array of
the same size as the pixel array. This holds the baseline values,
which need to be added back in prior to parameterisation.
\end{itemize}
All of these will be float type, except for the detection map, which
is short.

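
As a rough guide to these up-front allocations, the following
back-of-the-envelope sketch (a hypothetical helper, not part of
\duchamp itself) simply adds up the arrays listed above, assuming
4-byte floats and 2-byte shorts:

```python
# Back-of-the-envelope estimate of Duchamp's up-front memory allocation.
# Hypothetical helper, not part of Duchamp itself; it adds up the arrays
# listed in the text, assuming 4-byte floats and 2-byte shorts.

def base_memory_bytes(nx, ny, nz, smooth_or_recon=False, baselines=False):
    float_bytes, short_bytes = 4, 2
    n_pix = nx * ny * nz
    total = n_pix * float_bytes            # the entire pixel array
    total += nx * ny * short_bytes         # detection map (spatial extent)
    if smooth_or_recon:
        total += n_pix * float_bytes       # smoothed/reconstructed copy
    if baselines:
        total += n_pix * float_bytes       # baseline values
    return total

# Example: a HIPASS-sized cube (170 x 160 x 1024) with reconstruction:
mib = base_memory_bytes(170, 160, 1024, smooth_or_recon=True) / 1024**2
print(f"{mib:.0f} MiB")  # prints "213 MiB"
```

Temporary workspace used by the individual algorithms comes on top of
these static allocations.
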
There will, of course, be additional allocation during the course of
the program. The detection list will progressively grow, with each
detection having a memory footprint as described in
Section~\ref{sec-scan}. But perhaps more important and with a larger
impact will be the temporary space allocated for various algorithms.

The largest of these will be the wavelet reconstruction. This will
require an additional allocation of twice the size of the array being
reconstructed, one for the coefficients and one for the wavelets --
each scale will overwrite the previous one. So, for the 1D case, this
means an additional allocation of twice the spectral dimension (since
we only reconstruct one spectrum at a time), but the 3D case will
require an additional allocation of twice the cube size (this means
there needs to be available at least four times the size of the input
cube for 3D reconstruction, plus the additional overheads of
detections and so forth).
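
As a concrete (hypothetical) illustration of this four-times rule for
the 3D case:

```python
# Peak memory for a 3D reconstruction: the input cube, the reconstructed
# copy, plus two cube-sized temporary arrays (coefficients and wavelets),
# i.e. roughly four times the cube itself.  Illustrative sketch only,
# assuming 4-byte floats.

def recon3d_peak_bytes(nx, ny, nz):
    cube_bytes = nx * ny * nz * 4      # input pixel array of floats
    return 4 * cube_bytes              # + recon copy + two work arrays

gib = recon3d_peak_bytes(170, 160, 1024) / 1024**3
print(f"peak ~ {gib:.2f} GiB")  # prints "peak ~ 0.42 GiB"
```
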

The smoothing has less of an impact, since it only operates on the
lower dimensions, but it will make an additional allocation of twice
the relevant size (spectral dimension for spectral smoothing, or
spatial image size for the spatial Gaussian smoothing).

The other large allocation of temporary space will be for calculating
robust statistics. The median-based calculations require at least
partial sorting of the data, and so cannot be done on the original
image cube. This is done for the entire cube and so the temporary
memory increase can be large.
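
The idea can be illustrated with a short sketch (an illustrative
stand-in, not \duchamp's actual C++ code): the median and the median
absolute deviation from the median (MADFM) are computed on a scratch
copy, leaving the original data untouched; for Gaussian noise the
MADFM can be converted to an equivalent rms by dividing by 0.6745.

```python
import statistics

# Median-based (robust) statistics computed on a scratch copy of the
# data, leaving the original untouched -- an illustrative stand-in for
# the behaviour described in the text, not Duchamp's implementation.

def robust_stats(pixels):
    scratch = sorted(pixels)          # the sort happens on a copy
    med = statistics.median(scratch)
    madfm = statistics.median(sorted(abs(p - med) for p in scratch))
    return med, madfm / 0.6745        # MADFM -> equivalent Gaussian rms

# The bright "source" pixel barely affects the robust noise estimate:
data = [0.1, -0.2, 0.05, 3.0, -0.1, 0.0, 0.15, -0.05]
med, sigma = robust_stats(data)
```
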


\secB{Preprocessing}

\secC{Should I do any preprocessing?}

The main choice is whether to alter the cube to try and enhance the
detectability of objects, by either smoothing or reconstructing via
the \atrous method. The main benefits of both methods are the marked
reduction in the noise level, leading to regularly-shaped detections,
and good reliability for faint sources.

The main drawback with the \atrous method is the long execution time:
to reconstruct a $170\times160\times1024$ (\hipass) cube often
requires three iterations and takes about 20--25 minutes to run
completely. Note that this is for the more complete three-dimensional
reconstruction: using \texttt{reconDim = 1} makes the reconstruction
quicker (the full program then takes less than 5 minutes), but it is
still the largest part of the time.

The smoothing procedure is computationally simpler, and thus quicker,
than the reconstruction. The spectral Hanning method adds only a very
small overhead on the execution, and the spatial Gaussian method,
while taking longer, will be done (for the above example) in less than
2 minutes. Note that these times will depend on the size of the
filter/kernel used: a larger filter means more calculations.

The searching part of the procedure is much quicker: searching an
un-reconstructed cube leads to execution times of less than a
minute. Alternatively, using the ability to read in previously-saved
reconstructed arrays makes running the reconstruction more than once a
more feasible prospect.

On the positive side, the shape of the detections in a cube that has
been reconstructed or smoothed will be much more regular and smooth --
the ragged edges that objects in the raw cube possess are smoothed by
the removal of most of the noise. This enables better determination of
the shapes and characteristics of objects.

\secC{Reconstruction vs Smoothing}

While the time overhead is larger for the reconstruction case, it will
potentially provide a better recovery of real sources than the
smoothing case. This is because it probes the full range of scales
present in the cube (or spectral domain), rather than the specific
scale determined by the Hanning filter or Gaussian kernel used in the
smoothing.

When considering the reconstruction method, note that the 2D
reconstruction (\texttt{reconDim = 2}) can be susceptible to edge
effects. If the valid area in the cube (\ie the part that is not
BLANK) has non-rectangular edges, the convolution can produce
artefacts in the reconstruction that mimic the edges and can lead
(depending on the selection threshold) to some spurious
sources. Caution is advised with such data -- the user should check
the reconstructed cube carefully for the presence of such artefacts.

Another effect that can be important for 2D reconstructions is the
fact that the pixels in the spatial domain typically exhibit some
correlation due to the beam. Since each channel is reconstructed
independently, beam-sized noise fluctuations can rise above the
reconstruction threshold more frequently than in the 1D case,
producing a greater number of spurious single-channel spikes in a
given reconstructed spectrum. This effect will also be present in 3D
reconstructions, although to a lesser degree, since information in the
spectral direction is also taken into account.

If one chooses the reconstruction method, a further decision is
required on the signal-to-noise cutoff used in determining acceptable
wavelet coefficients. A larger value will remove more noise from the
cube, at the expense of losing fainter sources, while a smaller value
will include more noise, which may produce spurious detections, but
will be more sensitive to faint sources. Values of less than about
$3\sigma$ tend to not reduce the noise a great deal and can lead to
many spurious sources (this depends, of course, on the cube itself).

The smoothing options have fewer parameters to consider: basically
just the size of the smoothing function or kernel. Spectrally
smoothing with a Hanning filter of width 3 (the smallest possible) is
very efficient at removing spurious one-channel objects that may
result just from statistical fluctuations of the noise. One may want
to use larger widths or kernels of larger size to look for features of
a particular scale in the cube.
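
For reference, a Hanning filter of width 3 corresponds to convolving
each spectrum with the normalised coefficients $(0.25, 0.5, 0.25)$. A
minimal sketch of that operation (illustrative only, not \duchamp's
code):

```python
# Spectral Hanning smoothing of width 3: convolve each spectrum with
# the normalised coefficients (0.25, 0.5, 0.25).  Illustrative sketch,
# not Duchamp's implementation; edge channels are left unsmoothed here.

def hanning3(spectrum):
    out = list(spectrum)
    for i in range(1, len(spectrum) - 1):
        out[i] = (0.25 * spectrum[i - 1]
                  + 0.5 * spectrum[i]
                  + 0.25 * spectrum[i + 1])
    return out

# A single-channel noise spike is halved, so it is much less likely to
# survive a fixed threshold:
print(hanning3([0.0, 0.0, 1.0, 0.0, 0.0]))  # [0.0, 0.25, 0.5, 0.25, 0.0]
```
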

When it comes to searching, the FDR method produces more reliable
results than simple sigma-clipping, particularly in the absence of
reconstruction. However, it does not work in exactly the way one
would expect for a given value of \texttt{alpha}. For instance,
setting fairly liberal values of \texttt{alpha} (say, 0.1) will often
lead to a much smaller fraction of false detections (\ie much less
than 10\%). This is the effect of the merging algorithms, which
combine the sources after the detection stage and reject detections
not meeting the minimum pixel or channel requirements. It is thus
better to aim for larger \texttt{alpha} values than those derived from
a straight conversion of the desired false detection rate.
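
The rule underlying FDR selection is the Benjamini-Hochberg procedure:
sort the pixel p-values and accept everything up to the largest rank
$j$ satisfying $P_j \le j\alpha/N$. A generic sketch of that rule
follows (illustrative only; \duchamp's implementation differs in
detail, for instance in correcting for spatially correlated pixels):

```python
# Generic Benjamini-Hochberg threshold: accept all pixels whose p-value
# is at most the largest P_j satisfying P_j <= j * alpha / N.
# Illustrative sketch of the rule behind FDR selection, not Duchamp's
# actual implementation (which corrects for pixel correlation).

def fdr_pvalue_threshold(pvalues, alpha):
    n = len(pvalues)
    threshold = 0.0
    for j, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * j / n:
            threshold = p          # keep the largest qualifying p-value
    return threshold               # accept pixels with p <= threshold

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(fdr_pvalue_threshold(pvals, alpha=0.05))  # 0.008
```
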

If the FDR method is not used, caution is required when choosing the
S/N cutoff. Typical cubes have very large numbers of pixels, so even
an apparently large cutoff will still result in a not-insignificant
number of detections simply due to random fluctuations of the noise
background. For instance, a $4\sigma$ threshold on a cube of Gaussian
noise of size $100\times100\times1024$ will result in $\sim340$
single-pixel detections. This is where the minimum channel and pixel
requirements are important in rejecting spurious detections.
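
That figure can be checked directly from the Gaussian tail probability
(a sketch of the arithmetic, not \duchamp output): the cube has
$100\times100\times1024 \approx 10^7$ pixels, and the one-sided
probability of a standard normal deviate exceeding $4\sigma$ is about
$3.2\times10^{-5}$.

```python
import math

# Expected number of single-pixel crossings of an s-sigma threshold in
# pure Gaussian noise: N * Q(s), with Q the one-sided upper-tail
# probability of the standard normal.  Arithmetic sketch only.

def expected_false_pixels(nx, ny, nz, snr):
    n_pix = nx * ny * nz
    tail = 0.5 * math.erfc(snr / math.sqrt(2.0))  # Q(snr)
    return n_pix * tail

print(round(expected_false_pixels(100, 100, 1024, 4.0)))  # 324
```

The value of roughly 320 is consistent with the $\sim340$ quoted
above; the exact number depends on the tail convention used.
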

%Finally, as \duchamp is still undergoing development, there are some
%elements that are not fully developed. In particular, it is not as
%clever as I would like at avoiding interference. The ability to place
%requirements on the minimum number of channels and pixels partially
%circumvents this problem, but work is being done to make \duchamp
%smarter at rejecting signals that are clearly (to a human eye at
%least) interference. See the following section for further
%improvements that are planned.