Context Navigation

executionFlow.tex @ 1455

Visit:

Last change on this file since 1455 was 255, checked in by Matthew Whiting, 17 years ago
Mostly Guide changes, thanks to the xspace package. Some alteration of the statistics calculation description, thanks to the new statsec parameter. Also changed the default value of the reconDim parameter -- now default is 1-dimensional.
File size: 26.2 KB

Rev	Line
[255]	1	\secA{What \duchamp is doing}
[158]	2	\label{sec-flow}
	3
[255]	4	The execution flow of \duchamp is detailed here, indicating the main
[158]	5	algorithmic steps that are used. The program is written in C/C++ and
	6	makes use of the \textsc{cfitsio}, \textsc{wcslib} and \textsc{pgplot}
	7	libraries.
	8
	9	\secB{Image input}
	10	\label{sec-input}
	11
[162]	12	The cube is read in using basic \textsc{cfitsio} commands, and stored
	13	as an array in a special C++ class. This class keeps track of the list
	14	of detected objects, as well as any reconstructed arrays that are made
	15	(see \S\ref{sec-recon}). The World Coordinate System
	16	(WCS)\footnote{This is the information necessary for translating the
	17	pixel locations to quantities such as position on the sky, frequency,
	18	velocity, and so on.} information for the cube is also obtained from
	19	the FITS header by \textsc{wcslib} functions \citep{greisen02,
	20	calabretta02}, and this information, in the form of a \texttt{wcsprm}
	21	structure, is also stored in the same class.
[158]	22
[231]	23	A sub-section of a cube can be requested by defining the subsection
	24	with the \texttt{subsection} parameter and setting
	25	\texttt{flagSubsection=true} -- this can be a good idea if the cube
	26	has very noisy edges, which may produce many spurious detections.
	27
	28	There are two ways of specifying the \texttt{subsection} string. The
	29	first is the generalised form
	30	\texttt{[x1:x2:dx,y1:y2:dy,z1:z2:dz,...]}, as used by the
	31	\textsc{cfitsio} library. This has one set of colon-separated numbers
	32	for each axis in the FITS file. In this manner, the x-coordinates run
[158]	33	from \texttt{x1} to \texttt{x2} (inclusive), with steps of
[231]	34	\texttt{dx}. The step value can be omitted, so a subsection of the
[255]	35	form \texttt{[2:50,2:50,10:1000]} is still valid. In fact, \duchamp
[231]	36	does not make use of any step value present in the subsection string,
	37	and any that are present are removed before the file is opened.
[158]	38
[231]	39	If the entire range of a coordinate is required, one can replace the
	40	range with a single asterisk, \eg \texttt{[2:50,2:50,*]}. Thus, the
	41	subsection string \texttt{[,,*]} is simply the entire cube. A
	42	complete description of this section syntax can be found at the
	43	\textsc{fitsio} web site%
[158]	44	\footnote{%
	45	\href%
[223]	46	{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node91.html}%
	47	{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node91.html}}.
[158]	48
[231]	49
	50	Making full use of the subsection requires knowledge of the size of
	51	each of the dimensions. If one wants to, for instance, trim a certain
	52	number of pixels off the edges of the cube, without examining the cube
	53	to obtain the actual size, one can use the second form of the
	54	subsection string. This just gives a number for each axis, \eg
	55	\texttt{[5,5,5]} (which would trim 5 pixels from the start \emph{and}
	56	end of each axis).
	57
	58	All types of subsections can be combined \eg \texttt{[5,2:98,*]}.
	59
	60
	61	%A sub-section of an image can be requested via the \texttt{subsection}
	62	%parameter -- this can be a good idea if the cube has very noisy edges,
	63	%which may produce many spurious detections. The generalised form of
	64	%the subsection that is used by \textsc{cfitsio} is
	65	%\texttt{[x1:x2:dx,y1:y2:dy,z1:z2:dz,...]}, such that the x-coordinates run
	66	%from \texttt{x1} to \texttt{x2} (inclusive), with steps of
	67	%\texttt{dx}. The step value can be omitted (so a subsection of the
[255]	68	%form \texttt{[2:50,2:50,10:1000]} is still valid). \duchamp does not
[231]	69	%make use of any step value present in the subsection string, and any
	70	%that are present are removed before the file is opened.
	71	%
	72	%If one wants the full range of a coordinate then replace the range
	73	%with an asterisk, \eg \texttt{[2:50,2:50,*]}. If one wants to use a
	74	%subsection, one must set \texttt{flagSubsection = 1}. A complete
	75	%description of the section syntax can be found at the \textsc{fitsio}
	76	%web site%
	77	%\footnote{%
	78	%\href%
	79	%{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node91.html}%
	80	%{http://heasarc.gsfc.nasa.gov/docs/software/fitsio/c/c\_user/node91.html}}.
	81	%
	82
[158]	83	\secB{Image modification}
	84	\label{sec-modify}
	85
	86	Several modifications to the cube can be made that improve the
[255]	87	execution and efficiency of \duchamp (their use is optional, governed
[158]	88	by the relevant flags in the parameter file).
	89
	90	\secC{BLANK pixel removal}
	91
[162]	92	If the imaged area of a cube is non-rectangular (see the example in
	93	Fig.~\ref{fig-moment}, a cube from the HIPASS survey), BLANK pixels are
	94	used to pad it out to a rectangular shape. The value of these pixels
	95	is given by the FITS header keywords BLANK, BSCALE and BZERO. While
	96	these pixels make the image a nice shape, they will unnecessarily
	97	interfere with the processing (as well as taking up needless
	98	memory). The first step, then, is to trim them from the edge. This is
	99	done when the parameter \texttt{flagBlankPix=true}. If the above
	100	keywords are not present, the user can specify the BLANK value by the
	101	parameter \texttt{blankPixValue}.
[158]	102
	103	Removing BLANK pixels is particularly important for the reconstruction
	104	step, as lots of BLANK pixels on the edges will smooth out features in
	105	the wavelet calculation stage. The trimming will also reduce the size
	106	of the cube's array, speeding up the execution. The amount of trimming
	107	is recorded, and these pixels are added back in once the
	108	source-detection is completed (so that quoted pixel positions are
	109	applicable to the original cube).
	110
	111	Rows and columns are trimmed one at a time until the first non-BLANK
	112	pixel is reached, so that the image remains rectangular. In practice,
[162]	113	this means that there will be some BLANK pixels left in the trimmed
	114	image (if the non-BLANK region is non-rectangular). However, these are
[158]	115	ignored in all further calculations done on the cube.
	116
	117	\secC{Baseline removal}
	118
	119	Second, the user may request the removal of baselines from the
	120	spectra, via the parameter \texttt{flagBaseline}. This may be
	121	necessary if there is a strong baseline ripple present, which can
	122	result in spurious detections at the high points of the ripple. The
	123	baseline is calculated from a wavelet reconstruction procedure (see
	124	\S\ref{sec-recon}) that keeps only the two largest scales. This is
	125	done separately for each spatial pixel (\ie for each spectrum in the
	126	cube), and the baselines are stored and added back in before any
	127	output is done. In this way the quoted fluxes and displayed spectra
	128	are as one would see from the input cube itself -- even though the
	129	detection (and reconstruction if applicable) is done on the
	130	baseline-removed cube.
	131
	132	The presence of very strong signals (for instance, masers at several
[162]	133	hundred Jy) could affect the determination of the baseline, and would
	134	lead to a large dip centred on the signal in the baseline-subtracted
[158]	135	spectrum. To prevent this, the signal is trimmed prior to the
	136	reconstruction process at some standard threshold (at $8\sigma$ above
	137	the mean). The baseline determined should thus be representative of
	138	the true, signal-free baseline. Note that this trimming is only a
	139	temporary measure which does not affect the source-detection.
	140
	141	\secC{Ignoring bright Milky Way emission}
	142
	143	Finally, a single set of contiguous channels can be ignored -- these
	144	may exhibit very strong emission, such as that from the Milky Way as
[255]	145	seen in extragalactic \hi cubes (hence the references to ``Milky
[158]	146	Way'' in relation to this task -- apologies to Galactic
	147	astronomers!). Such dominant channels will produce many detections
	148	that are unnecessary, uninteresting (if one is interested in
	149	extragalactic \hi) and large (in size and hence in memory usage), and
	150	so will slow the program down and detract from the interesting
	151	detections.
	152
	153	The use of this feature is controlled by the \texttt{flagMW}
	154	parameter, and the exact channels concerned are able to be set by the
	155	user (using \texttt{maxMW} and \texttt{minMW} -- these give an
	156	inclusive range of channels). When employed, these channels are
	157	ignored for the searching, and the scaling of the spectral output (see
	158	Fig.~\ref{fig-spect}) will not take them into account. They will be
	159	present in the reconstructed array, however, and so will be included
	160	in the saved FITS file (see \S\ref{sec-reconIO}). When the final
	161	spectra are plotted, the range of channels covered by these parameters
	162	is indicated by a green hashed box.
	163
	164	\secB{Image reconstruction}
	165	\label{sec-recon}
	166
[255]	167	The user can direct \duchamp to reconstruct the data cube using the
	168	\atrous wavelet procedure. A good description of the procedure can be
[158]	169	found in \citet{starck02:book}. The reconstruction is an effective way
	170	of removing a lot of the noise in the image, allowing one to search
	171	reliably to fainter levels, and reducing the number of spurious
	172	detections. This is an optional step, but one that greatly enhances
	173	the source-detection process, with the payoff that it can be
	174	relatively time- and memory-intensive.
	175
	176	\secC{Algorithm}
	177
[255]	178	The steps in the \atrous reconstruction are as follows:
[158]	179	\begin{enumerate}
[162]	180	\item The reconstructed array is set to 0 everywhere.
[158]	181	\item The input array is discretely convolved with a given filter
	182	function. This is determined from the parameter file via the
	183	\texttt{filterCode} parameter -- see Appendix~\ref{app-param} for
	184	details on the filters available.
	185	\item The wavelet coefficients are calculated by taking the difference
	186	between the convolved array and the input array.
	187	\item If the wavelet coefficients at a given point are above the
	188	requested threshold (given by \texttt{snrRecon} as the number of
	189	$\sigma$ above the mean and adjusted to the current scale -- see
	190	Appendix~\ref{app-scaling}), add these to the reconstructed array.
	191	\item The separation of the filter coefficients is doubled. (Note that
[255]	192	this step provides the name of the procedure\footnote{\atrous means
[158]	193	``with holes'' in French.}, as gaps or holes are created in the
	194	filter coverage.)
	195	\item The procedure is repeated from step 2, using the convolved array
	196	as the input array.
	197	\item Continue until the required maximum number of scales is reached.
	198	\item Add the final smoothed (\ie convolved) array to the
	199	reconstructed array. This provides the ``DC offset'', as each of the
	200	wavelet coefficient arrays will have zero mean.
	201	\end{enumerate}
	202
	203	The reconstruction has at least two iterations. The first iteration
	204	makes a first pass at the wavelet reconstruction (the process outlined
[162]	205	in the 8 stages above), but the residual array will likely have some
	206	structure still in it, so the wavelet filtering is done on the
[158]	207	residual, and any significant wavelet terms are added to the final
[162]	208	reconstruction. This step is repeated until the change in the measured
	209	standard deviation of the background (see note below on the evaluation
	210	of this quantity) is less than some fiducial amount.
[158]	211
[255]	212	It is important to note that the \atrous decomposition is an example
[158]	213	of a ``redundant'' transformation. If no thresholding is performed,
	214	the sum of all the wavelet coefficient arrays and the final smoothed
	215	array is identical to the input array. The thresholding thus removes
	216	only the unwanted structure in the array.
	217
	218	Note that any BLANK pixels that are still in the cube will not be
	219	altered by the reconstruction -- they will be left as BLANK so that
	220	the shape of the valid part of the cube is preserved.
	221
	222	\secC{Note on Statistics}
	223
	224	The correct calculation of the reconstructed array needs good
	225	estimators of the underlying mean and standard deviation of the
	226	background noise distribution. These statistics are estimated using
	227	robust methods, to avoid corruption by strong outlying points. The
	228	mean of the distribution is actually estimated by the median, while
	229	the median absolute deviation from the median (MADFM) is calculated
	230	and corrected assuming Gaussianity to estimate the underlying standard
	231	deviation $\sigma$. The Gaussianity (or Normality) assumption is
	232	critical, as the MADFM does not give the same value as the usual rms
[231]	233	or standard deviation value -- for a Normal distribution
[158]	234	$N(\mu,\sigma)$ we find MADFM$=0.6744888\sigma$. Since this ratio is
	235	corrected for, the user need only think in the usual multiples of
	236	$\sigma$ when setting \texttt{snrRecon}. See Appendix~\ref{app-madfm}
	237	for a derivation of this value.
	238
	239	When thresholding the different wavelet scales, the value of $\sigma$
	240	as measured from the wavelet array needs to be scaled to account for
	241	the increased amount of correlation between neighbouring pixels (due
	242	to the convolution). See Appendix~\ref{app-scaling} for details on
	243	this scaling.
	244
	245	\secC{User control of reconstruction parameters}
	246
	247	The most important parameter for the user to select in relation to the
	248	reconstruction is the threshold for each wavelet array. This is set
	249	using the \texttt{snrRecon} parameter, and is given as a multiple of
	250	the rms (estimated by the MADFM) above the mean (which for the wavelet
	251	arrays should be approximately zero). There are several other
	252	parameters that can be altered as well that affect the outcome of the
	253	reconstruction.
	254
	255	By default, the cube is reconstructed in three dimensions, using a
	256	3-dimensional filter and 3-dimensional convolution. This can be
	257	altered, however, using the parameter \texttt{reconDim}. If set to 1,
	258	this means the cube is reconstructed by considering each spectrum
	259	separately, whereas \texttt{reconDim=2} will mean the cube is
	260	reconstructed by doing each channel map separately. The merits of
	261	these choices are discussed in \S\ref{sec-notes}, but it should be
	262	noted that a 2-dimensional reconstruction can be susceptible to edge
[162]	263	effects if the spatial shape of the pixel array is not rectangular.
[158]	264
	265	The user can also select the minimum scale to be used in the
	266	reconstruction. The first scale exhibits the highest frequency
	267	variations, and so ignoring this one can sometimes be beneficial in
	268	removing excess noise. The default is to use all scales
	269	(\texttt{minscale = 1}).
	270
	271	Finally, the filter that is used for the convolution can be selected
	272	by using \texttt{filterCode} and the relevant code number -- the
	273	choices are listed in Appendix~\ref{app-param}. A larger filter will
	274	give a better reconstruction, but take longer and use more memory when
	275	executing. When multi-dimensional reconstruction is selected, this
	276	filter is used to construct a 2- or 3-dimensional equivalent.
	277
	278	\secB{Input/Output of reconstructed arrays}
	279	\label{sec-reconIO}
	280
	281	The reconstruction stage can be relatively time-consuming,
	282	particularly for large cubes and reconstructions in 3-D. To get around
[255]	283	this, \duchamp provides a shortcut to allow users to perform multiple
[158]	284	searches (\eg with different thresholds) on the same reconstruction
	285	without calculating the reconstruction each time.
	286
	287	The first step is to choose to save the reconstructed array as a FITS
	288	file by setting \texttt{flagOutputRecon = true}. The file will be
	289	saved in the same directory as the input image, so the user needs to
	290	have write permissions for that directory.
	291
	292	The filename will be derived from the input filename, with extra
	293	information detailing the reconstruction that has been done. For
	294	example, suppose \texttt{image.fits} has been reconstructed using a
[162]	295	3-dimensional reconstruction with filter \#2, thresholded at $4\sigma$
[158]	296	using all scales. The output filename will then be
	297	\texttt{image.RECON-3-2-4-1.fits} (\ie it uses the four parameters
[255]	298	relevant for the \atrous reconstruction as listed in
[158]	299	Appendix~\ref{app-param}). The new FITS file will also have these
	300	parameters as header keywords. If a subsection of the input image has
	301	been used (see \S\ref{sec-input}), the format of the output filename
	302	will be \texttt{image.sub.RECON-3-2-4-1.fits}, and the subsection that
	303	has been used is also stored in the FITS header.
	304
	305	Likewise, the residual image, defined as the difference between the
	306	input and reconstructed arrays, can also be saved in the same manner
	307	by setting \texttt{flagOutputResid = true}. Its filename will be the
	308	same as above, with \texttt{RESID} replacing \texttt{RECON}.
	309
	310	If a reconstructed image has been saved, it can be read in and used
	311	instead of redoing the reconstruction. To do so, the user should set
[231]	312	the parameter \texttt{flagReconExists = true}. The user can indicate
	313	the name of the reconstructed FITS file using the \texttt{reconFile}
[255]	314	parameter, or, if this is not specified, \duchamp searches for the
[231]	315	file with the name as defined above. If the file is not found, the
	316	reconstruction is performed as normal. Note that to do this, the user
	317	needs to set
[158]	318	\texttt{flagAtrous = true} (obviously, if this is \texttt{false}, the
	319	reconstruction is not needed).
	320
[208]	321	\secB{Smoothing the cube}
	322	\label{sec-smoothing}
	323
	324	An alternative to doing the wavelet reconstruction is to Hanning
	325	smooth the cube. This technique can be useful in reducing the noise
	326	level slightly (at the cost of making neighbouring pixels correlated
	327	and blurring any signal present), and is particularly well suited to
	328	the case where a particular signal width is believed to be present in
	329	the data. It is also substantially faster than the wavelet
	330	reconstruction.
	331
	332	The cube is smoothed only in the spectral domain. That is, each
[231]	333	spectrum is independently smoothed, and then put back together to form
	334	the smoothed cube. This is then treated in the same way as the
[208]	335	reconstructed cube, and is used for the searching algorithm (see
	336	below). Note that in the case of both the reconstruction and the
	337	smoothing options being requested, the reconstruction will take
	338	precedence and the smoothing will \emph{not} be done.
	339
	340	There is only one parameter necessary to define the degree of
	341	smoothing -- the Hanning width $a$ (given by the user parameter
[231]	342	\texttt{hanningWidth}). The coefficients $c(x)$ of the Hanning filter
	343	are defined by
[208]	344	\[
[231]	345	c(x) =
	346	\begin{cases}
	347	\frac{1}{2}\left(1+\cos(\frac{\pi x}{a})\right) &\|x\| \leq (a+1)/2\\
	348	0 &\|x\| > (a+1)/2.
	349	\end{cases}
[208]	350	\]
[232]	351	where $a,x \in \mathbb{Z}$. Note that the width specified must be an
	352	odd integer (if the parameter provided is even, it is incremented by
	353	one).
[208]	354
	355	The user is able to save the smoothed array in exactly the same manner
	356	as for the reconstructed array -- set \texttt{flagOutputSmooth =
	357	true}, and then the smoothed array will be saved in
	358	\texttt{image.SMOOTH-a.fits}, where a is replaced by the Hanning width
	359	used. Similarly, a saved file can be read in by setting
	360	\texttt{flagSmoothExists = true} and either specifying a file to be
[255]	361	read with the \texttt{smoothFile} parameter or relying on \duchamp to
[208]	362	find the file with the name as given above.
	363
[158]	364	\secB{Searching the image}
	365	\label{sec-detection}
	366
[255]	367	The basic idea behind detection is to locate sets of contiguous voxels
	368	that lie above some threshold. \duchamp now calculates one threshold
	369	for the entire cube (previous versions calculated thresholds for each
	370	spectrum and image). This enables calculation of signal-to-noise
	371	ratios for each source (see Section~\ref{sec-output} for details). The
	372	user can manually specify a value (using the parameter
	373	\texttt{threshold}) for the threshold, which will override the
	374	calculated value. Note that this only applies for the first of the two
	375	cases discussed below -- the FDR case ignores any manually-set
	376	threshold value.
	377
	378	The image is searched for detections in two ways: spectrally (\ie a
[158]	379	1-dimensional search in the spectrum in each spatial pixel), and
	380	spatially (a 2-dimensional search in the spatial image in each
	381	channel). In both cases, the algorithm finds connected pixels that are
	382	above the user-specified threshold. In the case of the spatial image
[162]	383	search, the algorithm of \citet{lutz80} is used to raster-scan through
[158]	384	the image and connect groups of pixels on neighbouring rows.
	385
	386	Note that this algorithm cannot be applied directly to a 3-dimensional
	387	case, as it requires that objects are completely nested in a row: that
	388	is, if you are scanning along a row, and one object finishes and
	389	another starts, you know that you will not get back to the first one
	390	(if at all) until the second is completely finished for that
	391	row. Three-dimensional data does not have this property, which is why
	392	we break up the searching into 1- and 2-dimensional cases.
	393
[255]	394	Detections must have a minimum number of pixels to be counted. This
	395	number is given by the input parameters \texttt{minPix} (for
	396	2-dimensional searches) and \texttt{minChannels} (for 1-dimensional
	397	searches).
[194]	398
[255]	399	Finally, the search only looks for positive features. If one is
	400	interested instead in negative features (such as absorption lines),
	401	set the parameter \texttt{flagNegative = true}. This will invert the
	402	cube (\ie multiply all pixels by $-1$) prior to the search, and then
	403	re-invert the cube (and the fluxes of any detections) after searching
	404	is complete. All outputs are done in the same manner as normal, so
	405	that fluxes of detections will be negative.
[158]	406
[255]	407	\secC{Calculating statistics}
	408
	409	A crucial part of the detection process is estimating the statistics
	410	that define the detection threshold. To determine a threshold, we need
	411	to estimate from the data two parameters: the middle of the pixel
	412	distribution, and its spread. For both cases, we again use robust
	413	methods, using the median and MADFM. If the cube has been
	414	reconstructed or smoothed, the residuals (defined in the sense of
	415	original $-$ reconstruction) are used to estimate the noise parameters
	416	of the cube. Otherwise they are estimated directly from the cube
	417	itself. In both cases, robust estimators are used.
	418
	419	The parameters that are estimated should be representative of the
	420	noise in the cube. For the case of small objects embedded in many
	421	noise pixels (\eg the case of \hi surveys), using the full cube will
	422	provide good estimators. It is possible, however, to use only a
	423	subsection of the cube by setting the parameter \texttt{flagStatSec =
	424	true} and providing the desired subsection to the \texttt{StatSec}
	425	parameter. This subsection works in exactly the same way as the pixel
	426	subsection discussed in \S\ref{sec-input}.
	427
	428	\secC{Determining the threshold}
	429
	430	Once the statistics have been calculated, the threshold is determined
	431	in one of two ways. The first way is a simple sigma-clipping, where a
	432	threshold is set at a fixed number $n$ of standard deviations above
	433	the mean, and pixels above this threshold are flagged as detected. The
	434	value of $n$ is set with the parameter \texttt{snrCut}. As before, the
	435	value of the standard deviation is estimated by the MADFM, and
	436	corrected by the ratio derived in Appendix~\ref{app-madfm}.
	437
[158]	438	The second method uses the False Discovery Rate (FDR) technique
	439	\citep{miller01,hopkins02}, whose basis we briefly detail here. The
	440	false discovery rate (given by the number of false detections divided
	441	by the total number of detections) is fixed at a certain value
	442	$\alpha$ (\eg $\alpha=0.05$ implies 5\% of detections are false
	443	positives). In practice, an $\alpha$ value is chosen, and the ensemble
	444	average FDR (\ie $\langle FDR \rangle$) when the method is used will
	445	be less than $\alpha$. One calculates $p$ -- the probability,
	446	assuming the null hypothesis is true, of obtaining a test statistic as
	447	extreme as the pixel value (the observed test statistic) -- for each
	448	pixel, and sorts them in increasing order. One then calculates $d$
	449	where
	450	\[
	451	d = \max_j \left\{ j : P_j < \frac{j\alpha}{c_N N} \right\},
	452	\]
	453	and then rejects all hypotheses whose $p$-values are less than or
	454	equal to $P_d$. (So a $P_i<P_d$ will be rejected even if $P_i \geq
	455	j\alpha/c_N N$.) Note that ``reject hypothesis'' here means ``accept
	456	the pixel as an object pixel'' (\ie we are rejecting the null
	457	hypothesis that the pixel belongs to the background).
	458
	459	The $c_N$ values here are normalisation constants that depend on the
	460	correlated nature of the pixel values. If all the pixels are
	461	uncorrelated, then $c_N=1$. If $N$ pixels are correlated, then their
	462	tests will be dependent on each other, and so $c_N = \sum_{i=1}^N
	463	i^{-1}$. \citet{hopkins02} consider real radio data, where the pixels
	464	are correlated over the beam. In this case the sum is made over the
	465	$N$ pixels that make up the beam. The value of $N$ is calculated from
	466	the FITS header (if the correct keywords -- BMAJ, BMIN -- are not
[173]	467	present, the size of the beam is taken from the parameter
	468	\texttt{beamSize}).
[158]	469
	470	The theory behind the FDR method implies a direct connection between
	471	the choice of $\alpha$ and the fraction of detections that will be
	472	false positives. However, due to the merging process, this direct
	473	connection is lost when looking at the final number of detections --
	474	see discussion in \S\ref{sec-notes}. The effect is that the number of
	475	false detections will be less than indicated by the $\alpha$ value
	476	used.
	477
	478
	479	\secB{Merging detected objects}
	480	\label{sec-merger}
	481
	482	The searching step produces a list of detected objects that will have
	483	many repeated detections of a given object -- for instance, spectral
	484	detections in adjacent pixels of the same object and/or spatial
	485	detections in neighbouring channels. These are then combined in an
	486	algorithm that matches all objects judged to be ``close'', according
	487	to one of two criteria.
	488
	489	One criterion is to define two thresholds -- one spatial and one in
	490	velocity -- and say that two objects should be merged if there is at
	491	least one pair of pixels that lie within these threshold distances of
	492	each other. These thresholds are specified by the parameters
	493	\texttt{threshSpatial} and \texttt{threshVelocity} (in units of pixels
	494	and channels respectively).
	495
	496	Alternatively, the spatial requirement can be changed to say that
	497	there must be a pair of pixels that are \emph{adjacent} -- a stricter,
	498	but perhaps more realistic requirement, particularly when the spatial
[255]	499	pixels have a large angular size (as is the case for
	500	\hi surveys). This
	501	method can be selected by setting the parameter
[158]	502	\texttt{flagAdjacent} to 1 (\ie \texttt{true}) in the parameter
	503	file. The velocity thresholding is done in the same way as the first
	504	option.
	505
	506	Once the detections have been merged, they may be ``grown''. This is a
	507	process of increasing the size of the detection by adding adjacent
	508	pixels that are above some secondary threshold. This threshold is
	509	lower than the one used for the initial detection, but above the noise
	510	level, so that faint pixels are only detected when they are close to a
	511	bright pixel. The value of this threshold is a possible input
	512	parameter (\texttt{growthCut}), with a default value of
	513	$1.5\sigma$. The use of the growth algorithm is controlled by the
	514	\texttt{flagGrowth} parameter -- the default value of which is
	515	\texttt{false}. If the detections are grown, they are sent through the
	516	merging algorithm a second time, to pick up any detections that now
	517	overlap or have grown over each other.
	518
	519	Finally, to be accepted, the detections must span \emph{both} a
	520	minimum number of channels (to remove any spurious single-channel
	521	spikes that may be present), and a minimum number of spatial
	522	pixels. These numbers, as for the original detection step, are set
	523	with the \texttt{minChannels} and \texttt{minPix} parameters. The
	524	channel requirement means there must be at least one set of
	525	\texttt{minChannels} consecutive channels in the source for it to be
	526	accepted.

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: branches/pixel-map-branch/docs/executionFlow.tex @ 1455

Download in other formats: