Opened 12 years ago

Closed 12 years ago

#164 closed defect (fixed)

Segfault when reading large reconstructed array

Reported by: MatthewWhiting
Owned by: MatthewWhiting
Priority: normal
Milestone: Release-1.2.2
Component: Wavelet reconstruction
Version: 1.2
Severity: normal
Keywords:
Cc:

Description

From BiQing For:

Hi, Matt,

I am working on the big data cube that I sent you as a test cube with 
the new Duchamp. When I switch on the "flagReconExists" flag and use the 
recon file created by a previous run, it gives me a segmentation fault; 
see the parameters below. Using this option should shorten the 
processing time (i.e. avoid recreating the reconstruction cube) and go
straight into the source finding with the new threshold value.

WARNING <Reading parameters> : Changing minVoxels to 14 given minPix=10 
and minChannels=5
Opening image: ms_p.fits
Dimensions of FITS file: 3835x6074x160x1
Reading data ...
 About to allocate 27.8118GB of which 13.8842GB is for the image
Done. Data array has dimensions: 3835x6074x160
Opened successfully.
Reading reconstructed array:
Segmentation fault

Thanks,
BiQing

imageFile		ms_p.fits
#flagSubsection		true
#Subsection		[*,*,*]
flaglog			true
logFile			duchamp-Logfile_MSp_th012_gr005.txt
outFile			duchamp-Results_MSp_th012_gr005.txt
flagOutputMomentMap 	false
fileOutputMomentMap	duchamp_moment_MSp.fits
flagPlotSpectra		true
spectraFile		duchamp-Spectra_MSp_th012_gr005.ps
flagOutputMask  	true
fileOutputMask  	MSp_th012_gr005.MASK.fits
flagMaskWithObjectNum 	true
flagKarma       	true
karmaFile       	duchamp-Results_MSp_th012_gr005.ann
precFlux        	3
precVel         	3
precSNR         	2
flagTrim        	false
flagMW			false
#minMW			168
#maxMW			397
flagReconExists		true
reconFile		MSp_ATCA.recon.fits
flagOutputRecon		false
fileOutputRecon 	MSp_ATCA.recon.fits
flagBaseline    	false
flagRobustStats		1
flagNegative    	0
threshold		0.12
flagGrowth		1
growthThreshold 	0.05
flagATrous		true
reconDim		3
scaleMin		3
snrRecon		2.
filterCode      	1
flagSmooth      	false
smoothType      	spatial
hanningWidth		4
kernMaj         	5.
kernMin         	1.
kernPA          	0.
flagFDR 		false
flagAdjacent		true
threshSpatial		3
threshVelocity		7
flagRejectBeforeMerge   true
flagTwoStageMerging     true
minChannels		5
minPix			10
verbose         	1
drawBorders		1
drawBlankEdges  	1
spectralMethod  	peak
spectralUnits   	km/s
pixelCentre     	centroid
sortingParam    	vel

Change History (4)

comment:1 Changed 12 years ago by MatthewWhiting

BiQing is using 1.2.

I was able to reproduce the problem using the large cube ms_n.fits that she sent me earlier. In order to do so, I had to fake up a saved reconstructed array by copying the input (and using CASA to remove the degenerate Stokes axis). Reading the full array (of size 3835x6074x140) generates the segfault.

Using gdb isolates the problem to the cfitsio library:

Starting program: /work/whi550/Duchamp-working/Duchamp-1.2.1 -p biqing-duchamp.in
WARNING <Reading parameters> : Changing minVoxels to 14 given minPix=10 and minChannels=5
Opening image: ms_n.fits
Dimensions of FITS file: 3835x6074x140x1
Reading data ... 
 About to allocate 24.3407GB of which 12.1487GB is for the image
Done. Data array has dimensions: 3835x6074x140
Opened successfully.
Reading reconstructed array: 

Program received signal SIGSEGV, Segmentation fault.
memcpy () at ../sysdeps/x86_64/memcpy.S:392
392     ../sysdeps/x86_64/memcpy.S: No such file or directory.
        in ../sysdeps/x86_64/memcpy.S
Current language:  auto
The current source language is "auto; currently asm".
(gdb) bt
#0  memcpy () at ../sysdeps/x86_64/memcpy.S:392
#1  0x0000000000619324 in ffgbyt (fptr=0x9df050, nbytes=-4135346784, buffer=<value optimized out>, status=0x7fffffffcf28) at buffers.c:346
#2  0x000000000061956c in ffgr4b (fptr=0x9df050, byteloc=<value optimized out>, nvals=-1033836696, incre=<value optimized out>, values=0x7ff9e0363010, status=0x7fffffffcf28) at buffers.c:1010
#3  0x0000000000584d91 in ffgcle (fptr=0x9df050, colnum=<value optimized out>, firstrow=<value optimized out>, firstelem=<value optimized out>, nelem=3261130600, elemincre=1, nultyp=1, 
    nulval=<value optimized out>, array=0x7ff9e0363010, nularray=0x7fffffffc75f "", anynul=0x7fffffffcf18, status=0x7fffffffcf28) at getcole.c:853
#4  0x000000000057b2f3 in ffgpxvll (fptr=0x9df050, datatype=0, firstpix=0x7fffffffc7d0, nelem=3261130600, nulval=<value optimized out>, array=0x7ff9e0363010, anynul=0x7fffffffcf18, 
    status=0x7fffffffcf28) at getcol.c:220
#5  0x000000000057b57d in ffgpxv (fptr=0x9df050, datatype=42, firstpix=0x9df9a0, nelem=<value optimized out>, nulval=<value optimized out>, array=<value optimized out>, anynul=0x7fffffffcf18, 
    status=0x7fffffffcf28) at getcol.c:43
#6  0x000000000047ef33 in duchamp::Cube::readReconCube (this=0x9dcf90) at src/Cubes/readRecon.cc:209
#7  0x0000000000469658 in duchamp::Cube::readSavedArrays (this=0x9dcf90) at src/Cubes/cubes_extended.cc:100
#8  0x000000000040c6e8 in main (argc=<value optimized out>, argv=0x7fffffffd9b8) at src/mainDuchamp.cc:80

Notice that the number of elements goes negative in ffgr4b: that function takes the count as a long rather than a LONGLONG, so a pixel count above 2^31 wraps around (see the sketch below). Need a 64-bit build of cfitsio?
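
As an illustrative aside (this arithmetic is a reconstruction by the editor, not part of the original comment): the recon cube holds 3835 x 6074 x 140 = 3,261,130,600 pixels, which exceeds 2^31 - 1, so the count goes negative once it passes through a 32-bit quantity. A minimal C demonstration:

#include <stdio.h>

int main(void)
{
    /* total pixels in the 3835 x 6074 x 140 recon cube */
    long long nelem = 3835LL * 6074LL * 140LL;  /* 3261130600 */
    int wrapped = (int) nelem;  /* truncate to 32 bits (assumes a 32-bit int) */
    printf("nelem   = %lld\n", nelem);   /* prints  3261130600 */
    printf("wrapped = %d\n",  wrapped);  /* prints -1033836696 */
    return 0;
}

The wrapped value matches the nvals=-1033836696 seen in the ffgr4b frame of the backtrace, and 4 * -1033836696 = -4135346784 matches the nbytes passed to ffgbyt.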

comment:2 Changed 12 years ago by MatthewWhiting

Actually, if this were a general problem, why can we read the original image?

The difference is that the original input cube is read via fits_read_subset_flt, while the recon array is read with fits_read_pix.

Since rebuilding the cfitsio library doesn't seem feasible (the long is hard-coded into that function), re-writing the recon reading function to use the same procedure might be the way to go.
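
A possible shape for that rewrite (a sketch only: the helper name read_recon_by_channel and the channel-by-channel chunking are the editor's assumptions, not the actual Duchamp patch) is to loop over spectral channels with fits_read_subset_flt, so no single cfitsio call has to count anywhere near 2^31 pixels:

#include <fitsio.h>
#include <stddef.h>

/* Sketch: read a 3-D float cube one spectral channel at a time.
 * fits_read_subset_flt takes the corners of the subimage as 1-based
 * pixel coordinates; group = 0 for non-grouped data, and nulval = 0
 * disables undefined-pixel checking. */
int read_recon_by_channel(fitsfile *fptr, long nx, long ny, long nz,
                          float *array, int *status)
{
    long naxes[3] = {nx, ny, nz};
    long inc[3]   = {1, 1, 1};
    int  anynul   = 0;
    for (long z = 1; z <= nz && *status == 0; z++) {
        long blc[3] = {1,  1,  z};   /* bottom-left corner of channel z */
        long trc[3] = {nx, ny, z};   /* top-right corner of channel z   */
        float *chan = array + (size_t)(z - 1) * nx * ny;  /* 64-bit offset */
        fits_read_subset_flt(fptr, 0, 3, naxes, blc, trc, inc,
                             0., chan, &anynul, status);
    }
    return *status;
}

Each call then reads only 3835 x 6074 = 23,293,790 pixels, comfortably inside a 32-bit long; only the destination offset (which does exceed 2^31 for later channels) needs 64-bit arithmetic.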

comment:3 Changed 12 years ago by MatthewWhiting

Milestone: Release-1.2.2

comment:4 Changed 12 years ago by MatthewWhiting

Resolution: fixed
Status: new → closed

Have implemented the change to the new class structure described in #166, and made use of the alternative cfitsio reading function fits_read_subset_flt as described above.

It seems that later versions of cfitsio get around this problem, but the new function will allow older versions to work fine as well.

Closing ticket, as this seems to be fixed.
