Opened 16 years ago

Closed 15 years ago

#67 closed defect (fixed)

Solve large file issues in miriad tasks

Reported by: MarkWieringa
Owned by: MarkWieringa
Priority: major
Milestone: 10. Stage 3 - available for testing
Component: MIRIAD - CABB branch
Version:
Keywords: large file support
Cc:
Estimated Number of Hours: 20
Add Hours to Ticket: 5
Billable?: yes
Total Hours: 0

Description (last modified by VincentMcIntyre)

Issue reported by email - we may need to split this up by task

On Thu 2008/07/31 17:11:45 MST, Juergen Ott wrote
in a message to: Mark Calabretta <mcalabre@atnf.csiro.au>,
      Phil Edwards <Philip.Edwards@csiro.au>
and copied to: Adrienne Stilp <adrienne@astro.washington.edu>,
      Steven Warren <warren@astro.umn.edu>

Hi Juergen,

> I was just wondering what the status is of upgrading MIRIAD to work with large datasets that will be delivered by CABB. In fact, right now I am using MIRIAD to reduce/image VLA data and I find that there are many issues that have to do with handling large volumes of data. E.g., the fits task is not able to convert all data in miriad format, but stops after ~3h worth of data, flagging and plotting tasks (blflag, uvplt) do

The low-level Miriad IO routines can handle large files but unfortunately the tasks themselves are limited by 4-byte, signed Fortran INTEGER variables and that causes task-specific problems of the sort you are seeing.

This is touched on briefly in the installation notes, ftp://ftp.atnf.csiro.au/pub/software/miriad/INSTALL.html, where task fits is mentioned by name.

> not show all data, etc., and invert sometimes fails, too, because of the large datasets. I think that those things will be fixed by making MIRIAD ready for CABB data, so has there been any progress? Are there any beta versions of MIRIAD to do so?

Mark Wieringa is the one to ask.

Regards, Mark

see also: ticket:85

Change History (8)

comment:1 Changed 16 years ago by MarkWieringa

Status: new → assigned

comment:2 Changed 15 years ago by MarkWieringa

Tested fits output of a file > 2GB, this also fails. atlod can read a large file (2.8 GB) and produce a large (2.2 GB) uv file, which other uv programs (such as uvindex) read without problems. However, fits fails to export a file of this size, although it can export a subset of the file (e.g. a single source).

comment:3 Changed 15 years ago by VincentMcIntyre

2009-04-29

Another issue that came up was the handling of flag tables larger than 2^31 - 1 bits (about 256 MB). This required changes to the low-level I/O code (maskio.c).

Date: Fri, 17 Apr 2009 11:38:43 +1000
From: Mark Wieringa <Mark.Wieringa@csiro.au>

...

We've just encountered an integer overflow problem in miriad. The size of the flag table is
limited to ~256MB because it uses an int to calculate offsets in bits into the file.
This means we can't keep a 12h run of CABB data in one uv file.

I'm working my way through the C code layers to see if we can fix
this. Similar issues seem fixed elsewhere in the io routines (by using
off_t), but not in the maskio routines.

There were separate issues to do with scratch files (scrio.c). I think these were found by Bob Sault.

comment:4 in reply to:  2 Changed 15 years ago by VincentMcIntyre

Replying to MarkWieringa:

Tested fits output of a file > 2GB, this also fails.

Tested on 64-bit, also fails:

delphinus-111% fits in=/DATA/DELPHINUS_3/len067/CABB/pictor-a.9000.uvaver out=test.fits op=uvout
Fits: version 1.1 09-Apr-09
Polarisations copied: XX,YY,XY,YX.
### Fatal Error:  Invalid argument
delphinus-112% echo $?
1

Repeating with strace

delphinus-114% strace -o ./fits.strace fits in=/DATA/DELPHINUS_3/len067/CABB/pictor-a.9000.uvaver out=test.fits op=uvout
Fits: version 1.1 09-Apr-09
Polarisations copied: XX,YY,XY,YX.
### Fatal Error:  Invalid argument

shows that there is a problem during an lseek() call:

...blahblahblah...
lseek(4, 2149679104, SEEK_SET)          = 2149679104
read(4, "\32\275\256A5\275\305?\246V:\276\32\275\256A\321\177\237"..., 16384) = 16384
lseek(6, 2147569344, SEEK_SET)          = 2147569344
write(6, ">\243\367\340\276O\317^A\344\r\215?\231\'\213\277\n\210"..., 9948) = 9948
lseek(6, 18446744071562163612, SEEK_SET) = -1 EINVAL (Invalid argument)
write(2, "### Fatal Error:  Invalid argume"..., 35) = 35
close(6)                                = 0
unlink("test.fits")                     = 0
close(4)                                = 0
close(5)                                = 0
close(3)                                = 0
lseek(2, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
lseek(2, 0, SEEK_END)                   = -1 ESPIPE (Illegal seek)
lseek(2, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
lseek(1, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
lseek(1, 0, SEEK_END)                   = -1 ESPIPE (Illegal seek)
lseek(1, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
munmap(0x2b66d4c9d000, 4096)            = 0
exit_group(1)                           = ?

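i.e. we hit a problem at the signed 32-bit limit (2^31), not 2^32. The next offset should have been 2147569344 + 9948 = 2147579292, which is larger than 2^31 - 1 = 2147483647. Passed through a signed 32-bit integer it becomes -2147388004, and sign-extended to 64 bits that is the 18446744071562163612 that lseek() rejects.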

comment:5 Changed 15 years ago by VincentMcIntyre

Description: modified (diff)

comment:6 Changed 15 years ago by VincentMcIntyre

The workaround for this particular case is to break the UV data into a few time ranges to stay below the size limit.

Apparently Bob Sault is looking into reworking the code to avoid the silly seek() from the start to the end of the file.

comment:7 Changed 15 years ago by MarkWieringa

Add Hours to Ticket: 0 → 5

From Bob Sault:

I have just installed a number of changes into the RCS system to allow the "fits" task to handle FITS files that are larger than 2 Gbytes in size. There are a large number of small changes. The changes are invisible to the user. Below is a sketch of the changes.

Best regards Bob

16jul09 rjs mp.for - Added mpSign routine and better comments.
16jul09 rjs hio3.f2c - New routines to handle large file offsets from FORTRAN.
16jul09 rjs fitsio.for,fitsio.h - Changes to handle FITS files larger than 2 Gbytes.
16jul09 rjs wrap.f2c - Add a cast operation in htell (pedantry).
20jul09 rjs fits.for - Some cosmetic changes to messages to users.

comment:8 Changed 15 years ago by MarkWieringa

Resolution: fixed
Status: assigned → closed

A further change by Bob to implement a PtrDiff type, which is basically a Fortran integer*8, has now solved the problem for large memory allocations in invert (and in other tasks when needed).
