Changes between Initial Version and Version 2 of Ticket #247


Timestamp:
09/03/11 10:39:48 (13 years ago)
Author:
Kana Sugimoto
Comment:

Commit Date: 2011/09/02
Commit Number: 2286 @ASAP trunk
Modified file(s): python/scantable.py, src/Scantable.h, src/Scantable.cpp, src/python_Scantable.cpp, src/ScantableWrapper.h

I tested scantable.summary by profiling it and measuring elapsed times with scantables that have different numbers of rows (2,500 - 348,000). The elapsed time grows rapidly and non-linearly once the row count exceeds roughly 25,000, which corresponds to 51,022 lines of summary in that data set. This suggests that the slowness is caused by IO: the code is written so that the elapsed time should be proportional to the number of scans x beams x IFs, and only the number of scans varied across the test data sets.

After some IO testing, it turned out that the slowness was caused by the way the summary text string is handled. Currently, Scantable::summary accumulates the whole text string and returns it to scantable.summary. This string becomes huge for a large data set and seems to start straining memory.

I updated scantable.summary so that it flushes the summary string to the file/logger more often. scantable.summary and the functions it calls now take a 'filename' parameter, and the summary is output from within Scantable::summary. After the modification, scantable.summary could list the data with 348,000 rows in about 7 minutes (709,942 lines of summary).
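The idea of flushing in chunks rather than building one huge string can be sketched in plain Python as follows. This is only an illustration, not the ASAP code: iter_summary_lines, write_summary, the flush_every parameter, and the line format are all hypothetical names and choices made for this example.

```python
import io


def iter_summary_lines(n_scans, n_beams, n_ifs):
    """Yield one made-up summary line per (scan, beam, IF) combination."""
    for scan in range(n_scans):
        for beam in range(n_beams):
            for if_no in range(n_ifs):
                yield f"scan={scan} beam={beam} if={if_no}"


def write_summary(out, lines, flush_every=1000):
    """Write lines to the file-like object `out` in chunks.

    At most `flush_every` lines are held in memory at a time, so memory
    use stays bounded no matter how long the summary is.
    Returns the number of lines written.
    """
    chunk = []
    count = 0
    for line in lines:
        chunk.append(line)
        count += 1
        if len(chunk) >= flush_every:
            out.write("\n".join(chunk) + "\n")
            out.flush()
            chunk.clear()
    if chunk:
        # Write whatever is left after the last full chunk.
        out.write("\n".join(chunk) + "\n")
        out.flush()
    return count
```

Calling write_summary with an open file (or sys.stdout) and the generator keeps only one chunk in memory at a time, which is the behaviour described above; the caller gets a line count back instead of the full text.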

A side effect is that scantable.summary no longer returns the summary string. (But people may not be happy with a string of close to a million lines anyway.) Note that Scantable::headerSummary still returns a summary of the header info as a string, which helps in getting an overview of the data.

  • Ticket #247

    • Property Status changed from new to assigned
    • Property Cc Malte.Marquarding@… added
  • Ticket #247 – Description

    Initial version:
    1 scantable.summary is very slow for large dataset (in row number).
    2 It takes >1.5h to list OTF raster scan of ~450,000 rows. ASAP should be able to list data with ~500,000 rows more fast.
    3 Speed-up scantable.summary

    Version 2:
    1 scantable.summary is very slow for large dataset (in row number) often outputted by modern telescopes.
    2 It takes > 1.5 HOURS to list on-the-fly raster scan with ~350,000 rows. ASAP should be able to list data with ~several x 100,000 rows more fast. Speed-up scantable.summary.