#247 closed defect (fixed)
Speed up scantable.summary
Reported by: | Kana Sugimoto | Owned by: | Kana Sugimoto |
---|---|---|---|
Priority: | high | Milestone: | Unified development |
Component: | General | Version: | 2.0 |
Severity: | major | Keywords: | |
Cc: | Malte.Marquarding@… |
Description (last modified by )
scantable.summary is very slow for datasets with a large number of rows, which modern telescopes often produce. It takes > 1.5 HOURS to list an on-the-fly raster scan with ~350,000 rows. ASAP should be able to list data with several hundred thousand rows much faster. Speed up scantable.summary.
Change History (7)
comment:1 by , 13 years ago
Status: | new → assigned |
---|
comment:2 by , 13 years ago
Cc: | added |
---|---|
Description: | modified (diff) |
comment:3 by , 13 years ago
Putting my email for the record of discussion.
From: Kanako Sugimoto
Subject: Re: [ASAP] #247: Speed up scantable.summary
Date: Sat, 03 Sep 2011 13:46:22 +0900 (JST)
Hi Malte,
Do you have any use case that relies on the text returned from scantable.summary? I've currently switched off returning the text string from the method to speed up scantable.summary. The details of the discussion are at the bottom. However, thorough tests showed that it is not the huge string itself that makes summary slower; rather, writing the string to the logger takes a great deal of time. So it's an issue in the CASA logger rather than in ASAP. I'm not sure how fast the ASAP logger is.
It's certainly possible to restore the string return value. However, I don't see any use case for reusing the summary of the MAIN table in CASA. Is there any? The problem is that the summary of the main table can be too large to extract useful information from. Currently, you can get header information as a string with scantable._list_header(), and the plotter uses that function. I think scantable.str() should also call it instead of summary.
Anyway, I want to keep the log and file output in Scantable::summary to save operation time. But I don't know the ASAP logger well. Could you please test whether the current code works with the ASAP logger?
Cheers, kana.
comment:4 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Commit Date: 2011/09/08
Commit Number: 2290 @ASAP trunk
Modified file(s): python/scantable.py, python/asapplotter.py, src/Scantable.h, src/Scantable.cpp, src/ScantableWrapper.h
New scantable.summary. It uses TableIterator less for speed-up. The format of the summary has also changed. The new scantable.summary lists data with 348,000 rows in ~30 SECONDS (361,950 lines of summary).
comment:5 by , 13 years ago
I forgot to mention one thing. The previous Scantable::summary and Scantable::headerSummary have been renamed to Scantable::oldsummary and Scantable::oldheaderSummary and are preserved for now.
Commit Date: 2011/09/02
Commit Number: 2286 @ASAP trunk
Modified file(s): python/scantable.py, src/Scantable.h, src/Scantable.cpp, src/python_Scantable.cpp, src/ScantableWrapper.h
I've tested scantable.summary by profiling and by measuring the elapsed times with scantables that have different numbers of rows (2,500 - 348,000). I found that the elapsed time increases rapidly and NON-linearly when the row number is ~> 25,000, which resulted in 51,022 lines of summary in that data set. The result implied that the slowness is caused by IO, because the code itself is written so that the elapsed time is proportional to the number of scans x beams x IFs, and only the number of scans varied across the test data sets.
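The kind of measurement described above can be sketched as follows. This is only an illustration, not the profiling code actually used: `build_summary` is a hypothetical stand-in for scantable.summary that produces one text line per row, and `time_per_row` is a hypothetical helper. If the cost were linear, the seconds-per-row figure would stay roughly constant as the row count grows; a rising figure points to non-linear behaviour.

```python
import time


def build_summary(n_rows):
    """Hypothetical stand-in for scantable.summary: one text line per row."""
    lines = []
    for row in range(n_rows):
        lines.append("scan %d  beam 0  if 0" % row)
    return "\n".join(lines)


def time_per_row(func, sizes):
    """Return {size: seconds per row} for func(size) at each size."""
    ratios = {}
    for n in sizes:
        start = time.perf_counter()
        func(n)
        elapsed = time.perf_counter() - start
        ratios[n] = elapsed / n
    return ratios


# Row counts chosen for illustration only.
ratios = time_per_row(build_summary, [2500, 25000, 100000])
for n in sorted(ratios):
    print("%7d rows: %.3e s/row" % (n, ratios[n]))
```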
After some IO testing, it turned out that the slowness was caused by the way the summary text string is handled. Currently, Scantable::summary accumulates the whole text string and returns it to scantable.summary. This text string becomes huge for a large data set and seems to start overloading memory.
I updated scantable.summary so that it flushes the summary string to file/logger more often. scantable.summary and the functions it calls now take a 'filename' parameter, and the summary is output in Scantable::summary. After the modification, scantable.summary could list data with 348,000 rows in ~7 MINUTES (709,942 lines of summary).
The side effect is that scantable.summary no longer returns the summary string. (But people may not be happy with a string of hundreds of thousands of lines anyway.) Note that Scantable::headerSummary still returns a summary of the header info as a string. This helps to get a data overview as a string.
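The difference between the two approaches can be sketched roughly as below. This is not the ASAP code: `format_row`, `summary_accumulated`, `summary_streamed` and the `chunk_lines` parameter are all hypothetical names, and an in-memory `io.StringIO` stands in for the file or logger. The sketch only shows the pattern of flushing in chunks and returning nothing instead of building one large string.

```python
import io


def format_row(row):
    """Hypothetical formatter for one summary line."""
    return "row %d\n" % row


def summary_accumulated(rows):
    """Old pattern: build the whole summary in memory and return it."""
    text = ""
    for row in rows:
        text += format_row(row)
    return text


def summary_streamed(rows, out, chunk_lines=1000):
    """New pattern: write to `out` every `chunk_lines` lines; returns None."""
    buffer = []
    for row in rows:
        buffer.append(format_row(row))
        if len(buffer) >= chunk_lines:
            out.write("".join(buffer))
            buffer = []
    if buffer:
        # Flush whatever is left after the last full chunk.
        out.write("".join(buffer))


out = io.StringIO()  # stands in for an open file or a logger sink
result = summary_streamed(range(2500), out, chunk_lines=1000)
```

With the streamed version the peak size of the in-memory buffer is bounded by `chunk_lines`, whatever the number of rows, at the cost of the caller no longer receiving the text.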