Opened 12 years ago

Closed 12 years ago

#142 closed defect (fixed)

Inconsistencies in searches with MW vs subimages

Reported by: MatthewWhiting Owned by: MatthewWhiting
Priority: normal Milestone: Release-1.2
Component: Searching Version: 1.1.13
Severity: normal Keywords:
Cc:

Description

A series of emails with BiQing, initiated by the following:

Hi, Matthew,

I wrote a paper (about to be submitted) that is based on Duchamp, so 
the following issue really worries me.

I used Duchamp to search a cube by specifying the MW channels. I got 
838 sources. Then I decided to extract a subcube (using miriad imsub) 
that excludes those MW channels, so the cube size is reduced and 
Duchamp can flag any source whose velocity width is cut off near the 
edge. Basically, both runs used the exact same channels and the same 
parameters. I got 598 sources on the subcube. This shouldn't happen. 
Note: no segmentation fault occurred on these cubes since they are 
smaller. The cube is in units of K rather than Jy/beam. The cubes were 
run with a fixed threshold.

I am sending you the output and input from both runs. Perhaps you can 
help look into this?

Thanks,
BiQing

Investigate and see if it is still a problem with 1.1.14

Change History (3)

comment:1 Changed 12 years ago by MatthewWhiting

Status: new → assigned

Further emails:

Me:

Hi BiQing,

I'm visiting Sydney Uni today, so I'm not able to look into this in too much detail, but a few questions come to mind:
1) What happens if you use the original cube, but with the subsection parameter to get the same subcube?
2) It may be that the reconstruction is different between the two, since the dimensions are different and you (may) have a different number of scales. You can write out the reconstructed cube using the flagOutputRecon parameter, so it might be an idea to compare the two.
3) On this point, what was the size of the cube before & after trimming?
4) I also noticed that the subcube case quotes a blank pixel value - this is presumably the one that comes out of the miriad construction. It probably doesn't affect anything, but using the subsection option above would get around any issues with this.
5) I'm wondering if the min pixels etc parameters may affect this in some way - I suspect any differences in reconstruction may play into this.
6) Do you get differences in the results if you *don't* do the reconstruction?

That's all that occurs to me at the moment. I'll have a bit more of a think about it and let you know if there's something else.

BTW, I've also been looking at your other data set - I located the segmentation fault, and fixed it and a couple of other (related) issues. I'm hoping to release an updated version this week, so you can try it with that... (The segfaults were essentially due to the size of the data set, so thanks for providing this useful test!)

Cheers,
Matt.

BiQing:

I'll try what you suggested and let you know. For now,

> 1) What happens if you use the original cube, but with the 
> subsection parameter to get the same subcube?

547 sources (original cube with specified subsection) vs 598 sources 
(subcube). The same source has slightly different parameters. See 
spsubcube* --> original cube with specified subsection. See previous 
email (subcube_duchamp-Results.txt) for the subcube result.

It looks like flagging out the MW channels might have something to do 
with it.

> 3) On this point, what was the size of the cube before & after trimming?

1.1 GB (subcube) and 1.6 GB (whole, original cube)

Me again:

> 1) What happens if you use the original cube, but with the 
> subsection parameter to get the same subcube?
>
> 547 sources (original cube with specified subsection) vs 598 sources 
> (subcube). The same source has slightly different parameters. See 
> spsubcube* --> original cube with specified subsection. See previous 
> email (subcube_duchamp-Results.txt) for the subcube result.
>
> It looks like flagging out the MW channels might have something to do 
> with it.

Hmmm - this I don't really understand :)

I managed to convince myself that I could see how the first pair could be different - when you do the reconstruction, all channels are included, even the MW ones. It is only at the searching stage that they are ignored. So it's conceivable that the reconstruction would give different results with and without those flagged channels.
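The ordering described here - reconstruct everything, exclude MW channels only when searching - can be sketched in a few lines. The function name and the 3-point running mean are illustrative stand-ins, not Duchamp's actual reconstruction:

```python
def duchamp_like_search(cube, mw_channels, threshold):
    """Illustrate the order of operations: the (stand-in) reconstruction
    sees every channel; MW channels are dropped only at search time."""
    n = len(cube)
    # Stand-in "reconstruction": a 3-point running mean over channels,
    # so a bright MW channel still leaks into its neighbours.
    recon = [(cube[max(z - 1, 0)] + cube[z] + cube[min(z + 1, n - 1)]) / 3.0
             for z in range(n)]
    # MW channels are excluded only here, at the searching stage.
    return [z in range(n) and z for z in range(n)
            if z not in mw_channels and recon[z] > threshold]

# A single bright MW channel (z=2) is excluded from the search, but its
# flux still pushes the neighbouring channels over threshold.
print(duchamp_like_search([0, 0, 9, 0, 0], mw_channels={2}, threshold=1.0))
# -> [1, 3]
```

So even though the MW channels never appear in the catalogue, their values still shape the reconstructed (here, smoothed) cube around them.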

However, from your description this above test should give the same result as the subcube case, and you aren't doing any flagging in either of them. It may be how the miriad imsub is treating the data - does it actually make a smaller image, or does it just flag out channels but keep the same original size?


> 3) On this point, what was the size of the cube before & after trimming?
>
> 1.1 GB (subcube) and 1.6 GB (whole, original cube)

Sorry - I actually meant the pixel sizes, but given the subsection in that parset I'm guessing the image is ~900x900x547 (or a bit smaller spatially). That means the spectral direction is the smallest, and so will govern how many scales you can do in the reconstruction (it uses the same number of scales in all three directions). Trimming a third of the channels may reduce the number of scales used in the subcube case, and make the reconstruction everywhere a little less exact (and what you'll miss are the large-scale fluctuations).

I'm happy to take a look at the data myself if you like...

comment:2 Changed 12 years ago by MatthewWhiting

Here is my email to Bi Qing from last week, detailing the results of my investigation on gismo:

Hi BiQing,

Sorry for not getting back to you sooner - I was away on holidays last week, and have had a somewhat interrupted week with 
sick kids and other things. 
Here's a summary of what I understand is going on:

* The reconstruction needs a bit of careful understanding. The reason you get different results between the full cube and the 
full cube with flagSubsection=true is due to the way the reconstruction behaves. One key feature is that boundaries are dealt 
with by assuming reflection. For a given point, calculating the large-scale wavelet coefficients involves multiplying the filter 
coefficients by points a long way away. Since we reflect at the boundary, if that boundary is in a different place, the filter 
coefficient will be multiplied by a different pixel value, giving a different result - *even if the point in question is well within 
your desired subsection*.
The differences will in general be small, but they are there, and are probably more obvious for your data where there is large-
scale structure. 
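The boundary effect can be seen with a minimal reflection rule (a generic sketch of reflective indexing, not Duchamp's source): a pixel well inside the desired subsection, probed at a large-scale filter offset, picks up a different partner pixel depending on where the boundary sits.

```python
def reflect(i, n):
    """Fold an out-of-range index back into [0, n) by reflecting at the edges."""
    while i < 0 or i >= n:
        if i < 0:
            i = -i - 1           # reflect about the lower edge
        if i >= n:
            i = 2 * n - 1 - i    # reflect about the upper edge
    return i

# Channel z=100 probed at a large-scale filter offset of 128 channels:
print(reflect(100 + 128, 300))  # boundary at 300: partner pixel is 228
print(reflect(100 + 128, 200))  # boundary at 200: partner pixel is 171
```

The filter coefficient is the same in both cases, but it multiplies pixel 228 in one cube and pixel 171 in the other, so the large-scale wavelet coefficient at z=100 differs even though z=100 is far from either boundary.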

However, this is at odds with your comment in your last email on the weekend:

> to save you some time. I ran the original cube and the subcube. The output of both wavelet reconstruction cubes are 
> identical. Thus, the problem is something else as we suspected earlier.

I'm not sure I understand that though, as that is not what I'm finding (I trim the full reconstructed cube to the same dimensions 
as the subcube, and I get differences, explained by the above reasoning).

* The reason you get a difference between the flagSubsection=true case and the IMSUB case appears to be that IMSUB has 
written a BLANK=-1 keyword to the FITS header. This is then affecting the reconstruction (pixels that are deemed to be BLANK 
are left alone in the reconstruction). 
I've checked this in two ways:
1) I made a similar subcube using casapy, and got the same results (subject to the offset in the z-direction) as for the 
flagSubsection=true case - the reconstructed cubes here are identical.
2) I copied the imsub.fits file and changed the BLANK keyword so that it wouldn't get recognised. I also found this gave the 
same results as flagSubsection=true (save for the z-offset).
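A quick way to confirm what IMSUB wrote is to scan the raw FITS header cards (a stdlib-only sketch for illustration; real work would use a FITS library): each header card is a fixed 80-character record with the keyword in the first 8 columns.

```python
def find_blank_card(header_bytes):
    """Scan 80-character FITS header cards and return the BLANK value, if any."""
    for i in range(0, len(header_bytes), 80):
        card = header_bytes[i:i + 80].decode("ascii")
        if card.startswith("END"):
            break
        if card.startswith("BLANK"):
            # Value field sits after '=' and before any '/' comment.
            return int(card.split("=", 1)[1].split("/", 1)[0].strip())
    return None

# A fabricated two-card header with BLANK = -1, as IMSUB might write it:
header = ("BLANK   =                   -1".ljust(80)
          + "END".ljust(80)).encode("ascii")
print(find_blank_card(header))  # -> -1
```

Removing or renaming that card (as in check 2 above) stops the pixels from being treated as BLANK, so the reconstruction then touches them like any others.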

* The MW flagging does not seem to be important - this is done at the searching stage, and is just a way of not looking at the 
relevant channels. They still get reconstructed though. 

Does all this make sense? We can discuss further next week if you like - the Monday meeting won't happen due to the public 
holiday, but we can talk another time if need be.

Cheers,
Matt.

Anyway, the upshot is that I think I understand what is going on now. I'm currently running a test with the latest code to check that there aren't any further issues cropping up, and that we get the same results as described above.

comment:3 Changed 12 years ago by MatthewWhiting

Resolution: fixed
Status: assigned → closed

Happy enough to close this, after the large number of tests run on giant (see evernote notes on this) and on the solution of #153.
