Single_frame task produces different results each time

petarz · August 11, 2022, 7:55am

Hi,
I’m following this tutorial: The LSST Science Pipelines, and I’ve run the first step, the “single_frame” task, a few times. Each run produces different results. I go through all calexps in the output collection (butler.registry.queryDatasets("calexp", collections=collection)), look at their sky coverage (calexp width, height and WCS mapping), and then compute the total coverage of the whole collection (max and min ra, dec coordinates). That total coverage differs from run to run. I’m starting the task like this (verbatim from the tutorial):

pipetask run -b $RC2_SUBSET_DIR/SMALL_HSC/butler.yaml \
             -p $RC2_SUBSET_DIR/pipelines/DRP.yaml#singleFrame \
             -i HSC/RC2/defaults \
             -o u/$USER/single_frame \
             --register-dataset-types

What could be the explanation for this behavior?

Thanks,
Petar

timj · August 16, 2022, 10:57pm

Can you define what you mean by “different results”? Do you mean that the number of outputs differs, or that the number of outputs is the same but the pixel values differ?

petarz · August 19, 2022, 12:13pm

I’m running the following code:


import sys

fi = sys.float_info
minratot = mindectot = minmjdtot = fi.max
# fi.min is the smallest positive float, so -fi.max is the safe lower sentinel
maxratot = maxdectot = maxmjdtot = -fi.max
for ref in butler.registry.queryDatasets("calexp", collections=in_collection):
    calexp = butler.get('calexp', dataId=ref.dataId.full, collections=in_collection)
        
    mjd = calexp.getMetadata().toDict()['MJD']
    if mjd < minmjdtot:
        minmjdtot = mjd
    if mjd > maxmjdtot:
        maxmjdtot = mjd
    
    w, h = calexp.width, calexp.height
    
    p = calexp.wcs.pixelToSky(0, 0)
    ra0 = p.getRa().asDegrees()
    dec0 = p.getDec().asDegrees()
    
    p = calexp.wcs.pixelToSky(w, h)
    ra1 = p.getRa().asDegrees()
    dec1 = p.getDec().asDegrees()
    
    if ra0 > ra1 or dec0 > dec1:
        ra0, ra1 = ra1, ra0
        dec0, dec1 = dec1, dec0
    if ra0 > ra1 or dec0 > dec1:
        print("ACHTUNG!")
    if ra0 < minratot:
        minratot = ra0
    if dec0 < mindectot:
        mindectot = dec0
    if ra1 > maxratot:
        maxratot = ra0
    if dec1 > maxdectot:
        maxdectot = dec0

I get different values for maxratot, maxdectot (sky coverage) for different single_frame collections (minmjdtot, maxmjdtot are the same). In my case (minratot, mindectot, maxratot, maxdectot, minmjdtot, maxmjdtot):

149.78290069652832, 1.9873448461547891, 150.61437929167496, 2.3215300040680105, 56741.4049780136, 57163.267164648
149.78290069652832, 1.9873448461547891, 150.61818703418197, 2.2938343604677245, 56741.4049780136, 57163.267164648

Am I doing something wrong?

timj · August 20, 2022, 9:33pm

Quick comment on the code:

Use calexp = butler.getDirect(ref) to get the actual dataset that the query returned. Otherwise butler runs a whole new query and will very likely not return the dataset your ref is really associated with: the query returns all matching datasets in those collections, but .get() returns only the first match in the given collections. You also should not need ref.dataId.full; ref.dataId should be sufficient.

The toDict() call is not needed: mjd = calexp.getMetadata()["MJD"] has worked for a few years now.

You should use getBBox() to get the bounding box and then use its lower and upper bounds, rather than assuming the image starts at 0,0.
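
Putting these suggestions together, the accumulation step might be sketched in plain Python as below. This is a minimal sketch, not code from the tutorial or the pipelines: total_sky_bounds is a hypothetical helper, the corner values in the usage example are made up for illustration, and the butler calls shown in the comments are untested and assume the API discussed in this thread. It also assumes the field does not straddle RA = 0. One further point about the loop in the question: it assigns ra0 and dec0 when updating maxratot and maxdectot, so those two values can depend on the order in which queryDatasets returns the refs. Taking min/max over all four corners avoids that.

```python
import math


def total_sky_bounds(corner_sets):
    """Return (min_ra, min_dec, max_ra, max_dec) in degrees.

    corner_sets: iterable with one entry per exposure, each a sequence of
    (ra_deg, dec_deg) tuples for that exposure's corners.
    """
    min_ra = min_dec = math.inf
    max_ra = max_dec = -math.inf
    for corners in corner_sets:
        for ra, dec in corners:
            # Update each bound from the matching coordinate only.
            min_ra = min(min_ra, ra)
            max_ra = max(max_ra, ra)
            min_dec = min(min_dec, dec)
            max_dec = max(max_dec, dec)
    return min_ra, min_dec, max_ra, max_dec


# With the LSST stack, the corners for each ref might be gathered like this
# (untested sketch; assumes `butler` and `in_collection` as in the question):
#
#   from lsst.geom import Box2D
#   for ref in butler.registry.queryDatasets("calexp", collections=in_collection):
#       calexp = butler.getDirect(ref)
#       pixel_corners = Box2D(calexp.getBBox()).getCorners()
#       sky = [calexp.wcs.pixelToSky(p) for p in pixel_corners]
#       corners = [(s.getRa().asDegrees(), s.getDec().asDegrees()) for s in sky]

# Made-up corner values for two exposures, for illustration only.
example = [
    [(150.0, 2.0), (150.2, 2.0), (150.2, 2.1), (150.0, 2.1)],
    [(150.1, 2.05), (150.4, 2.05), (150.4, 2.3), (150.1, 2.3)],
]
bounds = total_sky_bounds(example)
bounds_reversed = total_sky_bounds(reversed(example))
print(bounds)
```

Because every bound is a running min or max over all corners, the result is the same regardless of the order in which the exposures are visited.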

petarz · October 13, 2022, 12:44pm

Forgot to say: thanks for the suggestions