Wcs mismatch in `forcedPhotCoadd.py`


(Jcoupon) #1

I’m trying to re-run the multiband process on existing HSC bands (no emulated data), but I get the following error:

Traceback (most recent call last):
  File "/Users/coupon/local/source/hscPipe6/stack/miniconda3-4.3.21-10a4fa6/DarwinX86/pipe_base/6.0-hsc+2/python/lsst/pipe/base/cmdLineTask.py", line 408, in __call__
    result = task.run(dataRef, **kwargs)
  File "/Users/coupon/local/source/hscPipe6/stack/miniconda3-4.3.21-10a4fa6/DarwinX86/meas_base/6.2-hsc+1/python/lsst/meas/base/forcedPhotImage.py", line 164, in run
    self.measurement.run(measCat, exposure, refCat, refWcs, exposureId=self.getExposureId(dataRef))
  File "/Users/coupon/local/source/hscPipe6/stack/miniconda3-4.3.21-10a4fa6/DarwinX86/meas_base/6.2-hsc+1/python/lsst/meas/base/forcedMeasurement.py", line 356, in run
    beginOrder=beginOrder, endOrder=endOrder)
  File "/Users/coupon/local/source/hscPipe6/stack/miniconda3-4.3.21-10a4fa6/DarwinX86/meas_base/6.2-hsc+1/python/lsst/meas/base/baseMeasurement.py", line 282, in callMeasure
    self.doMeasurement(plugin, measRecord, *args, **kwds)
  File "/Users/coupon/local/source/hscPipe6/stack/miniconda3-4.3.21-10a4fa6/DarwinX86/meas_base/6.2-hsc+1/python/lsst/meas/base/baseMeasurement.py", line 301, in doMeasurement
    plugin.measure(measRecord, *args, **kwds)
  File "/Users/coupon/local/source/hscPipe6/stack/miniconda3-4.3.21-10a4fa6/DarwinX86/meas_modelfit/6.0b6-hsc+6/python/lsst/meas/modelfit/cmodel/cmodelContinued.py", line 106, in measure
    "CModel forced measurement currently requires the measurement image to have the same"
lsst.pex.exceptions.wrappers.FatalAlgorithmError: CModel forced measurement currently requires the measurement image to have the same Wcs as the reference catalog (this is a temporary limitation).

The Wcs test (between the “reference” dataId and the actual exposure) compares the Wcs objects rather than their values, and the error probably originates from a mismatch between the Wcs recorded in the exposure during the HSC data processing and the Wcs taken from skyMap.pickle.

Don’t we want to compare the Wcs values instead? How can I fix it?
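For illustration only (this is not pipeline code; `wcs_values_match` and the toy mappings below are hypothetical stand-ins for the two Wcs objects), a value-based comparison could sample both pixel-to-sky mappings at a few positions and allow a small tolerance:

```python
import numpy as np

def wcs_values_match(wcs_a, wcs_b, points, tol=1e-8):
    """Compare two WCS-like mappings by the sky positions they produce,
    rather than by object identity or string serialization.

    wcs_a and wcs_b are callables mapping (x, y) pixel coordinates to
    (ra, dec) in degrees; points is an iterable of pixel positions to
    sample; tol is the allowed absolute difference in degrees.
    """
    for x, y in points:
        sky_a = np.asarray(wcs_a(x, y))
        sky_b = np.asarray(wcs_b(x, y))
        if not np.allclose(sky_a, sky_b, rtol=0.0, atol=tol):
            return False
    return True

# Two toy TAN-like mappings that differ only at the ~1e-16 level per
# pixel, mimicking the round-off-sized difference seen in this thread:
scale = 0.168 / 3600.0  # HSC-like pixel scale, deg/pixel
wcs1 = lambda x, y: (30.0 + x * scale, -2.0 + y * scale)
wcs2 = lambda x, y: (30.0 + x * (scale + 1e-16), -2.0 + y * scale)

pts = [(0, 0), (2000, 2000), (4000, 0)]
print(wcs_values_match(wcs1, wcs2, pts))  # True
```

A check like this would tolerate serialization round-off while still catching a genuinely different astrometric solution.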


(Jim Bosch) #2

I heard from @natelust recently that @jeffcarlin encountered the same problem, so I think you must have stumbled on a new bug (could this be in the RingsSkyMap changes, @price?).

Could you let us know the tract/patch you’re working with and anything else we could use to try to reproduce?


(Jeff Carlin) #3

Indeed, that’s the same issue I’m having. I’m re-running my processing serially so I can figure out which patch it’s failing on. Will let you know when I come across it…


(Paul Price) #4

The RingsSkyMap bug is outlined here.

I think that the skymap changes shouldn’t affect the Wcs at all, but I’m not sure how the Wcses are compared.


(Jcoupon) #5

tract 8524 and patch 5,5 of the HSC-SSP sky map.

The comparison is made using the “==” operator: https://github.com/lsst/meas_modelfit/blob/c37dbc231404a72790a32a79ddd39442533e490f/python/lsst/meas/modelfit/cmodel/cmodelContinued.py#L104

Could it be that the tracts got reorganised in memory after the “missing tract” bug was corrected?


(Paul Price) #6

The tracts are only renumbered if you’re using the new version of the RingsSkyMap, but you should be using the same version as before. And tract=8524 isn’t near the RA=0 boundary, is it? In any case, I don’t think the numbering has anything to do with the Wcs that’s used.

One way to settle this would be to use a version of hscPipe from before the skymap bug was fixed, and see if that solves the problem.


(Jcoupon) #7

8524 is at ~ 02h.

I have hscPipe v 6.5.1 installed, so I can rerun it with that version but that’s pretty much all I can do at the moment.


(Jcoupon) #8

OK, same problem with hscPipe 6.5.1… which is weird because, if I read your e-mails correctly, this version did not include the bug fix. So it could be something different. Here is what I run:

  multiBandDriver.py $ROOTDIR --rerun=multiband \
    --id tract=8524 patch=5,5 filter=HSC-R^HSC-I  \
    --clobber-config --no-versions \
    --nodes 1 --procs 1 --do-exec \
    --config measureCoaddSources.doPropagateFlags=False

with the data provided by NAOJ (which do not include single visits, hence the doPropagateFlags=False).

I also hacked multiband.py at line 749 to do the measurements on fewer sources:

        n = 100
        self.log.info("DEBUGGING: Keep {} sources".format(n))
        mergedList = mergedList[:n]

(Jcoupon) #9

[edit: moved to Undeblended aperture corrections not applied in hscPipe v 6.6]

This should probably be a separate issue, but if I force the pipeline to ignore this Wcs mismatch (by just commenting out the check in cmodelContinued.py), I get the following warning:

multiBandDriver.forcedPhotCoadd.applyApCorr WARN: Cannot aperture correct undeblended_ext_convolved_ConvolvedFlux_2_4_5 because could not find undeblended_ext_convolved_ConvolvedFlux_2_4_5_flux or undeblended_ext_convolved_ConvolvedFlux_2_4_5_fluxSigma in apCorrMap

which should not happen given the doUndeblended function you created in the forcedPhotCoadd.py config, right?


(Paul Price) #10

I noticed the following quote in the SkyWcs::operator== docs:

Equality is based on the string representations being equal

and

Thus equality is primarily useful for testing persistence.

I wonder if the two Wcses are equal as we would ordinarily consider them to be (or as the old TanWcs would have considered them), but they are simply not comparing equal due to some deep magic in ast land? Could you print refWcs.writeString() and exposure.getWcs().writeString() at cmodelContinued.py:104 for us, please?


(Jcoupon) #11

I printed the output to files, which I attach. Below is the diff output:

ref.ascii (5.3 KB)
exposure.ascii (5.3 KB)

diff exposure.ascii ref.ascii

81c81
<           SRef2 = -0.0908725147732585142
---
>           SRef2 = -0.0908725147732585281
124c124
<                             M0 = -0.0736922569493146662
---
>                             M0 = -0.0736922569493146801
127c127
<                             M3 = -0.0529580940213578086
---
>                             M3 = -0.0529580940213578155
132c132
<                             M8 = -0.0907474983493142401
---
>                             M8 = -0.090747498349314254

Note: code added to cmodelContinued.py l. 104:

        with open('ref.ascii', 'w') as file_out:
            file_out.write(refWcs.writeString())
        with open('exposure.ascii', 'w') as file_out:
            file_out.write(exposure.getWcs().writeString())

(Paul Price) #12

So there are genuine differences, but they’re tiny. It seems possible that it’s due to numerical issues from building the SkyWcs on different machines: Linux (Japan) vs OSX (Coupon).

Since you’re in a rush, I recommend commenting out the check in cmodelContinued.py. In the longer run, we need to think about a better check than equality of the SkyWcs, perhaps testing just the local CD matrix?


(Jcoupon) #13

Yep, that’s what I’m currently doing.


(Paul Price) #14

Filed DM-15181: Trivial SkyWcs differences prevent CModel from running.


(Paul Price) #15

@jcoupon, would you mind also printing for me:

  • refWcs.linearizeSkyToPixel(refRecord.getCoord(), lsst.afw.geom.arcseconds).getMatrix()
  • exposure.getWcs().linearizeSkyToPixel(refRecord.getCoord(), lsst.afw.geom.arcseconds).getMatrix()

(You’ll have to import lsst.afw.geom.)


(Jcoupon) #16
[[ -5.95240244e+00   8.42355644e-04   7.83062774e+05]
 [  8.28432580e-04   5.95240244e+00   1.29463302e+05]
 [  0.00000000e+00   0.00000000e+00   1.00000000e+00]]
[[ -5.95240244e+00   8.42355644e-04   7.83062774e+05]
 [  8.28432682e-04   5.95240244e+00   1.29463302e+05]
 [  0.00000000e+00   0.00000000e+00   1.00000000e+00]]
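For what it’s worth, the two linearizations above differ in a single off-diagonal term, at the ~1e-10 level. An illustrative tolerance-based comparison (plain numpy, not pipeline code) would treat them as equal, in line with the local-CD-matrix idea from post #12:

```python
import numpy as np

# The two matrices printed in post #16:
mat_a = np.array([[-5.95240244e+00, 8.42355644e-04, 7.83062774e+05],
                  [ 8.28432580e-04, 5.95240244e+00, 1.29463302e+05],
                  [ 0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])
mat_b = np.array([[-5.95240244e+00, 8.42355644e-04, 7.83062774e+05],
                  [ 8.28432682e-04, 5.95240244e+00, 1.29463302e+05],
                  [ 0.00000000e+00, 0.00000000e+00, 1.00000000e+00]])

# Largest element-wise difference is the [1, 0] term:
diff = np.abs(mat_a - mat_b).max()
print(diff)  # ~1e-10

# A tolerance well above round-off but far below any real astrometric
# discrepancy accepts the pair:
print(np.allclose(mat_a, mat_b, rtol=0.0, atol=1e-8))  # True
```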