Wcs mismatch in `forcedPhotCoadd.py`

jeffcarlin · July 20, 2018, 4:37pm

Indeed, that’s the same issue I’m having. I’m re-running my processing serially so I can figure out which patch it’s failing on. Will let you know when I come across it…

price · July 20, 2018, 6:03pm

The RingsSkyMap bug is outlined here.

I think that the skymap changes shouldn’t affect the Wcs at all, but I’m not sure how the Wcses are compared.

jcoupon · July 20, 2018, 7:02pm

tract 8524 and patch 5,5 of the HSC-SSP sky map.

The comparison is made using the “==” operator: https://github.com/lsst/meas_modelfit/blob/c37dbc231404a72790a32a79ddd39442533e490f/python/lsst/meas/modelfit/cmodel/cmodelContinued.py#L104

Could it be that the tracts got reorganised in memory after the “missing tract” bug was corrected?

price · July 20, 2018, 7:17pm

The tracts are only renumbered if you’re using the new version of the RingsSkyMap, but you should be using the same version as before. And tract=8524 isn’t near the RA=0 boundary, is it? In any case, I don’t think the numbering has anything to do with the Wcs that’s used.

One way to settle this would be to use a version of hscPipe from before the skymap bug was fixed, and see if that solves the problem.

jcoupon · July 20, 2018, 7:35pm

8524 is at ~ 02h.

I have hscPipe v 6.5.1 installed, so I can rerun it with that version but that’s pretty much all I can do at the moment.

jcoupon · July 20, 2018, 7:57pm

OK - same problem with hscPipe 6.5.1… which is weird because, if I read your e-mails correctly, this version did not include the bug fix. So it could be something different. Here is what I run:

  multiBandDriver.py $ROOTDIR --rerun=multiband \
    --id tract=8524 patch=5,5 filter=HSC-R^HSC-I  \
    --clobber-config --no-versions \
    --nodes 1 --procs 1 --do-exec \
    --config measureCoaddSources.doPropagateFlags=False

with the data provided by NAOJ (which do not include single visits, hence the doPropagateFlags=False).

I also hacked multiband.py at line 749 to do the measurements on fewer sources:

        n = 100
        self.log.info("DEBUGGING: Keep {} sources".format(n))
        mergedList = mergedList[:n]

jcoupon · July 20, 2018, 8:18pm

[edit: moved to Undeblended aperture corrections not applied in hscPipe v 6.6]

This should probably be a separate issue, but if I force the pipeline to ignore this buggy Wcs mismatch (by just commenting out the check in cmodelContinued.py), I get the following warning:

multiBandDriver.forcedPhotCoadd.applyApCorr WARN: Cannot aperture correct undeblended_ext_convolved_ConvolvedFlux_2_4_5 because could not find undeblended_ext_convolved_ConvolvedFlux_2_4_5_flux or undeblended_ext_convolved_ConvolvedFlux_2_4_5_fluxSigma in apCorrMap

which should not happen given the doUndeblended function you created in the forcedPhotCoadd.py config, right?

price · July 21, 2018, 5:10pm

I noticed the following quote in the SkyWcs::operator== docs:

Equality is based on the string representations being equal

and

Thus equality is primarily useful for testing persistence.

I wonder if the two Wcses are equal as we would ordinarily consider them to be (or the old TanWcs considered them to be) but they are simply not comparing equal due to some deep magic in ast land? Could you print refWcs.writeString() and exposure.getWcs().writeString() at cmodelContinued.py:104 for us, please?

jcoupon · July 21, 2018, 5:40pm

I printed the output in files, which I attach. Below is the diff output:

ref.ascii (5.3 KB)
exposure.ascii (5.3 KB)

diff exposure.ascii ref.ascii

81c81
<           SRef2 = -0.0908725147732585142
---
>           SRef2 = -0.0908725147732585281
124c124
<                             M0 = -0.0736922569493146662
---
>                             M0 = -0.0736922569493146801
127c127
<                             M3 = -0.0529580940213578086
---
>                             M3 = -0.0529580940213578155
132c132
<                             M8 = -0.0907474983493142401
---
>                             M8 = -0.090747498349314254

Note: code added to cmodelContinued.py at line 104:

        # Dump both Wcs string representations to files so they can be diffed
        with open('ref.ascii', 'w') as f:
            f.write(refWcs.writeString())
        with open('exposure.ascii', 'w') as f:
            f.write(exposure.getWcs().writeString())
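
To put the size of these differences in context, here is a minimal standalone sketch (plain Python, no LSST stack needed) that takes the four value pairs from the diff above as text and compares them both as strings and as numbers. The `compare` helper and the `rel_tol` value are illustrative assumptions, not part of hscPipe or the LSST stack.

```python
import math

# (label, exposure value, reference value), copied as text from the diff above
pairs = [
    ("SRef2", "-0.0908725147732585142", "-0.0908725147732585281"),
    ("M0", "-0.0736922569493146662", "-0.0736922569493146801"),
    ("M3", "-0.0529580940213578086", "-0.0529580940213578155"),
    ("M8", "-0.0907474983493142401", "-0.090747498349314254"),
]


def compare(exp_text, ref_text, rel_tol=1e-12):
    """Return (strings_equal, relative_difference, close_within_tolerance).

    rel_tol is an arbitrary illustrative tolerance, not a recommended value.
    """
    a, b = float(exp_text), float(ref_text)
    rel_diff = abs(a - b) / max(abs(a), abs(b))
    return exp_text == ref_text, rel_diff, math.isclose(a, b, rel_tol=rel_tol)


for label, exp_text, ref_text in pairs:
    same_text, rel_diff, close = compare(exp_text, ref_text)
    print(f"{label}: same text={same_text}, "
          f"relative diff={rel_diff:.1e}, close={close}")
```

The text of each pair differs, so a comparison based on string representations fails, while the numeric values should agree to within roughly the rounding error of a double-precision float.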

price · July 21, 2018, 5:55pm

So there are genuine differences, but they’re tiny. It seems possible that it’s due to numerical issues from building the SkyWcs on different machines: Linux (Japan) vs OSX (Coupon).

Since you’re in a rush, I recommend commenting out the check in cmodelContinued.py. In the longer run, we need to think about a better check than equality of the SkyWcs, perhaps testing just the local CD matrix?

jcoupon · July 21, 2018, 6:21pm

yep, that’s what I’m currently doing.

price · July 21, 2018, 9:11pm

Filed DM-15181: Trivial SkyWcs differences prevent CModel from running.

price · July 21, 2018, 9:32pm

@jcoupon, would you mind also printing for me:

refWcs.linearizeSkyToPixel(refRecord.getCoord(), lsst.afw.geom.arcseconds).getMatrix()
exposure.getWcs().linearizeSkyToPixel(refRecord.getCoord(), lsst.afw.geom.arcseconds).getMatrix()

(You’ll have to import lsst.afw.geom.)

jcoupon · July 21, 2018, 9:40pm

[[ -5.95240244e+00   8.42355644e-04   7.83062774e+05]
 [  8.28432580e-04   5.95240244e+00   1.29463302e+05]
 [  0.00000000e+00   0.00000000e+00   1.00000000e+00]]
[[ -5.95240244e+00   8.42355644e-04   7.83062774e+05]
 [  8.28432682e-04   5.95240244e+00   1.29463302e+05]
 [  0.00000000e+00   0.00000000e+00   1.00000000e+00]]
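
As a rough sketch of the tolerance-based check suggested earlier in the thread (comparing a local linearization rather than requiring exact SkyWcs equality), the two matrices printed above can be compared element by element in plain Python. The `matrices_close` helper and its default tolerance are hypothetical choices for illustration only; this is not the check used in cmodelContinued.py, and a suitable tolerance for real use would need to be decided separately.

```python
import math

# The two matrices exactly as printed above, in the order they were printed
first_matrix = [
    [-5.95240244e+00, 8.42355644e-04, 7.83062774e+05],
    [8.28432580e-04, 5.95240244e+00, 1.29463302e+05],
    [0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
]
second_matrix = [
    [-5.95240244e+00, 8.42355644e-04, 7.83062774e+05],
    [8.28432682e-04, 5.95240244e+00, 1.29463302e+05],
    [0.00000000e+00, 0.00000000e+00, 1.00000000e+00],
]


def matrices_close(a, b, rel_tol=1e-6):
    """True if all corresponding elements agree within rel_tol.

    Assumes both matrices have the same shape; rel_tol is illustrative.
    """
    return all(
        math.isclose(x, y, rel_tol=rel_tol)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


print(first_matrix == second_matrix)                 # exact comparison
print(matrices_close(first_matrix, second_matrix))   # tolerance-based comparison
```

At the printed precision the only element that differs is the first entry of the second row (8.28432580e-04 vs 8.28432682e-04, a relative difference of about 1e-7), so an exact comparison fails while a comparison with a loose relative tolerance passes.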

mardom · October 5, 2021, 8:09pm

Hi, I also ran into this problem recently while trying to do forced photometry on actual HSC data. Unfortunately, I can’t figure out what the recommended solution for these WCS differences is.

price · October 6, 2021, 2:47pm

Using which version?

mardom · October 6, 2021, 3:07pm

I’m using a Docker version of hscPipe 7.9.1, which seems to be based on lsst-scipipe-4d7b902.

price · October 6, 2021, 3:19pm

Could you please try hscPipe 8.5.3, which is the latest release?

price · October 6, 2021, 3:39pm

Actually, I’m not sure upgrading will solve the problem. It looks like this never got fixed (DM-15181 and DM-28880 are both listed as “To Do”).

DM-15181 suggests the issue might be caused by generating the skyMap on one machine and using it on another (e.g., macOS vs Linux). If that’s the case here, you could work around the problem by sticking to a single machine.

mardom · October 31, 2021, 11:27am

Thank you, Paul! It works like a charm. Excellent work with multiBandDriver.py. I’m now checking the photometry, but it seems the bug is not present in this version.