I’m using the DRP `step3` processing to create deep coadds and good-seeing templates from a set of calibrated DECam images, and I am running into a few issues when running the `jointcal` task.
Slow convergence
I am finding that processing gets stuck on the `jointcal` stage, and I’ve identified that it is the outlier-rejection step that is very slow to converge. I’ve found a configurable, `astrometryOutlierRelativeTolerance`, which I’ve set to 0.002; this allows the astrometry solution to converge in a reasonable amount of time. However, there is no corresponding configurable for photometry (e.g. a `photometryOutlierRelativeTolerance`). As a workaround, I’ve manually edited `jointcal.py` to set `sigmaRelativeTolerance=0.002` as the default in `_iterate_fit`, which allows the photometry fit to converge.
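For reference, the astrometry tolerance is just a config override; mine looks roughly like this (where exactly the override lives depends on how your pipeline is set up):

```python
# jointcal config override (applied via my pipeline YAML / a configfile;
# the placement shown here is specific to my setup)
config.astrometryOutlierRelativeTolerance = 0.002
```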
I’ve also noticed from debug logging that the median/mean/standard deviation of chi^2/dof indicate that the chi^2 distribution is highly skewed by the outliers, so both the mean and the standard deviation are biased high, making mean + 3*sigma cuts ineffective at removing outliers. I have actually deleted the logs since making this post (I can try to regenerate them if needed), but from memory the chi^2/dof statistics were something like: median ~4, mean ~10, stddev ~20. So I was wondering whether a different statistic would be better for outlier determination, e.g. median + standard deviation, or percentiles of the distribution (or statistics derived from percentiles).
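To illustrate the kind of statistic I have in mind, a percentile-based cut might look something like this (just a sketch, not jointcal code):

```python
import numpy as np

def chi2_outlier_mask(chi2, nsigma=3.0):
    """Flag chi2 outliers using the median and a sigma estimated from the
    interquartile range, both of which are insensitive to the heavy tail."""
    chi2 = np.asarray(chi2)
    q25, q50, q75 = np.percentile(chi2, [25.0, 50.0, 75.0])
    sigma_robust = (q75 - q25) / 1.349  # IQR -> sigma for a Gaussian core
    return chi2 > q50 + nsigma * sigma_robust
```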
Memory usage issues
I am finding that running `jointcal` has very high memory requirements when calibrating across ~250 DECam visits (62 - 3 = 59 CCDs per visit) of dense fields (the Galactic bulge). Memory usage grows over time up to ~250 GB, at which point processes have failed with out-of-memory errors. Here are a few log lines showing the number of sources used in the fit:
```
INFO 2022-04-30T01:09:08.630-07:00 lsst.jointcal (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(jointcal.py:1260) - ====== Now processing astrometry...
INFO 2022-04-30T01:22:33.088-07:00 lsst.jointcal (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(loadReferenceObjects.py:904) - Loading reference objects from gaia_dr2_20200414 in region bounded by [269.18883014, 272.59111395], [-31.00847184, -28.04821100] RA Dec
INFO 2022-04-30T01:23:16.169-07:00 lsst.jointcal (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(loadReferenceObjects.py:945) - Loaded 5172187 reference objects
INFO 2022-04-30T01:23:20.465-07:00 lsst.jointcal (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(loadReferenceObjects.py:1321) - Correcting reference catalog for proper motion to <Time object: scale='tai' format='jyear' value=2021.4277170694447>
INFO 2022-04-30T01:27:36.187-07:00 lsst.jointcal.astrometryReferenceSelector (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(sourceSelector.py:618) - Selected 5172187/5172187 references
WARNING 2022-04-30T01:27:36.785-07:00 lsst.jointcal.Associations (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(Associations.cc:255) - No ra/dec proper motion covariances in refcat: Field with name 'pm_ra_Dec_Cov' not found with type 'F'.
INFO 2022-04-30T01:27:42.829-07:00 lsst.jointcal.Associations (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(Associations.cc:333) - Associated 605638 reference stars among 5172187
INFO 2022-04-30T01:27:43.016-07:00 lsst.jointcal.Associations (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(Associations.cc:343) - Fitted stars before measurement # cut: 613829
INFO 2022-04-30T01:27:45.372-07:00 lsst.jointcal.Associations (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(Associations.cc:383) - Fitted stars after measurement # cut: 606747
INFO 2022-04-30T01:27:45.373-07:00 lsst.jointcal.Associations (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(Associations.cc:384) - Total, valid number of Measured stars: 25030947, 25023865
```
Should I play around with the `sourceSelector`, `astrometryReferenceSelector`, and `photometryReferenceSelector` configs to select fewer sources? (If this is indeed the cause of the memory usage – I know that `jointcal` manipulates a very sparse matrix, so maybe the number of sources is not the problem.)
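If it would help, the kind of tightening I have in mind looks something like the sketch below; I have not verified that these are the right knobs for reducing jointcal’s memory footprint, and the threshold values are guesses:

```python
# Hypothetical jointcal config overrides to reduce the number of sources.
# Field names follow the standard source/reference selector configs; the
# values are placeholders that would need tuning.

# Raise the S/N floor so fewer measured stars enter the fit.
config.sourceSelector["science"].doSignalToNoise = True
config.sourceSelector["science"].signalToNoise.minimum = 20.0

# Impose a magnitude limit on the Gaia reference catalog.
config.astrometryReferenceSelector.doMagLimit = True
config.astrometryReferenceSelector.magLimit.fluxField = "phot_g_mean_flux"
config.astrometryReferenceSelector.magLimit.maximum = 19.0
```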
Determining if a potentially bad fit is actually bad
I am finding, however, that a tolerance of 0.002 leads to the fit being flagged as “potentially bad”, e.g.:
```
INFO 2022-04-27T17:24:21.558-07:00 lsst.jointcal (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'i DECam SDSS c0003 7835.0 1470.0', tract: 0, ...})(jointcal.py:1410) - Fit completed chi2/ndof : 3.49659e+07/533408=65.5518
ERROR 2022-04-27T17:24:21.602-07:00 lsst.jointcal (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'i DECam SDSS c0003 7835.0 1470.0', tract: 0, ...})(jointcal.py:1715) - Potentially bad fit: High chi-squared/ndof.
```
So I have to do some more experimenting, but I am also looking for a way to verify that this fit is bad enough to produce WCS/photometry results worse than a per-detector WCS solution. I am wondering if anyone has suggestions.
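Absent a better idea, I was considering a direct check along these lines: compute per-detector astrometric residuals against matched reference stars through both WCS solutions and compare. This is only a sketch (the butler retrieval and the source-to-reference matching are omitted, and nothing in it is jointcal-specific):

```python
import numpy as np
import lsst.geom

def median_residual_mas(wcs, pixel_positions, ref_coords):
    """Median on-sky separation (mas) between sources at the given pixel
    positions mapped through ``wcs`` and their matched reference coords."""
    seps = [
        wcs.pixelToSky(lsst.geom.Point2D(x, y)).separation(ref).asArcseconds() * 1e3
        for (x, y), ref in zip(pixel_positions, ref_coords)
    ]
    return np.median(seps)

# For each (visit, detector), compare the single-frame solution with the
# jointcal one over the same matched pairs, e.g.:
#   resid_single = median_residual_mas(calexp.wcs, xy, gaia_coords)
#   resid_joint  = median_residual_mas(jointcal_wcs, xy, gaia_coords)
```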
Model is not valid
Finally, I’ve had photometric fitting fail after astrometric fitting succeeded:
```
ERROR 2022-04-30T12:27:33.735-07:00 lsst.ctrl.mpexec.singleQuantumExecutor (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(singleQuantumExecutor.py:196) - Execution of task 'jointcal' on quantum {instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...} failed. Exception ValueError: Model is not valid: check log messages for warnings.
```
I am wondering what I should look for in the logs to determine why this is happening. There are warnings that some CCDs have too few measured stars (which is curious, since these are dense fields):
```
WARNING 2022-04-30T12:27:32.717-07:00 lsst.jointcal (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(jointcal.py:1631) - ccdImage 995274_62 has only 81 measuredStars (desired 100)
```
And warnings that the photometry model evaluates to negative values at some pixel positions:
```
WARNING 2022-04-30T12:27:33.716-07:00 lsst.jointcal.ConstrainedMagnitudeModel (jointcal:{instrument: 'DECam', skymap: 'discrete_decaps_east', physical_filter: 'r DECam SDSS c0002 6415.0 1480.0', tract: 0, ...})(PhotometryModel.cc:60) - CcdImage 995251_25 is negative at (0,2047.5): -12676.5
```
I am thinking I’ll inspect some of the CCDs that it claims have too few measured stars and/or strongly negative model values, to see if something has gone wrong in step{0-2} processing.
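Concretely, I’m planning a quick look like this at one of the flagged CCDs from the log above (the repo path and collection are placeholders for my setup):

```python
import numpy as np
from lsst.daf.butler import Butler

butler = Butler("/path/to/repo", collections=["DECam/runs/step2"])
calexp = butler.get("calexp", instrument="DECam", visit=995251, detector=25)

arr = calexp.image.array
print("min / median / max:", np.nanmin(arr), np.nanmedian(arr), np.nanmax(arr))
print("fraction of negative pixels:", np.mean(arr < 0))
```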
Thanks for any help with this.