Thanks! I’ll try a test omitting those 4 exposures–I’m surprised, though, as those aren’t the ones that were in the dangerous date range. Will report back…
Ian
Just to check: what version of jointcal and/or the Science Pipelines are you running? Note that jointcal doesn’t look at the images, just the catalogs.
You can try turning on the writeChi2FilesInitialFinal config option. This will result in a .csv file in your current working directory containing the contributions to the chi2 matrix from every source at the time of initialization. You should be able to find the NaNs in there, which might help you track down which detectors/visits are causing the problem. You can also try running with log level DEBUG (--loglevel jointcal=DEBUG in this case), though I’m not sure there will be much useful information during initialization.
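For anyone following along, here is a minimal sketch of one way to scan such a file for NaNs and tally them per visit and chip. It assumes the file is whitespace-delimited, that its first `#`-prefixed line holds the column names, and that it has `rx`, `ry`, `chi2`, `chip`, and `visit` columns. The function name `count_nan_rows` is made up for illustration and is not part of jointcal.

```python
import math
from collections import Counter


def count_nan_rows(lines, columns=("rx", "ry", "chi2")):
    """Count rows with a NaN in any of `columns`, keyed by (visit, chip).

    Assumes whitespace-delimited text whose first '#' line names the columns.
    """
    header = None
    bad = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first comment line gives the column names; any later
            # comment lines (e.g. column descriptions) are ignored.
            if header is None:
                header = line.lstrip("#").split()
            continue
        fields = line.split()
        if header is None or len(fields) != len(header):
            continue  # skip rows that do not match the header layout
        row = dict(zip(header, fields))
        if any(math.isnan(float(row[name])) for name in columns):
            bad[(row["visit"], row["chip"])] += 1
    return bad
```

To use it, open the file and pass the file object in, e.g. `count_nan_rows(open(path)).most_common(10)`, where `path` points at the chi2 file written to the working directory.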
The “Matched 0 objects” in the first block of the log is just the fact that those are the first detectors+visits in the list, so they will have no matches. Any detector on or past the edge of a tract may have zero cross matches, but should have refcat matches.
I wonder if this is related to this error reported by @sfu : DM-22548? We weren’t able to find anything conclusive in that case, and other things took priority after I’d initially looked into it.
Thanks! We’re using v19.0 still (we were hesitant to move to v20). I’ll add the writeChi2FilesInitialFinal option and let you know what we find. Last night I tried a couple of tests omitting or including files and can confirm that the “Matched 0 objects” entries don’t affect the completion of the jointcal processing, whereas it is the exposures from that one run that cause the NaN. Yes, this is the same problem @sfu pointed out–we’re coming back to it and it’s increased in prominence because it seems to affect every exposure of that run, no matter the pointing…
So, the output of writeChi2FilesInitialFinal is interesting–the _meas version is where all of the NaN are. It looks like columns rx, ry, rxi, ryi (and chi2) for many objects are all nan. I am not sure I understand the “visit” entry, though. What’s listed is almost always the first visit in the sequence and not the offending images… I’m including a link to the file (it’s too big to include). I notice that even the objects that aren’t nan seem to have weirdly large values for this column–unless I don’t understand the units?
We’re still puzzled by this problem. I spent much of last week going through the output of ProcessCcd on the “bad” run and the other runs, and couldn’t find anything that was out of place. That said, jointcal runs fine if the images are omitted, but crashes immediately with the NaN error when they are included. We’ve narrowed it down to the rx, ry, rxi, ryi, and chi2 entries in the astrometry_initial_chi2-0_z-meas.csv file. Can someone give me a hint as to where to look to track down where these NaN could be calculated? The link is included, but in case it’s useful I reproduce a couple of lines from the file (headers and relevant entries):
#id xccd yccd rx ry xtp ytp mag mjd xErr yErr xyCov xtpi ytpi rxi ryi color fsindex ra dec chi2 nm chip visit
#id in source catalog coordinates in CCD (pixels) residual on TP (degrees) transformed coordinate in TP (degrees) rough magnitude Modified Julian Date of the measurement transformed measurement uncertainty (degrees) as-read position on TP (degrees) as-read residual on TP (degrees) currently unused unique index of the fittedStar on sky position of fittedStar contribution to Chi2 (2D dofs) number of measurements of this fittedStar chip id visit id
91815667665207338 967.330371 173.495251 nan nan -186.439451 -91.8579238 14.9774629 56461.0671 9.09943477e-08 9.17736513e-08 5.81127881e-11 -0.140123132 0.00627986687 nan nan 0 -1 227.261502 6.55549107 nan 5 1 213775
So it looks like there’s a problem with the residual position on the tangent plane (rx,ry,rxi,ryi; and chi2 is calculated from those). But the position on the tangent plane is fine so it’s not a projection problem. So does this mean that a NaN has gotten into the coefficients?
@idellant Are the images going into this jointcal run available for download somewhere easily accessible? (I would like to see the headers, mainly, but it might be helpful to see what the images look like.)
Is it possible for you to send me a pickle file with the list of visitInfos?
@price, I’ve put two images (one that works, visit 0202219, and one that fails, visit 0213777), plus the calexps and src catalogs of a chip (41) in case you want to see them in http://www.het.brown.edu:/people/ian/jointcal2 . I apologize, I’m not sure where the pickle file with the visitInfos would be stored–the only .pickle files that seem to be generated look to be the various packages.pickle and the SkyMap.pickle file. Sorry, Ian
I think this means that the airmass is NaN, which means CcdImage._tanZ is NaN, which means the refraction vector is NaN, which causes the rx,ry values to be NaN.
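To make that chain concrete, here is a small sketch of how a NaN airmass poisons everything downstream. It assumes the plane-parallel relation airmass = sec(z) for deriving tan(z). The refraction coefficient and the `residual` function are purely illustrative and are not jointcal's actual code.

```python
import math


def tan_zenith_from_airmass(airmass):
    """tan(z) assuming the plane-parallel approximation airmass = sec(z)."""
    cos_z = 1.0 / airmass
    sin_z = math.sqrt(1.0 - cos_z * cos_z)
    return sin_z / cos_z


def residual(measured, fitted, tan_z, coefficient=1.0e-5):
    """Illustrative residual: measurement, minus a refraction term, minus fit.

    `coefficient` is a made-up number used only to show the propagation.
    """
    return measured - coefficient * tan_z - fitted


good = residual(0.5, 0.4, tan_zenith_from_airmass(1.2))
bad = residual(0.5, 0.4, tan_zenith_from_airmass(float("nan")))
print(math.isnan(good), math.isnan(bad))
```

No exception is raised anywhere along the way: NaN passes silently through the division, the square root, and the subtraction, which matches seeing NaN only in the residual columns while the tangent-plane positions look fine.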
Can you check to see if the AIRMASS header is present and defined in the raw images?
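If it helps, here is a rough standard-library sketch for checking that keyword without any extra packages. It assumes AIRMASS lives in the primary header of the raw file and relies only on the standard FITS layout (2880-byte blocks of 80-character cards, terminated by an END card). The function names are made up for illustration, and the value parsing is deliberately crude.

```python
import math


def read_primary_header(path):
    """Return {keyword: raw value string} for the primary FITS header."""
    cards = {}
    with open(path, "rb") as stream:
        while True:
            block = stream.read(2880)
            if len(block) < 2880:
                raise ValueError("no END card found; is this a FITS file?")
            for start in range(0, 2880, 80):
                card = block[start:start + 80].decode("ascii", errors="replace")
                keyword = card[:8].strip()
                if keyword == "END":
                    return cards
                if card[8:10] == "= ":
                    # Crude: drops everything after the first '/', which is
                    # fine for numbers but would truncate strings containing '/'.
                    cards[keyword] = card[10:].split("/", 1)[0].strip()


def airmass_status(path):
    """Classify the AIRMASS keyword as 'ok', 'missing', or 'undefined'."""
    header = read_primary_header(path)
    if "AIRMASS" not in header:
        return "missing"
    try:
        value = float(header["AIRMASS"])
    except ValueError:
        return "undefined"  # keyword present but blank or not numeric
    return "ok" if math.isfinite(value) else "undefined"
```

Running `airmass_status` over one good and one bad raw file should be enough to confirm or rule out the header as the culprit.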
The Proper Way to fix this is via a set of header correction files in obs_decam, but that scheme doesn’t currently exist, and I don’t remember what needs to be done to support that (@timj?). A quick and dirty fix would be to hack the headers (very naughty, I know!). And there should be a check for bad VisitInfo values in jointcal.
Indeed! The “bad” raw image has no AIRMASS keyword, whereas the good one does.
So, the practical question is: is it better to hack the raw image’s header and add the keyword, then rerun processCcd (I think so?), or is it better/easier to just add the keyword to the calexps headers?
Thank you! This has been bugging us for so long…
The metadata translator is doing the right thing. It already has a facility for hacking headers on read that should be used here (but I don’t believe it’s set up yet for DECam).
If the airmass isn’t set in the VisitInfo, jointcal should throw an exception.
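As a sketch of what such a check could look like (this is not jointcal's code; the function name and the plain visit-to-airmass mapping are assumptions made for illustration), the idea is to fail early and name the offending visits rather than letting NaN reach the fit:

```python
import math


def check_airmass(visit_airmass):
    """Raise ValueError naming visits whose airmass is missing or non-finite.

    `visit_airmass` is a hypothetical mapping of visit id -> airmass value.
    """
    bad = [
        visit
        for visit, airmass in sorted(visit_airmass.items())
        if airmass is None or not math.isfinite(airmass)
    ]
    if bad:
        raise ValueError(f"airmass is not set for visits: {bad}")


check_airmass({202219: 1.21})  # passes silently
try:
    check_airmass({202219: 1.21, 213777: float("nan")})
except ValueError as error:
    print(error)
```

Collecting every bad visit before raising means a single run reports the whole affected set, which would have pointed straight at the one problematic observing run here.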
It is set up as of a week or two ago. If there’s a straightforward answer for what should be done to AIRMASS-less DECam headers, please tell me, and I can file a ticket to add a fix to DECam’s relatively new fix_headers method I linked above.
I guess you could put something in the fix_headers method, but it would involve making an airmass calculation that has to get run every time we read those headers, so I think the best approach is doing the calculation once and placing the result in the correction file that @ktl and I mentioned earlier.
Other translators (eg imsim) have a fallback where if they don’t see an airmass header they calculate it from the azel value (which itself could be estimated from the tracking ra dec if necessary).
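A minimal sketch of that kind of fallback, assuming the simple plane-parallel approximation airmass = sec(z) = 1/sin(altitude), which becomes inaccurate near the horizon. The function names are illustrative and are not taken from any translator. The second function estimates altitude from hour angle, declination, and site latitude for the case where only the tracking position is known.

```python
import math


def airmass_from_altitude(altitude_deg):
    """Plane-parallel airmass, sec(z), from the altitude in degrees."""
    if not 0.0 < altitude_deg <= 90.0:
        raise ValueError(f"altitude {altitude_deg} deg is not above the horizon")
    zenith = math.radians(90.0 - altitude_deg)
    return 1.0 / math.cos(zenith)


def altitude_from_hour_angle(hour_angle_deg, dec_deg, latitude_deg):
    """Altitude in degrees from hour angle, declination, and site latitude."""
    ha, dec, lat = (math.radians(v) for v in (hour_angle_deg, dec_deg, latitude_deg))
    sin_alt = math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(ha)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_alt))))
```

The hour angle itself would come from the local sidereal time minus the right ascension, which is where the observation time and site longitude enter.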
So if there are cases where airmass can be missing in DECam it would be best to fix the to_boresight_airmass method to handle it.
It really depends on how common this is. If there are months of observations with airmass missing then it’s better to do it in the to_boresight_airmass method. If it’s a handful then a correction file is better.