Help with NaN in jointcal fitting of DECam data (with obs_decam?)

I see several exposures where it Matched 0 objects for many of the CCDs. I suggest looking into these, as they may be the source of your NaN.

Looking at the log, I have some additional suggestions unrelated to the problem that I hope will make your experience better:

  • Specify a tract in the --id, and the job will go much faster.
  • Don’t use --clobber-config in production.
  • Set envvar OMP_NUM_THREADS=1 to disable the warning about implicit use of threads.

Thanks! I’ll try a test omitting those 4 exposures–I’m surprised, though, as those aren’t the ones that were in the dangerous date range. Will report back…
Ian

Just to check: what version of jointcal and/or the Science Pipelines are you running? Note that jointcal doesn’t look at the images, just the catalogs.

You can try turning on the writeChi2FilesInitialFinal config option. This will result in a .csv file in your current working directory containing the contributions to the chi2 matrix from every source at the time of initialization. You should be able to find the NaNs in there, which might help you track down which detectors/visits are causing the problem. You can also try running with log level DEBUG (--loglevel jointcal=DEBUG in this case), though I’m not sure there will be much useful information during initialization.

The “Matched 0 objects” in the first block of the log is just the fact that those are the first detectors+visits in the list, so they will have no matches. Any detector on or past the edge of a tract may have zero cross matches, but should have refcat matches.

I wonder if this is related to this error reported by @sfu : DM-22548? We weren’t able to find anything conclusive in that case, and other things took priority after I’d initially looked into it.

1 Like

Thanks! We’re using v19.0 still (we were hesitant to move to v20). I’ll add the writeChi2FilesInitialFinal and let you know what we find. (Last night I tried a couple of tests omitting or including files and can confirm that the “Matched 0 objects” entries don’t affect the completion of the jointcal processing, whereas it is the exposures from that one run that cause the NaN. Yes, this the same problem @sfu pointed out–we’re coming back to it and it’s increased in prominence because it seems to affect every exposure of that run, no matter the pointing…

So, the output of writeChi2FilesInitialFinal is interesting–the _meas version is where all of the NaN are. It looks like columns rx,ry,rxi,ryi (and chi2) for many objects are all nan. I am not sure I understand the “visit” entry, though. What’s listed is almost always the first visit in the sequence and not the offending images… I’m including the a link to the file (it’s too big to include). I notice that even the objects that aren’t nan seem to have weirdly large values for this column–unless I don’t understand the units?

We’re still puzzled by this problem. I spent much of last week going through the output of ProcessCcd on the “bad” run and the other runs, and couldn’t find anything that was out of place. That said, jointcal runs fine if the images are omitted, but crashes immediately with the NaN error when they are included. We’ve narrowed it down to the rx,ry,rxi,ryi, and chi2 entries in the astrometry_initial_chi2-0_z-meas.csv
file. Can someone give me a hint as to where to look to track down where these NaN could be calculated? (The link is included, but I reproduce a couple of lines from the file; headers and relevant entries) in case it’s useful:

#id     xccd    yccd    rx      ry      xtp     ytp     mag     mjd     xErr    yErr    xyCov   xtpi    ytpi    rxi     ryi     color   fsindex ra      dec     chi2    nm      chip    visit
#id in source catalog   coordinates in CCD (pixels)             residual on TP (degrees)                transformed coordinate in TP (degrees)          rough magnitude Modified Julian Date of the measurement transformed measurement uncertainty (degrees)                   as-read position on TP (degrees)                as-read residual on TP (degrees)                currently unused        unique index of the fittedStar  on sky position of fittedStar           contribution to Chi2 (2D dofs)  number of measurements of this fittedStar       chip id visit id
91815667665207338       967.330371      173.495251      nan     nan     -186.439451     -91.8579238     14.9774629      56461.0671      9.09943477e-08  9.17736513e-08  5.81127881e-11  -0.140123132    0.00627986687   nan     nan     0       -1      227.261502      6.55549107      nan     5       1       213775
1 Like

So it looks like there’s a problem with the residual position on the tangent plane (rx,ry,rxi,ryi; and chi2 is calculated from those). But the position on the tangent plane is fine so it’s not a projection problem. So does this mean that a NaN has gotten into the coefficients?

@idellant Are the images going into this jointcal run available for download somewhere easily accessible? (I would like to see the headers, mainly, but it might be helpful to see what the images look like.)

Is it possible for you to send me a pickle file with the list of visitInfos?

@price, I’ve put two images (one that works, visit 0202219, and one that fails, visit 0213777), plus the calexps and src catalogs of a chip (41) in case you want to see them in http://www.het.brown.edu:/people/ian/jointcal2 . I apologize, I’m not sure where the pickle file with the visitInfos would be stored–the only .pickle files that seem to be generated look to be the various packages.pickle and the SkyMap.pickle file. Sorry, Ian

1 Like

I think the problem is that the airmass isn’t set in the header for visits that fail.

The good calexp contains the headers:

BORE-RA =     227.701523365432                                                  
BORE-DEC=     5.71966372367164                                                  
BORE-AZ =                23.98                                                  
BORE-ALT=                51.49                                                  
HIERARCH BORE-AIRMASS =   1.28                                                  
HIERARCH BORE-ROTANG =     90.                                                  
ROTTYPE = 'SKY     '           / Type of rotation angle                         

But BORE-AIRMASS is missing in the bad calexp:

BORE-RA =     227.725990026014                                                  
BORE-DEC=     5.65894149906265                                                  
BORE-AZ =                12.52                                                  
BORE-ALT=                53.52                                                  
HIERARCH BORE-ROTANG =     90.                                                  
ROTTYPE = 'SKY     '           / Type of rotation angle                         

I think this means that the airmass is NaN, which means CcdImage._tanZ is NaN, which means the refraction vector is NaN, which causes the rx,ry values to be NaN.

Can you check to see if the AIRMASS header is present and defined in the raw images?

The Proper Way to fix this is via a set of header correction files in obs_decam, but that scheme doesn’t currently exist, and I don’t remember what needs to be done to support that (@timj?). A quick and dirty fix would be to hack the headers (very naughty, I know!). And there should be a check for bad VisitInfo values in jointcal.

2 Likes

Indeed! The “bad” raw image has no AIRMASS keyword, whereas the good one does.
So, the practical question is–is it better to hack the raw image’s header and add the keyword, then rerun processCcd (I think so?)
or is it better/easier to just add the keyword to the calexps headers?
Thank you! This has been bugging us for so long…

1 Like

I believe you can create a simple one-line header correction YAML file as described in https://github.com/lsst/astro_metadata_translator/blob/master/python/astro_metadata_translator/headers.py#L327-L347

It might go in corrections/DECam-0213777.yaml (or whatever the appropriate obs_id is).

1 Like

What behavior is desired for a DECam header where the AIRMASS keyword is missing? If this is something that should always be fixed for all DECam headers, it could be added here https://github.com/lsst/astro_metadata_translator/blob/master/python/astro_metadata_translator/translators/decam.py#L256-L298

The metadata translator is doing the right thing. It already has a facility for hacking headers on read that should be used here (but I don’t believe it’s set up yet for DECam).

If the airmass isn’t set in the VisitInfo, jointcal should throw an exception.

Thank you. Can I ask whether we should start from rerunning processCcd, or just rerun jointcal? Do we also need to adjust the config files? Thanks.

Thank you. Actually, this NaN error only happened at visits 21xxxx (year 2013).

It is set up as of a week or two ago :slight_smile: if there’s a straightforward answer for what should be done to AIRMASS-less DECam headers, please tell me, and I can file a ticket to add a fix to DECam’s relatively new fix_headers method I linked above.

I guess you could put something in the fix_headers method, but it would involve making an airmass calculation that has to get run every time we read those headers, so I think the best approach is doing the calculation once and placing the result in the correction file that @ktl and I mentioned earlier.

Other translators (eg imsim) have a fallback where if they don’t see an airmass header they calculate it from the azel value (which itself could be estimated from the tracking ra dec if necessary).

So if there are cases where airmass can be missing in DECam it would be best to fix the to_boresight_airmass method to handle it.

1 Like

It really depends on how common this is. If there are months of obsevations with airmass missing then it’s better to do it in the to_boresight_airmass method. If it’s a handful then a correction file is better.