Ingesting public HSC deepCoadd warps and detection catalogues

raphaelshirley · August 17, 2020, 11:34am

I am trying to access the public HSC deep coadd images and detection catalogues from my obs_vista package and Butler repository. The aim is to run mergeCoaddDetections.py on my VISTA detections together with the existing HSC detection catalogues, and then run deblendCoaddSources.py, measureCoaddSources.py, and forcedPhotCoadd.py with the merged detection catalogue on both my processed VISTA coadds and the public HSC coadds.

Do I need to write a specific ingestion task to add these files to the SQL registry? In the example HSC processing, some of the products (calibrations and reference catalogues) are simply added via symlinks, so I wondered whether I could symlink the files into my obs_vista Butler repo, but I didn’t know whether that is only possible for some datasets. When you use the --show data option on a task, does it find all possible inputs using the registry only, or does it also search the files in the Butler repo?

I also wonder how this process might differ between gen2 and gen3.

Many thanks for any interest in this issue.

Best,

Raphael.

ktl · August 17, 2020, 12:40pm

Gen2 cannot use more than one instrument/obs package in a single Butler, so existing CmdLineTasks won’t usually work on multi-instrument data.

Since you are working with coadd detections, however, the only difference between the instruments is the filter names (which do differ here). In Gen2, these filter names are provided to mergeCoaddDetections.py on the command line in the --id parameters, so there’s no need to modify the registry database. As long as the skymaps and catalog schemas are identical and the filter names are not, from a data-reading perspective you should be able to link the files into the appropriate place in the repository (deepCoadd-results/%(filter)s/%(tract)d/%(patch)s/det-%(filter)s-%(tract)d-%(patch)s.fits I believe) and use them.
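To make the linking step concrete, here is a minimal sketch that expands the path template quoted above for one data id and creates the symlink. The template is the one given in the post with "I believe", so it should be checked against the mapper policy in use. The function name, the repository paths, and the tract/patch values are hypothetical, and this only places files on disk; it does not touch any registry.

```python
import os
from pathlib import Path

# Template as quoted in the post (given there with "I believe");
# verify it against your obs package's mapper policy before relying on it.
DET_TEMPLATE = (
    "deepCoadd-results/%(filter)s/%(tract)d/%(patch)s/"
    "det-%(filter)s-%(tract)d-%(patch)s.fits"
)


def link_detection_catalogue(src_repo, dest_repo, filter_name, tract, patch):
    """Symlink one detection catalogue from src_repo into dest_repo.

    Both arguments are assumed to be the directories that contain
    deepCoadd-results. Returns the path of the catalogue inside dest_repo.
    """
    data_id = {"filter": filter_name, "tract": tract, "patch": patch}
    relative = DET_TEMPLATE % data_id
    source = Path(src_repo).resolve() / relative
    target = Path(dest_repo) / relative
    if not source.is_file():
        raise FileNotFoundError(f"No catalogue at {source}")
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink():
        target.unlink()  # replace a stale link from an earlier run
    if not target.exists():  # never overwrite a real file
        os.symlink(source, target)
    return target
```

A call might look like link_detection_catalogue("/path/to/hsc_repo", "/path/to/vista_repo", "HSC-R", 9813, "1,1"), where both paths and the data id are placeholders.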

(If the filter names were the same, you’d have to invent new ones and link them in with those filter names.)
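One way to check for this situation before linking is to compare the filter directory names in the two repositories. The sketch below assumes the filter is the first directory level under deepCoadd-results, as in the template quoted above; the function names and the prefix-based renaming scheme are hypothetical and are only one possible way to invent non-clashing names.

```python
from pathlib import Path


def coadd_filters(repo):
    """Return the set of filter directory names under deepCoadd-results."""
    root = Path(repo) / "deepCoadd-results"
    if not root.is_dir():
        return set()
    return {p.name for p in root.iterdir() if p.is_dir()}


def clashing_filters(repo_a, repo_b):
    """Return the sorted filter names that appear in both repositories."""
    return sorted(coadd_filters(repo_a) & coadd_filters(repo_b))


def propose_renames(clashes, prefix):
    """Map each clashing filter name to a new, prefixed name.

    The prefix scheme is purely illustrative; any unique names would do.
    """
    return {name: prefix + name for name in clashes}
```

If clashing_filters returns an empty list, the files can be linked under their existing filter names. Otherwise the mapping from propose_renames gives the names to use in the link paths and in the --id parameters.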

This method should also work for other datasets that have common templates defined in obs_base and for which complete data ids are provided (with no registry lookups required). For datasets that have obs-specific templates, this will not work in general, although there may be ways to hack it for specific cases.

It is possible that information about the filters themselves is used somewhere in the tasks you want to run. In that case, you will have difficulties since only one set of filters is defined by the obs package at a time.

Gen3 is designed to be able to use multiple Instruments within the same repository/Butler, so in theory this should be a bit easier. You would merge (in the Registry) the collections from the two instruments into a new collection that you would use for processing. But a Gen3 expert should weigh in on whether aspects of the pipelines and connections might frustrate this.

raphaelshirley · August 17, 2020, 2:01pm

Thank you for your answer. You are right; linking does at least allow me to merge in the detections and then run on the images, albeit with my obs package throwing a warning: “CameraMapper WARN: Filter HSC-R not defined. Set to UNKNOWN.”

I’m very interested in how gen3 will permit this multi-camera combination. I’m particularly concerned about using HSC-specific configurations and tasks for the later measurement and deblending, which will not be present in my obs package unless I copy and paste the code from obs_subaru. I’m currently considering “from lsst.obs.subaru import function” within my obs_vista package.

timj · August 17, 2020, 3:43pm

I admit that no one has actually tried to build a multi-camera pipeline yet, but it’s something that will have to work.

Note that gen3 obs packages are much simpler than gen2 ones, partly because there’s a lot more standardization.

Task configs will of course still need to be tweaked.