Set up RefMatchTask for reference catalog crossmatch

Hi, I am trying to crossmatch sources produced by the CalibrateTask to a reference catalog. In other words, I want to construct a RefMatchTask that references one of my reference catalogs and can be run on an input exposure (calexp dataset type) with a source catalog (src dataset type).

Here is what I am doing now:

import lsst.daf.butler as dafButler
from lsst.meas.algorithms import LoadIndexedReferenceObjectsTask, LoadIndexedReferenceObjectsConfig
from lsst.meas.astrom.ref_match import RefMatchTask, RefMatchConfig

butler = dafButler.Butler('processing_repos/repo')

refObjLoaderConfig = LoadIndexedReferenceObjectsConfig()
refObjLoaderConfig.ref_dataset_name = "ps1_pv3_3pi_20170110"

refObjLoader = LoadIndexedReferenceObjectsTask(butler)

ref_match_config = RefMatchConfig()
ref_match = RefMatchTask(refObjLoader, config=ref_match_config)

However, this raises an error when creating the LoadIndexedReferenceObjectsTask: KeyError: "Dataset type with name 'ref_cat_config' not found." The full traceback is:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-532-a77aceda51c2> in <module>
      7 refObjLoaderConfig.ref_dataset_name = "ps1_pv3_3pi_20170110"
      8 
----> 9 refObjLoader = LoadIndexedReferenceObjectsTask(butler)
     10 
     11 ref_match_config = RefMatchConfig()

/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/meas_algorithms/21.0.0-20-g8cd22d88+eb66f50d9f/python/lsst/meas/algorithms/loadIndexedReferenceObjects.py in __init__(self, butler, *args, **kwargs)
     54     def __init__(self, butler, *args, **kwargs):
     55         LoadReferenceObjectsTask.__init__(self, *args, **kwargs)
---> 56         self.dataset_config = butler.get("ref_cat_config", name=self.config.ref_dataset_name, immediate=True)
     57         self.indexer = IndexerRegistry[self.dataset_config.indexer.name](self.dataset_config.indexer.active)
     58         # This needs to come from the loader config, not the dataset_config since directory aliases can

/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/_butler.py in get(self, datasetRefOrType, dataId, parameters, collections, **kwds)
   1006         """
   1007         log.debug("Butler get: %s, dataId=%s, parameters=%s", datasetRefOrType, dataId, parameters)
-> 1008         ref = self._findDatasetRef(datasetRefOrType, dataId, collections=collections, **kwds)
   1009         return self.getDirect(ref, parameters=parameters)
   1010 

/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/_butler.py in _findDatasetRef(self, datasetRefOrType, dataId, collections, allowUnresolved, **kwds)
    567             Raised if no collections were provided.
    568         """
--> 569         datasetType, dataId = self._standardizeArgs(datasetRefOrType, dataId, **kwds)
    570         if isinstance(datasetRefOrType, DatasetRef):
    571             idNumber = datasetRefOrType.id

/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/_butler.py in _standardizeArgs(self, datasetRefOrType, dataId, **kwds)
    511                 externalDatasetType = datasetRefOrType
    512             else:
--> 513                 internalDatasetType = self.registry.getDatasetType(datasetRefOrType)
    514 
    515         # Check that they are self-consistent

/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/_sqlRegistry.py in getDatasetType(self, name)
    383     def getDatasetType(self, name: str) -> DatasetType:
    384         # Docstring inherited from lsst.daf.butler.registry.Registry
--> 385         return self._managers.datasets[name].datasetType
    386 
    387     def findDataset(self, datasetType: Union[DatasetType, str], dataId: Optional[DataId] = None, *,

/epyc/projects/decam_ddf/lsst_w_2021_17/stack/miniconda3-py38_4.9.2-0.5.0/Linux64/daf_butler/21.0.0-77-g2822d51d+75b22be884/python/lsst/daf/butler/registry/interfaces/_datasets.py in __getitem__(self, name)
    489         result = self.find(name)
    490         if result is None:
--> 491             raise KeyError(f"Dataset type with name '{name}' not found.")
    492         return result
    493 

KeyError: "Dataset type with name 'ref_cat_config' not found."

I have my reference catalogs in a collection called refcats/gen2, so I’ve tried including this collection explicitly when creating the butler in case it’s not picking it up:

from lsst.meas.astrom.ref_match import RefMatchTask, RefMatchConfig

butler = dafButler.Butler('processing_repos/repo', collections="refcats/gen2")

refObjLoaderConfig = LoadIndexedReferenceObjectsConfig()
refObjLoaderConfig.ref_dataset_name = "ps1_pv3_3pi_20170110"

refObjLoader = LoadIndexedReferenceObjectsTask(butler)

ref_match_config = RefMatchConfig()
ref_match = RefMatchTask(refObjLoader, config=ref_match_config)

but the same error occurs.

I see also that the CalibrateTask produces a match catalog from the astrometry task, and I can get those matches successfully:

data_ids = sorted(list(registry.queryDatasets(
    "srcMatch",
    collections="DECam/process/calexp/210318/cosmos_1",
    where="instrument='DECam' AND exposure=974961"
)), key=lambda dataRef : dataRef.dataId['detector'])
match_catalog = butler.get(data_ids[0], collections=data_ids[0].run)

data_ids = sorted(list(registry.queryDatasets(
    "src",
    collections="DECam/process/calexp/210318/cosmos_1",
    where="instrument='DECam' AND exposure=974961"
)), key=lambda dataRef : dataRef.dataId['detector'])
source_catalog = butler.get(data_ids[0], collections=data_ids[0].run)

which gives:

print(match_catalog[0:5])
       first              second              distance       
------------------- ------------------ ----------------------
3848786439521223808 209371282641256459  9.566928842884213e-08
3848786065859508096 209371282641256460  1.011515176402328e-07
3848786473880964608 209371282641256462  8.960889576536974e-08
3848786233362793472 209371282641256463 2.1904668262408046e-07
3848598285594190592 209371282641256469   9.61598079984137e-08

and

print(source_catalog[source_catalog['id'] == match_catalog['second'][0]])
        id             coord_ra     ... calib_photometry_reserved
                         rad        ...                          
------------------ ---------------- ... -------------------------
209371282641256459 2.61016137456497 ...                     False

However, the reference catalog used here is Gaia DR2, and I’d like to use the deeper PanSTARRS-1 catalog that is used for photometric calibration to find more matched sources in my data.

Given this, my questions are:

  1. How can I run my own RefMatchTask using a desired reference catalog to produce matches?
  2. Is there a way to configure CalibrateTask to save matches to my PS-1 catalog as well as Gaia? (I suppose I could run CalibrateTask twice: once using Gaia DR2 as the astrometry reference catalog and a second time using PS-1 as the astrometry reference catalog?)

Thanks for the help.

This is gen2 code, so it needs a gen2 butler, not a gen3 butler.

The src catalog is already astrometrically calibrated, so I recommend using lsst.meas.astrom.DirectMatchTask instead of RefMatchTask. The latter will attempt to rediscover the mapping to the reference catalog (which can sometimes be tricky), while the former will use the existing astrometric solution and therefore be much more robust.
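To make the distinction concrete, here is a minimal pure-Python sketch of the idea behind direct matching: trust the sky coordinates already attached to each source and pair it with the nearest reference object inside a fixed radius. This is only an illustration, not the DirectMatchTask implementation. The function names, the (id, ra, dec) tuple layout, and the 1 arcsecond default radius are all assumptions made for the example.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (haversine formula).

    All inputs are in radians.
    """
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    return 2.0 * math.asin(min(1.0, math.sqrt(a)))


def direct_match(sources, references, radius_arcsec=1.0):
    """Pair each source with its nearest reference within radius_arcsec.

    sources, references: lists of (id, ra, dec) with angles in radians.
    Returns a list of (ref_id, src_id, distance_in_radians).
    Hypothetical helper for illustration; brute force, O(N*M).
    """
    radius = math.radians(radius_arcsec / 3600.0)
    matches = []
    for src_id, src_ra, src_dec in sources:
        best = None
        for ref_id, ref_ra, ref_dec in references:
            d = angular_separation(src_ra, src_dec, ref_ra, ref_dec)
            if d <= radius and (best is None or d < best[2]):
                best = (ref_id, src_id, d)
        if best is not None:
            matches.append(best)
    return matches


# Toy data: one source sits 0.2 arcsec in RA from reference 101,
# the other is far from every reference and stays unmatched.
arcsec = math.radians(1.0 / 3600.0)
references = [(101, 2.6100, 0.0400), (102, 2.6200, 0.0410)]
sources = [(1, 2.6100 + 0.2 * arcsec, 0.0400), (2, 2.6300, 0.0500)]
matches = direct_match(sources, references, radius_arcsec=1.0)
for ref_id, src_id, dist in matches:
    print(ref_id, src_id, dist / arcsec)
```

Because no new astrometric solution is fitted, the only tunable quantity in this sketch is the match radius, which is why this style of matching tends to be robust once the src coordinates are already calibrated.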

Thanks for the help, Tim and Paul. Yes, it looks like DirectMatchTask is what I’m looking for. How should I configure it to use a gen3 butler? I usually run these tasks with the pipetask command rather than in Python, giving it the path to my butler. However, DirectMatchTask doesn’t seem to be a CmdLineTask or PipelineTask, so I guess that’s not the right way to run it. If I try to pack this task into a YAML file:

description: Match to reference
instrument: lsst.obs.decam.DarkEnergyCamera
tasks:
  match: 
    class: lsst.meas.astrom.DirectMatchTask
    config:
      refObjLoader.ref_dataset_name: ps1_pv3_3pi_20170110

and run:

pipetask run -b processing_repos/repo -i DECam/process/calexp/210318/cosmos_1,refcats/gen2 -o DECam/process/ps1sources/210318/RefMatch -p ./processing_repos/PanStarrsMatch.yaml --register-dataset-types

I get the error

AttributeError: 'DirectMatchConfig' object has no attribute 'connections'

which makes sense when I look at the source code: the task doesn’t seem to be defined like CalibrateTask, with inputs and outputs in a connections config.

I did have some luck re-running CalibrateTask with config.connections.astromRefCat: ps1_pv3_3pi_20170110 set, but I don’t get as many matches in the srcMatch dataset as I expected (51 matches from PS-1 versus 46 from Gaia). Perhaps this makes sense if astrometric calibration only uses a subset of stars for the WCS correction rather than doing a full catalog crossmatch?

I’ve been pointed to this documentation for creating a PipelineTask that works with gen3 middleware. I’m going to try this next for working with DirectMatchTask.