Best way to merge AFW Catalogs


I’ll start with the general problem I’m trying to solve, in case there is a better solution that I haven’t found, and then follow up with the issues I’ve hit in my current implementation.

To match catalogs exactly, both meas_deblender and meas_extensions_scarlet provide a deblend_peakId key, so the combination (parent, deblend_peakId) uniquely identifies a source. Any time catalogs are built downstream from the same mergeDet catalog (i.e. deblending and all of the measurement tasks), we can compare the results of the downstream processing by matching the catalogs on those two keys.
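Matching on (parent, deblend_peakId) only works if that pair really is unique within each catalog, so it can be worth checking before joining. A minimal sketch in plain Python, assuming the two columns have already been pulled out as parallel sequences (for example catalog["parent"] and catalog["deblend_peakId"]); findDuplicateKeys is a hypothetical helper, not part of afw.

```python
from collections import Counter


def findDuplicateKeys(parents, peakIds):
    """Return the (parent, deblend_peakId) pairs that occur more than once.

    Hypothetical helper: ``parents`` and ``peakIds`` are parallel sequences,
    e.g. the ``parent`` and ``deblend_peakId`` columns of one catalog.
    """
    if len(parents) != len(peakIds):
        raise ValueError("parents and peakIds must have the same length")
    # Normalize to Python ints so the returned keys are plain tuples.
    counts = Counter((int(p), int(k)) for p, k in zip(parents, peakIds))
    return sorted(key for key, n in counts.items() if n > 1)


# Example: the pair (10, 3) appears twice, so it cannot be used as a match key.
duplicates = findDuplicateKeys([0, 0, 10, 10, 10], [1, 2, 3, 4, 3])
```

An empty result means the pair can safely be treated as a one-to-one key for that catalog.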

For numpy, pandas, and astropy tables, there is built-in functionality to merge catalogs on a set of matching keys, but as far as I can tell, afw.table does not have that functionality. So in my local scripts I just convert the catalogs to pandas DataFrame objects and merge them. However, I need a solution that works in pipe_analysis so that I can use it in the compareCoaddAnalysisTask, which means I need to do one of the following:

  1. Merge afw catalogs directly
  2. Have a fast and efficient way to convert an astropy Table or pandas DataFrame into an afw SourceCatalog (the inverse of the asAstropy() method).
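For comparison, the pandas approach mentioned above boils down to a single merge on the two key columns. A minimal sketch, assuming pandas is available; mergeOnPeak is a hypothetical helper, the DataFrames are hand-built stand-ins and flux is a made-up column. Real frames would come from something like catalog.asAstropy().to_pandas(), which may require dropping multidimensional columns first.

```python
import pandas as pd

KEYS = ["parent", "deblend_peakId"]


def mergeOnPeak(df1, df2, suffixes=("_1", "_2")):
    """Inner-join two source tables on (parent, deblend_peakId).

    Hypothetical helper. Non-key columns present in both inputs get
    ``suffixes``; ``validate="one_to_one"`` makes pandas raise if a key
    pair repeats in either input.
    """
    return pd.merge(df1, df2, on=KEYS, how="inner",
                    suffixes=suffixes, validate="one_to_one")


# Stand-ins for two catalogs built from the same mergeDet catalog.
old = pd.DataFrame({"parent": [0, 7, 7], "deblend_peakId": [1, 2, 3],
                    "flux": [10.0, 20.0, 30.0]})
new = pd.DataFrame({"parent": [7, 7, 9], "deblend_peakId": [3, 2, 4],
                    "flux": [31.0, 19.0, 5.0]})
merged = mergeOnPeak(old, new)  # only (7, 2) and (7, 3) are in both
```

The catch, as the post says, is that the result is a DataFrame rather than an afw catalog.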

I’m fine with either solution, but neither seems to work right now. I have written a function that takes two input catalogs and uses numpy.lib.recfunctions.join_by to get the indices needed to match the two catalogs. However, an afw Table does not appear to support indexing by a non-contiguous set of indices (e.g. catalog[np.array([0, 2, 4, 8, 6])]).
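If only the matching indices are needed, a dictionary lookup on the (parent, deblend_peakId) pairs (a hash join) is one alternative to numpy.lib.recfunctions.join_by. A minimal sketch, swapped in here rather than taken from the thread; matchPeakIndices is a hypothetical helper, and it assumes each key pair is unique within each catalog. Because it returns plain Python ints, the indices can be passed to catalog[i] without an extra int() cast.

```python
def matchPeakIndices(parents1, peakIds1, parents2, peakIds2):
    """Return index lists (idx1, idx2) such that row idx1[n] of the first
    catalog and row idx2[n] of the second share (parent, deblend_peakId).

    Hypothetical helper; assumes key pairs are unique within each input.
    Matches come back in the row order of the first catalog.
    """
    # Map each key pair in the second catalog to its row index.
    lookup = {}
    for j, (p, k) in enumerate(zip(parents2, peakIds2)):
        key = (int(p), int(k))
        if key in lookup:
            raise ValueError(f"duplicate key {key} in second catalog")
        lookup[key] = j
    # Walk the first catalog and keep rows whose key is also in the second.
    idx1, idx2 = [], []
    for i, (p, k) in enumerate(zip(parents1, peakIds1)):
        j = lookup.get((int(p), int(k)))
        if j is not None:
            idx1.append(i)
            idx2.append(j)
    return idx1, idx2


idx1, idx2 = matchPeakIndices([0, 7, 7], [1, 2, 3], [7, 7, 9], [3, 2, 4])
```

Here idx1 is [1, 2] and idx2 is [1, 0]; feeding each list to the reindexCatalog function from later in the thread would give two catalogs whose rows line up.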

So unless there is some hidden API that I don’t know about/can’t find, it would appear that I have to do the matching using a 3rd party data structure and convert it back to an afw.table. But… even that is non-trivial, so I’m wondering if there is some hidden corner of the code that will automatically convert an astropy table into an afw catalog.

Any suggestions would be appreciated. If I end up having to write a function to convert an astropy Table into an afw SourceCatalog, I’ll include it in a future post.


I’m not aware of a straightforward approach that currently exists. I would create a new catalog using the known schema and known number of elements, and use your indices to insert into that.

I think that this is something that could be added to the API as an additional overload without too much work.

Thanks Paul, that was unfortunately what I was afraid of. “Without too much work” to you = a decent amount of work for me, because I’ve never really gotten the hang of the relationship between afw schemas, tables, catalogs, etc., and all of the minor things that can cause them to behave unexpectedly. :frowning_face: But if I can make something general enough, I’ll make a PR to add the feature to afw.table.

Completely without any testing:

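def reindexCatalog(catalog, indices):
    """Apply a numpy index array to an afw Catalog

    Parameters
    ----------
    catalog : subclass of `lsst.afw.table.BaseCatalog`
        Catalog to reindex.
    indices : `numpy.ndarray` of `int`
        Index array.

    Returns
    -------
    new : subclass of `lsst.afw.table.BaseCatalog`
        Reindexed catalog. Records are shallow copies of those in ``catalog``.
    """
    # Empty catalog of the same type (e.g. SourceCatalog) with the same schema.
    new = type(catalog)(catalog.schema)
    for ii in indices:
        # int() cast: __getitem__ needs a Python int, not a numpy integer.
        new.append(catalog[int(ii)])
    return new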

Let me know if that doesn’t work, and I’ll try to debug (or rewrite in C++).

EDIT: applied @fred3m’s fix, below.

Brilliant! This does work, with one slight modification: I needed to wrap ii in an int cast, since the __getitem__ API is very particular about the integer type it accepts.

Thank you!


Also note that an afw Table can’t be indexed by a non-contiguous set of integer indices, but it can be indexed by a boolean array of True/False flags with one entry per record. I’m not sure if that helps here at all, but it is what I’ve used in the past.
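A boolean mask selects records but always returns them in catalog order, so it can pick out the matched rows of one catalog but cannot reorder a second catalog to line up with it. A minimal numpy sketch of turning an index array into such a mask; indicesToMask is a hypothetical helper, and a plain numpy array stands in for a catalog. Per the post, catalog[mask] should then work on an afw catalog.

```python
import numpy as np


def indicesToMask(indices, size):
    """Convert integer indices into a boolean mask of length ``size``.

    Hypothetical helper. Raises if ``indices`` is not strictly increasing,
    because a mask cannot express reordering or repeated rows.
    """
    indices = np.asarray(indices, dtype=np.int64)
    if np.any(np.diff(indices) <= 0):
        raise ValueError("mask indexing needs sorted, unique indices")
    mask = np.zeros(size, dtype=bool)
    mask[indices] = True
    return mask


column = np.array([10, 11, 12, 13, 14, 15])  # stand-in for a catalog column
mask = indicesToMask([0, 2, 4], len(column))
selected = column[mask]  # same rows as column[[0, 2, 4]]
```

If the indices come out of a join unsorted, a mask still picks the right rows but not in join order, so the two catalogs would no longer line up row by row.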

Filed ticket DM-27837: Add integer array indexing of Catalog.