Cross-matching external catalogs by coordinate

Dear Folks,

Let me know if this is the right place to ask this question. I am interested in cross-matching a catalog of my own (a few million locations) with the DP1 catalog (e.g. the object catalog), by coordinate (i.e. RA and Dec).

It looks like lsdb will have a utility to do this, but I also wonder if there is any SQL tool accessible via TAP for this purpose, similar to SDSS's fGetNearbyObj.

I don’t see any spatial indexing information in the schema for the Object table, so unless I missed something there, there might not be any fast way of doing this (yet).

I can wait until the lsdb form of DP1 is available in a couple of weeks, but if there’s a SQL way to do this too that would be terrific.

Thank you and congratulations on this amazing milestone,
Mike

Hi Mike, this is the right place to ask questions, thank you!

Yes, there is an SQL tool that can be used via TAP for this purpose. It's possible to upload a user table to the TAP service and then run a cross-match on it using RA and Dec (no other spatial indexing columns, such as HTM or HEALPix indices, are needed for this).

We don’t yet have our tutorials for user-uploaded tables and cross-match ready for DP1. But I’ve confirmed that the process described in the DP0.3 Portal tutorial for user-uploaded tables works for a table I downloaded from the SDSS Skyserver and cross-matched to the DP1 Object table.

Skyserver_SDSS_Objects.csv (1.5 MB)

This was what the cross-match query looked like after following the DP0.3 Portal tutorial:
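(The Portal screenshot isn't reproduced here; reconstructed from the notebook example further down, the ADQL had roughly this form, where the uploaded table's name under TAP_UPLOAD and its ra/dec column names are assumptions based on the CSV:)

SELECT obj.objectId, obj.coord_ra, obj.coord_dec, ut.ra, ut.dec
FROM dp1.Object AS obj, TAP_UPLOAD.ut AS ut
WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec),
      CIRCLE('ICRS', ut.ra, ut.dec, 0.00027)) = 1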

And this is the results view:

Since you say you'd like to cross-match millions of rows, I'm thinking that an implementation for the Notebook Aspect would be more what you're after, so I'd also like to convert the DP0.3 tutorial notebook for user-uploaded tables into a DP1-based demo.

Am I on the right track here, is this the kind of functionality you’re looking for?

Hi again, I was curious to try this out in a notebook, so here is a Python example that I think provides the service you're looking for.

Set up the necessary imports and instantiate the RSP TAP service.

# Imports for querying SDSS, handling coordinates, and plotting
from astroquery.sdss import SDSS
from astropy import coordinates as coords
import matplotlib.pyplot as plt

# Instantiate the Rubin Science Platform TAP service
from lsst.rsp import get_tap_service
rsp_tap = get_tap_service("tap")

Query SDSS for objects in the northernmost DP1 field (Low Ecliptic Latitude Field). Columns of type uint64, which is the returned type of the objid column, are not supported in user table uploads, so convert it to a signed 64-bit integer.

# Cone search of the SDSS photometric catalog around the field center
pos = coords.SkyCoord('02h31m26s +06d58m48s', frame='icrs')
ut1 = SDSS.query_region(pos, radius='3 arcmin', spectro=False)

# uint64 is not supported in user table uploads; cast objid to int64
ut1['objid'] = ut1['objid'].astype('int64')
print(len(ut1))

This returns 896 rows in ut1 (user table 1).

Create the ADQL query statement that will cross-match the SDSS table with the DP1 Object table using a match radius of 1 arcsecond. Submit the query to the RSP TAP service and retrieve the results.

query = """
        SELECT obj.objectId, obj.coord_ra, obj.coord_dec,
               ut1.ra AS ut1_ra, ut1.dec AS ut1_dec, ut1.objid AS ut1_objid
        FROM dp1.Object AS obj, TAP_UPLOAD.ut1 AS ut1
        WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec),
                       CIRCLE('ICRS', ut1.ra, ut1.dec, 0.00027)) = 1
        ORDER BY obj.coord_ra ASC
        """

# Submit the query asynchronously, passing the astropy Table as an upload
job = rsp_tap.submit_job(query, uploads={"ut1": ut1})
job.run()

# Block until the job finishes, then check how it ended
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()

assert job.phase == 'COMPLETED'
results = job.fetch_result()
print(len(results))

This returns 789 matched rows (out of the 896 uploaded).
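If it's useful to see which uploaded rows found a match, the results can be joined back to ut1 on the SDSS identifier. A minimal sketch, assuming astropy's table join (the keys_left/keys_right arguments require a reasonably recent astropy):

from astropy.table import join

# Convert the TAP results to an astropy Table and join on the SDSS objid;
# note a single SDSS object can in principle match more than one DP1 object
matched = join(ut1, results.to_table(),
               keys_left='objid', keys_right='ut1_objid')
print(len(matched), 'joined rows')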

Make a plot to visualize matches:

# Uploaded SDSS positions as filled circles; DP1 matches as black crosses
plt.plot(ut1['ra'], ut1['dec'], 'o', mew=0, alpha=0.5)
plt.plot(results['coord_ra'], results['coord_dec'], '+', color='black')
plt.show()


If this is what you were looking for, can you mark this post as the solution? If not, that's OK! We can continue to discuss.

Dear Melissa,

Thank you so much! This is along the lines I am looking for. I will try this out today!

Thanks,
Mike

Just as an update: when I tried this on my full data set I got back:

DALServiceError: not found: phase
 for https://data.lsst.cloud/api/tap/async

That might mean I hit some sort of resource limit; I am not sure!

Hi Melissa,

Yes, this worked for me! See the attached PDF.

It does take about 15 minutes for a subset of about 100,000 objects. My full catalog is about 8.6 million rows, so I am wondering if I should even try that?

Of course, I could make it a lot faster if I just cut my sample down by hand to just the DP1 fields, but I’m interested in a solution that avoids that (since I won’t be able to do that for the full data sets coming down the pipe).

Thanks,
Mike

match-melissa.pdf (95.5 KB)

Excellent @blanton144, it's great to see this working, thanks for confirming.

We do generally recommend that RSP users make their queries as efficient as possible, so subsetting your sample down to only the regions that overlap the DP1 fields makes sense, as in the sketch below.
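For illustration, here is a minimal sketch of that kind of pre-filter. It assumes your catalog is an astropy Table named my_table with 'ra' and 'dec' columns in degrees; the only field center filled in is the Low Ecliptic Latitude Field position from the example above, and the 1-degree radius is just a placeholder (check the DP1 documentation for the actual field centers and sizes):

import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord

# Placeholder list of DP1 field centers; only the Low Ecliptic Latitude
# Field center (from the example above) is filled in -- look up the rest
# in the DP1 documentation
field_centers = [SkyCoord('02h31m26s +06d58m48s', frame='icrs')]

# my_table is assumed to be an astropy Table with 'ra' and 'dec' in degrees
catalog = SkyCoord(ra=my_table['ra'] * u.deg, dec=my_table['dec'] * u.deg)

# Keep only rows within a generous radius of any field center
keep = np.zeros(len(catalog), dtype=bool)
for center in field_centers:
    keep |= catalog.separation(center) < 1.0 * u.deg

subset = my_table[keep]
print(len(subset), 'of', len(my_table), 'rows overlap the DP1 fields')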

On that topic (and for others who might be reading this): DP1 is a "preview" in two senses. The data set itself is a small preview, and the RSP is in its "preview era". The user table upload and cross-match functionality are not (as I understand it) in their final form, and improvements are underway. I wouldn't take current performance with very large tables as a good indicator of how to deal with the data sets coming down the pipe in the future.

Side note, there’s a 32MB limit on user-uploaded tables.

Since we did get to a solution on the topic of how to cross-match, I’m going to mark one of my posts above as the solution. But please don’t hesitate to reply on this topic or to open a new topic for new questions any time.