Gaia DR2 reference catalog in LSST format

Gaia DR2 refcat on lsst-dev: gaia_dr2_20200414

The complete Gaia DR2 catalog is now available on lsst-dev at /datasets/refcats/htm/v1/gaia_dr2_20200414 (see RFC-634). This catalog contains about 1.7 x 10^9 sources with G magnitudes between roughly 7 and 21 in 131,073 HTM pixels (one file per pixel, shard level 7). The data totals about 310 GB. The directory contains a README.txt with more detailed information.

This is the first LSST refcat to contain coordinate errors, proper motions, and parallaxes. The coordinate errors will be incorporated into future runs of jointcal to improve its fitting and uncertainty estimates. The LSST Science Pipelines currently have some capacity for incorporating proper motion, but do not have any facility for using parallaxes: we plan to use this refcat to help develop such functionality.

Because of the large number of files, we do not recommend running ls or using tab-completion within this directory. Other than the HTM files, the directory contains: README.txt with some summary information; config.py used by LoadIndexedReferenceObjectsTask when reading the refcat; and IngestIndexedReferenceTask.py containing the configuration used to generate the refcat from the original data.
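If you do need to look at a few of the shard files, one option is to iterate over the directory lazily instead of listing it in full. The sketch below is only an illustration and not part of the refcat tooling: peek_entries is a hypothetical helper name. It relies on the standard library's os.scandir, which yields entries one at a time rather than reading and sorting the whole directory the way ls does.

```python
import itertools
import os


def peek_entries(directory, count=5):
    """Return the names of up to `count` entries without listing the whole directory.

    The order is whatever the filesystem returns; nothing is sorted.
    """
    with os.scandir(directory) as entries:
        return [entry.name for entry in itertools.islice(entries, count)]
```

For example, peek_entries("/datasets/refcats/htm/v1/gaia_dr2_20200414", 3) would return three arbitrary file names from the refcat directory.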

Using this refcat

The HSC datasets gen2 butler repository contains a ref_cats symlink to the HTM v1 reference catalogs, so no path updates are necessary to use this new refcat with HSC. Other lsst-dev /datasets may need an appropriate symlink created to make this refcat available to their respective butlers.

You can use this reference catalog with processCcd by specifying a configuration file containing the following:

config.processCcd.calibrate.astromRefObjLoader.ref_dataset_name = "gaia_dr2_20200414"
config.processCcd.calibrate.astromRefObjLoader.anyFilterMapsToThis = "phot_g_mean"

or, with an LSST Science Pipelines version older than v21 (prior to anyFilterMapsToThis):

config.processCcd.calibrate.astromRefObjLoader.ref_dataset_name = "gaia_dr2_20200414"
config.processCcd.calibrate.astromRefObjLoader.filterMap = {}
for source in ('u', 'g', 'r', 'i', 'z', 'y'):
    config.processCcd.calibrate.astromRefObjLoader.filterMap[source] = "phot_g_mean"

As of v21, jointcal's default configuration uses Gaia DR2, so no additional configuration is necessary. For Pipelines versions older than v21, use the following configuration to use this refcat with jointcal:

config.astrometryRefObjLoader.ref_dataset_name = 'gaia_dr2_20200414'
# This refcat includes coordinate errors, so we don't have to fake them!
config.astrometryReferenceErr = None

for source in ('u', 'g', 'r', 'i', 'z', 'y'):
    config.astrometryRefObjLoader.filterMap[source] = "phot_g_mean"

Note that we do not have color terms for Gaia, and the Gaia magnitudes do not readily map to any LSST or HSC bands, so we recommend this refcat only for astrometric calibration, not photometric.

If you were previously using the Gaia DR1 or the PS1 refcats, you may have to alter your astrometric configuration to account for the much higher precision of this reference catalog. We do not yet have recommendations on what configuration changes will be necessary: a magnitude or S/N cut is a likely candidate. Please post below with questions or with new astrometric fitting configurations that you have success with.

Refcat READMEs

The lsst-dev:/datasets/refcats/htm directory now contains a README.txt with some summary information about the HTM reference catalogs. It summarizes the HTM refcat directory and notes which catalog directories have separate README.txt files with more detailed information (currently only v1/gaia_dr2_20200414/ and v1/ps1_pv3_3pi_20170110/).

Producers and maintainers of new LSST-style reference catalogs can refer to the developer guide datasets page for information about what to put in a refcat README.


Warning: the reference epoch in the Gaia DR2 reference catalog listed above (20190808) is incorrect. I did not notice that Gaia specifies the reference epoch in TCB instead of UTC, resulting in a difference of about a minute. This is unlikely to matter for our uses, but we might as well get it right.
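As a rough check on the size of that difference, the offset between TCB and UTC at the Gaia DR2 reference epoch (J2015.5) can be estimated with the standard library alone. This is a sketch under stated assumptions, not the method used on the ticket: the constants are the IAU 2006 defining values relating TCB to TDB, TDB is treated as equal to TT (they differ by less than 2 ms), and the leap-second count is the value in effect from 2015-07-01. None of these numbers come from this thread.

```python
# IAU 2006 Resolution B3 constants defining TDB as a linear function of TCB.
L_B = 1.550519768e-8   # fractional rate difference between TCB and TDB
T_0 = 2443144.5003725  # Julian date (TCB) of the 1977 origin
TDB_0 = -6.55e-5       # seconds

TT_MINUS_TAI = 32.184         # seconds, fixed by definition
TAI_MINUS_UTC_2015_07 = 36.0  # leap seconds in effect from 2015-07-01

GAIA_DR2_EPOCH_JD = 2457206.375  # J2015.5 = 2451545.0 + 15.5 * 365.25


def tcb_minus_utc(jd_tcb, tai_minus_utc):
    """Approximate TCB - UTC in seconds, treating TDB and TT as equal."""
    tcb_minus_tdb = L_B * (jd_tcb - T_0) * 86400.0 - TDB_0
    return tcb_minus_tdb + TT_MINUS_TAI + tai_minus_utc


offset = tcb_minus_utc(GAIA_DR2_EPOCH_JD, TAI_MINUS_UTC_2015_07)
print(f"TCB - UTC at J2015.5 is roughly {offset:.1f} s")
```

Under these assumptions the offset comes out at a bit under a minute and a half, which is the same order as the difference quoted above.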

I’m fixing it on DM-22027, and will post an announcement here when the updated catalog is available on lsst-dev.

As we are not using parallax or proper motion in production currently, this error has no effect on users of this refcat, so if you have been making use of Gaia DR2, please continue to do so without worry.

Hat tip to Siegfried Eggl for pointing this out to me.

The updated Gaia DR2 refcat with the corrected epochs is now in place as gaia_dr2_20191105. I’ve updated the above post to refer to it instead of the 20190808 version, and also updated the respective readmes in the /datasets/refcats/htm directories.

I have discovered that the above Gaia DR2 reference catalogs have incorrect coordinate error fields (coord_raErr, coord_decErr) due to a unit conversion error. This primarily affects jointcal, as our AstrometryTask does not use the reference catalog errors when fitting single frame astrometry. If you have another use of this reference catalog that incorporates the coordinate errors, please update immediately.
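The thread does not say what the unit conversion mistake was, so the following is only an illustration of the kind of conversion involved. It assumes that the Gaia source catalog gives positional uncertainties in milliarcseconds and that the LSST refcat stores coord_raErr and coord_decErr in radians; the function names are hypothetical.

```python
import math

MAS_PER_DEGREE = 3600.0 * 1000.0  # 3600 arcsec per degree, 1000 mas per arcsec


def mas_to_radians(value_mas):
    """Convert an angle in milliarcseconds to radians."""
    return math.radians(value_mas / MAS_PER_DEGREE)


def radians_to_mas(value_rad):
    """Convert an angle in radians to milliarcseconds."""
    return math.degrees(value_rad) * MAS_PER_DEGREE


err_rad = mas_to_radians(0.5)
print(f"0.5 mas = {err_rad:.3e} rad")
```

Converting a few refcat error values back to milliarcseconds with radians_to_mas and comparing them against the original Gaia rows is a quick way to confirm that a given catalog version has sensible units.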

I have corrected this as part of DM-24472, and regenerated the LSST reference catalog. The new catalog is now available on lsst-dev as gaia_dr2_20200414 at /datasets/refcats/htm/v1. I have updated the above post to refer to it instead.


Very much a newbie question: is there an external site I can retrieve this catalog from? I guess /datasets/refcats/htm/v1 is internal to somewhere. Thanks.

I’ve put them up here. Be warned that they are not small.

57G     gaia_DR1_v1
301G    gaia_dr2_20200414
415G    ps1_pv3_3pi_20170110
20G     sdss_dr9

I’ve started to pull the Gaia files over. We noticed that the PS reference catalogs we pulled previously have a small astrometric offset compared to Gaia.

The values in these reference catalogs come from the original catalogs, so you will have to contact those organizations about such offsets.

That said, I wouldn’t be surprised by an offset: PS1 was produced well before even Gaia DR1, so it is not baselined to Gaia.

My understanding is that ps1_pv3_3pi_20170110 is supposed to be on the Gaia astrometric system.

Oh, I’m sorry. In that case, I guess the question is whether there’s an offset between Gaia DR1 and DR2.

Anyway, either way, the important thing here is that we do not modify the reference catalog values, we just reformat the catalogs for our use.

There is a small shift between the PS1 release catalog and Gaia DR2. My understanding is that Eugene M. is aware and has corrected a version of the PS release catalog (which is what Stephen Gwyn uses to produce astrometric solutions for CFHT-Unions). We pushed a patch of HSC through the LSST process and compared it to what we get with MegaPipe (CFHT/Megaprime process), and there is an offset, which we think is the Gaia DR2-PS offset.

This is likely best discussed in a different place; I will provide more details when I have them.

PS1 is mostly, but not completely, corrected to Gaia. There are a few patches that are shifted by 0.1 to maybe 0.5 arcsec. Gene tells me he knows where they are and how to fix them; he just hasn’t done it yet. Up until recently, the patches I’ve found have been pretty small (~0.5 sq. deg) and isolated, with well-defined edges. See below:

If you add 5 years of proper motion between 2015 and 2020 you get this:

(separate post, because apparently there is a 2 image maximum)
If I compare the positions of an HSC mosaic of the same patch of sky calibrated with LSSTpipe to Gaia, I get this:


which I think is caused by Pan-STARRS issues, but it’s not 100% clear.

I think the PS1 catalog has not been corrected for proper motion, so you get a field-dependent bias.

The PS1<->Gaia fuzz is caused by proper motion. The square of shifted astrometry in PS1<->Gaia is different. However, I’m slightly puzzled by the HSC<->Gaia pattern.

The above discussion seems like it deserves its own thread.

I have generated gen3 refcat ingest index files for our primary reference catalogs (ps1, sdss-dr9, gaia-dr2) on lsst-devl at /datasets/refcats/htm/v1. These .ecsv files serve as input to the butler ingest-files command described in section 4 of How to generate an LSST reference catalog — LSST Science Pipelines. Our existing /repo/main repository already has these refcats, but this should simplify future butler repository creation.

Others can use these files to easily ingest these refcats into their own gen3 butler repositories (e.g. at the summit); if you are doing so on a system other than lsst-devl, you will need to change the paths in the files (with e.g. sed) to point to where you put the HTM indexed files.
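For those who prefer Python to sed, the path change can be sketched as below. The sample table layout (a filename column followed by an htm7 column) and the destination prefix /my/refcats are assumptions made for illustration, so check the actual .ecsv files before relying on it. rewrite_path_prefix is a hypothetical helper, not part of the Science Pipelines.

```python
def rewrite_path_prefix(ecsv_text, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix at the start of each data row.

    Lines starting with '#' are ECSV header metadata and are left untouched,
    as is any other line that does not begin with old_prefix.
    """
    out = []
    for line in ecsv_text.splitlines(keepends=True):
        if not line.startswith("#") and line.startswith(old_prefix):
            line = new_prefix + line[len(old_prefix):]
        out.append(line)
    return "".join(out)


# Hypothetical two-column index file; the real files may differ.
sample = (
    "# %ECSV 1.0\n"
    "# ---\n"
    "# datatype:\n"
    "# - {name: filename, datatype: string}\n"
    "# - {name: htm7, datatype: int64}\n"
    "filename htm7\n"
    "/datasets/refcats/htm/v1/gaia_dr2_20200414/200707.fits 200707\n"
)

updated = rewrite_path_prefix(sample, "/datasets/refcats/htm/v1", "/my/refcats")
print(updated)
```

Matching only at the start of a row keeps the header and the column-name line intact, which a blanket search and replace would not guarantee.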

Note that if you want to ingest the sdss-dr9 refcat into a gen3 repo, your local directory and the entries in the .ecsv file need to be renamed to replace the - (dash) with _ (underscore), due to - not being allowed in dataset type names in gen3.

Hello, while using a catalog generated by following the tutorial "How to generate an LSST reference catalog", I find that the RA range of each file is very different from its Dec range. For example, some RA ranges reach nearly 90 degrees, but every Dec range is below 2 degrees. This is not the behavior I expected from the article describing the Hierarchical Triangular Mesh (HTM). What is the reason for this?
Thank you!

from lsst.afw.table import SimpleCatalog

# coord_ra is stored in radians
SimpleCatalog.readFits("/data/reference/gaia_dr2_converted/gaia_dr2/200707.fits")["coord_ra"].min()
4.751403421982007
SimpleCatalog.readFits("/data/reference/gaia_dr2_converted/gaia_dr2/200707.fits")["coord_ra"].max()
6.2628046546516885
data[np.array(data["ra_max"]-data["ra_min"])>80]
Out[223]: 
        Unnamed: 0         fits      ra_max  ...    dec_max    dec_min     len
15819        15819  200707.fits  358.832275  ...  89.502004  89.027851  187662
16291        16291  200704.fits  359.855332  ...  89.982329  89.307420  187662
24020        24020  217088.fits  269.998856  ...  89.990053  89.309711  187662
24089        24089  217091.fits  267.767696  ...  89.502240  89.022557  187662
25539        25539  233472.fits  179.957123  ...  89.975759  89.312684  187662
25572        25572  233475.fits  179.732044  ...  89.502452  89.016723  187662
74359        74359  135168.fits   89.966529  ... -89.305061 -89.988888  187662
74371        74371  135171.fits   89.320275  ... -89.010581 -89.501026  187662
83364        83364  151555.fits  179.506183  ... -89.012483 -89.502108  187662
83465        83465  151552.fits  179.998548  ... -89.310643 -89.992878  187662
102883      102883  249856.fits   89.952308  ...  89.989044  89.304693  187662
102906      102906  249859.fits   89.190951  ...  89.498965  89.006812  187662
103024      103024  167939.fits  268.792543  ... -89.012688 -89.501707  187662
103141      103141  167936.fits  269.993291  ... -89.299786 -89.988852  187662
122492      122492  184320.fits  359.935063  ... -89.298323 -89.986336  187662
123457      123457  184323.fits  359.869549  ... -89.016660 -89.500646  187662

But as for dec:

data[np.array(data["dec_max"]-data["dec_min"])>2]
Out[226]: 
Empty DataFrame
Columns: [Unnamed: 0, fits, ra_max, ra_min, dec_max, dec_min, len]
Index: []

This looks to me like regular spherical geometry: the files with large RA ranges are all polar (|dec| ~ 90). I recommend you plot the points (with a suitable projection!) and verify for yourself that the points for each file are compact on the sky.
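To make the geometry concrete, here is a small standard-library sketch that computes the great-circle separation between two points with the haversine formula. The function name and the example coordinates are chosen for illustration only: two points 90 degrees apart in RA are compared at dec = 89.5 and at the equator.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula: numerically well behaved for small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))


near_pole = angular_separation_deg(0.0, 89.5, 90.0, 89.5)
at_equator = angular_separation_deg(0.0, 0.0, 90.0, 0.0)
print(f"{near_pole:.2f} deg near the pole, {at_equator:.2f} deg at the equator")
```

Near the pole the same 90 degree span in RA corresponds to well under one degree on the sky, so a file whose RA range is large can still cover a compact region.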
