Producing the final catalogues

Hi,

In a previous question I asked about using the isInner flags to stack the final catalogues into something like the public measurement and forced catalogues that are available for HSC PDR2. It was mentioned that I might be able to get hold of the scripts used to generate the qserv files for these. Could somebody point me to them?

I am presuming that we essentially loop over the patches and make a qserv file for each patch, removing objects that aren’t in the inner region. In pseudocode, I am thinking of implementing this myself with something like:

for tract, patch in patches:
    for filter in filters:
        cat = butler.get('deepCoadd_forced_src',
                         {'filter': filter, 'tract': tract, 'patch': patch})
        refTable = butler.get('deepCoadd_ref',
                              {'filter': filter, 'tract': tract, 'patch': patch})
        # keep only objects whose reference detection lies in the inner
        # patch/tract region; the deep copy makes the subset contiguous
        # so it can be converted to an astropy table
        inner = refTable['detect_isPatchInner'] & refTable['detect_isTractInner']
        cat = cat[inner].copy(deep=True)
        cat = cat.asAstropy()
        # convert instFluxes to mags/Jy
        # convert to qserv
        cat.write('./data/tablesForPostGresIngestion/'
                  'forcedCat_{}_{}_{}.csv'.format(tract, patch, filter))

However, there is clearly more to it than that, due to things such as splitting the tables up to reduce the number of columns, converting instrumental fluxes to AB magnitudes with instFluxToMagnitude, and joining the bands together while dropping duplicated columns. In an ideal world I could experiment on the actual PDR2 data available online by putting an obs_subaru _mapper file there and running on a test region. I could then modify the script for my own VISTA/HSC data.
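For the magnitude conversion and the band join specifically, I imagine something along these lines (just a sketch continuing the loop above; I am assuming the calibration can be taken from deepCoadd_calexp, and catG/catR below are placeholders for per-band tables sharing the same object id column):

import numpy as np
from astropy.table import join

# Sketch: convert a forced instFlux column to AB magnitudes using the coadd calibration
photoCalib = butler.get('deepCoadd_calexp',
                        {'filter': filter, 'tract': tract, 'patch': patch}).getPhotoCalib()
cat['psfMag_{}'.format(filter)] = np.array(
    [photoCalib.instFluxToMagnitude(f) for f in cat['base_PsfFlux_instFlux']])

# Sketch: join two per-band astropy tables on the shared object id; duplicated
# non-id columns get per-band suffixes, which could then be pruned to one copy
merged = join(catG, catR, keys='id', table_names=['g', 'r'])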

Many thanks for the continuing support of the community!

Raphael.

HSC PDR2 was produced using a custom database (not associated with LSST’s qserv) by the HSC team at NAOJ. I’m not sure what LSST uses for database loading.

If you write a database loader yourself, I recommend not using CSV files, as the read overhead required for large loads gets prohibitive.
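For instance, if the per-patch tables are astropy tables, a Parquet sketch (assuming pyarrow is installed; the path below is just a placeholder) might look like:

# Sketch: write the per-patch table as Parquet instead of CSV
df = cat.to_pandas()
df.to_parquet('./data/forcedCat_{}_{}_{}.parquet'.format(tract, patch, filter))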

Thank you. I will contact the team at NAOJ. We are working on ingesting the catalogues using a qserv system so I will share any progress on that front here when we have run tests.

To get an idea as to how LSST is planning to generate final output catalogs, you may want to look at https://github.com/lsst/pipe_tasks/blob/master/python/lsst/pipe/tasks/postprocess.py.

In particular, we may need to merge together multiple constituent intermediate catalogs, add calibration information, and compute functions of the available data in order to provide the final columns in the Science Data Model. The resulting output catalogs will be in Parquet format. These Parquet files would then be partitioned, replicated, and loaded into Qserv.
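As a very rough illustration of the merging step (this is not the actual Task code; the band names, DataFrames, and file name below are placeholders):

import pandas as pd

# Sketch: per-band DataFrames indexed by objectId are concatenated into one wide
# table with a (filter, column) MultiIndex on the columns, then written to Parquet
# for the downstream transform and Qserv partitioning/loading steps.
perBand = {'g': dfG, 'r': dfR}   # placeholder per-band DataFrames
merged = pd.concat(perBand, axis=1, names=['filter', 'column'])
merged.to_parquet('objectTable_example.parquet')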
