Merging patches into a final large area catalogue

raphaelshirley · September 7, 2020, 4:31pm

Hi,

I have run the photometry pipeline on a number of patches and want to combine them into a final matched catalogue, as has been done for the HSC PDR2 public catalogues. Is there already a task for merging patches, or do I need to write a specific one for my dataset? Is there a command-line task or similar that would recreate the final HSC PDR2 catalogues? Presumably this involves a positional cross-match, taking the lower-error measurement in cases of duplicates, but I wondered whether cross-match radii etc. are computed automatically by a pipe task.

Alternatively, are detections already consistent between patches at the mergeCoaddDetections stage? In that case, can I simply merge patches by object ID, or use the Butler to retrieve a catalogue for multiple adjacent patches?

Many thanks,

Raphael.

price · September 9, 2020, 7:16pm

The HSC PDR2 catalogs are hosted in a PostgreSQL database, unrelated to anything developed by LSST, and the data were ingested using custom scripts.

Position-based matching shouldn’t be necessary: objects in different filters should have the same object ID. (Uniqueness of object IDs across tracts+patches requires coordinating some things in the mapper with the choice of skyMap.)
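If the object IDs do line up between filters, combining per-band tables is a plain join on the ID rather than a positional match. A minimal sketch with pandas, assuming each band's catalogue has already been loaded into a DataFrame with an ID column called `id` (the function name, column names and values here are illustrative, not part of the pipeline):

```python
import pandas as pd

def merge_bands(band_catalogs, id_column="id"):
    """Join per-band tables on a shared object ID (hypothetical helper).

    band_catalogs maps a band name to a DataFrame. Every non-ID column is
    prefixed with the band name so that measurements do not collide.
    """
    merged = None
    for band, cat in band_catalogs.items():
        renamed = cat.rename(
            columns={c: f"{band}_{c}" for c in cat.columns if c != id_column}
        )
        if merged is None:
            merged = renamed
        else:
            # Outer join keeps objects that are missing from one band
            merged = merged.merge(renamed, on=id_column, how="outer")
    return merged

# Toy tables with made-up fluxes
g_band = pd.DataFrame({"id": [1, 2], "flux": [10.0, 20.0]})
r_band = pd.DataFrame({"id": [1, 2], "flux": [11.0, 21.0]})
merged_bands = merge_bands({"g": g_band, "r": r_band})
```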

I believe there are scripts to generate Parquet files that may have merged data from different filters, but I don’t know anything about that.

raphaelshirley · September 9, 2020, 8:38pm

Hi,

Thank you for getting back to me. Yes, I was aware that for a given patch the IDs are consistent between bands after the detection merge, but I was thinking about the overlap between patches and tracts. I will run some tests between adjacent patches and check that my mapper/SkyMap configs are correct and that my IDs are consistent between them. I have copied the SkyMap config from obs_subaru, so hopefully that is already the case.

I am not familiar with Parquet files, so I will also look at how to generate those.

Thanks again.

Raphael.

price · September 11, 2020, 4:13pm

There is duplication of objects due to the overlaps between patches and tracts, but there are flags indicating whether the sources in an individual catalog are in an area that would make them unique: detect_isPatchInner and detect_isTractInner. There is an additional flag, detect_isPrimary, which combines those flags with a deblender criterion (deblend_nChild == 0) and is the usual way people identify unique sources.
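In practice, building a combined catalogue can then be as simple as selecting on that flag and concatenating. A rough sketch, assuming the per-patch catalogues have been loaded into pandas DataFrames that carry a boolean `detect_isPrimary` column (the helper name and the toy IDs and coordinates are made up for illustration):

```python
import pandas as pd

def stack_primary(patch_catalogs, flag="detect_isPrimary"):
    """Concatenate per-patch tables, keeping only rows where the flag is set."""
    kept = [cat.loc[cat[flag].astype(bool)] for cat in patch_catalogs]
    return pd.concat(kept, ignore_index=True)

# Toy example: the source at ra=150.30 falls in the overlap and appears in
# both patches, but it is flagged primary in only one of them.
patch_a = pd.DataFrame({
    "id": [101, 102, 103],
    "ra": [150.10, 150.20, 150.30],
    "detect_isPrimary": [True, True, False],
})
patch_b = pd.DataFrame({
    "id": [201, 202],
    "ra": [150.30, 150.40],
    "detect_isPrimary": [True, True],
})
merged = stack_primary([patch_a, patch_b])
```

Each sky position ends up in the stacked table once, with no cross-match radius to choose.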

raphaelshirley · September 13, 2020, 10:50am

Thank you, I understand now. I can simply set the is*Inner flags to true and stack all the objects. Could there be very subtle effects near the boundary, where the position of an extended object might fall just over the boundary in one patch but not in the other? Or is this such a minor effect that it is not worth worrying about?

Thanks again,

Raphael.

price · September 14, 2020, 1:38pm

You shouldn’t need to set any flags; they will be set by the measurement scripts in the pipeline.

Yes, there may be subtle effects (especially due to different deblending). I don’t think anyone has noticed any, but they exist in theory.
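If you want to check your own data for this, one option is to look for primary sources that sit suspiciously close to each other after stacking. A brute-force sketch with NumPy, assuming coordinates in degrees; the 0.5 arcsec radius is an arbitrary choice for illustration, the flat-sky approximation ignores RA wrap-around at 0/360, and the O(N^2) cost means it only suits small samples such as sources near a patch edge:

```python
import numpy as np

def close_pairs(ra_deg, dec_deg, radius_arcsec=0.5):
    """Return index pairs (i, j) with i < j closer than radius_arcsec.

    Hypothetical helper: flat-sky approximation, all pairs compared directly.
    """
    ra = np.asarray(ra_deg, dtype=float)
    dec = np.asarray(dec_deg, dtype=float)
    # Scale the RA offset by cos(dec) to get an on-sky distance
    dra = (ra[:, None] - ra[None, :]) * np.cos(np.radians(dec))[:, None]
    ddec = dec[:, None] - dec[None, :]
    sep_arcsec = np.hypot(dra, ddec) * 3600.0
    # Keep only the upper triangle so each pair is reported once
    i, j = np.where(np.triu(sep_arcsec < radius_arcsec, k=1))
    return list(zip(i.tolist(), j.tolist()))

# Toy coordinates: the first two sources are about 0.18 arcsec apart
pairs = close_pairs([150.0, 150.00005, 150.1], [2.0, 2.0, 2.0])
```

Any pairs this returns among detect_isPrimary sources would be candidates for the boundary effect described above.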

MelissaGraham · October 21, 2020, 7:57pm

Hi @raphaelshirley, I’m following up on Support posts without marked solutions and was wondering if @price’s suggestions worked in your case – were you able to make a single merged catalog?

raphaelshirley · October 21, 2020, 9:33pm

Yes, the is*Inner flags solve this issue. It would still be helpful to get hold of the scripts that were used to produce the final catalogues for HSC PDR2.