DESC is finding that the HATS version of DP1 contains duplicate forcedSourceOnDiaObjectId values. These duplicates have subsequently also been found in the Butler SQL tables and the Butler Parquet files, i.e. they do not appear to be an artifact of the process that creates the HATS files.
For example, forcedSourceOnDiaObjectId = 600386155389124609 has four entries in dia_object_forced_source:
      diaObjectId         parentObjectId  coord_ra   coord_dec  ...  invalidPsfFlag  tract  patch  forcedSourceOnDiaObjectId
208   648368125565206530  0               38.060371  7.411335   ...  False           10463  90     600386155389124609
4529  648375547268694017  0               38.172620  7.433099   ...  False           10464  98     600386155389124609
3527  650018080201637889  0               38.189089  7.510324   ...  False           10704  0      600386155389124609
1517  650025570624602120  0               38.265854  7.446826   ...  False           10705  9      600386155389124609
These four entries come from four separate Parquet files, one per tract/patch:
file:///sdf/group/rubin/repo/dp1/LSSTComCam/runs/DRP/DP1/v29_0_0/DM-50260/20250419T073356Z/dia_object_forced_source/10705/9/dia_object_forced_source_10705_9_lsst_cells_v1_LSSTComCam_runs_DRP_DP1_v29_0_0_DM-50260_20250419T073356Z.parq
file:///sdf/group/rubin/repo/dp1/LSSTComCam/runs/DRP/DP1/v29_0_0/DM-50260/20250419T073356Z/dia_object_forced_source/10464/98/dia_object_forced_source_10464_98_lsst_cells_v1_LSSTComCam_runs_DRP_DP1_v29_0_0_DM-50260_20250419T073356Z.parq
file:///sdf/group/rubin/repo/dp1/LSSTComCam/runs/DRP/DP1/v29_0_0/DM-50260/20250419T073356Z/dia_object_forced_source/10463/90/dia_object_forced_source_10463_90_lsst_cells_v1_LSSTComCam_runs_DRP_DP1_v29_0_0_DM-50260_20250419T073356Z.parq
file:///sdf/group/rubin/repo/dp1/LSSTComCam/runs/DRP/DP1/v29_0_0/DM-50260/20250419T073356Z/dia_object_forced_source/10704/0/dia_object_forced_source_10704_0_lsst_cells_v1_LSSTComCam_runs_DRP_DP1_v29_0_0_DM-50260_20250419T073356Z.parq
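A minimal sketch of how such duplicates can be flagged once the rows are loaded, in case it helps others reproduce the check. It uses only the Python standard library and hard-codes the four rows quoted above (a subset of their columns); the helper name `group_duplicates` is hypothetical and is not part of any Rubin or HATS tooling. With the tables loaded into pandas, `DataFrame.duplicated(keep=False)` on the ID column should give the equivalent mask.

```python
from collections import defaultdict

ID_COL = "forcedSourceOnDiaObjectId"

def group_duplicates(rows, id_col=ID_COL):
    """Map each ID that occurs more than once to the list of rows carrying it."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[id_col]].append(row)
    return {key: members for key, members in by_id.items() if len(members) > 1}

# The four rows quoted in this report (only the columns needed here).
rows = [
    {"diaObjectId": 648368125565206530, "tract": 10463, "patch": 90, ID_COL: 600386155389124609},
    {"diaObjectId": 648375547268694017, "tract": 10464, "patch": 98, ID_COL: 600386155389124609},
    {"diaObjectId": 650018080201637889, "tract": 10704, "patch": 0, ID_COL: 600386155389124609},
    {"diaObjectId": 650025570624602120, "tract": 10705, "patch": 9, ID_COL: 600386155389124609},
]

duplicates = group_duplicates(rows)
for forced_id, members in duplicates.items():
    # List the (tract, patch) pairs in which the same ID appears.
    locations = sorted((r["tract"], r["patch"]) for r in members)
    print(forced_id, len(members), locations)
```

For the example ID this reports four rows, each in a different tract/patch and each attached to a different diaObjectId.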
For now we can work around this by matching on the tuple (DIAObjectID, DIASourceID), but this behavior seems unexpected and will be a significant issue for a survey covering a larger area than DP1.
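The tuple-matching workaround could look roughly like the sketch below. It is written under an assumption: it pairs `diaObjectId` with `forcedSourceOnDiaObjectId`, the two ID columns visible in the table above, which may not be exactly the tuple intended here. The helper `build_composite_index` is hypothetical and only illustrates a lookup keyed on a composite key, with a check that the composite key really is unique.

```python
def build_composite_index(rows, key_cols=("diaObjectId", "forcedSourceOnDiaObjectId")):
    """Index rows by a tuple of columns, failing loudly if the tuple is not unique."""
    index = {}
    for row in rows:
        key = tuple(row[col] for col in key_cols)
        if key in index:
            raise ValueError(f"composite key {key} is not unique")
        index[key] = row
    return index

# Same four rows as quoted in this report (subset of columns).
rows = [
    {"diaObjectId": 648368125565206530, "tract": 10463, "patch": 90,
     "forcedSourceOnDiaObjectId": 600386155389124609},
    {"diaObjectId": 648375547268694017, "tract": 10464, "patch": 98,
     "forcedSourceOnDiaObjectId": 600386155389124609},
    {"diaObjectId": 650018080201637889, "tract": 10704, "patch": 0,
     "forcedSourceOnDiaObjectId": 600386155389124609},
    {"diaObjectId": 650025570624602120, "tract": 10705, "patch": 9,
     "forcedSourceOnDiaObjectId": 600386155389124609},
]

index = build_composite_index(rows)
# Look up one specific forced source by (diaObjectId, forcedSourceOnDiaObjectId).
match = index[(650018080201637889, 600386155389124609)]
print(match["tract"], match["patch"])
```

This only holds for the four rows shown; whether the pair is unique across the full table has not been verified here.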
To stress, these duplicates have been found in three places: the Butler Parquet files, the Butler SQL tables, and the HATS files.
Non-unique IDs will make database searches much more complicated. How are users to query for a specific forced source?