There is a problem with source association, where multiple DiaObjects exist at the same position within a 0.5’’ radius. Based on the source association pipeline tasks, DiaSources within a 0.5’’ radius should be matched to the pre-existing DiaObjects instead of having a new one created, but this does not seem to be happening in DP0.2.
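To make the expected behaviour concrete, here is a minimal sketch of radius-based association as I understand it from the description above. This is not the pipeline's actual implementation: the function names `separation_arcsec` and `associate` are hypothetical, and the rule "match to the nearest existing object within 0.5’’, otherwise create a new one" is my assumption about the intended logic.

```python
import numpy as np

MATCH_RADIUS_ARCSEC = 0.5  # association radius quoted in the post


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation (haversine formula); inputs in degrees, output in arcsec."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2) ** 2
         + np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(a))) * 3600.0


def associate(sources, radius_arcsec=MATCH_RADIUS_ARCSEC):
    """Hypothetical association loop over (ra, dec) detections in degrees.

    Each source is matched to the nearest existing object within the
    radius; if none is close enough, a new object is created at the
    source position. Returns (object index per source, object positions).
    """
    objects = []      # position of the first detection of each object
    assignments = []  # index into `objects` for every source
    for ra, dec in sources:
        if objects:
            obj = np.array(objects)
            seps = separation_arcsec(ra, dec, obj[:, 0], obj[:, 1])
            nearest = int(np.argmin(seps))
            if seps[nearest] <= radius_arcsec:
                assignments.append(nearest)
                continue
        objects.append((ra, dec))
        assignments.append(len(objects) - 1)
    return assignments, objects


# Two detections 0.3'' apart and a third one 2'' away from the first.
sources = [(60.0, -35.0),
           (60.0, -35.0 + 0.3 / 3600.0),
           (60.0, -35.0 + 2.0 / 3600.0)]
assignments, objects = associate(sources)
```

Under this rule the first two detections share one object and only the third creates a new one, so finding several distinct diaObjectId values inside a single 0.5’’ cone is what looks unexpected.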
I ran a test to see how prevalent this issue is by grabbing 1000 DiaObjects (the first 1000 returned with more than 25 DiaSources each) and then doing a 0.5’’ coordinate search at the ra and dec of each one to count how many distinct DiaObjects are located at that position.
import numpy as np

# `service` is assumed to be a TAP service object already created in this session.
nDiaSources_min = 25

# Grab 1000 DiaObjects that have more than nDiaSources_min associated DiaSources.
results = service.search("SELECT TOP 1000 "
                         "ra, decl, diaObjectId, nDiaSources "
                         "FROM dp02_dc2_catalogs.DiaObject "
                         "WHERE nDiaSources > " + str(nDiaSources_min) + " ")
DiaObjs = results.to_table()
del results

# NDup[i] = number of distinct diaObjectIds among the DiaSources near DiaObject i.
NDup = np.zeros(len(DiaObjs))
for i in range(len(DiaObjs)):
    ra = DiaObjs['ra'][i]
    decl = DiaObjs['decl'][i]
    # 0.5'' radius coordinate search (0.5 / 3600 deg ~ 0.000139 deg)
    results = service.search("SELECT ra, decl, diaObjectId, diaSourceId, ccdVisitId, "
                             "filterName, midPointTai "
                             "FROM dp02_dc2_catalogs.DiaSource "
                             "WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "
                             "CIRCLE('ICRS', " + str(ra) + ", "
                             + str(decl) + ", 0.000139)) = 1 ", maxrec=100000)
    DiaSrcs = results.to_table()
    del results
    NDup[i] = len(set(DiaSrcs['diaObjectId']))
From running the code above and executing the following:
len(NDup[NDup>1])/len(NDup)
I find that ~70% of the DiaObjects have at least one more DiaObject at the same position.
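For anyone who wants to check the statistic without rerunning the queries, the fraction can be sketched on a toy array. The values below are made up purely for illustration and are not DP0.2 results; `duplicate_fraction` is a hypothetical helper name. A count of 1 means only the target DiaObject's own ID was found in the cone, so anything above 1 signals at least one other DiaObject.

```python
import numpy as np


def duplicate_fraction(n_dup):
    """Fraction of entries with more than one distinct diaObjectId in the cone."""
    n_dup = np.asarray(n_dup)
    # The mean of a boolean mask equals len(n_dup[n_dup > 1]) / len(n_dup).
    return float(np.mean(n_dup > 1))


# Toy counts (illustrative only): three of the five positions have duplicates.
toy_counts = [1, 2, 1, 3, 2]
frac = duplicate_fraction(toy_counts)
```

This gives the same number as the `len(NDup[NDup>1])/len(NDup)` expression above, just written as the mean of a boolean mask.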
Since this issue could impact Rubin science (e.g. transient and variable statistics, light curves), it would be great to understand and address what might be causing this.
Lastly, I’ll note that this problem is related to my previous post, here: