How are DIAObjects created?

deppep · November 11, 2025, 2:21pm

Hello,

I’m trying to understand the exact algorithm used for creating the DiaObject table. According to the DP1 paper:

The DiaObject catalog contains the astrophysical objects that DiaSources are associated with (i.e., the “DiaObjects”). The DiaObject catalog only contains non-Solar System Objects; Solar System Objects are instead recorded in the SSObject catalog (see below for a description of the SSObject catalog). When a DiaSource is identified, the DiaObject and SSObject catalogs are searched for objects to associate it with. If no association is found, a new DiaObject is created and the DiaSource is associated to it.

Based on this description, my interpretation of the algorithm is:

for source in diaSources_sorted_by_time:
    # Search for compatible DiaObjects within the spatial tolerance
    candidate_objects = find_objects_within_cone(
        center=source.centroid,
        radius=source.positional_error,
    )

    if not candidate_objects:
        # No match found: create a new DiaObject and associate the source with it
        new_object = create_new_diaobject(source.id)
        source.diaObjectId = new_object.id
    else:
        # One or more matches: associate with the closest object
        closest = min(candidate_objects, key=lambda obj: distance(obj, source))
        source.diaObjectId = closest.id
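
The interpretation above can be made concrete with a minimal runnable Python sketch. Everything in it is an assumption for illustration: the `DiaObject` dataclass, the `associate` function, and the fixed 1 arcsec matching radius (used here in place of a per-source positional error) are hypothetical and are not taken from the Rubin pipelines.

```python
import math
from dataclasses import dataclass, field


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 3600.0


@dataclass
class DiaObject:
    # Hypothetical container; the position is fixed by the first associated source.
    object_id: int
    ra: float
    dec: float
    source_ids: list = field(default_factory=list)


def associate(sources, radius_arcsec=1.0):
    """Associate time-ordered (source_id, ra, dec) tuples with DiaObjects.

    radius_arcsec is a placeholder value, not a pipeline setting.
    """
    objects = []
    for source_id, ra, dec in sources:
        # Brute-force search over existing objects; a real catalog would need a spatial index.
        matches = [
            (angular_separation_arcsec(ra, dec, obj.ra, obj.dec), obj)
            for obj in objects
        ]
        matches = [(sep, obj) for sep, obj in matches if sep <= radius_arcsec]
        if matches:
            # One or more candidates: pick the closest.
            _, best = min(matches, key=lambda m: m[0])
        else:
            # No candidate: create a new DiaObject at the source position.
            best = DiaObject(object_id=len(objects) + 1, ra=ra, dec=dec)
            objects.append(best)
        best.source_ids.append(source_id)
    return objects


sources = [
    (101, 150.00000, 2.00000),
    (102, 150.00010, 2.00005),  # about 0.4 arcsec from source 101
    (103, 150.01000, 2.00000),  # about 36 arcsec from source 101
]
for obj in associate(sources):
    print(obj.object_id, obj.source_ids)
```

With these three toy sources, the first two end up in one object and the third starts a new one.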

This however is only my interpretation. I can’t find further details. In particular, I’d like to know:

1. What matching criteria are used to determine if an existing object is “compatible” with a source, exactly?
2. When multiple objects match a source, which object is selected? Is it purely nearest-neighbor by (error-weighted?) angular distance? Are other factors considered?
3. Do you expect the same strategy to be used for future data releases?

Thank you!

isullivan · November 11, 2025, 7:23pm

The algorithm for creating new DiaObjects differs between Alert Production (AP) and Data Release Production (DRP). In DRP, all of the DiaSources are available at the same time, so we perform an N-way match across all of the DiaSources within the same spatial patch, similar to your interpretation (code is here). In AP, we use the “optimistic pattern matcher B” from section 2.3 of https://arxiv.org/pdf/0710.3618
To your specific questions:

1. Association is purely based on separation, though quality cuts are used to exclude DiaSources.
2. Yes, when there are multiple matching objects to a source, the closest one is picked. In the future we intend to include a list of the next closest objects so that users can re-associate for their science.
3. I expect the association strategy to change significantly in future data releases. In particular, we intend to move to a probabilistic matcher for cases where sources are close to two objects, so that the associated object is chosen by e.g. flux instead of just separation.
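
To illustrate why separation alone can be ambiguous when a source sits close to two objects, here is a toy Python sketch. It is not the planned matcher, whose details are not specified above: the `toy_log_likelihood` score, the independent Gaussian terms, the `sigma_pos` and `sigma_flux` values, and the candidate dictionaries are all hypothetical choices made for the example.

```python
def separation_only_choice(candidates):
    """Pick the candidate with the smallest separation; also return the ranked rest."""
    ranked = sorted(candidates, key=lambda c: c["sep_arcsec"])
    return ranked[0], ranked[1:]


def toy_log_likelihood(source_flux, candidate, sigma_pos=0.5, sigma_flux=200.0):
    """Hypothetical score: independent Gaussian terms in separation and flux difference."""
    pos_term = -0.5 * (candidate["sep_arcsec"] / sigma_pos) ** 2
    flux_term = -0.5 * ((source_flux - candidate["flux"]) / sigma_flux) ** 2
    return pos_term + flux_term


# Two made-up candidate objects near one made-up source (arbitrary flux units).
candidates = [
    {"id": "A", "sep_arcsec": 0.30, "flux": 50.0},
    {"id": "B", "sep_arcsec": 0.45, "flux": 1000.0},
]
source_flux = 900.0

# Separation only: A is closest, B is kept as the next-closest alternative.
closest, others = separation_only_choice(candidates)

# Separation plus flux: B scores higher because its flux is consistent with the source.
best_scored = max(candidates, key=lambda c: toy_log_likelihood(source_flux, c))

print(closest["id"], [c["id"] for c in others], best_scored["id"])
```

In this made-up case the nearest neighbor is A, while the combined score prefers B. Keeping the ranked list of alternatives is what would let a user re-associate the source afterwards.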

deppep · November 11, 2025, 9:03pm

As always, thank you for your clear and precise answers, Ian! Till next time…