Unifying DIAObject and Object in DRP

jbosch · April 22, 2016, 9:03pm

It’s not immediately obvious from looking at it, but my diagram describing DRP processing flow implies that there’s no real distinction between DIAObjects and Objects in Data Release Production. This caused some confusion at today’s SPDWG meeting, so I’ll try to lay out my vision a little more concretely.

In Nightly processing, we associate DIASources spatially to produce DIAObjects, then aggregate DIASource measurements to fill in the DIAObject table.
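
For readers less familiar with the Nightly scheme, here is a minimal sketch of that two-step pattern (spatial association, then aggregation). The 1 arcsecond match radius, the greedy first-match clustering, the flat-sky separation, and every name below are assumptions for illustration; this is not the actual Nightly association algorithm.

```python
import math
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class DiaSource:
    ra: float    # degrees
    dec: float   # degrees
    flux: float  # arbitrary units


@dataclass
class DiaObject:
    sources: list = field(default_factory=list)

    # Aggregate quantities are simple means over the member DIASources.
    @property
    def ra(self):
        return fmean(s.ra for s in self.sources)

    @property
    def dec(self):
        return fmean(s.dec for s in self.sources)

    @property
    def mean_flux(self):
        return fmean(s.flux for s in self.sources)


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle, flat-sky separation; adequate only for a sketch."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    return math.hypot(dra, dec1 - dec2) * 3600.0


def associate(sources, radius_arcsec=1.0):
    """Greedily attach each DIASource to the first DIAObject within the radius."""
    objects = []
    for src in sources:
        for obj in objects:
            if separation_arcsec(src.ra, src.dec, obj.ra, obj.dec) <= radius_arcsec:
                obj.sources.append(src)
                break
        else:
            # No existing DIAObject is close enough, so start a new one.
            objects.append(DiaObject(sources=[src]))
    return objects
```

A purely positional match like this is exactly what gets into trouble with blends, which is the motivation for the proposal that follows.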

We could do the same in DRP, but I argue that it’s better to defer associating the DIASources with each other, and instead just wait until we’ve run detection on coadds and then associate all DIASources and coadd detections at the same time. Having access to the coadd detections (and possibly coadd pixel data) during association will give us a lot more information about how to resolve ambiguous matching due to blends. We can also do anything we could have done in associating DIASources with each other (such as reject spurious objects that have only had one detection over a long time period). This association procedure just produces Objects, which would then be a single consistent catalog that attempts to represent everything in the sky except solar system objects.

We’ll need to trace the origin of all of these Objects (even non-DIASource Objects need to record which kinds of coadds they were detected in), and in the case of DIASource-derived or partially-DIASource-derived Objects it’s straightforward to both set bits in the Object table and add an Object ID column to the DIASource table (a DIASource will be associated with exactly one Object, unless it is declared to be spurious) to link them up.
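
A rough sketch of that bookkeeping, assuming a bit-flag column on Object and a nullable Object ID column on DIASource. The flag names, record classes, and `link` helper are all hypothetical; the DPDD does not define them.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import Optional


class Origin(IntFlag):
    """Hypothetical provenance bits; the real set of coadd types is not defined here."""
    NONE = 0
    COADD_DEEP = 1
    COADD_OTHER = 2
    DIA_SOURCE = 4


@dataclass
class ObjectRecord:
    object_id: int
    origin: Origin = Origin.NONE


@dataclass
class DiaSourceRecord:
    dia_source_id: int
    object_id: Optional[int] = None  # None stands for "declared spurious"


def link(obj, dia_sources):
    """Point each DIASource at its single Object and set the Object's origin bit."""
    for src in dia_sources:
        if src.object_id is not None:
            # A DIASource is associated with exactly one Object.
            raise ValueError(f"DIASource {src.dia_source_id} already has an Object")
        src.object_id = obj.object_id
    if dia_sources:
        obj.origin |= Origin.DIA_SOURCE
```

With this layout, finding the DIASources for an Object is a lookup on `object_id` rather than a spatial match.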

Since we’re planning to do forced photometry on all Objects, DIASource-derived Objects will automatically get any forced photometry that they would have had as DIAObjects (but at better positions), and when Nightly processing queries the DR database for Objects near a new nightly DIASource, there’s no need for any additional spatial matches to find the DR DIASources associated with that Object.

That still leaves the question of what to do with the columns in the DIAObject table in DR. If we want to keep them, this is still straightforward - we can just have a DIAObject table with (conceptually) the same IDs as the Object table, but a subset of the Object table’s records, containing aggregate quantities computed from the DR DIASources associated with that Object. But I’m not sure we actually need to keep them, at least not beyond diagnostic/QA use; every quantity in the DIAObject table currently described by the DPDD is also present (but measured differently) in the Object table. I’d argue that the Object table measurements should be consistently higher-quality than their DIAObject counterparts, generally because they utilize more information:

The DIAObject positions, proper motion, parallax, and PSF flux parameters (defined by an aggregate over DIASource measurements) will be better measured by the multifit Moving Point Source model fit results in the Object table.
The lcPeriodic and lcNonPeriodic quantities in DIAObject are also present in Object, where they are determined from forced photometry rather than independent DIASource measurements.
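
If we did keep a DR DIAObject table as described above (same IDs as Object, subset of its records), it could be derived purely by grouping DIASources on their Object ID. A minimal sketch, with hypothetical column names and plain means standing in for whatever aggregates we would actually compute:

```python
from collections import defaultdict
from statistics import fmean


def build_dia_object_table(dia_sources):
    """Aggregate DIASource rows by object_id.

    dia_sources: iterable of dicts with keys object_id, ra, dec, psf_flux.
    Rows whose object_id is None (spurious) are skipped, so the result only
    contains Objects that have at least one associated DIASource.
    """
    groups = defaultdict(list)
    for row in dia_sources:
        if row["object_id"] is not None:
            groups[row["object_id"]].append(row)
    return {
        object_id: {
            "n_dia_sources": len(rows),
            # Naive means; a real version would need to handle RA wraparound.
            "ra": fmean(r["ra"] for r in rows),
            "dec": fmean(r["dec"] for r in rows),
            "psf_flux_mean": fmean(r["psf_flux"] for r in rows),
        }
        for object_id, rows in groups.items()
    }
```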

Overall, I’d say this puts the DIAObject measurements in the same position as Source: they’ll probably be superseded by other measurements, but we may not want to drop them until we can demonstrate that this is the case. In particular, I’m worried about whether multifit point source model fitting will actually be better in extremely crowded regions.

The biggest concern I have with this proposal is that it implies that we’ll run the full suite of Object measurements even for pure transients like supernovae, and that could be both wasteful (for e.g. multifit bulge+disk fits) and poorly-defined (for measurements on coadds). I don’t actually know if this is a new problem or just an unappreciated one that already existed; I think the current DPDD is quite vague about how/if DIASources can generate Objects at present. Obviously, having too many false detections in difference imaging analysis would make this an even more serious concern.

In any case, I think it’s straightforward to solve this by vertically partitioning the Object table to separate galaxy- or coadd-focused measurements and adding some language to the DPDD about when we decide we don’t need to do those measurements. We could almost certainly use the same criteria we’ll use to determine whether to mask a DIASource (because it’s transient) from the coadds or average it (because it’s variable), so it’s not like we’d have to make a scientific choice we wouldn’t have already had to make. And in no case would we be opting not to do galaxy- or coadd-focused measurements on stars.
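
To make the partitioning idea concrete, here is a small sketch. It assumes the transient/variable decision has already been made upstream and is available as a per-Object label; the class names, the label values, and the two partition names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Variability(Enum):
    STATIC = "static"
    VARIABLE = "variable"    # averaged into the coadds
    TRANSIENT = "transient"  # masked from the coadds


@dataclass
class Candidate:
    object_id: int
    variability: Variability


def partition(candidates):
    """Return (core_ids, extended_ids).

    Every Object gets a row in the core partition (e.g. point-source fits and
    forced photometry). Only non-transient Objects get a row in the
    galaxy- or coadd-focused partition.
    """
    core_ids = [c.object_id for c in candidates]
    extended_ids = [
        c.object_id
        for c in candidates
        if c.variability is not Variability.TRANSIENT
    ]
    return core_ids, extended_ids
```

Note that variable Objects stay in the extended partition, so stars always get the full set of measurements.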

ktl · April 22, 2016, 10:30pm

Two questions:

Is the density of coadded galaxies high enough that every DIASource will match one of them?
Where does the SSObject generation happen? I’m guessing we take the DIASources and coadd detections, associate them to form Objects (both coincident with coadd detections and not), use any unassociated DIASources to try to generate SSObjects, and any remaining DIASources after that become Objects as well. Perhaps there is a preliminary step of associating DIASources with an input SSObject catalog (but there be feedback dragons to watch out for).

jbosch · April 24, 2016, 5:13pm

I think the right way to look at this is that the vast majority of DIASources will in fact be physically associated with a static-sky object that LSST detects. That will obviously be true of variable stars and AGN, and I suspect it will be true for most supernovae (at least by the end of the survey) - even if a supernova is significantly brighter than its host galaxy, we’ll have enough observations of the host galaxy to make up for that (I’d be curious to hear e.g. @mwv’s opinion on this).

So I think it might be true that most DIASources will be blended with a galaxy, but that’s not necessarily a bad thing. It would be extremely problematic for a naive matching algorithm that just used centroids to make its decisions, but our deep association algorithm has always had to be more clever than that. I think that will have to include things like preferring to associate DIASources with coadd-detected point sources over galaxies, treating apparent supernovae (as ~month timescale transients) specially, and inferring movement for high-proper motion stars.

DIASources that correspond to solar system objects are a different case - there the overlap with galaxies is a serious problem…

I had been assuming we’d do the latter (a preliminary step that associates DIASources with an input SSObject catalog), and that the feedback dragons here are the sort we would have to find some way to slay.

That’s largely because I’d be worried about only using unassociated DIASources to generate SSObjects; as I mentioned above, I am worried that we’d incorrectly associate too many SSObject DIASources with galaxies, and that would compromise our ability to build SSObjects.

How does this work on the Nightly (or Daily) side? I’d assumed that essentially all DIASources would be input to (day) MOPS, but now that I think about that it sounds like it makes MOPS really, really hard. And if we’re only inputting unassociated DIASources into (day) MOPS there, that would make this proposal for DRP problematic.

ktl · April 25, 2016, 4:42pm

Indeed, the baseline has been to only send unassociated DIASources to DayMOPS. But those associations are only to prior DIAObjects, not to L2/DRP Objects, so it’s a bit better.

ivezic · May 2, 2016, 4:36am

Just a quick philosophical, and perhaps obvious, point: when we planned to replace the nightly Level 1 db with a new Level 1 db produced during DRP, both dbs of course had to have the same structure. But if we stick to the new plan where we have nightly Level 1 db evolve for all 10 years w/o being overwritten, then we will have more freedom to optimize what we do in DRPs, i.e. w/o the constraint to have the same db structure as in nightly processing. At least in principle, we could give up DIAObjects, or redefine what DIASource means and/or contains, etc. (on the grounds that the purpose and goal of nightly processing, i.e. fast response, is very different from DRP goals).

I also think that, in addition to DIASources, we need to add Sources to the Object list. I will discuss this proposal in a separate thread.