Meeting Notes: Level 1 Processing in DRP

Meeting happened at 11:30am Pacific on 2/16/2016.

Present: @davidciardi, @xiuqin, @gpdf, @ktl, @mjuric, @jbosch, @hsinfang, @rhl, @ksk, @jdswinbank (and maybe others?)

Summary (from @jbosch’s memory; please augment if anything important is missing):

  • DPDD states that we’ll do “Level 1 reprocessing” during each data release, and use this to replace the Level 1 database with results from using updated algorithms, templates, calibrations, etc. This reprocessing should include all observations up to the actual data release (more specifically, the Level 1 database replacement point), and it should use the same algorithms and pre-built inputs that will be used in Level 1 nightly processing that happens after this point.

  • The moving object database also needs to be refreshed at each data release. Given that this is also updated daily during normal Level 1 operations, it’s not clear whether there’s a meaningful distinction between this database and the rest of the Level 1 database aside from the fact that the moving objects can’t be localized to a particular part of the sky.

  • We also need to do image subtraction, DiaSource detection, DiaObject generation, and MOPS during the main Level 2 processing in slightly different ways. We’ll want to leverage the same calibration and low-level processing done for the rest of Level 2, probably make more measurements on DiaObjects and DiaSources than we would nightly, and do a better job of associating DiaObjects with other kinds of Objects. We’ll also need to use this image subtraction to find artifacts to mask when building coadds. The inputs to this processing must be just those identified at the start of DRP processing, because it needs to mostly happen in an order determined spatially, not temporally.

  • We have quite a bit of flexibility in choosing input cutoff dates and database refresh dates, and don’t need to make them the same to the extent implied by the DPDD if that helps resolve problems. It’s not obvious that it would.

  • We need to build coadds as templates before we can do image subtraction in Level 2, but we also need image subtraction to generate masks before we can build coadds. Proposal on the table is to do PSF-matched coaddition with outlier rejection prior to image subtraction, then do other forms of coaddition (and multifit) after image subtraction.

  • From a scientific standpoint, the Level 2 image differencing (etc) data products should be at least as good as the reprocessed Level 1 data products for observations before the DRP input cutoff date, but it’s not clear whether we can just use them to update the Level 1 database, because they’ll have different schemas and algorithmic provenance, making them harder to compare to subsequent measurements generated with Level 1 algorithms. It may be better to just run image subtraction twice on all images before the DRP cutoff date (once with Level 2 versions of the algorithms in spatial order, once with Level 1 versions in temporal order), and provide database functionality to join the two databases.

  • Any Level 1 reprocessing during a data release cannot start before the template generation and MOPS stages have completed in the Level 2 processing. Because MOPS in Level 2 is already expected to be a full-sky sequence point, this doesn’t introduce any new limits on the Level 2 processing, but it does mean that Level 1 reprocessing cannot be continuous throughout the year. New concern as I write up these notes: if Level 1 reprocessing depends on completing full-sky deep detection first (to generate regular Objects with which to associate), this would represent a new full-sky sequence point in Level 2 processing.

  • It may be possible to devise a Level 1 database in which updates at least appear to be continuous to the user. This is a database engineering challenge.

  • We need to put together a sequence diagram including all important dates for a data release, at a level of detail sufficient to include these dependencies.

  • Ownership is unclear for both the Level 2 variants of Level 1 algorithms and the Level 1 reprocessing during a data release. We need to make sure these don’t fall through the cracks.
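
As a starting point for the sequence diagram, the ordering constraints in the summary above can be written down as a small dependency graph and sorted. This is only a sketch: the stage names are hypothetical labels for the steps mentioned in the notes, not pipeline task names, and the graph encodes the proposal on the table (PSF-matched coadds with outlier rejection come first and serve as templates), not a settled design.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names; each stage maps to the stages it must wait for.
STAGES = {
    "psf_matched_coadd": set(),                  # templates, with outlier rejection
    "image_subtraction": {"psf_matched_coadd"},
    "dia_source_detection": {"image_subtraction"},
    "artifact_masks": {"image_subtraction"},
    "mops": {"dia_source_detection"},            # full-sky sequence point
    "other_coadds": {"artifact_masks"},
    "deep_detection": {"other_coadds"},
    "level1_reprocessing": {"psf_matched_coadd", "mops"},
}


def stage_order(stages):
    """Return one valid execution order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(stages).static_order())


def is_schedulable(stages):
    """True if the dependency graph has no cycle."""
    try:
        stage_order(stages)
    except ValueError:  # graphlib.CycleError is a subclass of ValueError
        return False
    return True


order = stage_order(STAGES)
print(order)
```

If the templates themselves had to wait for the artifact masks (for example, `dict(STAGES, psf_matched_coadd={"artifact_masks"})`), `is_schedulable` returns False, which is the coadd/image-subtraction circularity the proposal is meant to break. Adding `"deep_detection"` to the dependencies of `"level1_reprocessing"` would model the new concern raised in the notes.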

I’m not sure if this is helpful information, just muddying the waters or something that’s already clear, but the moving object database generated during DRP and the moving object database maintained nightly as a level 1 product need to be different, and it’s really unclear how to update the level 1 version with the DRP version (or how to use the level 1 version in the DRP, if at all … I’d lean towards not using it actually).

I think one of the main problems with trying to update the level 1 database with the DRP is that orbit fitting and assigning diaSources to movingObjects is an iterative process, without clear boundaries other than “the entire dataset”, that will take some time to complete as the database gets bigger.

We still need to leave the old L1 values around as the alert IDs issued on the night have to be traceable back to the original values.

I found this discussion a bit hard to follow, and I think in part that’s due to a confusion in terminology. Specifically: the DPDD is quite specific in its definition of “level 1” and “level 2”. From §2.2:

Analysis of difference images […] will result […] in Level 1 data products.

Analysis of direct images […] results are Level 2 data products. It will also include fully reprocessed Level 1 data products.

This seems to be in conflict with the discussion yesterday which draws a distinction between “Level 2 image differencing (etc) data products” and “reprocessed Level 1 data products”.

Of course, as Jim’s writeup suggests, it will be necessary to generate some intermediates based on difference imaging to produce the level 2 catalogues. I guess what’s unclear is whether they’re just intermediates, or if they’ll be an official supported part of the release.

A clear statement of the goals would help.

No, we don’t. See paragraph 5 of section 4.3.5 of the DPDD and following.

There are three potentially different things: “live” Level 1 processing, Level 1 reprocessing, and Level 2 difference image processing. There can be simplifications if any two of these are really the same; it seems unlikely that all three are the same.

If we draw a distinction between Level 1 data products and reprocessed Level 1 data products, they may then use different algorithms and input data. If we draw a distinction between Level 2 image differencing data products and reprocessed Level 1 data products, we can make the Level 1 database “smoother” with more potential consistency.

This all stems from interpretation of the DPDD in paragraph 4 of section 4.3.5, which does not anticipate having separate processing between Level 1 and Level 2 and thus assumes that the same processing can be applied to the post-DRP-cutoff data as the pre-cutoff data. This section should be rewritten to match what we expect to be able to do.

Gulp. I had always thought that the alert that got issued was never going to be deleted and that the DRP reprocessing would be distinct. How do we handle people writing papers on alerts we issue but with no ability for anyone to later obtain that alert? Asking them to do a query on the current L1 database in that part of the sky and then trying to explain why the numbers are different to what was in the paper seems less than ideal. Are the issued alerts somehow not part of the L1 database?

I’m guessing that cross-matching the original L1 database with the DRP version is much easier than the general cross match problem as there will be far fewer sources. In an ideal world a previous alert would be traceable directly to the current DRP L1 version.

The issued alerts, including all of their contents (which include a DiaObject and relevant DiaSources), are never deleted; they are available as part of an Alert Table within the L1 Database that is never overwritten. The main part of the L1 database containing DiaSources and DiaObjects will change, however.
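
To make that split concrete, here is a toy sketch using an in-memory SQLite database. Everything in it is an assumption for illustration: the table and column names are invented, the real L1 schema is not SQLite, and it assumes a DiaSource keeps the same ID across reprocessing, which is the "ideal world" case mentioned above and not something that has been decided.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Mutable part of the L1 database (hypothetical columns).
CREATE TABLE dia_source (
    dia_source_id      INTEGER PRIMARY KEY,
    flux               REAL,
    processing_version TEXT
);
-- Alert table: a frozen copy of what was sent out on the night.
CREATE TABLE alert (
    alert_id       INTEGER PRIMARY KEY,
    dia_source_id  INTEGER,
    flux_as_issued REAL
);
-- Reject any attempt to change or remove an issued alert.
CREATE TRIGGER alert_no_update BEFORE UPDATE ON alert
BEGIN SELECT RAISE(ABORT, 'alerts are immutable'); END;
CREATE TRIGGER alert_no_delete BEFORE DELETE ON alert
BEGIN SELECT RAISE(ABORT, 'alerts are immutable'); END;
""")

# Nightly processing: measure a DiaSource and issue an alert for it.
conn.execute("INSERT INTO dia_source VALUES (1, 10.0, 'nightly')")
conn.execute("INSERT INTO alert VALUES (100, 1, 10.0)")

# Data release: the main table is refreshed; the alert row is untouched.
conn.execute(
    "UPDATE dia_source SET flux = 10.4, processing_version = 'DR1' "
    "WHERE dia_source_id = 1"
)


def trace_alert(conn, alert_id):
    """Return (flux as issued, current flux, current processing version)."""
    return conn.execute(
        "SELECT a.flux_as_issued, s.flux, s.processing_version "
        "FROM alert a JOIN dia_source s USING (dia_source_id) "
        "WHERE a.alert_id = ?",
        (alert_id,),
    ).fetchone()


def can_overwrite_alert(conn, alert_id):
    """Try to modify an issued alert; the trigger should refuse."""
    try:
        conn.execute(
            "UPDATE alert SET flux_as_issued = 0 WHERE alert_id = ?", (alert_id,)
        )
    except sqlite3.DatabaseError:
        return False
    return True


print(trace_alert(conn, 100))
print(can_overwrite_alert(conn, 100))
```

With a layout like this, someone citing an alert in a paper can always recover the numbers as issued, and a join on the shared ID shows how the reprocessed measurement differs.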