DRP Overview Documentation

I’ve just pushed my first draft of an overview of the eventual Data Release Production:

The idea is that this is best visualized as a big annotated graph with tasks and datasets as nodes. But I found myself distracted by presentation issues whenever I tried to produce an actual graphical representation, so for now I’m just writing the content of the graph nodes (both tasks and datasets) as elements in YAML files; some day in the future we’ll figure out how to present that better.
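To make that concrete, here’s a sketch of what a task node and a dataset node might look like in those YAML files. Everything here is invented for illustration — the field names, the schema, and the input/output lists are not the actual design (only the `process_visit` name appears elsewhere in this post):

```yaml
# Hypothetical node entries; field names and relationships are
# illustrative only, not the real schema.
tasks:
  - name: process_visit
    inputs: [raw, calibration_products]
    outputs: [calexp, src]
    description: >
      Single-visit processing, from raw images through detection
      and measurement.

datasets:
  - name: calexp
    produced_by: process_visit
    description: Calibrated single-visit exposure.
```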

While this is only the very first step in a big project, I think there’s enough there now that it will hopefully bring more clarity than confusion to others. But there are some huge caveats:

  • No one else has signed off on this - so right now it’s just my ramblings. To improve that, I’d like to get some early feedback from @RHL, @mjuric, and @ktl as to whether it’s mostly going in the right direction. Eventually we’ll need to make sure it’s formally consistent with other existing documentation (mostly LDM-151 and the DPDD). My intent is that the differences will be resolved in favor of this document, as it reflects more recent thinking, but those differences of course need to be negotiated.

  • There isn’t necessarily a 1-1 mapping between “tasks” I imagine here and pipe.base.Task classes I expect us to write, or between “datasets” here and Butler datasets, so please don’t worry about differences in that now (especially in naming). It’s also not safe to assume that I’m using a phrase here in exactly the same way it’s used in the current pipeline. But as this design is refined and made more detailed, I would like to get to the point where these graph elements do correspond to specific code elements.

  • I clearly need to describe at least one (probably two) levels of tasks below these top-level ones, and it’s entirely possible that will raise some issues that will require reorganizing the top level.

  • There are some natural “cycles” in the processing that this design attempts to address, and figuring them out is most of why it’s taken me so long to get this far. There are many other ways to resolve these cycles, and while I think I’ve made sensible choices, I have no illusions that they’re unique or even optimal.
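As an aside on what “resolving a cycle” means here: one generic way to break a dependency cycle in a processing graph is to split a node into an initial (approximate) version and a final version produced downstream. The node names below are made up for illustration (loosely inspired by the coaddition/background-matching interplay mentioned later in this post), not the actual choices in the document:

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical cyclic fragment: each node maps to the set of nodes
# it depends on.  Names are illustrative only.
cyclic = {
    "coadd": {"matched_background"},
    "matched_background": {"coadd"},
}

try:
    tuple(TopologicalSorter(cyclic).static_order())
except CycleError:
    print("cycle detected")  # a topological sort is impossible as-is

# Breaking the cycle by splitting "coadd" into an initial version
# (built without matched backgrounds) and a final version.
acyclic = {
    "initial_coadd": set(),
    "matched_background": {"initial_coadd"},
    "final_coadd": {"matched_background"},
}
print(tuple(TopologicalSorter(acyclic).static_order()))
# -> ('initial_coadd', 'matched_background', 'final_coadd')
```

The point is just that each such split is a design choice: there are usually several places a cycle can be cut, with different cost/quality trade-offs.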

  • The calibration products pipeline is not included, and some datasets are really just grab-bag placeholders.

My immediate to-do list:

  • Cross-reference with DPDD
  • Cross-reference with LDM-151 (and WBS)
  • Calibration Products pipeline
  • Next level of detail for process_visit
  • Next level of detail for coaddition tasks, especially data flow for background matching.