Persistence of new image characterization products?

As part of overhauling processCcd we have a new ImageCharacterizationTask, which is run before CalibrationTask to measure the PSF and provide an initial background model. Potential outputs include a characterized image (new, e.g. “charexp”), source catalog (the existing “src” looks appropriate) and background model (new, e.g. “charexpBackground”). These are preliminary products that will be further refined by the calibration task. How much control do we need for which products are written? I propose at least one flag that enables and disables all of it, but do we need the further ability to control which products are written? If we’re not sure I suggest starting simple and adding flags as they prove necessary.

For reference, the old ProcessCcdTask offers the following flags: doWriteCalibrate doWriteCalibrateMatches, both of which control output after running the calibrate task, which is very different than the new calibrate task since the new one is the final operation, and these flags that control later outputs: doWriteSources, doWriteSourceMatches, doWriteHeavyFootprintsInSources.

Also the old ProcessCcdTask had persistBackgroundModel: if True then save background model separately from calexp and calexp is bg-subtracted. If False then don’t save background model and calexp is includes bg. That’s confusing. Do we still need both options? I propose to always leave the background in and persist the model (if the calexp is persisted) and likewise for the image char. task.

More generally some sort of rule of thumb for how much granularity is wanted would be helpful.

I’m anticipating that the New Butler (plus support in pipe_base) will provide configuration-based control over output of specific datasets (including the ability to change or add destinations, but also to remove them altogether). When that is available, it shouldn’t be necessary to build a lot of control into the task itself. However, this will likely be delivered after S16, so some task config flags would be needed in the meantime.

I generally agree that we should have less granularity at first and only add it as needed.

I strongly believe we should avoid having a dataset that can mean two very different things. But in the case of exposures and background subtraction, I think we should do the oppose of what you propose, at least for now, and save the calexp without the background and require users who want the background to add it back in, just because I think the vast majority of downstream code will want the background subtracted.

@jbosch my main concern about your proposal is that it does not match what we are doing right now, so it may be a very disruptive change. We usually persist the background model and leave the calexp with background.

I propose that we overhaul processCcd and the calib task first, then make the change you propose on a separate ticket (and only if others agree). I hope it would not be controversial; as long as you know the background you can easily add it or subtract it as desired, and having it subtracted by default does seem to match more use cases.

Please check that. We definitely subtract the background from the calexp on the HSC side, and from a quick glance at the code in meas_algorithms and pipe_tasks on the LSST side I think we subtract it there too (unless obs_* packages are routinely changing that with config overrides).

The code is clear. The config flag persistBackgroundModel has the following behavior:

  • If true (the default) then save the background model and do not subtract the background from the persisted calexp
  • if false then don’t save a background model and do subtract the background from the perissted calexp

Note: the calexp returned by the task is handled slightly differently than the persisted calexp: if both persistBackgroundModel and doWriteCalib are false then background is not subtracted; otherwise the above behavior applies. I suspect this is a bug, not a feature. The difference will go away in the new task.

The docs disagree:

persistBackgroundModel = pexConfig.Field(
    dtype = bool,
    default = True,
    doc = ("If True persist background model with background subtracted calexp.  "
           "If False persist calexp with the background included."),
)

…and I’m pretty sure the code does as well; note that calExposure has already been background-subtracted by the time we get to processCcd.py:194 (that happens within the multiple calls to meas.algorithms.estimateBackground, since we always pass subtract=True), so we have to call restoreBackgrounds to end up with a calexp that includes the background.

You’re right. That’s good. No changes needed beyond ditching the flag and acting as if persistBackgroundModel is true.