How to copy PSF flags from icSrc to src, and must we?

Continuing the discussion from Requirements for overhauled calibration task?:

I have been implementing @jbosch’s proposed subtask hierarchy for ProcessCcdTask, and I have run into an issue propagating “used for PSF” flags from icSrc (the source catalog from image characterization) to src (the source catalog output by the calibration task). I want to get a sense of how important this is to people, because it is not easy to do.

The issue is that we want CalibrationTask to be usable as a standalone command-line task. If it is going to copy fields from icSrc, it must either know the icSrc schema (and which fields to copy) at construction time, which is preferable, or else add the fields while running, which requires copying data and means the returned schema won’t match self.schema.
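To make the construction-time option concrete, here is a minimal sketch in plain Python. It is not the afw.table API: a "schema" is just a tuple of field names, a catalog is a list of dicts, and the class and field names (CalibrationLikeTask, calib_psfUsed, calFlux) are hypothetical. The point it illustrates is that when the upstream schema is known in the constructor, self.schema can be fixed once and every run returns rows that match it.

```python
class CalibrationLikeTask:
    """Hypothetical task that copies selected fields from an upstream catalog.

    Because the upstream schema is passed to the constructor, the output
    schema (self.schema) is fixed up front and never changes during run().
    """

    def __init__(self, input_schema, own_fields, copy_fields):
        # Fail early if the upstream catalog cannot supply the requested fields.
        missing = [f for f in copy_fields if f not in input_schema]
        if missing:
            raise ValueError(f"fields not in input schema: {missing}")
        self.own_fields = tuple(own_fields)
        self.copy_fields = tuple(copy_fields)
        self.schema = self.own_fields + self.copy_fields

    def run(self, measured, input_rows):
        """Combine this task's measurements with fields copied from upstream.

        Assumes `measured` and `input_rows` describe the same sources in the
        same order (a simplification; real catalogs would need matching).
        """
        if len(measured) != len(input_rows):
            raise ValueError("catalogs must have the same length")
        out = []
        for m, src in zip(measured, input_rows):
            row = {name: m[name] for name in self.own_fields}
            row.update({name: src[name] for name in self.copy_fields})
            out.append(row)
        return out
```

The run-time alternative would have run() discover the fields and extend the schema as it goes, which is exactly what makes the returned schema diverge from self.schema.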

An alternative is to keep the catalogs output by image characterization and calibration separate, but add a third catalog that combines the calibration catalog with flags copied from the image characterization catalog (e.g. icSrc, calSrc and src). If we really need these flags, then it would be fairly sane to have ProcessCcdTask perform this operation (in my opinion, it’s much less sane to expect CalibrateTask to do this).
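A rough sketch of what that ProcessCcdTask-level step might do, again in plain Python rather than the real catalog API. It assumes both catalogs were measured on the same exposure and carry pixel centroids in hypothetical "x"/"y" fields, and it uses brute-force nearest-neighbour matching within a hypothetical 2-pixel radius; the flag name calib_psfUsed is illustrative only.

```python
import math


def propagate_flags(ic_src, cal_src, flag_names, radius=2.0):
    """Return a new catalog: copies of cal_src rows plus flags taken from ic_src.

    Matching is brute-force nearest neighbour on pixel centroids ("x", "y"),
    which assumes both catalogs come from the same exposure. Sources with no
    ic_src counterpart within `radius` pixels get False for every flag.
    """
    out = []
    for cal in cal_src:
        best, best_dist = None, radius
        for ic in ic_src:
            d = math.hypot(cal["x"] - ic["x"], cal["y"] - ic["y"])
            if d <= best_dist:
                best, best_dist = ic, d
        row = dict(cal)  # copy, so the input calSrc rows are left untouched
        for name in flag_names:
            row[name] = bool(best[name]) if best is not None else False
        out.append(row)
    return out
```

Because the result is a new catalog, neither the image characterization nor the calibration subtask needs to know about the other's schema; only the combining step does. A production version would use a spatial index instead of the O(N·M) loop.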

If we don’t actually need these flags in the catalog output by the calibration task, then the sanest and simplest alternative is to omit them. This simplifies the architecture and the code. That information could be regenerated later, if needed.

Propagating flags from the calibration catalog to the final measurement catalog is absolutely essential. It is a feature that was specifically requested quite a while ago on HSC, and it is regularly used.

I’m surprised this code hasn’t yet been pulled from the HSC side. You can grab it from https://github.com/HyperSuprime-Cam/pipe_tasks/blob/master/python/lsst/pipe/tasks/propagateVisitFlags.py

I sympathize with your argument that propagating the flags adds a lot of complexity and extra requirements to Tasks that would otherwise be much simpler in many respects. But it’s extremely useful for debugging, and I think we need to keep it in some form.

That doesn’t necessarily mean it has to be part of ProcessCcdTask directly, though. Now that ProcessCcdTask is itself going to contain multiple CmdLineTasks as subtasks, it might make sense to do the same here: add a new CmdLineTask (used as a ProcessCcdTask subtask) that matches and combines all of the catalog datasets into a new output catalog. I’m not at all convinced that will be easier to implement, but I do think it might be somewhat cleaner and more convenient in the end.
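One way to picture the "combine all of the catalog datasets" step is a join that pulls selected fields from several upstream catalogs into copies of a primary catalog's rows, prefixing each copied field with its source stage so names cannot collide. This is a plain-Python sketch, not an LSST API: combine_catalogs, the "ic" prefix and the shared "id" key are hypothetical, and the shared key stands in for whatever cross-catalog match (e.g. positional) a real implementation would establish first.

```python
def combine_catalogs(primary, others, key="id"):
    """Merge selected fields from several catalogs into copies of `primary` rows.

    `others` maps a prefix (e.g. "ic") to (catalog, field_names). Each copied
    field is renamed "<prefix>_<field>" so fields from different stages cannot
    collide. Rows are joined on `key`; sources with no match get None.
    """
    # Build one lookup table per upstream catalog, keyed on the join key.
    lookups = {
        prefix: ({row[key]: row for row in catalog}, fields)
        for prefix, (catalog, fields) in others.items()
    }
    combined = []
    for row in primary:
        out = dict(row)
        for prefix, (index, fields) in lookups.items():
            match = index.get(row[key])
            for name in fields:
                out[f"{prefix}_{name}"] = match[name] if match is not None else None
        combined.append(out)
    return combined
```

Keeping the prefixes explicit makes it obvious in the output which stage each flag came from, which is most of the debugging value being asked for here.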

There’s also a more direct solution to the problem of CalibrationTask needing access to ImageCharacterizationTask's schema at construction time: use a ButlerInitializedTaskRunner to pass a butler argument to CalibrationTask's constructor, then use that to load ImageCharacterizationTask's config. This pattern is already in use in other parts of the pipeline, such as forced photometry (where we load the reference catalog schema this way).
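A minimal sketch of the constructor pattern described above, under stated assumptions: StubButler is a hypothetical stand-in for a real butler (it just serves objects from a dict), "icSrc_schema" is a hypothetical dataset name, and the sketch loads the upstream schema directly through the butler, in the spirit of the forced-photometry reference-schema case, rather than reconstructing it from the upstream config. The task accepts either an explicit schema (what a parent such as ProcessCcdTask could pass) or a butler (what a butler-initialized runner would pass).

```python
class StubButler:
    """Hypothetical stand-in for a data butler: maps dataset names to objects."""

    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def get(self, name):
        return self._datasets[name]


class CalibrationTaskSketch:
    """Hypothetical task that needs the upstream (icSrc) schema in __init__."""

    IC_SCHEMA_DATASET = "icSrc_schema"  # hypothetical dataset name

    def __init__(self, butler=None, ic_schema=None, copy_fields=("calib_psfUsed",)):
        # Prefer an explicitly passed schema; fall back to loading it via the butler.
        if ic_schema is None:
            if butler is None:
                raise ValueError("need either ic_schema or butler")
            ic_schema = butler.get(self.IC_SCHEMA_DATASET)
        missing = [f for f in copy_fields if f not in ic_schema]
        if missing:
            raise ValueError(f"upstream schema lacks {missing}")
        self.ic_schema = tuple(ic_schema)
        self.copy_fields = tuple(copy_fields)
```

Either way, the schema is available before any data is processed, so the output schema can be built once in the constructor.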