As part of the upcoming SuperTask migration, we are refactoring many of our Tasks. This refactoring also provides an opportune moment to implement RFC-352. RFC-352 says that the default pipeBase.TaskRunner should call a task’s RunDataRef method on a single dataRef.
DM-2639 will:
(1) change the pipeBase.TaskRunner to call a Task’s RunDataRef() method instead of its run()method
(2) change the name of all existing CmdlineTasks’ run methods to runDataRef and the Task’s “heart” or “primary unit of execution” method (e.g. characterize(self, exposure, exposureIdInfo=None, background=None) for CharacterizeImageTask, and calibrate(self, exposure, exposureIdInfo=None, background=None, icSourceCat=None) for CalibrateTask) to run.
This is a major API change for the stack. We will migrate all packages in the lsst_distrib, the “lsst” github user organization, and the lsst-dm packages on our radar, which include:
Could you update lsst-dm/ap_pipe, please? ApPipeTask has a run that takes a dataRef, but no “primary unit of execution”.
Note that this package has a custom TaskRunner that reimplements __call__ (this is, fortunately, going away, but probably after DM-2639). I think I’ve seen about half a dozen similar runners scattered throughout the Stack.
jointcal does not operate on a dataRef, but rather a list of them, so runDataRef() does not make sense for it. It does have a custom JointcalRunner(pipeBase.ButlerInitializedTaskRunner), so that may save me here?
The method name runDataRef shouldn’t be taken too literally. (I’m not crazy about the new naming scheme, but that is neither here nor there because RFC-352 and RFC-26 have spoken)
The way to think about it is that CmdLineTasks’s parseAndRun() is going to call a TasksrunDataRef method and SuperTask is going to call runQuantum.
But like ApPipeTask, your JointcalTaskRunner overrides __call___ so technically Jointcal can opt out.
…until it’s time to make a jointcal SuperTask. At that point it will be highly desirable to have a run method that takes in-memory objects, returns all of its outputs, and does no I/O. It’s not an absolute requirement that all SuperTasks have that - not all can separate the I/O from the algorithm, and jointcal may be one of the few that can’t - but it’s much easier to make SuperTasks for those do.
In this case, the in-memory objects would be all of the catalogs and all of their metadata. I’m not sure we’re budgeted for that much memory. jointcal is currently designed to read one catalog at a time and strip out only what it needs.
cp_pipemight be a package to watch out for here. There is a large unmerged ticket which contains a new task, and which might break the mould here in a similar way to jointcal. It’s in review though, so perhaps by the time this actually gets done this will be moot, but given the request for input I just thought I’d mention it.