As part of the upcoming SuperTask migration, we are refactoring many of our Tasks. This refactoring also provides an opportune moment to implement RFC-352. RFC-352 says that the default
pipeBase.TaskRunner should call a task’s
RunDataRef method on a single
(1) change the
pipeBase.TaskRunner to call a Task’s
RunDataRef() method instead of its
(2) change the name of all existing CmdlineTasks’
run methods to
runDataRef and the Task’s “heart” or “primary unit of execution” method (e.g.
characterize(self, exposure, exposureIdInfo=None, background=None) for
calibrate(self, exposure, exposureIdInfo=None, background=None, icSourceCat=None) for
This is a major API change for the stack. We will migrate all packages in the lsst_distrib, the “lsst” github user organization, and the lsst-dm packages on our radar, which include:
Action requested: Please let us know if you have a package in lsst-dm or elsewhere you’d like a pull request for.
Could you update
ApPipeTask has a
run that takes a
dataRef, but no “primary unit of execution”.
Note that this package has a custom
TaskRunner that reimplements
__call__ (this is, fortunately, going away, but probably after DM-2639). I think I’ve seen about half a dozen similar runners scattered throughout the Stack.
Thanks @kfindeisen. Because your TaskRunner reimplements
ApPipeTask will continue to work as-is even after DM-2639 is merged.
I’ve added it to the list though, and you’ll see a pull request to bring it in line with the others in terms of the naming conventions.
jointcal does not operate on a
dataRef, but rather a list of them, so
runDataRef() does not make sense for it. It does have a custom
JointcalRunner(pipeBase.ButlerInitializedTaskRunner), so that may save me here?
The method name
runDataRef shouldn’t be taken too literally. (I’m not crazy about the new naming scheme, but that is neither here nor there because RFC-352 and RFC-26 have spoken)
The way to think about it is that CmdLineTasks’s
parseAndRun() is going to call a
runDataRef method and SuperTask is going to call
ApPipeTask, your JointcalTaskRunner overrides
__call___ so technically Jointcal can opt out.
…until it’s time to make a jointcal SuperTask. At that point it will be highly desirable to have a
run method that takes in-memory objects, returns all of its outputs, and does no I/O. It’s not an absolute requirement that all SuperTasks have that - not all can separate the I/O from the algorithm, and jointcal may be one of the few that can’t - but it’s much easier to make SuperTasks for those do.
In this case, the in-memory objects would be all of the catalogs and all of their metadata. I’m not sure we’re budgeted for that much memory. jointcal is currently designed to read one catalog at a time and strip out only what it needs.
cp_pipe might be a package to watch out for here. There is a large unmerged ticket which contains a new task, and which might break the mould here in a similar way to
jointcal. It’s in review though, so perhaps by the time this actually gets done this will be moot, but given the request for input I just thought I’d mention it.