API Change for Tasks. Rename run/<primaryMethod> to runDataRef/run

yusra · July 17, 2018, 9:38pm

Input Requested:

As part of the upcoming SuperTask migration, we are refactoring many of our Tasks. This refactoring also provides an opportune moment to implement RFC-352. RFC-352 says that the default pipeBase.TaskRunner should call a task’s RunDataRef method on a single dataRef.

DM-2639 will:
(1) change the pipeBase.TaskRunner to call a Task’s RunDataRef() method instead of its run()method
(2) change the name of all existing CmdlineTasks’ run methods to runDataRef and the Task’s “heart” or “primary unit of execution” method (e.g. characterize(self, exposure, exposureIdInfo=None, background=None) for CharacterizeImageTask, and calibrate(self, exposure, exposureIdInfo=None, background=None, icSourceCat=None) for CalibrateTask) to run.

This is a major API change for the stack. We will migrate all packages in the lsst_distrib, the “lsst” github user organization, and the lsst-dm packages on our radar, which include:

lsst-dm/qa_explorer
lsst-dm/pipe_analysis
lsst-dm/fgcmcal
lsst-dm/donut
lsst-dm/dm-demo-notebooks

Action requested: Please let us know if you have a package in lsst-dm or elsewhere you’d like a pull request for.

kfindeisen · July 17, 2018, 10:02pm

Could you update lsst-dm/ap_pipe, please? ApPipeTask has a run that takes a dataRef, but no “primary unit of execution”.

Note that this package has a custom TaskRunner that reimplements __call__ (this is, fortunately, going away, but probably after DM-2639). I think I’ve seen about half a dozen similar runners scattered throughout the Stack.

yusra · July 17, 2018, 10:26pm

Thanks @kfindeisen. Because your TaskRunner reimplements __call__, ApPipeTask will continue to work as-is even after DM-2639 is merged.

I’ve added it to the list though, and you’ll see a pull request to bring it in line with the others in terms of the naming conventions.

parejkoj · July 18, 2018, 8:57pm

jointcal does not operate on a dataRef, but rather a list of them, so runDataRef() does not make sense for it. It does have a custom JointcalRunner(pipeBase.ButlerInitializedTaskRunner), so that may save me here?

yusra · July 20, 2018, 2:59am

The method name runDataRef shouldn’t be taken too literally. (I’m not crazy about the new naming scheme, but that is neither here nor there because RFC-352 and RFC-26 have spoken)

The way to think about it is that CmdLineTasks’s parseAndRun() is going to call a Tasks runDataRef method and SuperTask is going to call runQuantum.

But like ApPipeTask, your JointcalTaskRunner overrides __call___ so technically Jointcal can opt out.

jbosch · July 20, 2018, 6:17pm

…until it’s time to make a jointcal SuperTask. At that point it will be highly desirable to have a run method that takes in-memory objects, returns all of its outputs, and does no I/O. It’s not an absolute requirement that all SuperTasks have that - not all can separate the I/O from the algorithm, and jointcal may be one of the few that can’t - but it’s much easier to make SuperTasks for those do.

parejkoj · July 20, 2018, 7:54pm

In this case, the in-memory objects would be all of the catalogs and all of their metadata. I’m not sure we’re budgeted for that much memory. jointcal is currently designed to read one catalog at a time and strip out only what it needs.

merlin · July 23, 2018, 4:41pm

cp_pipe might be a package to watch out for here. There is a large unmerged ticket which contains a new task, and which might break the mould here in a similar way to jointcal. It’s in review though, so perhaps by the time this actually gets done this will be moot, but given the request for input I just thought I’d mention it.