New in-Python pipeline executor available

kfindeisen · April 18, 2023, 12:35am

DM-36162 adds a third pipeline executor to the Science Pipelines. SeparablePipelineExecutor was designed for the Prompt Processing framework’s needs, but it’s flexible enough that other developers may find it useful.

SeparablePipelineExecutor sits between the two existing executors in functionality. SimplePipelineExecutor is run from Python, with no support for multiprocessing or for anything other than immediate execution into a fresh run. CmdLineFwk is run from the shell as pipetask and has lots of options and features. SeparablePipelineExecutor is run from Python and supports multiprocessing, skipping completed quanta, and overwriting existing datasets, but it does not support saving/visualizing graphs, automatic collection management, or profiling/statistics.

SeparablePipelineExecutor also has two features that neither of its predecessors has:

- Each execution step is run independently, accepting and returning the objects (Pipeline, QuantumGraph, etc.) needed for the other steps. This is similar to how CmdLineFwk lets you save/load graphs from disk, but is completely in-memory.
- You can provide your own implementations of certain Middleware APIs (currently TaskFactory, GraphBuilder, and QuantumGraphExecutor) to customize the execution for specific applications. The default is to use the same classes (and init arguments) as CmdLineFwk.
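
To make the two ideas above concrete, here is a toy model of the pattern in plain Python. It is only a sketch and does not use the Science Pipelines at all: ToyPipeline, ToyGraph, ToySeparableExecutor, SerialExecutor, RecordingExecutor, and all of their methods are hypothetical stand-ins invented for illustration, not the real SeparablePipelineExecutor API. It shows each step handing an in-memory object to the next, and a caller swapping in its own executor component while the default is used otherwise.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class ToyPipeline:
    """Hypothetical stand-in for a Pipeline: an ordered list of task labels."""
    task_labels: list[str]


@dataclass
class ToyGraph:
    """Hypothetical stand-in for a QuantumGraph: (task label, data ID) pairs."""
    quanta: list[tuple[str, int]]


class ToyGraphExecutor(Protocol):
    """Interface assumed here for a pluggable component that runs a graph."""

    def execute(self, graph: ToyGraph) -> list[str]: ...


class SerialExecutor:
    """Default component: "runs" every quantum in order."""

    def execute(self, graph: ToyGraph) -> list[str]:
        return [f"{label}@{data_id}" for label, data_id in graph.quanta]


class RecordingExecutor:
    """Custom component: runs nothing, only records what it was asked to run."""

    def __init__(self) -> None:
        self.seen: list[tuple[str, int]] = []

    def execute(self, graph: ToyGraph) -> list[str]:
        self.seen.extend(graph.quanta)
        return []


class ToySeparableExecutor:
    """Each step is a separate call that accepts and returns plain objects."""

    def __init__(self, graph_executor: ToyGraphExecutor | None = None) -> None:
        # Fall back to the default component when the caller supplies none.
        self._graph_executor = (
            graph_executor if graph_executor is not None else SerialExecutor()
        )

    def make_pipeline(self, task_labels: list[str]) -> ToyPipeline:
        return ToyPipeline(list(task_labels))

    def make_graph(self, pipeline: ToyPipeline, data_ids: list[int]) -> ToyGraph:
        return ToyGraph(
            [(label, d) for label in pipeline.task_labels for d in data_ids]
        )

    def run(self, graph: ToyGraph) -> list[str]:
        return self._graph_executor.execute(graph)


# Step by step, entirely in memory; the caller can edit objects between steps.
executor = ToySeparableExecutor()
pipeline = executor.make_pipeline(["isr", "calibrate"])
graph = executor.make_graph(pipeline, data_ids=[1, 2])
graph.quanta = [q for q in graph.quanta if q[1] != 2]  # drop data ID 2
results = executor.run(graph)

# Same graph, but with a caller-supplied component instead of the default.
recorder = RecordingExecutor()
dry_run_results = ToySeparableExecutor(graph_executor=recorder).run(graph)
```

Because the graph is an ordinary object returned to the caller, it can be inspected, filtered, or passed to a differently configured executor without a round trip through disk. For the real class, consult its API documentation for the actual method names and arguments.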

Note that the API for this class is not quite stable:

- Pre-execution support will need to be completely rewritten after DM-38041, which is why the current class doesn’t offer a custom PreExecInit hook.
- GraphBuilder was not designed as a generic interface, so there’s no guarantee that a future custom builder will be able to use the method signature currently assumed by SeparablePipelineExecutor.