LSST Batch Production Services [CONOPS] are about executing “[…] campaigns on computing resources to produce the desired LSST data products […]”, where a campaign is defined as a set of pipelines (ordered ensembles of computational steps), the inputs they are run against, and the methods for handling their outputs. Because campaigns can vary in size and complexity, orchestrating their execution on possibly distributed computational resources while satisfying their data dependencies is a non-trivial problem. This pattern is common in many scientific and business applications, so there exist many frameworks, known as workflow management systems, whose sole purpose is to automate such processes.
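To make the data-dependency problem concrete, the following is a minimal sketch of how a pipeline's steps could be ordered so that every step runs only after the steps whose outputs it consumes. It assumes Python 3.9+ and uses the standard-library `graphlib` module; it does not reflect how any particular workflow management system does this. Apart from `process_ccd`, all step names are hypothetical placeholders, as is the `execution_order` helper.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps whose outputs
# it consumes. Only "process_ccd" is named in this document; the other
# step names are illustrative placeholders.
dependencies = {
    "ingest_raw": set(),
    "process_ccd": {"ingest_raw"},
    "make_coadd": {"process_ccd"},
    "summarize": {"process_ccd", "make_coadd"},
}


def execution_order(deps):
    """Return one ordering of steps that satisfies every data dependency."""
    # static_order() yields each step only after all of its predecessors
    # and raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

A real workflow management system does considerably more than this (scheduling on distributed resources, retries, data staging, monitoring), but resolving such a dependency graph is the core of the orchestration problem described above.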
To select an optimal workflow management system for LSST Batch Services, we conducted a detailed, multi-aspect survey of several available workflow management systems to assess how useful each would be in implementing the objectives of LSST batch operations [CONOPS]. Based on our findings, we selected three of them (Airflow, Pegasus, and PanDA) for further tests, in which each candidate will be used to orchestrate the execution of a few specific LSST pipelines (such as process_ccd) in NCSA’s HPC environment.
A more detailed account of how we selected these workflow management systems is available at https://dmtn-025.lsst.io/