Pipetasks in parallel or distributed computing

antenglert · July 6, 2023, 4:48pm

Hello, I’m currently building monthly calibration frames and have been trying to speed up the process.

At first I tried running multiple pipetasks from separate sbatch scripts (effectively processing each month of data in parallel), but ran into an issue where the repository was reported as read-only (I think this is because these independent pipetasks try to write to the repository while another one is reading from it). One workaround would be to create several independent repositories/butlers on my machine, export the final calibration products to a single final butler one by one, and then delete them… but when I get to assembling the flat frames this will result in, err, several hundred unnecessary repositories. Is there any other workaround that would let me execute these pipetasks simultaneously?

Another solution would be to speed up the processing of each monthly calibration frame, but currently I’m limited to around 20 CPUs per node. From what I can tell, -j only distributes work across a single node/machine, and with the pipe_drivers package deprecated, I’m not sure there is anything else I can do to distribute a single pipetask across multiple nodes. Is there any workaround for this?

Right now I am running one command-line task at a time processing each monthly calibration frame (with the appropriate DATE filled in):

pipetask run --register-dataset-types -j 20 \
    -b repo --instrument lsst.obs.decam.DarkEnergyCamera \
    -i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
    -o DECam/calib/uncertified/bias_DATE \
    -p $CP_PIPE_DIR/pipelines/DarkEnergyCamera/cpBias.yaml \
    -d "instrument='DECam' AND exposure.observation_type='zero' AND exposure.day_obs = DATE"
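The per-month loop described above can be scripted. This is a minimal sketch, assuming bash and the same repository, collections and pipeline as the command in the question; the day_obs values and the DRY_RUN switch are hypothetical additions. It runs the dates one after another rather than concurrently, which avoids having several pipetask processes writing to one SQLite registry at the same time, and by default it only prints the commands (set DRY_RUN=0 to execute them).

```bash
#!/usr/bin/env bash
# Sketch: run the cpBias pipeline once per day_obs value, sequentially.
# The dates below are hypothetical placeholders; DRY_RUN defaults to 1 (print only).
set -eo pipefail

REPO=repo
DRY_RUN=${DRY_RUN:-1}

run_bias() {
  local date=$1
  # Same arguments as the command above, with DATE substituted.
  local cmd=(
    pipetask run --register-dataset-types -j 20
    -b "$REPO" --instrument lsst.obs.decam.DarkEnergyCamera
    -i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded
    -o "DECam/calib/uncertified/bias_${date}"
    -p "${CP_PIPE_DIR}/pipelines/DarkEnergyCamera/cpBias.yaml"
    -d "instrument='DECam' AND exposure.observation_type='zero' AND exposure.day_obs = ${date}"
  )
  if [[ "$DRY_RUN" == 1 ]]; then
    # Shell-quoted so the printed line can be pasted back into a terminal.
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}

# One invocation at a time: no two pipetask processes write to the registry concurrently.
for date in 20190101 20190201; do
  run_bias "$date"
done
```

This keeps the speed limit of a single node (the question's -j 20), so it is a convenience over typing each command, not a substitute for batch processing.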

timj · July 6, 2023, 4:59pm

We have solved all your problems. pipetask is indeed solely for single-node processing, but the ctrl_bps system is designed for batch processing. It runs the graph builder and then converts the resulting graph into a workflow graph suitable for processing with Parsl, HTCondor, or PanDA. All of those can use Slurm as the backend, depending on how you have configured them. BPS runs the workflow jobs in a way that is optimized for large-scale processing and will deal with the problem you are having with multiple pipetask jobs hitting a single registry as they execute. Batch processing specifically avoids touching the registry except in the final job, which runs when the workflow completes and syncs everything back to the original registry. Processing a lot of jobs with a SQLite butler registry is always going to be painful.
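To make the BPS route concrete for one date of the question's bias processing, here is a minimal sketch that writes a BPS submit file. It assumes a recent weekly that ships ctrl_bps_parsl, that the Parsl Slurm site class (lsst.ctrl.bps.parsl.sites.Slurm) accepts nodes, cores_per_node and walltime settings, and that pipelineYaml, payloadName, butlerConfig, inCollection, output and dataQuery are the relevant submit-file keys; all of these should be checked against the ctrl_bps and ctrl_bps_parsl documentation for your version. The site label, node count, walltime and file name are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write a BPS submit file for one date of cpBias processing.
# Assumption: key names follow the ctrl_bps / ctrl_bps_parsl docs; verify for your version.
set -euo pipefail

write_bps_config() {
  local date=$1 out=$2
  # Unquoted EOF so ${date} expands here; \${CP_PIPE_DIR} is written literally for BPS to expand.
  cat > "$out" <<EOF
pipelineYaml: "\${CP_PIPE_DIR}/pipelines/DarkEnergyCamera/cpBias.yaml"
wmsServiceClass: lsst.ctrl.bps.parsl.ParslService
computeSite: slurm_example
site:
  slurm_example:
    class: lsst.ctrl.bps.parsl.sites.Slurm
    nodes: 4
    cores_per_node: 20
    walltime: "04:00:00"
payload:
  payloadName: cpBias_${date}
  butlerConfig: repo
  inCollection: DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded
  output: DECam/calib/uncertified/bias_${date}
  dataQuery: "instrument='DECam' AND exposure.observation_type='zero' AND exposure.day_obs = ${date}"
EOF
}

write_bps_config 20190101 bps_cpBias_20190101.yaml
# Then, with the stack set up: bps submit bps_cpBias_20190101.yaml
```

The point of the multi-node site block is that a single submission can spread the quanta of one graph over several nodes, and only the final BPS job writes to the shared registry.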

There is a small description of this in the pipeline execution paper on arXiv.

Search for ctrl_bps in the developer guide.

For example, the instructions for using BPS at USDF include some details you may find useful:

https://developer.lsst.io/usdf/batch.html

If you still don’t want to use the BPS infrastructure, you will need to understand how we run pipetask from within BPS. We have a special execution mode that treats the graph as the registry, so if you create the graph before executing it, execution doesn’t need the registry at all. A separate command then transfers the outputs back to the main registry using the graph and the datastore.
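The graph-based mode described above amounts to three steps: build and save the graph once against the registry, execute it without touching the registry, then transfer the outputs back. A minimal sketch, assuming a recent weekly that provides pipetask qgraph --save-qgraph, pipetask pre-exec-init-qbb, pipetask run-qbb and butler transfer-from-graph (these are the commands BPS itself uses, as far as I know, but check each with --help). The graph file name and the DRY_RUN switch are hypothetical, and -j on run-qbb is an assumption.

```bash
#!/usr/bin/env bash
# Sketch of the quantum-backed execution sequence (subcommands and options assumed; check --help).
set -eo pipefail

DRY_RUN=${DRY_RUN:-1}
REPO=repo
GRAPH=bias_20190101.qgraph   # hypothetical file name

step() {
  # Print the command in dry-run mode, otherwise execute it.
  if [[ "$DRY_RUN" == 1 ]]; then echo "$*"; else "$@"; fi
}

# 1. Build the graph against the registry and save it to a file.
step pipetask qgraph -b "$REPO" --instrument lsst.obs.decam.DarkEnergyCamera \
  -i DECam/raw/all,DECam/calib/curated/19700101T000000Z,DECam/calib/unbounded \
  -o DECam/calib/uncertified/bias_20190101 \
  -p "${CP_PIPE_DIR}/pipelines/DarkEnergyCamera/cpBias.yaml" \
  -d "instrument='DECam' AND exposure.observation_type='zero' AND exposure.day_obs = 20190101" \
  --save-qgraph "$GRAPH"

# 2. Write init-outputs, then execute; the saved graph stands in for the registry.
step pipetask pre-exec-init-qbb "$REPO" "$GRAPH"
step pipetask run-qbb "$REPO" "$GRAPH" -j 20   # -j here is an assumption

# 3. Register the outputs in the main repository using the graph and the datastore.
step butler transfer-from-graph "$GRAPH" "$REPO" --register-dataset-types
```

Because step 2 never writes to the registry, several such executions can run side by side; only step 3 touches the shared SQLite file, and those transfers can be run one at a time.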

antenglert · July 6, 2023, 5:06pm

Thank you! I’ll try implementing it right away.

antenglert · July 7, 2023, 12:21am

Alright, I have a local installation of the ctrl_bps_parsl module, but I’m having trouble figuring out how to tell ctrl_bps where the WMS plugin is located (and I’m having some trouble finding it).

To install ctrl_bps_parsl I followed the tutorial here and installed it in the same folder as loadLSST.bash (just to make setup a little easier, since it’s not loaded with lsst_distrib). I’ve been trying to edit bps_config.yaml and change wmsServiceClass to the ParslService class in the new module, but every time I run bps ping I get an error saying that the module doesn’t exist. For example, this line in the config

wmsServiceClass: ctrl_bps_parsl.ParslService

produces this error:

bps ping
> ...
> ModuleNotFoundError: Unable to import 'ctrl_bps_parsl.ParslService' (No module named 'ctrl_bps_parsl')

I’m sure the issue is with the installation, so I also checked eups list -s, and it is in the list:

eups list ctrl_bps_parsl
> LOCAL:/.../lsst_stack_v24/ctrl_bps_parsl    setup

I think my installation is incorrect, but I can’t find any solid guidelines for adding this module into my lsst_stack, any advice?

timj · July 7, 2023, 1:07am

ctrl_bps_parsl is a standard part of our distribution. What version of the software are you running?

Edit: it looks like v24. That’s really far too old. Please install the most recent weekly.

timj · July 7, 2023, 1:09am

Follow the instructions in “Install with lsstinstall and eups distrib” (LSST Science Pipelines documentation) and choose the tag w_latest.
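Separately from the version problem, wmsServiceClass takes a Python import path, and the Python package for this product is lsst.ctrl.bps.parsl (ctrl_bps_parsl is the eups product name, not an importable module), so the setting should read wmsServiceClass: lsst.ctrl.bps.parsl.ParslService. Below is a small sketch for checking that such a "module.Class" string resolves in the current environment before running bps ping; the helper name is hypothetical, and it assumes python3 is the stack's interpreter.

```bash
#!/usr/bin/env bash
# Sketch: check that a "package.module.Class" string (as used by wmsServiceClass)
# can be imported in the current environment. The helper name is hypothetical.
set -euo pipefail

check_import_path() {
  python3 - "$1" <<'PY'
import importlib
import sys

target = sys.argv[1]
module_name, _, attr = target.rpartition(".")
try:
    getattr(importlib.import_module(module_name), attr)
except (ImportError, AttributeError, ValueError) as exc:
    print(f"cannot resolve {target}: {exc}")
    sys.exit(1)
print(f"ok: {target}")
PY
}

# Expected to fail: "ctrl_bps_parsl" is the eups product name, not a Python package.
check_import_path ctrl_bps_parsl.ParslService || true
# Expected to succeed once a recent weekly is set up.
check_import_path lsst.ctrl.bps.parsl.ParslService || true
```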

antenglert · July 7, 2023, 12:56pm

Everything is running smoothly now, thanks for the help!