Hello, I’m currently building monthly calibration frames and have been trying to speed up the process.
At first I tried running multiple pipetasks in different sbatch scripts (effectively processing each month of data in parallel, each executed in a separate batch script), but ran into an issue where the repository is read-only (I think this is due to these independent pipetasks trying to write to the repository while another one is reading). One workaround would be to create several independent repositories/butlers on my machine and export the final calibration products to a final butler one-by-one then delete them… but when I get to assembling the flat-frames this will result in, err, several hundred unnecessary repositories. Is there any other workaround that would let me execute these pipetasks simultaneously?
Another option for me is to speed up the processing of each monthly calibration frame. Currently I’m limited to around 20 CPUs per node. From what I can tell, -j only distributes across a single node/machine, and with the pipe_drivers module being deprecated, I’m not sure if there is anything else I can do to distribute a single pipetask across multiple nodes. Is there any workaround for this?
Right now I am running one command-line task at a time processing each monthly calibration frame (with the appropriate DATE filled in):
pipetask run --register-dataset-types -j 20 \
    -b repo --instrument lsst.obs.decam.DarkEnergyCamera \
    -d "instrument='DECam' AND exposure.observation_type='zero' AND exposure.day_obs = DATE"
We have solved all your problems.
pipetask is indeed solely for single-node processing, but the ctrl_bps system is designed for batch. It runs the graph builder and then converts the result to a workflow graph suitable for processing with Parsl, HTCondor, or PanDA. All of those can use Slurm as the backend, depending on how you have configured them. BPS runs the workflow jobs in a way that is optimized for large-scale processing and will deal with the problem you are having with multiple pipetask jobs hitting a single registry as they execute. Batch processing specifically avoids touching the registry apart from the final job, when the workflow completes and everything is synced up with the original registry. Processing a lot of jobs with a SQLite butler registry is always going to be painful.
There is a small description of this in the pipeline execution paper on arXiv.
Search for ctrl_bps in the developer guide.
For example, the instructions to use bps at USDF include some instructions that you may find useful.
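As a rough illustration of what a BPS submission looks like, here is a minimal sketch of a submit YAML for the bias query from the original post. The key names (pipelineYaml, project, campaign, payload, butlerConfig, inCollection, dataQuery) follow the ctrl_bps quick-start documentation as I understand it. The file name, pipeline path, labels, and input collection are placeholders rather than values from this thread, so check them against the documentation for your release.

```yaml
# bias_DATE.yaml: hypothetical file name for one month's submission.
# The pipeline path is a placeholder; use the pipeline you already run with pipetask.
pipelineYaml: "/path/to/your/pipeline.yaml"

# Free-form labels (assumed values).
project: calibs
campaign: monthly_bias

payload:
  # Placeholder name, used when naming the outputs of this submission.
  payloadName: bias_DATE
  # Same repository as "-b repo" in the pipetask command.
  butlerConfig: repo
  # Assumed input collection; replace with your own.
  inCollection: DECam/raw/all
  # Query copied from the original command; DATE is still a placeholder.
  dataQuery: "instrument='DECam' AND exposure.observation_type='zero' AND exposure.day_obs = DATE"
```

A file like this is normally passed to bps submit, and the workflow system named in the rest of the config then takes care of spreading the jobs across nodes.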
If you still don’t want to use the BPS infrastructure you will need to understand how we run pipetask from within BPS. We have a special execution mode that treats the graph as a registry and so doesn’t need the registry at all if you create the graph before executing the graph. There is then a command to transfer the outputs back to the main registry using the graph and the datastore.
Thank you! I’ll try implementing it right away.
Alright, I have a local installation of the ctrl_bps_parsl module, but I’m having trouble figuring out how to tell ctrl_bps where the WMS plugin is located (and I’m having some trouble finding it).
I followed the tutorial here and installed ctrl_bps_parsl in the same folder as loadLSST.bash (just to make setup a little easier, since it’s not loaded with lsst_distrib). I’ve been trying to edit bps_config.yaml and change wmsServiceClass to the ParslService class in the new module, but every time I run bps ping I get an error saying that the module doesn’t exist, e.g. the setting in my config produces the error
> ModuleNotFoundError: Unable to import 'ctrl_bps_parsl.ParslService' (No module named 'ctrl_bps_parsl')
I’m sure the issue is with the installation, so I checked eups list -s as well, and it is in the list
eups list ctrl_bps_parsl
> LOCAL:/.../lsst_stack_v24/ctrl_bps_parsl setup
I think my installation is incorrect, but I can’t find any solid guidelines for adding this module into my lsst_stack. Any advice?
ctrl_bps_parsl is a standard part of our distribution. What version of the software are you running?
Edit: it looks like v24. That’s really far too old. Please install the most recent weekly.
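For anyone hitting the same ModuleNotFoundError: the message suggests that wmsServiceClass was given the EUPS product name (ctrl_bps_parsl) rather than a Python import path. In recent releases the plugin is importable under the lsst.ctrl.bps.parsl package. Below is a minimal sketch of the relevant part of a BPS config, assuming the Slurm site class and option names described in the ctrl_bps_parsl documentation. The site label, node count, and walltime are made-up values to adapt.

```yaml
# Dotted Python path to the Parsl WMS plugin class.
wmsServiceClass: lsst.ctrl.bps.parsl.ParslService

# Hypothetical site label; it must match a key under "site".
computeSite: mycluster

site:
  mycluster:
    class: lsst.ctrl.bps.parsl.sites.Slurm
    # Assumed value: number of nodes requested per Slurm job.
    nodes: 4
    # Roughly the per-node limit mentioned earlier in the thread.
    cores_per_node: 20
    # Assumed time limit; adjust to your queue.
    walltime: "02:00:00"
```

With a recent weekly set up, no separate installation of ctrl_bps_parsl should be needed for this class path to resolve.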
Everything is running smoothly now, thanks for the help!