I would like to completely delete all contents within the postISRCCD and icExp subdirectories of an output run (without resorting to rm -rf, unless that’s recommended…). I am using v23_0_1 of the LSST pipelines.
The following two commands appear to delete all files within the icExp and postISRCCD directories:
butler prune-datasets $REPO --purge DECam/runs/prune_test/20220926T234050Z --datasets icExp DECam/runs/prune_test/20220926T234050Z
butler prune-datasets $REPO --purge DECam/runs/prune_test/20220926T234050Z --datasets postISRCCD DECam/runs/prune_test/20220926T234050Z
But many empty directories are left behind:
$ find repo/DECam/runs/prune_test/20220926T234050Z/postISRCCD -type f | wc -l
0
$ find repo/DECam/runs/prune_test/20220926T234050Z/postISRCCD -type d | wc -l
34
$ find repo/DECam/runs/prune_test/20220926T234050Z/icExp -type f | wc -l
0
$ find repo/DECam/runs/prune_test/20220926T234050Z/icExp -type d | wc -l
50
This run has only 25 CCDs' worth of outputs, so the number of empty directories left behind isn't huge. But I soon want to process millions of CCDs, in which case it seems I'd be left with O(10 million) inodes consumed by empty directories within icExp and postISRCCD. What is the recommended way to get rid of these directories, in addition to the files that they once contained?
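For concreteness, here is a minimal sketch of the manual cleanup I would otherwise fall back on. It is not a butler feature: `prune_empty_dirs` is a hypothetical helper name of my own, and it assumes a `find` that supports `-mindepth`, `-empty` and `-delete` (GNU and BSD `find` do; these are not POSIX). It also assumes that removing empty directories behind the butler's back is harmless, which is part of what I am asking.

```shell
# Hypothetical helper (not part of the butler CLI): remove every empty
# directory below the given directory and leave all files untouched.
prune_empty_dirs() {
    # -depth visits children before their parents, so a parent that becomes
    # empty once its children are removed is deleted as well.
    # -mindepth 1 keeps the top-level directory itself in place.
    find "$1" -mindepth 1 -depth -type d -empty -delete
}

# Example usage with the run from this post (commented out on purpose):
# prune_empty_dirs repo/DECam/runs/prune_test/20220926T234050Z/postISRCCD
# prune_empty_dirs repo/DECam/runs/prune_test/20220926T234050Z/icExp
```

Because only empty directories match, any file that prune-datasets did not remove keeps its parent directories.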
Also, a related but different question, which should perhaps be its own forum topic: is there a way to embed butler prune-datasets directly into my YAML-defined pipeline? I think that would be preferable to running my pipeline and then running separate butler prune-datasets commands after the fact.
I did a brief, superficial search for any instance of “prune” in the YAML files of our v23_0_1 installation but came up empty:
$ find lsst_stack_v23_0_1/stack -name "*.yaml" | wc -l
4523
$ find lsst_stack_v23_0_1/stack -name "*.yaml" -exec grep -i prune {} \; | wc -l
0
Thanks very much.