Output directory soon required for CmdLineTasks

There are three ways of specifying an output directory through the ArgumentParser used by our CmdLineTasks:

  • --output command-line argument: specify an output directory directly
  • --rerun command-line argument: specify an output directory as <input>/rerun/<rerun>
  • PIPE_OUTPUT_ROOT environment variable: specify an output directory directly

There is general agreement that it is not good if none of these is specified (because you get clutter in the top-level data repo, and data in shared repos may be clobbered unwittingly). On DM-4236, we will introduce a change that makes it mandatory to specify an output directory (using any of the above means). The only command that is excluded from this requirement currently is ingestImages.py (since that should ingest into the top-level data repo).
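The rule above can be sketched as follows. This is a minimal illustration using plain argparse, not the actual pipe_base ArgumentParser. The function name resolve_output is hypothetical. The precedence shown (command line over environment variable) and the rejection of --output together with --rerun are assumptions, not something stated in this thread.

```python
import argparse
import os


def resolve_output(argv, input_dir, environ=None):
    """Return the output directory, or raise ValueError if none was given.

    Hypothetical helper: the real ArgumentParser in pipe_base may differ.
    """
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser()
    parser.add_argument("--output", default=None)
    parser.add_argument("--rerun", default=None)
    args = parser.parse_args(argv)

    # Assumption: the two command-line options are mutually exclusive.
    if args.output is not None and args.rerun is not None:
        raise ValueError("specify only one of --output and --rerun")

    if args.output is not None:
        return args.output
    if args.rerun is not None:
        # --rerun maps to <input>/rerun/<rerun>
        return os.path.join(input_dir, "rerun", args.rerun)

    # Assumption: the environment variable is consulted last.
    env_root = environ.get("PIPE_OUTPUT_ROOT")
    if env_root:
        return env_root

    raise ValueError("an output directory is required: use --output or --rerun")
```

For example, resolve_output(["--rerun", "price"], "/data/repo", {}) would give the path /data/repo/rerun/price on a POSIX system, while an empty argument list with no environment variable set raises an error.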

I am concerned that this will break some CI or validation packages that I am not aware of or am not set up to check. I have already run Jenkins against lsst_sims lsst_distrib ci_hsc, and it passes. Please let me know what else needs to be tested (and how), or test the package against tickets/DM-4236 (with changes in pipe_base, pipe_tasks and ci_hsc). I hope to merge DM-4236 on Monday (March 14) afternoon in advance of the sprint closure.

Note that it is a bug that you can specify output using only $PIPE_OUTPUT_ROOT. I would like to fix this, leaving --output and --rerun as the two options for specifying output.

One question: do we envision any tasks for which an output repository is optional? At present I only know of tasks for which output is always written or never written, but none which write optionally. DM-4236 presently supports tasks that require output and tasks that have optional output, but the flag used could easily be modified to enable/disable output instead. The advantage of disabling output is that the --output and --rerun arguments are not wanted and could be omitted (never added to the argument parser), giving a more accurate help string.
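To illustrate the enable/disable alternative, here is a rough sketch with plain argparse. The build_parser function, its writes_output flag, and the exampleTask.py program name are all hypothetical and are not the flag or API used on DM-4236. The point is only that a task that never writes output would not register --output or --rerun at all, so they would not appear in its help string.

```python
import argparse


def build_parser(writes_output=True):
    """Build a parser; output options exist only for tasks that write output.

    Hypothetical sketch, not the pipe_base implementation.
    """
    parser = argparse.ArgumentParser(prog="exampleTask.py")
    parser.add_argument("input", help="path to the input data repository")

    if writes_output:
        # Exactly one of the two output options must be supplied.
        group = parser.add_mutually_exclusive_group(required=True)
        group.add_argument("--output", help="output directory")
        group.add_argument("--rerun", help="write to <input>/rerun/<rerun>")

    return parser
```

With writes_output=False, build_parser(False).parse_args(["repo"]) succeeds without any output option, and the help text does not mention --output or --rerun.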

In any case, we do have a few tasks in pipe_tasks that never write output (they summarize information about data in a repo, but do not produce new data products). These will need a minor fix to be compatible with DM-4236.

lsst_ci will very soon be the thing to test. It’s (for now) just a metapackage to make sure obs_cfht, obs_subaru, obs_decam, obs_lsstSim, obs_sdss pass with their full testdata_*-enabled tests.

lsst_distrib does most of this, but because of the ambiguity about setupOptional/setupRequired under different build systems, it’s not required to test these. It does not run ci_hsc; that will be left for a daily build of something likely to be called lsst_qa. The lsst_ci package may also see future evolution as we come up with additional short-turn-around tests. lsst_ci will be introduced next week; it just passed review yesterday (thanks @ktl).

tickets/DM-4236 passes lsst_ci on Jenkins.

https://ci.lsst.codes/job/stack-os-matrix/label=centos-6/9173/console

Thank you @mwv for your work on lsst_ci.

This work (along with DM-5453, a fix to ci_hsc) has just passed Jenkins, and a merge is imminent… tomorrow.

Merged. Let the mayhem ensue!

This makes --show data or --show config somewhat harder to use. Is it feasible to exempt those from this requirement?

I believe ingestCalibs.py should also be treated the same way as ingestImages.py; is that right @price? I have a few other tiny fix-ups to that coming soon, so I can make the change.

I had deliberately put the check after the --show handling for this reason, but maybe @rowen moved it.

You’re right that ingestCalibs.py should not require outputs (I had been thinking it inherited, but apparently not).

I moved the check that one of --output or --rerun was supplied before creating a butler, because otherwise I saw problems from creating a butler with an invalid output. I guess finer tuning is wanted. The butler is wanted for some --show output, but not all of it.
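One possible shape for the finer tuning is sketched below. It assumes that a --show request only prints information and exits without writing anything, so the output requirement can be skipped in that case. That assumption, and the validate_output helper itself, are hypothetical and not taken from pipe_base.

```python
def validate_output(show, output, rerun):
    """Raise ValueError if an output is required but was not specified.

    show   -- list of --show items (e.g. ["config"]); empty if not given
    output -- value of --output, or None
    rerun  -- value of --rerun, or None
    """
    if show:
        # Assumption: --show only displays information and then exits,
        # so no output directory is needed.
        return
    if output is None and rerun is None:
        raise ValueError("an output directory is required: use --output or --rerun")
```

Under this sketch, a butler needed for some --show output would have to be created from the input alone, which may or may not be practical in the real code.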