Third generation Butler basic photometry run

raphaelshirley · June 10, 2021, 1:02pm

Hi,

I am trying to recreate early generation 2 test runs with the generation 3 Butler. This is for the obs_vista package we are developing to process VISTA data with the LSST Science Pipelines. The runs loosely consisted of the following steps:

ingestImages.py
processCcd.py
makeSkyMap.py
coaddDriver.py
multiBandDriver.py

We are using a combination of pipe_tasks and pipe_drivers for historical reasons, but as I understand it processCcd.py is closely related to singleFrameDriver.py.

I think I have ingestion working and have registered the generation 3 instrument ‘VIRCAM’. I am now running the following command to use the equivalent of the singleFrameDriver.py task:

pipetask run -d "exposure=622538" -b data/butler.yaml --input VIRCAM/raw/all --register-dataset-types -p "${PIPE_TASKS_DIR}/pipelines/_SingleFrame.yaml" --instrument lsst.obs.vista.VIRCAM --output-run demo_collection

This is throwing the error pasted in full below. Do I need to register outputs, as was done in the mapper in generation 2? I thought the generation 3 Butler registered datasets as it created them, so no mapper-type object was needed.

Many thanks,

Raphael.

Error:

Error: An error occurred during command execution:
Traceback (most recent call last):
  File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/daf_butler/21.0.0+187b78b4b8/python/lsst/daf/butler/cli/utils.py", line 453, in cli_handle_exception
    return func(*args, **kwargs)
  File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/ctrl_mpexec/21.0.0+2f1cc9de74/python/lsst/ctrl/mpexec/cli/script/qgraph.py", line 133, in qgraph
    qgraph = f.makeGraph(pipelineObj, args)
  File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/ctrl_mpexec/21.0.0+2f1cc9de74/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 571, in makeGraph
    qgraph = graphBuilder.makeGraph(pipeline, collections, run, args.data_query)
  File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/pipe_base/21.0.0+544a109665/python/lsst/pipe/base/graphBuilder.py", line 935, in makeGraph
    skipExisting=self.skipExisting)
  File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/pipe_base/21.0.0+544a109665/python/lsst/pipe/base/graphBuilder.py", line 645, in resolveDatasetRefs
    findFirst=True
  File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/daf_butler/21.0.0+187b78b4b8/python/lsst/daf/butler/registry/queries/_results.py", line 309, in findDatasets
    f"Error finding datasets of type {datasetType.name} in collections {collections}; "
RuntimeError: Error finding datasets of type postISRCCD in collections [VIRCAM/raw/all]; it is impossible for any such datasets to be found in any of those collections, most likely because the dataset type is not registered.  This error may become a successful query that returns no results in the future, because queries with no results are not usually considered an error.

jbosch · June 10, 2021, 2:11pm

Looks like you may just need to add --register-dataset-types to your command line; you’ll need to do that the first time you run a pipeline that defines a particular output dataset type in each data repository. And while you can include it all the time (it won’t do anything if the dataset type already exists), not including it when you don’t need it may improve error messages and reduce the chance of doing something you don’t want in the presence of typos.

ktl · June 10, 2021, 2:51pm

Um, --register-dataset-types was already on the command line.

The use of an underscore-prefixed pipeline YAML is a bit of a red flag. In the newest pipe_tasks, no such thing exists, and you would use ${PIPE_TASKS_DIR}/pipelines/DRP.yaml#singleFrame instead. But I still don’t think that’s the cause of the problem.

You shouldn’t need to pass --register-dataset-types on a task-by-task basis. Our CI pipelines certainly don’t do anything like that, and they can’t have every dataset type predefined either, so I’m not sure why postISRCCD is giving you trouble. Is there a possible mismatch between the pipe_tasks pipeline and the underlying ip_isr Task?

raphaelshirley · June 10, 2021, 3:12pm

If I run the following:

pipetask run -d "exposure=622538" -b data/butler.yaml --input VIRCAM/raw/all --register-dataset-types -p "${PIPE_TASKS_DIR}/pipelines/DRP.yaml#singleFrame" --instrument lsst.obs.vista.VIRCAM --output-run demo_collection

Then I get a file not found error:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/pipe_tasks/21.0.0+44ca056b81/pipelines/DRP.yaml#singleFrame'

I can see the DRP.yaml file.

Should the generation 3 pipetask automatically pick up the equivalent generation 2 config files, e.g. config/singleFrameDriver.py?

ktl · June 10, 2021, 3:15pm

It looks like you’re using an older version of the stack that doesn’t recognize #. I think it used : instead.

The Gen3 pipetask pipelines should indeed refer to appropriate config overrides (either internally or externally).

raphaelshirley · June 10, 2021, 3:20pm

OK, replacing ‘#’ with ‘:’ yields:

ValueError: Not all supplied labels (specified or named subsets) are in the pipeline definition, extra labels: {'singleFrame'}

Perhaps I should update the stack. I was originally planning to update only at major releases (currently on 21.0.0), but perhaps Gen3 is moving fast enough that I should use the weekly builds.
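Both errors above fit a simple model in which the -p argument is split into a file path and a set of labels at a separator character. If the stack only understands ‘:’, the string DRP.yaml#singleFrame is treated as one file name (hence the FileNotFoundError). With ‘:’, the label is split off correctly, but the request is rejected if that version of DRP.yaml defines no singleFrame label or subset. The sketch below assumes this model; split_pipeline_arg and extra_labels are hypothetical helpers, not the ctrl_mpexec parsing code, and the label set in the demo is made up.

```python
from __future__ import annotations


def split_pipeline_arg(arg: str, separator: str) -> tuple[str, set[str]]:
    """Split 'file.yaml<separator>label1,label2' into (path, labels).

    Hypothetical helper modelling how a pipeline argument with a label
    suffix could be parsed; it is not the actual ctrl_mpexec code.
    """
    path, found, suffix = arg.partition(separator)
    if not found:
        # No separator this parser understands: the whole string is a path.
        return arg, set()
    return path, {label for label in suffix.split(",") if label}


def extra_labels(requested: set[str], defined: set[str]) -> set[str]:
    """Return requested labels that the pipeline file does not define."""
    return requested - defined


# A parser that only knows ':' keeps '#singleFrame' in the file name,
# so opening that "file" would fail with FileNotFoundError.
print(split_pipeline_arg("DRP.yaml#singleFrame", separator=":"))

# With ':' the label is split off, but it is rejected if the file
# defines no such label or subset (this label set is hypothetical).
path, labels = split_pipeline_arg("DRP.yaml:singleFrame", separator=":")
defined = {"isr", "characterizeImage", "calibrate"}
print(path, extra_labels(labels, defined))
```

A non-empty result from extra_labels corresponds to the “extra labels: {'singleFrame'}” message, which points to an older DRP.yaml rather than a typo on the command line.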

raphaelshirley · June 10, 2021, 3:27pm

How can I check for a mismatch between pipe_tasks and ip_isr? In the pipelines directory I only see _SingleFrame.yaml and a few others.

jbosch · June 10, 2021, 3:56pm

Sorry, I steered you wrong before. I do think we really need to get you upgraded to a newer (weekly) version of the stack, given what’s in your pipe_tasks/pipelines directory; I don’t think much besides the special case of ci_hsc was really working back when those _-prefixed pipelines existed.

The pipelines will automatically pick up configs from your obs_*/config directory if two conditions are met:

1. Your YAML pipeline file has an “instrument” entry, or you pass --instrument on the command line. It looks like you’re doing the latter, and while that should work, we recommend the former: create a new pipeline file and use import to pull in pipe_tasks/pipelines/DRP.yaml, as the pipelines in obs_lsst and obs_subaru do.
2. The config file to load has the same base name as the _DefaultName of the task. Because Gen3 runs isr, characterizeImage, and calibrate as separate tasks (not singleFrameDriver.py), it will only look for isr.py, characterizeImage.py, calibrate.py (etc.), and it’s not clear to me whether you have those.
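To check which override files the Gen3 single-frame tasks would look for under the naming rule above, a small script like the following may help. It is a sketch that assumes a single obs_*/config directory and only the three _DefaultName values named in the reply; find_overrides and unmatched_files are hypothetical helpers, not the lookup code in pipe_base.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# _DefaultName values of the three Gen3 single-frame tasks named above.
# Other pipeline steps have their own names; extend this tuple as needed.
SINGLE_FRAME_TASKS = ("isr", "characterizeImage", "calibrate")


def find_overrides(config_dir, default_names=SINGLE_FRAME_TASKS) -> dict[str, Path | None]:
    """Map each task _DefaultName to config_dir/<name>.py, or None if missing."""
    config_dir = Path(config_dir)
    found = {}
    for name in default_names:
        candidate = config_dir / f"{name}.py"
        found[name] = candidate if candidate.is_file() else None
    return found


def unmatched_files(config_dir, default_names=SINGLE_FRAME_TASKS) -> list[str]:
    """Config files whose base name matches none of the given tasks."""
    expected = {f"{name}.py" for name in default_names}
    return sorted(p.name for p in Path(config_dir).glob("*.py") if p.name not in expected)


with tempfile.TemporaryDirectory() as tmp:
    # Gen2-style layout: one per-task file plus a driver-level file.
    for fname in ("isr.py", "singleFrameDriver.py"):
        (Path(tmp) / fname).write_text("# config overrides\n")
    print({name: path is not None for name, path in find_overrides(tmp).items()})
    print(unmatched_files(tmp))
```

In a Gen2-style layout like the demo, singleFrameDriver.py matches none of the three task names, which is consistent with the point above that Gen3 reads isr.py and friends directly rather than through a driver-level config.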

My first recommendation for debugging dataset type mismatches and other problems during quantum graph generation is to run pipetask build with the same -p and --instrument arguments, and use --pipeline-dot to write a GraphViz dot file that describes the expanded pipeline. See the “Frequently asked questions” page of the LSST Science Pipelines documentation for more information on that.

raphaelshirley · June 10, 2021, 4:12pm

Thanks for all the useful comments. I will update the stack to a recent version and see if I can find a solution. In particular, I will try to write my own pipeline YAML file with the instrument specified.

I have isr.py and the other required config files, and I load them into the singleFrameDriver.py config, so hopefully they will work once I get past this error.

Cheers,

Raphael.

raphaelshirley · June 26, 2021, 10:45am

Hi,

This specific error was eventually solved by removing the line ‘config.doWrite = False’ from my config/isr.py file. With that line removed, the postISRCCD dataset is written to the Butler after the ISR task runs.
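The fix makes sense if quantum graph generation is viewed as matching each task’s inputs against other tasks’ outputs: any dataset type that no task produces must already exist in the --input collections. As I understand ip_isr, doWrite = False makes IsrTask drop its output connection, so postISRCCD stops being a pipeline output and becomes an input that must be found in VIRCAM/raw/all, where that dataset type was never registered. That matches the RuntimeError in the first post. It would also explain why --register-dataset-types did not help, since it registers dataset types the pipeline writes and postISRCCD was no longer one of them. The sketch below models only this matching step, with hypothetical names and a simplified, incomplete set of dataset types; it is not the pipe_base GraphBuilder.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """Hypothetical, heavily simplified stand-in for a task's connections."""
    label: str
    inputs: set[str] = field(default_factory=set)
    outputs: set[str] = field(default_factory=set)


def isr_node(do_write: bool) -> TaskNode:
    # Assumption: models IsrTask dropping its output connection when
    # config.doWrite is False, so it no longer declares postISRCCD.
    return TaskNode("isr", inputs={"raw"}, outputs={"postISRCCD"} if do_write else set())


def single_frame_pipeline(do_write: bool) -> list[TaskNode]:
    # Dataset types other than raw and postISRCCD are illustrative only.
    return [
        isr_node(do_write),
        TaskNode("characterizeImage", inputs={"postISRCCD"}, outputs={"icExp", "icSrc"}),
        TaskNode("calibrate", inputs={"icExp", "icSrc"}, outputs={"calexp", "src"}),
    ]


def overall_inputs(tasks: list[TaskNode]) -> set[str]:
    """Dataset types consumed by some task but produced by none.

    These have to be found in the --input collections (here VIRCAM/raw/all).
    """
    produced = set().union(*(task.outputs for task in tasks))
    consumed = set().union(*(task.inputs for task in tasks))
    return consumed - produced


print(sorted(overall_inputs(single_frame_pipeline(do_write=True))))
print(sorted(overall_inputs(single_frame_pipeline(do_write=False))))
```

With doWrite enabled only raw has to come from the input collection; with it disabled, postISRCCD joins raw as an external input, which is the lookup that failed here.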