I am trying to recreate early generation 2 test runs with the generation 3 Butler. This is for the obs_vista package we are developing to process VISTA data with the LSST Science Pipelines. The steps are loosely the following:
We are using a combination of pipe tasks and pipe drivers for historical reasons but as I understand it processCcd.py is closely related to singleFrameDriver.py.
I think I have ingestion working and have registered the generation 3 instrument "VIRCAM". I am now running the following command to use the equivalent of the singleFrameDriver.py task:
This is throwing the error pasted in full below. Do I need to register outputs as was done in the mapper in generation 2? I thought that the generation 3 Butler was registering data sets as it made them so did not need any mapper type object.
Many thanks,
Raphael.
Error:
Error: An error occurred during command execution:
Traceback (most recent call last):
File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/daf_butler/21.0.0+187b78b4b8/python/lsst/daf/butler/cli/utils.py", line 453, in cli_handle_exception
return func(*args, **kwargs)
File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/ctrl_mpexec/21.0.0+2f1cc9de74/python/lsst/ctrl/mpexec/cli/script/qgraph.py", line 133, in qgraph
qgraph = f.makeGraph(pipelineObj, args)
File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/ctrl_mpexec/21.0.0+2f1cc9de74/python/lsst/ctrl/mpexec/cmdLineFwk.py", line 571, in makeGraph
qgraph = graphBuilder.makeGraph(pipeline, collections, run, args.data_query)
File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/pipe_base/21.0.0+544a109665/python/lsst/pipe/base/graphBuilder.py", line 935, in makeGraph
skipExisting=self.skipExisting)
File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/pipe_base/21.0.0+544a109665/python/lsst/pipe/base/graphBuilder.py", line 645, in resolveDatasetRefs
findFirst=True
File "/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/daf_butler/21.0.0+187b78b4b8/python/lsst/daf/butler/registry/queries/_results.py", line 309, in findDatasets
f"Error finding datasets of type {datasetType.name} in collections {collections}; "
RuntimeError: Error finding datasets of type postISRCCD in collections [VIRCAM/raw/all]; it is impossible for any such datasets to be found in any of those collections, most likely because the dataset type is not registered. This error may become a successful query that returns no results in the future, because queries with no results are not usually considered an error.
Looks like you may just need to add --register-dataset-types to your command line; you'll need to do that the first time you run a pipeline that defines a particular output dataset type in each data repository. And while you can include it all the time (it won't do anything if the dataset type already exists), not including it when you don't need it may improve error messages and reduce the chance of doing something you don't want in the presence of typos.
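As an illustration of where the flag goes (a sketch only: the repository path, output collection, and instrument class name below are placeholders, not values from this thread), a first run that defines new output dataset types might look like:

```shell
# Illustrative sketch; requires the LSST stack. The repo path, output
# collection, and instrument class name are placeholders.
pipetask run \
    -b /path/to/repo \
    -p "${PIPE_TASKS_DIR}/pipelines/DRP.yaml#singleFrame" \
    -i VIRCAM/raw/all \
    -o u/user/singleFrame \
    --instrument lsst.obs.vista.VIRCAM \
    --register-dataset-types
# --register-dataset-types registers any output dataset types the pipeline
# defines that do not yet exist in the repository; on later runs it is a no-op.
```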
Um, --register-dataset-types was in the command line.
The use of an underscore-prefixed pipeline YAML is a bit of a red flag. In the newest pipe_tasks, no such thing exists, and you would use ${PIPE_TASKS_DIR}/pipelines/DRP.yaml#singleFrame instead. But I still don't think that's the cause of the problem.
You shouldn't need to do --register-dataset-types on a task-by-task basis, and our CI pipelines definitely don't do anything like that, nor can they have all dataset types predefined, so I'm not sure why postISRCCD is giving you trouble. Is there a possible mismatch between the pipe_tasks pipeline and the underlying ip_isr Task?
FileNotFoundError: [Errno 2] No such file or directory: '/Users/raphaelshirley/Documents/github/lsst_stack/stack/miniconda3-py37_4.8.2-cb4e2dc/DarwinX86/pipe_tasks/21.0.0+44ca056b81/pipelines/DRP.yaml#singleFrame'
I can see the DRP.yaml file.
Should the generation 3 pipetask naturally access the equivalent generation 2 config files? e.g. config/singleFrameDriver.py.
ValueError: Not all supplied labels (specified or named subsets) are in the pipeline definition, extra labels: {'singleFrame'}
Perhaps I should update the stack. I was originally planning on only updating at major releases (currently on 21.0.0), but perhaps gen 3 is moving so fast that I should use the weekly builds.
Sorry I steered you wrong before. I do think we really need to get you upgraded to a newer (weekly) version of the stack, given what's in your pipe_tasks/pipelines directory; I don't think much besides the special case of ci_hsc was really working back when those _-prefixed pipelines existed.
The pipelines will automatically pick up configs from your obs_*/config directory if two conditions are met:
1. Your YAML pipeline file has an "instrument" entry, or you pass --instrument on the command line. It looks like you're doing the latter, and while that should work, we recommend the former: create a new pipeline file and use import to pull in pipe_tasks/pipelines/DRP.yaml, as the pipelines in obs_lsst and obs_subaru do.
2. The config file to load has the same base name as the _DefaultName of the task. Because Gen3 runs isr, characterizeImage, and calibrate as separate tasks (not singleFrameDriver.py), it will only look for isr.py, characterizeImage.py, and calibrate.py files (etc.), and it's not clear to me whether you have those.
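For example, a minimal obs_vista pipeline file satisfying the first condition might look something like this (a sketch: the instrument class path lsst.obs.vista.VIRCAM is an assumption about what obs_vista defines, not something confirmed in this thread):

```yaml
# pipelines/VIRCAM_DRP.yaml -- illustrative sketch only.
description: DRP pipeline specialized for VIRCAM
# The instrument entry is what lets config overrides in obs_vista/config
# be picked up automatically.
instrument: lsst.obs.vista.VIRCAM
imports:
  - location: $PIPE_TASKS_DIR/pipelines/DRP.yaml
```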
My first recommendation for debugging dataset type mismatches and other problems during quantum graph generation is to use pipetask build with the same -p and --instrument arguments, and use --pipeline-dot to write a GraphViz dot file that describes the expanded pipeline. See "Frequently asked questions" in the LSST Science Pipelines documentation for more information on that.
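A sketch of that debugging workflow (the pipeline path and instrument class name are placeholders; pipetask build needs no Butler repository, so it is a cheap way to inspect the expanded pipeline):

```shell
# Illustrative sketch; requires the LSST stack.
# Expand the pipeline and write a GraphViz description of its tasks
# and the dataset types connecting them.
pipetask build \
    -p "${PIPE_TASKS_DIR}/pipelines/DRP.yaml#singleFrame" \
    --instrument lsst.obs.vista.VIRCAM \
    --pipeline-dot pipeline.dot

# Render the dot file with GraphViz to inspect the connections.
dot -Tpdf pipeline.dot > pipeline.pdf
```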
Thanks for all the useful comments. I will update the stack to a recent version and see if I can find a solution. In particular, I will try to write my own pipeline yaml file with the instrument specified.
I have isr.py and the other required config files, and I load them into the singleFrameDriver.py config, so hopefully they will work once I get past this error.
This specific error was eventually solved by removing "config.doWrite = False", which I had written in the config/isr.py file. After removing that, the postISRCCD dataset gets written to the Butler following the ISR task.
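For reference, the offending override looked something like the following sketch of a config/isr.py. Under Gen2, doWrite controlled whether the ISR task persisted its own output; under Gen3 the middleware persists postISRCCD itself, so forcing doWrite to False means the dataset is never stored and downstream tasks find nothing:

```python
# config/isr.py -- sketch of the problematic Gen2-era override.
# Under Gen3, this suppresses the postISRCCD output entirely,
# producing "impossible for any such datasets to be found" errors
# downstream. Remove this line for Gen3 processing:
config.doWrite = False
```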