Missing Dataset on DRP re-run on a subset of data

I am trying to re-run a task from DRP-compat.yaml with a different configuration, summarized in the following YAML:

instrument: lsst.obs.lsst.LsstCam
imports:
  - location: $DRP_PIPE_DIR/pipelines/LSSTCam/DRP-compat.yaml
tasks:
  detectAndMeasureDiaSources:
    class: lsst.ip.diffim.detectAndMeasure.DetectAndMeasureTask
    config:
      writeStreakInfo: True

However I am encountering the following error:

lsst.pipe.base.quantum_graph_builder.QuantumGraphBuilderError: No datasets for overall-input 'deepDiff_differenceTempExp' found (the dataset type is not even registered).  This is probably a bug in either the pipeline definition or the dataset constraints passed to the quantum graph builder.

This happens with the input collection LSSTCam/runs/DRP/20250515-20251214/v30_0_0_rc2/DM-53697, which I selected because I thought it would contain all the data products needed to re-run that task. Are there additional collections I should include as input, or is the issue something else?
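For what it's worth, the "not even registered" part of the error can be checked directly against the repository with the butler CLI (a sketch, not something I ran here; REPO is a placeholder for the path or label of your butler repository):

```shell
# List registered dataset types matching a glob pattern;
# if nothing matches, the type really is unregistered in this repo.
butler query-dataset-types REPO "deepDiff_*"
```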

I was able to trace the problem to the task label. In the imported pipeline the task is labeled detectAndMeasureDiaSource (singular), while my YAML used the plural. As a result, a second instance of DetectAndMeasureTask was created under the label detectAndMeasureDiaSources, with the default configuration (including default input/output dataset names) except for writeStreakInfo: True. This second task looked for an input dataset type named ‘deepDiff_differenceTempExp’, which triggered the error.

Fixing the error amounted to making sure the name of the task in my YAML matched the name in the YAML where it was originally defined, which was subsequently imported through a long chain of YAML “imports”.
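For concreteness, the fix looked like this, assuming (as described above) that the imported pipeline defines the task under the singular label detectAndMeasureDiaSource:

```yaml
instrument: lsst.obs.lsst.LsstCam
imports:
  - location: $DRP_PIPE_DIR/pipelines/LSSTCam/DRP-compat.yaml
tasks:
  # The label must match the one in the imported pipeline exactly;
  # otherwise a second task instance is created with default connections.
  detectAndMeasureDiaSource:
    class: lsst.ip.diffim.detectAndMeasure.DetectAndMeasureTask
    config:
      writeStreakInfo: True
```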

I’m not sure there is a bug that needs fixing, but what might be useful is a way to more easily inspect the tasks in a pipeline assembled from a long chain of YAML files, since it took me a while to find where detectAndMeasureDiaSource was actually first named. Generating the quantum graph would also reveal the typo, I believe, but that takes a non-negligible amount of time.

what might be useful is a way to more easily parse the tasks in a YAML pipeline that is from a long chain of YAML files

You might be interested in the documentation for pipetask build, which is designed to help you with this. For example:

pipetask build \
-p $DRP_PIPE_DIR/pipelines/LSSTCam/DRP-compat.yaml \
--show task-graph

will show all the tasks in the pipeline and how they connect. To see the task labels and their associated classes, replace --show task-graph with --show tasks.

Tasks also fall into subsets, which can be queried with --show subsets. If you want to see the tasks and their dataset connections, you can pass --show pipeline-graph.

Importantly, if you want to unravel all of the pipeline YAML imports and produce a flat YAML, you can simply pass --show pipeline.
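Combining that with grep gives a quick way to locate where a given label is defined without reading the import chain by hand (a sketch; the pattern here is just an example):

```shell
# Expand the full import chain into one flat pipeline YAML,
# then search it for the task label in question.
pipetask build \
  -p $DRP_PIPE_DIR/pipelines/LSSTCam/DRP-compat.yaml \
  --show pipeline | grep -n "detectAndMeasure"
```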

There are more useful options that can be passed to --show. See the official documentation for more information.