I’m happy to announce the availability of two bits of long-promised pipeline functionality (since the weekend, actually, but some docs are only landing in today’s `d_2025_04_16`):
- You can now pass `--data-id-table <filename>` when building a quantum graph, where `<filename>` is any type supported by `astropy.table` (ECSV is recommended). Columns are data ID keys and rows are the values associated with those dimensions; dimensions that are fully specified in the `--data-query` don’t need to be included (e.g. if you say `--data-query "instrument='LSSTCam'"`, you don’t need an `instrument` column in the table). This is the recommended path for building QGs that filter on quantities that are in the ConsDB but not the butler metadata (see the first sketch below).
- When building pipeline graphs or quantum graphs, you can now pass `--select-tasks "<expression>"` to filter the tasks based on their dependency graph. This mini expression language can do general set operations on tasks and subsets via `|`, `&`, and `~`, as well as ancestor and descendant traversal starting from task labels and dataset type names with `<`, `>`, `<=`, and `>=` (e.g. `<X` means “all tasks that must run before X”); the second sketch below shows a few of these in context. You can find more docs and some examples here. Note that task expressions act on the pipeline after it has been turned into a graph, which is after the `*.yaml#<labels>` subsetting has been applied, and hence it won’t work well if those labels specify a bunch of scattered tasks rather than a full pipeline or step. I think it’d be good practice for us to start using `--select-tasks` to select individual tasks and use `*.yaml#<labels>` only for step-like subsets, and eventually we may deprecate the more limited `*.yaml#a..b` syntax.
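
To make the first item a bit more concrete, here’s a minimal sketch of the data ID table workflow. Everything specific in it is made up: the column names (`exposure`, `detector`), the data ID values, and the repo/pipeline/collection arguments are placeholders, and the `pipetask qgraph` options other than the new `--data-id-table` flag are just the usual ones.

```bash
# Write a tiny data ID table by hand (in practice you'd more likely build it with
# astropy.table from a ConsDB query and call table.write("my_exposures.ecsv")).
# The column names and values below are purely illustrative.
cat > my_exposures.ecsv <<'EOF'
# %ECSV 1.0
# ---
# datatype:
# - {name: exposure, datatype: int64}
# - {name: detector, datatype: int64}
# schema: astropy-2.0
exposure detector
2025041500123 94
2025041500124 94
EOF

# Build the quantum graph against that table.  The instrument is fully specified
# in --data-query, so the table doesn't need an 'instrument' column.
pipetask qgraph \
    -b /path/to/repo \
    -p my_pipeline.yaml \
    -i my/input/collection \
    -o u/someone/data-id-table-demo \
    --data-query "instrument='LSSTCam'" \
    --data-id-table my_exposures.ecsv
```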
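
And a sketch of a few `--select-tasks` expressions. The task labels (`isr`, `calibrateImage`) and the subset label (`step1`) are placeholders for whatever is in your pipeline, and the readings in the comments are my interpretation of the operators described above; see the linked docs for the exact precedence and quoting rules.

```bash
# Show the pipeline restricted to everything that must run before calibrateImage.
pipetask build -p my_pipeline.yaml \
    --select-tasks "<calibrateImage" \
    --show pipeline

# Build a QG for the chain from isr through calibrateImage (inclusive on both ends):
# tasks at or downstream of isr, intersected with tasks at or upstream of calibrateImage.
pipetask qgraph -b /path/to/repo -p my_pipeline.yaml \
    -i my/input/collection -o u/someone/select-tasks-demo \
    --select-tasks ">=isr & <=calibrateImage"

# A labeled subset minus one task.
pipetask qgraph -b /path/to/repo -p my_pipeline.yaml \
    -i my/input/collection -o u/someone/select-tasks-demo \
    --select-tasks "step1 & ~someBigTask"
```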
One of the places I hope `--select-tasks` will be useful is in recovering from task execution failures with new runs, and I’ve added a new entry to the middleware FAQ on that subject.