I’m happy to announce the availability of two bits of long-promised pipeline functionality (since the weekend, actually, but some docs are only landing in today’s d_2025_04_16):
- You can now pass `--data-id-table <filename>` when building a quantum graph, where `<filename>` is any table type supported by `astropy.table` (ECSV is recommended). Columns are data ID keys and rows are the values associated with those dimensions; dimensions that are fully specified in the `--data-query` don’t need to be included (e.g. if you say `--data-query "instrument='LSSTCam'"`, you don’t need an `instrument` column in the table). This is the recommended path for building QGs that filter on quantities that are in the ConsDB but not the butler metadata.
- When building pipeline graphs or quantum graphs, you can now pass `--select-tasks "<expression>"` to filter the tasks based on their dependency graph. This mini expression language can do general set operations on tasks and subsets via `|`, `&`, and `~`, as well as ancestor and descendant traversal starting from task labels and dataset type names with `<`, `>`, `<=`, and `>=` (e.g. `<X` means “all tasks that must run before X”). You can find more docs and some examples here. Note that task expressions act on the pipeline after it has been turned into a graph, which is after the `*.yaml#<labels>` subsetting has been applied, and hence it won’t work well if those labels specify a bunch of scattered tasks rather than a full pipeline or step. I think it’d be good practice for us to start using `--select-tasks` to select individual tasks and use `*.yaml#<labels>` only for step-like subsets, and eventually we may deprecate the more limited `*.yaml#a..b` syntax.
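As a rough illustration of the traversal semantics (a toy sketch with made-up task labels, not the real implementation), `<X` walks the dependency graph upstream from `X`:

```python
# Toy task graph: each task maps to the tasks that consume its outputs.
# Labels here are hypothetical, just to show what "<X" would select.
deps = {
    "isr": ["calibrate"],
    "calibrate": ["coadd"],
    "coadd": ["detect"],
    "detect": [],
}

def ancestors(graph, label):
    """All tasks that must run before `label` (the `<label` selection);
    `<=label` would additionally include `label` itself."""
    # Invert the edges: for each task, which tasks feed directly into it?
    parents = {task: set() for task in graph}
    for task, consumers in graph.items():
        for consumer in consumers:
            parents[consumer].add(task)
    result, stack = set(), [label]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

print(sorted(ancestors(deps, "coadd")))  # ['calibrate', 'isr']
```

On the command line, this would correspond to something like `--select-tasks "<coadd"`, and the set operators compose with it (e.g. `--select-tasks "<=coadd & >isr"` for everything between the two, assuming those labels exist in your pipeline).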
One of the places I hope `--select-tasks` will be useful is in recovering from task execution failures with new runs, and I’ve added a new entry to the middleware FAQ on that subject.