Changes to Synthetic Source Injection Dataset Naming Conventions

lskelvin · May 12, 2026, 3:59pm

Jira ticket DM-53324 implements a change in the LSST Science Pipelines that modifies the recommended dataset naming conventions used in our synthetic source injection pipelines.

Until now, all datasets downstream of the point of synthetic source injection have carried the “injected_” prefix. For example, an injection into a post_isr_image dataset results in an injected_post_isr_image, and datasets constructed using these injected data are similarly prefixed, e.g., injected_preliminary_visit_image.

This naming scheme has not been mandatory, but it has been a recommended convention. To facilitate the construction of an injection pipeline which adheres to this convention, the utility command make_injection_pipeline was created. It allows users to construct a new pipeline based on a reference pipeline, insert the appropriate source injection task, and reconfigure all downstream task connections accordingly.

However, with the growing use of dynamic connections across the Science Pipelines, it has become increasingly difficult to construct fully qualified source injection pipelines. Dynamic connections are task connections which are modified at runtime based on a given configuration. It’s not possible to modify these dynamic connections programmatically in a manner consistent with our existing convention, preventing the generation of fully qualified end-to-end source injection pipelines.

Following a discussion between Data Management team members on RFC-1171, it was agreed that our injected dataset naming convention recommendations should change to resolve this issue. The changes are summarized as follows:

Tasks which consume the dataset being injected into should have their input connections modified to accept the injected dataset in place of the original non-injected dataset;
Those immediate consuming tasks will not have their output connection names modified;
No further downstream tasks will have their connection names modified.

In practice, following our example above, an injection into a post_isr_image will produce an injected_post_isr_image; these injected data will be consumed by our calibration task, and that calibration task will output a preliminary_visit_image (i.e., without an injection prefix).
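The difference between the two conventions can be sketched with a toy model. This is only an illustration, not the lsst.source.injection implementation: the Task class, both functions, the task labels and the downstream_product dataset are hypothetical, and the injection task itself is left out for brevity.

```python
from dataclasses import dataclass

PREFIX = "injected_"


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a pipeline task and its connection names."""
    label: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


def old_convention(tasks, injected_into):
    """Prefix the injected dataset and every dataset derived from it.

    Assumes `tasks` is listed in execution order.
    """
    tainted = {injected_into}
    result = []
    for task in tasks:
        consumes_injected = any(name in tainted for name in task.inputs)
        inputs = tuple(PREFIX + n if n in tainted else n for n in task.inputs)
        outputs = task.outputs
        if consumes_injected:
            # Everything this task writes is now derived from injected data.
            tainted.update(task.outputs)
            outputs = tuple(PREFIX + n for n in task.outputs)
        result.append(Task(task.label, inputs, outputs))
    return result


def new_convention(tasks, injected_into):
    """Rename only the inputs of tasks that directly consume the injected dataset."""
    return [
        Task(
            task.label,
            tuple(PREFIX + n if n == injected_into else n for n in task.inputs),
            task.outputs,  # output connection names are left untouched
        )
        for task in tasks
    ]


# Hypothetical three-task pipeline, in execution order.
PIPELINE = [
    Task("isr", ("raw",), ("post_isr_image",)),
    Task("calibrate", ("post_isr_image",), ("preliminary_visit_image",)),
    Task("downstream", ("preliminary_visit_image",), ("downstream_product",)),
]

old = old_convention(PIPELINE, "post_isr_image")
new = new_convention(PIPELINE, "post_isr_image")

for before, after in zip(old, new):
    print(f"{before.label}: old {before.inputs} -> {before.outputs}")
    print(f"{after.label}: new {after.inputs} -> {after.outputs}")
```

Under the new rule in this sketch, only the calibrate task's input changes; its output and every later connection keep their original names, so no downstream connection has to be rewritten.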

This change was made to re-enable fully qualified end-to-end source injection pipelines, but we want to highlight the importance of accurate book-keeping for downstream data products going forward. Following this change, users may no longer be able to tell from the dataset name alone whether a dataset was constructed using synthetic data. We therefore strongly recommend that users practice good collection naming conventions and utilize other book-keeping mechanisms as required to minimize the risk of mistaking an injected dataset for a non-injected dataset, and vice versa.
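One possible book-keeping aid is to mark injection runs in the collection name itself. The sketch below assumes a purely hypothetical scheme in which an "injected" path component is appended to the output collection; neither the tag nor the helper functions are part of the Science Pipelines, and the example collection name is made up.

```python
INJECTION_TAG = "injected"  # hypothetical marker, not an LSST convention


def injected_collection(base: str, tag: str = INJECTION_TAG) -> str:
    """Return an output collection name that flags the run as injected."""
    return f"{base.rstrip('/')}/{tag}"


def is_marked_injected(collection: str, tag: str = INJECTION_TAG) -> bool:
    """Check whether any path component of the collection name is the tag."""
    return tag in collection.split("/")


clean_run = "u/someuser/example_run"  # made-up collection name
injected_run = injected_collection(clean_run)

print(injected_run, is_marked_injected(injected_run))
print(clean_run, is_marked_injected(clean_run))
```

Matching on whole path components rather than substrings avoids flagging a collection whose name merely contains the tag inside a longer word.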

To assist with this transition, the make_injection_pipeline utility function has been modified to adhere to the updated naming scheme described above.

We understand that this may require changes to some workflows, but we hope the end result will provide a more robust and maintainable approach to large-scale synthetic source injection processing across the Science Pipelines. The updated convention takes effect beginning with LSST Science Pipelines weekly release w_2026_20 and daily release d_2026_05_13.