Synthetic Source Injection Workshop 2021-06-03

lskelvin · June 2, 2021, 4:55pm

On Thursday June 3 at 13:00 - 16:00 Eastern (10:00 Pacific, 18:00 UK, 22:30 India, 01:00+1 Perth, 02:00+1 Japan, 03:00+1 Sydney) the data management team will host the Synthetic Source Injection Workshop. Injecting synthetic sources into real data provides invaluable information on sources where the ‘truth’ is known. Analysis of these data helps us explore not only the accuracy of output source catalogs, but also the fidelity of associated data processing steps such as sky subtraction.

DM is currently considering making a number of changes to the way in which we process future SSI datasets, and we’re keen to engage with the broader community to ensure that these output data products are maximally useful to community members going forward. A number of speakers from different groups will present overviews of their own SSI projects, and the session will conclude with a broader discussion on future efforts.

The primary goals of this mini-workshop will be:

To connect the various groups currently engaged in SSI using the science pipelines or working with HSC/DECam/DC2 imaging
To learn more about processing details ongoing elsewhere
To present Rubin Obs / DM current and future SSI plans
To facilitate discussion on said plans, ensuring synthetic source datasets are maximally useful to community members and the science collaborations going forward

Potential topics for discussion include:

What kinds of regularly processed synthetic datasets would be most useful for DM to produce and supply to the community?
Would any irregularly processed synthetic datasets also be useful?
What metrics would be most useful for us to track on a regular basis?

We hope those with an interest in this field will be able to join us to participate in this discussion. For those who are not able to attend, we plan on making a recording of the session available afterwards.

The Zoom connection details are: Launch Meeting - Zoom.

Further discussion and Q&A will take place within the Zoom chat during the session, on the #dm-ssi-workshop channel on the LSSTC Slack, and here on Community.

Best Wishes,

Lee Kelvin, Sophie Reed & Josh Meyers

lskelvin · June 4, 2021, 3:03pm

Please find below all the slides presented by speakers during this session:

Lee Kelvin, Sophie Reed, Josh Meyers - Introduction (1.0 MB)
Spencer Everett - Balrog in DES Y3 (1.5 MB)
Chris Morrison - AP data quality testing needs for synthetic sources (1.0 MB)
Matt Becker - Fake-source Injection for the DESC (407.4 KB)
Boris Leistedt - SI plans in LSST DESC LSS (183.2 KB)
Aaron Watkins - Assessing the impact of the pipeline sky subtraction on LSB science using model galaxies (424.7 KB)
Song Huang - Lessons learned from Synpipe tests on HSC SSP data (1.4 MB)
Lee Kelvin - Future synthetic source processing plans (2.0 MB)

A full recording of the session can be found at this Zoom link.

lskelvin · June 4, 2021, 3:57pm

A robust discussion took place on a number of topics. I encourage those interested in this topic to take a look at the slides and a video recording of the session above for further information.

In summary, a number of issues were discussed, notably:

Injected Populations

Of all the topics discussed, this seemed to be the one that returned the most feedback. It was felt that, in addition to injecting simple parametric models such as Sérsic profiles, it would also be beneficial for DM to inject more realistic extended sources: either real galaxies adapted from (e.g.) HST imaging, or imaging taken from complex hydrodynamical simulations. The benefit of real imaging is that it ensures (e.g.) optical effects are not included twice, whereas the benefit of hydrodynamical simulations is that they can go deeper. Whilst constructing such images is beyond the scope of DM effort, the connections made with groups such as the LSST:UK LSB working group may facilitate this effort.

In addition, a good level of discussion took place on additional ‘unique’ sources that might be injected. It was felt that injecting stellar ghosts from stars both inside and outside the field might be beneficial, as might the injection of satellite trails with which to better test upcoming satellite trail masking functionality. Also mentioned was the potential to inject dust, which should allow us to test extreme cases, but we’d need to be careful handling the inherent degeneracies here.

Source Placement

A good discussion took place on the optimal way to inject synthetic sources. Currently, DM injects sources at random positions within the field of interest. Spencer Everett (Balrog) noted that they also initially used random positions, which made the clustering algorithm easier, but they started getting Balrog-Balrog blends, which artificially increase the blending rate. Following collaboration with Gary Bernstein, the Balrog group ultimately decided on injecting sources on a hex-lattice. This makes sense, as it minimizes the injected source collision factor (an uninteresting parameter). Regularly spaced hex-grid sources, separated by more than an avoidance distance of interest, also minimize the shot noise in the counts of the placed objects, which is beneficial here. The DESC working groups follow a similar prescription, injecting sources in a tilted hex-grid format. The advantage of tilting the hex-lattice is that it also reduces any row/column effects that may creep in at the CCD single frame processing stage. Following this discussion, we recommend that future DM SSI efforts aim to replicate this source placement behaviour.
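As an illustration of the placement scheme discussed above, here is a minimal Python sketch of a tilted hex-lattice generator. It is not the Balrog, DESC, or DM implementation: the function names, the pixel-based rectangular field, and the choice of rotating about the field centre are all assumptions made for this example.

```python
import math


def tilted_hex_lattice(width, height, spacing, tilt_deg=0.0):
    """Return (x, y) positions on a hexagonal lattice rotated by tilt_deg.

    Hypothetical helper: the field is a rectangle [0, width) x [0, height)
    in pixels, and the lattice is rotated about the field centre.
    """
    theta = math.radians(tilt_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = width / 2.0, height / 2.0
    row_step = spacing * math.sqrt(3.0) / 2.0  # vertical gap between rows
    # Cover a circle enclosing the rectangle so rotation leaves no gaps.
    radius = math.hypot(width, height) / 2.0 + spacing
    n_rows = int(math.ceil(radius / row_step))
    n_cols = int(math.ceil(radius / spacing)) + 1
    points = []
    for j in range(-n_rows, n_rows + 1):
        # Odd rows are shifted by half a spacing to form the hex pattern.
        offset = 0.5 * spacing if j % 2 else 0.0
        for i in range(-n_cols, n_cols + 1):
            u = i * spacing + offset
            v = j * row_step
            x = cx + u * cos_t - v * sin_t
            y = cy + u * sin_t + v * cos_t
            if 0.0 <= x < width and 0.0 <= y < height:
                points.append((x, y))
    return points


def min_separation(points):
    """Smallest pairwise distance (brute force; fine for small samples)."""
    best = math.inf
    for a in range(len(points)):
        xa, ya = points[a]
        for b in range(a + 1, len(points)):
            xb, yb = points[b]
            best = min(best, math.hypot(xa - xb, ya - yb))
    return best


positions = tilted_hex_lattice(2000, 2000, spacing=100, tilt_deg=10.0)
closest_pair = min_separation(positions)
```

In this sketch every injected source has its nearest neighbours at exactly the chosen spacing, so synthetic-synthetic blends are avoided as long as the spacing exceeds the avoidance distance, and the non-zero tilt keeps lattice rows from lining up with CCD rows or columns.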

Regular vs Irregular Processed Datasets

It was felt that DM can commit to regularly processing a limited number of SSI datasets on a bi-monthly basis, but irregular dataset processing may need to be on a more limited basis. Current RCfakes outputs should be publicly available via the Rubin Science Platform, and we envisage this continuing going forward. The metrics and analysis plots generated for internal analysis are not currently publicly available; however, the Characterization Metric Report (CMR), produced following each major release of the Science Pipelines, does contain a number of metrics of interest. Jeff Carlin, the author of DMTR-281, highlighted that extra metrics can be added to this report. We will further consult with the community on what metrics are of interest following recent changes to the SSI framework, and ensure these are highlighted in future CMRs.

It was additionally felt that a number of the proposed irregularly processed datasets may be able to exist as subsets of the proposed regularly processed datasets (for example, the LSB galaxies dataset should be able to exist as a subset of the Synthetic Galaxies data). If possible, this would be a win-win scenario for all, minimizing computational effort on the DM side, and maximizing the return for those end-users who would make use of these data products.

Additional Points

Regarding proposed changes to the DM SSI framework, it was felt that any changes should attempt to be as user-friendly as possible, ensuring maximum user uptake. Alongside this, sufficient documentation should exist to explain all the necessary features. It was noted that the current documentation on the Community Forum by Sophie Reed remains valid, and works with the currently available SSI code base.
A robust discussion took place on the issue of how best to handle deblending, both synthetic-synthetic and science-synthetic. A closely related issue is star-galaxy separation. Fred Moolekamp suggested that proxy information such as source shape correlations might help in this regard. Also highlighted was the difficulty in producing output synthetic source catalogues (i.e., how best to compare the outputs without SSI to the outputs with SSI).
The Alert Production team have produced the createApFakes.py script, which generates a suitable synthetic input catalogue spanning ranges in various dimensions, such as magnitude. Future DM SSI effort should make full use of this script, perhaps converting it to a generic createSynthetics.py (or similar).
SSI datasets are ideal testing grounds for future sky subtraction methodology modifications. It was noted that detected footprint dilation has traditionally been an excellent means by which to minimize sky map estimation contamination; however, such an approach is not feasible as we approach future Rubin operations. One potential solution might be the proposed ‘modeled masking’ procedure, whereby bright sources are modeled and subtracted prior to sky estimation (see PUB-110 for further details).
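Returning to the input catalogue point above: the following is a rough Python sketch of what a generic synthetic input catalogue generator could look like. It does not reproduce createApFakes.py or its interface; the function name, the dictionary columns (id, ra, dec, mag), and the uniform sampling in magnitude are all assumptions chosen purely for illustration.

```python
import math
import random


def make_synthetic_catalog(n, ra_range, dec_range, mag_range, seed=0):
    """Draw n hypothetical synthetic sources.

    Each source is a dict with ra and dec in degrees and a magnitude,
    sampled uniformly over the requested ranges. This is an illustrative
    stand-in, not the Alert Production script.
    """
    rng = random.Random(seed)  # seeded so the catalogue is reproducible
    sin_lo = math.sin(math.radians(dec_range[0]))
    sin_hi = math.sin(math.radians(dec_range[1]))
    catalog = []
    for source_id in range(n):
        # Uniform in sin(dec) gives a uniform surface density on the sky.
        dec = math.degrees(math.asin(rng.uniform(sin_lo, sin_hi)))
        catalog.append({
            "id": source_id,
            "ra": rng.uniform(*ra_range),
            "dec": dec,
            "mag": rng.uniform(*mag_range),
        })
    return catalog


catalog = make_synthetic_catalog(
    500, ra_range=(150.0, 151.0), dec_range=(2.0, 3.0), mag_range=(18.0, 26.0)
)
```

A real generator would presumably span further dimensions (for example source type or shape parameters) and could take its positions from a lattice rather than random draws.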