Title: Dark Energy Science Collaboration (DESC) Static Probes Letter of Recommendation
Contributors: Sam Schmidt, Alex Malz
Co-signers: Daniel Gruen, Nacho Sevilla, John Franklin Crenshaw, Johann Cohen-Tanugi, Eric Gawiser, Sylvie Dagoret-Campagne, Shahab Joudaki, Biprateep Dey, Jeff Newman for the LSST-DESC
0: Summary Statement
This LoR summarizes DESC’s expected needs for Rubin Observatory-provided photo-z data products. Below we describe the anticipated science cases that will employ DM Photo-z Table outputs and their requirements on those outputs. This LoR covers only static science cases; DESC needs for transient probes will be described in a separate LoR.
1: Scientific Utility
The DESC Science Requirements Document (SRD) is a living document, continuously updated as the understanding and modeling of the Rubin system’s performance improve. Its most stringent requirements are set by the cosmic shear analysis in tomographic redshift bins and by large-scale structure (LSS) clustering measurements, both of which require significant pre-processing beyond the single-object redshifts anticipated from DM. This has motivated DESC’s development of dedicated pipelines specific to these aims. DESC will of course strive to share with DM any gains realized in developing these pipelines, to better support production of the best possible photo-z estimates.
However, DESC’s preliminary analyses will make use of the DM PZ table outputs for initial selections and verification testing of data processing pipeline stages. For example, point estimate redshifts may be used to select tomographic redshift samples for LSS analyses and initial guesses for covariance and simulation development. Also, prompt examination of survey property maps and their correlations with photo-z may reveal anomalies and could be a valuable pathway for feedback to DM if irregularities are observed, so we recommend that the PZ Data Table be released concurrently with each Data Release.
2: Outputs
The main desired output is a redshift PDF for each galaxy, along with point-estimate redshifts, for early science checks. One of the main requirements on outputs is the inclusion of specific flags for sample selection. For example, photo-z quality cuts may use star/galaxy separation and/or extendedness flags to identify and remove likely foreground stellar sources, along with a blendedness flag to identify sources whose flux measurement is likely to be contaminated by nearby objects. As the “photo” in photo-z implies, redshift estimates are extremely sensitive to anything that affects the underlying photometry. Any “bad photometry” flags (e.g. contamination by nearby bright stars, cosmic rays, or satellite streaks), or any other flags indicating a problem with flux measurements in one or more bands, will therefore assist in defining clean galaxy samples. A confidence score or other reliability measure could distinguish objects with large but appropriate redshift errors from those with tight but unreliable reported uncertainties.
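As an illustration of how such flags might be combined into a sample selection, the following is a minimal sketch only. The column names (`extendedness`, `blendedness`, `bad_photometry`, `pz_confidence`), the toy values, and the thresholds are all hypothetical and are not taken from any actual DM table schema.

```python
import numpy as np

# Toy catalog. All column names and values are hypothetical placeholders;
# the real DM photo-z table schema may differ.
catalog = {
    "extendedness": np.array([1.0, 0.0, 1.0, 1.0, 1.0]),  # 0 = point-like, 1 = extended
    "blendedness": np.array([0.01, 0.00, 0.60, 0.02, 0.05]),
    "bad_photometry": np.array([False, False, False, True, False]),
    "pz_confidence": np.array([0.95, 0.90, 0.80, 0.99, 0.40]),
}


def select_clean_galaxies(cat, max_blendedness=0.1, min_confidence=0.5):
    """Return a boolean mask of objects passing every quality cut.

    The thresholds are illustrative assumptions, not recommended values.
    """
    is_galaxy = cat["extendedness"] > 0.5           # drop likely stars
    is_unblended = cat["blendedness"] < max_blendedness
    has_good_flux = ~cat["bad_photometry"]          # drop flagged photometry
    is_reliable = cat["pz_confidence"] >= min_confidence
    return is_galaxy & is_unblended & has_good_flux & is_reliable


mask = select_clean_galaxies(catalog)
```

In this toy catalog only the first object passes: the others fail the star/galaxy, blendedness, photometry, and confidence cuts, respectively.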
3: Performance
A key performance consideration for DESC is that redshift estimates approximately meet the statistical definition of a PDF given perfect training data. That is, the probability distribution provided should approximate the actual relative probability, as a function of redshift, of finding a galaxy with the input fluxes. While not strictly necessary for DESC’s planned usage of the DM table outputs, principled statistical PDFs would enable early comparisons to redshift estimates computed by DESC with complementary algorithms, and will generally maximize utility for a wide array of science cases beyond just dark energy. We note that not all estimators currently in broad usage necessarily produce reliable PDFs even in the presence of representative data (Schmidt, Malz, Soo et al. 2020). As such, estimators that produce statistical PDFs would be highly favored by DESC when choosing which to include in the DM table.
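One common way to check whether a set of PDFs behaves statistically is the probability integral transform (PIT): the CDF of each object's PDF evaluated at its true redshift, which should be uniformly distributed on [0, 1] for calibrated PDFs. The sketch below illustrates that idea on synthetic Gaussian PDFs. It is not a description of any DESC or DM pipeline, and the function names, grid, and widths are assumptions chosen for illustration.

```python
import numpy as np


def pit_values(z_grid, pdfs, z_true):
    """PIT: the CDF of each gridded PDF evaluated at the object's true redshift."""
    # Trapezoid-rule cumulative integral along the redshift axis.
    dz = np.diff(z_grid)
    areas = 0.5 * (pdfs[:, 1:] + pdfs[:, :-1]) * dz
    zeros = np.zeros((pdfs.shape[0], 1))
    cdfs = np.concatenate([zeros, np.cumsum(areas, axis=1)], axis=1)
    cdfs /= cdfs[:, -1:]  # normalize so that each CDF ends at 1
    return np.array([np.interp(zt, z_grid, cdf) for zt, cdf in zip(z_true, cdfs)])


# Synthetic test case (all numbers are illustrative assumptions).
rng = np.random.default_rng(42)
n_obj = 2000
z_grid = np.linspace(0.0, 3.0, 601)
centers = rng.uniform(0.5, 2.5, n_obj)
sigma = 0.05
z_true = rng.normal(centers, sigma)  # truth is drawn from the "honest" PDF


def gaussian_pdfs(width):
    """Unnormalized Gaussian PDFs on z_grid, one row per object."""
    return np.exp(-0.5 * ((z_grid[None, :] - centers[:, None]) / width) ** 2)


pit_good = pit_values(z_grid, gaussian_pdfs(sigma), z_true)        # calibrated
pit_narrow = pit_values(z_grid, gaussian_pdfs(sigma / 3), z_true)  # overconfident

# A uniform distribution on [0, 1] has mean 0.5 and standard deviation
# 1/sqrt(12), about 0.289. Overconfident PDFs pile PIT values up near 0 and 1,
# which inflates the standard deviation.
uniform_std = 1.0 / np.sqrt(12.0)
```

Comparing `pit_good.std()` and `pit_narrow.std()` against `uniform_std` gives a quick scalar diagnostic; a histogram of the PIT values is the more usual visual check.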
4: Technical Aspects
Reproducibility is a concern for DESC. We would therefore favor including in the DM Table those methods for which all code and priors (both training sets and template libraries) used to generate the photo-z data products are available to the SCs, if not the public, such that users can re-run the code for testing and verification. This availability requirement extends to the training data themselves, which may be an issue if proprietary sets of secure redshifts are included in training samples. Even seemingly innocuous settings, such as random seeds and tunable parameters set in specific modules, can be important in reproducing catalogs and should be recorded.
It has been noted that the Project is nominally responsible for correcting object fluxes to how they appear at the top of the atmosphere (Rubin Observatory SRD) and is thus not responsible for removing the effects of foreground Milky Way dust on extragalactic objects. Not accounting for the effect of dust on measured colors before estimating redshifts would induce spatially dependent redshift biases. For example, extragalactic analyses planning to use portions of the sky with E(B-V) ~ 0.2 could see color shifts of >~0.2 mag in the regions of highest extinction relative to areas of low extinction, and with a u-band value of A_u/E(B-V) ~ 4.8, objects could be nearly a magnitude fainter in the u-band than in low extinction areas. Without at least a rough correction for Milky Way dust, both the dimming and the reddening will introduce degeneracies between true redshift and observed photometry, degrading photo-z performance. We therefore consider it uncontroversial to recommend that Rubin DM perform a basic foreground removal (e.g. using SFD maps) on the fluxes before photo-z estimators are run.
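The size of the u-band effect quoted above can be checked with a few lines of arithmetic. The sketch below uses only the E(B-V) and u-band coefficient values given in the text; the function name is hypothetical, and a real correction would take E(B-V) per object from a dust map rather than a single constant.

```python
# A_u / E(B-V) for the u-band, the value quoted in the text above.
R_U = 4.8


def deredden_flux(flux_obs, ebv, r_band):
    """Undo foreground extinction of A = r_band * E(B-V) magnitudes.

    Hypothetical helper for illustration; in practice ebv would come from
    a per-object dust map lookup (e.g. SFD).
    """
    extinction_mag = r_band * ebv
    return flux_obs * 10.0 ** (0.4 * extinction_mag)


ebv = 0.2                           # E(B-V) example value from the text
a_u = R_U * ebv                     # u-band dimming: 0.96 mag
flux_ratio = 10.0 ** (-0.4 * a_u)   # observed / intrinsic flux, roughly 0.41

# Applying the correction to the dimmed flux recovers the intrinsic flux.
flux_restored = deredden_flux(flux_ratio, ebv, R_U)
```

At E(B-V) = 0.2, an uncorrected u-band flux is thus only about 41% of its intrinsic value, consistent with the "nearly a magnitude" figure above.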