LOR Example 2: The CMNN PZ Estimator

Title: LOR Example 2: The CMNN PZ Estimator
Contributors: Melissa Graham
Co-signers: N/A

This is not a real LOR. It is an example of the minimum contents that the Rubin Data Management team is looking for in these letters. Most LORs that recommend estimators will probably turn out to be longer than this minimum.

0. Summary Statement
This example LOR is based on the Color-Matched Nearest Neighbors (CMNN) PZ Estimator (Graham+2018,2020). The CMNN Estimator would use a training set of galaxies with known redshifts, identify those that are “well-matched” in color space to a given LSST Object, and estimate the Object’s photometric redshift from them.
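The matching step described above can be illustrated with a minimal sketch. This is not the CMNN Estimator's code: the function names, the chi-square-like distance in color space, the `max_distance` threshold, and the choice of the nearest match as the point estimate (with the scatter of the matched subset as its uncertainty) are all assumptions made for illustration.

```python
import numpy as np


def color_matched_neighbors(test_mags, test_errs, train_mags, max_distance):
    """Return indices of training galaxies that are close to the test galaxy
    in color space, sorted from closest to farthest.

    Assumption of this sketch: colors are adjacent-band magnitude differences,
    and the distance is the sum of squared color differences scaled by the
    test galaxy's color uncertainties.
    """
    test_colors = test_mags[:-1] - test_mags[1:]
    # Color uncertainty: the two magnitude errors added in quadrature.
    color_errs = np.sqrt(test_errs[:-1] ** 2 + test_errs[1:] ** 2)
    train_colors = train_mags[:, :-1] - train_mags[:, 1:]
    distance = np.sum(((train_colors - test_colors) / color_errs) ** 2, axis=1)
    matched = np.flatnonzero(distance <= max_distance)
    return matched[np.argsort(distance[matched])]


def point_estimate(train_z, matched):
    """Hypothetical point estimate: redshift of the closest match, with the
    standard deviation of all matched redshifts as the uncertainty."""
    if matched.size == 0:
        return float("nan"), float("nan")
    z_matched = train_z[matched]
    return float(z_matched[0]), float(np.std(z_matched))


# Toy training set: three galaxies, three bands, known redshifts.
train_mags = np.array([[23.0, 22.5, 22.2],
                       [24.0, 23.4, 23.0],
                       [24.0, 22.0, 21.0]])
train_z = np.array([0.5, 0.7, 2.0])
test_mags = np.array([24.0, 23.5, 23.2])
test_errs = np.array([0.1, 0.1, 0.1])

matched = color_matched_neighbors(test_mags, test_errs, train_mags, max_distance=2.0)
z_phot, z_err = point_estimate(train_z, matched)
```

In this toy case the third training galaxy is far from the test galaxy in color and is excluded, so the estimate is drawn only from the first two.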

1. Scientific Utility
The CMNN Estimator is a generic PZ estimator that is not specific to any one science case, and would serve as a general PZ estimator for all LSST Objects. It would not be appropriate for advanced cosmological analyses, but could serve other science use-cases with less strict performance requirements. The CMNN Estimator can incorporate priors and non-optical photometry (e.g., NIR magnitudes), but has never been applied to real data.

2. Outputs
The CMNN Estimator is configured to return point estimates with uncertainties and full posterior probability density functions with a user-specified binning. The CMNN Estimator could be adapted to provide a probabilistic galaxy “type” if the training set had galaxy types, and could be adapted to provide user-friendly quality flags.
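As an illustration of what a PDF "with a user-specified binning" could look like, the sketch below builds a normalized histogram from the redshifts of matched training galaxies. How the CMNN Estimator actually constructs its PDFs is not described in this letter; the function name `binned_pdf`, the histogram approach, and the example bin edges are assumptions.

```python
import numpy as np


def binned_pdf(matched_z, bin_edges):
    """Return a probability density per redshift bin that integrates to 1
    over the binned range (or all zeros if no redshift falls in range)."""
    counts, _ = np.histogram(matched_z, bins=bin_edges)
    widths = np.diff(bin_edges)
    total = counts.sum()
    if total == 0:
        return np.zeros_like(widths, dtype=float)
    # Divide by the bin width so the values are densities, not fractions.
    return counts / (total * widths)


# Hypothetical user-specified binning: 30 bins of width 0.1 from z = 0 to 3.
bin_edges = np.linspace(0.0, 3.0, 31)
matched_z = np.array([0.52, 0.55, 0.58, 0.71])
pdf = binned_pdf(matched_z, bin_edges)
```

Returning a density rather than raw counts keeps the output comparable across different user-chosen bin widths.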

3. Performance
Studies have shown that with an appropriate, well-matched, and complete training set, the CMNN estimator could deliver a PZ quality that meets the specifications for LSST PZ as described in the Science Requirements Document (ls.st/srd; Graham+2018,2020). However, this has yet to be demonstrated with real data, and so we recommend the CMNN Estimator be included in the shortlist for testing with commissioning data during the Photo-z Validation Cooperative.

4. Technical Aspects

Scalability - Will probably meet. The CMNN Estimator’s run time depends on the size of the training set. Pre-run subsetting of the training set by magnitude and color should be effective at minimizing this run time, but more testing of the minimum training subset size and quality trade-offs will be needed during commissioning.
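One way to picture pre-run subsetting is as a cheap cut applied to the training set before any distance calculation. The sketch below is an assumption about how such a cut might be written, not the CMNN Estimator's implementation; the function name, the reference band, and the window widths are hypothetical and would need to be tuned during commissioning.

```python
import numpy as np


def subset_training_set(train_mags, test_mags, ref_band, mag_window, color_window):
    """Return indices of training galaxies within a magnitude window (in one
    reference band) and a color window (in every adjacent-band color) of the
    test galaxy."""
    train_colors = train_mags[:, :-1] - train_mags[:, 1:]
    test_colors = test_mags[:-1] - test_mags[1:]
    near_in_mag = np.abs(train_mags[:, ref_band] - test_mags[ref_band]) <= mag_window
    near_in_color = np.all(np.abs(train_colors - test_colors) <= color_window, axis=1)
    return np.flatnonzero(near_in_mag & near_in_color)


# Toy training set: three galaxies, three bands.
train_mags = np.array([[23.0, 22.5, 22.2],
                       [24.0, 23.4, 23.0],
                       [24.0, 22.0, 21.0]])
test_mags = np.array([24.0, 23.5, 23.2])

kept = subset_training_set(train_mags, test_mags, ref_band=1,
                           mag_window=1.2, color_window=0.5)
```

Narrower windows reduce run time but risk discarding useful neighbors, which is the trade-off between subset size and quality noted above.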

Inputs and Outputs - Will meet. The CMNN Estimator only requires galaxy apparent magnitudes and their uncertainties as inputs, which will already exist in the LSST Object catalog. The output point estimates, errors, and binned PDFs are all consistent with the PZ-related Object elements defined in the DPDD.

Storage Constraints - Will meet. The CMNN Estimator requires a training set of galaxies with apparent magnitudes and “true” redshifts, but this would be no more storage than any other PZ estimator.

External Data Sets - Will probably meet. The CMNN Estimator has only been run on simulated data with training sets that are perfectly matched to the test set in terms of magnitude and redshift distributions. In order to use the CMNN Estimator for LSST Objects, a more careful assessment of the impact of incomplete/impure training sets needs to be done. It is not yet clear whether an external training set exists that would allow the CMNN Estimator to provide PZ results of adequate quality.

Estimator Training and Iterative Development - Will meet. As mentioned above, more work is needed to validate the use of a spectroscopic training set, but once the correct training set is identified there would be no further iterative development.

Computational Processing Constraints - Will meet. The CMNN Estimator does not require that a large amount of data be held in memory at any given time.

Implementation Language - Will meet. The CMNN Estimator is written in Python 3.