Title: LoR For The BPZ (Bayesian Photometric Redshifts) Estimator
Contributors: Sam Schmidt
Co-signers: John Franklin Crenshaw
0 Summary Statement
BPZ (Bayesian Photometric Redshifts; Benitez 2000, Coe et al. 2006) is a public template-based photo-z code that includes a powerful Bayesian apparent magnitude/type prior formulation. It was the best performing of the three template-based codes that participated in the data challenge described in Schmidt et al. 2020. Version 1.99.3 of the code is available at https://www.stsci.edu/~dcoe/BPZ/. (Note: The original code is written in Python 2 and is no longer actively supported by the authors. Several DESC members have ported the code to Python 3; a copy of the port is available at https://github.com/LSSTDESC/DESC_BPZ.) BPZ is also one of the few template-based codes that properly marginalizes over the templates/types when producing 1-dimensional redshift posteriors. With optimized templates and training of the Bayesian prior, BPZ could be very competitive in producing robust redshift posteriors for Rubin data.
1 Scientific Utility
BPZ is a generic photo-z estimator that is not tuned to a particular science case. In the past the default template set has not included stars or AGN, so it is not ideal for star/galaxy flagging or for the inclusion of AGN components (though an AGN template could be added to the SED template set). BPZ has no functionality for including internal host galaxy reddening, so if a variety of dust models are to be incorporated, this must be done at the level of the templates input into the code. However, as mentioned above, BPZ does an excellent job of producing marginalized 1D redshift posteriors that should be of broad scientific utility. Provided that the templates extend far enough blueward and redward, BPZ can handle additional filter bands (e.g. in the NIR) to improve redshift estimates.
2 Outputs
For each input object BPZ produces a point estimate and associated N-sigma uncertainties, with N set by the code runner (with and without the Bayesian prior applied), as well as a marginalized 1D redshift posterior evaluated on an input grid of redshifts. BPZ also returns the “best fit type”: the SED template that contains the highest fraction of the overall posterior probability as evaluated at the point estimate redshift. This “best type” may be useful in estimating physical parameters in a post-facto analysis (BPZ does not estimate physical parameters natively). The code also returns the chi^2 value of the best SED evaluated at the best redshift, which can be useful for flagging potential outlier objects for which no template is a good fit.
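As an illustration of how these outputs relate to one another, the following is a minimal sketch (not BPZ's actual implementation) that starts from a chi^2 grid and a prior, both hypothetical arrays of shape (n_z, n_templates), and derives the marginalized posterior, a point estimate, a central interval, the best-fit type, and the chi^2 at that point. The function name, the use of the posterior mode as the point estimate, and the central-interval definition are assumptions made for this example.

```python
import math
import numpy as np


def summarize_posterior(zgrid, chi2, prior, n_sigma=1.0):
    """Summarize one object from chi2 and prior arrays of shape (n_z, n_templates).

    Hypothetical helper for illustration only; BPZ's own definitions may differ.
    """
    zgrid = np.asarray(zgrid, dtype=float)
    chi2 = np.asarray(chi2, dtype=float)
    prior = np.asarray(prior, dtype=float)

    # Likelihood from chi^2; subtracting the minimum avoids numerical underflow.
    like = np.exp(-0.5 * (chi2 - chi2.min()))
    joint = like * prior  # unnormalized p(z, T | data)

    # Marginalize over templates, then normalize with the trapezoid rule.
    pz = joint.sum(axis=1)
    steps = 0.5 * (pz[1:] + pz[:-1]) * np.diff(zgrid)
    norm = steps.sum()
    pz = pz / norm

    # Point estimate (assumed here to be the mode of the marginalized posterior).
    i_best = int(np.argmax(pz))
    # Best-fit type: template holding the most posterior probability at that redshift.
    t_best = int(np.argmax(joint[i_best]))

    # Central interval enclosing the Gaussian-equivalent n_sigma probability.
    cdf = np.concatenate([[0.0], np.cumsum(steps) / norm])
    frac = math.erf(n_sigma / math.sqrt(2.0))
    z_lo, z_hi = np.interp([0.5 * (1.0 - frac), 0.5 * (1.0 + frac)], cdf, zgrid)

    return {
        "pz": pz,
        "z_best": float(zgrid[i_best]),
        "z_lo": float(z_lo),
        "z_hi": float(z_hi),
        "best_type": t_best,
        "chi2_best": float(chi2[i_best, t_best]),
    }
```

Passing an array of ones as the prior gives the likelihood-only counterpart of the same quantities, and a large chi2_best can be used to flag objects for which no template fits well.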
3 Performance
BPZ was used in the LSST Science Book analyses to show that (provided with representative templates and training data) it is able to meet the goals listed in the SRD. The code was also very close to these same SRD goals in the analysis tabulated in Appendix B3 of Schmidt et al. 2020. BPZ has been run by KiDS (e.g. Wright et al. 2019), as part of DES Y3 analyses, and by DESC on the cosmoDC2 simulated and image-extracted catalogs. Each of these samples consists of hundreds of millions of galaxies, indicating that production-scale runs are feasible. The lack of active support of the code base may be a drawback; however, the performance of the code in quantitative metric comparisons with other codes shows that it warrants consideration in the PhotoZ Validation Cooperative exploration.
4 Technical Aspects
Scalability: Will likely meet. BPZ is trivially parallelizable and has been run on hundreds of millions of galaxies by DESC as one of the placeholder photo-z codes during early testing. Performance depends on the template set and redshift grid specified by the code runner, but it is highly likely that the code could be run at scale on billions of objects for a reasonable redshift grid and template set.
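Because each object is processed independently, a catalog can be split into chunks that are handled by separate workers and concatenated afterwards. The sketch below shows that pattern with a deliberately simple stand-in estimator (a best-chi^2 grid search against a hypothetical model-flux grid); it is not BPZ itself, and the function names, chunk size, and use of a thread pool are assumptions made for this example. A production run would more likely distribute chunks across processes or batch jobs.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def estimate_chunk(flux, flux_err, model_flux, zgrid):
    """Stand-in per-object estimator: best-chi^2 redshift for each catalog row.

    flux, flux_err: (n_obj, n_band); model_flux: (n_z, n_band); zgrid: (n_z,).
    """
    resid = (flux[:, None, :] - model_flux[None, :, :]) / flux_err[:, None, :]
    chi2 = np.sum(resid**2, axis=2)  # shape (n_obj, n_z)
    return zgrid[np.argmin(chi2, axis=1)]


def run_in_chunks(flux, flux_err, model_flux, zgrid, chunk_size=1000, n_workers=4):
    """Split the catalog into row chunks, process them concurrently, and rejoin."""

    def work(start):
        rows = slice(start, start + chunk_size)
        return estimate_chunk(flux[rows], flux_err[rows], model_flux, zgrid)

    starts = range(0, len(flux), chunk_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so the output rows match the catalog order.
        parts = list(pool.map(work, starts))
    return np.concatenate(parts)
```

Only one chunk per worker needs to be held in memory at a time, so the chunk size can be chosen to fit the available memory without changing the results.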
Inputs and Outputs: Will meet. BPZ requires only catalog-level ugrizy fluxes/magnitudes and uncertainties, and outputs are consistent with expected DPDD-type products.
Storage Constraints: Will meet. As mentioned, the user specifies the redshift grid on which the PDF is evaluated, so the 1D PDF can be sized to match the number of points available for it in the Object Table. A set of SED templates and a parameterized prior file take up a trivial amount of storage space.
External Data Sets: Will likely meet. BPZ can be run out of the box with a default set of templates and a default prior. However, in order to maximize science utility, an improved template set, updated to better reflect the galaxies that we expect to see in 10-year Rubin WFD data, should be used, and some additional effort to optimize the Bayesian apparent magnitude/type prior for this dataset would be necessary. To do this, we require relatively representative training data with secure redshifts. This is true of almost any photo-z algorithm, so we do not think that BPZ is in a better or worse position than other algorithms in this regard. Furthermore, Crenshaw and Connolly (2020) have shown that a representative template set for BPZ can be deconvolved directly from LSST photometry, given a modest number of galaxies with secure spectroscopic redshifts.
Estimator Training and Iterative Development: Will likely meet. As mentioned above, the Bayesian apparent magnitude/type prior must be tuned, using training data, to the specific template set used by the code. In practice, however, this is not overly difficult (modulo accounting for incompleteness in the training data, which again is a problem for all photo-z methods): it requires a straightforward fit of the parameterized form described in Benitez 2000.
Computational Processing Constraints: Will meet. BPZ is trivially parallelizable and does not require an excessive amount of data to be held in memory at once.
Implementation Language: It depends. As noted in Section 0, the original code is written in Python 2. However, a Python 3 port of the code is available; if that port is acceptable as-is (again, the code is no longer actively supported), then it meets DM requirements.