LOR: The DEmP PZ Estimator

Title: LoR for the DEmP PZ Estimator
Contributors: Bau-Ching Hsieh (ASIAA)
Co-signers: Yen-Ting Lin (ASIAA)

0. Summary Statement
The “Direct Empirical Photometric method” code (DEmP; Hsieh & Yee 2014) is a machine learning code that derives physical properties of galaxies (e.g., photometric redshift, stellar mass, and star formation rate [SFR]) from multi-band photometry, optionally supplemented with other information such as galaxy size or shape. It is one of the official photometric redshift codes adopted by the Subaru Hyper Suprime-Cam (HSC) survey (Tanaka et al. 2018; Nishizawa et al. 2020).

1. Scientific Utility
Although the DEmP code is designed for general science cases, it can also be applied to specific types of objects (e.g., AGNs and OII emitters) if a suitable training set is prepared. The redshifts, stellar masses, and star formation rates estimated by DEmP for the HSC survey have been widely used to study the properties of general galaxies, environments, AGNs, galaxy groups and clusters, gravitational lenses, weak lensing, and cosmology (e.g., Shirasaki et al. 2020; Chiu et al. 2021; Li et al. 2021; Tadaki et al. 2020; Jaelani et al. 2020; Jian et al. 2020; Sakakibara et al. 2019; Pintos-Castro et al. 2019).

2. Outputs
The core algorithm of the DEmP code outputs a probability distribution function (PDF) of a given physical property (e.g., redshift). Point estimates can either be derived from the PDFs with a simple Python script (as the HSC photo-z working group does) or be generated by DEmP internally. The currently available outputs are four point estimates (mean, mode, median, and best), a confidence level and a risk flag (0 to 1) for each of the four, the standard deviation, and the lower and upper bounds of the 68% and 95% confidence intervals.
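As an illustration of how point estimates can be read off a binned PDF, the following is a minimal sketch and not the HSC working group's script or DEmP's internal routine. The function name `pdf_point_estimates`, the uniform redshift grid, and the choice of quantiles (16/84% and 2.5/97.5%) for the confidence intervals are assumptions made for the example. The "best" estimate, the confidence levels, and the risk flags are omitted because their definitions are not given in this letter.

```python
from bisect import bisect_left
from itertools import accumulate


def pdf_point_estimates(z_grid, pdf):
    """Summarize a binned PDF sampled at uniformly spaced bin centres.

    Hypothetical helper for illustration; not part of DEmP.
    """
    total = sum(pdf)
    weights = [p / total for p in pdf]      # normalize so the weights sum to 1
    cdf = list(accumulate(weights))         # cumulative probability per bin

    def quantile(q):
        # First bin whose cumulative probability reaches q.
        idx = min(bisect_left(cdf, q), len(z_grid) - 1)
        return z_grid[idx]

    mean = sum(z * w for z, w in zip(z_grid, weights))
    var = sum(w * (z - mean) ** 2 for z, w in zip(z_grid, weights))
    mode = z_grid[max(range(len(weights)), key=weights.__getitem__)]
    return {
        "mean": mean,
        "mode": mode,
        "median": quantile(0.5),
        "std": var ** 0.5,
        "ci68": (quantile(0.16), quantile(0.84)),      # assumed central interval
        "ci95": (quantile(0.025), quantile(0.975)),    # assumed central interval
    }
```

Quantiles are resolved only to the nearest bin centre, so the precision of the median and interval bounds is limited by the PDF bin size.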

3. Performance
As with all machine learning codes, the performance of DEmP depends on the quality of the training set, the amount of information provided (e.g., photometry, size, and morphology), and the quality of the input data.

The core algorithm of DEmP is a polynomial fit to the N nearest neighbors of each object. If N is much smaller than the sample size of the training set, the functional form of the polynomial becomes irrelevant, since the N nearest neighbors are expected to have very similar values of the output quantities (e.g., redshift, stellar mass, and star formation rate). In addition, priors of all kinds are implicitly embedded in any training set (e.g., its redshift, luminosity, mass, and SFR distributions), and these priors may introduce unwanted biases for general machine learning codes. The polynomial fit with N nearest neighbors can reduce this effect significantly, because the fit is done in a small subset of the training set with very similar input values, where the effect of the prior is negligible.
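The idea of fitting a polynomial to only the nearest neighbors can be sketched as follows. This is an illustrative toy, not DEmP's implementation: the function names, the Euclidean distance in input space, the first-order polynomial, and the use of normal equations are all assumptions chosen to keep the example short and dependent only on the Python standard library.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: neighbours are degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_linear_estimate(train_x, train_y, query, n_neighbors):
    """Fit a first-order polynomial to the n nearest training objects.

    Hypothetical helper for illustration; not part of DEmP.
    """
    def dist2(i):
        # Squared Euclidean distance between training object i and the query.
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))

    nearest = sorted(range(len(train_x)), key=dist2)[:n_neighbors]
    # Design matrix rows [1, x1 - q1, x2 - q2, ...], centred on the query.
    rows = [[1.0] + [a - b for a, b in zip(train_x[i], query)] for i in nearest]
    ys = [train_y[i] for i in nearest]
    dim = len(rows[0])
    # Normal equations (A^T A) c = A^T y for the least-squares coefficients.
    ata = [[sum(r[j] * r[k] for r in rows) for k in range(dim)] for j in range(dim)]
    aty = [sum(r[j] * y for r, y in zip(rows, ys)) for j in range(dim)]
    coeffs = solve_linear(ata, aty)
    return coeffs[0]   # centred fit: the intercept is the value at the query
```

Because the fit is centred on the query object, the intercept alone gives the estimate. When the neighbors span only a small range of output values, that intercept stays close to their average whatever polynomial order is used, which is the sense in which the fitting form becomes irrelevant.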

The overall performance of DEmP has been reviewed in the HSC photometric redshift data release papers (Tanaka et al. 2018; Nishizawa et al. 2020). The latest DEmP performance tests for the HSC S20A internal data release show a bias of -0.003(1+z), a scatter of 0.019(1+z), and an outlier fraction of 5.4% for galaxies with i < 24.5 mag at the HSC Wide depth (i.e., about 1-2 yr of LSST depth).
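For readers unfamiliar with these statistics, the sketch below computes them under commonly used conventions: the residual dz = (z_phot - z_spec)/(1 + z_spec), bias as the median of dz, scatter as 1.4826 times the median absolute deviation, and outliers as objects with |dz| > 0.15. These conventions, the threshold, and the function name are assumptions for illustration; the exact definitions behind the numbers quoted above are those of the HSC photo-z papers and may differ in detail.

```python
from statistics import median


def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (bias, scatter, outlier_fraction) for matched redshift lists.

    Hypothetical helper using assumed conventional definitions.
    """
    # Residuals normalized by (1 + z_spec).
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    bias = median(dz)
    # Robust scatter: MAD scaled to match a Gaussian standard deviation.
    scatter = 1.4826 * median(abs(d - bias) for d in dz)
    # Fraction of objects whose normalized residual exceeds the cut.
    outlier_fraction = sum(abs(d) > outlier_cut for d in dz) / len(dz)
    return bias, scatter, outlier_fraction
```

The median-based bias and scatter are chosen here because they are insensitive to the outliers, which are counted separately.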

4. Technical Aspects

Scalability - Will meet
DEmP’s run time depends on the sample size of the training set and the parameter configuration. The DEmP code is currently being optimized for GPU parallel computing and should meet the requirement once the optimization is complete.

Inputs - Will meet
DEmP requires photometry with corresponding uncertainties and accepts additional measurements such as size and shape.

Outputs - Will meet
DEmP provides a probability distribution function (PDF), and its post-processing produces four point estimates (mean, mode, median, and best), a confidence level and a risk flag (0 to 1) for each of the four, the standard deviation, and the lower and upper bounds of the 68% and 95% confidence intervals.

Storage Constraints - Will meet
The bin size of the probability distribution function is adjustable, and the set of point estimates written to the output is customizable.

External Data Sets - Will meet
DEmP requires a training set with spectroscopic redshifts, and with photometric redshifts derived from many photometric bands spanning a wide wavelength range, such as COSMOS2015 (Laigle et al. 2016).

Estimator Training and Iterative Development - Will meet
The developer of DEmP will be in charge of the training and iterative development.

Computational Processing Constraints - Will meet
DEmP will be run on the ASIAA supercomputer.

Implementation Language - Will meet
DEmP is written in Fortran. The developer will port it to C++.

Maintenance - Will meet
The developer of DEmP will maintain the code.