Oxford University Press, Monthly Notices of the Royal Astronomical Society, 4(498), p. 5498-5510, 2020
Full text: Download
ABSTRACT Wide-area imaging surveys are one of the key ways of advancing our understanding of cosmology, galaxy formation physics, and the large-scale structure of the Universe in the coming years. These surveys typically require calculating redshifts for huge numbers (hundreds of millions to billions) of galaxies – almost all of which must be derived from photometry rather than spectroscopy. In this paper, we investigate how using statistical models to understand the populations that make up the colour–magnitude distribution of galaxies can be combined with machine learning photometric redshift codes to improve redshift estimates. In particular, we combine the use of Gaussian mixture models with the high-performing machine-learning photo-z algorithm GPz and show that modelling and accounting for the different colour–magnitude distributions of training and test data separately can give improved redshift estimates, reduce the bias on estimates by up to a half, and speed up the run-time of the algorithm. These methods are illustrated using data from deep optical and near-infrared data in two separate deep fields, where training and test data of different colour–magnitude distributions are constructed from the galaxies with known spectroscopic redshifts, derived from several heterogeneous surveys.