Improving machine learning-derived photometric redshifts and physical property estimates using unlabelled observations

Humphrey, A.; Troncoso, Israel Matute; Cunha, P. A. C.; Paulino-Afonso, A.; Amarantidis, Stergios; Carvajal, R.; Gomes, Jean Michel; Matute, I.; Papaderos, P.

Published in

Oxford University Press, Monthly Notices of the Royal Astronomical Society, 1(520), p. 305-313, 2022

DOI: 10.1093/mnras/stac3596

Journal article published in 2022 by A. Humphrey, Israel Matute Troncoso, P. A. C. Cunha, A. Paulino-Afonso, Stergios Amarantidis, R. Carvajal, Jean Michel Gomes, I. Matute, P. Papaderos

Abstract

In the era of huge astronomical surveys, machine learning offers promising solutions for the efficient estimation of galaxy properties. The traditional, ‘supervised’ paradigm for the application of machine learning involves training a model on labelled data, and using this model to predict the labels of previously unlabelled data. The semi-supervised ‘pseudo-labelling’ technique offers an alternative paradigm, allowing the model training algorithm to learn from both labelled data and as-yet unlabelled data. We test the pseudo-labelling method on the problems of estimating redshift, stellar mass, and star formation rate, using COSMOS2015 broad-band photometry and one of several publicly available machine learning algorithms, and we obtain significant improvements compared to purely supervised learning. We find that the gradient-boosting tree methods CatBoost, XGBoost, and LightGBM benefit the most, with reductions of up to ∼15 per cent in metrics of absolute error. We also find similar improvements in the photometric redshift catastrophic outlier fraction. We argue that the pseudo-labelling technique will be useful for the estimation of redshift and physical properties of galaxies in upcoming large imaging surveys such as Euclid and LSST, which will provide photometric data for billions of sources.
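The pseudo-labelling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the exact procedure, so this assumes the simplest variant (predict labels for the unlabelled set once, then retrain on the union). A toy k-nearest-neighbour regressor stands in for the gradient-boosting methods named above so that the example needs only the Python standard library, and the two "colour" features, the target relation, and all function names (`knn_predict`, `pseudo_label`, `fake_source`) are hypothetical. The sketch shows the mechanics only and makes no claim about reproducing the reported improvements.

```python
import random


def knn_predict(train_x, train_y, query, k=3):
    # Toy stand-in model: mean target of the k training points
    # closest to `query` (squared Euclidean distance).
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def pseudo_label(labelled_x, labelled_y, unlabelled_x, k=3):
    # Step 1: use the purely supervised model to predict "pseudo-labels"
    # for the unlabelled observations.
    pseudo_y = [knn_predict(labelled_x, labelled_y, q, k) for q in unlabelled_x]
    # Step 2: return the union of true labels and pseudo-labels, which
    # serves as the training set for the final model.
    return labelled_x + unlabelled_x, labelled_y + pseudo_y


rng = random.Random(0)  # fixed seed so the synthetic data are reproducible


def fake_source():
    # Hypothetical source: two made-up "colours" and an invented noisy
    # target. This is synthetic data, not COSMOS2015 photometry.
    c1, c2 = rng.uniform(0, 1), rng.uniform(0, 1)
    return (c1, c2), 2.0 * c1 + c2 + rng.gauss(0, 0.05)


labelled = [fake_source() for _ in range(30)]
labelled_x = [x for x, _ in labelled]
labelled_y = [y for _, y in labelled]
unlabelled_x = [fake_source()[0] for _ in range(200)]  # targets discarded

aug_x, aug_y = pseudo_label(labelled_x, labelled_y, unlabelled_x)

query = (0.5, 0.5)
supervised_pred = knn_predict(labelled_x, labelled_y, query)  # labelled data only
semi_pred = knn_predict(aug_x, aug_y, query)  # labelled plus pseudo-labelled
print(supervised_pred, semi_pred)
```

In practice one would replace `knn_predict` with a regressor such as those named in the abstract and compare both models on a held-out labelled test set.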
