INFORMS Journal on Optimization, 1(1), pp. 2-34, 2019
Motivated by the fact that training data may contain inaccuracies in both features and labels, we apply robust optimization techniques to address uncertainty in data features and labels of classification problems in a principled way, and we obtain robust formulations for the three most widely used classification methods: support vector machines, logistic regression, and decision trees. We show that adding robustness does not materially change the complexity of the underlying problem and that all robust counterparts can be solved in practical computational times. In experiments on synthetic data, we demonstrate the advantage of these robust formulations over both regularized and nominal methods, and we show that our robust classification methods offer improved out-of-sample accuracy. Furthermore, we run large-scale computational experiments across a sample of 75 data sets from the University of California Irvine Machine Learning Repository and show that adding robustness to any of the three nonregularized classification methods improves accuracy on the majority of the data sets. The most significant gains from robust classification appear on high-dimensional and difficult classification problems, with an average improvement in out-of-sample accuracy of robust over nominal methods of 5.3% for support vector machines, 4.0% for logistic regression, and 1.3% for decision trees.
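As an illustration of why robustness need not increase complexity, consider the support vector machine case. The following is a sketch under an assumption not stated in the abstract, namely that each sample's features are perturbed within an \(\ell_p\)-ball of radius \(\rho\) (the paper's exact uncertainty sets may differ); a standard robust-optimization argument then collapses the worst-case hinge loss into closed form:

\[
\min_{w,b}\; \sum_{i=1}^{n} \max_{\|\delta_i\|_p \le \rho} \max\Bigl(1 - y_i\bigl(w^\top (x_i + \delta_i) + b\bigr),\, 0\Bigr)
\;=\;
\min_{w,b}\; \sum_{i=1}^{n} \max\Bigl(1 - y_i\bigl(w^\top x_i + b\bigr) + \rho\,\|w\|_q,\, 0\Bigr),
\qquad \tfrac{1}{p} + \tfrac{1}{q} = 1,
\]

where the inner maximization is resolved by the dual-norm identity \(\max_{\|\delta\|_p \le \rho} (-y_i w^\top \delta) = \rho \|w\|_q\). The robust counterpart is thus a convex program of essentially the same size as the nominal one, with the perturbation budget \(\rho\) entering as a regularization-like penalty on the dual norm of \(w\), which is consistent with both the reported tractability and the comparison against regularized methods.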