Published in

American Chemical Society, Journal of Chemical Information and Modeling, 6(47), p. 2408-2415, 2007

DOI: 10.1021/ci7002076

Wiley-VCH Verlag, ChemInform, 8(39), 2008

DOI: 10.1002/chin.200808228

ADME Evaluation in Drug Discovery. 8. The Prediction of Human Intestinal Absorption by a Support Vector Machine

Journal article published in 2007 by Tingjun Hou, Junmei Wang, Youyong Li
This paper is available in a repository.

Preprint: archiving allowed
  • Must obtain written permission from Editor
  • Must not violate ACS ethical Guidelines
Postprint: archiving restricted
  • Must obtain written permission from Editor
  • Must not violate ACS ethical Guidelines
Published version: archiving forbidden
Data provided by SHERPA/RoMEO

Abstract

Human intestinal absorption (HIA) is an important roadblock in the formulation of new drug substances. In silico models that predict the percentage of HIA from calculated molecular descriptors are much needed for the rapid estimation of this property. Here, we have studied the performance of a support vector machine (SVM) in classifying compounds with high or low fractional absorption (%FA > 30% or %FA ≤ 30%). The analyzed data set consists of 578 structurally diverse druglike molecules, which have been divided into a 480-molecule training set and a 98-molecule test set. Ten SVM classification models have been generated to investigate the impact of different individual molecular properties on %FA. Among the molecular descriptors studied, topological polar surface area (TPSA) and the predicted apparent octanol-water distribution coefficient at pH 6.5 (logD6.5) show better classification performance than the others. To obtain the best SVM classifier, the influences of different kernel functions and different combinations of molecular descriptors were investigated using a rigorous training-validation procedure. The best SVM classifier gives satisfactory predictions for the training set (97.8% correct for the poor-absorption class and 94.5% for the good-absorption class). Moreover, 100% of the poor-absorption class and 97.8% of the good-absorption class in the external test set could be correctly classified. Finally, the influences of the size of the training set and of the unbalanced nature of the data set have been studied. The analysis demonstrates that a large data set is necessary for the stability of the classification models. Furthermore, the weights for the poor-absorption class and the good-absorption class should be properly balanced to generate unbiased classification models. Our work illustrates that SVMs used in combination with simple molecular descriptors can provide an extremely reliable assessment of intestinal absorption in an early in silico filtering process.
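
The class definition and the per-class evaluation described in the abstract can be illustrated with a minimal Python sketch. It labels compounds using the 30% cut-off, derives class weights for an unbalanced data set, and reports accuracy separately for each class. The inverse-frequency weighting is a common heuristic assumed here for illustration; the abstract does not state how the authors chose their weights. The function names and the %FA values are hypothetical and are not taken from the paper.

```python
from collections import Counter

FA_THRESHOLD = 30.0  # percent; the cut-off between the two classes in the abstract


def absorption_class(percent_fa):
    """Return 'good' if %FA > 30, otherwise 'poor' (%FA <= 30)."""
    return "good" if percent_fa > FA_THRESHOLD else "poor"


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: a generic heuristic, not the weighting used in the paper.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted compounds within each true class."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: correct[cls] / totals[cls] for cls in totals}


# Hypothetical %FA values, not data from the paper.
example_fa = [95.0, 80.0, 30.0, 5.0, 60.0, 99.0, 45.0, 12.0]
labels = [absorption_class(fa) for fa in example_fa]
weights = balanced_class_weights(labels)

# A trivial "always good" predictor shows why per-class accuracy matters:
# overall accuracy looks acceptable while the poor class is never recognized.
naive_predictions = ["good"] * len(labels)
naive_accuracy = per_class_accuracy(labels, naive_predictions)
```

Weights of this kind could be supplied to an SVM implementation that accepts per-class penalties, so that errors on the rarer poor-absorption class count more heavily during training.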