SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials

Eastman, Peter; Behara, Pavan Kumar; Dotson, David L.; Galvelis, Raimondas; Herr, John E.; Horton, Josh T.; Mao, Yuezhi; Chodera, John D.; Pritchard, Benjamin P.; Wang, Yuanqing; De Fabritiis, Gianni; Markland, Thomas E.

Published in

Nature Research, Scientific Data, 1(10), 2023

DOI: 10.1038/s41597-022-01882-6


Journal article published in 2023 by Peter Eastman, Pavan Kumar Behara, David L. Dotson, Raimondas Galvelis, John E. Herr, Josh T. Horton, Yuezhi Mao, John D. Chodera, Benjamin P. Pritchard, Yuanqing Wang, Gianni De Fabritiis, Thomas E. Markland

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high-quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for a diverse set of small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. The dataset can serve as a valuable resource for creating transferable, ready-to-use potential functions for molecular simulations.
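As an illustration of what the "chemical accuracy" claim means in practice, the sketch below compares predicted and reference energies and checks the mean absolute error against a threshold. It is a minimal example under stated assumptions, not the authors' evaluation code: the 1 kcal/mol threshold is the conventional definition of chemical accuracy and is not stated in the abstract, the function names are hypothetical, and the energy values are made up for the example rather than taken from SPICE.

```python
# Standard conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474

# Assumption: the conventional 1 kcal/mol threshold for "chemical accuracy".
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0


def mean_absolute_error_kcal(predicted_hartree, reference_hartree):
    """Return the mean absolute energy error in kcal/mol.

    Both arguments are equal-length sequences of energies in Hartree.
    """
    if len(predicted_hartree) != len(reference_hartree) or not reference_hartree:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) for p, r in zip(predicted_hartree, reference_hartree))
    return total / len(reference_hartree) * HARTREE_TO_KCAL_PER_MOL


def within_chemical_accuracy(mae_kcal, threshold=CHEMICAL_ACCURACY_KCAL_PER_MOL):
    """Return True if the error is at or below the chosen threshold."""
    return mae_kcal <= threshold


# Made-up energies (Hartree) for three conformations, for illustration only.
reference = [-76.4000, -76.3990, -76.4012]
predicted = [-76.4005, -76.3982, -76.4010]

mae = mean_absolute_error_kcal(predicted, reference)
print(f"MAE = {mae:.3f} kcal/mol, chemically accurate: {within_chemical_accuracy(mae)}")
```

A real evaluation would also compare forces and would use held-out conformations, which this sketch omits.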
