Accelerating Formulation Design via Machine Learning: Generating a High-throughput Shampoo Formulations Dataset

Chitre, Aniket; Querimit, Robert C. M.; Rihm, Simon D.; Karan, Dogancan; Zhu, Benchuan; Wang, Ke; Wang, Long; Hippalgaonkar, Kedar; Lapkin, Alexei A.

Published in

Nature Research, Scientific Data, 1(11), 2024

DOI: 10.1038/s41597-024-03573-w

Journal article published in 2024 by Aniket Chitre, Robert C. M. Querimit, Simon D. Rihm, Dogancan Karan, Benchuan Zhu, Ke Wang, Long Wang, Kedar Hippalgaonkar, Alexei A. Lapkin

This paper is made freely available by the publisher.

Preprint: archiving allowed

Postprint: archiving forbidden

Published version: archiving allowed

Abstract

Liquid formulations are ubiquitous, yet their product development cycles are lengthy because complex physical interactions between ingredients make it difficult to tune formulations to customer-defined property targets. Interpolative ML models can accelerate liquid formulation design, but they are typically trained on limited sets of ingredients and without any structural information, which limits their predictive capacity outside the training data. To address this challenge, we selected eighteen formulation ingredients covering a diverse chemical space to prepare an open experimental dataset for training ML models for rinse-off formulation development. The resulting design space represents an over 50-fold increase in dimensionality compared to our previous work. Here, we present a dataset of 812 formulations, including 294 stable samples, which cover the entire design space, with phase stability, turbidity, and high-fidelity rheology measurements generated on our semi-automated, ML-driven liquid formulations workflow. Our dataset has the unique attribute of sample-specific uncertainty measurements, which can be used to train predictive surrogate models.
