Frontiers Media, Frontiers in Drug Discovery, (2), 2022
DOI: 10.3389/fddsv.2022.954911
HIV-1 integrase is an essential enzyme in the HIV-1 replication cycle, and integrase inhibitors are currently recommended as first-line treatment in many guidelines. Despite the discovery of new inhibitors, including a new class of molecules with different mechanisms of action, resistance remains a relevant problem, and adding new options to the therapeutic arsenal against viral resistance is a Sisyphean task. Because in vitro screening is difficult and costly, machine learning-driven ligand-based virtual screening is an alternative that can not only cut costs but also exploit valuable information about active compounds whose mechanisms of action are still unknown. In this work, we describe a thorough model exploration and hyperparameter tuning procedure on a dataset with class imbalance and present several models capable of distinguishing between compounds that are active or inactive against HIV-1 integrase. The best model was then used to screen the Natural Products Atlas for active compounds, yielding many molecules that share features with known integrase inhibitors. We also explore the strengths and shortcomings of our models and discuss the use of the applicability domain to guide in vitro screenings and to differentiate between the “predictable” and “unknown” regions of chemical space.
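The idea of an applicability domain can be illustrated with a minimal sketch. The abstract does not state how the domain was defined in this work, so the nearest-neighbour Tanimoto criterion below is an assumption chosen for illustration, as are the names `tanimoto` and `in_domain`, the 0.5 threshold, and the toy fingerprints (sets of "on" bit indices standing in for real molecular fingerprints).

```python
from typing import FrozenSet, Iterable

# Hypothetical representation: a fingerprint is the set of its "on" bit indices.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query: Fingerprint,
              training: Iterable[Fingerprint],
              threshold: float = 0.5) -> bool:
    """Assumed criterion: the query is inside the domain when its most
    similar training compound reaches the similarity threshold."""
    best = max((tanimoto(query, t) for t in training), default=0.0)
    return best >= threshold


# Toy data, not taken from the paper.
training_set = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})]
near = frozenset({1, 2, 3, 5})   # overlaps strongly with the first training compound
far = frozenset({10, 11, 12})    # shares no bits with any training compound

print(in_domain(near, training_set))  # "predictable" region under this criterion
print(in_domain(far, training_set))   # "unknown" region under this criterion
```

Under a criterion of this kind, predictions for compounds such as `far` would be treated with caution, since the model has seen nothing similar during training.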