RFMO-04 - Rapid fire session from selected oral abstracts

P1-P2

Machine Learning Algorithms For Predicting Solubility Of Small-molecule Compounds In Organic Solvents

  • By: YE, Zhuyifan (Macao Polytechnic University, China)
  • Co-author(s): Dr Zhuyifan Ye (Macao Polytechnic University, Macau, China)
    Prof Defang Ouyang (University of Macau, Macau, China)
  • Abstract:

    Rapid solvent selection is highly important in chemistry, but accurate prediction of solubility remains a critical challenge. The aim of this study was therefore to develop machine learning models that accurately predict the solubility of compounds in organic solvents. A dataset of more than 5000 experimental data points, each pairing a temperature with the measured solubility of a compound in an organic solvent, was collected. Molecular fingerprints were used to characterize structural features. LightGBM was compared against traditional machine learning techniques (PLS, Ridge regression, kNN, DT, ET, RF, SVM) and deep learning methods for predicting the solubility of compounds in organic solvents at various temperatures. LightGBM showed significantly better overall generalization (logS ± 0.20) than the other models. It also achieved a prediction accuracy of logS ± 0.59 for unseen solutes, which is similar to the expected noise level of experimental solubility data. Additionally, LightGBM revealed the physicochemical relationship between solubility and structural features, contributing to a better understanding of the underlying mechanisms. The approach offers a means of quickly screening solvents and has the potential to be extended to predict solubility in other solvent systems.
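
The abstract reports accuracy separately for unseen solutes, which implies an evaluation in which the held-out solutes never appear in the training data. The abstract does not describe how the split was made or which error metric the ± values refer to, so the following is only a minimal sketch under stated assumptions: the record fields (`solute`, `solvent`, `temp_K`, `logS`), the helper names `split_by_solute` and `rmse`, the toy values, and the choice of RMSE are all hypothetical and are not taken from the study.

```python
import math
import random
from collections import defaultdict


def split_by_solute(records, test_fraction=0.2, seed=0):
    """Hold out whole solutes so that no test solute occurs in training.

    Hypothetical helper: the abstract does not specify the split procedure.
    """
    by_solute = defaultdict(list)
    for rec in records:
        by_solute[rec["solute"]].append(rec)

    # Shuffle the solute names (not the individual measurements).
    solutes = sorted(by_solute)
    random.Random(seed).shuffle(solutes)

    # Reserve at least one solute for testing.
    n_test = max(1, round(len(solutes) * test_fraction))
    test = [r for s in solutes[:n_test] for r in by_solute[s]]
    train = [r for s in solutes[n_test:] for r in by_solute[s]]
    return train, test


def rmse(y_true, y_pred):
    """Root-mean-square error, here in logS units (assumed metric)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Toy records with made-up values, for illustration only.
records = [
    {"solute": "A", "solvent": "ethanol", "temp_K": 298.15, "logS": -1.2},
    {"solute": "A", "solvent": "acetone", "temp_K": 308.15, "logS": -0.9},
    {"solute": "B", "solvent": "ethanol", "temp_K": 298.15, "logS": -2.4},
    {"solute": "C", "solvent": "toluene", "temp_K": 313.15, "logS": -0.3},
    {"solute": "D", "solvent": "ethanol", "temp_K": 303.15, "logS": -1.7},
]

train, test = split_by_solute(records, test_fraction=0.25, seed=0)
train_solutes = {r["solute"] for r in train}
test_solutes = {r["solute"] for r in test}
```

Splitting by solute rather than by individual measurement matters because the same solute typically appears at several temperatures and in several solvents; a random row-wise split would leak near-duplicate information about a solute into training and overstate accuracy on new compounds.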