RFMO-04 - Rapid fire session from selected oral abstracts

P1-P2

An Artificial Intelligence Decision System For Solubilization Strategies Of Small Molecule Drug Candidates: Lessons From Approved Drugs By Partially Supervised Learning

  • By: OUYANG, Defang (University Of Macau)
  • Co-author(s): Prof Defang Ouyang (University of Macau, Macau, China)
    Mr Zheng Wu (University of Macau, Macau, China)
    Dr Defang Ouyang (University of Macau, Macau, China)
  • Abstract:

    Background information:
    As the low-hanging fruit is picked competitively, the proportion of poorly water-soluble molecules in drug development pipelines continues to climb. Pharmaceutical scientists have developed various bio-enabling formulation strategies to improve the delivery efficiency of poorly water-soluble molecules. Because these strategies rely on different solubilization principles, each suits compounds with particular structures and properties. Choosing the appropriate formulation technique for a drug candidate in the early stages of formulation development is critical for improving drug development efficiency and reducing risks and costs. However, systematic studies to support solubilization strategy decisions for small molecules are currently lacking.
    Purpose:
    Driven by the philosophy that “Structure determines nature and nature influences decisions”, the current research aims to correlate a drug’s structure and properties with its appropriate delivery strategies through machine learning algorithms, and to develop a user-friendly artificial intelligence (AI) system for formulation strategy decisions.
    Method:
    First, the formulation techniques used in approved small-molecule drugs were collated from the Orange Book, the literature, and public reports. Oral and injectable drugs were considered and handled separately. Each drug molecule was represented by its highest single therapeutic dose, its pKa data, and RDKit descriptors. Based on the information provided by approved drugs, the formulation strategy decision pathway can be summarized as follows: Decision 1 determines whether a bio-enabling strategy is necessary; Decision 2a decides whether a drug can be developed in a salt form; and Decision 2b determines, for drugs that need a non-conventional formulation, whether each of four commonly used bio-enabling strategies is feasible (solid dispersion, nanocrystals, lipid-based formulation, and cyclodextrin inclusion for oral drugs; organic solvents, surfactant micelles, liposomal formulation, and cyclodextrin inclusion for injectable drugs). Notably, these tasks lack confirmed negative samples, which is typical of partially supervised learning and, more specifically, of positive-unlabeled (PU) learning. For example, the fact that a poorly water-soluble drug has not yet been approved as a solid dispersion does not mean that it cannot be formulated as one. An improved PU bagging strategy was therefore used to score and relabel such unlabeled data. Next, the interpretable random forest algorithm was selected from commonly used supervised learning algorithms for all 12 classification tasks (Decisions 1 and 2a plus the four Decision 2b tasks, for each of the two administration routes). Finally, all trained models were systematically integrated into a user-friendly website for easy access.
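    The PU setting described above can be illustrated with a minimal sketch of basic PU bagging: in each round, a bootstrap sample of the unlabeled pool is treated as pseudo-negative, a base classifier is fit against all positives, and only the out-of-bag unlabeled points are scored; averaged scores are then thresholded to relabel points. This is an assumption-laden illustration, not the authors' improved variant, whose details the abstract does not give. To stay dependency-free, a simple nearest-centroid scorer stands in for the random forest, and the names `pu_bagging_scores`, `relabel`, the round count, and the 0.7/0.3 cut-offs are all hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = Sequence[float]


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def centroid_score(x: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Hypothetical stand-in for the random forest: a nearest-centroid score
    in [0, 1], where higher means x lies closer to the positive centroid."""
    d_pos = math.dist(x, pos_c)
    d_neg = math.dist(x, neg_c)
    if d_pos + d_neg == 0:
        return 0.5
    return d_neg / (d_pos + d_neg)


def pu_bagging_scores(positives: List[Vector], unlabeled: List[Vector],
                      n_rounds: int = 200, sample_size: Optional[int] = None,
                      seed: int = 0) -> List[float]:
    """Basic PU bagging: each round treats a bootstrap sample of the unlabeled
    pool as pseudo-negatives, fits the base scorer against all positives, and
    scores only the out-of-bag unlabeled points. Returns mean out-of-bag scores
    (NaN for a point that was never out of bag)."""
    rng = random.Random(seed)
    k = sample_size or len(positives)
    pos_c = centroid(positives)  # all positives are used in every round
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        bag = [rng.randrange(len(unlabeled)) for _ in range(k)]
        in_bag = set(bag)
        neg_c = centroid([unlabeled[i] for i in bag])  # "train" on pseudo-negatives
        for i, x in enumerate(unlabeled):
            if i not in in_bag:  # score out-of-bag points only
                totals[i] += centroid_score(x, pos_c, neg_c)
                counts[i] += 1
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


def relabel(scores: List[float], pos_cut: float = 0.7,
            neg_cut: float = 0.3) -> List[Optional[int]]:
    """1 = likely positive, 0 = reliable negative, None = left unlabeled.
    The cut-offs are illustrative assumptions."""
    labels: List[Optional[int]] = []
    for s in scores:
        if s >= pos_cut:
            labels.append(1)
        elif s <= neg_cut:
            labels.append(0)
        else:
            labels.append(None)
    return labels


# Toy 2-D "descriptors": positives cluster near the origin; the unlabeled pool
# mixes hidden positives (near the origin) with points far away.
positives = [[0.0, 0.0], [0.5, 0.2], [0.1, 0.4]]
unlabeled = [[0.2, 0.1], [10.0, 10.0], [9.5, 10.2], [0.3, 0.3], [10.1, 9.8]]
scores = pu_bagging_scores(positives, unlabeled)
labels = relabel(scores)
print([round(s, 2) for s in scores], labels)
```

    In this toy run, the hidden positives near the origin receive high out-of-bag scores and are relabeled as positive, while the distant points score lower. In a setup like the one described, a random forest would take the place of `centroid_score`, and the relabeled data would then feed the supervised classifiers.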
    Results:
    Twelve machine learning models were built, all with a classification accuracy greater than 0.85 and a Matthews correlation coefficient (MCC) greater than 0.70. The models' decision processes were visualized for rule extraction. Feature importance and feature differences across categories were analyzed and discussed. Moreover, a user-friendly AI system was built to recommend formulation strategies for a given structure.
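    For readers less familiar with the two reported metrics, the sketch below computes accuracy and the MCC from a binary confusion matrix using the standard definition MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The confusion-matrix counts are invented for illustration and are not data from this study; the abstract does not state how its metrics were evaluated (for example, which split or cross-validation scheme was used).

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion matrices (not data from the study).
balanced = dict(tp=40, tn=45, fp=5, fn=10)
imbalanced = dict(tp=2, tn=88, fp=2, fn=8)
for name, cm in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(name, round(accuracy(**cm), 3), round(mcc(**cm), 3))
```

    The imbalanced case reaches an accuracy of 0.90 yet an MCC of only about 0.27, because most of its correct predictions come from the majority class. This is why reporting MCC alongside accuracy gives a stricter picture of classifier quality when classes are unevenly represented.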
    Conclusion:
    The current study developed the first artificial intelligence system for deciding the delivery strategy of a given structure. The system enables efficient, design-driven formulation development, opens up opportunities for improved new drugs, and demonstrates the potential and value of partially supervised learning in drug development.