Women’s College of Pharmacy, Peth Vadgaon, India.
Drug discovery is traditionally a prolonged, costly, and high-risk enterprise, often exceeding ten years and billions of dollars for a single new molecular entity. Artificial Intelligence (AI) has emerged as a transformative technology that can dramatically shorten timelines, reduce costs, and enhance predictive accuracy throughout pharmaceutical research and development. AI algorithms, particularly machine learning (ML), deep learning (DL), natural language processing (NLP), and reinforcement learning (RL), enable the mining and interpretation of massive biochemical, genomic, and clinical datasets. These capabilities support target identification, hit discovery, lead optimization, and clinical trial design. This review summarizes the evolution of AI in pharmaceutical sciences, outlines the principal algorithms and architectures currently used, and evaluates applications across the modern drug-discovery pipeline. Special attention is given to AI-based target discovery from multi-omics data, structure-based virtual screening, predictive ADME/Tox modeling, and data-driven drug repurposing. Representative industrial case studies, including AlphaFold, Atomwise, BenevolentAI, and Insilico Medicine, demonstrate the translation of AI concepts into tangible therapeutic candidates. The article also critically discusses persistent challenges (data bias, model transparency, reproducibility, and regulatory acceptance) and highlights emerging solutions such as explainable AI, federated learning, and quantum-enhanced modeling. AI is not a substitute for experimental science but an indispensable complement that can guide and accelerate hypothesis generation, compound design, and clinical validation. The integration of AI with computational chemistry and biological experimentation heralds a new era of precision-driven, cost-effective, and patient-centered drug discovery.
1.1 Context and Rationale
Drug discovery is an inherently complex, iterative process involving target identification, hit discovery, lead optimization, preclinical assessment, and multiple clinical phases. On average, the journey from concept to market requires 10–15 years and an investment exceeding USD 2 billion, with an overall success rate below 10 % (Oliveira et al., 2023). Each failure at late stages adds enormous financial and ethical costs. The pharmaceutical industry therefore faces an urgent need for technologies that can improve prediction accuracy, minimize attrition, and compress development timelines.
Artificial Intelligence (AI), defined as computational systems capable of learning patterns, reasoning, and decision-making, offers a paradigm shift from empirical trial-and-error to data-driven prediction and optimization. Advances in cloud computing, algorithmic design, and high-performance GPUs have allowed AI to analyze complex biological and chemical datasets at unprecedented scale (Kim et al., 2021). In silico modeling empowered by AI now complements, and sometimes precedes, traditional wet-lab experimentation.
1.2 Limitations of Conventional Approaches
Traditional high-throughput screening (HTS) can evaluate only a fraction of the available chemical space, estimated at roughly 10⁶⁰ possible molecules, leaving vast opportunities unexplored. Classical computer-aided drug design (CADD) methods such as molecular docking and QSAR depend on human-defined descriptors and simple statistical correlations, which may fail to capture nonlinear relationships among molecular features. Moreover, manual curation of biochemical data is laborious and prone to bias or incompleteness (Patel et al., 2020). Consequently, many candidate compounds progress through the pipeline with hidden liabilities that lead to failure in preclinical toxicology or clinical efficacy studies.
AI-based systems can integrate multidimensional data, from genomics and transcriptomics to chemical and phenotypic screening, extract hidden patterns, and learn predictive models that generalize beyond the training dataset (Kant et al., 2025). These systems not only accelerate early discovery but also inform later phases such as patient stratification and post-marketing safety surveillance.
Fig. 1: Comparison between traditional and AI-based drug discovery pipelines
1.3 Scope and Objectives of the Review
This review provides a comprehensive assessment of AI’s role in contemporary drug discovery, aligning with the format of the Indian Journal of Pharmaceutical Sciences (IJPS). The objectives are to: (i) trace the evolution of AI in pharmaceutical sciences; (ii) outline the principal algorithms and architectures currently in use; (iii) evaluate applications across each stage of the drug-discovery pipeline; and (iv) examine persistent challenges and emerging solutions such as explainable AI and federated learning.
1.4 AI as an Enabler of Efficiency and Innovation
By automating routine analyses and prioritizing high-value hypotheses, AI can reduce experimental redundancy. For instance, ML-based QSAR models can predict biological activity, physicochemical properties, and toxicity profiles of untested compounds with high precision (Patel et al., 2020). Deep learning models have been reported to improve hit rates by 30–50 % in virtual screening compared to classical docking alone (Sumathi et al., 2023). In addition, AI-powered literature mining tools using NLP can rapidly extract drug–target associations from millions of publications, supporting knowledge synthesis that would otherwise take years of manual curation.
1.5 Synergy with Computational and Experimental Workflows
AI should be viewed as an integrative layer atop established computational and experimental methodologies. For example, predictive models can pre-filter compound libraries before molecular docking, reducing computation by orders of magnitude. Likewise, reinforcement learning can suggest chemical modifications that enhance potency or selectivity while preserving synthesizability (Tropsha et al., 2023). Ultimately, AI complements human expertise: chemists interpret model outputs, design confirmatory experiments, and refine datasets iteratively.
1.6 Ethical, Regulatory, and Societal Dimensions
The increasing autonomy of AI systems in proposing chemical entities raises new regulatory and ethical questions: Who owns an AI-generated molecule? How can transparency and accountability be ensured when models act as “black boxes”? Regulatory agencies such as the EMA and FDA are developing frameworks to evaluate AI-based decision tools in drug development (European Medicines Agency, 2024). Establishing data governance, validation standards, and explainability metrics will be critical for safe deployment.
1.7 Outlook
With global investments and academic collaboration, AI-driven discovery is transitioning from proof-of-concept to industrial reality. In India, national initiatives in bioinformatics and computational chemistry, combined with rapidly expanding pharmaceutical R&D capacity, provide fertile ground for adoption. By 2030, AI-assisted platforms are projected to contribute substantially to lead identification and optimization pipelines across both multinational and domestic firms (Bhat et al., 2025).
Fig. 2: Workflow for AI-assisted data analysis
2. HISTORICAL BACKGROUND AND EVOLUTION
2.1 Early Computer-Aided Drug Design (CADD)
The conceptual foundations of AI in drug discovery can be traced to the rise of computer-aided drug design (CADD) in the 1960s–1980s, when chemists first employed computers to model molecular structures and predict biological activity.
Methods such as quantitative structure–activity relationship (QSAR) and molecular docking dominated early computational pharmacology (Martin, 1978). These techniques relied on human-defined molecular descriptors (lipophilicity, polar surface area, hydrogen-bond counts) and used linear regression or partial-least-squares models to correlate structure with potency or toxicity (Patel et al., 2020).
While revolutionary for their time, early CADD methods were limited by small datasets and simplified statistical assumptions. Most models failed to generalize to new chemical scaffolds, leading to poor external predictivity. Nevertheless, they laid crucial groundwork for data-driven reasoning in medicinal chemistry.
2.2 Expansion of Databases and Bioinformatics (1990s–2000s)
The late 1990s saw the explosion of bioinformatics and cheminformatics, driven by the Human Genome Project and advances in protein crystallography. Public repositories such as PubChem, ChEMBL, and Protein Data Bank (PDB) enabled large-scale data sharing. Algorithms such as BLAST and AutoDock allowed researchers to match sequences and simulate ligand–receptor binding (Morris et al., 2009).
However, these methods were largely rule-based, requiring extensive manual curation. Their predictive power depended on the scientist’s expertise in feature selection and statistical modeling. Computational cost and data scarcity further restricted their scope.
The emergence of support vector machines (SVMs) and random forests around 2000 marked the first true machine learning (ML) applications in drug design (Cortes & Vapnik, 1995; Chen et al., 2018). These algorithms could detect non-linear patterns in physicochemical descriptors, outperforming traditional regression methods.
2.3 Transition to Data-Driven AI (2010–2015)
From 2010 onward, exponential growth in computational power (notably GPU computing) and big-data availability transformed computational chemistry. Pharmaceutical companies began integrating ML to prioritize compounds before synthesis or screening (Ekins et al., 2016).
Large, annotated datasets (bioassays, gene-expression profiles, clinical outcomes) enabled supervised learning models to predict compound bioactivity and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profiles with increasing accuracy.
Simultaneously, unsupervised clustering was used to identify chemical subspaces associated with novel scaffolds (Hughes et al., 2011).
At this stage, the field started to shift from deterministic, rule-based modeling toward probabilistic and learning-based systems, a precursor to modern AI.
2.4 Emergence of Deep Learning (2015–2020)
A critical milestone was the introduction of deep learning (DL) into molecular modeling. Inspired by advances in computer vision and natural-language processing, deep neural networks could learn hierarchical representations of chemical structures directly from raw data.
These methods produced quantum leaps in predictive accuracy: models trained on millions of molecules achieved coefficients of determination (R²) > 0.8 for physicochemical property prediction (Sumathi et al., 2023).
The publication of AlphaFold2 by DeepMind (Jumper et al., 2021) marked another watershed moment, accurately predicting protein 3-D structures at near-experimental resolution and enabling structure-based drug design for previously “undruggable” targets.
2.5 Graph Neural Networks and Generative AI (2020–Present)
The next revolution arrived with graph neural networks (GNNs) and generative models.
Unlike traditional neural networks that process images or text, GNNs operate directly on molecular graphs (atoms as nodes, bonds as edges), capturing connectivity and chemical context (Zhang et al., 2025).
These models outperform descriptor-based QSAR in property prediction, toxicity forecasting, and synthesis planning.
In parallel, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) emerged for de novo molecular design (Bian & Xie, 2020).
By optimizing reward functions such as docking scores or ADMET metrics, reinforcement learning (RL) agents could propose novel compounds with targeted pharmacological profiles.
Commercial adoption accelerated: Atomwise applied CNN-based virtual screening across dozens of targets, BenevolentAI’s knowledge graph identified baricitinib as a COVID-19 candidate, and Insilico Medicine advanced an AI-designed fibrosis candidate from concept to preclinical testing within 18 months.
These examples validated AI’s capacity to compress discovery timelines from years to months.
2.6 Integration with Cloud Computing and Big Data
Cloud platforms such as Google Colab, AWS SageMaker, and Azure ML democratized access to high-performance computation.
Simultaneously, open-source frameworks (TensorFlow, PyTorch, DeepChem) facilitated collaborative development and reproducibility.
The fusion of AI with big-data analytics enabled pharmaceutical consortia to analyze billions of compound–target pairs, accelerating lead identification and optimization (Ali et al., 2025).
The shift toward cloud-based, collaborative AI ecosystems now underpins virtually every major pharmaceutical R&D strategy, making data integration and interoperability key priorities.
3. AI TECHNIQUES USED IN DRUG DISCOVERY
Artificial intelligence is not a single algorithm but a family of computational strategies capable of perceiving patterns, learning from data, and making predictions. In drug discovery, AI techniques can be broadly grouped into machine learning (ML), deep learning (DL), natural language processing (NLP), and reinforcement learning (RL). Each approach contributes unique capabilities, from predicting molecular properties to generating novel chemical entities.
3.1 Machine Learning (ML)
Machine learning encompasses algorithms that infer relationships between molecular descriptors and biological activity from existing data.
Traditional ML workflows involve data preprocessing, feature extraction, model training, and validation. Input data may include molecular fingerprints, 2-D descriptors (e.g., Log P, hydrogen bond count), or physicochemical properties derived from cheminformatics software such as PaDEL or RDKit.
Common ML algorithms include support vector machines (SVMs), random forests, gradient-boosted trees, k-nearest neighbours, and naive Bayes classifiers.
ML techniques remain widely used for activity prediction, ADMET modeling, toxicity classification, and compound clustering. For example, Patel et al. (2020) reported that an SVM-based QSAR model predicted hERG channel inhibition with an R² = 0.78, outperforming multiple linear regression (R² = 0.54).
Key advantages include interpretability, modest computational demand, and compatibility with relatively small datasets. However, ML models depend heavily on descriptor quality and may fail for chemotypes outside the training domain (Ekins et al., 2016).
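To make this workflow concrete, the sketch below illustrates a minimal descriptor-based QSAR model of the kind described above, using RDKit descriptors and a random-forest regressor; the SMILES strings and pIC50 values are hypothetical placeholders, not real assay data.

```python
# Minimal descriptor-based QSAR sketch: RDKit descriptors + random forest.
# SMILES and pIC50 values below are hypothetical placeholders, not assay data.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def featurize(smiles):
    """Compute a handful of classical 2-D descriptors with RDKit."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        Descriptors.MolLogP(mol),        # lipophilicity (Log P)
        Descriptors.TPSA(mol),           # topological polar surface area
        Descriptors.NumHDonors(mol),     # hydrogen-bond donors
        Descriptors.NumHAcceptors(mol),  # hydrogen-bond acceptors
        Descriptors.MolWt(mol),          # molecular weight
    ]

smiles = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1O",
          "CCN(CC)CC", "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
          "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]
pic50 = [4.2, 5.1, 4.8, 3.9, 5.6, 4.5]   # hypothetical activity values

X = np.array([featurize(s) for s in smiles])
y = np.array(pic50)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print("Cross-validated R^2:", cross_val_score(model, X, y, cv=3, scoring="r2").mean())
```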
3.2 Deep Learning (DL)
Deep learning extends ML by using multi-layered artificial neural networks capable of automatically extracting hierarchical features from raw molecular or biological data.
3.2.1 Convolutional Neural Networks (CNNs)
CNNs, widely used in image recognition, can treat molecules or protein–ligand complexes as 3-D images.
Ragoza et al. (2017) trained CNNs on voxelized docking grids and achieved higher enrichment factors than AutoDock’s empirical scoring functions.
CNNs can also process 2-D molecular graphs represented as adjacency matrices to learn structural motifs associated with activity.
3.2.2 Recurrent Neural Networks (RNNs) and LSTM
RNNs handle sequential data such as SMILES strings. By learning chemical syntax, they generate novel, syntactically valid molecules.
Segler et al. (2018) demonstrated that an LSTM trained on 1.5 million drug-like molecules produced new compounds with 70 % validity and comparable diversity to the training set.
RNNs also assist in retrosynthetic prediction, suggesting reaction sequences that lead to a target molecule.
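A minimal sketch of the character-level recurrent approach described above follows (PyTorch); the toy vocabulary and the untrained network are illustrative stand-ins for a model trained on a large SMILES corpus, as in Segler et al. (2018).

```python
# Character-level LSTM sketch for SMILES generation (illustrative only).
import torch
import torch.nn as nn

VOCAB = sorted(set("CNO=()c1#n[]+-"))   # toy character set; real models use ~40 tokens
stoi = {ch: i for i, ch in enumerate(VOCAB)}

class SmilesLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)   # next-character logits

    def forward(self, tokens, state=None):
        x = self.embed(tokens)
        out, state = self.lstm(x, state)
        return self.head(out), state

model = SmilesLSTM(len(VOCAB))
# Training would minimize cross-entropy on next-character prediction over a
# large SMILES corpus; sampling then extends a string one character at a time.
tokens = torch.tensor([[stoi[c] for c in "CCO"]])
logits, _ = model(tokens)
print("Predicted next character (untrained):", VOCAB[int(logits[0, -1].argmax())])
```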
3.2.3 Graph Neural Networks (GNNs)
GNNs represent molecules as graphs, naturally encoding atomic connectivity and chemical context. Each node (atom) aggregates information from neighboring nodes through message-passing operations (Zhang et al., 2025). GNNs achieve state-of-the-art accuracy in property prediction, outperforming classical descriptor-based QSAR. For example, the MPNN (Message-Passing Neural Network) framework reported a mean absolute error < 0.3 log units in solubility prediction (Gilmer et al., 2017). Beyond small molecules, Graph Attention Networks (GATs) and Graph Transformer models are used to model protein–ligand interactions, enabling structure-based virtual screening.
Fig. 3: Graph neural network (GNN) architecture for molecular property prediction
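The message-passing operation at the heart of a GNN can be sketched in a few lines; the layer below is a simplified illustration (sum-aggregated messages with a GRU update), not a production architecture.

```python
# Minimal message-passing layer sketch: atoms are nodes, bonds are edges
# given as (source, target) index pairs; real GNNs stack several such layers.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)   # transform neighbor features
        self.update = nn.GRUCell(dim, dim)   # update node state from messages

    def forward(self, h, edge_index):
        src, dst = edge_index
        msgs = torch.zeros_like(h)
        msgs.index_add_(0, dst, self.message(h[src]))   # sum messages per atom
        return self.update(msgs, h)

# Toy molecule: 3 atoms, bonds 0-1 and 1-2 (both directions, undirected graph).
h = torch.randn(3, 16)                                  # initial atom features
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
layer = MessagePassingLayer(16)
print(layer(h, edge_index).shape)                       # torch.Size([3, 16])
```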
3.2.4 Autoencoders and Variational Autoencoders (VAEs)
Autoencoders compress molecular information into low-dimensional latent spaces. Manipulating these latent vectors allows exploration of chemical space for optimized properties (Gómez-Bombarelli et al., 2018).
VAEs and Generative Adversarial Networks (GANs) are foundational for de novo drug design (Bian & Xie, 2020).
Advantages: continuous latent spaces permit smooth interpolation between molecules and gradient-based optimization of desired properties.
Limitations: decoded structures may be chemically invalid, and training can be unstable on small or noisy datasets.
3.3 Natural Language Processing (NLP)
NLP applies AI to unstructured text such as biomedical literature, patents, and clinical reports.
With tens of millions of articles in PubMed, manual curation of drug–target data is impractical. NLP automates knowledge extraction and hypothesis generation.
3.3.1 Text Mining and Named Entity Recognition
Systems like PubTator, ChemSpot, and BioBERT identify and categorize entities (drugs, targets, pathways) from raw text (Lee et al., 2020).
These tools enable construction of knowledge graphs linking diseases, genes, and compounds (Liu et al., 2021).
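As a hedged illustration of such entity-recognition pipelines, the snippet below uses the Hugging Face transformers pipeline API; the checkpoint name is a hypothetical placeholder for any BioBERT-style model fine-tuned for drug/target NER.

```python
# Sketch of transformer-based biomedical NER; the model name below is a
# hypothetical placeholder, not a specific published checkpoint.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="my-org/biobert-drug-target-ner",  # hypothetical fine-tuned checkpoint
    aggregation_strategy="simple",           # merge word-piece tokens into spans
)

text = "Baricitinib inhibits JAK1 and JAK2, reducing inflammation in COVID-19."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))
```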
3.3.2 Language Models for Drug Discovery
Recent transformer models (e.g., BERT, GPT-based architectures) fine-tuned on chemical corpora can interpret relationships between molecular mentions and biological outcomes.
Such models support automated literature review, adverse-event detection, and drug repurposing by correlating textual evidence across multiple sources (Bhat et al., 2025).
In 2023, NLP-driven analysis of COVID-19 literature successfully identified potential inhibitors of the SARS-CoV-2 main protease within days (Sadybekov et al., 2023).
3.4 Reinforcement Learning (RL) and Generative AI
RL introduces an agent–environment framework, where the model learns by receiving rewards for actions that lead to desirable outcomes (Sutton & Barto, 2018).
In drug discovery, the environment represents chemical space, and actions correspond to molecular modifications.
3.4.1 Molecule Generation and Optimization
RL agents optimize molecular properties by sequentially modifying structures. For instance, the REINVENT algorithm uses an RNN generator guided by a reward function combining predicted activity and drug-likeness (Olivecrona et al., 2017).
In 2022, Zhavoronkov et al. reported that AI-generated compounds against fibrosis (via RL) progressed from concept to preclinical testing within 18 months.
3.4.2 Reward Functions and Constraints
Reward functions may include docking scores, synthetic accessibility, or toxicity penalties. Properly balancing these terms is critical; overly narrow rewards can produce unrealistic molecules.
Modern frameworks integrate multi-objective optimization, simultaneously maximizing potency while minimizing toxicity (Tropsha et al., 2023).
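A minimal sketch of such a multi-objective reward is given below, combining RDKit’s QED drug-likeness score with stubbed activity and toxicity predictors; the weights and stub values are illustrative assumptions, not a published reward design.

```python
# Multi-objective reward sketch for RL-based molecule generation.
from rdkit import Chem
from rdkit.Chem import QED

def predicted_activity(mol):
    """Placeholder for a trained activity model returning a 0-1 score."""
    return 0.5  # hypothetical stub

def toxicity_penalty(mol):
    """Placeholder toxicity model; real systems use trained classifiers."""
    return 0.1  # hypothetical stub

def reward(smiles, w_act=0.5, w_qed=0.3, w_tox=0.2):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                      # invalid SMILES earns zero reward
        return 0.0
    return (w_act * predicted_activity(mol)
            + w_qed * QED.qed(mol)       # drug-likeness score in [0, 1]
            - w_tox * toxicity_penalty(mol))

print(round(reward("CC(=O)Oc1ccccc1C(=O)O"), 3))  # aspirin as a toy input
```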
3.4.3 Applications in Synthesis Planning
RL also underpins retrosynthetic analysis, where agents learn optimal reaction sequences to synthesize target molecules efficiently (Segler et al., 2018). Such tools have been commercialized in platforms like Synthia and AiZynthFinder.
3.5 Hybrid and Ensemble Approaches
Real-world drug discovery increasingly employs hybrid models that combine multiple AI paradigms or couple AI with physics-based simulations.
Examples include ML-rescored docking pipelines, GNN property predictors coupled with physics-based free-energy calculations, and consensus ensembles of complementary QSAR models.
Such multimodal systems achieve higher predictive accuracy and greater generalization across chemical classes.
3.6 Comparative Summary

| Technique | Typical Input | Primary Application | Strengths | Limitations |
|---|---|---|---|---|
| Machine Learning | Molecular descriptors | QSAR, ADMET prediction | Interpretable, fast | Limited to known chemotypes |
| Deep Learning (CNN/RNN/GNN) | Raw structural/sequence data | Activity prediction, molecule generation | High accuracy, feature automation | Data-hungry, low interpretability |
| NLP | Scientific text, patents | Data extraction, drug repurposing | Exploits unstructured data | Requires domain-specific training |
| Reinforcement Learning / Generative | Molecular graphs or SMILES | De novo design, optimization | Creates novel scaffolds | Reward design complexity |
| Hybrid / Ensemble | Multi-modal | Integration & validation | Synergy, robustness | Complex implementation |
4. AI IN STAGES OF DRUG DISCOVERY
Artificial Intelligence has become embedded at nearly every stage of the modern drug discovery pipeline, from early target identification to post-market surveillance. Its ability to mine multi-omics data, predict structure–activity relationships, and optimize candidate selection has transformed pharmaceutical R&D into a data-driven endeavor (Bhat et al., 2025). The key applications across each stage are summarized below.
4.1 Target Identification and Validation
Target identification involves recognizing biological macromolecules (proteins, enzymes, receptors, or genes) that play a critical role in disease pathophysiology. Traditional identification relied on laborious biochemical screening or genetic manipulation, often producing false positives or missing context-specific targets.
AI now allows the integration of multi-omics datasets (genomics, transcriptomics, proteomics, metabolomics) to uncover new druggable targets.
For instance, DeepMind’s AlphaFold2 predicted over 200 million protein structures, enabling structure-based target selection even in the absence of crystallographic data (Jumper et al., 2021).
Similarly, AI-driven transcriptome analysis by Insilico Medicine led to the discovery of a novel fibrosis target (TGFB1-related) in under three months (Dharmasivam et al., 2025).
Impact: AI enhances reliability in target selection, minimizes off-target effects, and allows early hypothesis generation before costly experimentation.
4.2 Hit Identification (Virtual Screening and QSAR Modeling)
Once a validated target is known, identifying chemical “hits” that interact with it effectively is the next challenge. Historically, high-throughput screening (HTS) tested millions of compounds experimentally, an expensive and inefficient process.
AI-driven virtual screening (VS) can computationally screen billions of molecules within days (Zhou et al., 2024).
4.2.1 AI-Accelerated Virtual Screening
Deep learning and convolutional networks improve virtual screening by predicting binding affinities based on structural features.
Such platforms achieved enrichment factors up to 6× greater than traditional docking alone.
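For reference, the enrichment factor compares the hit rate in the top-ranked fraction of a screened library with the hit rate expected from random selection; a minimal sketch with synthetic data follows.

```python
# Enrichment-factor (EF) metric used to compare virtual screening methods.
# Scores and active/inactive labels below are synthetic, for illustration only.
def enrichment_factor(scores, labels, top_frac=0.01):
    """EF = (actives in top fraction / compounds in top fraction)
            / (total actives / total compounds)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_frac))
    hits_top = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    return (hits_top / n_top) / (total_hits / len(labels))

# Synthetic example: 1000 compounds, 10 actives, 6 of them ranked at the top.
scores = [1.0 - i / 1000 for i in range(1000)]
labels = [1 if i < 6 else 0 for i in range(1000)]
labels[500:504] = [1, 1, 1, 1]          # remaining 4 actives buried mid-list
print(f"EF(1%) = {enrichment_factor(scores, labels, top_frac=0.01):.1f}")
```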
4.2.2 Quantitative Structure–Activity Relationship (QSAR) and ML
AI-enhanced QSAR models predict compound activity or toxicity from molecular descriptors, flagging likely liabilities before synthesis.
4.2.3 Ligand-Based Similarity and Active Learning
Active-learning frameworks iteratively train models as new assay data emerges, refining predictions dynamically (Ekins et al., 2016).
AI can also predict polypharmacology (the likelihood of a compound interacting with multiple targets), which is helpful in designing multi-target drugs for complex diseases (Zhang et al., 2025).
Impact: Hit identification now focuses on quality over quantity, narrowing billions of possibilities to a few hundred prioritized candidates for synthesis and testing.
4.3 Lead Optimization
After promising hits are discovered, medicinal chemists refine them for potency, selectivity, solubility, and pharmacokinetic properties. AI enhances this optimization by predicting key molecular properties without exhaustive synthesis.
4.3.1 Predictive ADMET and Toxicity Models
Deep learning models predict absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties with high precision; tools such as DeepTox and ADMETlab 2.0 report 85–90 % prediction accuracy (Dong et al., 2021).
4.3.2 Structure–Property Relationship Prediction
AI can model the impact of small structural modifications (bioisosteres, substitutions) on activity and drug-likeness. Reinforcement learning (RL) algorithms optimize molecules iteratively, maximizing binding affinity while minimizing toxicity (Olivecrona et al., 2017).
4.3.3 Multi-Objective Optimization
Modern RL frameworks employ multi-objective reward functions, simultaneously balancing potency, lipophilicity, and synthetic feasibility. The REINVENT system by AstraZeneca generated leads with improved pharmacological balance using such methods (Olivecrona et al., 2017).
Impact: Lead optimization cycles that once required months can now be simulated virtually, saving experimental effort and reducing attrition before preclinical testing.
4.4 Preclinical and Clinical Development
AI plays a crucial role in bridging discovery and development phases.
4.4.1 Preclinical Modelling
AI models forecast pharmacokinetics (PK) and pharmacodynamics (PD) based on in vitro or animal data.
In silico toxicology using DL networks predicts off-target effects and drug–drug interactions (Vamathevan et al., 2019).
AI-aided molecular dynamics (MD) simulations allow efficient modeling of ligand binding and protein flexibility.
4.4.2 Clinical Trial Design and Patient Stratification
In clinical development, AI assists in trial design, patient recruitment, and risk assessment.
4.4.3 Biomarker and Safety Prediction
ML algorithms correlate multi-omics and imaging data to discover biomarkers predictive of therapeutic response or toxicity.
For example, AI-based image classifiers have identified subtle cardiac anomalies linked to drug-induced QT prolongation (Serafim et al., 2023).
Impact: AI reduces preclinical animal use, optimizes trial design, and enhances safety monitoring, shortening timelines from discovery to approval.
4.5 Drug Repurposing (Repositioning)
Drug repurposing seeks new therapeutic uses for approved or shelved compounds, an area where AI has excelled due to data abundance and reduced regulatory barriers.
4.5.1 Network-Based and Similarity Models
AI constructs drug–target–disease networks, using graph theory to identify repositioning opportunities. Machine learning models evaluate chemical, genomic, and phenotypic similarities among compounds (Tanoli et al., 2021).
For example, BenevolentAI used its knowledge-graph platform to predict Baricitinib as an inhibitor of COVID-19 inflammatory pathways, a prediction later validated clinically (Stebbing et al., 2020).
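The graph-based reasoning behind such predictions can be sketched with a toy drug–target–disease network; the edges below are illustrative rather than curated data.

```python
# Toy network-based repurposing sketch with networkx: propose a new indication
# when a drug's target also links to another disease. Edges are illustrative.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("baricitinib", "JAK1"), ("baricitinib", "JAK2"),   # drug-target edges
    ("JAK1", "rheumatoid arthritis"),                   # target-disease edges
    ("JAK1", "COVID-19 inflammation"),
])

def repurposing_candidates(graph, drug, known_disease):
    """Diseases reachable through the drug's targets, excluding known uses."""
    diseases = set()
    for target in graph.neighbors(drug):
        diseases.update(graph.neighbors(target))
    diseases.discard(drug)            # the drug itself neighbors its targets
    diseases.discard(known_disease)   # exclude the established indication
    return diseases

print(repurposing_candidates(G, "baricitinib", "rheumatoid arthritis"))
# -> {'COVID-19 inflammation'}
```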
4.5.2 NLP-Assisted Repurposing
NLP tools scan biomedical literature and clinical trial registries for overlooked associations.
During the pandemic, AI-based mining of open-access literature suggested over 100 potential antivirals within weeks (Sadybekov et al., 2023).
4.5.3 Case Studies
Representative cases include the baricitinib repositioning described above and the AI-prioritized antiviral candidates identified through literature mining during the COVID-19 pandemic (Sadybekov et al., 2023).
4.5.4 Benefits
Repurposing saves up to 60–70 % of development time and cost, as safety data already exist (Wan et al., 2025). AI-driven prediction further enhances precision, increasing the probability of clinical success.
4.6 Post-Marketing Surveillance and Pharmacovigilance
AI continues to add value even after drug approval. ML algorithms monitor real-world data (electronic health records, social media, and spontaneous reporting systems) for adverse events or efficacy trends (European Medicines Agency, 2024). Natural language models identify under-reported side effects, enabling proactive safety updates.
Impact: Continuous AI monitoring ensures patient safety, regulatory compliance, and improved lifecycle management.
5. INTEGRATION OF AI WITH COMPUTATIONAL TOOLS
Artificial intelligence (AI) does not function in isolation; it amplifies the power of existing in silico methodologies such as molecular docking, pharmacophore modeling, and quantitative structure–activity relationship (QSAR) analyses. The integration of AI with classical computational chemistry has created hybrid workflows capable of accelerating discovery, improving accuracy, and reducing experimental dependency (Friesner et al., 2020; Dharmasivam et al., 2025).
5.1 AI-Enhanced Molecular Docking and Virtual Screening
Molecular docking predicts how a small molecule interacts with a biological target, evaluating both the orientation and binding energy of the complex. Traditional docking algorithms, such as AutoDock and Glide, rely on physics-based scoring functions that sometimes oversimplify real molecular interactions.
AI has enhanced docking by improving scoring accuracy, pose prediction, and screening efficiency.
5.1.1 Machine Learning–Based Scoring Functions
Machine learning (ML) models trained on empirical binding data can replace or augment conventional scoring functions.
5.1.2 Accelerated Virtual Screening
AI pre-filters compound libraries before docking, reducing computational burden by up to 90 %. Deep learning classifiers rapidly exclude molecules predicted to have poor binding or ADMET properties (Zhou et al., 2024).
For example, RosettaVS, integrating ML with Rosetta docking, successfully screened 1.2 billion molecules in under 60 hours (Zhou et al., 2024).
These approaches significantly enhance throughput, enabling ultra-large-scale virtual screening for neglected or rare-disease targets.
5.1.3 Generative Docking
Reinforcement learning (RL) combined with docking feedback enables de novo molecule generation guided by binding affinity scores. AI iteratively refines chemical structures toward improved poses and interactions (Olivecrona et al., 2017).
Impact: AI-docking integration merges physics-based rigor with predictive flexibility, achieving better ranking accuracy and drastically reducing computational time.
5.2 AI-Assisted QSAR and QSPR Modeling
The QSAR (Quantitative Structure–Activity Relationship) and QSPR (Quantitative Structure–Property Relationship) frameworks correlate molecular features with biological activity or physicochemical properties. While QSAR has been foundational since the 1960s, classical linear models often fail for non-linear relationships or diverse chemotypes.
AI overcomes these limitations by learning directly from raw data or molecular representations without pre-defined descriptors.
5.2.1 Descriptor-Free Learning
Graph neural networks (GNNs) and message-passing networks (MPNNs) can directly process molecular graphs, eliminating the need for manually calculated descriptors (Gilmer et al., 2017).
These models capture electronic and topological effects, enabling generalizable predictions across novel scaffolds.
5.2.2 Transfer and Multitask Learning
AI enables multitask QSAR, where models learn multiple properties (e.g., potency, solubility, toxicity) simultaneously, improving data efficiency (Mayr et al., 2018).
Transfer learning allows models trained on large datasets to adapt to new targets with limited data, a key advantage in orphan-disease research (Yang et al., 2021).
5.2.3 Hybrid QSAR Pipelines
AI-QSAR workflows now integrate both data-driven and mechanistic elements, for example by combining learned molecular representations with physics-derived descriptors or docking scores.
Such pipelines are implemented in platforms like ADMETlab 2.0 and ChemProp, routinely used by industry and academia (Dong et al., 2021).
Impact: AI-QSAR models demonstrate higher accuracy (average R² > 0.85) and external predictivity (Q² > 0.7), reducing the need for exhaustive experimental screening.
5.3 AI in Pharmacophore Modeling and 3D Alignment
Pharmacophore modeling identifies the essential chemical features required for biological activity. Traditional models relied on geometric heuristics, but AI has enhanced both feature extraction and 3D alignment accuracy.
AI-driven pharmacophore searches can now process millions of ligands within hours, identifying bioactive scaffolds even when receptor structures are unknown.
5.4 Integration with Molecular Dynamics (MD) Simulations
Molecular dynamics (MD) simulations provide atomic-level insights into biomolecular behavior but are computationally intensive. AI is revolutionizing MD analysis and prediction.
5.4.1 Surrogate Models for Free Energy Calculations
AI surrogate models predict free-energy profiles and conformational landscapes, reducing simulation time by 100–1000× (Noé et al., 2020).
Graph-based neural potentials (GNPs) approximate quantum mechanics-level accuracy for binding free energy estimation.
5.4.2 Enhanced Sampling and Trajectory Prediction
Deep learning can learn collective variables from simulation data, accelerating conformational sampling. The DeepMD and TorchMD frameworks exemplify this hybridization, coupling neural networks with classical MD (Tholke et al., 2022).
Impact: AI-guided MD yields faster, more accurate insight into protein flexibility, ligand binding, and allosteric regulation, all vital for rational drug design.
5.5 AI and Cloud-Based Computational Platforms
Cloud computing has democratized access to powerful computational resources, enabling scalable AI–chemistry integration.
5.5.1 Scalable Infrastructure
Platforms such as AWS SageMaker, Google Vertex AI, and Microsoft Azure ML host molecular datasets, pre-trained models, and high-performance GPUs. These enable collaborative model training and reproducibility (Ali et al., 2025).
5.5.2 Federated and Distributed Learning
Pharmaceutical companies often cannot share raw data due to confidentiality. Federated learning enables joint model training across distributed datasets without centralizing data, preserving privacy (Rieke et al., 2020). Such approaches are used in oncology and rare disease research to aggregate insights across institutions securely.
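A minimal federated-averaging (FedAvg) sketch follows: each site performs local gradient updates on its private data, and only the model weights are shared and averaged by the server. The linear model and synthetic data are stand-ins for a real property predictor.

```python
# Minimal FedAvg sketch (NumPy-only): weights, not raw data, leave each site.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, steps=20):
    """A few gradient steps of least-squares regression on one site's data."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Two sites with private synthetic datasets drawn from the same relationship.
true_w = np.array([1.5, -2.0])
sites = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

w_global = np.zeros(2)
for _round in range(10):                       # communication rounds
    local_ws = [local_update(w_global, X, y) for X, y in sites]
    w_global = np.mean(local_ws, axis=0)       # server averages weights only

print("Recovered weights:", np.round(w_global, 2))  # approx. [1.5, -2.0]
```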
5.5.3 Integration with Big Data and Databases
AI platforms interface directly with major databases (ChEMBL, PubChem, DrugBank, PDB) for real-time updates and automated data cleaning.
Combining big data analytics and AI accelerates pattern recognition and hypothesis generation (Kim et al., 2021).
Impact: Cloud-based AI ecosystems reduce hardware barriers, promote collaboration, and ensure continuous model improvement across global research networks.
5.6 AI in Data Mining and Cheminformatics
Cheminformatics forms the backbone of AI-driven discovery by curating, annotating, and standardizing molecular data. AI improves every stage of data management, including automated structure standardization, duplicate detection, error correction, and metadata annotation.
These innovations ensure that AI models are built on robust, standardized, and reproducible datasets, a prerequisite for regulatory acceptance and scientific validity.
5.7 Synergy Between AI and Classical Computational Approaches
AI augments but does not replace classical computational drug design. While physics-based simulations offer interpretability and mechanistic understanding, AI provides speed and scalability.
Example workflow: an AI classifier pre-filters a large compound library; docking ranks the shortlist; ML-based rescoring refines pose ranking; MD or free-energy calculations validate top candidates; and experimental results are fed back to retrain the models.
This hybrid cycle ensures both efficiency and credibility, forming the backbone of modern AI-driven discovery.
6. CHALLENGES AND LIMITATIONS
Despite tremendous progress, Artificial Intelligence (AI) in drug discovery still faces several limitations that restrict its large-scale, regulatory, and clinical implementation.
6.1 Data-Related Challenges
AI relies heavily on large, high-quality datasets. However, most pharmaceutical data are incomplete, inconsistent, or biased, leading to unreliable predictions (Vamathevan et al., 2019).
Differences in assay conditions, non-standard SMILES notations, and missing stereochemical information often introduce noise (Liu et al., 2021).
Limited data for rare targets cause data imbalance, making models favor well-studied proteins while underperforming on novel ones.
Solution: Standardization using FAIR principles (Findable, Accessible, Interoperable, Reusable) and data-cleaning AI tools.
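A minimal sketch of such automated cleaning, assuming RDKit, is shown below: it canonicalizes SMILES notation, strips common counter-ions, and drops unparsable records before model training.

```python
# Basic SMILES-cleaning step with RDKit: canonicalize, strip salts, drop junk.
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover

remover = SaltRemover()

def clean(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None                      # unparsable record: exclude it
    mol = remover.StripMol(mol)          # remove common counter-ions
    return Chem.MolToSmiles(mol)         # canonical SMILES

raw = ["C1=CC=CC=C1O", "CCO.Cl", "not_a_smiles"]
cleaned = [s for s in (clean(r) for r in raw) if s]
print(cleaned)  # ['Oc1ccccc1', 'CCO']
```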
6.2 Model and Algorithmic Limitations
Deep-learning models are often “black boxes”, offering little interpretability (Wires et al., 2023).
Medicinal chemists and regulators require transparent reasoning to trust AI decisions.
Additionally, overfitting (where models perform well on training data but fail on unseen data) is a persistent problem (Ekins et al., 2016).
Model generalization, explainable AI (XAI), and rigorous validation can help overcome these issues.
Solution: Use of interpretable models (SHAP, LIME), external validation, and diverse datasets.
6.3 Validation and Reproducibility
Many AI predictions lack experimental validation or benchmarking. Different research groups use inconsistent metrics, hindering reproducibility (Huang et al., 2022). Models trained on one dataset may not transfer to new targets or chemical spaces.
Solution: Use standardized benchmarking platforms like Therapeutics Data Commons (TDC) and share open-source models for transparency.
6.4 Ethical and Regulatory Concerns
AI introduces privacy and ownership challenges: who owns the data or AI-generated molecules remains unclear (Rieke et al., 2020). Moreover, biased training data may lead to unfair predictions. Regulatory authorities such as EMA and FDA are still developing AI-specific guidelines (European Medicines Agency, 2024).
Solution: Implement ethical AI principles: transparency, accountability, and fairness.
6.5 Technical and Economic Barriers
AI model training requires powerful computational resources, which may be expensive for smaller labs (Ali et al., 2025). Integration across different bioinformatics and chemistry tools also remains complex due to lack of interoperability (Kim et al., 2021).
Solution: Use of cloud computing and public–private collaborations.
7. FUTURE PERSPECTIVES AND CONCLUSION
7.1 Future Perspectives
Artificial Intelligence (AI) is evolving rapidly, and its future integration into drug discovery will focus on next-generation computational paradigms, multi-omics integration, and precision medicine. These advancements will make AI systems more intelligent, explainable, and clinically relevant.
7.1.1 Next-Generation AI Technologies
Emerging technologies such as quantum computing, graph transformers, and federated learning are expected to redefine computational efficiency and security.
7.1.2 Multi-Omics and Systems Biology Integration
AI will increasingly merge genomics, proteomics, transcriptomics, and metabolomics to create unified disease models. This multi-omics integration helps reveal molecular pathways, predict biomarkers, and identify precise therapeutic targets (Han et al., 2023). AI-based integration platforms like PandaOmics and DeepTarget are already demonstrating such capabilities (Dharmasivam et al., 2025).
7.1.3 Predictive Toxicology and Safety Profiling
Deep-learning models will play a growing role in predictive toxicology, detecting off-target and long-term safety concerns before clinical testing.
Tools like DeepTox and ADMETlab 2.0 have shown 85–90% prediction accuracy (Dong et al., 2021). Integration of AI-driven imaging and omics-based biomarkers will make safety evaluation faster and more reliable.
7.1.4 Personalized and Precision Medicine
AI is expected to enable individualized drug design by integrating patient-specific genomics, proteomics, and clinical data. Predictive models will identify the most effective and safest drug for each patient, revolutionizing therapy optimization (Ali et al., 2025). This shift will reduce trial-and-error prescriptions and improve overall therapeutic outcomes.
7.1.5 Regulatory and Ethical Evolution
Regulatory bodies like the FDA and EMA are developing frameworks for AI transparency, validation, and model monitoring. By 2030, AI-driven submissions are expected to become routine in new drug applications (European Medicines Agency, 2024). Ethical frameworks promoting fairness, data security, and explainability will ensure responsible use.
7.2 Conclusion
AI has revolutionized the pharmaceutical research landscape, offering unprecedented capabilities in target identification, molecule design, lead optimization, clinical development, and drug repurposing.
Through techniques such as machine learning, deep learning, NLP, reinforcement learning, and graph neural networks, drug discovery has shifted from intuition-based to data-driven science.
However, challenges persist in data quality, interpretability, validation, and regulation, which must be addressed to ensure sustainable progress.
Integrating AI with classical computational chemistry, cloud infrastructure, and standardized data systems will foster reproducibility and trust.
Looking ahead, the synergy of AI, quantum computing, and multi-omics will enable faster, safer, and more cost-effective discovery pipelines.
In India and globally, expanding access to open-source datasets, interdisciplinary training, and ethical governance will determine how successfully AI transforms drug discovery into a truly intelligent and personalized science.