Women’s College of Pharmacy, Peth Vadgaon, India.
Drug discovery is traditionally a prolonged, costly, and high-risk enterprise, often exceeding ten years and billions of dollars for a single new molecular entity. Artificial Intelligence (AI) has emerged as a transformative technology that can dramatically shorten timelines, reduce costs, and enhance predictive accuracy throughout pharmaceutical research and development. AI algorithms, particularly machine learning (ML), deep learning (DL), natural language processing (NLP), and reinforcement learning (RL), enable the mining and interpretation of massive biochemical, genomic, and clinical datasets. These capabilities support target identification, hit discovery, lead optimization, and clinical trial design. This review summarizes the evolution of AI in pharmaceutical sciences, outlines the principal algorithms and architectures currently used, and evaluates applications across the modern drug-discovery pipeline. Special attention is given to AI-based target discovery from multi-omics data, structure-based virtual screening, predictive ADME/Tox modeling, and data-driven drug repurposing. Representative industrial case studies, including AlphaFold, Atomwise, BenevolentAI, and Insilico Medicine, demonstrate the translation of AI concepts into tangible therapeutic candidates. The article also critically discusses persistent challenges (data bias, model transparency, reproducibility, and regulatory acceptance) and highlights emerging solutions such as explainable AI, federated learning, and quantum-enhanced modeling. AI is not a substitute for experimental science but an indispensable complement that can guide and accelerate hypothesis generation, compound design, and clinical validation. The integration of AI with computational chemistry and biological experimentation heralds a new era of precision-driven, cost-effective, and patient-centered drug discovery.
1.1 Context and Rationale
Drug discovery is an inherently complex, iterative process involving target identification, hit discovery, lead optimization, preclinical assessment, and multiple clinical phases. On average, the journey from concept to market requires 10–15 years and an investment exceeding USD 2 billion, with an overall success rate below 10 % (Oliveira et al., 2023). Each failure at late stages adds enormous financial and ethical costs. The pharmaceutical industry therefore faces an urgent need for technologies that can improve prediction accuracy, minimize attrition, and compress development timelines.
Artificial Intelligence (AI), defined as computational systems capable of learning patterns, reasoning, and decision-making, offers a paradigm shift from empirical trial-and-error to data-driven prediction and optimization. Advances in cloud computing, algorithmic design, and high-performance GPUs have allowed AI to analyze complex biological and chemical datasets at unprecedented scale (Kim et al., 2021). In silico modeling empowered by AI now complements, and sometimes precedes, traditional wet-lab experimentation.
1.2 Limitations of Conventional Approaches
Traditional high-throughput screening (HTS) can evaluate only a fraction of the available chemical space, estimated at roughly 10⁶⁰ possible molecules, leaving vast opportunities unexplored. Classical computer-aided drug design (CADD) methods such as molecular docking and QSAR depend on human-defined descriptors and simple statistical correlations, which may fail to capture nonlinear relationships among molecular features. Moreover, manual curation of biochemical data is laborious and prone to bias or incompleteness (Patel et al., 2020). Consequently, many candidate compounds progress through the pipeline with hidden liabilities that lead to failure in preclinical toxicology or clinical efficacy studies.
AI-based systems can integrate multidimensional data, from genomics and transcriptomics to chemical and phenotypic screening, extract hidden patterns, and learn predictive models that generalize beyond the training dataset (Kant et al., 2025). These systems not only accelerate early discovery but also inform later phases such as patient stratification and post-marketing safety surveillance.
Fig. 1: Comparison between traditional and AI-based drug discovery pipelines
1.3 Scope and Objectives of the Review
This review provides a comprehensive assessment of AI’s role in contemporary drug discovery, aligning with the format of the Indian Journal of Pharmaceutical Sciences (IJPS). The objectives are to: (i) trace the evolution of AI in pharmaceutical sciences; (ii) outline the principal algorithms and architectures currently in use; (iii) evaluate applications across each stage of the drug-discovery pipeline; and (iv) examine persistent challenges and emerging solutions such as explainable AI and federated learning.
1.4 AI as an Enabler of Efficiency and Innovation
By automating routine analyses and prioritizing high-value hypotheses, AI can reduce experimental redundancy. For instance, ML-based QSAR models can predict biological activity, physicochemical properties, and toxicity profiles of untested compounds with high precision (Patel et al., 2020). Deep learning models have been reported to improve hit rates by 30–50 % in virtual screening compared to classical docking alone (Sumathi et al., 2023). In addition, AI-powered literature mining tools using NLP can rapidly extract drug–target associations from millions of publications, supporting knowledge synthesis that would otherwise take years of manual curation.
1.5 Synergy with Computational and Experimental Workflows
AI should be viewed as an integrative layer atop established computational and experimental methodologies. For example, predictive models can pre-filter compound libraries before molecular docking, reducing computation by orders of magnitude. Likewise, reinforcement learning can suggest chemical modifications that enhance potency or selectivity while preserving synthesizability (Tropsha et al., 2023). Ultimately, AI complements human expertise: chemists interpret model outputs, design confirmatory experiments, and refine datasets iteratively.
1.6 Ethical, Regulatory, and Societal Dimensions
The increasing autonomy of AI systems in proposing chemical entities raises new regulatory and ethical questions: Who owns an AI-generated molecule? How can transparency and accountability be ensured when models act as “black boxes”? Regulatory agencies such as the EMA and FDA are developing frameworks to evaluate AI-based decision tools in drug development (European Medicines Agency, 2024). Establishing data governance, validation standards, and explainability metrics will be critical for safe deployment.
1.7 Outlook
With global investments and academic collaboration, AI-driven discovery is transitioning from proof-of-concept to industrial reality. In India, national initiatives in bioinformatics and computational chemistry, combined with rapidly expanding pharmaceutical R&D capacity, provide fertile ground for adoption. By 2030, AI-assisted platforms are projected to contribute substantially to lead identification and optimization pipelines across both multinational and domestic firms (Bhat et al., 2025).
Fig. 2: Workflow for AI-assisted data analysis
2. HISTORICAL BACKGROUND AND EVOLUTION
2.1 Early Computer-Aided Drug Design (CADD)
The conceptual foundations of AI in drug discovery can be traced to the rise of computer-aided drug design (CADD) in the 1960s–1980s, when chemists first employed computers to model molecular structures and predict biological activity.
Methods such as quantitative structure–activity relationship (QSAR) and molecular docking dominated early computational pharmacology (Martin, 1978). These techniques relied on human-defined molecular descriptors (lipophilicity, polar surface area, hydrogen-bond counts) and used linear regression or partial-least-squares models to correlate structure with potency or toxicity (Patel et al., 2020).
While revolutionary for their time, early CADD methods were limited by small datasets and simplified statistical assumptions. Most models failed to generalize to new chemical scaffolds, leading to poor external predictivity. Nevertheless, they laid crucial groundwork for data-driven reasoning in medicinal chemistry.
2.2 Expansion of Databases and Bioinformatics (1990s–2000s)
The late 1990s saw the explosion of bioinformatics and cheminformatics, driven by the Human Genome Project and advances in protein crystallography. Public repositories such as PubChem, ChEMBL, and Protein Data Bank (PDB) enabled large-scale data sharing. Algorithms such as BLAST and AutoDock allowed researchers to match sequences and simulate ligand–receptor binding (Morris et al., 2009).
However, these methods were largely rule-based, requiring extensive manual curation. Their predictive power depended on the scientist’s expertise in feature selection and statistical modeling. Computational cost and data scarcity further restricted their scope.
The emergence of support vector machines (SVMs) and random forests around 2000 marked the first true machine learning (ML) applications in drug design (Cortes & Vapnik, 1995; Chen et al., 2018). These algorithms could detect non-linear patterns in physicochemical descriptors, outperforming traditional regression methods.
2.3 Transition to Data-Driven AI (2010–2015)
From 2010 onward, exponential growth in computational power (notably GPU computing) and big-data availability transformed computational chemistry. Pharmaceutical companies began integrating ML to prioritize compounds before synthesis or screening (Ekins et al., 2016).
Large, annotated datasets (bioassays, gene-expression profiles, clinical outcomes) enabled supervised learning models to predict compound bioactivity and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profiles with increasing accuracy.
Simultaneously, unsupervised clustering was used to identify chemical subspaces associated with novel scaffolds (Hughes et al., 2011).
At this stage, the field started to shift from deterministic, rule-based modeling toward probabilistic and learning-based systems, a precursor to modern AI.
2.4 Emergence of Deep Learning (2015–2020)
A critical milestone was the introduction of deep learning (DL) into molecular modeling. Inspired by advances in computer vision and natural-language processing, deep neural networks could learn hierarchical representations of chemical structures directly from raw data.
These methods produced quantum leaps in predictive accuracy: models trained on millions of molecules achieved coefficients of determination (R²) > 0.8 for physicochemical property prediction (Sumathi et al., 2023).
The publication of AlphaFold2 by DeepMind (Jumper et al., 2021) marked another watershed moment, accurately predicting protein 3-D structures at near-experimental resolution and enabling structure-based drug design for previously “undruggable” targets.
2.5 Graph Neural Networks and Generative AI (2020–Present)
The next revolution arrived with graph neural networks (GNNs) and generative models.
Unlike traditional neural networks that process images or text, GNNs operate directly on molecular graphs (atoms as nodes, bonds as edges), capturing connectivity and chemical context (Zhang et al., 2025).
These models outperform descriptor-based QSAR in property prediction, toxicity forecasting, and synthesis planning.
In parallel, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) emerged for de novo molecular design (Bian & Xie, 2020).
By optimizing reward functions such as docking scores or ADMET metrics, reinforcement learning (RL) agents could propose novel compounds with targeted pharmacological profiles.
Commercial adoption accelerated: Atomwise applied CNN-based virtual screening across dozens of targets, BenevolentAI’s knowledge graph identified baricitinib as a COVID-19 candidate, and Insilico Medicine advanced an AI-designed fibrosis candidate from concept to preclinical testing within 18 months.
These examples validated AI’s capacity to compress discovery timelines from years to months.
2.6 Integration with Cloud Computing and Big Data
Cloud platforms such as Google Colab, AWS SageMaker, and Azure ML democratized access to high-performance computation.
Simultaneously, open-source frameworks (TensorFlow, PyTorch, DeepChem) facilitated collaborative development and reproducibility.
The fusion of AI with big-data analytics enabled pharmaceutical consortia to analyze billions of compound–target pairs, accelerating lead identification and optimization (Ali et al., 2025).
The shift toward cloud-based, collaborative AI ecosystems now underpins virtually every major pharmaceutical R&D strategy, making data integration and interoperability key priorities.
3. AI TECHNIQUES USED IN DRUG DISCOVERY
Artificial intelligence is not a single algorithm but a family of computational strategies capable of perceiving patterns, learning from data, and making predictions. In drug discovery, AI techniques can be broadly grouped into machine learning (ML), deep learning (DL), natural language processing (NLP), and reinforcement learning (RL). Each approach contributes unique capabilities, from predicting molecular properties to generating novel chemical entities.
3.1 Machine Learning (ML)
Machine learning encompasses algorithms that infer relationships between molecular descriptors and biological activity from existing data.
Traditional ML workflows involve data preprocessing, feature extraction, model training, and validation. Input data may include molecular fingerprints, 2-D descriptors (e.g., Log P, hydrogen bond count), or physicochemical properties derived from cheminformatics software such as PaDEL or RDKit.
Common ML algorithms include support vector machines (SVMs), random forests, gradient-boosted trees, k-nearest neighbours, and naive Bayes classifiers.
ML techniques remain widely used for activity prediction, ADMET modeling, toxicity classification, and compound clustering. For example, Patel et al. (2020) reported that an SVM-based QSAR model predicted hERG channel inhibition with an R² = 0.78, outperforming multiple linear regression (R² = 0.54).
Key advantages include interpretability, modest computational demand, and compatibility with relatively small datasets. However, ML models depend heavily on descriptor quality and may fail for chemotypes outside the training domain (Ekins et al., 2016).
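To make this workflow concrete, the sketch below illustrates a minimal descriptor-based QSAR model of the kind described above, using RDKit descriptors and a random-forest regressor; the SMILES strings and pIC50 values are hypothetical placeholders, not real assay data.

```python
# Minimal descriptor-based QSAR sketch: RDKit descriptors + random forest.
# SMILES and pIC50 values below are hypothetical placeholders, not assay data.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def featurize(smiles):
    """Compute a handful of classical 2-D descriptors with RDKit."""
    mol = Chem.MolFromSmiles(smiles)
    return [
        Descriptors.MolLogP(mol),        # lipophilicity (Log P)
        Descriptors.TPSA(mol),           # topological polar surface area
        Descriptors.NumHDonors(mol),     # hydrogen-bond donors
        Descriptors.NumHAcceptors(mol),  # hydrogen-bond acceptors
        Descriptors.MolWt(mol),          # molecular weight
    ]

smiles = ["CCO", "CC(=O)Oc1ccccc1C(=O)O", "c1ccccc1O",
          "CCN(CC)CC", "CC(C)Cc1ccc(cc1)C(C)C(=O)O",
          "CN1C=NC2=C1C(=O)N(C)C(=O)N2C"]
pic50 = [4.2, 5.1, 4.8, 3.9, 5.6, 4.5]   # hypothetical activity values

X = np.array([featurize(s) for s in smiles])
y = np.array(pic50)

model = RandomForestRegressor(n_estimators=200, random_state=0)
print("Cross-validated R^2:", cross_val_score(model, X, y, cv=3, scoring="r2").mean())
```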
3.2 Deep Learning (DL)
Deep learning extends ML by using multi-layered artificial neural networks capable of automatically extracting hierarchical features from raw molecular or biological data.
3.2.1 Convolutional Neural Networks (CNNs)
CNNs, widely used in image recognition, can treat molecules or protein–ligand complexes as 3-D images.
Ragoza et al. (2017) trained CNNs on voxelized docking grids and achieved higher enrichment factors than AutoDock’s empirical scoring functions.
CNNs can also process 2-D molecular graphs represented as adjacency matrices to learn structural motifs associated with activity.
3.2.2 Recurrent Neural Networks (RNNs) and LSTM
RNNs handle sequential data such as SMILES strings. By learning chemical syntax, they generate novel, syntactically valid molecules.
Segler et al. (2018) demonstrated that an LSTM trained on 1.5 million drug-like molecules produced new compounds with 70 % validity and comparable diversity to the training set.
RNNs also assist in retrosynthetic prediction, suggesting reaction sequences that lead to a target molecule.
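A minimal sketch of the character-level recurrent approach described above follows (PyTorch); the toy vocabulary and the untrained network are illustrative stand-ins for a model trained on a large SMILES corpus, as in Segler et al. (2018).

```python
# Character-level LSTM sketch for SMILES generation (illustrative only).
import torch
import torch.nn as nn

VOCAB = sorted(set("CNO=()c1#n[]+-"))   # toy character set; real models use ~40 tokens
stoi = {ch: i for i, ch in enumerate(VOCAB)}

class SmilesLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)   # next-character logits

    def forward(self, tokens, state=None):
        x = self.embed(tokens)
        out, state = self.lstm(x, state)
        return self.head(out), state

model = SmilesLSTM(len(VOCAB))
# Training would minimize cross-entropy on next-character prediction over a
# large SMILES corpus; sampling then extends a string one character at a time.
tokens = torch.tensor([[stoi[c] for c in "CCO"]])
logits, _ = model(tokens)
print("Predicted next character (untrained):", VOCAB[int(logits[0, -1].argmax())])
```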
3.2.3 Graph Neural Networks (GNNs)
GNNs represent molecules as graphs, naturally encoding atomic connectivity and chemical context. Each node (atom) aggregates information from neighboring nodes through message-passing operations (Zhang et al., 2025). GNNs achieve state-of-the-art accuracy in property prediction, outperforming classical descriptor-based QSAR. For example, the MPNN (Message-Passing Neural Network) framework reported a mean absolute error < 0.3 log units in solubility prediction (Gilmer et al., 2017). Beyond small molecules, Graph Attention Networks (GATs) and Graph Transformer models are used to model protein–ligand interactions, enabling structure-based virtual screening.
Fig. 3: Graph neural network (GNN) architecture for molecular property prediction
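The message-passing operation at the heart of a GNN can be sketched in a few lines; the layer below is a simplified illustration (sum-aggregated messages with a GRU update), not a production architecture.

```python
# Minimal message-passing layer sketch: atoms are nodes, bonds are edges
# given as (source, target) index pairs; real GNNs stack several such layers.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)   # transform neighbor features
        self.update = nn.GRUCell(dim, dim)   # update node state from messages

    def forward(self, h, edge_index):
        src, dst = edge_index
        msgs = torch.zeros_like(h)
        msgs.index_add_(0, dst, self.message(h[src]))   # sum messages per atom
        return self.update(msgs, h)

# Toy molecule: 3 atoms, bonds 0-1 and 1-2 (both directions, undirected graph).
h = torch.randn(3, 16)                                  # initial atom features
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])
layer = MessagePassingLayer(16)
print(layer(h, edge_index).shape)                       # torch.Size([3, 16])
```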
3.2.4 Autoencoders and Variational Autoencoders (VAEs)
Autoencoders compress molecular information into low-dimensional latent spaces. Manipulating these latent vectors allows exploration of chemical space for optimized properties (Gómez-Bombarelli et al., 2018).
VAEs and Generative Adversarial Networks (GANs) are foundational for de novo drug design (Bian & Xie, 2020).
Advantages: continuous latent spaces permit smooth interpolation between molecules and gradient-based optimization of desired properties.
Limitations: decoded structures may be chemically invalid, and training can be unstable on small or noisy datasets.
3.3 Natural Language Processing (NLP)
NLP applies AI to unstructured text such as biomedical literature, patents, and clinical reports.
With tens of millions of articles in PubMed, manual curation of drug–target data is impractical. NLP automates knowledge extraction and hypothesis generation.
3.3.1 Text Mining and Named Entity Recognition
Systems like PubTator, ChemSpot, and BioBERT identify and categorize entities (drugs, targets, pathways) from raw text (Lee et al., 2020).
These tools enable construction of knowledge graphs linking diseases, genes, and compounds (Liu et al., 2021).
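As a hedged illustration of such entity-recognition pipelines, the snippet below uses the Hugging Face transformers pipeline API; the checkpoint name is a hypothetical placeholder for any BioBERT-style model fine-tuned for drug/target NER.

```python
# Sketch of transformer-based biomedical NER; the model name below is a
# hypothetical placeholder, not a specific published checkpoint.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="my-org/biobert-drug-target-ner",  # hypothetical fine-tuned checkpoint
    aggregation_strategy="simple",           # merge word-piece tokens into spans
)

text = "Baricitinib inhibits JAK1 and JAK2, reducing inflammation in COVID-19."
for entity in ner(text):
    print(entity["entity_group"], entity["word"], round(entity["score"], 2))
```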
3.3.2 Language Models for Drug Discovery
Recent transformer models (e.g., BERT, GPT-based architectures) fine-tuned on chemical corpora can interpret relationships between molecular mentions and biological outcomes.
Such models support automated literature review, adverse-event detection, and drug repurposing by correlating textual evidence across multiple sources (Bhat et al., 2025).
In 2023, NLP-driven analysis of COVID-19 literature successfully identified potential inhibitors of the SARS-CoV-2 main protease within days (Sadybekov et al., 2023).
3.4 Reinforcement Learning (RL) and Generative AI
RL introduces an agent–environment framework, where the model learns by receiving rewards for actions that lead to desirable outcomes (Sutton & Barto, 2018).
In drug discovery, the environment represents chemical space, and actions correspond to molecular modifications.
3.4.1 Molecule Generation and Optimization
RL agents optimize molecular properties by sequentially modifying structures. For instance, the REINVENT algorithm uses an RNN generator guided by a reward function combining predicted activity and drug-likeness (Olivecrona et al., 2017).
In 2022, Zhavoronkov et al. reported that AI-generated compounds against fibrosis (via RL) progressed from concept to preclinical testing within 18 months.
3.4.2 Reward Functions and Constraints
Reward functions may include docking scores, synthetic accessibility, or toxicity penalties. Properly balancing these terms is critical; overly narrow rewards can produce unrealistic molecules.
Modern frameworks integrate multi-objective optimization, simultaneously maximizing potency while minimizing toxicity (Tropsha et al., 2023).
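A minimal sketch of such a multi-objective reward is given below, combining RDKit’s QED drug-likeness score with stubbed activity and toxicity predictors; the weights and stub values are illustrative assumptions, not a published reward design.

```python
# Multi-objective reward sketch for RL-based molecule generation.
from rdkit import Chem
from rdkit.Chem import QED

def predicted_activity(mol):
    """Placeholder for a trained activity model returning a 0-1 score."""
    return 0.5  # hypothetical stub

def toxicity_penalty(mol):
    """Placeholder toxicity model; real systems use trained classifiers."""
    return 0.1  # hypothetical stub

def reward(smiles, w_act=0.5, w_qed=0.3, w_tox=0.2):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                      # invalid SMILES earns zero reward
        return 0.0
    return (w_act * predicted_activity(mol)
            + w_qed * QED.qed(mol)       # drug-likeness score in [0, 1]
            - w_tox * toxicity_penalty(mol))

print(round(reward("CC(=O)Oc1ccccc1C(=O)O"), 3))  # aspirin as a toy input
```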
3.4.3 Applications in Synthesis Planning
RL also underpins retrosynthetic analysis, where agents learn optimal reaction sequences to synthesize target molecules efficiently (Segler et al., 2018). Such tools have been commercialized in platforms like Synthia and AiZynthFinder.
3.5 Hybrid and Ensemble Approaches
Real-world drug discovery increasingly employs hybrid models that combine multiple AI paradigms or couple AI with physics-based simulations.
Examples include ML-rescored docking pipelines, GNN property predictors coupled with physics-based free-energy calculations, and consensus ensembles of complementary QSAR models.
Such multimodal systems achieve higher predictive accuracy and greater generalization across chemical classes.
3.6 Comparative Summary

| Technique | Typical Input | Primary Application | Strengths | Limitations |
|---|---|---|---|---|
| Machine Learning | Molecular descriptors | QSAR, ADMET prediction | Interpretable, fast | Limited to known chemotypes |
| Deep Learning (CNN/RNN/GNN) | Raw structural/sequence data | Activity prediction, molecule generation | High accuracy, feature automation | Data-hungry, low interpretability |
| NLP | Scientific text, patents | Data extraction, drug repurposing | Exploits unstructured data | Requires domain-specific training |
| Reinforcement Learning / Generative | Molecular graphs or SMILES | De novo design, optimization | Creates novel scaffolds | Reward design complexity |
| Hybrid / Ensemble | Multi-modal | Integration & validation | Synergy, robustness | Complex implementation |
4. AI IN STAGES OF DRUG DISCOVERY
Artificial Intelligence has become embedded at nearly every stage of the modern drug discovery pipeline, from early target identification to post-market surveillance. Its ability to mine multi-omics data, predict structure–activity relationships, and optimize candidate selection has transformed pharmaceutical R&D into a data-driven endeavor (Bhat et al., 2025). The key applications across each stage are summarized below.
4.1 Target Identification and Validation
Target identification involves recognizing biological macromolecules (proteins, enzymes, receptors, or genes) that play a critical role in disease pathophysiology. Traditional identification relied on laborious biochemical screening or genetic manipulation, often producing false positives or missing context-specific targets.
AI now allows the integration of multi-omics datasets (genomics, transcriptomics, proteomics, metabolomics) to uncover new druggable targets.
For instance, DeepMind’s AlphaFold2 predicted over 200 million protein structures, enabling structure-based target selection even in the absence of crystallographic data (Jumper et al., 2021).
Similarly, AI-driven transcriptome analysis by Insilico Medicine led to the discovery of a novel fibrosis target (TGFB1-related) in under three months (Dharmasivam et al., 2025).
Impact: AI enhances reliability in target selection, minimizes off-target effects, and allows early hypothesis generation before costly experimentation.
4.2 Hit Identification (Virtual Screening and QSAR Modeling)
Once a validated target is known, identifying chemical “hits” that interact with it effectively is the next challenge. Historically, high-throughput screening (HTS) tested millions of compounds experimentally, an expensive and inefficient process.
AI-driven virtual screening (VS) can computationally screen billions of molecules within days (Zhou et al., 2024).
4.2.1 AI-Accelerated Virtual Screening
Deep learning and convolutional networks improve virtual screening by predicting binding affinities based on structural features.
Such platforms achieved enrichment factors up to 6× greater than traditional docking alone.
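For reference, the enrichment factor compares the hit rate in the top-ranked fraction of a screened library with the hit rate expected from random selection; a minimal sketch with synthetic data follows.

```python
# Enrichment-factor (EF) metric used to compare virtual screening methods.
# Scores and active/inactive labels below are synthetic, for illustration only.
def enrichment_factor(scores, labels, top_frac=0.01):
    """EF = (actives in top fraction / compounds in top fraction)
            / (total actives / total compounds)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n_top = max(1, int(len(ranked) * top_frac))
    hits_top = sum(label for _, label in ranked[:n_top])
    total_hits = sum(labels)
    return (hits_top / n_top) / (total_hits / len(labels))

# Synthetic example: 1000 compounds, 10 actives, 6 of them ranked at the top.
scores = [1.0 - i / 1000 for i in range(1000)]
labels = [1 if i < 6 else 0 for i in range(1000)]
labels[500:504] = [1, 1, 1, 1]          # remaining 4 actives buried mid-list
print(f"EF(1%) = {enrichment_factor(scores, labels, top_frac=0.01):.1f}")
```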
4.2.2 Quantitative Structure–Activity Relationship (QSAR) and ML
AI-enhanced QSAR models predict compound activity or toxicity from molecular descriptors, flagging likely liabilities before synthesis.
4.2.3 Ligand-Based Similarity and Active Learning
Active-learning frameworks iteratively train models as new assay data emerges, refining predictions dynamically (Ekins et al., 2016).
AI can also predict polypharmacology (the likelihood of a compound interacting with multiple targets), which is helpful in designing multi-target drugs for complex diseases (Zhang et al., 2025).
Impact: Hit identification now focuses on quality over quantity, narrowing billions of possibilities to a few hundred prioritized candidates for synthesis and testing.
4.3 Lead Optimization
After promising hits are discovered, medicinal chemists refine them for potency, selectivity, solubility, and pharmacokinetic properties. AI enhances this optimization by predicting key molecular properties without exhaustive synthesis.
4.3.1 Predictive ADMET and Toxicity Models
Deep learning models predict absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties with high precision; tools such as DeepTox and ADMETlab 2.0 report 85–90 % prediction accuracy (Dong et al., 2021).
4.3.2 Structure–Property Relationship Prediction
AI can model the impact of small structural modifications (bioisosteres, substitutions) on activity and drug-likeness. Reinforcement learning (RL) algorithms optimize molecules iteratively, maximizing binding affinity while minimizing toxicity (Olivecrona et al., 2017).
4.3.3 Multi-Objective Optimization
Modern RL frameworks employ multi-objective reward functions, simultaneously balancing potency, lipophilicity, and synthetic feasibility. The REINVENT system by AstraZeneca generated leads with improved pharmacological balance using such methods (Olivecrona et al., 2017).
Impact: Lead optimization cycles that once required months can now be simulated virtually, saving experimental effort and reducing attrition before preclinical testing.
4.4 Preclinical and Clinical Development
AI plays a crucial role in bridging discovery and development phases.
4.4.1 Preclinical Modelling
AI models forecast pharmacokinetics (PK) and pharmacodynamics (PD) based on in vitro or animal data.
In silico toxicology using DL networks predicts off-target effects and drug–drug interactions (Vamathevan et al., 2019).
AI-aided molecular dynamics (MD) simulations allow efficient modeling of ligand binding and protein flexibility.
4.4.2 Clinical Trial Design and Patient Stratification
In clinical development, AI assists in trial design, patient recruitment, and risk assessment.
4.4.3 Biomarker and Safety Prediction
ML algorithms correlate multi-omics and imaging data to discover biomarkers predictive of therapeutic response or toxicity.
For example, AI-based image classifiers have identified subtle cardiac anomalies linked to drug-induced QT prolongation (Serafim et al., 2023).
Impact: AI reduces preclinical animal use, optimizes trial design, and enhances safety monitoring, shortening timelines from discovery to approval.
4.5 Drug Repurposing (Repositioning)
Drug repurposing seeks new therapeutic uses for approved or shelved compounds, an area where AI has excelled due to data abundance and reduced regulatory barriers.
4.5.1 Network-Based and Similarity Models
AI constructs drug–target–disease networks, using graph theory to identify repositioning opportunities. Machine learning models evaluate chemical, genomic, and phenotypic similarities among compounds (Tanoli et al., 2021).
For example, BenevolentAI used its knowledge-graph platform to predict Baricitinib as an inhibitor of COVID-19 inflammatory pathways, a prediction later validated clinically (Stebbing et al., 2020).
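The graph-based reasoning behind such predictions can be sketched with a toy drug–target–disease network; the edges below are illustrative rather than curated data.

```python
# Toy network-based repurposing sketch with networkx: propose a new indication
# when a drug's target also links to another disease. Edges are illustrative.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("baricitinib", "JAK1"), ("baricitinib", "JAK2"),   # drug-target edges
    ("JAK1", "rheumatoid arthritis"),                   # target-disease edges
    ("JAK1", "COVID-19 inflammation"),
])

def repurposing_candidates(graph, drug, known_disease):
    """Diseases reachable through the drug's targets, excluding known uses."""
    diseases = set()
    for target in graph.neighbors(drug):
        diseases.update(graph.neighbors(target))
    diseases.discard(drug)            # the drug itself neighbors its targets
    diseases.discard(known_disease)   # exclude the established indication
    return diseases

print(repurposing_candidates(G, "baricitinib", "rheumatoid arthritis"))
# -> {'COVID-19 inflammation'}
```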
4.5.2 NLP-Assisted Repurposing
NLP tools scan biomedical literature and clinical trial registries for overlooked associations.
During the pandemic, AI-based mining of open-access literature suggested over 100 potential antivirals within weeks (Sadybekov et al., 2023).
4.5.3 Case Studies
Representative cases include the baricitinib repositioning described above and the AI-prioritized antiviral candidates identified through literature mining during the COVID-19 pandemic (Sadybekov et al., 2023).
4.5.4 Benefits
Repurposing saves up to 60–70 % of development time and cost, as safety data already exist (Wan et al., 2025). AI-driven prediction further enhances precision, increasing the probability of clinical success.
4.6 Post-Marketing Surveillance and Pharmacovigilance
AI continues to add value even after drug approval. ML algorithms monitor real-world data (electronic health records, social media, and spontaneous reporting systems) for adverse events or efficacy trends (European Medicines Agency, 2024). Natural language models identify under-reported side effects, enabling proactive safety updates.
Impact: Continuous AI monitoring ensures patient safety, regulatory compliance, and improved lifecycle management.
5. INTEGRATION OF AI WITH COMPUTATIONAL TOOLS
Artificial intelligence (AI) does not function in isolation; it amplifies the power of existing in silico methodologies such as molecular docking, pharmacophore modeling, and quantitative structure–activity relationship (QSAR) analyses. The integration of AI with classical computational chemistry has created hybrid workflows capable of accelerating discovery, improving accuracy, and reducing experimental dependency (Friesner et al., 2020; Dharmasivam et al., 2025).
5.1 AI-Enhanced Molecular Docking and Virtual Screening
Molecular docking predicts how a small molecule interacts with a biological target, evaluating both the orientation and binding energy of the complex. Traditional docking algorithms, such as AutoDock and Glide, rely on physics-based scoring functions that sometimes oversimplify real molecular interactions.
AI has enhanced docking by improving scoring accuracy, pose prediction, and screening efficiency.
5.1.1 Machine Learning–Based Scoring Functions
Machine learning (ML) models trained on empirical binding data can replace or augment conventional scoring functions.
5.1.2 Accelerated Virtual Screening
AI pre-filters compound libraries before docking, reducing computational burden by up to 90 %. Deep learning classifiers rapidly exclude molecules predicted to have poor binding or ADMET properties (Zhou et al., 2024).
For example, RosettaVS, integrating ML with Rosetta docking, successfully screened 1.2 billion molecules in under 60 hours (Zhou et al., 2024).
These approaches significantly enhance throughput, enabling ultra-large-scale virtual screening for neglected or rare-disease targets.
5.1.3 Generative Docking
Reinforcement learning (RL) combined with docking feedback enables de novo molecule generation guided by binding affinity scores. AI iteratively refines chemical structures toward improved poses and interactions (Olivecrona et al., 2017).
Impact: AI-docking integration merges physics-based rigor with predictive flexibility, achieving better ranking accuracy and drastically reducing computational time.
5.2 AI-Assisted QSAR and QSPR Modeling
The QSAR (Quantitative Structure–Activity Relationship) and QSPR (Quantitative Structure–Property Relationship) frameworks correlate molecular features with biological activity or physicochemical properties. While QSAR has been foundational since the 1960s, classical linear models often fail for non-linear relationships or diverse chemotypes.
AI overcomes these limitations by learning directly from raw data or molecular representations without pre-defined descriptors.
5.2.1 Descriptor-Free Learning
Graph neural networks (GNNs) and message-passing networks (MPNNs) can directly process molecular graphs, eliminating the need for manually calculated descriptors (Gilmer et al., 2017).
These models capture electronic and topological effects, enabling generalizable predictions across novel scaffolds.
5.2.2 Transfer and Multitask Learning
AI enables multitask QSAR, where models learn multiple properties (e.g., potency, solubility, toxicity) simultaneously, improving data efficiency (Mayr et al., 2018).
Transfer learning allows models trained on large datasets to adapt to new targets with limited data, a key advantage in orphan-disease research (Yang et al., 2021).
5.2.3 Hybrid QSAR Pipelines
AI-QSAR workflows now integrate both data-driven and mechanistic elements, for example by combining learned molecular representations with physics-derived descriptors or docking scores.
Such pipelines are implemented in platforms like ADMETlab 2.0 and ChemProp, routinely used by industry and academia (Dong et al., 2021).
Impact: AI-QSAR models demonstrate higher accuracy (average R² > 0.85) and external predictivity (Q² > 0.7), reducing the need for exhaustive experimental screening.
5.3 AI in Pharmacophore Modeling and 3D Alignment
Pharmacophore modeling identifies the essential chemical features required for biological activity. Traditional models relied on geometric heuristics, but AI has enhanced both feature extraction and 3D alignment accuracy.
AI-driven pharmacophore searches can now process millions of ligands within hours, identifying bioactive scaffolds even when receptor structures are unknown.
5.4 Integration with Molecular Dynamics (MD) Simulations
Molecular dynamics (MD) simulations provide atomic-level insights into biomolecular behavior but are computationally intensive. AI is revolutionizing MD analysis and prediction.
5.4.1 Surrogate Models for Free Energy Calculations
AI surrogate models predict free-energy profiles and conformational landscapes, reducing simulation time by 100–1000× (Noé et al., 2020).
Graph-based neural potentials (GNPs) approximate quantum mechanics-level accuracy for binding free energy estimation.
5.4.2 Enhanced Sampling and Trajectory Prediction
Deep learning can learn collective variables from simulation data, accelerating conformational sampling. The DeepMD and TorchMD frameworks exemplify this hybridization, coupling neural networks with classical MD (Tholke et al., 2022).
Impact: AI-guided MD yields faster, more accurate insight into protein flexibility, ligand binding, and allosteric regulation, all vital for rational drug design.
5.5 AI and Cloud-Based Computational Platforms
Cloud computing has democratized access to powerful computational resources, enabling scalable AI–chemistry integration.
5.5.1 Scalable Infrastructure
Platforms such as AWS SageMaker, Google Vertex AI, and Microsoft Azure ML host molecular datasets, pre-trained models, and high-performance GPUs. These enable collaborative model training and reproducibility (Ali et al., 2025).
5.5.2 Federated and Distributed Learning
Pharmaceutical companies often cannot share raw data due to confidentiality. Federated learning enables joint model training across distributed datasets without centralizing data, preserving privacy (Rieke et al., 2020). Such approaches are used in oncology and rare disease research to aggregate insights across institutions securely.
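A minimal federated-averaging (FedAvg) sketch follows: each site performs local gradient updates on its private data, and only the model weights are shared and averaged by the server. The linear model and synthetic data are stand-ins for a real property predictor.

```python
# Minimal FedAvg sketch (NumPy-only): weights, not raw data, leave each site.
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1, steps=20):
    """A few gradient steps of least-squares regression on one site's data."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Two sites with private synthetic datasets drawn from the same relationship.
true_w = np.array([1.5, -2.0])
sites = []
for _ in range(2):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    sites.append((X, y))

w_global = np.zeros(2)
for _round in range(10):                       # communication rounds
    local_ws = [local_update(w_global, X, y) for X, y in sites]
    w_global = np.mean(local_ws, axis=0)       # server averages weights only

print("Recovered weights:", np.round(w_global, 2))  # approx. [1.5, -2.0]
```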
5.5.3 Integration with Big Data and Databases
AI platforms interface directly with major databases (ChEMBL, PubChem, DrugBank, PDB) for real-time updates and automated data cleaning.
Combining big data analytics and AI accelerates pattern recognition and hypothesis generation (Kim et al., 2021).
Impact: Cloud-based AI ecosystems reduce hardware barriers, promote collaboration, and ensure continuous model improvement across global research networks.
5.6 AI in Data Mining and Cheminformatics
Cheminformatics forms the backbone of AI-driven discovery by curating, annotating, and standardizing molecular data. AI improves every stage of data management, including automated structure standardization, duplicate detection, error correction, and metadata annotation.
These innovations ensure that AI models are built on robust, standardized, and reproducible datasets, a prerequisite for regulatory acceptance and scientific validity.
5.7 Synergy Between AI and Classical Computational Approaches
AI augments but does not replace classical computational drug design. While physics-based simulations offer interpretability and mechanistic understanding, AI provides speed and scalability.
Example workflow: an AI classifier pre-filters a large compound library; docking ranks the shortlist; ML-based rescoring refines pose ranking; MD or free-energy calculations validate top candidates; and experimental results are fed back to retrain the models.
This hybrid cycle ensures both efficiency and credibility, forming the backbone of modern AI-driven discovery.
6. CHALLENGES AND LIMITATIONS
Despite tremendous progress, Artificial Intelligence (AI) in drug discovery still faces several limitations that restrict its large-scale, regulatory, and clinical implementation.
6.1 Data-Related Challenges
AI relies heavily on large, high-quality datasets. However, most pharmaceutical data are incomplete, inconsistent, or biased, leading to unreliable predictions (Vamathevan et al., 2019).
Differences in assay conditions, non-standard SMILES notations, and missing stereochemical information often introduce noise (Liu et al., 2021).
Limited data for rare targets cause data imbalance, making models favor well-studied proteins while underperforming on novel ones.
Solution: Standardization using FAIR principles (Findable, Accessible, Interoperable, Reusable) and data-cleaning AI tools.
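A minimal sketch of such automated cleaning, assuming RDKit, is shown below: it canonicalizes SMILES notation, strips common counter-ions, and drops unparsable records before model training.

```python
# Basic SMILES-cleaning step with RDKit: canonicalize, strip salts, drop junk.
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover

remover = SaltRemover()

def clean(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None                      # unparsable record: exclude it
    mol = remover.StripMol(mol)          # remove common counter-ions
    return Chem.MolToSmiles(mol)         # canonical SMILES

raw = ["C1=CC=CC=C1O", "CCO.Cl", "not_a_smiles"]
cleaned = [s for s in (clean(r) for r in raw) if s]
print(cleaned)  # ['Oc1ccccc1', 'CCO']
```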
6.2 Model and Algorithmic Limitations
Deep-learning models are often “black boxes”, offering little interpretability (Wires et al., 2023).
Medicinal chemists and regulators require transparent reasoning to trust AI decisions.
Additionally, overfitting (where models perform well on training data but fail on unseen data) is a persistent problem (Ekins et al., 2016).
Model generalization, explainable AI (XAI), and rigorous validation can help overcome these issues.
Solution: Use of interpretable models (SHAP, LIME), external validation, and diverse datasets.
6.3 Validation and Reproducibility
Many AI predictions lack experimental validation or benchmarking. Different research groups use inconsistent metrics, hindering reproducibility (Huang et al., 2022). Models trained on one dataset may not transfer to new targets or chemical spaces.
Solution: Use standardized benchmarking platforms like Therapeutics Data Commons (TDC) and share open-source models for transparency.
6.4 Ethical and Regulatory Concerns
AI introduces privacy and ownership challenges: who owns the data or AI-generated molecules remains unclear (Rieke et al., 2020). Moreover, biased training data may lead to unfair predictions. Regulatory authorities such as EMA and FDA are still developing AI-specific guidelines (European Medicines Agency, 2024).
Solution: Implement ethical AI principles: transparency, accountability, and fairness.
6.5 Technical and Economic Barriers
AI model training requires powerful computational resources, which may be expensive for smaller labs (Ali et al., 2025). Integration across different bioinformatics and chemistry tools also remains complex due to lack of interoperability (Kim et al., 2021).
Solution: Use of cloud computing and public–private collaborations.
7. FUTURE PERSPECTIVES AND CONCLUSION
7.1 Future Perspectives
Artificial Intelligence (AI) is evolving rapidly, and its future integration into drug discovery will focus on next-generation computational paradigms, multi-omics integration, and precision medicine. These advancements will make AI systems more intelligent, explainable, and clinically relevant.
7.1.1 Next-Generation AI Technologies
Emerging technologies such as quantum computing, graph transformers, and federated learning are expected to redefine computational efficiency and security.
7.1.2 Multi-Omics and Systems Biology Integration
AI will increasingly merge genomics, proteomics, transcriptomics, and metabolomics to create unified disease models. This multi-omics integration helps reveal molecular pathways, predict biomarkers, and identify precise therapeutic targets (Han et al., 2023). AI-based integration platforms like PandaOmics and DeepTarget are already demonstrating such capabilities (Dharmasivam et al., 2025).
7.1.3 Predictive Toxicology and Safety Profiling
Deep-learning models will play a growing role in predictive toxicology, detecting off-target and long-term safety concerns before clinical testing.
Tools like DeepTox and ADMETlab 2.0 have shown 85–90% prediction accuracy (Dong et al., 2021). Integration of AI-driven imaging and omics-based biomarkers will make safety evaluation faster and more reliable.
7.1.4 Personalized and Precision Medicine
AI is expected to enable individualized drug design by integrating patient-specific genomics, proteomics, and clinical data. Predictive models will identify the most effective and safest drug for each patient, revolutionizing therapy optimization (Ali et al., 2025). This shift will reduce trial-and-error prescriptions and improve overall therapeutic outcomes.
7.1.5 Regulatory and Ethical Evolution
Regulatory bodies like the FDA and EMA are developing frameworks for AI transparency, validation, and model monitoring. By 2030, AI-driven submissions are expected to become routine in new drug applications (European Medicines Agency, 2024). Ethical frameworks promoting fairness, data security, and explainability will ensure responsible use.
7.2 Conclusion
AI has revolutionized the pharmaceutical research landscape, offering unprecedented capabilities in target identification, molecule design, lead optimization, clinical development, and drug repurposing.
Through techniques such as machine learning, deep learning, NLP, reinforcement learning, and graph neural networks, drug discovery has shifted from intuition-based to data-driven science.
However, challenges persist in data quality, interpretability, validation, and regulation, which must be addressed to ensure sustainable progress.
Integrating AI with classical computational chemistry, cloud infrastructure, and standardized data systems will foster reproducibility and trust.
Looking ahead, the synergy of AI, quantum computing, and multi-omics will enable faster, safer, and more cost-effective discovery pipelines.
In India and globally, expanding access to open-source datasets, interdisciplinary training, and ethical governance will determine how successfully AI transforms drug discovery into a truly intelligent and personalized science.