¹M.Tech, Computational Biology, Bioinformatics Department, Pondicherry University, Puducherry
²,³,⁴MSc, Pharmaceutical Business Management, Griffith College Cork, Ireland
⁵PharmD Intern, Manipal College of Pharmaceutical Sciences, Manipal, Karnataka
⁶B.Pharm, Savitribai Phule Pune University, Pune, Maharashtra
The integration of artificial intelligence (AI) and machine learning (ML) into pharmaceutical research has engendered a paradigm shift in how novel therapeutics are identified, optimized, and monitored. Conventional drug discovery pipelines, which span 10–15 years and average $2.6 billion in development costs, are increasingly augmented by AI-driven platforms capable of screening billions of molecular entities in silico, predicting multi-target pharmacological profiles, and mining longitudinal patient data for post-marketing safety signals. This convergence of computational intelligence with biomedical science has given rise to a new era of precision pharmacology, wherein therapeutic hypotheses are generated, tested, and refined at unprecedented speed and mechanistic depth.

This systematic review critically appraises the current state of AI/ML applications across the full drug discovery and development continuum, from target identification through clinical trial optimization, and evaluates the integration of real-world data (RWD) and real-world evidence (RWE) in post-market pharmacovigilance systems. A secondary objective is to survey the evolving regulatory landscape governing AI-assisted pharmaceutical submissions across major international jurisdictions.

A comprehensive literature search was conducted across PubMed, Embase, Web of Science, IEEE Xplore, and Scopus (January 2015–March 2025) using MeSH terms and Boolean operators encompassing AI, ML, deep learning, generative models, drug discovery, pharmacovigilance, real-world evidence, explainability, and regulatory compliance. After removal of duplicates and application of inclusion/exclusion criteria per PRISMA 2020 guidelines, 214 studies, 18 regulatory guidance documents, and 11 systematic reviews were included.

Graph neural networks (GNNs) and transformer-based architectures achieved state-of-the-art performance in molecular property prediction and drug–target interaction modeling, with area under the ROC curve (AUC-ROC) values exceeding 0.92 across multiple benchmark datasets. Generative AI platforms, including variational autoencoders (VAEs) and diffusion models, successfully produced novel scaffolds with target-specific binding and favorable ADMET profiles. In pharmacovigilance, NLP-based systems achieved F1 scores of 0.81–0.93 for adverse drug event (ADE) extraction from electronic health records (EHRs) and social media, outperforming traditional disproportionality analyses. Federated learning frameworks enabled multi-institutional RWD harmonization without compromising patient privacy. Regulatory acceptance of AI-derived evidence is accelerating, with the US FDA, EMA, ICH, and PMDA issuing substantive guidance; however, persistent gaps in explainability, algorithmic bias auditing, and cross-jurisdictional harmonization remain.

AI/ML technologies are demonstrably transforming drug discovery and pharmacovigilance, offering scalable solutions to longstanding bottlenecks in pharmaceutical R&D. Realizing the full translational potential of these technologies requires coordinated advances in model interpretability, data governance, federated infrastructure, and adaptive regulatory frameworks. This review provides a structured synthesis for researchers, clinicians, and regulatory scientists navigating the rapidly evolving AI–pharma interface.
The pharmaceutical industry faces a profound productivity crisis. Despite exponential increases in research expenditure, the number of new molecular entities (NMEs) approved annually has remained stagnant for decades, a phenomenon termed 'Eroom's Law,' the ironic inverse of Moore's Law [1]. The average cost to bring a single drug to market now exceeds $2.6 billion, with a development timeline of 10–15 years and late-stage attrition rates exceeding 90% [2]. Fundamental bottlenecks persist at every stage of the pipeline: the identification of biologically validated targets, the generation of lead compounds with optimal pharmacological profiles, the prediction of clinical outcomes from preclinical data, and the post-market surveillance of safety signals across heterogeneous patient populations [3].
Artificial intelligence and machine learning have emerged as transformative technologies capable of addressing these bottlenecks at unprecedented scale and speed [4]. The exponential growth in biomedical data from genomics, proteomics, structural biology, electronic health records (EHRs), and wearable sensors provides the substrate upon which sophisticated ML models can learn complex biological patterns that elude conventional statistical approaches [5]. Simultaneously, advances in computational hardware (graphics processing units, tensor processing units), open-source deep learning frameworks (TensorFlow, PyTorch), and cloud-based scientific computing infrastructure have dramatically reduced the barrier to deploying large-scale AI models in pharmaceutical research [6].
The intersection of AI and drug discovery now spans the entire translational continuum. In the early discovery phase, deep learning models trained on structural and biochemical databases can screen virtual compound libraries of billions of molecules in hours, predict binding affinities with high accuracy, and generate de novo molecular structures with user-defined pharmacological properties [7,8]. In clinical development, ML algorithms analyze multi-omic patient profiles to identify predictive biomarkers, stratify trial populations, and flag safety signals in real time [9]. In the post-marketing phase, natural language processing (NLP) systems mine spontaneous adverse event reports, EHRs, and social media to detect pharmacovigilance signals months or years before they would be identified through traditional passive surveillance [10,11].
This systematic review provides a comprehensive and critical synthesis of the literature on AI/ML in drug discovery and pharmacovigilance, with particular attention to: (i) the mechanistic architecture and performance characteristics of leading ML models; (ii) the integration of real-world data (RWD) sources into AI-driven pharmacovigilance systems; (iii) explainable AI (XAI) methodologies and their relevance to scientific trust and regulatory decision-making; and (iv) the evolving international regulatory landscape for AI-based pharmaceutical evidence. By mapping the current state of the field, identifying gaps, and projecting future directions, this review aims to serve as a canonical reference for computational pharmacologists, regulatory scientists, and pharmaceutical innovators.
METHODS
Search Strategy and Databases
A systematic literature search was conducted in accordance with the PRISMA 2020 (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [12]. Five electronic databases were searched: PubMed/MEDLINE, Embase, Web of Science (Core Collection), IEEE Xplore, and Scopus. The search period spanned January 1, 2015, to March 31, 2025, encompassing the primary epoch of deep learning application to biomedicine. Supplementary hand-searching of the reference lists of included studies and of gray literature sources (WHO pharmacovigilance reports, FDA guidance documents, EMA reflection papers) was performed to minimize publication bias.
The search strategy combined MeSH terms and free-text keywords using Boolean operators. Primary terms included: 'artificial intelligence,' 'machine learning,' 'deep learning,' 'neural network,' 'drug discovery,' 'pharmacovigilance,' 'adverse drug event,' 'real-world evidence,' 'natural language processing,' 'explainable AI,' and 'regulatory submission.' Secondary terms included specific model architectures (GNN, transformer, VAE, GAN, LSTM, CNN) and application domains (ADMET prediction, target identification, de novo design, signal detection, NLP-based ADE extraction).
Inclusion and Exclusion Criteria
Studies were included if they: (1) reported original research or systematic review data on AI/ML applications in drug discovery or pharmacovigilance; (2) were peer-reviewed and published in English; (3) provided quantitative performance metrics (e.g., AUC, F1, sensitivity, specificity, RMSE) or qualitative mechanistic insights; and (4) addressed human pharmaceutical agents (small molecules or biologics). Studies were excluded if they focused exclusively on veterinary pharmaceuticals without human relevance; used AI only as a statistical preprocessing tool without pharmacological interpretation; lacked methodological transparency; or were retracted, duplicated, or conference abstracts without full-text availability.
Regulatory guidance documents and white papers from major agencies (FDA, EMA, ICH, PMDA, CDSCO) were included if they directly addressed AI/ML use in pharmaceutical development or pharmacovigilance. Following deduplication and full-text screening, 214 primary studies, 18 regulatory guidance documents, and 11 systematic reviews met inclusion criteria and formed the evidence base for this review.
MACHINE LEARNING ARCHITECTURES IN DRUG DISCOVERY
Foundations of Modern Pharmaceutical AI
The application of ML to pharmaceutical problems is not novel; quantitative structure–activity relationship (QSAR) modeling has been practiced since the 1960s [13]. However, the deep learning revolution, catalyzed by the landmark achievements of convolutional neural networks in image recognition (AlexNet, 2012) and subsequent advances in recurrent, attention-based, and graph-based architectures, has fundamentally transformed the predictive capability and applicability of computational models in drug research [14]. Modern deep learning architectures excel in three pharmaceutical domains: (i) property prediction, learning the relationship between molecular structure and biological/physicochemical properties; (ii) generative design, proposing novel molecular entities with specified property profiles; and (iii) knowledge extraction, mining large-scale heterogeneous datasets for biological insights.
Graph Neural Networks for Molecular Modeling
Graph neural networks represent perhaps the most architecturally appropriate deep learning framework for molecular science, given that chemical compounds are naturally represented as graphs, with atoms as nodes and bonds as edges [15]. GNNs operate through message-passing neural network (MPNN) algorithms that iteratively aggregate neighborhood information across graph nodes, learning molecular representations that encode both local and global structural context [16].
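To make the message-passing idea concrete, the following minimal Python sketch runs two rounds of sum-aggregation message passing on the heavy-atom graph of ethanol and then applies a sum readout. The one-hot atom features and the scalar weights `w_self` and `w_neigh` are illustrative assumptions; an actual MPNN learns vector-valued message and update functions from data and typically also uses bond features.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency list: atom index -> indices of bonded atoms


def message_passing_step(features: List[List[float]], graph: Graph,
                         w_self: float, w_neigh: float) -> List[List[float]]:
    """One round of h_v' = ReLU(w_self * h_v + w_neigh * sum of neighbour states h_u)."""
    updated = []
    for v, h_v in enumerate(features):
        # Message aggregation: element-wise sum over bonded neighbours
        agg = [0.0] * len(h_v)
        for u in graph[v]:
            agg = [a + x for a, x in zip(agg, features[u])]
        # Update: combine the atom's own state with the aggregated messages, then ReLU
        updated.append([max(0.0, w_self * s + w_neigh * a) for s, a in zip(h_v, agg)])
    return updated


def readout(features: List[List[float]]) -> List[float]:
    """Permutation-invariant molecule embedding: element-wise sum over atoms."""
    return [sum(col) for col in zip(*features)]


# Ethanol heavy atoms C0-C1-O2 with hypothetical one-hot features [is_carbon, is_oxygen]
ethanol = {0: [1], 1: [0, 2], 2: [1]}
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
for _ in range(2):  # two rounds: each atom receives information from atoms up to two bonds away
    h = message_passing_step(h, ethanol, w_self=1.0, w_neigh=0.5)
mol_embedding = readout(h)
print(h[0], mol_embedding)  # [2.25, 0.25] [6.0, 2.5]
```

After two rounds the terminal carbon (atom 0) carries a non-zero oxygen channel even though it is not bonded to oxygen, which shows how stacking rounds widens each atom's receptive field; the sum readout makes the molecular embedding independent of atom ordering.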
Gilmer et al. (2017) formalized the MPNN framework and demonstrated superior performance over fingerprint-based methods on the QM9 quantum chemistry benchmark [17]. The subsequent literature has expanded GNN application to drug–target interaction prediction, protein–ligand binding affinity estimation, and multi-target polypharmacology profiling. A landmark study by Stokes et al. (2020) deployed a GNN trained on Escherichia coli growth inhibition data to screen the Drug Repurposing Hub (a library of approximately 6,000 compounds), identifying halicin, a molecule with a novel structural scaffold and potent broad-spectrum antibiotic activity, including against Mycobacterium tuberculosis and carbapenem-resistant Acinetobacter baumannii [18].
Subsequent GNN variants, including graph attention networks (GATs), GraphSAGE, and direction-sensitive molecular GNNs, have further improved the handling of stereochemical information, long-range dependencies, and scaffold generalization. DimeNet++ and SphereNet incorporate 3D geometric information through distance and angular embeddings, approaching ab initio quantum chemical accuracy at a fraction of the computational cost [19]. The integration of heterogeneous knowledge graphs (linking genes, proteins, diseases, and compounds) through GNNs and related knowledge graph embedding and language models (e.g., RotatE, KG-BERT) enables multi-hop reasoning across biological networks for target deconvolution and mechanism elucidation [20].
Transformer Models and Chemical Language Processing
The transformer architecture, introduced by Vaswani et al. (2017) in the seminal 'Attention Is All You Need' paper, has catalyzed revolutionary progress in natural language processing and, by extension, in chemical and biological sequence modeling [21]. The self-attention mechanism enables transformers to capture non-local dependencies within sequences, a property of particular relevance to SMILES (Simplified Molecular Input Line Entry System) notation and protein amino acid sequences, where distant positions frequently determine critical interactions.
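The sketch below illustrates single-head scaled dot-product self-attention, softmax(QKᵀ/√d)V, on a character-level tokenization of the SMILES string for ethanol. For brevity it assumes Q = K = V = X (no learned projection matrices and no positional encodings) and uses a hypothetical two-dimensional embedding table `EMBED`; chemical language models learn these parameters and use chemistry-aware tokenizers that treat multi-character atoms such as `Cl` as single tokens.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X: Matrix) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention with Q = K = V = X.

    Returns (attention weights, contextualised token vectors).
    """
    d = len(X[0])
    weights = []
    for q in X:
        # Every query scores every position, regardless of its distance in the sequence
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all token vectors
    outputs = [[sum(w * v[j] for w, v in zip(row, X)) for j in range(d)] for row in weights]
    return weights, outputs


# Hypothetical 2-d embeddings for a character-level tokenization of ethanol, "CCO"
EMBED = {"C": [1.0, 0.0], "O": [0.0, 1.0]}
tokens = list("CCO")
weights, context = self_attention([EMBED[t] for t in tokens])
```

Each row of `weights` is a probability distribution over all positions, so the first `C` draws information from the `O` at the end of the string directly rather than through intermediate positions.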
ChemBERTa, developed by Chithrananda et al. (2020), applied the BERT (Bidirectional Encoder Representations from Transformers) pre-training paradigm to SMILES strings, demonstrating that a model pre-trained on millions of unlabeled PubChem molecules (the authors curated a 77-million-molecule PubChem corpus for this purpose) could be fine-tuned on small labeled datasets to achieve competitive performance on molecular property prediction benchmarks [22]. MolBERT, SELFormer, and Uni-Mol have further refined chemical transformer architectures by incorporating 2D/3D structural priors and atom-level masking strategies. In protein science, ESM-2 and AlphaFold2 are the most prominent transformer applications: AlphaFold2's attention over multiple sequence alignments achieves near-experimental accuracy in protein structure prediction, fundamentally reshaping structure-based drug design [23,24].
Large language models (LLMs) have more recently been adapted for pharmaceutical research through retrieval-augmented generation (RAG) and tool-augmented paradigms. GPT-4-based systems have demonstrated capacity for proposing multi-step retrosynthetic pathways, interpreting clinical pharmacology literature, and engaging in structured drug–drug interaction reasoning [25]. However, hallucination, factual inconsistency, and poorly calibrated uncertainty remain active research and regulatory challenges.
Generative Models for De Novo Drug Design
Generative AI has arguably been the most disruptive application of deep learning in early-phase drug discovery. Unlike discriminative models that classify or predict properties of existing molecules, generative models explore the vast chemical space, estimated at 10^60 drug-like molecules, to propose novel structures with targeted pharmacological profiles [26]. Three primary generative architectures have gained prominence: variational autoencoders, generative adversarial networks, and diffusion models.
Variational autoencoders encode molecules into a continuous latent space in which interpolation and gradient-based optimization can be performed. The junction tree VAE (JT-VAE) of Jin et al. (2018) represented a significant advance by decomposing molecules into a tree of chemical substructures (rings and bonds) prior to encoding, ensuring chemical validity throughout decoding [27]. In a related approach, Gómez-Bombarelli et al. (2018) integrated Bayesian optimization over the VAE latent space to navigate toward molecules with optimized QED (quantitative estimate of drug-likeness) scores and predicted biological activities [28].
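Two pieces of VAE machinery underlie latent-space optimization: the reparameterization trick, which samples a latent code as z = μ + σ·ε so that gradients can flow through μ and σ, and the KL-divergence term that keeps the latent distribution close to a standard normal prior. The Python sketch below shows both, plus straight-line interpolation between two latent codes. The encoder and decoder networks are omitted, and the μ and log-variance values are hypothetical encoder outputs.

```python
import math
import random
from typing import List

Vector = List[float]


def reparameterize(mu: Vector, log_var: Vector, rng: random.Random) -> Vector:
    """Sample z = mu + sigma * eps with eps ~ N(0, I), where sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Vector, log_var: Vector) -> float:
    """KL(N(mu, diag(sigma^2)) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def interpolate(z_a: Vector, z_b: Vector, steps: int) -> List[Vector]:
    """Evenly spaced points on the straight line from z_a to z_b (inclusive)."""
    return [[a + (b - a) * t / (steps - 1) for a, b in zip(z_a, z_b)] for t in range(steps)]


rng = random.Random(0)
mu, log_var = [0.2, -0.1], [-1.0, -0.5]  # hypothetical encoder outputs for one molecule
z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
path = interpolate([0.0, 0.0], [1.0, 1.0], steps=3)  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

In a full model each interpolated point would be passed through the decoder to obtain a candidate molecule; the KL term helps keep the latent space smooth enough that such intermediate points tend to decode to plausible structures.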
Generative adversarial networks produce molecules through an adversarial training process between a generator and a discriminator. MolGAN by De Cao and Kipf (2018) operated directly on molecular graphs, using reinforcement learning reward signals to optimize for properties including drug-likeness, solubility, and synthetic accessibility [29]. Diffusion models, which have achieved state-of-the-art performance in image generation, have been adapted to molecular design through Geometric Diffusion Models (GDM) and DiffSBDD, which condition molecular generation on 3D protein pocket geometries to enable true structure-based generative design [30,31]. The DiffDock platform by Corso et al. (2022) applied diffusion over the space of ligand poses, outperforming traditional docking algorithms on several pose prediction benchmarks [32].
Reinforcement Learning for Multi-Objective Optimization
Reinforcement learning (RL) addresses a core challenge in drug design: simultaneously optimizing molecules across multiple, potentially competing objectives (binding potency, selectivity, metabolic stability, aqueous solubility, blood–brain barrier permeability, and synthetic accessibility). RL formulates molecule generation as a sequential decision process in which an agent learns a policy that maximizes a scalar reward signal encoding the multi-property objective [33].
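One common way to collapse several objectives into the single scalar reward described above is to map each property to a [0, 1] desirability score and combine the scores with a weighted geometric mean. The sketch below assumes exactly that; the sigmoid desirability function, the property names, and the weights are illustrative choices, not the scoring function of any specific published platform.

```python
import math
from typing import Dict


def desirability_sigmoid(x: float, midpoint: float, steepness: float) -> float:
    """Map a raw property value to [0, 1]; values above `midpoint` score higher when steepness > 0."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def scalar_reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted geometric mean of per-objective scores in [0, 1]."""
    total_weight = sum(weights.values())
    log_sum = 0.0
    for name, w in weights.items():
        s = scores[name]
        if s <= 0.0:
            return 0.0  # a fully failed objective zeroes the whole reward
        log_sum += w * math.log(s)
    return math.exp(log_sum / total_weight)


# Hypothetical objective scores for one generated molecule
scores = {
    "potency": desirability_sigmoid(7.5, midpoint=6.5, steepness=2.0),  # e.g. a predicted pIC50
    "solubility": 0.6,
    "synthetic_accessibility": 0.9,
}
weights = {"potency": 2.0, "solubility": 1.0, "synthetic_accessibility": 1.0}
reward = scalar_reward(scores, weights)
```

Unlike a weighted arithmetic mean, the geometric mean returns zero whenever any single objective scores zero, which discourages the agent from sacrificing, for example, synthetic accessibility entirely in exchange for potency.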
REINVENT (Olivecrona et al., 2017) established RL-guided RNN generation as a paradigm for lead optimization, training an agent initialized from a prior chemical language model against a reward function based on predicted biological activity [34]. REINVENT 2.0 extended this to multi-component scoring functions incorporating ADMET constraints [35]. DrugEx deployed transformer-based RL agents for scaffold-constrained generation, enabling lead optimization that preserves a defined pharmacophore while diversifying peripheral substituents [36]. Beyond molecule generation, RL has been applied to clinical dose optimization, with pharmacokinetic/pharmacodynamic (PK/PD) RL agents demonstrating superior individualized dosing for vancomycin and warfarin versus guideline-based dosing in retrospective analyses [37].
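In the original REINVENT formulation, the agent is trained to match an augmented likelihood, log P_prior(A) + σ·S(A), where S(A) is the score of a sampled SMILES string A, so that high-scoring molecules become more likely while the agent stays anchored to the prior chemical language model. The sketch below computes that squared-difference loss from precomputed log-likelihoods; in practice these come from the RNN prior and agent, the loss is backpropagated through the agent only, and the value of σ used here is an arbitrary assumption.

```python
from typing import Sequence


def augmented_likelihood_loss(prior_logp: Sequence[float], agent_logp: Sequence[float],
                              scores: Sequence[float], sigma: float) -> float:
    """Mean of [log P_prior(A) + sigma * S(A) - log P_agent(A)]^2 over a batch of sampled SMILES."""
    losses = []
    for lp_prior, lp_agent, s in zip(prior_logp, agent_logp, scores):
        augmented = lp_prior + sigma * s  # target log-likelihood, raised for high-scoring molecules
        losses.append((augmented - lp_agent) ** 2)
    return sum(losses) / len(losses)


# Hypothetical batch: sequence log-likelihoods under the prior and the agent, and scores in [0, 1]
loss = augmented_likelihood_loss(prior_logp=[-20.0, -30.0], agent_logp=[-20.0, -30.0],
                                 scores=[1.0, 0.0], sigma=2.0)
print(loss)  # 2.0
```

The unscored second molecule contributes nothing because the agent already matches the prior there, whereas the high-scoring first molecule pushes the agent to raise its likelihood by σ·S(A).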
Comparative Performance Benchmarking
Systematic benchmarking of ML architectures across standardized pharmaceutical datasets enables rigorous assessment of relative performance. The MoleculeNet benchmark suite (Wu et al., 2018) provides a canonical framework encompassing quantum mechanics, physical chemistry, biophysics, and physiology datasets [38]. Across the BBBP (blood–brain barrier permeability), BACE (beta-secretase 1 inhibition), HIV (replication inhibition), and ClinTox (clinical trial toxicity) datasets, GNNs and transformer-based models consistently outperform molecular fingerprint-based approaches, with graph attention networks achieving AUC improvements of 3–12% across most tasks [39]. For 3D property prediction tasks, GNNs incorporating conformer-level information (e.g., SchNet, DimeNet++) outperform 2D-only models, highlighting the importance of 3D structural priors in property prediction. Table 1 summarizes the primary ML architectures surveyed in this review.
Table 1. Machine Learning Architectures in Drug Discovery: Applications, Advantages, and Representative Studies
| ML Architecture | Application Domain | Key Advantages | Representative Studies / Platforms |
|---|---|---|---|
| Deep Neural Networks (DNN) | ADMET property prediction, toxicity screening | High-dimensional feature extraction; end-to-end learning | DeepTox (Mayr et al., 2016); ToxCast (US EPA) platform |
| Graph Neural Networks (GNN) | Molecular property prediction, drug–target interaction | Captures molecular topology; atom-level representations | Gilmer et al., 2017 (MPNN); Stokes et al., 2020 (halicin) |
| Convolutional Neural Networks (CNN) | Protein structure analysis, genomic biomarker discovery | Translation-invariant; efficient image/sequence processing | DeepVariant (Poplin et al., 2018); AtomNet (Wallach et al., 2015) |
| Recurrent Neural Networks / LSTM | De novo molecule generation, clinical NLP | Sequential data modeling; captures long-range dependencies | Segler et al., 2018; REINVENT (Olivecrona et al., 2017) |
| Transformer / BERT | Chemical language modeling, EHR mining, ADE extraction | Attention mechanism; state-of-the-art NLP; transfer learning | ChemBERTa (Chithrananda et al., 2020); BioBERT (Lee et al., 2020) |
| Generative Adversarial Networks (GAN) | Novel scaffold generation, data augmentation | Learns implicit molecular distribution; adversarial training | MolGAN (De Cao & Kipf, 2018); LatentGAN (Prykhodko et al., 2019) |
| Variational Autoencoders (VAE) | Latent-space molecule optimization, scaffold hopping | Smooth latent space; enables Bayesian optimization | JT-VAE (Jin et al., 2018); CVAE (Gómez-Bombarelli et al., 2018) |
| Random Forest / XGBoost | QSAR modeling, biomarker selection, PK prediction | Robust; handles mixed features; interpretable via SHAP | Svetnik et al., 2003; Ramsundar et al., 2015 (DeepChem) |
| Reinforcement Learning (RL) | Multi-objective molecular optimization, dosing regimens | Goal-directed optimization; handles multi-step decisions | REINVENT 2.0 (Blaschke et al., 2020); RL4MM (Tang et al., 2022) |
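The AUC values compared in the benchmarking discussion above have a simple probabilistic reading: the probability that a randomly chosen active compound is scored higher than a randomly chosen inactive one. The sketch below computes AUC directly from that definition (ties count as one half); it is quadratic in dataset size, so sorting-based library implementations are preferable for full benchmarks. The labels and scores in the example are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative), counting ties as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical activity labels (1 = active) and model scores for four compounds
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Here three of the four active/inactive pairs are ordered correctly; the active scored 0.4 ranks below the inactive scored 0.6, which costs a quarter of the maximum AUC.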
AI IN TARGET IDENTIFICATION AND VALIDATION
Multi-Omics Integration for Target Discovery
Target identification (determining the biological macromolecule whose modulation will produce a desired therapeutic effect with acceptable safety) has historically been among the highest-risk stages of drug development [40]. Multi-omics data integration using ML offers a systematic approach to target prioritization by synthesizing evidence across genomics, transcriptomics, proteomics, epigenomics, and metabolomics layers to construct a systems-level causal model of disease biology [41].
Network medicine approaches leverage protein–protein interaction (PPI) networks and knowledge graphs to identify disease-associated modules (dense subnetworks of co-regulated genes) that represent actionable intervention points [42]. Mendelian randomization (MR), when implemented with ML-based confounder adjustment and pleiotropy correction, provides causal genetic evidence for target–disease associations that is less susceptible to confounding than observational epidemiological data [43]. The Open Targets Platform (Ochoa et al., 2021) integrates human genetic, somatic mutation, expression, and clinical evidence into a unified scoring framework; ML-based meta-analysis of this evidence has been shown to predict clinical success rates with significantly higher accuracy than any single evidence type alone [44].
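As an illustration of how Mendelian randomization turns genetic associations into a causal estimate, the sketch below implements the per-variant Wald ratio and the standard fixed-effect inverse-variance-weighted (IVW) estimator, β_IVW = Σ β_X β_Y σ_Y⁻² / Σ β_X² σ_Y⁻². The summary statistics are hypothetical, and the ML-based confounder adjustment and pleiotropy correction discussed above are not shown; plain IVW assumes that the variants affect the outcome only through the exposure.

```python
from typing import Sequence


def wald_ratio(beta_exposure: float, beta_outcome: float) -> float:
    """Causal estimate from one variant: variant-outcome association / variant-exposure association."""
    return beta_outcome / beta_exposure


def ivw_estimate(bx: Sequence[float], by: Sequence[float], se_y: Sequence[float]) -> float:
    """Fixed-effect IVW estimate: sum(bx*by/se^2) / sum(bx^2/se^2) over independent variants."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_y))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_y))
    return num / den


# Hypothetical summary statistics for three independent variants
bx = [0.10, 0.20, 0.30]    # effect of each variant on the exposure (e.g. target protein level)
by = [0.05, 0.12, 0.14]    # effect of each variant on disease risk
se_y = [0.01, 0.02, 0.02]  # standard errors of the variant-outcome associations
per_variant = [wald_ratio(x, y) for x, y in zip(bx, by)]
beta_ivw = ivw_estimate(bx, by, se_y)
```

The IVW estimate is a weighted average of the per-variant Wald ratios, with weights β_X²/σ_Y², so it always lies between the smallest and largest single-variant estimates.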
AlphaFold2 and Structure-Based Drug Design
The release of AlphaFold2 (Jumper et al., 2021) by DeepMind, which predicted protein structures with a median backbone RMSD below 1.0 Å for CASP14 targets, represents one of the most consequential scientific advances of the past decade for drug discovery [23]. By making structural predictions, many of them high-confidence, available for essentially the entire human proteome (approximately 20,000 proteins) through the AlphaFold Protein Structure Database (AlphaFold DB), AlphaFold2 has substantially eased a principal bottleneck to structure-based drug design: the laborious, expensive, and frequently unsuccessful process of experimental structure determination by X-ray crystallography or cryo-electron microscopy [45].
Downstream applications include AI-guided virtual screening against AlphaFold structures, allosteric binding site prediction, and cryptic pocket identification using molecular dynamics coupled with ML classifiers [46]. ESMFold (Lin et al., 2022) built upon the ESM protein language model to achieve comparable structural prediction accuracy at substantially lower computational cost, enabling rapid proteome-scale screening [47]. RoseTTAFold All-Atom (2024) extended the paradigm to predict protein structures in complex with small molecules, nucleic acids, and metal cofactors, a capability with profound implications for covalent drug design and PPI modulator discovery [48].
Drug Repurposing Through Knowledge Graph Embedding
Drug repurposing, the identification of new therapeutic indications for approved drugs, offers an accelerated development pathway by leveraging established safety profiles and pharmacokinetic data [49]. Knowledge graph embedding methods, which learn low-dimensional representations of entities (drugs, diseases, proteins, pathways) and relations (binds, causes, treats), enable link prediction across biomedical knowledge graphs to surface repurposing hypotheses with supporting mechanistic rationale [50].
During the COVID-19 pandemic, several AI-based repurposing analyses identified baricitinib and dexamethasone as candidate COVID-19 therapeutics through network proximity analyses and ML-predicted target engagement, with both subsequently validated in clinical trials [51]. BioKG, Hetionet, and DRKG (Drug Repurposing Knowledge Graph) provide standardized, curated knowledge graph resources that have been used as training substrates for embedding models including TransE, RotatE, ComplEx, and ERNIE [52].
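The simplest of the embedding models listed above, TransE, treats a relation as a translation in embedding space: a true triple (head, relation, tail) should satisfy h + r ≈ t. The sketch below scores candidate diseases for a drug under a `treats` relation using negative Euclidean distance and ranks them, which is the link-prediction step that surfaces repurposing hypotheses. The two-dimensional embeddings and entity names are hand-set, hypothetical values; real embeddings are learned from a knowledge graph such as DRKG with a margin-based loss against corrupted triples, which is not shown.

```python
import math
from typing import Dict, List

Vector = List[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """Plausibility of (h, r, t): negative Euclidean distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(h: Vector, r: Vector, candidates: Dict[str, Vector]) -> List[str]:
    """Rank candidate tail entities from most to least plausible."""
    return sorted(candidates, key=lambda name: transe_score(h, r, candidates[name]), reverse=True)


# Hand-set, hypothetical 2-d embeddings
drug_x = [0.1, 0.2]
treats = [0.5, 0.0]
diseases = {"disease_A": [0.6, 0.2], "disease_B": [0.0, 0.9], "disease_C": [0.5, 0.5]}
print(rank_tails(drug_x, treats, diseases))  # ['disease_A', 'disease_C', 'disease_B']
```

In a repurposing workflow, highly ranked drug–disease pairs that are absent from the training graph become hypotheses for follow-up, ideally alongside the multi-hop paths that explain them.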
ADMET PROPERTY PREDICTION AND TOXICOLOGY SCREENING
Predictive ADMET Modeling
Unfavorable absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles account for approximately 40% of drug candidate failures in clinical development [53]. AI-based ADMET prediction platforms enable rapid in silico screening of compound libraries, allowing medicinal chemists to prioritize synthetic efforts toward candidates with acceptable predicted profiles before committing to expensive experimental assays [54].
DeepTox (Mayr et al., 2016), a deep neural network ensemble that won the Tox21 Data Challenge, demonstrated that deep learning substantially outperforms traditional random forest and SVM classifiers in predicting nuclear receptor and stress-response pathway activity across 12 toxicity assays [55]. PKSmart and DeepPK use LSTM and GNN architectures trained on clinical PK data to predict clearance, volume of distribution, half-life, and oral bioavailability directly from molecular structure, achieving Pearson correlation coefficients of 0.75–0.88 on held-out test sets [56]. CYP450 interaction prediction, which is critical for anticipating drug–drug interactions, has been addressed by several transformer models fine-tuned on human microsomal incubation data, with state-of-the-art models achieving balanced accuracy exceeding 85% [57].
In Vitro to In Vivo Translation
The translation of in vitro ADMET data to in vivo predictions remains a significant challenge due to species differences, protein binding variability, and the complexity of hepatic and renal extraction processes. Mechanistic PK/PD models integrated with ML components (hybrid PBPK–ML models) offer a promising solution by combining the biological interpretability of physiologically based pharmacokinetic (PBPK) modeling with the pattern recognition capabilities of neural networks [58]. Deep learning models trained on in vitro–in vivo correlations from large proprietary compound libraries have demonstrated improved prediction of human PK parameters compared to allometric scaling alone [59].
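For context on the allometric baseline mentioned above, simple allometry fits CL = a·Wᵇ across preclinical species by least squares on log-transformed values and extrapolates to human body weight. The sketch below assumes that single-exponent model; the species weights and clearances in the example are synthetic values generated from a = 2 and b = 0.75, and refinements such as corrections for maximum life-span potential or brain weight are omitted.

```python
import math
from typing import Sequence, Tuple


def fit_allometry(weights_kg: Sequence[float], clearances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(CL) = log(a) + b * log(W); returns (a, b)."""
    xs = [math.log(w) for w in weights_kg]
    ys = [math.log(c) for c in clearances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_clearance(a: float, b: float, weight_kg: float) -> float:
    """Extrapolate clearance to a new body weight with CL = a * W**b."""
    return a * weight_kg ** b


# Synthetic preclinical data generated from CL = 2 * W**0.75 (mL/min): mouse, rat, dog
species_weights = [0.02, 0.25, 10.0]
species_cl = [2.0 * w ** 0.75 for w in species_weights]
a, b = fit_allometry(species_weights, species_cl)
human_cl = predict_clearance(a, b, 70.0)  # extrapolated clearance for a 70-kg adult
```

Because this baseline uses body weight alone, it cannot capture compound-specific differences in metabolism or transport across species, which is the gap the hybrid and deep learning approaches above aim to close.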
AI APPLICATIONS IN CLINICAL TRIAL DESIGN AND EXECUTION
Biomarker Discovery and Patient Stratification
The failure of many drugs in Phase II and III trials stems from inadequate patient stratification: enrolling broad, heterogeneous populations when efficacy or safety is confined to biologically defined subgroups [60]. ML-based biomarker discovery integrates multi-omic and clinical data to identify predictive and prognostic biomarkers that enable precision enrollment. Regularized machine learning methods (LASSO, elastic net), random survival forests, and gradient-boosted trees have been applied to identify genomic and proteomic signatures predictive of response in oncology, autoimmune disease, and CNS trials [61].
Deep learning survival models such as DeepSurv (Katzman et al., 2018) and Cox-nnet use neural networks to model time-to-event endpoints while accommodating high-dimensional covariates, outperforming the conventional Cox proportional hazards model on several cancer genomics benchmarks [62]. Multimodal integration (combining imaging, EHR, genomics, and digital biomarker data) using attention-based fusion architectures has demonstrated superior outcome prediction compared to single-modality models in early Alzheimer's disease and treatment-resistant depression trials [63].
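DeepSurv keeps the Cox partial likelihood but replaces the linear predictor βᵀx with a neural network output. The sketch below computes that loss, the negative log partial likelihood averaged over observed events, from precomputed risk scores; the network itself is omitted, the cohort is hypothetical, and tied event times are handled with the Breslow approximation (each tied event uses the full risk set).

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(times: Sequence[float], events: Sequence[int],
                                   risk_scores: Sequence[float]) -> float:
    """Negative Cox log partial likelihood averaged over observed events.

    risk_scores[i] is the model's log-hazard output for subject i (a network output in DeepSurv).
    """
    total, n_events = 0.0, 0
    for t_i, e_i, eta_i in zip(times, events, risk_scores):
        if not e_i:
            continue  # censored subjects contribute only through other subjects' risk sets
        # Risk set: everyone still under observation at time t_i
        risk_set = [eta_j for t_j, eta_j in zip(times, risk_scores) if t_j >= t_i]
        total += eta_i - math.log(sum(math.exp(eta) for eta in risk_set))
        n_events += 1
    return -total / n_events


# Hypothetical cohort: follow-up times (months) and event indicators (1 = event, 0 = censored)
times = [2.0, 5.0, 7.0, 11.0]
events = [1, 1, 0, 1]
well_ordered = cox_neg_log_partial_likelihood(times, events, [2.0, 1.0, 0.5, 0.0])
reversed_order = cox_neg_log_partial_likelihood(times, events, [0.0, 0.5, 1.0, 2.0])
```

Assigning higher risk to patients who experience the event earlier yields a lower loss (`well_ordered < reversed_order`), which is the signal gradient descent exploits when training the network.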
Adaptive Trial Design and AI-Driven Dose Finding
Adaptive clinical trial designs, which allow pre-specified modifications to trial parameters (sample size, dose levels, endpoint definitions) based on interim data, benefit substantially from ML-based decision algorithms that provide optimal adaptations while maintaining statistical validity [64]. Bayesian adaptive designs integrated with Gaussian process regression models enable continuous dose–response surface estimation and model-guided dose escalation in Phase I/II oncology trials, reducing the number of patients exposed to subtherapeutic or toxic doses compared to the rule-based 3+3 design [65].
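To show how Gaussian process regression can guide dose escalation, the sketch below fits a zero-mean GP with a squared-exponential kernel to observed toxicity rates and recommends the highest candidate dose whose posterior upper bound (mean plus z standard deviations) stays below a toxicity limit. The kernel hyperparameters, the noise level, the 0.3 limit, and the escalation rule itself are illustrative assumptions rather than a validated trial design; the linear algebra is hand-written only to keep the example dependency-free.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x1: float, x2: float, length_scale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two dose levels."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(doses: Sequence[float], tox: Sequence[float], query: float,
                 noise: float = 1e-2) -> Tuple[float, float]:
    """Posterior mean and variance of a zero-mean GP toxicity curve at `query`."""
    K = [[rbf(d_i, d_j) + (noise if i == j else 0.0) for j, d_j in enumerate(doses)]
         for i, d_i in enumerate(doses)]
    k_star = [rbf(query, d) for d in doses]
    mean = sum(k * a for k, a in zip(k_star, solve(K, list(tox))))
    var = rbf(query, query) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, max(var, 0.0)


def next_dose(doses: Sequence[float], tox: Sequence[float], candidates: Sequence[float],
              max_tox: float = 0.3, z: float = 1.0) -> float:
    """Hypothetical rule: highest candidate whose upper toxicity bound stays below max_tox."""
    safe = []
    for c in candidates:
        mean, var = gp_posterior(doses, tox, c)
        if mean + z * math.sqrt(var) <= max_tox:
            safe.append(c)
    return max(safe) if safe else min(candidates)


# Hypothetical dose-limiting toxicity rates observed at three dose levels (arbitrary units)
observed_doses = [1.0, 2.0, 3.0]
observed_tox = [0.05, 0.10, 0.50]
recommended = next_dose(observed_doses, observed_tox, candidates=[1.0, 2.0, 3.0, 4.0])
```

Because the posterior reverts to the prior (mean 0, variance 1) away from observed doses, the untested dose of 4.0 receives a wide upper bound and is not recommended, a miniature version of the caution that model-guided escalation aims for.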
Synthetic control arms, constructed from real-world patient data using causal ML methods (propensity score matching, inverse probability weighting, targeted learning), are increasingly accepted by regulatory agencies as historical control comparators in rare disease and oncology settings, reducing placebo-arm size without sacrificing statistical rigor [66]. The FDA has issued specific guidance on the use of synthetic control arms under the RWE Framework, conditional on demonstrating exchangeability between real-world and trial populations through standardized covariate adjustment [67].
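A minimal sketch of the weighting idea behind synthetic control arms follows. Each external control patient receives the weight e/(1 − e), where e is the estimated probability of trial participation given baseline covariates, so the weighted control group resembles the trial population (an average-treatment-effect-on-the-treated estimand). The propensity scores are assumed to have been estimated already, for example by logistic regression or an ML classifier; the outcome values are hypothetical, and variance estimation, overlap diagnostics, and doubly robust or targeted learning refinements are omitted.

```python
from typing import Sequence


def ipw_att(in_trial: Sequence[int], outcome: Sequence[float], propensity: Sequence[float]) -> float:
    """Trial-arm mean minus odds-weighted external-control mean (ATT-style weighting).

    in_trial[i] = 1 for trial (treated) patients, 0 for external real-world controls;
    propensity[i] = estimated P(in trial | baseline covariates) for patient i.
    """
    treated = [y for z, y in zip(in_trial, outcome) if z == 1]
    controls = [(y, e / (1.0 - e)) for z, y, e in zip(in_trial, outcome, propensity) if z == 0]
    mean_treated = sum(treated) / len(treated)
    # Odds weights up-weight controls whose covariates resemble those of trial participants
    mean_control = sum(y * w for y, w in controls) / sum(w for _, w in controls)
    return mean_treated - mean_control


# Hypothetical data: two trial patients followed by two external controls
effect = ipw_att(in_trial=[1, 1, 0, 0], outcome=[5.0, 7.0, 2.0, 4.0],
                 propensity=[0.6, 0.7, 0.5, 0.75])
print(effect)  # 2.5
```

The control with propensity 0.75 receives three times the weight of the control with propensity 0.5, pulling the comparator mean from 3.0 (unweighted) to 3.5.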
Digital Endpoints and Decentralized Trials
Wearable sensors and digital health technologies are generating continuous, high-frequency patient data streams that constitute novel digital endpoints with potential biomarker value. ML algorithms applied to actigraphy, photoplethysmography, electrocardiography, and speech data have demonstrated sensitivity to pharmacodynamic drug effects in CNS, cardiovascular, and respiratory disease trials [68]. The FDA's Digital Health Center of Excellence and Biomarker Qualification Program, together with the EMA's qualification procedure for novel methodologies, provide evolving regulatory frameworks for the incorporation of ML-processed digital biomarkers into regulatory submissions [69].
AI-DRIVEN PHARMACOVIGILANCE AND POST-MARKET SAFETY
Traditional Signal Detection and Its Limitations
Conventional pharmacovigilance relies primarily on spontaneous adverse event reporting systems (SAERS) such as the FDA Adverse Event Reporting System (FAERS), the WHO VigiBase, and the EMA EudraVigilance database. Signal detection in these systems employs disproportionality measures including the proportional reporting ratio (PRR), reporting odds ratio (ROR), and Bayesian confidence propagation neural network (BCPNN) [70]. While these methods remain regulatory standards, they suffer from well-documented limitations: massive under-reporting (only an estimated 5–10% of ADEs are reported), reporting bias (the Weber effect, an early surge in reporting after drug launch), duplicate reports, absence of denominator data, and inability to detect signals confounded by concomitant medications or comorbidities [71].
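The disproportionality measures named above are all computed from a 2×2 contingency table of spontaneous reports. The sketch below computes the PRR, the ROR, and an information component (IC); for the IC it uses the widely used shrinkage observed-to-expected form log₂((O + 0.5)/(E + 0.5)) as a stand-in for the full Bayesian BCPNN, whose credibility intervals are not shown. The report counts in the example are hypothetical.

```python
import math
from typing import Dict


def disproportionality(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Disproportionality statistics from a 2x2 table of spontaneous reports.

    a: drug of interest & event of interest    b: drug of interest & all other events
    c: all other drugs & event of interest     d: all other drugs & all other events
    Assumes b and c are non-zero (otherwise the ROR is undefined).
    """
    n = a + b + c + d
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    expected = (a + b) * (a + c) / n  # count expected if drug and event were independent
    ic = math.log2((a + 0.5) / (expected + 0.5))  # shrinkage observed-to-expected ratio
    return {"PRR": prr, "ROR": ror, "expected": expected, "IC": ic}


# Hypothetical counts: 10 reports of the event with the drug among 10,000 reports in total
signal = disproportionality(a=10, b=90, c=100, d=9800)
```

Here the event is reported with the drug roughly ten times more often than with other drugs (PRR ≈ 9.9), while the +0.5 shrinkage pulls the IC toward zero, which matters most when observed counts are small.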
NLP-Based ADE Detection from Biomedical Text
Natural language processing offers a transformative capability for pharmacovigilance: automated extraction of drug–ADE relationships from the vast corpus of biomedical text, including clinical notes, discharge summaries, published literature, regulatory submissions, and patient-reported outcomes [72]. The development of transformer-based biomedical NLP models, particularly BioBERT (Lee et al., 2020), ClinicalBERT (Alsentzer et al., 2019), and PubMedBERT (Gu et al., 2021), has established a new performance frontier for named entity recognition (NER), relation extraction, and event detection in clinical text [73].
ADE extraction from clinical notes using fine-tuned BERT models has achieved F1 scores of 0.81–0.93 on established benchmarks, including the 2018 n2c2 NLP Challenge datasets, substantially exceeding rule-based and traditional ML baselines [74]. The ADE Corpus V2 and the BioCreative VI Track 5 datasets provide standardized evaluation resources for drug–ADE extraction models. Beyond individual document processing, large-scale temporal analysis of clinical note repositories enables detection of emerging pharmacovigilance signals with population-level sensitivity, providing a lead time of days to weeks over spontaneous reporting [75].
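The F1 scores quoted above are the harmonic mean of precision and recall. The sketch below computes entity-level precision, recall, and F1 under strict matching, where a predicted mention counts as correct only if its character offsets and label both match a gold annotation; some benchmarks also report lenient (overlap-based) matching, which this sketch does not implement. The offsets and labels are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label such as "ADE" or "Drug")


def strict_prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact boundary and label matching."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations for one clinical note
gold = {(10, 18, "Drug"), (40, 47, "ADE"), (60, 71, "ADE")}
predicted = {(10, 18, "Drug"), (40, 47, "ADE"), (80, 85, "ADE"), (60, 70, "ADE")}
p, r, f1 = strict_prf1(gold, predicted)  # p = 0.5, r = 2/3, f1 = 4/7
```

Under strict matching the prediction (60, 70) is both a false positive and leaves the gold span (60, 71) as a false negative, even though it is off by a single character; this is one reason strict and lenient scores can differ noticeably.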
Social Media and Patient-Reported ADE Mining
Social media platforms and online patient communities are an underutilized pharmacovigilance resource. Patients increasingly report drug side effects on Twitter, Reddit, patient forums (PatientsLikeMe, WebMD), and health apps, often in lay language and colloquialisms that are not mapped to standard MedDRA terminology [76]. ML-based normalization of colloquial ADE expressions to standardized ontologies, combined with temporal trend analysis, enables the detection of safety signals in patient-generated content that may not yet appear in formal reporting systems.
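The normalization step maps a lay phrase to a standard term. The sketch below swaps in plain string similarity for the ML normalizer described above: an exact dictionary lookup with a `difflib` fuzzy-match fallback. The mini-lexicon is hypothetical, and its targets only resemble MedDRA Preferred Terms. A production system would use a licensed MedDRA release and a learned model.

```python
import difflib
from typing import Optional

# Hypothetical toy lexicon: lay phrases -> terms resembling MedDRA Preferred Terms
LAY_TO_PT = {
    "can't sleep": "Insomnia",
    "couldnt sleep": "Insomnia",
    "hair falling out": "Alopecia",
    "losing my hair": "Alopecia",
    "feel sick to my stomach": "Nausea",
    "pounding head": "Headache",
    "so drowsy": "Somnolence",
}


def normalize(phrase: str, cutoff: float = 0.75) -> Optional[str]:
    """Map a colloquial ADE mention to a standard term: exact lookup, then fuzzy match."""
    key = phrase.lower().strip()
    if key in LAY_TO_PT:
        return LAY_TO_PT[key]
    # Fallback: closest lexicon entry by difflib's similarity ratio, if above cutoff
    match = difflib.get_close_matches(key, LAY_TO_PT.keys(), n=1, cutoff=cutoff)
    return LAY_TO_PT[match[0]] if match else None


print(normalize("Losing my hair"), normalize("cant sleep"), normalize("xyz"))
# Alopecia Insomnia None
```

String similarity only catches spelling variants of phrases already in the lexicon. Learned normalizers aim to generalize to unseen paraphrases as well.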
Nikfarjam et al. (2015) published a pioneering study that extracted ADEs from Twitter using CRF and maximum entropy models, establishing the feasibility of social media pharmacovigilance [77]. Subsequent transformer-based systems for Twitter and Reddit ADE detection have achieved precision of 72–85% and recall of 65–80% on manually annotated test sets. Cross-lingual transfer learning has brought significant improvements in low-resource languages [78]. The WHO Uppsala Monitoring Centre's VigiGram platform integrates social media signals with VigiBase spontaneous reports in a hybrid surveillance dashboard, an early regulatory application of AI-driven social media pharmacovigilance [79].
Real-World Data Integration
Real-world data encompasses all information routinely collected outside controlled clinical trials, including EHRs, administrative claims, patient registries, wearable sensor data, and genomic biobanks [80]. Integrating these diverse sources through AI/ML substantially amplifies the power and scope of pharmacovigilance. Table 2 summarizes the primary RWD sources, the AI methods applied to them, their pharmacovigilance use cases, and their associated limitations.
| RWD Source | Data Type | AI/ML Method Applied | Pharmacovigilance Use Case | Limitations / Challenges |
|---|---|---|---|---|
| Electronic Health Records (EHR) | Structured diagnoses, labs, medications, notes | NLP (BioBERT), temporal deep learning | ADE detection, drug–drug interaction surveillance | Data heterogeneity, missingness, lack of standardization |
| Claims / Administrative Data | ICD codes, CPT codes, pharmacy claims | Disproportionality analysis, tree-based models | Post-market safety signal detection, label updates | Coding errors, confounding by indication |
| Social Media & Forums | Unstructured text (Twitter, PatientsLikeMe, Reddit) | Sentiment analysis, LLM-based extraction | Patient-reported ADEs, adherence monitoring | Noise, privacy, representativeness bias |
| Spontaneous Reporting Systems | MedDRA-coded reports (FAERS, VigiBase, Yellow Card) | PRR, ROR, Bayesian EBGM, LSTM classifiers | Signal detection, labeling updates | Under-reporting, Weber effect, duplicate reports |
| Wearables / Sensors / mHealth | Continuous vitals, activity, CGM, ECG | LSTM, CNN, federated learning | Real-time ADE monitoring, adherence tracking | Data quality, regulatory acceptance, interoperability |
| Genomics & Biobanks | SNPs, WGS/WES, polygenic risk scores | GNN, transformer, GWAS–ML hybrid | Pharmacogenomics-driven safety profiling | Linkage to clinical outcomes, consent issues |
Federated Learning for Multi-Institutional Safety Surveillance
AI-driven pharmacovigilance faces a fundamental tension. Robust signal detection models need large, diverse patient populations for training. Privacy, regulatory, and institutional constraints, however, prevent centralizing patient-level data. Federated learning (FL) offers a principled resolution to this tension [81]. FL is a distributed ML paradigm in which models are trained across multiple institutions without sharing raw data. Each participating institution trains a local model on its own data. Only model updates (gradients or weights) are shared with a central aggregation server, which combines them into a global model.
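The train-locally, average-centrally loop described above is essentially the FedAvg (federated averaging) algorithm. The sketch below simulates it in a single process with NumPy. Several elements are illustrative assumptions: the logistic-regression model, the learning rate, the number of rounds, and the synthetic site data. Real deployments add secure communication and client sampling, and must handle sites whose data are not identically distributed.

```python
import numpy as np


def local_update(w: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 50) -> np.ndarray:
    """Site-side step: gradient descent on logistic loss using only local data."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted ADE probabilities
        w -= lr * X.T @ (p - y) / len(y)   # mean logistic-loss gradient step
    return w


def fed_avg(w: np.ndarray, sites, rounds: int = 20) -> np.ndarray:
    """Server-side loop: broadcast w, collect local weights, average by site size."""
    sizes = np.array([len(y) for _, y in sites], dtype=float)
    for _ in range(rounds):
        local = np.stack([local_update(w, X, y) for X, y in sites])
        w = np.average(local, axis=0, weights=sizes)  # only weights leave each site
    return w


# Simulated example: three hospitals with different cohort sizes (synthetic data)
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for n in (200, 500, 80):
    X = rng.normal(size=(n, 2))
    y = (X @ true_w + rng.normal(scale=0.5, size=n) > 0).astype(float)
    sites.append((X, y))

global_w = fed_avg(np.zeros(2), sites)
print(global_w)
```

Weighting the average by site size keeps large hospitals from being diluted by small ones. In this simulation the raw `X` and `y` arrays stay inside `local_update`, mirroring what would stay behind each institution's firewall.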
Emerging federated data infrastructure for pharmacovigilance includes TriNetX and federated networks built on common data models [82]. Examples are OHDSI, which uses the OMOP CDM, and PCORnet, which uses its own PCORnet CDM. The European Health Data Space is a further example. Studies applying FL to ADE detection have shown that federated models approach the performance of centralized models trained on pooled data, with only a 2–5% reduction in AUC. Differential privacy augmentation maintains privacy guarantees in these studies [83]. The MELLODDY consortium demonstrated FL-based molecular property prediction across 10 pharmaceutical companies without sharing proprietary compound data, providing proof of concept for cross-industry federated pharmaceutical AI [84].
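Differential privacy augmentation in FL typically means that each site's model update is norm-clipped and perturbed with calibrated Gaussian noise before aggregation. The update here is the difference between a site's locally trained weights and the current global weights. Below is a minimal client-side sketch of that clip-and-noise step; the clip norm and noise multiplier are illustrative assumptions. In central-DP variants of federated averaging, the server instead adds noise once to the aggregated updates.

```python
import numpy as np


def privatize_update(delta: np.ndarray, clip_norm: float = 1.0,
                     noise_multiplier: float = 1.0, rng=None) -> np.ndarray:
    """Clip a site's model update to an L2 bound, then add Gaussian noise.

    The noise std is noise_multiplier * clip_norm, i.e. scaled to the clipped
    update's sensitivity. Turning this into an (epsilon, delta) guarantee
    requires a privacy accountant over all training rounds (not shown).
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(delta)
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = delta * scale                      # bounds any single site's influence
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=delta.shape)
    return clipped + noise


update = np.array([3.0, 4.0])                    # hypothetical update with L2 norm 5
print(privatize_update(update, clip_norm=1.0, noise_multiplier=0.0))  # [0.6 0.8]
print(privatize_update(update, rng=np.random.default_rng(1)))         # clipped + noisy
```

The clipping step is what makes the noise level meaningful. Without a bound on each update's norm, no finite amount of noise would hide one site's contribution.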
EXPLAINABLE AI AND SCIENTIFIC INTERPRETABILITY
The Black-Box Problem in Pharmaceutical AI
Despite their superior predictive performance, deep learning models are fundamentally black boxes: they produce predictions without transparently justifying the features and reasoning that led to them [85]. This opacity poses profound challenges in pharmaceutical applications, where mechanistic understanding is inseparable from scientific validity and regulatory submissions require traceable, reproducible, and interpretable evidence. Consider a model that predicts a compound will cause hepatotoxicity but cannot specify which structural motifs or pharmacophoric features drive the prediction. It gives medicinal chemists little actionable insight.
XAI Methodologies
The field of explainable AI (XAI) has produced a diverse toolkit for post hoc interpretation of black-box models [86]. SHapley Additive exPlanations (SHAP), derived from cooperative game theory, quantifies the marginal contribution of each feature to a model's prediction, averaged over all possible feature subsets. It provides both global (population-level) and local (sample-level) feature importance [87]. LIME (Local Interpretable Model-agnostic Explanations) approximates a complex model locally with an interpretable surrogate. Gradient-based attribution methods such as Integrated Gradients, Grad-CAM, and SmoothGrad compute attribution scores by backpropagating prediction gradients to the input features.
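For a model with only a few features, the Shapley values that SHAP approximates can be computed exactly by enumerating every coalition. The sketch below does this with a baseline value function, in which features outside a coalition take reference values. The toy toxicity model and its inputs are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take their baseline values.

    Enumerates all 2^n coalitions, so it is only feasible for a handful of
    features; SHAP-style libraries approximate this for realistic models.
    """
    n = len(x)

    def value(coalition: set) -> float:
        # Model output with coalition features taken from x, the rest from baseline
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in itertools.combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toxicity_score(v: List[float]) -> float:
    # Hypothetical model: one additive descriptor plus a two-descriptor interaction
    return 0.5 * v[0] + 2.0 * v[1] * v[2]


phi = shapley_values(toxicity_score, x=[1.0, 1.0, 2.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [0.5, 2.0, 2.0]
```

The example shows SHAP's additivity property. The attributions sum to the difference between the prediction (4.5) and the baseline prediction (0). The interaction term is split equally between the two features that create it.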
For molecular models, attention visualization in graph neural networks provides chemically meaningful atom- and bond-level attribution, identifying pharmacophoric contributions to predicted properties [88]. Concept-based explanations (TCAV, ConceptSHAP) provide higher-level mechanistic interpretations aligned with chemical knowledge. These allow model reasoning to be validated against established structure–activity relationships (SAR). Integrating XAI with medicinal chemistry visualization tools (3D pharmacophore mapping, SAR heat maps) bridges the gap between computational predictions and experimental design decisions.
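Attention-based atom attribution comes from the readout step. A GNN's attention readout scores each atom embedding, normalizes the scores with a softmax, and pools the atoms into a molecule vector. The normalized weights are what attention visualizations display. The NumPy sketch below shows that readout on a hypothetical four-atom molecule with made-up embeddings and query vector. Attention weights are not guaranteed to be faithful explanations, so they are usually cross-checked against other attribution methods.

```python
import numpy as np


def attention_readout(atom_embeddings: np.ndarray, query: np.ndarray):
    """Attention pooling over atoms; the weights double as per-atom attributions."""
    scores = atom_embeddings @ query               # one relevance score per atom
    scores = scores - scores.max()                 # numerical stability for softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    graph_embedding = weights @ atom_embeddings    # weighted sum of atom vectors
    return graph_embedding, weights


# Hypothetical 4-atom molecule with 3-dimensional (pretend-learned) embeddings
H = np.array([[0.1, 0.0, 0.2],
              [1.5, 0.3, 0.0],   # e.g. an atom in a suspected toxicophore
              [0.2, 0.1, 0.1],
              [0.0, 0.4, 0.3]])
q = np.array([1.0, 0.5, 0.0])   # hypothetical learned query vector
emb, atom_weights = attention_readout(H, q)
print(atom_weights.argmax())    # 1 -> the atom the readout attends to most
```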
Regulatory Expectations for Model Explainability
Regulatory agencies have increasingly articulated expectations for AI model interpretability in pharmaceutical submissions. The FDA's Artificial Intelligence/Machine Learning-Based Software as a Medical Device (AI/ML-based SaMD) Action Plan (2021) identifies transparency and explainability as core components of trustworthy AI. It emphasizes clear documentation of model architecture, training data characteristics, performance evaluation, and known limitations [89]. The EMA's Reflection Paper on AI (2023) calls for AI models used in clinical development to be supported by appropriate validation evidence. It also calls for their outputs to be interpretable in the context of clinical decision-making [90].
The EU AI Act (2024) establishes a risk-tiered regulatory framework. AI systems classified as high-risk, including those used in clinical diagnosis, treatment recommendation, and pharmaceutical decision support, are subject to mandatory conformity assessments. These cover transparency, explainability, robustness, and bias mitigation [91]. Compliance requires systematic XAI documentation as part of model lifecycle management, creating direct regulatory demand for explainability in pharmaceutical AI.
REGULATORY COMPLIANCE AND INTERNATIONAL FRAMEWORKS
US FDA Framework for AI in Drug Development
The FDA has established a progressively more sophisticated regulatory framework for AI in pharmaceutical development, spanning device software, drug development tools, and real-world evidence. The Real-World Evidence Framework (2018) set out the conditions under which RWE could support regulatory decisions, including new indications, labeling updates, and post-market surveillance [92]. It covers RWE from observational studies, pragmatic trials, and administrative data. The PDUFA VII commitment letter (2022) included specific provisions for developing an AI/ML action plan [93]. This plan encompasses algorithmic bias assessment, iterative learning oversight, and predetermined change control plans (PCCPs) for adaptive AI models.
The FDA's Biomarker Qualification Program and Drug Development Tool (DDT) qualification pathway provide regulatory routes for AI-derived biomarkers and digital endpoints. They require demonstration of analytical validity, clinical validity, and context-of-use appropriateness. The FDA's emerging framework for AI in drug manufacturing builds on its Process Analytical Technology (PAT) initiative. It enables real-time adaptive control of manufacturing processes using AI, with implications for quality by design and continuous manufacturing [94].
EMA and EU Regulatory Framework
The European Medicines Agency has published a series of regulatory science initiatives addressing AI in medicines development. The EMA Regulatory Science Strategy to 2025 (2020) explicitly identifies AI as a priority area. It emphasizes the need for qualification procedures, scientific advice frameworks, and capacity building within the Agency for AI evaluation [95]. The EMA's Guidance on the Use of Real-World Data in Benefit-Risk Assessment provides a structured framework for the regulatory use of observational data, emphasizing design-based confounding control and causal inference methodology.
The EU AI Act, adopted in 2024, creates the world's first comprehensive AI regulatory framework, with direct implications for pharmaceutical AI. Medical device AI systems are classified as high-risk through Article 6(1) and Annex I, which lists the EU medical device regulations. They require conformity assessment, CE marking, and registration in the EU AI database. The Act mandates human oversight provisions, algorithmic transparency, and data governance requirements. These substantially overlap with good clinical practice (GCP) and good pharmacovigilance practice (GVP) obligations, creating an opportunity for integrated compliance frameworks [96].
ICH Guidelines and International Harmonization
The International Council for Harmonisation (ICH) provides the primary framework for international regulatory harmonization in pharmaceutical development. ICH E6(R3), the Good Clinical Practice guideline, was released in draft in 2023 and finalized in January 2025. It explicitly addresses the use of digital technologies and real-world data in clinical trials [97]. It requires systematic data governance, audit trails, and algorithmic validation for AI-assisted trial processes. ICH E17 provides guidance on multi-regional clinical trial design and regional subgroup analyses, a framework that also governs AI-based subgroup analyses across regulatory jurisdictions. ICH Q12 on product lifecycle management establishes a framework for post-approval change management [98]. Through established conditions and post-approval change management protocols, it can accommodate AI model updates without full regulatory submissions for pre-specified, validated changes.
Comparative International Regulatory Landscape
Beyond the FDA, EMA, and ICH, several national regulatory agencies have developed AI-specific frameworks. The PMDA (Japan) has established a regulatory sandbox for AI-aided drug development under the Sakigake designation scheme [99]. It has also published AI principles emphasizing uncertainty quantification and model documentation. Health Canada has issued guidance on AI as a medical device and is piloting AI-assisted review processes within its regulatory operations.
The Central Drugs Standard Control Organisation (CDSCO) of India is developing digital health guidelines that encompass AI pharmacovigilance applications within the National Pharmacovigilance Programme. Table 3 provides a comparative summary of major international regulatory frameworks.
Table 3. Comparative International Regulatory Frameworks for AI/ML in Drug Development and Pharmacovigilance
| Regulatory Body | Key Guidance / Framework | AI/ML Relevance | XAI / Transparency Requirement | Enforcement Status |
|---|---|---|---|---|
| US FDA | AI/ML-Based SaMD Action Plan (2021); PDUFA VII (2023); Real-World Evidence Framework (2018) | Predetermined change control plans (PCCPs); iterative learning oversight | Explainability strongly encouraged; bias audits required | Active; multiple AI devices cleared via 510(k)/De Novo |
| EMA (EU) | Reflection Paper on AI (2023); EU AI Act (2024, risk-tiered); GVP Modules VI & IX | High-risk AI classification for diagnostics/therapeutics | Mandatory XAI for high-risk systems; CE marking implications | EU AI Act enforcement begins 2026; ongoing EMA reflection |
| ICH | ICH E6(R3) GCP (draft 2023, final 2025); ICH E17; ICH Q12 lifecycle management | RWE in regulatory submissions; decentralized trial guidance | Audit trails; algorithmic traceability in clinical evidence | ICH E6(R3) finalized January 2025; ICH Q12 implementation ongoing |
| PMDA (Japan) | AI Principles (2022); RWD guidelines; Sakigake designation for AI-aided drugs | Regulatory sandbox for adaptive trial designs using AI | Model documentation; uncertainty quantification required | Pilot AI review track active since 2022 |
| CDSCO (India) | Draft Digital Health Guidelines (2023); New Drugs & Clinical Trials Rules 2019 | RWE from EMR networks accepted; AI-aided pharmacovigilance piloted | Emerging; SUGAM portal digitization; transparency frameworks under development | Evolving; National Pharmacovigilance Programme integration |
ETHICAL CONSIDERATIONS AND ALGORITHMIC BIAS
Data Equity and Representativeness
AI models trained on biased data can perpetuate, and potentially amplify, existing healthcare disparities. Women, racial minorities, elderly patients, and pediatric populations have historically been underrepresented in clinical trials and in the databases underlying AI training; this is a well-documented problem [100]. ML models trained on predominantly white, male, middle-aged clinical cohorts may lose substantial predictive validity when applied to diverse populations. This is a form of algorithmic bias with direct patient safety implications. Pfizer's analysis of FAERS data found that ADE reporting rates varied systematically by race and gender [101]. AI models trained on these data may therefore have differential pharmacovigilance sensitivity across demographic subgroups.
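Differential sensitivity can be checked directly. Stratify a labeled validation set by demographic subgroup and compare true positive rates, an "equal opportunity"-style audit. The sketch below assumes such a validation set with a subgroup attribute. The records and the single sex variable are hypothetical. A real audit would add confidence intervals and intersectional subgroups.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recall_by_group(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Sensitivity (recall) of a binary ADE classifier within each subgroup.

    Each record is (subgroup, true_label, predicted_label), labels in {0, 1}.
    """
    positives: Dict[str, int] = defaultdict(int)
    detected: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:                    # only true ADE cases count toward recall
            positives[group] += 1
            detected[group] += int(y_pred == 1)
    return {g: detected[g] / positives[g] for g in positives}


def recall_gap(recalls: Dict[str, float]) -> float:
    """Largest sensitivity difference between any two subgroups."""
    return max(recalls.values()) - min(recalls.values())


# Hypothetical validation records: (sex, true ADE?, model flagged ADE?)
records = [
    ("F", 1, 1), ("F", 1, 0), ("F", 1, 0), ("F", 1, 1), ("F", 0, 0),
    ("M", 1, 1), ("M", 1, 1), ("M", 1, 1), ("M", 1, 1), ("M", 0, 1),
]
recalls = recall_by_group(records)
print(recalls, recall_gap(recalls))  # {'F': 0.5, 'M': 1.0} 0.5
```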
Transparency and Accountability in Autonomous Systems
Deploying AI systems in pharmaceutical decision-making raises fundamental questions about accountability, particularly when AI recommendations influence clinical or regulatory decisions that affect patient outcomes. Regulatory documents increasingly cite 'meaningful human control' over AI-assisted decisions as a prerequisite for responsible deployment [102]. Under this principle, trained experts retain interpretive authority over AI outputs and do not defer inappropriately to algorithmic recommendations. Responsible AI frameworks provide structured methodologies for AI governance, risk assessment, and lifecycle management applicable to pharmaceutical contexts [103]. Examples include NIST's AI Risk Management Framework (AI RMF 1.0, 2023) and ISO/IEC 42001.
Data Privacy and Consent Frameworks
Using patient data for AI training in pharmacovigilance raises complex privacy and consent challenges; such data include EHRs, genomics, and wearable sensor streams. GDPR (EU), HIPAA (US), and emerging national data protection regimes impose varying obligations on data controllers and processors who use patient data for AI development [104]. Synthetic data generation has emerged as a promising technical approach to training pharmacovigilance AI models without direct exposure of personal health information [105]. It uses GANs, VAEs, or diffusion models to produce statistically faithful but privacy-preserving patient datasets. However, synthetic data still carry re-identification risks, particularly for rare disease cohorts or genomically characterized patients. These risks call for rigorous privacy evaluation, such as k-anonymity checks and differential privacy guarantees.
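A k-anonymity check is simple to run on a synthetic (or real) extract. Group rows by their quasi-identifiers and report the smallest group size, k. The sketch below is illustrative, and its column names and records are hypothetical. k-anonymity alone does not prevent attribute disclosure when everyone in a group shares the same sensitive value. Differential privacy, by contrast, is a formal property of the generating mechanism, not something measurable from the output table.

```python
from collections import Counter
from typing import Dict, List, Sequence


def k_anonymity(rows: List[Dict[str, object]], quasi_identifiers: Sequence[str]) -> int:
    """Return k: the size of the smallest group of rows sharing all quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical synthetic records; the rare-disease patient is unique on the QIs
synthetic = [
    {"age_band": "40-49", "sex": "F", "region": "North", "dx": "T2D"},
    {"age_band": "40-49", "sex": "F", "region": "North", "dx": "HTN"},
    {"age_band": "60-69", "sex": "M", "region": "South", "dx": "T2D"},
    {"age_band": "60-69", "sex": "M", "region": "South", "dx": "CKD"},
    {"age_band": "10-19", "sex": "M", "region": "East", "dx": "rare disease X"},
]
qi = ("age_band", "sex", "region")
print(k_anonymity(synthetic, qi))  # 1 -> at least one record is unique on the QIs
```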
EMERGING DIRECTIONS AND FUTURE PERSPECTIVES
Multimodal Foundation Models for Drug Development
Large, pre-trained foundation models represent the frontier of pharmaceutical AI [106]. These models can process diverse modalities simultaneously: chemical structures, protein sequences, 3D geometries, clinical text, and imaging data. Models such as BioMedGPT, GeneGPT, and Med-PaLM 2 demonstrate the potential for unified representations that bridge the molecular, cellular, and clinical scales of biological description. Multimodal transformers pre-trained jointly on chemical, genomic, and clinical data have shown zero-shot and few-shot generalization to unseen pharmaceutical tasks. This hints at the emergence of more general pharmaceutical intelligence [107].
Protein language models (PLMs) trained on hundreds of millions of evolutionary sequences encode rich functional information. This information can be used for binding site prediction, mutation effect estimation, and protein engineering. ESM-2 (Lin et al., 2022) has 15 billion parameters and was trained on 250 million UniRef90 sequences [47]. It provides sequence-based structural and functional embeddings that approach AlphaFold2 accuracy for per-residue contact prediction, at orders of magnitude lower computational cost. PLMs can be integrated with small-molecule transformers through cross-modal attention architectures. This offers a path toward end-to-end drug–target interaction modeling that does not require explicit structure determination.
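In cross-modal attention, each ligand token queries the protein's per-residue embeddings. The NumPy sketch below shows a single head of scaled dot-product cross-attention. Several elements are assumptions: the embedding sizes are arbitrary, and random arrays stand in for PLM and small-molecule encoder outputs. The projection matrices would be learned in practice, and the final drug–target scoring head is left out.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(ligand, protein, W_q, W_k, W_v):
    """Ligand-atom queries attend over protein-residue keys and values."""
    Q = ligand @ W_q                                   # (n_atoms, d_k)
    K = protein @ W_k                                  # (n_residues, d_k)
    V = protein @ W_v                                  # (n_residues, d_v)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[1]))      # (n_atoms, n_residues)
    return attn @ V, attn                              # residue context per ligand atom


rng = np.random.default_rng(0)
ligand = rng.normal(size=(12, 64))     # stand-in for small-molecule transformer outputs
protein = rng.normal(size=(300, 128))  # stand-in for per-residue PLM embeddings
d_k = 32
W_q = rng.normal(size=(64, d_k)) / np.sqrt(64)     # random (untrained) projections
W_k = rng.normal(size=(128, d_k)) / np.sqrt(128)
W_v = rng.normal(size=(128, d_k)) / np.sqrt(128)

fused, attn = cross_attention(ligand, protein, W_q, W_k, W_v)
pooled = fused.mean(axis=0)   # a hypothetical DTI head (not shown) could score this
print(fused.shape, attn.shape)  # (12, 32) (12, 300)
```

Each row of `attn` is a distribution over residues, which is why such architectures can point to putative binding residues without an explicit 3D structure.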
AI Driven Precision Pharmacovigilance
Precision pharmacovigilance is becoming technologically achievable [108]. It envisions safety monitoring tailored to individual patient risk profiles based on genetic, proteomic, and clinical characteristics. Pharmacogenomic AI models trained on GWAS data and clinical outcome databases can predict individual susceptibility to specific ADEs, such as drug-induced liver injury, QTc prolongation, and Stevens–Johnson syndrome [109]. This enables proactive risk stratification at the point of prescribing. Integration with clinical decision support systems (CDSS) and EHR platforms lets these predictions reach the clinical workflow as real-time alerts. This closes the loop between pharmacogenomic discovery and practice.
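At the point of prescribing, such a model reduces to a per-patient risk estimate compared against an alert threshold. The sketch below uses a logistic model with a hypothetical risk allele, hypothetical coefficients, and an arbitrary 5% threshold. None of these values are clinically derived. A deployed model would be trained and validated on outcome data.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    risk_allele_copies: int     # 0, 1 or 2 copies of a hypothetical risk allele
    age: float                  # years
    on_interacting_drug: bool   # concomitant drug that raises the ADE risk


# Hypothetical coefficients, for illustration only (not clinically derived)
INTERCEPT = -6.0
BETA_ALLELE = 1.8
BETA_AGE_PER_DECADE = 0.3
BETA_INTERACTION = 0.9


def ade_risk(p: Patient) -> float:
    """Logistic model: predicted probability of one specific ADE for this patient."""
    logit = (INTERCEPT
             + BETA_ALLELE * p.risk_allele_copies
             + BETA_AGE_PER_DECADE * (p.age / 10)
             + BETA_INTERACTION * (1.0 if p.on_interacting_drug else 0.0))
    return 1.0 / (1.0 + math.exp(-logit))


def prescribing_alert(p: Patient, threshold: float = 0.05) -> bool:
    """Flag the order for pharmacist/clinician review when risk reaches the threshold."""
    return ade_risk(p) >= threshold


high = Patient(risk_allele_copies=2, age=70, on_interacting_drug=True)
print(round(ade_risk(high), 3), prescribing_alert(high))  # 0.646 True
```

The alert only flags an order for review and leaves the prescribing decision with the clinician, which is consistent with the human-oversight principles discussed earlier.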
Quantum Computing and Next-Generation ML
Quantum computing offers theoretical advantages for pharmaceutical AI through quantum machine learning (QML) algorithms and quantum chemistry simulation [110]. These capabilities may enable exact solution of molecular Hamiltonians for drug-sized molecules, which is currently intractable on classical computers. Variational quantum eigensolvers (VQE) and quantum approximate optimization algorithms (QAOA) have been applied to small molecular systems. Scaling to drug-sized compounds remains an engineering challenge, but success would mark a potential paradigm shift for binding energy prediction. Quantum–classical hybrid ML architectures are being explored for molecular property prediction and generative design [111]. Practical quantum advantage for pharmaceutical applications, however, has not yet been demonstrated at scale.
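The variational principle behind VQE can be illustrated classically. A parameterized trial state is prepared, its energy expectation value is evaluated, and a classical optimizer minimizes that energy. The NumPy sketch below makes three simplifying assumptions. The one-qubit Hamiltonian is hypothetical. The expectation value is computed exactly rather than estimated from repeated measurements on quantum hardware. A grid search stands in for the classical optimizer.

```python
import numpy as np

# Hypothetical one-qubit Hamiltonian (real symmetric); molecular ones need many qubits
H = np.array([[1.0, 0.5],
              [0.5, -1.0]])


def ansatz(theta: float) -> np.ndarray:
    """Trial state RY(theta)|0> = [cos(theta/2), sin(theta/2)] (real amplitudes)."""
    return np.array([np.cos(theta / 2), np.sin(theta / 2)])


def energy(theta: float) -> float:
    """<psi|H|psi>; on hardware this would be estimated from repeated measurements."""
    psi = ansatz(theta)
    return float(psi @ H @ psi)


# Classical outer loop: a dense grid search stands in for a real optimizer
thetas = np.linspace(0.0, 2 * np.pi, 10001)
energies = np.array([energy(t) for t in thetas])
best_energy = energies.min()
exact = np.linalg.eigvalsh(H).min()   # exact ground-state energy for comparison
print(best_energy, exact)             # both ≈ -1.118
```

By the variational principle, every trial energy is an upper bound on the true ground-state energy. The quality of the result therefore depends on how well the ansatz can represent the ground state. That dependence is the main difficulty when scaling to drug-sized molecules.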
LLM Agents and Autonomous Drug Discovery
The integration of large language model (LLM) agents with laboratory robotics and automated synthesis platforms is enabling autonomous or semi-autonomous drug discovery workflows [112]. Platforms such as Chemist-X, Coscientist (Boiko et al., 2023), and ChemCrow demonstrate LLM agents that can formulate hypotheses, design experiments, query databases, and interpret results in an iterative loop. Coscientist demonstrated autonomous design and execution of palladium-catalyzed cross-coupling reactions without human intervention [113]. Current capabilities are limited to relatively simple chemical operations. Even so, the trajectory suggests future systems capable of conducting multi-step drug optimization campaigns with minimal human oversight, which would profoundly reshape the pharmaceutical R&D workforce and process model.
PERSISTENT CHALLENGES AND CRITICAL GAPS
Data Quality, Standardization, and Interoperability
Despite the volume of biomedical data available for AI training, data quality, standardization, and interoperability remain critical constraints. EHR data are characterized by systematic missingness, irregular temporal sampling, heterogeneous free text, and institution-specific coding practices [114]. These features complicate ML model development and generalization. For AI pharmacovigilance systems to generalize across institutions and national healthcare systems, standardized data models (OMOP CDM, HL7 FHIR) and controlled terminologies (SNOMED CT, MedDRA, RxNorm) are essential. Implementation, however, is uneven and resource-intensive. The Global Burden of Disease AI Network and WHO's HealthAI initiative are pursuing standardization frameworks, but universal adoption remains a long-term challenge [115].
Model Generalizability and Distribution Shift
A fundamental challenge for pharmaceutical ML is generalizability [116]. This is the ability of a model trained on one dataset to perform reliably on structurally dissimilar compounds, different patient populations, or different time periods. Distribution shift, the difference between training and deployment data distributions, is a pervasive source of performance degradation. In molecular property prediction, models frequently fail to generalize to novel scaffolds outside the training chemical space (scaffold-based distribution shift). In pharmacovigilance NLP, models trained on notes from academic medical centers may underperform on community hospital records with different documentation cultures [117]. Prospective temporal validation and external geographic validation are critical but frequently omitted from published pharmaceutical AI studies.
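A scaffold split makes scaffold-based distribution shift explicit at evaluation time by keeping molecules that share a core scaffold on the same side of the train/test boundary. The sketch below assumes scaffold keys have already been computed, for example as Bemis–Murcko scaffolds, which RDKit's MurckoScaffold module can produce. The SMILES strings are hypothetical. A temporal split for prospective validation follows the same grouping idea, except that records are ordered by date instead of grouped by scaffold.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(scaffolds: Sequence[str],
                   train_fraction: float = 0.8) -> Tuple[List[int], List[int]]:
    """Split molecule indices so that no scaffold occurs in both train and test."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    train_cutoff = train_fraction * len(scaffolds)
    train: List[int] = []
    test: List[int] = []
    # Largest scaffold families fill the training set; groups that would overflow
    # it (typically the smaller, rarer chemotypes) go to the test set.
    for members in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(members) > train_cutoff:
            test.extend(members)
        else:
            train.extend(members)
    return train, test


# Hypothetical precomputed scaffold SMILES for 10 molecules
scaffolds = (["c1ccccc1"] * 5 + ["c1ccncc1"] * 3
             + ["C1CCCCC1", "c1ccc2ccccc2c1"])
train_idx, test_idx = scaffold_split(scaffolds)
print(train_idx, test_idx)  # [0, 1, 2, 3, 4, 5, 6, 7] [8, 9]
```

Because whole scaffold families move together, test performance under this split is usually a more sober estimate of how a model will behave on genuinely novel chemotypes than a random split.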
Regulatory Science Readiness
Despite significant progress, regulatory agencies worldwide lack full capacity for AI evaluation. Workforce gaps in computational expertise limit how deeply algorithms can be reviewed within current regulatory timelines and resources [118]. Regulatory science is investing actively in AI evaluation methodologies, analogous to statistical analysis plan review and GCP audit procedures. Several collaborative initiatives are developing shared regulatory standards for AI validation that may accelerate cross-jurisdictional harmonization [119]. These include the FDA–EMA parallel scientific advice program, the IMI Trials@Home consortium, and the Critical Path Institute's AI working groups.
CONCLUSION
This systematic review has documented the remarkable transformation brought about by artificial intelligence and machine learning across the full continuum of drug discovery and pharmacovigilance. Graph neural networks and transformer-based architectures have achieved state-of-the-art performance in molecular property prediction, drug–target interaction modeling, and structure-based design. Generative AI frameworks have demonstrated the capacity to explore vast chemical space and propose novel molecular entities with specified pharmacological profiles. NLP-based systems have transformed adverse drug event surveillance. They enable proactive signal detection from EHRs, spontaneous reports, and patient-generated social media content at scales and speeds inaccessible to manual review.
Integrating real-world evidence into AI-driven pharmacovigilance represents a paradigm shift in post-market safety science. It enables longitudinal safety monitoring across millions of patients with temporal precision and mechanistic granularity. Federated learning frameworks have made multi-institutional analysis feasible without centralizing patient-level data. An emerging precision pharmacovigilance paradigm builds personalized safety profiles from individual pharmacogenomic and phenotypic characteristics. It offers the prospect of proactive, predictive safety management rather than reactive adverse event reporting.
However, realizing AI's full translational potential in pharmaceutical development requires coordinated advances across multiple dimensions. Model interpretability through XAI methodologies must evolve from academic novelty to regulatory standard. Algorithmic bias auditing must become integral to AI validation frameworks, ensuring that the benefits of AI-driven drug development are equitably distributed. Regulatory frameworks in the US, EU, and Asia-Pacific require ongoing harmonization to prevent divergent national requirements from impeding the global development and deployment of beneficial pharmaceutical AI systems.
Ultimately, artificial intelligence is not a replacement for the biological intuition, clinical judgment, and ethical responsibility of pharmaceutical scientists and healthcare practitioners, but a powerful augmentation of these human capacities. The most transformative pharmaceutical AI deployments will be those that preserve and enhance human agency, transparency, and accountability while expanding the frontiers of what is computationally possible in the service of human health.
REFERENCES
Darshan K R, Shubham Shivangekar, Yash Vispute, Gayatri Dhamane, Parth Thorat, Prathamesh Chavan, Revolutionizing Drug Discovery and Pharmacovigilance Through Artificial Intelligence: A Comprehensive Systematic Review of Machine Learning Architectures, Real-World Evidence Integration, and Regulatory Compliance, Int. J. of Pharm. Sci., 2026, Vol 4, Issue 7, 296-319, https://doi.org/10.5281/zenodo.21131900