¹Department of Mechanical and Aerospace Engineering, Illinois Institute of Technology, 3201 South State Street, Chicago, 60616, Illinois, United States of America
²Department of Biochemistry, Postgraduate Institute of Medical Education and Research, Sector-12, Chandigarh, 160012, India
This paper reviews recent developments in machine learning (ML) applications in the pharmaceutical, biomedical, and healthcare sectors. ML has had a marked impact on biomarker identification, drug discovery, and diagnostic accuracy. In the biomedical sector, algorithms such as Support Vector Machines (SVMs) and Random Forests (RFs) have been used to identify microRNA (miRNA) biomarkers for cancer with high classification performance. Automated learning tools such as BioAutoML optimize feature extraction and model selection and have outperformed established AutoML frameworks in predictive performance. In the pharmaceutical sector, high-content screening platforms integrated with ML have led to the discovery of new antibacterial agents, and metrics such as the lowest effective dose (LOED) have broadened the scope of antibiotic discovery. Overall, the paper shows how ML improves efficiency and accuracy and drives innovation in these critical industry sectors.
Recent years have seen significant progress in the adoption of ML technologies across the pharmaceutical, biomedical, and healthcare industries. As a specialized branch of artificial intelligence (AI), ML has transformed the pharmaceutical and healthcare sectors by improving operational efficiency, accuracy, and innovation in processes ranging from drug discovery to patient care. ML algorithms allow organizations to analyze massive datasets efficiently, a task that previously required extensive labor and time, thereby accelerating decision-making and improving outcomes [1]. Drug discovery and development in particular have been transformed by ML. Traditional drug development demands extensive time and resources for trials, whereas ML streamlines these processes by predicting biological activity and optimizing drug formulations. Advances in ML have also enabled high-throughput screening approaches that help researchers identify potential drug candidates more efficiently. ML is especially valuable for addressing high attrition rates in clinical trials, where many candidates fail in later stages because of unexpected biological interactions [2, 3].
ML applications extend beyond the pharmaceutical sector. In the biomedical domain, ML methods analyze complex biological datasets, facilitate genomic research, and enhance diagnostic precision. These algorithms enable real-time modeling of biological processes, providing insights that were previously inaccessible with conventional methods. ML has also become vital to personalized medicine: by combining genetic data with demographic information, it enables healthcare providers to develop treatment plans suited to each patient's unique profile [4].
The healthcare sector has increasingly adopted ML technologies to improve both the quality of patient care and operational effectiveness. Applications in patient monitoring, predictive analytics, and the automation of administrative tasks have reduced costs and improved patient outcomes. AI-driven applications provide clinicians with real-time decision support, enabling faster responses to patients and optimized treatment pathways [5]. This review highlights the substantial transformative effect of advances in ML across the pharmaceutical, biomedical, and healthcare sectors, showing how ML applications boost productivity and effectiveness and underscoring the importance of ongoing research to address current challenges. Integrating ML with established methods will further advance pharmaceutical and healthcare practice and deliver enhanced solutions that benefit humanity [6, 7].
ML has transformed the pharmaceutical, biomedical, and healthcare industries, delivering numerous benefits and applications that improve operational efficiency, drug discovery, and patient care.
BENEFITS
One of the most significant advantages of ML is that it enables these industries to accelerate drug discovery. Pharmaceutical companies can use ML algorithms to process extensive datasets and identify potential drug candidates efficiently and effectively [11]. This accelerated identification reduces both development time and research and development expenditure [2], and it lessens dependence on conventional trial-and-error techniques by enabling more focused drug testing strategies. The predictive analytics capabilities of ML are also extremely valuable in clinical trials. By analyzing past trial data, ML algorithms can forecast clinical trial results, which increases the likelihood of success and decreases the financial risks of drug development. ML algorithms also help identify patient groups that are more likely to benefit from particular treatments, thereby advancing personalized medicine [2]. In addition, ML addresses the difficulties of big data analysis in biomedical research. As healthcare data grow exponentially, ML algorithms can analyze this information to uncover hidden insights that enhance patient care and improve healthcare systems [12]. In genomics research, for example, ML analysis of genetic data enables the identification of disease susceptibility factors, which supports timely medical interventions.
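As a concrete illustration of forecasting trial outcomes from historical data, the following is a minimal sketch, not a method from the cited works: a logistic regression fitted by batch gradient descent on a small synthetic set of past trials. The two features (a standardized phase II effect size and a standardized biomarker-enrichment score), the function names, and all data values are hypothetical and serve only to show the mechanics.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by minimizing the mean log-loss with gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one historical trial.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a trial with features x succeeds."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical historical trials: [phase II effect size, biomarker-enrichment score],
# both standardized; label 1 = met primary endpoint, 0 = failed.
X_hist = [[-1.0, -1.0], [-1.2, -0.5], [-0.8, -1.5],
          [1.0, 1.0], [1.2, 0.5], [0.8, 1.5]]
y_hist = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X_hist, y_hist)
print(round(predict_proba(w, b, [1.5, 1.5]), 3))    # a promising new candidate
print(round(predict_proba(w, b, [-1.5, -1.5]), 3))  # a weak new candidate
```

Real trial-outcome models would be trained on far larger registries with many more covariates, and their calibration and external validation matter more than the fitting routine shown here.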
ML has many uses within the healthcare and pharmaceutical sectors, and these applications continue to develop. In pharmaceutical research and development, ML is used both to discover new drugs and to optimize them. Current algorithms evaluate chemical substances and biological responses by modeling potential drug-body interactions [13]. Leading pharmaceutical corporations such as Novartis and AstraZeneca use ML to optimize drug development, deploying algorithms that forecast drug efficacy and safety outcomes [11]. In clinical environments, ML applications support diagnosis and treatment planning. ML models are now widely used to evaluate imaging data, including X-rays and Magnetic Resonance Imaging (MRI) scans, which helps healthcare professionals diagnose cancer faster and with greater precision. Through wearables and Internet of Things (IoT) devices, ML enables remote patient monitoring, which improves patient engagement and facilitates proactive healthcare management through real-time data analysis [11]. ML also plays a vital role in pharmacovigilance, the monitoring of drug safety after a drug has been released to the market. ML algorithms can examine adverse drug event reports in healthcare databases to detect possible safety signals and assist with regulatory compliance [14]. This improves patient safety and gives drug developers timely feedback on issues that emerge after marketing. ML technologies are also used to refine and optimize clinical trial designs: ML algorithms use historical trial data to design more resource-efficient trials with better recruitment strategies and endpoints [15]. Overall, integrating ML into healthcare processes generates valuable insights, enhances patient care, and stimulates innovation throughout the pharmaceutical, biomedical, and healthcare industries.
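Signal detection in spontaneous adverse-event report databases usually starts from a 2 × 2 contingency table of report counts. The sketch below computes two classical disproportionality measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR) with its 95% confidence interval. These are statistical screening baselines rather than ML, shown here only to illustrate the kind of signal that ML-based pharmacovigilance methods aim to refine. The report counts are hypothetical, and the screening rule (at least three cases and a lower ROR bound above 1) is one commonly used convention that we assume for illustration.

```python
import math


def disproportionality(a, b, c, d, z=1.96):
    """
    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with any other drug and the event of interest
    d: reports with any other drug and any other event
    Returns (PRR, ROR, lower 95% bound of ROR, upper 95% bound of ROR).
    All counts must be positive.
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    # Standard error of log(ROR) from the four cell counts.
    se_log_ror = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - z * se_log_ror)
    upper = math.exp(math.log(ror) + z * se_log_ror)
    return prr, ror, lower, upper


def is_signal(a, b, c, d, min_cases=3):
    """Assumed screening rule: enough cases and a ROR confidence interval above 1."""
    if a < min_cases:
        return False
    _, _, lower, _ = disproportionality(a, b, c, d)
    return lower > 1.0


# Hypothetical counts for one drug-event pair.
print(disproportionality(20, 80, 100, 9800))
print(is_signal(20, 80, 100, 9800))
```

When a cell count is zero, practical systems often add a continuity correction (commonly 0.5 to each cell) before computing these ratios.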
Figure 1 shows the number of articles on advancements in ML applications for the Pharmaceutical, Biomedical, and Healthcare Industries covered in this review, by publication year from 2019 through 2024.
Fig. 1. Number of articles on advancements in ML applications for the Pharmaceutical, Biomedical, and Healthcare Industries vs. Year
Table 1 below shows a quantitative distribution by publisher of the number of articles related to the advancements in ML applications for the Pharmaceutical Industries.
Table 1. Number of articles from different publishers reviewed on the advancements in ML applications for the Pharmaceutical Industries
| Publisher | Number of Articles Reviewed |
| --- | --- |
| Springer | 9 |
| MDPI | 5 |
| Elsevier | 4 |
| ACS Publications | 2 |
| Frontiers | 2 |
| IEEE | 2 |
| Oxford University Press | 2 |
| PLOS | 2 |
| ETFLIN | 1 |
| Journal of Advanced Zoology | 1 |
| LPPM ISB Atma Luhur | 1 |
| Royal Society of Chemistry | 1 |
| The American Association for Cancer Research (AACR) | 1 |
| Total | 33 |
Zoffmann et al. (2019) designed a semi-automated system that combines high-content screening with ML to process phenotypic data from 1.5 million compounds, leading to the discovery of new antibacterial agents with unique mechanisms of action. Their LOED metric showed that antibacterial effects could be detected at concentrations below the minimum inhibitory concentration (MIC), which broadened the chemical space available for antibiotic discovery [20]. Galata et al. (2019) used artificial neural networks (ANNs) to estimate the dissolution profiles of extended-release tablets from spectroscopic data. Using near-infrared and Raman spectra, the ANN model achieved better accuracy than partial least squares (PLS) models, improving precision in pharmaceutical formulation analysis [21]. Ruano-Ordás et al. (2019) developed the Drug Discovery Multiple Classifier System (D2-MCS), which handles high-dimensional drug discovery datasets by partitioning molecular features into groups and selecting the best classifier for each group. The multi-classifier system outperformed single-model approaches, improving the discovery of biologically active compounds [22]. Abbas et al. (2020) introduced a blockchain-based drug supply chain management and recommendation system that used N-gram and LightGBM models trained on drug reviews to build consumer trust and reduce the circulation of counterfeit drugs, showing how ML can enhance both transparency and recommendation precision in pharmaceutical supply chains [23]. Sturm et al. (2020) investigated deep learning methods for predicting drug targets using the ExCAPE-DB dataset and showed that these methods outperformed conventional ML techniques. 
Their models achieved high predictive accuracy by transferring knowledge from public datasets to internal pharmaceutical datasets, specifically for identifying protein-ligand interactions [24]. Park et al. (2020) used ML to improve the lead optimization of anticancer drugs that had failed in phase III trials. The researchers built a deep learning framework that combines long short-term memory (LSTM) networks and convolutional neural network (CNN) architectures with Molecule Deep Q-Networks to improve molecular attributes, including binding affinity and toxicity. By improving both drug-likeness and synthetic accessibility scores, they optimized the failed drug candidates in ways that suggest potential for clinical success [25]. Ong et al. (2020) created Vaxign-ML, a supervised ML system that evaluates five ML techniques to improve the prediction of bacterial protective antigens (BPAgs) for vaccine development, and validated the results through cross-validation. eXtreme Gradient Boosting (XGBoost) outperformed existing BPAg prediction tools, and the model was released publicly via a web server and GitHub [26]. Mohsen et al. (2021) used deep learning to build models that predict adverse drug reactions (ADRs) by integrating Open TG-GATEs gene expression data with ADRs reported in FAERS. The deep neural network (DNN) models reached a mean accuracy of 89.94% across 14 predictive models and successfully identified drug-induced duodenal ulcers and fulminant hepatitis, with area under the curve (AUC) values between 0.76 and 0.99 [27]. Masumshah et al. (2021) developed the Neural Network-based Polypharmacy Side Effects Prediction (NNPS) model, which uses mono side effects and drug-protein interaction data to evaluate drug-drug interactions (DDIs). 
The NNPS model outperformed five existing approaches, increasing the area under the receiver operating characteristic curve (AUROC) by 9.2% while cutting computation time from 15 days to 8 hours [28]. Narayanan et al. (2021) used a Bayesian optimization algorithm to improve biopharmaceutical formulation processes, minimizing experimental demands while enabling simultaneous optimization of multiple properties. Their approach reduced the number of required experiments to one third of that needed with standard methods, improving the developability of biologic drugs [29]. Pandi et al. (2021) created an ML system with RF as the main algorithm to categorize pharmacogenomic variants according to their functional effects, achieving 85% accuracy and an AUC of 0.92. The model was effective for both whole-genome sequencing and targeted pharmacogenomic data, showing promise for personalized medicine through the prioritization of genetic variants [30]. Wang et al. (2021) presented DeepDRK, a deep learning platform that integrates multi-omics data to identify new uses for existing drugs in cancer therapy. Analyzing more than 20,000 drug-cell line pairs, DeepDRK achieved an AUC of 0.84, which helped identify promising drug repurposing candidates [31]. Wang et al. (2022) used the ML algorithms RF and XGBoost to predict metabolic drug interactions involving cytochrome P450 isozymes, improving DDI assessment and supporting clinical pharmacy. Combining multiple chemical descriptors with consensus-based predictions, the models achieved 80% accuracy in internal validation and 79.5% in external validation and identified 54,013 potential drug interaction pairs [32]. 
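Many of the studies above report performance as an AUC or AUROC. As a reminder of what that number measures, the following minimal sketch computes AUROC directly from its probabilistic definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name, labels, and scores are hypothetical; production code would typically use a rank-based O(n log n) implementation such as scikit-learn's `roc_auc_score`.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four compounds (1 = active, 0 = inactive).
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which values such as 0.84 or 0.92 above should be read.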
Zhu and Dupuy (2022) used a machine learning system that incorporates biological knowledge to examine drug response mechanisms in cancer, highlighting the pathways that determine drug sensitivity and resistance. They identified critical biological factors that influence treatment outcomes for GPX4, BRAF, and microtubule inhibitors and discovered new resistance pathways, such as NOTCH3/PAX8 signaling during paclitaxel therapy [33]. Han et al. (2022) developed an XGBoost model to identify new target-disease associations in the Open Targets platform by combining biological features such as tissue specificity and protein-protein interactions, achieving an area under the precision-recall curve of 0.73 in validation [34]. Qureshi et al. (2022) built a personalized drug response prediction model with an XGBoost classifier that combines molecular dynamics simulations with clinical data to predict how lung cancer patients respond to targeted therapy. The model achieved 97.5% accuracy, demonstrating the importance of geometric features for drug-target interactions [35]. Goldwaser et al. (2022) used ML techniques to identify inhibitors of the cytochrome P450 2C9 (CYP2C9) enzyme in order to help prevent harmful DDIs. Their predictive models, developed from public databases, achieved about 80% accuracy, and in vitro assays confirmed the inhibitory properties of the identified compounds [36]. Rahman et al. (2022) improved antibacterial drug discovery with directed message passing neural networks, which increased hit rates among FDA-approved compounds and natural products more than 14-fold compared with conventional screening methods [37]. Badwan et al. (2023) examined how ML algorithms can be used in oncology to improve predictions of drug efficacy and toxicity through better representations of disease states and therapeutic agents. 
Their findings demonstrated the growing influence of ML tools in drug discovery and repurposing and emphasized the need for a thorough understanding of ML techniques to improve cancer therapy [38]. Bannigan et al. (2023) used ML to accelerate the development of polymeric long-acting injectables (LAIs) and found the Light Gradient Boosting Machine (LGBM) model to be the most accurate at forecasting drug release, with a mean absolute error of 0.125. They showed that ML-driven predictive models can make pharmaceutical manufacturing more efficient by reducing both development time and costs compared with traditional formulation development methods [39]. Hou et al. (2023) developed an ML-based data analysis approach for identifying ligands from DNA-encoded libraries (DELs) in cell-based selections. A Maximum A Posteriori (MAP) loss function reduced the effects of noisy data, improving the accuracy of hit identification and structure-activity relationship (SAR) analysis for therapeutic compound research [40]. Vojjala et al. (2023) built an ML model that fills gaps in pharmacy cost information within claims data, improving data completeness and refining healthcare cost evaluations. Trained on a fully informative dataset, the model outperformed conventional imputation methods in the accuracy and reliability of its pharmacy cost predictions [41]. Patel et al. (2023) developed DE-INTERACT, which uses ML to predict interactions between drugs and excipients during pharmaceutical development. Experimental case studies with paracetamol and vanillin validated the tool and confirmed its ability to predict significant drug-excipient interactions that matter for drug formulation [42]. Pirzada et al.
(2023) used ML techniques to discover small-molecule glycogen synthase kinase 3 (GSK3) inhibitors as potential COVID-19 treatments. Predictive models trained on ChEMBL datasets selected selinexor and ruboxistaurin as GSK3 inhibitors, and molecular dynamics simulations validated their stability and potential efficacy [43]. Shin et al. (2023) applied ML to build structure-activity relationship models that helped identify new phytochemicals that block the glucocorticoid receptor and may help fight obesity. Their two-step workflow yielded 65 candidate compounds, nine of which were validated, with demethylzeylasteral emerging as a promising therapeutic agent [44]. Asha et al. (2024) combined ML techniques with blockchain systems to advance drug discovery and development, using generative adversarial networks (GANs) to generate molecules, deep learning to predict drug targets, and reinforcement learning to design clinical trials. The work demonstrated gains in precision, efficiency, and security in drug research that may transform pharmaceutical innovation [45]. Arunkumar and Baskaran (2024) used ANNs to predict DDIs from pharmacokinetic and pharmacodynamic data obtained from the Lexi-Comp and Vidal databases. Their multi-layer perceptron models reached an F1 score of 82% for minor interactions and 54% for major interactions, showing how ML could improve pharmacovigilance [46]. Singh and Kaewprapha (2024) used the You Only Look Once (YOLOv7) model to improve real-time detection of defective and quality control (QC)-approved tablets during pharmaceutical production, achieving 97.5% accuracy and showcasing ML capabilities for QC [47]. Bello et al.
(2024) developed an ML approach that uses speckle pattern imaging with RF and Multi-Layer Perceptron algorithms to classify parenteral artificial nutrition pharmaceutical suspensions, improving on traditional optical analysis methods. The study demonstrated that statistical imaging techniques combined with ML algorithms provide accurate drug classification [48]. Cysewski et al. (2024) investigated the solubility of active pharmaceutical ingredients in choline- and betaine-based deep eutectic solvents, using 8014 data points and Nu Support Vector Regression (nuSVR)-based predictive models to improve solubility predictions [49]. Mustapa and Tjahyanto (2024) studied ML methods for predicting Total Organic Carbon (TOC) levels in pharmaceutical water treatment systems, where accurate predictions are crucial for protecting product integrity. Comparing linear regression, RF, and multilayer perceptron models, they found that RF achieved the best predictive accuracy of 95%, which improved monitoring and maintenance processes [50]. Nhlapho et al. (2024) used ML to classify drug compounds according to Lipinski's Rule of Five. They achieved near-perfect classification with RF, Extreme Gradient Boosting (XGBoost), and Decision Tree classifiers, with RF reaching 99.94% accuracy, and the work led to DrugCheckMaster, a tool for efficient compound screening [51]. Kalaichelvan et al. (2024) improved pharmaceutical inventory management by combining fuzzy theory and ML, pairing pentagonal fuzzy numbers with a naive Bayes classifier. The method achieved 95.9% classification accuracy, addressed storage limitations, and reduced inventory costs [52].
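Because Lipinski's Rule of Five is itself a deterministic rule, class labels for a task such as the one in Nhlapho et al. [51] can be derived directly from four molecular descriptors. The sketch below shows such a labeling function. It assumes the descriptors have already been computed (for example with a cheminformatics toolkit) and adopts the common reading that a compound passes when it violates at most one of the four criteria; the exact convention used in [51] is not specified here, and the class and function names and descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    mol_weight: float      # molecular weight in daltons
    log_p: float           # octanol-water partition coefficient (logP)
    h_bond_donors: int     # number of hydrogen-bond donors
    h_bond_acceptors: int  # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five thresholds are exceeded."""
    return sum([
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: at most one violation is tolerated."""
    return lipinski_violations(d) <= max_violations


# Hypothetical compounds.
small_polar = Descriptors(mol_weight=180.2, log_p=1.2, h_bond_donors=1, h_bond_acceptors=4)
large_lipophilic = Descriptors(mol_weight=650.0, log_p=6.1, h_bond_donors=2, h_bond_acceptors=8)
print(passes_rule_of_five(small_polar), passes_rule_of_five(large_lipophilic))
```

Labels produced this way, paired with the same descriptors or richer molecular features, form the supervised dataset on which classifiers such as RF or XGBoost are then trained.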
Table 2 below shows a quantitative distribution by publisher of the number of articles related to the advancements in ML applications for the Biomedical Industries.
Table 2. Number of articles on the advancements in ML applications for the Biomedical Industries by Publisher
| Publisher | Number of Articles Reviewed |
| --- | --- |
| Springer | 13 |
| Frontiers | 3 |
| IEEE | 3 |
| MDPI | 3 |
| Elsevier | 2 |
| Oxford University Press | 2 |
| Wiley | 2 |
| arXiv (Cornell University) | 1 |
| European Alliance for Innovation (EAI) | 1 |
| European Association of Percutaneous Cardiovascular Interventions (EAPCI) | 1 |
| Massachusetts Medical Society | 1 |
| PLOS | 1 |
| Total | 33 |
Kim et al. (2019) improved biomedical named entity recognition (BioNER) with a bootstrapping approach that combined Conditional Random Fields (CRFs) with LSTM networks, yielding a 23.69% F1-score improvement over traditional methods. Repeatedly labeling a machine-generated corpus substantially improved entity recognition across numerous biomedical sub-domains [53]. Zhang et al. (2019) developed the BioWordVec word embedding model, which addresses shortcomings in biomedical word representations by combining subword information with Medical Subject Headings (MeSH). In biomedical natural language processing (BioNLP) tasks such as relation extraction and semantic similarity computation, their model outperformed existing word embedding methods [54]. Hathaway et al. (2019) used ML algorithms to classify patients with type 2 diabetes mellitus (T2DM) based on cardiac biomarkers and integrative genomics, achieving 84% accuracy with Classification and Regression Trees (CART) and SHapley Additive exPlanations (SHAP). They found meaningful associations of nuclear methylation levels and mitochondrial function with diabetic status and identified potential new biomarkers for diabetes diagnostics [55]. Richens et al. (2020) improved diagnostic precision by reframing medical diagnosis as a counterfactual inference task, which allowed causal ML techniques to outperform both associative algorithms and expert clinicians in detecting rare diseases [56]. Martino et al. (2020) created an ML method that predicts Ki67/MIB1 labeling indices from hematoxylin and eosin-stained sections and identified the mean nuclear hematoxylin optical density as the principal discriminating feature. Their method enabled fast quantitative analysis of tumor growth and enhanced pathology workflows [57]. 
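BioWordVec [54] builds on the fastText approach to subword information, in which a word is represented through the character n-grams of the word wrapped in boundary markers `<` and `>`, so that rare or unseen biomedical terms still receive vectors from n-grams they share with known words. The sketch below extracts such n-grams, assuming the fastText default range of 3 to 6 characters. Hashing n-grams into buckets and learning the vectors are omitted, and the function name is our own.

```python
def subword_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of '<word>' plus the whole bracketed word, fastText-style."""
    token = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    if token not in grams:
        grams.append(token)  # the full word is kept as its own unit
    return grams


# Related biomedical terms share subword units, e.g. the suffix "pathy>".
shared = set(subword_ngrams("nephropathy")) & set(subword_ngrams("neuropathy"))
print(sorted(shared))
```

In the fastText model, a word's vector is the sum of the vectors of these units, so terms such as "nephropathy" and "neuropathy" partially share parameters.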
Hazra and Byun (2020) presented SynSigGAN, a GAN designed to generate synthetic biomedical signals, making it possible to build large and varied datasets while protecting patient privacy. The model outperformed existing methods in producing high-quality electrocardiogram, electroencephalogram, electromyography, and photoplethysmography signals, showing its usefulness for medical education and diagnostics [58]. Marcinkiewicz-Siemion et al. (2020) introduced a new diagnostic panel for heart failure that combines ML with untargeted metabolomics and showed that metabolite-based models reached an accuracy of 0.85, comparable to that of the conventional B-type natriuretic peptide (BNP) biomarker. The study showed how ML can uncover new biomarkers while underlining their clinical importance and the need for further confirmation [59]. Gandouz et al. (2021) addressed biomedical decision-making challenges by introducing asymmetric abstention intervals, which improved classification accuracy with lower rejection rates on imbalanced datasets, particularly in cancer diagnostics. Their work showed that ML models can be improved to manage uncertainty better, increasing reliability in critical medical decisions [60]. Gu et al. (2021) produced Galaxy-ML, a platform that provides scalable, user-friendly integration of multiple ML tools and enhances the accessibility and reproducibility of biomedical ML applications. A benchmark of 4,028 models across 276 datasets identified boosted tree models as top performers, and drug response prediction and deep learning validation applications demonstrated the platform's versatility [61]. Du et al. (2021) introduced a deep learning method designed specifically for analyzing coronary angiography that combines clinical and imaging data from multiple sources to improve diagnostic accuracy. 
The model achieved 98.4% accuracy in coronary segment recognition and F1 scores between 0.802 and 0.854 for lesion morphology detection, improving diagnostic efficiency [62]. Akazawa et al. (2021) applied ML techniques to develop several models for predicting postpartum hemorrhage (PPH) after vaginal delivery from clinical data. The best model achieved a moderate AUC of 0.708 but was limited by high false positive and false negative rates, suggesting that larger datasets and more predictive variables are needed [63]. Feng et al. (2021) used ML to identify substantial fibrosis in patients with non-alcoholic fatty liver disease (NAFLD), outperforming traditional fibrosis tests. The ML algorithm performed significantly better than logistic regression (LR) and traditional biomarkers, with an AUROC of 0.902 in the training cohort and 0.893 in the validation cohort [64]. Kim (2022) used ML on radiomic features from chest X-ray images to distinguish COVID-19 from pneumonia, showcasing the capabilities of automated diagnostic systems in biomedical imaging. Four key radiomic features were identified and analyzed with multiple classifiers, and LGBM achieved the highest AUC of 0.900, showing strong discriminative ability [65]. Bonidia et al. (2022) introduced BioAutoML, an automated pipeline for bioinformatics that handles feature engineering and model selection for predicting bacterial noncoding ribonucleic acids (RNAs). The tool automated feature extraction, algorithm recommendation, and hyperparameter tuning, and it outperformed the established Automated ML (AutoML) frameworks RECIPE and TPOT in both efficiency and predictive accuracy [66]. Fu et al.
(2022) used ML strategies such as the Least Absolute Shrinkage and Selection Operator (LASSO), SVM Recursive Feature Elimination (SVM-RFE), and RFs to identify diagnostic biomarkers for diabetic kidney disease (DKD). By examining differentially expressed genes from microarray datasets, they identified DUSP1 and PRKAR2B as potential diagnostic markers, confirmed them using receiver operating characteristic (ROC) analysis, and linked them to the immune cell infiltration patterns observed in DKD patients [67]. Zeng et al. (2022) developed KV-PLM, a deep learning model that combines molecular structures and biomedical text to improve molecular property prediction and drug discovery. By analyzing Simplified Molecular Input Line Entry System (SMILES) strings and biomedical text together, KV-PLM attained a molecular comprehension accuracy of 0.83, which exceeded the performance of human professionals [68]. Zhang et al. (2022) used approaches such as weighted gene co-expression network analysis (WGCNA) and LASSO to find metabolism-related biomarkers for diabetic nephropathy (DN). They showed that the genes ADI1 and POLR2B are critical to DN development and are linked to immune cell infiltration, pointing to promising avenues for diagnosis and treatment [69]. Akatsuka et al. (2022) combined ultrasound imaging with clinical data and used ML to improve prostate cancer detection. Integrating clinical data with ultrasound images raised the AUC from 0.691 to 0.835, improving the detection of high-grade prostate cancer [70]. Jan et al. (2023) introduced an AI model that merges radiomics and deep learning features from computed tomography (CT) images to classify ovarian tumors as benign or malignant; it achieved 82% accuracy, surpassing junior radiologists and demonstrating AI's potential to improve diagnostic accuracy in medical imaging [71]. 
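LASSO, used for biomarker selection in [67] and [69], performs feature selection because its L1 penalty drives the coefficients of uninformative genes exactly to zero. A minimal coordinate-descent sketch follows. It assumes centered (mean-zero) features and response so that no intercept is needed, minimizes (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1, and uses a tiny hypothetical expression matrix in which only the first gene is related to the outcome; the function names are our own.

```python
def soft_threshold(rho, lam):
    """Proximal operator of the L1 penalty: shrink rho toward zero by lam."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimize (1/(2n))*||y - Xw||^2 + alpha*||w||_1 for centered X and y."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    # Mean squared value of each feature column (assumed nonzero).
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for row, yi in zip(X, y):
                partial_pred = sum(row[k] * w[k] for k in range(p) if k != j)
                rho += row[j] * (yi - partial_pred)
            rho /= n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w


# Hypothetical centered expression levels of two genes in four samples.
X_expr = [[1.0, 1.0], [-1.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
y_outcome = [2.0, -2.0, 4.0, -4.0]  # depends only on gene 0
coef = lasso_coordinate_descent(X_expr, y_outcome, alpha=0.1)
selected = [j for j, wj in enumerate(coef) if wj != 0.0]
print(coef, selected)
```

The retained coefficient (about 1.96 rather than the true 2) is shrunk toward zero, a known bias of the L1 penalty; in practice alpha is usually chosen by cross-validation.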
Rabaglino et al. (2023) applied ML to transcriptomic data from bovine embryos and identified key gene patterns that predict embryonic competence with more than 85% accuracy across different datasets. The study provided important insights into reproductive efficiency and showed how large biological datasets can be combined for prediction tasks [72]. Rana and Bhushan (2023) systematically evaluated ML and deep learning (DL) methods for medical image analysis and found that DL techniques such as CNNs achieved a classification accuracy of 97.6% for MRI-based disease detection [73]. Jungo and Hewer (2023) applied the code-free platforms Microsoft Custom Vision and Google AutoML to histopathology, achieving precision and recall of up to 98.4% in classifying central nervous system tumor images. Their results demonstrated that such ML tools are usable by non-experts and highlighted external validation as a way to avoid overestimating accuracy [74]. Shuryak et al. (2023) used an RF algorithm on radiation-responsive biomarkers and blood cell counts to improve biodosimetry by distinguishing partial-body from total-body radiation exposure. The model achieved an AUROC of 0.944, supporting its use in radiological emergency response [75]. Sun et al. (2023) combined bioinformatics with ML to discover biomarkers shared by chronic obstructive pulmonary disease and atrial fibrillation, identifying cyclin-dependent kinase 8 (CDK8) as a critical biomarker and possible therapeutic target. Using gene co-expression analysis, immune cell infiltration assessment, and drug prediction, the team identified 20 drugs that may target CDK8 [76]. Azari et al.
(2023) used SVM, RF, and k-Nearest Neighbors (KNNs) to analyze The Cancer Genome Atlas (TCGA) data for gastric cancer miRNA biomarker identification, which achieved its highest classification accuracy through SVM at 93%. Through their study, researchers showed how miRNA biomarkers contribute to early detection and prognosis by associating their dysregulation with cancer pathways, including Wnt signaling [77]. Su et al. (2024) investigated few-shot BioNER by transforming it into a machine reading comprehension task and created "grape" demonstrations to boost learning performance. By employing a demonstration-based learning approach, they achieved up to a 1.1% improvement in F1 scores over traditional sequence labeling, which proved their method's viability in situations with limited resources [78]. Wu et al. (2024) applied Bayesian optimization within ML to produce functionally graded ceramic scaffolds for bone regeneration while combining lithography-based ceramic manufacturing and micro-CT analysis to ensure structural integrity. Their method succeeded in producing scaffolds with enhanced biomechanical properties, which led to better bone growth in segmental defects [79]. The research by Islam et al. (2024) introduced an unsupervised ML technique to clean biomedical signals during cardiopulmonary resuscitation (CPR), which supports real-time medical decision-making in emergency situations. The research team's multi-modal framework improved signal fidelity while reducing noise without using labeled data and demonstrated better performance in signal-to-noise and peak signal-to-noise ratios compared to existing methods [80]. Slonopas et al. (2024) demonstrated the application of ML algorithms to biomedical imaging through a ML-based histogram equalization (ML-HE) technique that integrated reservoir computing to improve both image clarity and contrast. 
Their results showed substantial enhancements to image visibility, which supported better diagnostic accuracy and immediate medical decisions [81]. By exploiting both CNNs and KNN models Huan et al. (2024) created a biomedical knowledge graph for symptom phenotype analysis in coronary artery plaque. The researchers attained an AUC score of 92.5%, which helped them pinpoint essential symptom connections along with central genes and molecular pathways that relate to both inflammatory responses and lipid regulation [82]. He et al. (2024) used a U-Net CNN architecture to perform human tissue classification within biomedical inverse scattering studies by applying subspace optimization techniques for acquiring dielectric permittivity distributions. The team validated their findings using synthetic data, which demonstrated precise tissue classification and highlighted deep learning potential for medical imaging applications [83]. Lehmann et al. (2024) established a ML method to identify hypoglycemic events in diabetes patients during driving through analysis of driving behavior and gaze/head motion tracking. The model obtained an AUROC score of 0.80 ± 0.11, which confirmed that noninvasive detection techniques can improve driving safety and diabetes self-management [84]. Mercaldo et al. (2024) focused on Extreme Learning Machine (ELM) for biomedical image classification, showing its advantages in cost efficiency compared to deep learning networks. The research confirmed ELM managed to deliver similar predictive outcomes as other methods while offering substantial training cost savings, thus establishing it as a practical choice for biomedical image analysis [85].
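The ML-HE technique of Slonopas et al. builds on histogram equalization. Their reservoir-computing component is not reproduced here; as an assumption-labeled sketch, the snippet shows only the classic global histogram equalization that such methods extend, which maps each gray level through the normalized cumulative histogram so that crowded intensities are spread across the available range. The function name and the flat-list image representation are illustrative choices.

```python
from collections import Counter


def equalize_histogram(pixels, levels=256):
    """Classic global histogram equalization for a grayscale image.

    `pixels` is a flat list of integer gray levels in [0, levels - 1].
    Each level v is mapped to
        round((cdf(v) - cdf_min) / (N - cdf_min) * (levels - 1)),
    where cdf is the cumulative pixel count and cdf_min is the smallest
    cdf value. This is the textbook baseline, not the ML-HE method.
    """
    if not pixels:
        return []
    n = len(pixels)
    counts = Counter(pixels)
    # Cumulative count for every gray level that occurs in the image.
    cdf = {}
    running = 0
    for level in sorted(counts):
        running += counts[level]
        cdf[level] = running
    cdf_min = cdf[min(cdf)]
    if n == cdf_min:  # constant image: nothing to stretch
        return list(pixels)
    scale = (levels - 1) / (n - cdf_min)
    # Round half up so results do not depend on banker's rounding.
    mapping = {v: int((c - cdf_min) * scale + 0.5) for v, c in cdf.items()}
    return [mapping[p] for p in pixels]


# A low-contrast toy "image" squeezed into gray levels 50-52.
print(equalize_histogram([50, 50, 51, 52]))  # [0, 0, 128, 255]
```

Global equalization applies one mapping to the whole image; learning-based variants such as ML-HE aim to adapt this mapping, which is where an ML component can add value.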
Table 3 below shows the number of reviewed articles on advancements in ML applications for the healthcare industries, broken down by publisher.
Table 3. Number of articles on the advancements in ML applications for the Healthcare Industries by Publisher
Publisher | Number of Articles Reviewed
Springer | 10
Elsevier | 8
PLOS | 6
IEEE | 3
JMIR Publications | 2
European Modern Studies Journal (EMSJ) | 1
EWA Publishing | 1
IOS Press | 1
Oxford University Press | 1
Total | 33
Venkatesh et al. (2019) developed a Big Data Predictive Analytics Model based on the Naïve Bayes technique, using Hadoop-Spark for big data processing, which reached 97.12% accuracy in heart disease prediction. By analyzing extensive datasets, the model enabled early detection and improved health outcomes [86]. Ramkumar et al. (2019) validated a remote patient monitoring system for total knee arthroplasty (TKA) that combined wearable devices with mobile health applications. The team achieved complete, continuous data collection throughout the study; patients' mobility improved by 30% at three months after surgery, and engagement with the technology remained high, demonstrating the importance of ML in post-surgical recovery and rehabilitation [87]. In 2019, Myers and colleagues used ML to analyze EHRs and create the FIND FH model, which identifies familial hypercholesterolemia (FH). The model identified 1.3 million patients at high risk for FH with 85% precision, facilitating targeted clinical evaluation and intervention [88]. Maarseveen et al. (2020) developed an ML workflow for identifying rheumatoid arthritis patients from EHRs. Across two datasets, the model achieved F1 scores of 0.83 and 0.82 and worked effectively across languages and healthcare environments, demonstrating ML's capability to support efficient cohort studies [89]. In 2020, a research team led by Du created a coronary heart disease prediction model by applying the XGBoost algorithm to EHRs from a cohort of 42,000 hypertensive patients. The model outperformed traditional risk scales with an AUC of 0.943, highlighting the potential of big data to improve the accuracy of cardiovascular disease prediction [90].
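Venkatesh et al. base their heart disease model on the Naïve Bayes technique. The snippet is a minimal single-machine sketch of a categorical Naïve Bayes classifier with Laplace smoothing; it does not reproduce the authors' Hadoop-Spark pipeline, and the class name, feature names, and toy records are hypothetical. Each class is scored by its prior probability multiplied by per-feature likelihoods, under the assumption that features are conditionally independent given the class, and the arithmetic is done in log space to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.n = len(labels)
        self.class_counts = Counter(labels)
        # counts[(feature_index, class)][value] = number of training rows
        self.counts = defaultdict(Counter)
        # values[feature_index] = set of values seen for that feature
        self.values = defaultdict(set)
        for row, label in zip(rows, labels):
            for j, value in enumerate(row):
                self.counts[(j, label)][value] += 1
                self.values[j].add(value)
        return self

    def log_scores(self, row):
        """Unnormalized log posterior: log P(c) + sum_j log P(x_j | c)."""
        scores = {}
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n)
            for j, value in enumerate(row):
                k = len(self.values[j] | {value})  # distinct values, incl. unseen
                count = self.counts[(j, c)][value]
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            scores[c] = score
        return scores

    def predict_proba(self, row):
        scores = self.log_scores(row)
        top = max(scores.values())  # subtract the max for numerical stability
        weights = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical records: (chest_pain, high_blood_pressure) -> heart disease (1/0).
rows = [("yes", "yes"), ("yes", "no"), ("yes", "yes"),
        ("no", "no"), ("no", "no"), ("no", "yes")]
labels = [1, 1, 1, 0, 0, 0]
model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("yes", "yes")))  # 1
print(model.predict_proba(("yes", "yes"))[1])  # about 0.857
```

Because training reduces to counting, the same computation can be expressed as distributed count aggregations, which is one reason Naïve Bayes pairs naturally with big data frameworks.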
El-Ganainy and colleagues (2020) presented a real-time clinical decision support system that uses Hierarchical Temporal Memory (HTM) and LSTM models to predict mean arterial pressure in critically ill patients. The system outperformed traditional models, improving predictive accuracy by 20% while significantly reducing decision-to-event time [91]. In 2020, Artzi and colleagues used ML to analyze EHRs from more than 588,000 pregnancies and predicted gestational diabetes mellitus (GDM) with an AUROC of 0.85. The model outperformed traditional risk assessment tools and identified new risk factors, including glucose challenge test results from previous pregnancies, which improved early detection of GDM and enabled early-stage interventions [92]. Philpott-Morgan et al. (2021) forecast missed outpatient appointments within the National Health Service (NHS) by applying gradient boosting machines (GBMs) to hospital episode statistics from 2016 through 2018. Their model identified age and previous missed appointments as key predictors, achieved a sensitivity of 28.7%, and underscored the importance of targeted interventions such as personalized reminders [93]. Liu et al. (2021) showed how ML improved bipolar disorder screening with the EarlyDetect tool, which achieved a balanced accuracy of 80.6% and outperformed the traditional Mood Disorder Questionnaire in both sensitivity and specificity, demonstrating that ML can substantially improve mental health screening [94]. Guo et al. (2021) investigated how ML models can predict mortality in patients with liver cirrhosis from EHR data, using techniques such as DNNs, RFs, and LR to evaluate mortality risk over different time horizons. The DNN model outperformed the conventional MELD-Na score, reaching an AUC of 0.88 for 90-day mortality [95].
In 2021, Estiri and colleagues created the MLHO framework, which predicts adverse outcomes in COVID-19 patients from their medical records. Analyzing more than 600 features across 13,000 patient records, the model achieved high predictive accuracy, including an AUC of 0.91 for mortality. The study demonstrated the need to integrate demographic and clinical data and identified age as a fundamental determinant of severe outcomes [96]. Zeng and colleagues (2021) created an ensemble ML model to estimate in-hospital mortality among intensive care unit (ICU) patients with sepsis. The ensemble combined nine ML models trained on the eICU Collaborative Research Database (eICU-CRD) and was tested on the Medical Information Mart for Intensive Care III (MIMIC-III) database. It outperformed traditional severity scores such as the Simplified Acute Physiology Score II (SAPS II) and the Sequential Organ Failure Assessment (SOFA), reaching an AUROC of 0.806 [97]. Shahbandegan and colleagues (2022) created an ML algorithm to forecast the need for CT imaging in emergency departments from triage data covering 81,118 patient encounters. The model attained an AUROC of 0.86, enabling better resource allocation and patient flow management, with patient complaints and triage acuity as the main determinants of CT decisions [98]. In a 2022 study, Xi et al. compared the performance of multiple ML algorithms with LR for cardiovascular disease risk prediction in 143,043 hypertensive patients. An ensemble of RF, XGBoost, and deep learning achieved an AUROC of 0.760, surpassing LR and demonstrating the predictive potential of ML for cardiovascular risk assessment and preventive care [99].
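Studies in this group report different threshold-based metrics: Philpott-Morgan et al. quote sensitivity (28.7%) and Liu et al. quote balanced accuracy (80.6%). The sketch below shows how these quantities are derived from a confusion matrix; the function name and label vectors are invented for illustration. Balanced accuracy is the mean of sensitivity and specificity, so, unlike plain accuracy, it is not inflated when one class, such as attended appointments, dominates the data.

```python
def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity, balanced accuracy, and plain accuracy.

    Labels are binary: 1 = condition present (e.g., a missed appointment),
    0 = condition absent.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical, imbalanced data: 2 positives, 8 negatives, and a model
# that always predicts "negative".
m = screening_metrics([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0] * 10)
print(m["accuracy"], m["balanced_accuracy"])  # 0.8 0.5
```

The always-negative model looks 80% accurate yet has zero sensitivity; balanced accuracy exposes this at 0.5, which is chance level.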
Chen and Chen (2022) investigated the use of synthetic patient data within an ML-supported Learning Health System (LHS) for predicting lung cancer and stroke risk. Recall in lung cancer risk prediction rose from 0.849 to 0.936 as the dataset grew, showing that healthcare delivery can be enhanced by progressively improving the model as new patient data are added [100]. Lazzarini et al. (2022) developed an ML model to predict Acute Respiratory Distress Syndrome (ARDS) progression in COVID-19 patients, training it on data from 289,351 individuals. The LightGBM model outperformed clinical predictions with an AUC of 0.695 and identified age, diabetes, and hypertension as significant risk factors [101]. Liao et al. (2022) used ML to determine which chronic stroke patients would experience improved health-related quality of life (HRQOL) after rehabilitation interventions. Among the algorithms evaluated, RF reached an accuracy of 85% and KNNs 82.5%, using baseline HRQOL scores and muscle function as predictors [102]. Uddin and colleagues (2022) used ML techniques, including XGBoost, to predict the co-occurrence of major chronic diseases in a study of comorbidity and multimorbidity. Their model achieved an accuracy of 95.05%, with patient trajectory episode counts and patient network transitivity among the key indicators [103].
The Feature Engineering Automation Tool (FEAT), presented by La Cava et al. (2023), builds interpretable clinical prediction models from EHR data without sacrificing accuracy. Applied to data from 1,200 patients with different types of hypertension, FEAT produced models that were more compact and more accurate than those of traditional approaches, and their interpretability and scalability enhance their potential for clinical application [104]. Langenberger et al. (2023) used ML to identify patients with high healthcare costs from healthcare claims data. Comparing several algorithms, including RF and gradient boosting, they found that tree-based models significantly outperformed ANNs and LR in terms of AUC, showing how ML can help healthcare organizations predict costs and allocate resources more effectively [105]. Pasieczna et al. (2023) examined frailty syndrome in heart failure patients through ML analysis of psychosocial and physical information collected with the Tilburg Frailty Indicator. Psychological aspects such as mood and irritability proved more important than physical aspects in diagnosing frailty, and the authors recommended that clinicians take psychological health into account when assessing frailty [106]. Caratsch and colleagues (2023) introduced an end-to-end ML solution for automated detection of hand osteoarthritis on radiographs that allows clinicians without programming skills to build predictive models on no-code platforms. The system achieved high diagnostic accuracy in rheumatological diseases by combining genetic data with AI algorithms and reached a 92% success rate in predicting knee osteoarthritis severity [107]. Limketkai et al. (2023) used ML to classify patients with inflammatory bowel disease (IBD) into three categories based on their healthcare usage patterns, with accuracies of 81-85%. These models outperformed traditional approaches and provided important information for resource allocation and patient care management [108]. Kwak et al. (2023) examined sex-specific cardiovascular risk factors using an RF model that predicted the 10-year development of atherosclerotic cardiovascular disease (ASCVD). Risk factors differed between men and women: total cholesterol was more strongly linked to ASCVD risk in men, whereas waist circumference was more important in women. The model achieved AUCs of 0.733 in men and 0.769 in women, demonstrating its capability for individualized risk assessment [109]. Liu et al. (2023) integrated XGBoost with Cox models to discover new predictors of post-menopausal breast cancer risk in UK Biobank data. ML methods, including XGBoost, efficiently selected features from an extensive predictor set, supporting risk prediction, although augmenting the models with the novel features did not substantially improve their performance [110].
Wang et al. (2024) combined RF and Support Vector Regression with traditional statistical forecasting methods to study trends in US healthcare expenditure. The study documented the continuing escalation of healthcare costs and recommended policy measures to address the economic burden, providing a foundation for future research [111]. Biswas et al. (2024) used an ML model to examine how patient ethnicity, socio-economic deprivation, and existing health conditions influence the length of hospital stay after cervical decompression surgery. The results indicated that socio-economic factors play a critical role in healthcare outcomes in public health facilities and that ML applications could improve both resource management and patient care [112]. Zhou (2024) investigated how ML algorithms such as KNN, recurrent neural networks (RNNs), CNNs, and GANs serve various healthcare fields, including medical imaging, heart disease prediction, and eye health management. The study found that these algorithms improve diagnostic precision and service quality but face significant challenges related to data quality and patient privacy [113]. Whereas other studies explored specific ML applications, Ali et al. (2024) examined multiple ML algorithms to determine their effectiveness in predicting health outcomes and improving healthcare services, finding RF and KNN to be highly effective tools for optimizing healthcare operations, personalizing treatment, and improving administrative efficiency [114]. Moradpour et al. (2024) developed MOOF, a multi-objective optimization framework that improves clinical diagnostic ML models by balancing accuracy, sensitivity, and specificity. By combining the Non-dominated Sorting Genetic Algorithm II (NSGA-II), which searches for trade-off solutions, with the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), which ranks them, the framework produced solutions that improved model performance for healthcare decision-making beyond traditional methods [115]. Bhute et al. (2024) studied disease detection in smart cities, using ML algorithms such as Gradient Boosting, CatBoost, and KNN for early identification of heart disease and diabetes. CatBoost and Gradient Boosting performed best for heart disease detection, with accuracies of 89.1% and 88.6%, respectively, whereas KNN performed less well for diabetes detection, at 77.9% accuracy [116]. Ismukhamedova et al. (2024) combined ML with electronic health passports to improve diabetes detection and optimize resources, identifying GBM as the top-performing model and finding that deep learning with RNNs improved diagnostic precision [117]. Finally, Sevukamoorthy and colleagues (2024) applied ML and GANs to cancer detection, automating risk assessment and identifying biomarkers through analysis of healthcare data and imaging scans. Their method improved the accuracy and speed of cancer detection, supporting personalized treatment plans and timely medical interventions [118].
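Moradpour et al. pair NSGA-II with TOPSIS to choose among candidate models that trade accuracy, sensitivity, and specificity against one another. The NSGA-II search is omitted here; as a minimal sketch, assuming the candidates have already been scored, TOPSIS vector-normalizes each criterion, applies weights, and ranks candidates by their relative closeness to an ideal point (the best value on every criterion) versus an anti-ideal point (the worst). The candidate scores and equal weights are hypothetical, and this is not the MOOF implementation.

```python
import math


def topsis(matrix, weights, benefit):
    """Rank alternatives with TOPSIS.

    matrix:  rows = alternatives, columns = criteria
    weights: one weight per criterion
    benefit: True if larger is better for that criterion, False for a cost
    Returns the closeness coefficient of each row (higher is better).
    """
    n_criteria = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti_ideal = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    closeness = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti_ideal)
        total = d_best + d_worst
        closeness.append(d_worst / total if total > 0 else 0.5)
    return closeness


# Hypothetical candidate models: [accuracy, sensitivity, specificity].
candidates = [
    [0.90, 0.80, 0.85],
    [0.80, 0.70, 0.75],
    [0.88, 0.92, 0.80],
]
scores = topsis(candidates, [1 / 3] * 3, [True, True, True])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [2, 0, 1]
```

The third candidate wins despite slightly lower accuracy because its large sensitivity gain brings it closest to the ideal point; changing the weights shifts this trade-off, which is the lever a diagnostic framework can tune.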
CONCLUSIONS
This paper examines how ML has brought transformative changes to numerous industries, with special emphasis on healthcare and pharmaceuticals. ML-based biomarker identification has achieved high classification performance for conditions such as gastric cancer and diabetic kidney disease, with SVM models reaching up to 93% accuracy. Integrating ML into drug discovery helps identify promising drug candidates and improves the prediction of adverse reactions. Tools such as Vaxign-ML and deep learning frameworks for molecular property prediction show a promising role in clinical translation. BioAutoML and similar automated learning systems streamline feature selection and model selection, delivering better efficiency and predictive accuracy than traditional methods. Continued research is needed to ensure that ML technologies meet regulatory requirements and uphold ethical medical standards. Looking ahead, combining wearable technology with predictive analytics is expected to enable continuous patient monitoring alongside personalized medical treatment. As ML advances, AI ethics and responsible AI principles become increasingly important, calling for standardized guidelines to ensure fair application in healthcare. ML is transforming healthcare by enabling more effective diagnostic tools, treatment methods, and patient care strategies across the pharmaceutical and biomedical sectors. Further validation of these technologies will be essential to maximize their benefits while resolving ethical concerns.
REFERENCES
Parankush Koul*, Dr. Indu B. Koul, Advancements in Machine Learning Applications for The Pharmaceutical, Biomedical, And Healthcare Industries, Int. J. of Pharm. Sci., 2025, Vol 3, Issue 4, 1548-1580. https://doi.org/10.5281/zenodo.15204262