Social media contains information about patients' health issues, including medication side effects and unsuccessful treatments. Patient reports of adverse drug events (ADEs) on social media have the potential to significantly enhance current pharmacovigilance procedures. Obtaining these reports, however, remains a challenge in health informatics. In this study, we develop a research framework that applies advanced natural language processing techniques to integrated, high-performance ADE extraction. The framework consists of medical entity extraction, ADE detection using shortest dependency path kernel-based statistical learning, semantic filtering using medical knowledge bases, and report source classification to reduce noise. Experiments were conducted using posts from major U.S.-based diabetes and heart disease forums. The results show that each component significantly improves overall effectiveness and that the framework significantly outperforms previous methods.
Keywords
Medical Knowledge Base, Semantic Filtering, Medical Entity Extraction
Introduction
In recent years, more patients have shared their healthcare experiences online, creating a "cloud of patient experience." Social platforms such as blogs and forums allow patients to share diagnoses, treatments, medications, and side effects, especially for chronic conditions like hypertension, diabetes, and heart disease. Patient self-reports frequently highlight medical issues and drug side effects that clinicians overlook. If unrecorded, these issues can lead to non-compliance and preventable adverse events. Mining such data from social media is a novel way to capture evidence about drug effectiveness, compliance, and safety, providing insights that are often missed in clinical settings. An adverse drug reaction (ADR) is defined as “a harmful or unpleasant reaction resulting from the use of a medicinal product,” warranting intervention such as treatment modification or drug withdrawal.
Related Work
Pharmacovigilance in Health Social Media
Social media has become a crucial source for pharmacovigilance, where users often report adverse drug reactions that may not be officially documented.
Biomedical Relation Extraction
Extracting biomedical relations (e.g., gene-disease or protein interactions) has been extensively studied. Methods include co-occurrence analysis, rule-based systems, statistical learning, and hybrid approaches.
Research Gaps and Questions
Key issues identified include:
Limited use of advanced statistical learning in social media ADE research.
Over-reliance on co-occurrence analysis, which misses syntactic/semantic context.
Difficulty in distinguishing true patient experiences from noise or third-party narratives.
Research Questions:
How can we create a scalable framework for mining patient-reported ADEs?
Can semantic filtering and statistical learning improve ADE extraction?
How can we isolate true patient-reported ADEs in noisy data?
Fig. 1
Research Method
Data Collection
An automated crawler and extractor were built to collect forum data including post IDs, URLs, authors, dates, and content.
Data Preprocessing
Text was cleaned with regular expressions that removed URLs, personal data, and excess punctuation. Sentence segmentation was performed using OpenNLP.
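The cleaning step described above can be illustrated with a minimal Python sketch. The patterns below are assumptions chosen for illustration, not the expressions used in the study, and the regex-based sentence splitter is only a simple stand-in for the OpenNLP segmenter named in the text.

```python
import re
from typing import List

# Illustrative patterns only; the study's actual expressions are not given.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")  # one kind of personal data
REPEAT_PUNCT_RE = re.compile(r"([!?.,])\1+")             # "!!!" -> "!"
WHITESPACE_RE = re.compile(r"\s+")
SENT_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")             # naive sentence boundary


def clean_post(text: str) -> str:
    """Remove URLs and e-mail addresses, collapse repeated punctuation and spaces."""
    text = URL_RE.sub(" ", text)
    text = EMAIL_RE.sub(" ", text)
    text = REPEAT_PUNCT_RE.sub(r"\1", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows sentence-final punctuation (a rough stand-in)."""
    return [s for s in SENT_SPLIT_RE.split(text) if s]


if __name__ == "__main__":
    post = "Metformin made me dizzy!!!  See http://example.com/page now."
    cleaned = clean_post(post)
    print(cleaned)
    print(split_sentences(cleaned))
```

A trained segmenter handles abbreviations and informal punctuation far better than this splitter, which is one reason to prefer a dedicated tool for forum text.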
Fig.2
Adverse Drug Event Extraction
Forum discussions use informal language, requiring a hybrid of machine learning and rule-based methods. We use statistical learning for relation detection and semantic rules to filter drug indications and negated ADEs.
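The relation detection step relies on the shortest dependency path between a drug mention and a medical event mention. The sketch below shows only how such a path can be found with breadth-first search over a dependency parse treated as an undirected graph. The edge list is hand-written for a toy sentence (a real system would take it from a dependency parser), tokens are assumed to be unique within the sentence, and the kernel computation itself is not shown.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence, Tuple


def shortest_dependency_path(
    edges: Sequence[Tuple[str, str]], source: str, target: str
) -> Optional[List[str]]:
    """Return the shortest token path between two mentions, or None if unconnected."""
    # Build an undirected adjacency list from (head, dependent) pairs.
    graph: Dict[str, List[str]] = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)
    if source not in graph or target not in graph:
        return None

    previous: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the predecessor links to recover the path.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    # Toy parse of "Lipitor gave me severe muscle pain" as (head, dependent) pairs.
    toy_edges = [
        ("gave", "Lipitor"),
        ("gave", "me"),
        ("gave", "pain"),
        ("pain", "severe"),
        ("pain", "muscle"),
    ]
    print(shortest_dependency_path(toy_edges, "Lipitor", "pain"))
```

The path keeps the words that connect the two mentions syntactically and drops modifiers such as "severe", which is what makes it a compact feature for a relation classifier.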
Semantic Filtering
Our algorithm filters out false positives using drug safety databases and negation detection tools, improving precision.
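A minimal sketch of this kind of filtering is given below, under stated assumptions: `KNOWN_INDICATIONS` is a hypothetical toy dictionary standing in for a drug safety database, and the negation check is a simple cue-word window rather than a dedicated negation detection tool. It drops candidate drug-event pairs where the event is a condition the drug treats, or where the event is negated in the sentence.

```python
from typing import Dict, List, Set, Tuple

Candidate = Tuple[str, str, str]  # (sentence, drug, event)

# Hypothetical stand-in for a drug knowledge base: drug -> conditions it treats.
KNOWN_INDICATIONS: Dict[str, Set[str]] = {
    "metformin": {"diabetes", "high blood sugar"},
    "lisinopril": {"hypertension", "high blood pressure"},
}
# Assumed cue list; real negation tools use richer rules.
NEGATION_CUES = ("no", "not", "never", "without", "didn't", "don't")


def is_negated(sentence: str, event: str, window: int = 3) -> bool:
    """True if a negation cue appears within `window` tokens before the event."""
    tokens = [t.strip(".,!?") for t in sentence.lower().split()]
    event_tokens = event.lower().split()
    for i in range(len(tokens) - len(event_tokens) + 1):
        if tokens[i:i + len(event_tokens)] == event_tokens:
            preceding = tokens[max(0, i - window):i]
            if any(t in NEGATION_CUES for t in preceding):
                return True
    return False


def semantic_filter(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates that are neither drug indications nor negated events."""
    kept = []
    for sentence, drug, event in candidates:
        if event.lower() in KNOWN_INDICATIONS.get(drug.lower(), set()):
            continue  # the event is what the drug treats, not an adverse event
        if is_negated(sentence, event):
            continue  # the patient reports that the event did not occur
        kept.append((sentence, drug, event))
    return kept


if __name__ == "__main__":
    examples = [
        ("Metformin gave me nausea.", "metformin", "nausea"),
        ("I take metformin for diabetes.", "metformin", "diabetes"),
        ("Lisinopril did not cause any cough.", "lisinopril", "cough"),
    ]
    for kept in semantic_filter(examples):
        print(kept)
```

Only the first example survives the filter: the second pair is an indication and the third is negated.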
H2: Report source classification improves identification of true ADE reports.
Experiments and Results
Research Test Bed
We gathered data from leading U.S. forums such as:
American Diabetes Association
Diabetes Forums
MedHelp’s Heart Disease boards
These platforms support patients managing chronic conditions through community interaction.
Evaluation Metrics
Standard metrics (Precision, Recall, F1-Score) were used to evaluate the system's performance.
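For reference, the three metrics can be computed from a set of predicted drug-event pairs and a set of gold-standard pairs as in the sketch below. The tuple representation of a pair and the example pairs are assumptions made for illustration.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (drug, event)


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Precision = TP / predicted, recall = TP / gold, F1 = their harmonic mean."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    predicted = {("metformin", "nausea"), ("metformin", "diabetes"), ("lipitor", "muscle pain")}
    gold = {("metformin", "nausea"), ("lipitor", "muscle pain"), ("lisinopril", "cough")}
    print(precision_recall_f1(predicted, gold))
```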
Experiments
The experiments covered three main tasks:
Medical Entity Extraction
ADE Extraction
Report Source Classification
5-fold cross-validation was used. In each fold, the model was trained on 80% of the labeled data and tested on the remaining 20%.
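The 80/20 split follows directly from using five folds: each fold is held out once for testing while the other four are used for training. A minimal sketch using only the standard library is shown below; the shuffling seed and the round-robin fold assignment are assumptions, not details from the study.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(
    items: Sequence[T], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) lists; each item appears in exactly one test fold."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = [items[i] for i in folds[held_out]]
        train = [
            items[i]
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for i in fold
        ]
        yield train, test


if __name__ == "__main__":
    sentences = [f"sentence-{n}" for n in range(400)]
    for train, test in k_fold_splits(sentences):
        print(len(train), len(test))  # 320 training and 80 test items per fold
```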
ADE Extraction
400 annotated sentences per forum were used to evaluate drug-medical event relations within single sentences.
Results and Discussion
Fig 3
ADE Extraction
We compared:
Co-occurrence method (CO)
Statistical learning (SL)
SL + Semantic Filtering (SL+SF)
SL+SF performed best across all metrics.
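To make the comparison concrete, the co-occurrence baseline (CO) can be sketched as follows: every drug and every medical event mentioned in the same sentence are paired, with no use of syntax or semantics. The lexicons and the substring matching are simplifying assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def co_occurrence_pairs(
    sentence: str, drugs: Sequence[str], events: Sequence[str]
) -> List[Tuple[str, str]]:
    """Pair every lexicon drug with every lexicon event found in the sentence."""
    text = sentence.lower()
    found_drugs = [d for d in drugs if d.lower() in text]
    found_events = [e for e in events if e.lower() in text]
    return [(d, e) for d in found_drugs for e in found_events]


if __name__ == "__main__":
    sentence = "I take metformin for diabetes but it gives me nausea."
    print(co_occurrence_pairs(sentence, ["metformin", "lipitor"], ["diabetes", "nausea"]))
```

In this example the baseline pairs metformin with both diabetes and nausea, even though diabetes is the condition being treated. That false positive is the kind of error that relation-level learning and semantic filtering are meant to remove.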
Fig.4
Hypothesis Testing
We conducted one-tailed t-tests on 50 bootstrap samples. The results supported all hypotheses, with statistically significant improvements from SL and SL+SF.
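One way such a test can be set up is sketched below. It makes several assumptions not stated in the text: the test items are resampled with replacement, both systems are scored on the same resamples (a paired design), and per-item correctness (accuracy) is used as the score for simplicity instead of F1. The resulting t statistic would then be compared against the one-tailed critical value of Student's t distribution with n - 1 degrees of freedom.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def bootstrap_scores(
    correct_a: Sequence[int], correct_b: Sequence[int], n_samples: int = 50, seed: int = 0
) -> Tuple[List[float], List[float]]:
    """Resample item indices with replacement and score both systems on each resample."""
    rng = random.Random(seed)
    n = len(correct_a)
    scores_a, scores_b = [], []
    for _ in range(n_samples):
        sample = [rng.randrange(n) for _ in range(n)]
        scores_a.append(sum(correct_a[i] for i in sample) / n)
        scores_b.append(sum(correct_b[i] for i in sample) / n)
    return scores_a, scores_b


def paired_t_statistic(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for the paired differences d = a - b."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = statistics.stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (spread / math.sqrt(len(diffs)))


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical per-item outcomes (1 = correct) for two systems on 200 items.
    system_a = [1 if rng.random() < 0.80 else 0 for _ in range(200)]
    system_b = [1 if rng.random() < 0.65 else 0 for _ in range(200)]
    a_scores, b_scores = bootstrap_scores(system_a, system_b)
    print(paired_t_statistic(a_scores, b_scores))
```

A large positive t supports the one-sided hypothesis that the first system scores higher than the second.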
Fig. 5
Conclusions and Contributions
Social media offers unfiltered, real-time insights into patient healthcare experiences. Our framework:
Outperforms previous extraction methods such as co-occurrence analysis
Leverages machine learning and semantic filtering
Identifies true patient reports in noisy data
This contributes to safer, patient-informed healthcare by enhancing adverse drug event detection in informal digital environments.
Conflict of Interest
The authors declare that there are no conflicts of interest.
References
A. R. Miller and C. Tucker, "Active social media management: the case of health care," Information Systems Research, vol. 24, no. 1, pp. 52–70, 2013.
J. J. Mao, A. Chung, A. Benton, S. Hill, L. Ungar, C. E. Leonard, S. Hennessy, and J. H. Holmes, "Online discussion of drug side effects and discontinuation among breast cancer survivors," Pharmacoepidemiology and Drug Safety, vol. 22, no. 3, pp. 256–262, 2013.
E. Basch, "The missing voice of patients in drug-safety reporting," New England Journal of Medicine, vol. 362, no. 10, pp. 865–869, 2010.
M. Hauben and A. Bate, "Decision support methods for the detection of adverse events in post-marketing data," Drug Discovery Today, vol. 14, no. 7, pp. 343–357, 2009.
A. Bate and S. Evans, "Quantitative signal detection using spontaneous ADR reporting," Pharmacoepidemiology and Drug Safety, vol. 18, no. 6, pp. 427–436, 2009.
I. R. Edwards and M. Lindquist, "Social media and networks in pharmacovigilance," Drug Safety, vol. 34, no. 4, pp. 267–271, 2011.
R. Chunara, J. R. Andrews, and J. S. Brownstein, "Social and news media enable estimation of epidemiological patterns early in the 2010 Haitian cholera outbreak," American Journal of Tropical Medicine and Hygiene, vol. 86, no. 1, pp. 39–45, 2012.
R. Harpaz, W. DuMouchel, N. H. Shah, D. Madigan, P. Ryan, and C. Friedman, "Novel data-mining methodologies for adverse drug event discovery and analysis," Clinical Pharmacology & Therapeutics, vol. 91, no. 6, pp. 1010–1021, 2012.
R. Leaman, L. Wojtulewicz, R. Sullivan, A. Skariah, J. Yang, and G. Gonzalez, "Towards internet-age pharmacovigilance: extracting adverse drug reactions from user posts to health-related social networks," in Proc. 2010 Workshop on Biomedical Natural Language Processing, Association for Computational Linguistics, pp. 117–125, 2010.
A. Benton, L. Ungar, S. Hill, S. Hennessy, J. Mao, A. Chung, C. E. Leonard, and J. H. Holmes, "Identifying potential adverse effects using the web: a new approach to medical hypothesis generation," Journal of Biomedical Informatics, vol. 44, no. 6, pp. 989–996, 2011.
A. Nikfarjam and G. H. Gonzalez, "Pattern mining for extraction of mentions of adverse drug reactions from user comments," AMIA Annual Symposium Proceedings, vol. 2011, p. 1019, 2011.
A. Yates and N. Goharian, "ADRTrace: detecting expected and unexpected adverse drug reactions from user reviews on social media sites," in Advances in Information Retrieval, Springer, pp. 816–819, 2013.
X. Liu and H. Chen, "AZDrugMiner: an information extraction system for mining patient-reported adverse drug events in online patient forums," in Smart Health, Springer, pp. 134–150, 2013.
J. Bian, U. Topaloglu, and F. Yu, "Towards large-scale Twitter mining for drug-related adverse events," in Proc. 2012 Int. Workshop on Smart Health and Wellbeing, ACM, pp. 25–32, 2012.
A. Sarker and G. Gonzalez, "Portable automatic text classification for adverse drug reaction detection via multi-corpus training," Journal of Biomedical Informatics, vol. 53, pp. 196–207, 2015.
I. Segura-Bedmar, P. Martínez, R. Revert, and J. Moreno-Schneider, "Exploring Spanish health social media for detecting drug effects," BMC Medical Informatics and Decision Making, vol. 15, Suppl. 2, S6, 2015.
B. W. Chee, R. Berlin, and B. Schatz, "Predicting adverse drug events from personal health messages," AMIA Annual Symposium Proceedings, p. 217, 2011.
H. Wu, H. Fang, S. Stanhope, et al., "Exploiting online discussions to discover unrecognized drug side effects," Methods of Information in Medicine, vol. 52, no. 2, pp. 152–159, 2013.
D. A. Lindberg, B. L. Humphreys, and A. T. McCray, "The unified medical language system," Methods of Information in Medicine, vol. 32, no. 4, pp. 281–291, 1993.
J. Hadzi-Puric and J. Grmusa, "Automatic drug adverse reaction discovery from parenting websites using disproportionality methods," in Proc. 2012 Int. Conf. on Advances in Social Networks Analysis and Mining (ASONAM 2012), IEEE Computer Society, pp. 792–797, 2012.
M. Kuhn, M. Campillos, I. Letunic, L. J. Jensen, and P. Bork, "A side effect resource to capture phenotypic effects of drugs," Molecular Systems Biology, vol. 6, no. 1, p. 343, 2010.
C. C. Yang, H. Yang, L. Jiang, and M. Zhang, "Social media mining for drug safety signal detection," in Proc. 2012 Int. Workshop on Smart Health and Wellbeing, pp. 33–40, 2012.
H. Gurulingappa, L. Toldo, A. M. Rajput, J. A. Kors, A. Taweel, and Y. Tayrouz, "Automatic detection of adverse events to predict drug label changes using text and data mining techniques," Pharmacoepidemiology and Drug Safety, vol. 22, no. 11, pp. 1189–1194, 2013.
Q.-C. Bui, S. Katrenko, and P. M. Sloot, "A hybrid approach to extract protein–protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259–265, 2011.
K. Fundel, R. Küffner, and R. Zimmer, "Relex–relation extraction using dependency parse trees," Bioinformatics, [complete details needed
Sagar Saini, Harsh Kumar, Mohit Rana, A Review on Identification and Evaluation of Patient Adverse Drug Event Report, Int. J. of Pharm. Sci., 2025, Vol 3, Issue 6, 5432-5438. https://doi.org/10.5281/zenodo.15758051