Scholarly open access journals, Peer-reviewed, and Refereed Journals, Impact factor 8.14 (Calculate by google scholar and Semantic Scholar | AI-Powered Research Tool) , Multidisciplinary, Monthly, Indexing in all major database & Metadata, Citation Generator, Digital Object Identifier(DOI)
Stroke remains one of the leading causes of mortality and long-term neurological disability worldwide. Timely and accurate prediction of stroke risk is essential for implementing preventive strategies and optimizing patient care. In this study, a logistic regression model was developed to predict stroke occurrence using a dataset of 5,110 adults aged 18 and above, sourced from a publicly available healthcare database. The dataset includes demographic, physiological, and lifestyle features: age, gender, hypertension, heart disease, marital status, work type, residence, smoking status, body mass index (BMI), and average blood glucose level. Preprocessing consisted of imputing missing values, normalizing continuous variables, and encoding categorical features. Logistic regression was selected for its simplicity, interpretability, and effectiveness in binary clinical classification tasks. The model’s performance was evaluated using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). The model achieved an overall accuracy of 74.64% and a recall of 85% for the stroke class, indicating strong sensitivity in identifying true positive cases. Although precision was relatively low because of class imbalance, the model kept false negatives low, which is an essential consideration in medical diagnostics. These results support logistic regression as a reliable baseline for early stroke risk detection and suggest that performance can be further improved through techniques such as resampling or ensemble learning. This study contributes to the development of accessible, data-driven tools for proactive stroke prevention and clinical decision support.
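The relationship between the reported metrics can be illustrated with a small sketch. It computes accuracy, precision, recall, and F1-score from confusion-matrix counts and shows how high recall can coexist with low precision when the positive class is rare. The counts below are hypothetical values chosen for illustration; they are not the study's confusion matrix, and the names `ConfusionCounts` and `classification_metrics` are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier, with stroke as the positive class."""
    tp: int  # stroke cases predicted as stroke
    fp: int  # non-stroke cases predicted as stroke
    fn: int  # stroke cases predicted as non-stroke
    tn: int  # non-stroke cases predicted as non-stroke


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall, and F1 for the positive class."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical, imbalanced test set: 50 stroke cases among 1,000 records.
# These numbers are illustrative only and do not come from the paper.
example = ConfusionCounts(tp=40, fp=240, fn=10, tn=710)
metrics = classification_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this hypothetical case most stroke cases are caught (few false negatives), yet the many false positives among the much larger non-stroke group pull precision down, which is the pattern the abstract attributes to class imbalance.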
Keywords:
Stroke prediction, logistic regression, machine learning, health data analytics, imbalanced dataset, risk classification, medical diagnosis, recall, confusion matrix, preventive healthcare.
Cite Article:
"EARLY STROKE RISK PREDICTION USING LOGISTICS REGRESSION :A DATA-MACHINE LEARNING APPROACH", International Journal for Research Trends and Innovation (www.ijrti.org), ISSN:2455-2631, Vol.10, Issue 7, page no.a729-a735, July-2025, Available :http://www.ijrti.org/papers/IJRTI2507081.pdf
ISSN:
2456-3315 | IMPACT FACTOR: 8.14 Calculated By Google Scholar| ESTD YEAR: 2016