Brain Stroke Prediction Model Based SMOTE and Machine Learning Algorithms

Alhussain Waad Mohammed, Hwraa Kareem Hmoud, Ahmed Mohmed Abd Alzahra, Ahmad Muneathir Sukar, Alaa Khalaf Hamoud


A brain stroke is a critical medical emergency that can cause disability and death. Early diagnosis can reduce the complications that affect the brain as a result of the injury. This study presents an analysis of a brain stroke dataset using the KNIME tool, which provides a range of machine learning components such as Random Forest, Decision Tree Learner, Gradient Boosted Trees Learner, and Logistic Regression. Class imbalance in the data is handled as part of data preprocessing. The factors that affect brain stroke are explored using feature selection approaches: forward feature selection, backward feature elimination, genetic algorithms, and random selection. The aim is to build a model that helps doctors diagnose the disease accurately, based on the results obtained from the study and analysis. The results showed that logistic regression outperformed the other algorithms when combined with forward feature selection and backward feature elimination.


Brain Stroke; Feature Selection; SMOTE; Decision Tree; Logistic Regression; Random Forest; Gradient Boosted Trees; KNIME.
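
The title names SMOTE as the method for handling the imbalanced data, but the KNIME workflow itself is not reproduced here. As a rough illustration only, the following Python sketch shows the core idea of SMOTE: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the toy data are assumptions made for this example and are not taken from the study.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; not the KNIME SMOTE node.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy data, invented for illustration: (age, average glucose level)
no_stroke = [(34, 85.0), (41, 92.5), (29, 78.1), (50, 101.3), (45, 88.8), (38, 95.2)]
stroke = [(67, 190.4), (72, 210.7), (80, 175.3)]

# Oversample the minority class until both classes have the same size
new_points = smote(stroke, n_new=len(no_stroke) - len(stroke))
balanced_stroke = stroke + new_points
print(len(no_stroke), len(balanced_stroke))
```

In common practice, oversampling of this kind is applied only to the training partition, so that synthetic samples do not leak into the data used for evaluation.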








Copyright (c) 2024 Alhussain Waad Mohammed, Hwraa Kareem Hmoud, Ahmed Mohmed Abd Alzahra, Ahmad Muneathir Sukar, Alaa Khalaf Hamoud

ISSN 2233-1859

Digital Object Identifier DOI: 10.21533/scjournal

This work is licensed under a Creative Commons Attribution 4.0 International License.