Anglia Ruskin Research Online (ARRO)

Resampling Imbalanced Healthcare Data for Predictive Modelling

journal contribution
posted on 2025-03-18, 15:49 authored by Manoj Yadav Mamilla, Ronak Al-Haddad, Stiphen Chowdhury
Imbalanced datasets pose significant challenges in healthcare for developing accurate predictive models in medical diagnostics. In this work, we explore the effectiveness of combining resampling methods with machine learning algorithms to enhance prediction accuracy for imbalanced heart and lung disease datasets. Specifically, we integrate undersampling techniques such as Edited Nearest Neighbours (ENN) and Instance Hardness Threshold (IHT) with oversampling methods like Random Oversampling (RO), Synthetic Minority Oversampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). These resampling strategies are paired with classifiers including Decision Trees (DT), Random Forests (RF), K-Nearest Neighbours (KNN), and Support Vector Machines (SVM). Model performance is evaluated using accuracy, precision, recall, F1 score, and the Area Under the Curve (AUC). Our results show that tailored resampling significantly boosts machine learning model performance in healthcare settings. Notably, SVM with ENN undersampling markedly improves accuracy for lung cancer predictions, while SVM and RF with IHT achieve higher validation accuracies for both diseases. Random oversampling shows variable effectiveness across datasets, whereas SMOTE and ADASYN consistently enhance accuracy. This study underscores the value of integrating strategic resampling with machine learning to improve predictive reliability for imbalanced healthcare data.
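To make the oversampling idea in the abstract concrete, the following is a minimal sketch, using only the Python standard library, of random oversampling and a SMOTE-style interpolation step. It is not the authors' implementation: the function names (`random_oversample`, `smote_like`, `sq_dist`), the toy data, and the parameter `k` are all assumptions made for illustration, and the SMOTE-style routine is a simplified variant rather than a full reproduction of the published algorithm.

```python
import random
from collections import Counter


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows until every class matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(pool)))  # exact copy of an existing row
            y_out.append(label)
    return X_out, y_out


def smote_like(X, y, minority, k=2, seed=0):
    """SMOTE-style sketch: create synthetic minority rows by interpolating
    between a minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    pool = [x for x, lab in zip(X, y) if lab == minority]
    if len(pool) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    X_out, y_out = list(X), list(y)
    for _ in range(target - counts[minority]):
        base = rng.choice(pool)
        neighbours = sorted(
            (p for p in pool if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_out.append(minority)
    return X_out, y_out


# Hypothetical toy data: 8 majority rows (label 0) and 3 minority rows (label 1).
X = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0],
     [3.0, 1.0], [3.0, 2.0], [1.0, 3.0], [2.0, 3.0],
     [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]
y = [0] * 8 + [1] * 3

X_ro, y_ro = random_oversample(X, y)
X_sm, y_sm = smote_like(X, y, minority=1)
```

Random oversampling only repeats existing minority rows, whereas the SMOTE-style step places new rows between existing ones, which may be one reason the two approaches behave differently across datasets. In practice, a maintained library such as imbalanced-learn would normally be used instead of hand-written routines like these.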

History

Item sub-type

Article

Refereed

  • Yes

Volume

16

Issue number

2

Publication title

International Journal of Advanced Computer Science and Applications

Location

United Kingdom

File version

  • Published version

Affiliated with

  • School of Computing and Information Science Outputs
