
Abstract

<jats:p>Diabetes is a non-communicable disease affecting people of all ages worldwide, so early detection using machine learning techniques is crucial. This study aims to predict diabetes on an Egyptian dataset using multiple machine learning algorithms, several performance metrics, and holdout validation. The dataset was divided into four age groups: paediatric, early adulthood, middle age, and geriatric. Ten algorithms were applied and validated using 80:20, 70:30, and 60:40 split ratios, with accuracy, precision, and recall as evaluation metrics. Results showed that Random Forest, Extra Trees, and Support Vector Machine performed best in the paediatric group, while Gradient Boosting, Random Forest, and Support Vector Machine achieved superior performance in the early adulthood, middle age, and geriatric groups. In contrast, Decision Tree, K-Nearest Neighbors, and AdaBoost consistently demonstrated lower performance. Further analysis reveals that classification performance varies significantly across age groups: the middle age and geriatric groups achieve the highest accuracy (above 0.99), followed by the paediatric group (0.98–0.99), while early adulthood exhibits comparatively lower performance due to increased class overlap. Confusion matrix results indicate strong diagonal dominance in the higher-performing groups, reflecting better class separability, whereas performance heatmaps confirm that the top models maintain a balanced trade-off between accuracy, precision, and recall, with minimal variation across the different data splits. Feature importance analysis shows that higher-performing models rely on a small number of dominant predictors, particularly in the middle age and geriatric groups, while more distributed feature contributions in early adulthood reduce predictive effectiveness. The findings therefore demonstrate that ensemble methods provide robust and consistent performance, and that age-based dataset segmentation enhances classification accuracy and model stability.</jats:p>
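The holdout protocol and the three evaluation metrics named in the abstract can be illustrated with a minimal, library-free Python sketch. It is not the authors' implementation: the helper names `holdout_split` and `binary_metrics`, the fixed random seed, the toy row indices, and the binary 0/1 labelling are all assumptions made here for illustration, and no model from the study is trained.

```python
import random


def holdout_split(rows, test_fraction, seed=0):
    """Shuffle rows and return (train, test) for one holdout split.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy stand-in for one age group's records (row indices only).
rows = list(range(100))

# The three train:test ratios mentioned in the abstract.
for test_fraction in (0.2, 0.3, 0.4):
    train, test = holdout_split(rows, test_fraction)
    print(f"test fraction {test_fraction}: {len(train)} train, {len(test)} test")

# Toy labels and predictions, only to exercise the metric definitions.
print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a setup like the one described, each age group would be split and scored separately, so a per-group loop would wrap both helpers. Diagonal dominance in a confusion matrix corresponds to the true positive and true negative counts being large relative to the false positive and false negative counts.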


Keywords

performance groups early machine adulthood
