Abstract:

Background: Diabetes mellitus, a disease characterized by hyperglycemia, is currently one of the most significant health challenges people face, especially in the United States of America. Despite recent research on predicting the incidence of the disease, a more efficient and robust approach is still needed to predict diabetes accurately, so that treatment can begin at an early stage.

Methods: This study investigates the early detection and management of diabetes by applying machine learning techniques to electronic health records. It compares the effectiveness of three supervised machine learning algorithms for building predictive models of diabetes: logistic regression, random forest, and k-nearest neighbors (KNN). The goal is to identify the features that contribute most to the disease and to determine which model performs best.

Findings: The KNN model was the top performer among the tested algorithms, achieving an accuracy of 96.09%, a sensitivity of 98.54%, and a specificity of 93.63%. With a mean test error of 0.0391, the KNN model was the most reliable predictor of diabetes within the studied dataset.

Interpretations: The high sensitivity and specificity suggest that the KNN model is well suited to distinguishing diabetic from non-diabetic patients, a capability essential for early diagnosis and effective management of the disease in clinical practice. Despite the dataset's limited demographic scope, all three machine learning algorithms explored provided useful results.
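The reported figures follow the standard confusion-matrix definitions: sensitivity is the fraction of diabetic patients correctly identified, specificity is the fraction of non-diabetic patients correctly identified, and the reported mean test error of 0.0391 matches the misclassification rate 1 − 0.9609. The Python sketch below is not the authors' code; it computes these metrics from true and predicted labels, assuming label 1 marks a diabetic patient (the positive class). The helper names `confusion_counts` and `classification_metrics` and the toy labels are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), assuming label 1 = diabetic (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity and test error (1 - accuracy).

    Assumes both classes occur in y_true; otherwise sensitivity or
    specificity would divide by zero.
    """
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),  # true positive rate on diabetic cases
        "specificity": tn / (tn + fp),  # true negative rate on non-diabetic cases
        "test_error": 1 - accuracy,     # misclassification rate
    }


# Toy labels for illustration only (not the study's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

On these toy labels one diabetic and one non-diabetic case are misclassified, so all three rates equal 0.75 and the test error is 0.25.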