Volume 18, Special Issue on Information Retrieval and Web Search, 2021

An Hybrid Ensemble Machine Learning Approach to Predict Type 2 Diabetes Mellitus


G. Geetha and K. Mohana Prasad

Abstract

Diabetes mellitus is a chronic disease that affects many people around the globe. Its severity and risk can be greatly reduced if it is predicted at an early stage. The main objective of the proposed model (T2DDP) is to predict type 2 diabetes mellitus and alert patients well in advance, reducing the risk and severity associated with the disease. We use a supervised classification algorithm, Naïve Bayes, together with ensemble algorithms: bagging with random forest and AdaBoost with decision trees. Ensemble algorithms improve performance by combining two or more models and aggregating their results, which can greatly enhance the accuracy and precision of predictions. In this paper, we build hybrid models that doctors can use effectively to treat diabetic patients. The method helps physicians quickly group, identify, and classify the disease type and manage it accordingly. The prediction results are then sent to the patient's mobile phone at an early stage so that immediate decisions can be made about the health risk. We separated the dataset into a training set and an evaluation set. The Pima Indian dataset, which comprises n medical predictor variables and one target variable, was used to evaluate and interpret the results. Our approach first detects outliers, if any, using the Gaussian distribution method. After outlier detection, missing values are filled with the mean of the data rather than being eliminated. We then split the dataset into training and testing sets at several ratios for analysis: 85/15, 80/20, 70/30, and 60/40. Naïve Bayes, bagging with random forest, and AdaBoost with decision trees are tested with 10-fold cross-validation on accuracy, precision, recall, and F1-score. Finally, we combine the predictions of all the classifier models using a stacking ensemble machine learning algorithm to increase the accuracy of the prediction.
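
The preprocessing steps described in the abstract (Gaussian outlier detection, mean imputation, and splitting at several train/test ratios) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-standard-deviation threshold `k`, the use of `None` to mark missing values, and the toy glucose readings are all assumptions made for illustration.

```python
import random
import statistics


def gaussian_outliers(values, k=3.0):
    """Flag values lying more than k standard deviations from the mean.

    Missing entries (None) are ignored when estimating the mean and
    standard deviation and are never flagged. The threshold k is an
    assumed parameter, not one stated in the abstract.
    """
    present = [v for v in values if v is not None]
    mu = statistics.mean(present)
    sigma = statistics.pstdev(present)
    if sigma == 0:
        return [False] * len(values)
    return [v is not None and abs(v - mu) > k * sigma for v in values]


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    present = [v for v in values if v is not None]
    mu = statistics.mean(present)
    return [mu if v is None else v for v in values]


def split_by_ratio(rows, train_fraction, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical glucose readings: one missing value and one extreme value.
glucose = [85, 90, 95, 100, 105, 110, None, 88, 92, 97,
           103, 108, 99, 101, 94, 96, 102, 98, 91, 500]

flags = gaussian_outliers(glucose)   # step 1: detect outliers
filled = impute_mean(glucose)        # step 2: fill missing values with the mean

# Step 3: the four train/test ratios listed in the abstract.
splits = {f: split_by_ratio(filled, f) for f in (0.85, 0.80, 0.70, 0.60)}
```

The sketch only flags outliers and leaves them in place, because the abstract does not say how detected outliers are subsequently handled.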


Pages: 311-331

DOI: 10.14704/WEB/V18SI02/WEB18074

Keywords: Diabetics, Machine Learning, Ensemble, Data Mining, Classification.
