
A Comparative Study of Machine Learning Algorithms Applied to Predictive Diabetes Data

K. Sathiyakumari, V. Pream Sudha


The healthcare industry generates abundant data, and the volume grows every day. Tools for analyzing these records, however, remain scarce. Machine learning offers many techniques for solving diagnostic problems in a variety of medical domains. Intelligent systems can learn through machine learning methods when they are given a set of clinical cases as a training set. This paper presents a comparative study of widely used supervised classification algorithms (Naïve Bayes, Multi Layer Perceptrons, Logistic Model Trees, and Nearest Neighbor with Generalized Exemplars) applied to a predictive diabetes dataset. The algorithms used in this study were chosen for their representability and diversity. They are evaluated on the basis of their accuracy, learning time, and error rates.
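As a rough illustration of the three evaluation criteria named above, the following sketch times the training of two simple classifiers and reports accuracy and error rate on held-out cases. It is not the authors' setup: the classifiers are minimal pure-Python stand-ins (a Gaussian Naïve Bayes and a plain 1-nearest-neighbor, not Nearest Neighbor with Generalized Exemplars), the `evaluate` helper is hypothetical, error rate is taken here as the misclassification rate, and the two-feature data are invented toy values rather than the diabetes dataset.

```python
import math
import time
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes for numeric features (illustrative only)."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            prior = len(rows) / len(X)
            params = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                var = sum((v - mean) ** 2 for v in column) / len(column)
                params.append((mean, var + 1e-9))  # small floor avoids division by zero
            self.stats[label] = (prior, params)
        return self

    def predict(self, X):
        return [self._predict_one(row) for row in X]

    def _predict_one(self, row):
        best_label, best_score = None, -math.inf
        for label, (prior, params) in self.stats.items():
            # Log prior plus the sum of per-feature Gaussian log-likelihoods.
            score = math.log(prior)
            for value, (mean, var) in zip(row, params):
                score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class OneNearestNeighbor:
    """Plain 1-NN with Euclidean distance (a stand-in, not NNge)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            nearest = min(range(len(self.X)), key=lambda i: math.dist(row, self.X[i]))
            predictions.append(self.y[nearest])
        return predictions


def evaluate(model, X_train, y_train, X_test, y_test):
    """Hypothetical helper: return accuracy, error rate, and training time."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start
    predictions = model.predict(X_test)
    correct = sum(p == t for p, t in zip(predictions, y_test))
    accuracy = correct / len(y_test)
    return {"accuracy": accuracy, "error_rate": 1 - accuracy, "train_seconds": train_seconds}


# Invented toy cases: [feature_1, feature_2] with label 1 = positive, 0 = negative.
X_train = [[85, 22], [90, 24], [95, 23], [150, 33], [160, 35], [170, 31]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[88, 23], [165, 34]]
y_test = [0, 1]

models = {"naive_bayes": GaussianNaiveBayes(), "nearest_neighbor": OneNearestNeighbor()}
results = {name: evaluate(m, X_train, y_train, X_test, y_test) for name, m in models.items()}

for name, metrics in results.items():
    print(name, metrics)
```

A single train/test split on a handful of cases says little about real performance; a fuller comparison would typically average these metrics over cross-validation folds.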


Machine Learning, Diabetes Mellitus, Classification, Naïve Bayes, Multi Layer Perceptrons, Logistic Model Trees, Nearest Neighbor with Generalized Exemplars, WEKA







Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.