Outlier Removal Approach as a Continuous Process in Basic K-Means Clustering Algorithm

Dauda Usman; Ismail Bin Mohamad

doi:10.19026/rjaset.7.315

Research Journal of Applied Sciences, Engineering and Technology

Research Article | OPEN ACCESS

¹Dauda Usman and ²Ismail Bin Mohamad

^{1, 2}Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81310, UTM Johor Bahru, Johor Darul Ta’azim, Malaysia

Research Journal of Applied Sciences, Engineering and Technology 2014 4:771-777

http://dx.doi.org/10.19026/rjaset.7.315 | © The Author(s) 2014

Received: April 04, 2013 | Accepted: April 22, 2013 | Published: January 27, 2014

Abstract

Clustering is used to place similar data items in the same group. K-means clustering is a commonly used clustering technique that starts from randomly selected initial centroids. However, the existing method does not consider data preprocessing, which is an important task before clustering data across different databases. This study proposes a new approach to the k-means clustering algorithm. Experimental analysis shows that the proposed method performs well on an infectious disease data set when compared with the conventional k-means clustering method.
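The abstract does not spell out the proposed procedure, so the following is only a minimal sketch of the conventional baseline it refers to: k-means started from randomly selected initial centroids, preceded by z-score standardization as one example of a preprocessing step (standardization appears in the keywords). The function names `standardize` and `kmeans`, the toy data, and the choice of k are assumptions made for illustration and are not taken from the article.

```python
import random
import statistics


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by its std dev."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # Population std dev; a constant column (std 0) falls back to 1.0.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, max_iter=100):
    """Basic k-means with k initial centroids drawn at random from the data."""
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(r, centroids[j])) for r in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [statistics.fmean(c) for c in zip(*members)]
    return labels, centroids


# Hypothetical toy data: two groups, second feature on a much larger scale.
data = [[1.0, 200.0], [1.2, 220.0], [0.8, 210.0],
        [5.0, 900.0], [5.2, 880.0], [4.8, 910.0]]
scaled = standardize(data)
labels, centroids = kmeans(scaled, k=2, seed=0)
print(labels)
```

Standardizing first keeps the large-scale feature from dominating the Euclidean distances. The cluster label numbers themselves depend on the random initial centroids, so only the grouping of points is meaningful.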

Keywords:

Infectious diseases, k-means clustering, principal component analysis, principal components, standardization

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Copyright

© The Author(s) 2014

ISSN (Online): 2040-7467
ISSN (Print): 2040-7459
