Research Article | OPEN ACCESS
Standardization and Its Effects on K-Means Clustering Algorithm
Ismail Bin Mohamad and Dauda Usman
Department of Mathematical Sciences, Faculty of Science, Universiti Teknologi Malaysia, 81310, UTM Johor Bahru, Johor Darul Ta
Research Journal of Applied Sciences, Engineering and Technology, 2013, 17: 3299-3303
Received: January 23, 2013 | Accepted: February 25, 2013 | Published: September 20, 2013
Abstract
Data clustering is an important data exploration technique with many applications in data mining. K-means is one of the best-known data mining methods for partitioning a dataset into groups of patterns, and many methods have been proposed to improve its performance. Standardization is a central preprocessing step in data mining: it rescales the values of features or attributes that have different dynamic ranges into a specific range. In this paper, we analyze the performance of three standardization methods (z-score, min-max and decimal scaling) with the conventional K-means algorithm. Comparing the results on infectious disease datasets, we found that z-score standardization is more effective and efficient than min-max and decimal scaling standardization.
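The abstract names the three standardization methods and the conventional K-means algorithm but does not state their formulas. The following is a minimal Python sketch, not the authors' implementation, assuming the usual textbook definitions: z-score with the population standard deviation, min-max onto [0, 1], and decimal scaling by the smallest power of ten that brings every |x| below 1. K-means here is plain Lloyd's algorithm seeded with the first k rows. These choices, the function names and the toy data are all illustrative assumptions.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _columns(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transpose row-major data into one list per feature (column)."""
    return [list(col) for col in zip(*data)]


def _rows(cols: List[List[float]]) -> Matrix:
    """Transpose per-feature columns back into row-major data."""
    return [list(row) for row in zip(*cols)]


def z_score(data: Sequence[Sequence[float]]) -> Matrix:
    """x' = (x - mean) / std per feature (population std is an assumption)."""
    out = []
    for col in _columns(data):
        mu, sigma = statistics.fmean(col), statistics.pstdev(col)
        # A constant feature carries no information; map it to 0.
        out.append([(x - mu) / sigma if sigma else 0.0 for x in col])
    return _rows(out)


def min_max(data: Sequence[Sequence[float]],
            new_min: float = 0.0, new_max: float = 1.0) -> Matrix:
    """x' = new_min + (x - min) / (max - min) * (new_max - new_min) per feature."""
    out = []
    for col in _columns(data):
        lo, span = min(col), max(col) - min(col)
        out.append([new_min + (x - lo) / span * (new_max - new_min) if span else new_min
                    for x in col])
    return _rows(out)


def decimal_scaling(data: Sequence[Sequence[float]]) -> Matrix:
    """x' = x / 10**j, with j the smallest integer such that max|x'| < 1."""
    out = []
    for col in _columns(data):
        max_abs, j = max(abs(x) for x in col), 0
        while max_abs / 10 ** j >= 1:
            j += 1
        out.append([x / 10 ** j for x in col])
    return _rows(out)


def _nearest(row: Sequence[float], centroids: Matrix) -> int:
    """Index of the centroid closest to row in Euclidean distance."""
    return min(range(len(centroids)), key=lambda c: math.dist(row, centroids[c]))


def k_means(data: Sequence[Sequence[float]], k: int,
            max_iter: int = 100) -> Tuple[List[int], Matrix]:
    """Lloyd's K-means, seeded with the first k rows (deterministic; an assumption)."""
    centroids = [list(map(float, row)) for row in data[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [_nearest(row, centroids) for row in data]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(data, labels) if lab == c]
            if members:  # keep an empty cluster's centroid where it was
                centroids[c] = [statistics.fmean(v) for v in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical toy data: feature 2 spans a much larger range than feature 1.
    raw = [[1.0, 200.0], [1.2, 900.0], [8.0, 210.0], [8.3, 880.0]]
    print("raw", k_means(raw, k=2)[0])
    for name, scale in (("z-score", z_score), ("min-max", min_max),
                        ("decimal scaling", decimal_scaling)):
        print(name, k_means(scale(raw), k=2)[0])
```

Seeding with the first k rows keeps the sketch deterministic. K-means is sensitive to its initial centroids, however, so a real comparison of standardization methods would normally average over several random or k-means++ initializations.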
Keywords:
Clustering, decimal scaling, k-means, min-max, standardization, z-score
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459