Research Article | OPEN ACCESS
Rough K-means Outlier Factor Based on Entropy Computation
Djoko Budiyanto Setyohadi, Azuraliza Abu Bakar and Zulaiha Ali Othman
Data Mining and Optimization Research Group, Center for Artificial Intelligence Technology,
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi,
Selangor Darul Ehsan, 43000, Malaysia
Research Journal of Applied Sciences, Engineering and Technology 2014 3:398-409
Received: March 29, 2014 | Accepted: April 28, 2014 | Published: July 15, 2014
Abstract
Many outlier detection methods follow the cluster-based approach because it requires no prior knowledge of the dataset. However, previous studies compute the outlier factor only for a single point or a small cluster, reflecting how far it deviates from a common cluster. Furthermore, all objects within an outlier cluster are assumed to be similar. Intuitively, outlier objects can be grouped into outlier clusters, and the outlier factors of the objects within an outlier cluster should differ gradually; it is unnatural for every object within an outlier cluster to have a similar degree of outlierness. This study proposes a new outlier detection method based on a hybrid of the Rough K-Means clustering algorithm and entropy computation. We introduce an outlier degree measure, the entropy outlier factor, for cluster-based outlier detection. The proposed algorithm first finds the outlier clusters and then calculates the outlier factor of each object within them. Each object within an outlier cluster is evaluated using a cluster-based entropy computed relative to the whole cluster. The algorithm was tested on four UCI benchmark data sets and shows superior performance, especially in detection rate.
Keywords:
Entropy outlier, outlier detection, rough k-means
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459