Rough K-means Outlier Factor Based on Entropy Computation

Djoko Budiyanto Setyohadi; Azuraliza Abu Bakar; Zulaiha Ali Othman

doi:10.19026/rjaset.8.986

Research Journal of Applied Sciences, Engineering and Technology

Research Article | OPEN ACCESS

Data Mining and Optimization Research Group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Selangor Darul Ehsan, 43000, Malaysia

Research Journal of Applied Sciences, Engineering and Technology 2014 3:398-409

http://dx.doi.org/10.19026/rjaset.8.986 | © The Author(s) 2014

Received: March 29, 2014 | Accepted: April 28, 2014 | Published: July 15, 2014

Abstract

Many outlier detection methods follow the cluster-based approach, since it requires no prior knowledge of the dataset. However, previous studies compute the outlier factor only for a single point or a small cluster, as a measure of its deviation from a common cluster. Furthermore, all objects within an outlier cluster are assumed to be similar. Intuitively, outlier objects can be grouped into outlier clusters, and the outlier factors of the objects within an outlier cluster should vary gradually; it is unnatural for every object in an outlier cluster to have the same degree of outlierness. This study proposes a new outlier detection method that combines the Rough K-Means clustering algorithm with entropy computation. We introduce an outlier degree measure, the entropy outlier factor, for cluster-based outlier detection. The proposed algorithm first finds the outlier clusters and then calculates the outlier factor of each object within them. Each object within an outlier cluster is evaluated against the whole cluster using a cluster-based entropy. The algorithm was tested on four UCI benchmark data sets and shows better performance, particularly in detection rate.
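The two-stage idea described in the abstract (rough clustering first, then an entropy-based score per object) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the ratio-based assignment rule and weighted centroid update follow the commonly used rough k-means formulation, and `membership_entropy` is a hypothetical stand-in score (Shannon entropy of inverse-distance memberships), not the paper's entropy outlier factor. The parameter names `threshold` and `w_lower` are likewise illustrative.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rough_assign(points, centroids, threshold=1.3):
    """Assign point indices to lower approximations or boundary regions.

    Assumed rule: a point whose distance to another centroid is within
    `threshold` times its nearest distance is ambiguous and goes to the
    boundary of every such cluster; otherwise it goes to the lower
    approximation of its nearest cluster.
    """
    k = len(centroids)
    lower = [[] for _ in range(k)]
    boundary = [[] for _ in range(k)]
    for i, p in enumerate(points):
        d = [dist(p, c) for c in centroids]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k)
                 if j != nearest and d[j] <= threshold * d[nearest]]
        if close:
            for j in [nearest] + close:
                boundary[j].append(i)
        else:
            lower[nearest].append(i)
    return lower, boundary

def mean(points, idx):
    """Component-wise mean of the points selected by idx."""
    dim = len(points[0])
    return [sum(points[i][t] for i in idx) / len(idx) for t in range(dim)]

def update_centroids(points, lower, boundary, old, w_lower=0.7):
    """Weighted mix of lower-approximation and boundary means."""
    new = []
    for low, bnd, c in zip(lower, boundary, old):
        if low and bnd:
            ml, mb = mean(points, low), mean(points, bnd)
            new.append([w_lower * a + (1 - w_lower) * b
                        for a, b in zip(ml, mb)])
        elif low:
            new.append(mean(points, low))
        elif bnd:
            new.append(mean(points, bnd))
        else:
            new.append(list(c))  # empty cluster: keep the old centroid
    return new

def membership_entropy(x, centroids):
    """Hypothetical graded score: entropy (in bits) of the
    inverse-distance memberships of x over all clusters."""
    d = [dist(x, c) for c in centroids]
    if min(d) == 0.0:
        return 0.0  # x coincides with a centroid: no ambiguity
    inv = [1.0 / v for v in d]
    total = sum(inv)
    return -sum((v / total) * math.log2(v / total) for v in inv)

# Two dense groups plus one object lying between them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 5.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
lower, boundary = rough_assign(points, centroids)
centroids = update_centroids(points, lower, boundary, centroids)
scores = [membership_entropy(p, centroids) for p in points]
```

In this toy data the in-between object lands in the boundary region of both clusters and receives a higher score than objects close to one centroid, so the score varies from object to object instead of being shared by a whole cluster. Whether the paper's measure behaves this way on real data cannot be inferred from the abstract alone.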

Keywords:

Entropy outlier, outlier detection, rough k-means

Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online): 2040-7467
ISSN (Print): 2040-7459
