Research Article | OPEN ACCESS
A Review of Outlier Prediction Techniques in Data Mining
S. Kannan¹ and K. Somasundaram²
¹Department of Computer Science and Engineering, Karpagam University, Coimbatore, Tamilnadu 641021, India
²Department of CSE, Vel Tech High Tech Dr RR and Dr SR Engineering College, Avadi, Chennai 600062, India
Research Journal of Applied Sciences, Engineering and Technology, 2015, 9: 1021-1028
Received: February 27, 2015 | Accepted: March 25, 2015 | Published: July 25, 2015
Abstract
The main objective of this review is to survey techniques for predicting outliers in data mining. In general, data mining is the process of applying various techniques to extract useful patterns or models from the available data, and it plays a vital role in selecting, exploring and modeling high dimensional data. Outlier detection is a substantial research problem in data mining that aims to uncover objects that are significantly different from, exceptional with respect to, or inconsistent with the rest of the data. Potential sources of outliers include noise and errors, events, and malicious attacks on a network. Given the high complexity, size and variety of datasets, a main challenge in outlier detection is how to catch similar outliers as a group using a clustering-based approach; the outliers or noise in the clustered data can then be removed accurately, yielding cleaner high dimensional data. Classification and clustering techniques for outlier prediction are now applied in fields such as bioinformatics, natural language processing, military applications and geographical domains. This study surveys various data classification and data clustering techniques in order to identify those that provide better outlier detection. It also compares the various classification and clustering techniques for outlier prediction.
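The clustering-based idea described above (catching similar outliers as a group, then removing them from the clustered data) can be illustrated with a minimal sketch. It assumes plain k-means with deterministic farthest-first seeding, plus a hypothetical rule that any cluster holding fewer than a fraction `min_frac` of all points is a group of outliers. The function names `kmeans` and `cluster_outliers`, the seeding strategy and the size threshold are illustrative choices, not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first seeding.

    Returns one cluster label (0..k-1) per input point.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Seeding: start at the first point, then repeatedly add the point
    # farthest from every centroid chosen so far.
    centroids: List[Point] = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist(p, c) for c in centroids))
        centroids.append(tuple(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_outliers(points: List[Point], k: int,
                     min_frac: float = 0.1) -> Tuple[List[int], List[Point]]:
    """Hypothetical rule: every cluster holding fewer than min_frac of the
    points is treated as a group of similar outliers.

    Returns (indices of flagged points, points that remain after removal).
    """
    labels = kmeans(points, k)
    n = len(points)
    sizes = {j: labels.count(j) for j in set(labels)}
    small = {j for j, size in sizes.items() if size < min_frac * n}
    outliers = [i for i, lab in enumerate(labels) if lab in small]
    flagged = set(outliers)
    cleaned = [p for i, p in enumerate(points) if i not in flagged]
    return outliers, cleaned


if __name__ == "__main__":
    # Two dense groups of five points plus a small, distant pair.
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
            (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
            (50, 50), (51, 51)]
    outliers, cleaned = cluster_outliers(data, k=3, min_frac=0.2)
    print(outliers)      # [10, 11]
    print(len(cleaned))  # 10
```

With `k=3` and `min_frac=0.2`, the pair near (50, 50) forms its own two-point cluster and is flagged as a group, while the two five-point clusters are kept. In practice, the choice of `k` and of the size threshold strongly affects which points end up flagged.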
Keywords:
Data classification, data clustering, data mining, high dimensional data, outlier detection
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459