Research Article | OPEN ACCESS
Privacy Preserving Probabilistic Possibilistic Fuzzy C Means Clustering
1V.S. Thiyagarajan and 2Venkatachalapathy
1Annamalai University, Chidhambaram, India
2Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Chidhambaram, India
Research Journal of Applied Sciences, Engineering and Technology 2015 1:27-39
Received: December 14, 2014 | Accepted: April 13, 2015 | Published: September 05, 2015
Abstract
With the rapid growth of data, clustering plays a major role in partitioning a large dataset into smaller sets so that the relevant processing can be carried out within each set. Privacy and security become especially important when the data is large and is distributed to other parties for various purposes, so privacy preservation should be applied before the data is distributed. In this study, the proposed algorithm addresses both requirements: clustering accuracy and privacy preservation. First, the whole dataset is divided into small segments. Second, the best combinations of attributes are identified through an attribute weighting process, which provides privacy preservation through vertical partitioning. Third, the proposed Probabilistic Possibilistic Fuzzy C-Means clustering algorithm (PPFCM) is applied to each segment, producing a number of clusters for each segment. Fourth, PPFCM is applied to the centroids of those clusters. Finally, the data tuples that correspond to the grouped centroids are joined to obtain the final clustering result. The implementation is done in Java, and the performance of the proposed PPFCM algorithm is compared with possibilistic FCM and a probabilistic clustering algorithm on benchmark datasets.
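The segment-then-merge flow described in the abstract can be illustrated with a short Java sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the PPFCM update rules, so plain k-means with a deterministic initialisation is used as a stand-in clusterer, and the attribute weighting and vertical partitioning stage is omitted because it is not specified here. The class name `TwoLevelClusteringSketch`, the method names and the parameters `segmentSize`, `kLocal` and `kFinal` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TwoLevelClusteringSketch {

    // Stand-in clusterer: plain k-means (NOT PPFCM). Returns k centroids.
    // Initial centroids are copies of evenly spaced input points.
    static double[][] kMeans(double[][] points, int k, int iterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c * points.length / k].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (double[] p : points) {
                int c = nearest(centroids, p);
                counts[c]++;
                for (int d = 0; d < dim; d++) sums[c][d] += p[d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its centroid
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return centroids;
    }

    // Index of the centroid with the smallest squared Euclidean distance to p.
    static int nearest(double[][] centroids, double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = c; }
        }
        return best;
    }

    // Returns one final cluster label per input tuple.
    static int[] cluster(double[][] data, int segmentSize, int kLocal, int kFinal) {
        List<double[]> localCentroids = new ArrayList<>();
        int[] localIndex = new int[data.length]; // tuple -> position in localCentroids

        // Divide the rows into segments and cluster each segment on its own.
        for (int start = 0; start < data.length; start += segmentSize) {
            int end = Math.min(start + segmentSize, data.length);
            double[][] segment = Arrays.copyOfRange(data, start, end);
            double[][] cents = kMeans(segment, Math.min(kLocal, segment.length), 20);
            for (int i = start; i < end; i++) {
                localIndex[i] = localCentroids.size() + nearest(cents, data[i]);
            }
            localCentroids.addAll(Arrays.asList(cents));
        }

        // Cluster the per-segment centroids to group them.
        double[][] locals = localCentroids.toArray(new double[0][]);
        double[][] finals = kMeans(locals, Math.min(kFinal, locals.length), 20);

        // Join: each tuple inherits the group of its segment-level centroid.
        int[] labels = new int[data.length];
        for (int i = 0; i < data.length; i++) {
            labels[i] = nearest(finals, locals[localIndex[i]]);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Toy data (made up for illustration): two well-separated groups.
        double[][] data = {
            {0, 0}, {10, 10}, {0, 1}, {10, 11},
            {1, 0}, {11, 10}, {1, 1}, {11, 11}
        };
        System.out.println(Arrays.toString(cluster(data, 4, 2, 2)));
    }
}
```

In this sketch only the per-segment centroids, not the raw tuples, take part in the second clustering pass, which is what keeps the merge step small. Replacing `kMeans` with a fuzzy or possibilistic clusterer would change the assignment rule but not the overall flow.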
Keywords:
Clustering, possibilistic fuzzy C means clustering, privacy preserving, probabilistic clustering
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459