An Efficient Technique to Implement Similarity Measures in Text Document Clustering using Artificial Neural Networks Algorithm

1K. Selvi and 2R.M. Suresh
1Sathyabama University
2Sri Muthukumaran Institute of Technology, Chennai, India
Research Journal of Applied Sciences, Engineering and Technology  2014  23:2320-2328
http://dx.doi.org/10.19026/rjaset.8.1235  |  © The Author(s) 2014
Received: September 18, 2014  |  Accepted: October 17, 2014  |  Published: December 20, 2014


Artificial neural networks can address a diverse range of problems, including pattern recognition, prediction using supervised and unsupervised methods, optimization, associative memory and process control. Finding the required information in massive quantities of data has recently become a challenging task. A model of similarity evaluation is central to understanding the variables and perceptions that motivate behavior and mediate concern. This study proposes artificial neural network algorithms for computing similarity measures. To apply singular value decomposition, the frequency of each word pair in the given document is first established. The approach relies on the following components. (1) Tokenization: the splitting of a stream of text into words, phrases, symbols or other meaningful elements. (2) Stop words: words that are filtered out before or after the processing of natural language data. (3) Porter stemming: an algorithm used mainly as part of a term normalization process that is typically performed when setting up information retrieval systems. (4) WordNet: a lexical database for the English language. The core of this work extends the proposed n-gram algorithm on the basis of artificial neural networks. Depending on the application, the items of an n-gram may be phonemes, syllables, letters, words or base pairs. Future work will apply the same similarity measures to other neural network algorithms to obtain improved results.
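As an illustration of the preprocessing steps named in the abstract, the following minimal Python sketch tokenizes a text, removes stop words and counts word-pair frequencies. It is not the authors' implementation: the stop-word list, the function names and the reading of "word pair" as two adjacent words (a word bigram) are assumptions made for this example, and Porter stemming, WordNet lookup and the singular value decomposition step are omitted.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper does not give its list).
STOP_WORDS = {"a", "an", "the", "is", "of", "in", "and", "to"}


def tokenize(text):
    """Split a stream of text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def word_pair_counts(tokens):
    """Count adjacent word pairs (word bigrams) in a token sequence."""
    return Counter(zip(tokens, tokens[1:]))


def preprocess(text):
    """Tokenize, filter stop words, then count word-pair frequencies."""
    return word_pair_counts(remove_stop_words(tokenize(text)))


if __name__ == "__main__":
    for pair, count in preprocess("The cat sat in the hat.").items():
        print(pair, count)
```

Counts of this kind, collected per document, could form the rows of the frequency matrix to which singular value decomposition is applied.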


Artificial neural networks, natural language processing, Porter stemming, similarity measure, WordNet


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


