Research Article | OPEN ACCESS
Web Page Classification Using SVM and FURIA
P. Madhubala¹ and K. Murugesan²
¹Department of Computer Science and Engineering, Tagore Institute of Engineering and Technology, Tamilnadu, India
²Department of Electronics and Communication Engineering, Sree Sastha Institute of Engineering and Technology, Tamilnadu, India
Research Journal of Applied Sciences, Engineering and Technology 2015 7:512-518
Received: September 24, 2014 | Accepted: October 24, 2014 | Published: March 05, 2015
Text classification assigns a document to a predefined category. Automatic text classification has been an important application and research topic since the inception of digital documents. In this study, hypernyms (superordinate words) are identified on the web and combined with entailment rule acquisition. A tree of the hyponym words available in the document is created and used together with a dependency tree. Feature extraction is performed with weighted Term Frequency-Inverse Document Frequency (TF-IDF), where the weight of a word is computed from the number of hyponyms present in the radix tree. Performance is evaluated using a Support Vector Machine (SVM) classifier and a Fuzzy Unordered Rule Induction Algorithm (FURIA) classifier.
Hypernym, hyponym, radix tree, Support Vector Machine (SVM), Fuzzy Unordered Rule Induction Algorithm (FURIA), Term Frequency-Inverse Document Frequency (TF-IDF)
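The weighted TF-IDF step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact weighting formula, so the boost used here, 1 + log(1 + number of hyponyms), is an assumption chosen only for illustration. The function name `hyponym_weighted_tfidf` and the `hyponym_counts` mapping (standing in for counts read from the radix tree) are likewise hypothetical.

```python
import math
from collections import Counter


def hyponym_weighted_tfidf(docs, hyponym_counts):
    """Return one {term: weight} dict per tokenized document.

    docs: list of token lists.
    hyponym_counts: hypothetical mapping term -> number of its hyponyms
        found in the hyponym tree (terms not listed count as 0).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = {}
        for term, count in tf.items():
            # Smoothed IDF so terms present in every document keep a
            # non-zero weight.
            idf = math.log(n_docs / df[term]) + 1.0
            # Assumed boost: grows slowly with the hyponym count.
            boost = 1.0 + math.log1p(hyponym_counts.get(term, 0))
            vec[term] = (count / len(tokens)) * idf * boost
        vectors.append(vec)
    return vectors


# Toy example: "animal" is a hypernym with two hyponyms in the tree.
docs = [["dog", "barks", "animal"], ["cat", "animal"]]
hyponym_counts = {"animal": 2}
vectors = hyponym_weighted_tfidf(docs, hyponym_counts)
```

In this toy example, the hypernym "animal" appears in both documents and would normally receive a low TF-IDF score, but the hyponym boost raises it above the rarer term "dog". The resulting vectors could then be passed to any classifier, such as the SVM or FURIA classifiers named in the abstract.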
Competing interests
The authors have no competing interests.
Open Access Policy
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
ISSN (Online): 2040-7467
ISSN (Print): 2040-7459