     Research Journal of Applied Sciences, Engineering and Technology


Utilizing WordNet and Regular Expressions for Instance-based Schema Matching

Ahmed Mounaf Mahdi and Sabrina Tiun
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Selangor, Malaysia
Research Journal of Applied Sciences, Engineering and Technology, 2014, 8(4): 460-470
http://dx.doi.org/10.19026/rjaset.8.994  |  © The Author(s) 2014
Received: January 20, 2014  |  Accepted: February 06, 2014  |  Published: July 25, 2014

Abstract

Instance-based matching is the process of finding correspondences between schema elements by comparing the data stored in different data sources. It serves as a fallback when matching on schema information alone fails. Instance-based matching is used in many application areas, such as website creation and management, schema evolution and migration, data warehousing, database design and data integration. Schema information (element name, description, data type, etc.) is sometimes unavailable or insufficient to produce a correct match, especially when element names are abbreviations; when schema-level matching fails, the next step is to examine the values stored in the schemas. For these reasons, many recent approaches focus on instance-based matching. In this study, we propose an approach that combines pattern recognition using regular expressions for the numerical domain with WordNet for the string domain, producing a similarity coefficient in the range [0, 1]. In a previous approach, regular expressions achieved good accuracy for numerical instances only; they were not applied to string instances, because deciding whether two strings match requires knowing their meaning. Using WordNet-based measures for string instances is intended to improve effectiveness in terms of Precision (P), Recall (R) and F-measure (F). The approach was evaluated on a real dataset, and the results were better than those obtained with an equality measure alone for strings, especially when the schemas are disjoint. The approach achieved an F-measure (F) of 95.3%.
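The abstract does not state the similarity formulas, so the following is only a minimal sketch of the general idea under explicit assumptions. A hypothetical catalogue of regular expressions (`NUMERIC_PATTERNS`) profiles numerical columns, and two profiles are compared by their overlap. For string columns, Wu-Palmer similarity computed over a tiny hand-built hypernym tree (`HYPERNYM`, hypothetical) stands in for a WordNet-based measure; a real system would query WordNet itself, for example through NLTK's WordNet interface. Every column-level score lies in [0, 1]. The pattern set, the tree, the overlap formula and the averaging scheme are assumptions for illustration, not the authors' method. The last function computes Precision, Recall and F-measure for a set of proposed matches against a gold standard.

```python
import re
from collections import Counter

# Hypothetical regular-expression catalogue for the numerical domain
# (an assumption for illustration, not the authors' pattern set).
NUMERIC_PATTERNS = {
    "integer": re.compile(r"[+-]?\d+"),
    "decimal": re.compile(r"[+-]?\d+\.\d+"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "phone": re.compile(r"\d{3}-\d{3}-\d{4}"),
}

# Tiny hand-built hypernym tree standing in for WordNet (hypothetical);
# each word maps to its parent, and the root maps to None.
HYPERNYM = {
    "entity": None,
    "location": "entity", "person": "entity",
    "city": "location", "country": "location",
    "author": "person", "student": "person",
    "paris": "city", "london": "city", "rome": "city",
    "france": "country", "malaysia": "country",
}


def pattern_profile(values):
    """Share of values that fully match each pattern (first match wins)."""
    counts = Counter()
    for v in values:
        name = next((n for n, rx in NUMERIC_PATTERNS.items()
                     if rx.fullmatch(v.strip())), "other")
        counts[name] += 1
    return {k: c / len(values) for k, c in counts.items()} if values else {}


def numeric_similarity(col_a, col_b):
    """Overlap of two pattern distributions: sum of minima, always in [0, 1]."""
    pa, pb = pattern_profile(col_a), pattern_profile(col_b)
    return sum(min(pa.get(k, 0.0), pb.get(k, 0.0)) for k in set(pa) | set(pb))


def path_to_root(word):
    """The word followed by all of its ancestors, ending at the root."""
    path = []
    while word is not None:
        path.append(word)
        word = HYPERNYM[word]
    return path


def wu_palmer(w1, w2):
    """Wu-Palmer: 2*depth(LCS) / (depth(w1) + depth(w2)), root depth = 1."""
    if w1 not in HYPERNYM or w2 not in HYPERNYM:
        return 1.0 if w1 == w2 else 0.0  # unknown words: plain equality
    p1, p2 = path_to_root(w1), path_to_root(w2)
    lcs = next(a for a in p1 if a in p2)  # lowest common subsumer
    return 2 * len(path_to_root(lcs)) / (len(p1) + len(p2))


def string_similarity(col_a, col_b):
    """Symmetric mean of each value's best Wu-Palmer match in the other column."""
    a = [v.strip().lower() for v in col_a]
    b = [v.strip().lower() for v in col_b]
    if not a or not b:
        return 0.0

    def directed(xs, ys):
        return sum(max(wu_palmer(x, y) for y in ys) for x in xs) / len(xs)

    return (directed(a, b) + directed(b, a)) / 2


def is_numeric_column(values):
    """True when every value fully matches one of the numerical patterns."""
    return all(any(rx.fullmatch(v.strip()) for rx in NUMERIC_PATTERNS.values())
               for v in values)


def column_similarity(col_a, col_b):
    """Regex profile for numerical columns, WordNet-style measure otherwise."""
    if is_numeric_column(col_a) and is_numeric_column(col_b):
        return numeric_similarity(col_a, col_b)
    return string_similarity(col_a, col_b)


def precision_recall_f(found, gold):
    """P = |found & gold| / |found|, R = |found & gold| / |gold|, F = 2PR/(P+R)."""
    tp = len(found & gold)
    p = tp / len(found) if found else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For instance, `string_similarity(["Paris", "London"], ["Rome"])` returns 0.75 under this toy tree (all three values share the parent `city`), whereas a pure equality measure scores the two disjoint columns 0.0; this mirrors the abstract's observation that a semantic measure helps most when schemas are disjoint. Likewise, `numeric_similarity(["2014-01-20", "2014-02-06"], ["2014-07-25", "460"])` returns 0.5, because only half of the second column follows the date pattern.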

Keywords:

Instance-based matching, regular expression, schema matching, WordNet



Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved