     Research Journal of Applied Sciences, Engineering and Technology


Dictionary and Gene Ontology Based Similarity for Named Entity Relationship Protein-protein Interaction Prediction from Biotext Corpus

1Smt K. Prabavathy and 2P. Sumathi
1Department of Computer Science, Manonmanium Sundaranar University, Tirunelveli
2Department of Computer Science, Government Arts College, Coimbatore, Tamil Nadu 627012, India
Research Journal of Applied Sciences, Engineering and Technology  2014  22:2282-2289
http://dx.doi.org/10.19026/rjaset.8.1230  |  © The Author(s) 2014
Received: September 13, 2014  |  Accepted: September 20, 2014  |  Published: December 15, 2014

Abstract

Protein-protein interactions (PPIs) play a key role in many biological systems. They are involved in complex formation and in many pathways that carry out biological processes. Accurate identification of the set of interacting proteins can shed new light on the functional role of various proteins in the complex environment of the cell. The ability to construct biologically meaningful gene networks and to identify the exact relationships within them is critical for present-day systems biology. Earlier research has studied the power of gene modules to shed light on the functioning of complex biological systems. However, most modules in these networks show little link to meaningful biological function, because these methods do not accurately calculate the semantic relationship between entities. To overcome these problems and improve PPI results on the biotext corpus, this research proposes a new method. The proposed method directly incorporates Gene Ontology (GO) annotation in the construction of gene modules and uses a dictionary-based text approach to extract biotext information. The Dictionary-Based Text and Gene Ontology (DBTGO) approach integrates various gene-gene pairwise similarity values with protein-protein interaction relationships obtained from gene expression in order to achieve better biotext information retrieval results. The results were analyzed on data from the Biotext Project at UC Berkeley. Testing indicates that the DBTGO algorithm improves PPI relationship identification over all previously suggested methods in terms of precision, recall, F-measure and Normalized Discounted Cumulative Gain (NDCG). The proposed DBTGO algorithm can facilitate comprehensive and in-depth analysis of high-throughput experimental data at the gene network level.
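The four evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the treatment of an interaction as an unordered protein pair, and the toy protein identifiers are all assumptions made for illustration, and the standard textbook definitions of precision, recall, F-measure and NDCG are used.

```python
import math


def as_pairs(interactions):
    """Normalize (a, b) tuples to unordered pairs so (a, b) equals (b, a).

    Treating interactions as unordered is an assumption of this sketch.
    """
    return {frozenset(pair) for pair in interactions}


def precision_recall_f(predicted, gold):
    """Set-based precision, recall and balanced F-measure."""
    predicted, gold = as_pairs(predicted), as_pairs(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance scores."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0


# Hypothetical toy data: predicted versus gold-standard interacting pairs.
predicted = [("P1", "P2"), ("P3", "P4"), ("P5", "P6"), ("P7", "P8")]
gold = [("P2", "P1"), ("P3", "P4"), ("P9", "P10")]
p, r, f = precision_recall_f(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
# Relevance of each ranked prediction (1 = true interaction, 0 = not).
print(f"NDCG={ndcg([0, 1]):.3f}")
```

Precision, recall and F-measure score the extracted set as a whole, while NDCG additionally rewards placing true interactions near the top of a ranked result list.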

Keywords:

Biotext corpus, gene network, gene ontology, Information Extraction (IE), Named Entity Relationship (NER), preprocessing, Protein-Protein Interaction (PPI), word-sense disambiguator


Competing interests

The authors have no competing interests.

Open Access Policy

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

ISSN (Online):  2040-7467
ISSN (Print):   2040-7459
Copyright © 2024. MAXWELL Scientific Publication Corp., All rights reserved