Thematic Text Graph: A Text Representation Technique for Keyword Weighting in Extractive Summarization System

Authors: Murali Krishna V.V. Ravinuthala, Satyananda Reddy Ch.

Journal: International Journal of Information Engineering and Electronic Business (IJIEEB)

Published in: Vol. 8, No. 4, 2016.

Free access

Keyword extraction approaches based on a directed graph representation of text mostly rely on word positions within sentences. A preceding word points to a succeeding word, or vice versa, within a window of N consecutive words in the text. The accuracy of this approach depends on the number of active-voice and passive-voice sentences in the given text. Edge direction can be applied only by treating the entire text as a single unit, which leaves no importance to the sentences in the document; otherwise, words at the beginning or end of each sentence receive fewer connections (recommendations). In this paper we propose a directed graph representation technique, the Thematic text graph, in which weighted edges are drawn between words based on the theme of the document. Keyword weights are computed from the Thematic text graph using an existing centrality measure, and the resulting weights are used to compute the importance of sentences in the document. Experiments conducted on the benchmark data sets SemEval-2010 and DUC 2002 show that the proposed keyword weighting model is effective and improves the quality of system-generated extractive summaries.
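For orientation, the position-based baseline that the abstract describes (a preceding word pointing to succeeding words inside a window of N consecutive words, followed by a centrality measure and sentence scoring) can be sketched roughly as follows. This is not the authors' Thematic text graph, whose theme-based edge weighting is not specified here. The function names, the regex tokenizer, the PageRank-style centrality, and the averaging of word weights per sentence are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_window_graph(text, window=3):
    """Directed graph: each word points to the words that follow it
    inside a window of `window` consecutive words (assumed tokenizer)."""
    words = re.findall(r"[a-z]+", text.lower())
    edges = defaultdict(set)
    for i, word in enumerate(words):
        # The window covers positions i .. i + window - 1.
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != word:
                edges[word].add(words[j])
    return set(words), dict(edges)


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank, used here as one example of an
    existing centrality measure (an assumption, not the paper's choice)."""
    n = len(nodes)
    if n == 0:
        return {}
    rank = {w: 1.0 / n for w in nodes}
    for _ in range(iterations):
        new_rank = {w: (1.0 - damping) / n for w in nodes}
        for src in nodes:
            targets = edges.get(src, ())
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling word: spread its rank evenly over all words.
                share = damping * rank[src] / n
                for dst in nodes:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def score_sentences(text, weights):
    """Score each sentence by the mean weight of its words (assumed rule)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        total = sum(weights.get(t, 0.0) for t in tokens)
        scored.append((total / len(tokens) if tokens else 0.0, sentence))
    return scored


if __name__ == "__main__":
    sample = "Graphs represent text. Keyword weights come from the graph."
    nodes, edges = build_window_graph(sample, window=3)
    weights = pagerank(nodes, edges)
    for score, sentence in sorted(score_sentences(sample, weights), reverse=True):
        print(f"{score:.4f}  {sentence}")
```

Because edges follow word order across the whole text, words at sentence boundaries are linked to neighbours in adjacent sentences, which is the single-unit treatment of the text that the abstract criticizes.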


Keywords: Extractive summarization, keyword weighting, directed graph, Thematic text graph, ThemeRank

Short URL: https://sciup.org/15013428

IDR: 15013428

Research article