A rule based extractive text summarization technique for Bangla news documents
Authors: Partha Protim Ghosh, Rezvi Shahariar, Muhammad Asif Hossain Khan
Journal: International Journal of Modern Education and Computer Science @ijmecs
Article in issue: No. 12, Vol. 10, 2018.
Free access
News summarization is the process of distilling the most important information from a news document in a precise way. With the growth of the Internet, almost all Bangla newspapers now have online versions, and readers increasingly prefer to read the news on websites. However, the large volume of electronic news content makes it burdensome for readers to extract the valuable information. To address this problem, this paper proposes an automatic method for summarizing Bangla news documents. The proposed approach introduces a graph-based sentence scoring feature for the first time in Bangla news document summarization. After analyzing a large number of Bangla news documents, 12 sentence scoring features are introduced for calculating the score of a sentence. An improved summary generation method is also proposed, which removes redundant information from the summary. The results are evaluated using ROUGE, a standard summary evaluation tool, and the proposed method is found to outperform all existing methods used in Bangla news summarization.
Bangla news summarization, Extractive based approach, NLP, ROUGE, Sentence scoring features
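The abstract names two ideas, graph-based sentence scoring and redundancy removal during summary generation, without giving their details. The following is only a minimal sketch of how such a pipeline could look, not the authors' method: it assumes sentences are nodes, cosine similarity of term-frequency vectors is the edge weight, and a sentence's score is the sum of its edge weights. The names `tokenize`, `cosine`, `graph_scores`, `summarize`, and the `max_similarity` threshold are hypothetical, and the paper's other scoring features are not modeled.

```python
from collections import Counter
from math import sqrt

# Punctuation stripped from word edges; includes the Bangla danda (assumed list).
PUNCT = "।,.;:!?\"'()"

def tokenize(sentence):
    """Split on whitespace and strip surrounding punctuation (illustrative only)."""
    words = (w.strip(PUNCT).lower() for w in sentence.split())
    return [w for w in words if w]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def graph_scores(vectors):
    """Score each sentence by its total similarity to every other sentence."""
    return [
        sum(cosine(v, u) for j, u in enumerate(vectors) if j != i)
        for i, v in enumerate(vectors)
    ]

def summarize(sentences, k, max_similarity=0.8):
    """Pick up to k high-scoring sentences, skipping near-duplicates of chosen ones."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    scores = graph_scores(vectors)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        # Redundancy check (assumed rule): reject a sentence too similar to a chosen one.
        if all(cosine(vectors[i], vectors[j]) <= max_similarity for j in chosen):
            chosen.append(i)
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

For example, calling `summarize(sentences, 2)` on a list in which two sentences are near-identical would keep only one of them and fill the second slot with the next best distinct sentence.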
Short URL: https://sciup.org/15016817
IDR: 15016817 | DOI: 10.5815/ijmecs.2018.12.06