Exploring automated summarization: from extraction to abstraction

Free access

This paper reviews AI-powered automated summarization models, focusing on two principal approaches: extractive and abstractive. The study evaluates how well these models generate concise yet meaningful summaries and analyzes their lexical proficiency and linguistic fluency. Compression rates are assessed with quantitative metrics (page, word, and character counts), while language fluency is described as the ability to manipulate grammatical and lexical patterns without compromising meaning or content. The study draws on a selection of scientific publications from various disciplines to test the functionality and output quality of automated summarization tools such as Summate.it, WordTune, SciSummary, Scholarcy, and OpenAI ChatGPT-4. The findings reveal that the selected models employ a hybrid strategy that integrates extractive and abstractive techniques. The summaries varied in completeness and accuracy, with page compression rates ranging from 50% to 95% and character count reductions reaching up to 98%. Qualitative evaluation indicated that, while the models generally captured the main ideas of the source texts, some summaries suffered from oversimplification or misplaced emphasis. Despite these limitations, automated summarization models show significant potential as tools for both text compression and content generation, which highlights the need for continued research, particularly from the perspective of linguistic analysis. Summaries generated by AI models offer new opportunities for analyzing machine-generated language and provide valuable data for studying how algorithms process, condense, and restructure human language.
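The compression figures reported above can be illustrated with a short sketch. The abstract does not give the exact formula used in the study, so the code assumes the common definition of compression rate as the percentage reduction in length relative to the source. The function name `compression_rate` and its `unit` parameter are hypothetical and chosen only for illustration; page counts are omitted because they depend on document layout.

```python
def compression_rate(source: str, summary: str, unit: str = "chars") -> float:
    """Return the percentage reduction in length from source to summary.

    Assumed definition (not taken from the paper):
        100 * (source_length - summary_length) / source_length
    """
    if unit == "chars":
        src_len, sum_len = len(source), len(summary)
    elif unit == "words":
        # Whitespace tokenization: a simplification, adequate for a rough count.
        src_len, sum_len = len(source.split()), len(summary.split())
    else:
        raise ValueError("unit must be 'chars' or 'words'")
    if src_len == 0:
        raise ValueError("source text is empty")
    return 100.0 * (src_len - sum_len) / src_len


if __name__ == "__main__":
    # Toy strings standing in for a paper and its machine-generated summary.
    source_text = "Automated summarization condenses long documents. " * 40
    summary_text = "Summarization condenses documents."
    print(f"words: {compression_rate(source_text, summary_text, 'words'):.1f}%")
    print(f"chars: {compression_rate(source_text, summary_text, 'chars'):.1f}%")
```

Under this definition, a summary that is 2% of the source length by character count corresponds to a 98% reduction, the upper bound reported above.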


Automated summarization, extractive summarization, abstractive summarization, artificial intelligence, neural networks, interdisciplinary research

Short URL: https://sciup.org/149147496

IDR: 149147496   |   DOI: 10.15688/jvolsu2.2024.5.4

Research article