Topic modelling in computer security discourse: a case study of whitepaper publications and news feeds

Free access

Up-to-date information plays a crucial role in modern linguistic research. For this reason, computational linguistic methods, including those supported by analytical and machine-learning tools, are attracting growing attention. Their applications in cognitive-discursive linguistics include keyword extraction, topic modelling, and content analysis. Text-mining tools speed up time-consuming linguistic work and, by processing a significantly larger volume of data, make the results more reliable and statistically precise. Most studies, however, focus on a single data format and so overlook how socially significant but context-irrelevant (e.g. political) information intrudes into a specialized discourse. The current study applies topic modelling to computer security discourse. We implemented the project on the KNIME analytical platform. The model enables comparison between topics extracted from published articles and those extracted from date-specific RSS news feeds. The study provides insights into infodemiology and political incidental news exposure, which occurs in the computer-security-oriented RSS feeds on the Kaspersky website but is untraceable in the papers published on the same website in PDF format. These results provide further evidence for the need to consider the hypercontext of professional communication and to employ real-time data when solving similar problems within cognitive-discursive linguistics. Our contribution to cognitive-discursive linguistics is a method for comparing topics within one discourse that takes near-real-time data into account. For computational linguistics, the significance of our work lies in describing a new application of the topic extraction workflow freely available on the KNIME Hub.


Topic modelling, computer security discourse, infodemiology, political incidental news exposure, content analysis, RSS feeds, cognitive-discursive linguistics

Short URL: https://sciup.org/147238219

IDR: 147238219   |   DOI: 10.17072/2073-6681-2022-2-18-26

Research article