Smart Tool for Identifying Misinformation Spread Sources and Routes in Social Networks Based on NLP and Machine Learning

Authors: Victoria Vysotska, Sofiia Popp, Viktoriia Bulatova, Zhengbing Hu, Yuriy Ushenko, Dmytro Uhryn

Journal: International Journal of Computer Network and Information Security

Article in issue: 5, vol. 17, 2025.

Free access

This article presents a method for detecting disinformation in news texts that combines classic machine learning algorithms with deep learning models. The proposed approach was tested on a corpus of Ukrainian- and English-language news labelled as "fake" or "truth". Before modelling, the data were pre-processed in detail: duplicates were removed; HTML tags, links, and special characters were cleaned; texts were normalised; labels were unified; classes were balanced; and the texts were tokenised. A hybrid approach was used for vectorisation: frequency features (TF-IDF) were combined with contextual vector representations from the IBM Granite multilingual model. Logistic regression was chosen as the classifier because it balances classification quality against interpretability of the results. Performance was assessed with standard metrics: Accuracy, Precision, Recall, F1-score, and ROC-AUC. In the experiments, the model achieved an Accuracy in the range of 0.91–0.93, a Precision of 0.89, a Recall of 0.92, an F1-score of 0.90, and an ROC-AUC above 0.94. These values indicate that the system not only classifies news accurately but also keeps false positives low, which is especially important under conditions of information warfare. Priority is given to high Recall, since missing a fake message can have critical consequences for information security. The proposed approach thus contributes to the field of automated disinformation detection by combining transparent, reproducible data processing with a hybrid text representation. The novelty of the study lies in adapting NLP and machine learning methods to the Ukrainian-language information space and to the context of modern hybrid warfare, which makes it possible to identify the sources and routes of fake news dissemination effectively.
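The cleaning steps listed in the abstract (duplicate removal, stripping of HTML tags, links, and special characters, normalisation, tokenisation) could look roughly like the following. This is a minimal sketch using only the Python standard library, not the authors' implementation: the function names `clean_text`, `tokenize`, `deduplicate`, and `f1_from` are hypothetical, lowercasing and whitespace tokenisation are assumed choices, and the TF-IDF, IBM Granite embedding, and logistic regression stages are deliberately left out. The last helper only checks that the reported Precision and Recall are arithmetically consistent with the reported F1-score.

```python
import html
import re

# Assumed patterns; the article does not specify its exact cleaning rules.
TAG_RE = re.compile(r"<[^>]+>")                  # HTML tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # links
SPECIAL_RE = re.compile(r"[^\w\s']")             # keeps letters (incl. Cyrillic), digits, apostrophes
SPACE_RE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Strip HTML entities and tags, links, and special characters, then normalise."""
    text = html.unescape(raw)
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip().lower()


def tokenize(raw: str) -> list[str]:
    """Whitespace tokenisation of the cleaned text (an assumed, simplistic tokeniser)."""
    return clean_text(raw).split()


def deduplicate(texts: list[str]) -> list[str]:
    """Keep the first occurrence of each text, comparing cleaned forms."""
    seen: set[str] = set()
    unique: list[str] = []
    for text in texts:
        key = clean_text(text)
        if key and key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


def f1_from(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of Precision and Recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(clean_text("<p>Breaking: see https://example.com NOW!</p>"))
    print(tokenize("М'яч &amp; <b>гра</b>"))
    print(deduplicate(["Hello!", "<b>hello</b>", "World"]))
    print(round(f1_from(0.89, 0.92), 2))
```

With the reported Precision of 0.89 and Recall of 0.92, the harmonic mean is about 0.905, which agrees with the stated F1-score of 0.90.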


Fake News, Machine Learning, Ukrainian-language Texts, Telegram, TF-IDF, Contextual Embeddings, IBM Granite, Logistic Regression, NLP, Disinformation, Text Classification, Information Security

Short URL: https://sciup.org/15020000

IDR: 15020000   |   DOI: 10.5815/ijcnis.2025.05.08