Construction of a Model for the Task of Reasoning Text Classification

Author: Kanygin A.V.

Journal: Mathematical Physics and Computer Simulation @mpcm-jvolsu

Section: Modeling, Informatics and Control

Published in: Vol. 28, No. 1, 2025.

Free access

The article addresses the task of classifying texts by the presence of reasoning (logical links, argumentation, cause-and-effect relationships). The aim of the study is to develop a method that uses modern machine learning algorithms to determine with high accuracy whether a text fragment is “reasoning” in nature. Particular attention is paid to an ensemble approach based on stacking: strong models (XGBoost, CatBoost, Random Forest, etc.) serve as base classifiers, and logistic regression serves as the meta-model. To justify the choice of stacking, we present a comparative analysis of more than ten popular algorithms (Logistic Regression, SVC, Random Forest, CatBoost, XGBoost, etc.) in terms of Accuracy, Precision, Recall, F1-score, ROC AUC, and PR AUC. The main stages of the study are generation and annotation of the training dataset, text preprocessing (tokenization, lemmatization, stop-word removal), feature vectorization (TF-IDF), and experimental comparison of the models on a held-out control sample. The proposed stacking model showed the best overall performance across all metrics, raising reasoning-text classification quality to an F1-score of 0.905 with a ROC AUC of 0.887.
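The abstract names the pipeline stages (preprocessing, TF-IDF vectorization, stacking with a logistic-regression meta-model) but gives no code. The following is a minimal, dependency-free Python sketch of how those stages fit together, under loudly stated assumptions: it is not the author's implementation, lemmatization is omitted, and the article's XGBoost/CatBoost/Random Forest base classifiers are replaced by two simple stand-ins (logistic regression on TF-IDF vectors and multinomial Naive Bayes) so the example runs without third-party libraries. The stop-word list, toy corpus, fold count `k`, and every function name (`preprocess`, `fit_base_models`, `fit_stacking`, etc.) are hypothetical. What the sketch does show faithfully is the defining step of stacking: the meta-model is trained on out-of-fold base-model predictions, so it never sees base-model outputs for documents those models were fitted on.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; the article's actual list is not given.
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "it", "this"}


def preprocess(text):
    """Lower-case, split into letter runs, drop stop words (lemmatization omitted)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed IDF ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default."""
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; tokens unseen at fit time are dropped."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    if z >= 0:  # numerically stable logistic function
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, epochs=200, lr=0.5, l2=1e-3):
    """L2-regularized logistic regression on sparse dict features, fitted by SGD."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            g = sigmoid(b + sum(w.get(k, 0.0) * v for k, v in x.items())) - target
            b -= lr * g
            for k, v in x.items():
                w[k] = w.get(k, 0.0) - lr * (g * v + l2 * w.get(k, 0.0))
    return lambda x: sigmoid(b + sum(w.get(k, 0.0) * v for k, v in x.items()))


def train_nb(docs, y):
    """Multinomial Naive Bayes with add-one smoothing; returns P(reasoning | doc)."""
    counts = {0: Counter(), 1: Counter()}
    for d, label in zip(docs, y):
        counts[label].update(d)
    vocab = set(counts[0]) | set(counts[1])
    total = {c: sum(counts[c].values()) + len(vocab) for c in (0, 1)}
    prior = math.log((sum(y) + 1) / (len(y) - sum(y) + 1))
    return lambda doc: sigmoid(prior + sum(
        math.log((counts[1][t] + 1) / total[1]) - math.log((counts[0][t] + 1) / total[0])
        for t in doc if t in vocab))


def fit_base_models(docs, y):
    """Level-0 models (hypothetical stand-ins for XGBoost, CatBoost, Random Forest)."""
    idf = fit_idf(docs)
    logreg = train_logreg([tfidf(d, idf) for d in docs], y)
    nb = train_nb(docs, y)
    return lambda doc: {"logreg": logreg(tfidf(doc, idf)), "nb": nb(doc)}


def fit_stacking(texts, y, k=3):
    """Train a logistic-regression meta-model on out-of-fold base predictions."""
    docs = [preprocess(t) for t in texts]
    oof = [None] * len(docs)
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        base = fit_base_models([docs[i] for i in train], [y[i] for i in train])
        for i in range(fold, len(docs), k):
            oof[i] = base(docs[i])  # these base models were not fitted on docs[i]
    meta = train_logreg(oof, y)
    base = fit_base_models(docs, y)  # refit level-0 models on all data
    return lambda text: meta(base(preprocess(text)))


# Hypothetical toy corpus with interleaved labels: 1 = reasoning, 0 = none.
TEXTS = [
    "The road is wet because it rained, therefore drive slowly.",
    "The meeting starts at nine in the morning.",
    "Prices rose because costs grew, therefore demand fell.",
    "She bought apples and oranges at the market.",
    "The test failed because the input was empty, therefore we fixed it.",
    "The report has twelve pages and three tables.",
    "Sales dropped because winter came, therefore stock grew.",
    "Our office is on the second floor near the window.",
    "The proof holds because the lemma holds, therefore the claim follows.",
    "The cat sleeps on the sofa in the morning.",
    "The model overfits because data is scarce, therefore we regularize.",
    "Blue and green are her favourite colours at the market.",
]
LABELS = [1, 0] * 6
```

Calling `model = fit_stacking(TEXTS, LABELS)` returns a function that maps a raw text to an estimated probability that it contains reasoning; thresholding that probability yields the labels from which metrics such as F1 are computed. In a real setup one would more likely use scikit-learn's `TfidfVectorizer` and `StackingClassifier` with `final_estimator=LogisticRegression()`, plugging XGBoost and CatBoost classifiers in as base estimators.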


Keywords: machine learning, ensemble methods, stacking, TF-IDF, argumentation, text processing

Short URL: https://sciup.org/149148926

IDR: 149148926 | UDC: 004.8 | DOI: 10.15688/mpcm.jvolsu.2025.1.3