Regression-based sentiment analysis model for predicting customer satisfaction

Free access

The paper develops a model for the automated extraction of customer satisfaction information from text. Sentiment analysis has become a pivotal decision-making tool in fields such as marketing, sociology, and political science, and its importance grows with the rapid expansion of textual data. This drives interest in precise and scalable sentiment analysis methods and makes the task a critical area of contemporary natural language processing. The objective of this study is therefore to develop a sentiment analysis model that predicts customer satisfaction with medical institutions from review texts. This is achieved through a hybrid approach that integrates lexicon-based techniques with machine learning. The research material is a corpus of reviews of private medical centers in Chelyabinsk, sourced from the 2GIS portal and comprising 100,000 word usages. Evaluative lexical units in this corpus have been labeled with sentiment tags (strongly negative, moderately negative, moderately positive, and strongly positive) using a domain-specific sentiment lexicon. We propose a multiple linear regression model for predicting customer satisfaction whose predictors are the proportions of units labeled with each sentiment tag within a text. The model was trained as a ridge regression with L2 regularization, using cross-validation. It forecast user ratings of medical centers with high accuracy, achieving a mean squared error of 0.0226 and a coefficient of determination of 0.8182.
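The pipeline described in the abstract (lexicon tagging, per-tag proportions as predictors, ridge regression tuned by cross-validation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not the authors' implementation: the tag names, the four-word toy lexicon, the six toy reviews, the 1 to 5 rating scale, the candidate penalty values, and the choice to divide tag counts by the total number of tokens. The ridge fit uses the standard closed-form solution on centered data so that the intercept is not penalized.

```python
from collections import Counter

# Hypothetical tag names mirroring the four sentiment classes in the abstract.
TAGS = ("strong_neg", "mod_neg", "mod_pos", "strong_pos")

# Toy lexicon invented for illustration; not the domain-specific lexicon of the paper.
LEXICON = {"terrible": "strong_neg", "slow": "mod_neg",
           "polite": "mod_pos", "excellent": "strong_pos"}


def sentiment_proportions(tokens):
    """Share of tokens carrying each tag (assumed denominator: all tokens)."""
    counts = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    n = max(len(tokens), 1)
    return [counts[tag] / n for tag in TAGS]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, alpha):
    """Closed-form ridge on centered data: w = (Xc^T Xc + alpha I)^-1 Xc^T yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    gram = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(gram, rhs)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def predict(w, b, x):
    return b + sum(wj * xj for wj, xj in zip(w, x))


def cv_mse(X, y, alpha, k=3):
    """Mean squared error over k interleaved cross-validation folds."""
    n = len(X)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        w, b = fit_ridge(X_tr, y_tr, alpha)
        errors += [(predict(w, b, X[i]) - y[i]) ** 2 for i in test_idx]
    return sum(errors) / len(errors)


# Toy reviews with made-up ratings on an assumed 1-5 scale.
reviews = [
    (["excellent", "doctor", "polite", "staff"], 5.0),
    (["terrible", "service", "slow", "queue"], 1.0),
    (["polite", "staff", "slow", "queue"], 3.0),
    (["excellent", "clinic", "excellent", "care"], 5.0),
    (["terrible", "terrible", "wait", "rude"], 1.0),
    (["polite", "doctor", "clean", "rooms"], 4.0),
]
X = [sentiment_proportions(tokens) for tokens, _ in reviews]
y = [rating for _, rating in reviews]

# Pick the penalty with the lowest cross-validated error, then refit on all data.
best_alpha = min([0.01, 0.1, 1.0], key=lambda a: cv_mse(X, y, a))
w, b = fit_ridge(X, y, best_alpha)
```

In practice a library implementation such as scikit-learn's `Ridge` or `RidgeCV` would likely replace the hand-written solver. The pure-Python version is used here only to keep the sketch free of dependencies and to make the L2 penalty term `alpha` on the diagonal of the Gram matrix explicit.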


Sentiment analysis, customer satisfaction, hybrid approach, domain-specific sentiment lexicon, multiple linear regression, ridge regression, medical center, review

Short URL: https://sciup.org/147246150

IDR: 147246150   |   DOI: 10.14529/ling240409

Research article