Video Game Sales Prediction Based on Social Media Data Using Machine Learning: A Survey and Future Directions
Authors: Oleg Chertov, Valerii Buslaiev
Journal: International Journal of Information Technology and Computer Science @ijitcs
Issue: No. 4, Vol. 17, 2025
Free access
The rapid growth of the video game industry and its reliance on digital distribution have created new opportunities for data-driven sales forecasting. Social media platforms serve as influential environments where consumer sentiment, trends, and discussions impact purchasing behaviors. This study examines the potential of using sentiment analysis of social media data to predict video game sales. While traditional sales forecasting models mainly depend on historical sales data and statistical techniques, sentiment analysis offers real-time insights into consumer interest and market demand. This paper reviews existing research on video game sales prediction, the application of sentiment analysis in the gaming industry, and sentiment-based forecasting models in other domains. The findings highlight a significant research gap in applying sentiment analysis to video game sales forecasting, despite its demonstrated efficacy in related fields. The study emphasizes the advantages and challenges of integrating sentiment analysis with traditional forecasting methods and proposes future research directions to enhance predictive accuracy.
Keywords: Social Media Analytics, Sentiment Analysis, Video Games, Sales Prediction, NLP, Consumer Behavior, Predictive Analytics, Machine Learning
Short URL: https://sciup.org/15019932
IDR: 15019932 | DOI: 10.5815/ijitcs.2025.04.05
The rapid growth of the video game industry has transformed it into a dominant sector of global entertainment, generating billions in annual revenue. Social media has emerged as a critical factor influencing consumer behavior in this domain, serving as a platform for marketing, fan engagement, and trend prediction. The synergy between these industries underscores the potential of leveraging social media data to forecast video game sales, offering valuable insights for developers, publishers, and marketers.
Despite advances in predictive analytics and machine learning, accurately forecasting video game sales remains challenging due to the volatile nature of consumer interests and the diversity of game genres. Existing studies have mainly focused on historical sales data, traditional marketing metrics, or specific content features of video games. However, the impact of real-time social media trends and sentiment analysis on sales performance remains underexplored. This gap highlights the need for robust, data-driven models that combine social media analytics with conventional forecasting techniques to improve prediction accuracy.
Over the past decade, sentiment analysis has emerged as a broad research field, enabling the understanding of customer opinions and enhancing decision-making across various domains. On social media, it helps researchers track customer sentiment in real time, mitigating potential reputational and financial damage caused by negative posts. It also plays a crucial role in crisis prevention by alerting PR specialists to negative content before it escalates. In market research, sentiment analysis provides insights into customer preferences, industry analysis, and emerging trends, offering a competitive edge. For customer service, it enhances response efficiency by prioritizing urgent issues and improving overall satisfaction. Additionally, it aids product experience evaluation by identifying customer concerns and areas for improvement, helping businesses refine their offerings.
Given the scale of the video game industry and its specifics, sentiment analysis appears a promising tool for sales forecasting. However, it remains unclear whether the research community has explored this application and, if so, to what extent. Alternatively, prior attempts may have found sentiment analysis ineffective for this purpose, a finding equally valuable to this inquiry. Thus, we propose two hypotheses: (1) the scientific community lacks high-quality developments in forecasting video game sales using sentiment analysis, or such developments are nonexistent; (2) sentiment analysis is a viable and potentially effective method for this purpose. This article aims to review recent approaches to video game sales forecasting and evaluate the role of sentiment analysis in this context. Despite numerous publications on both topics, few studies combine them. This survey summarizes key primary studies, identifies the challenges of applying sentiment analysis techniques to predict sales levels, discusses open issues, and proposes promising future research in this field.
The remainder of this paper is structured as follows. Section 2 discusses the research method employed in this survey and offers a description of key primary studies chosen for review. Section 3 examines the necessity and potential usefulness of utilizing the hybrid system of traditional prediction techniques and sentiment analysis of social media data. Section 4 presents the final considerations of this paper.
2. Research Methodology and Related Works
2.1. Research Methodology
Primary studies in this survey were retrieved mostly from Scopus [1], for three reasons. First, it offers a wide database of scientific papers, with the option to view a selected study at the publisher’s site. Second, Scopus supports searching not only article titles but also keywords and abstracts through search-term queries. This is useful because an article’s title often gives no clue that the study is relevant to our research, while its keywords or abstract do. Lastly, articles in journals indexed in Scopus are generally regarded as well-reviewed and likely to contain research of value to the scientific community.
For this article, we initially chose the keywords ‘Social media analytics’, ‘Sentiment analysis’, ‘Video games’, ‘Sales prediction’, ‘NLP’ (Natural Language Processing), and ‘Consumer behavior’ as search criteria. However, in our survey, we focus on three of them: ‘Video games’, ‘Sales prediction’, and ‘Sentiment analysis’. We narrowed the list for the following reasons: (i) ‘Social media analytics’ refers to the source of the data rather than to the methods discussed, so it is not relevant as a search term; (ii) sentiment analysis already uses natural language processing techniques as part of its process, so ‘NLP’ adds nothing beyond searching for papers on sentiment analysis; (iii) ‘Consumer behavior’ is a very broad term that subsumes sentiment analysis, so adding it would not help narrow the list of potential sources.
Using these three keywords, we searched Scopus in three phases: (1) individually, to gauge topic coverage; (2) in pairs, to explore intersections; and (3) collectively, to identify studies directly addressing our research focus. Results were analyzed based on article titles, keywords, and abstracts.
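The three search phases can be enumerated mechanically. The following is a minimal sketch, assuming plain Boolean `AND` queries; the exact query syntax entered into Scopus is not stated in this survey, and the function name `build_queries` is hypothetical.

```python
from itertools import combinations

KEYWORDS = ["Video games", "Sales prediction", "Sentiment analysis"]


def build_queries(keywords):
    """Return the keyword combinations for each of the three search phases."""
    phases = {}
    for size, name in ((1, "single"), (2, "paired"), (3, "collective")):
        # Each combination becomes one query; terms are quoted and AND-joined.
        phases[name] = [
            " AND ".join(f'"{term}"' for term in combo)
            for combo in combinations(keywords, size)
        ]
    return phases


queries = build_queries(KEYWORDS)
for phase, items in queries.items():
    print(phase, items)
```

Three keywords give three single-term queries, three pairs, and one collective query, which matches the structure of Sections 2.2 to 2.4.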
2.2. Single-term Search
Using the term ‘Video games’, we will explore whether this field is sufficiently covered. After searching through Scopus, we found 45,972 documents as of March 9, 2025, dating back to 1970. In the early years of video games, it was not a popular research subject, with only one paper published each year. However, starting in 2004, the number of documents on this topic began to grow rapidly, averaging three and a half thousand papers and articles per year over the last four years.
These papers span various fields, from medicine to education, so we will narrow our search to focus solely on computer science. This refinement yields 20,421 documents, with an average of sixteen hundred publications annually, which is roughly half of the total.
Next, we will examine the keyword 'Sentiment analysis'. This term yields 54,836 documents dating back to 1910. The number of annual publications has increased markedly in the past decade, in step with developments in the field of artificial intelligence. Since 2012, the number of documents has risen from five hundred annually to an average of seven thousand four hundred over the last four years. Most of these publications are in computer science, totaling 37,453, with an annual average of five thousand.
Finally, we will search the Scopus database using the keyword 'Sales prediction'. A point about terminology is important here: in scientific literature, two synonymous terms often appear, sales prediction and sales forecasting. While these terms are quite similar, the keyword list of an article typically includes only one of them, and very rarely both. Therefore, by searching one term and then the other and combining the results, we can obtain a representative summary for this topic. Following this logic, when gathering data under our initial term, 'Sales prediction', we will also review the results for 'Sales forecasting'. This approach will be applied to both single-term and combined-term searches.
Searching with the term 'Sales prediction' yields 8,876 documents dating back to 1936. On average, one thousand papers were published each year over the last four years. Of these 8,876 publications, 5,194 are in the field of computer science, averaging six hundred papers annually over the last four years.
Conversely, using the keyword 'Sales forecasting' gives us 9,516 matching documents. In the last four years, eight hundred papers have been published each year on average. Among these, 4,890 publications are in the computer science field, with five hundred published annually over the last four years.
Overall, this results in 18,392 documents in total and 10,084 publications in the computer science field. Each year, an average of eighteen hundred papers are published across all fields, and twelve hundred specifically in computer science.
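The combined figures follow from adding the two result sets. A short sketch of that arithmetic, using the counts reported above, is given here; note that a plain sum would count a paper twice if it lists both terms, which the text describes as very rare.

```python
# Counts reported for the two synonymous search terms.
counts = {
    "Sales prediction": {"total": 8876, "computer_science": 5194},
    "Sales forecasting": {"total": 9516, "computer_science": 4890},
}


def combine(per_term_counts):
    """Sum each field across search terms (assumes negligible overlap)."""
    combined = {}
    for per_term in per_term_counts.values():
        for field, n in per_term.items():
            combined[field] = combined.get(field, 0) + n
    return combined


combined = combine(counts)
print(combined)  # {'total': 18392, 'computer_science': 10084}
```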
Summarized results of single-term searches may be seen in Table 1.
Table 1. Single-term search results from Scopus
Keyword | Total number of publications | Annual average in the last four years | Number of publications in the computer science field | Annual average in the last four years in this field
Video games | 45,972 | 3,661 | 20,421 | 1,626
Sentiment analysis | 54,836 | 7,424 | 37,453 | 5,018
Sales prediction | 18,392 | 1,836 | 10,084 | 1,264
A note on how the annual averages are calculated: ‘in the last four years’ refers to publications from 2021 to 2024. Documents published in 2025 count toward the total number of publications but not toward the average. As stated earlier, the Scopus search was performed in March 2025, so only two full months of the year had passed, and the number of 2025 publications is much lower than in previous years.
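The averaging rule can be written down in a few lines. This is a minimal sketch with hypothetical yearly counts (they are not Scopus figures); the function name `annual_average` is likewise only illustrative.

```python
def annual_average(counts_by_year, first_year=2021, last_year=2024):
    """Average publications per year over the full years first_year..last_year."""
    years = range(first_year, last_year + 1)
    return sum(counts_by_year[year] for year in years) / len(years)


# Hypothetical yearly counts; 2025 is incomplete (search run in March).
example = {2020: 900, 2021: 1000, 2022: 1100, 2023: 1200, 2024: 1300, 2025: 150}

total = sum(example.values())   # the total includes the partial year 2025
average = annual_average(example)  # the average ignores 2025
print(total, average)
```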
Additionally, Scopus [1] provides functionality for analyzing search results. Using it, we gathered the number of publications per year for each keyword from 2010, the beginning of the previous decade, to 2024. Data for 2025 was omitted for the same reason as in the averages: our goal is to show the trend and the general number of publications per year, and the sharp drop in the incomplete year 2025 could mislead the reader. The data for all three keywords is summarized in a single graph showing how the number of documents published annually changed over this period (Fig. 1).

Fig. 1. Number of documents in the computer science field with each keyword by year
These results show that all three initial themes are widely covered by scientific research, so we may proceed deeper into their intersections.
2.3. Paired-keywords Search
We have three keywords, which give us three pairs to examine. We will proceed in the following specific order:
• ‘Video games’ and ‘Sales prediction’ keyword pair: we will research the methods used for our initial task of predicting sales levels in the video game market.
• ‘Video games’ and ‘Sentiment analysis’ keyword pair: if we find that sentiment analysis is not applied to the sales prediction task in the video game market, or if we discover that this topic is not adequately covered, we will review publications where sentiment analysis is mentioned as a method related to video games, what it is used for, and how successful those efforts are.
• ‘Sentiment analysis’ and ‘Sales prediction’ keyword pair: finally, we will explore existing publications that utilize sentiment analysis techniques for the sales prediction task in alternative fields and assess whether this methodology can be applied to the video game market.
A. Sales Prediction in Video Games
In this subsection we are going to explore existing methods to predict video game sales.
Yang (2024) [2] chose the ARIMA model to forecast video game sales. The model's parameters are determined through statistical diagnostics such as the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF). The main advantages of ARIMA are its effectiveness for time series data and the interpretability of its parameters. Its disadvantages are that it is limited to linear patterns, struggles with the complex relationships common in video game sales, requires large amounts of stationary data, and, crucially, does not consider external variables.
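To illustrate the kind of diagnostic mentioned here, the sketch below computes a sample autocorrelation function before and after first differencing. It is a simplified illustration on hypothetical sales figures, not the procedure or data of [2], and it covers only the ACF, not the PACF or the model fitting itself.

```python
def difference(series, lag=1):
    """First-difference a series to remove a trend (the 'I' in ARIMA)."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical weekly unit sales with a clear upward trend.
sales = [100, 108, 119, 127, 140, 149, 161, 170, 182, 190]
diffed = difference(sales)

acf_raw = acf(sales, 3)      # slowly decaying ACF suggests non-stationarity
acf_diff = acf(diffed, 3)    # ACF of the differenced series
print(acf_raw)
print(acf_diff)
```

A slowly decaying ACF on the raw series is a common sign that differencing is needed before choosing the autoregressive and moving-average orders.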
Goel et al. (2010) [3] and Ruohonen and Hyrynsalmi (2017) [4] used the number of search queries to predict box office performance, early video game sales, and song rankings. This method is beneficial due to its high predictive power and enhanced model performance. However, the original papers do not provide complete information about the models used; they only briefly explain the linear predictive formulas, showing the results rather than the process, which makes them less valuable as a source. Additionally, this method raises concerns about data quality, representativeness, and user privacy, since it relies on user search data.
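Since the cited papers give only brief linear formulas, the following is a generic sketch of a linear predictor on search volume, not a reconstruction of their models. The log transform, the data, and the names `fit_line` and `predict_sales` are assumptions made for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical pre-release search volumes and first-month unit sales.
search_volume = [1_000, 5_000, 20_000, 80_000, 300_000]
first_month_sales = [2_000, 9_000, 30_000, 110_000, 380_000]

# Assumption: fit in log-log space, since both quantities span orders of magnitude.
log_x = [math.log10(v) for v in search_volume]
log_y = [math.log10(s) for s in first_month_sales]
slope, intercept = fit_line(log_x, log_y)


def predict_sales(volume):
    """Predict sales for a new title from its search volume."""
    return 10 ** (intercept + slope * math.log10(volume))


print(slope, predict_sales(50_000))
```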
Wu and Yang (2024) [5] aimed to predict the popularity of video games by employing a graph neural network (GNN) model. Unlike the ARIMA model in [2], the primary advantage of the GNN is its effectiveness in capturing complex relationships among games. The GNN also integrates multiple data types (e.g., game features, user interactions), providing a holistic view of the factors that influence game popularity. However, unlike ARIMA, the GNN is poorly interpretable; the model functions as a “black box,” making it challenging to identify the specific factors that lead to a game's predicted popularity.
The studies by Marcoux and Selouani (2009) [6] and by Chen et al. (2021) [7] aim to improve sales forecasting accuracy in the video game industry by developing a hybrid data mining approach that integrates subspace decomposition methods with neural networks. The authors claim that this hybrid architecture enhances accuracy and demonstrates high adaptability. However, it requires substantial computational resources and comprehensive, high-quality sales data, and it is not easily interpretable.
The study by Li et al. (2021) [8] aims to boost the accuracy of video game sales predictions by developing a hybrid feature selection method that combines the Pearson Correlation Coefficient (PCC) with Random Forest Feature Selection (RFFS). This approach seeks to identify the most relevant features influencing sales and applies various machine learning models to forecast video game sales. The method selects one subset of features using Pearson’s correlation coefficient and another through a random forest. These two subsets are then merged into a single training set for model learning. To evaluate the effectiveness of this hybrid approach, nine machine learning models (AdaBoost, CatBoost, Decision Tree, Extreme Learning Machine, Gradient Boosting Decision Tree, K-Nearest Neighbors, LightGBM, Random Forest, and XGBoost) were trained and tested. The main advantages of the method include enhanced feature selection, improved model performance, and versatility. However, it also entails high computational complexity and a potential overfitting problem. Moreover, while PCC and RFFS account for individual feature importance, they may not fully capture the interactions between features that could influence sales predictions.
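The merging step can be sketched as a union of two feature subsets. This is an illustration under stated assumptions, not the implementation in [8]: the data and thresholds are hypothetical, and the random forest importances are supplied as a precomputed dictionary (assumed to come from a forest trained elsewhere) rather than computed here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def select_by_pcc(features, target, threshold):
    """Keep features whose absolute correlation with the target is high."""
    return {
        name for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    }


def select_by_importance(importances, top_k):
    """Keep the top_k features ranked by (precomputed) importance."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return set(ranked[:top_k])


# Hypothetical data: feature columns and sales in millions of units.
features = {
    "critic_score": [60, 70, 80, 90, 95],
    "user_score": [7.0, 6.5, 8.0, 7.5, 9.0],
    "release_month": [3, 11, 6, 9, 1],
}
sales = [0.5, 0.9, 1.6, 2.4, 3.1]
# Hypothetical importances, assumed to come from a random forest trained elsewhere.
rf_importances = {"critic_score": 0.55, "user_score": 0.15, "release_month": 0.30}

pcc_subset = select_by_pcc(features, sales, threshold=0.8)
rf_subset = select_by_importance(rf_importances, top_k=2)
selected = sorted(pcc_subset | rf_subset)  # merge the two subsets
print(selected)
```

In this toy example the two selectors disagree on one feature each, so the union retains a feature that a purely linear criterion would have dropped.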
Shelstad et al. (2020) [9] investigate the effectiveness of three user experience (UX) scales in predicting players' intentions to continue playing and to purchase video games. The scales examined are:
• Game User Experience Satisfaction Scale (GUESS-24);
• ENJOY (a scale assessing the enjoyment aspect of the gaming experience);
• User Experience Questionnaire - Short Version (UEQ-S).
The authors aim to determine which of these scales best predicts gameplay continuance and purchasing intentions across six popular online games by having participants complete all three questionnaires after a playing session. The advantages of this methodology are its validated instruments and comprehensive evaluation. However, it has three limitations. First, reliance on self-reported measures may introduce biases, such as social desirability or inaccurate recall. Second, the participant pool may not fully represent the broader gaming population, potentially limiting the generalizability of the findings. Third, the study's design captures data at a single point in time, which may not account for changes in player behavior over time. Heo and Park (2021) [10] address the same task but use only reviews published by gamers themselves.
Studies [11, 12, 13] review the general state of the topic in the scientific field, though, unlike our survey, they focus only on predicting video game sales and on particular parts of that question. Abbasi et al. (2015) [11] propose a conceptual model to understand how engagement in video games influences customers' behavioral learning, with a particular focus on the mediating role of observational learning. Gray et al. (2024) [12] demonstrate the benefits of utilizing predictive analytics to forecast future sales performance in the video game industry. Kimura (2015) [13] examines the respective effects of advertising, word of mouth, and serialization on sales of console game series in Japan.
As the conclusion to this subsection, we may say that the issue of forecasting video game sales has received appropriate attention from researchers. Yet, while they employ various methods to address it, the use of sentiment analysis in these efforts is quite limited.
B. Sentiment Analysis in Video Games
In this subsection we are going to explore existing ways of using sentiment analysis in the context of the video games market.
Miyake and Saga (2023) [14] analyze the factors that lead to high ratings of games by performing causal analysis on text data such as reviews and live streaming chat. By incorporating both user reviews and livestreaming chat comments, the study captures a wide range of user opinions and sentiments, and the findings can inform game developers about which features are most valued by users, guiding future game design and development. However, focusing on the Japanese market may limit the generalizability of the findings to other regions with different gaming cultures.
Abdul-Rahman et al. (2024) [15] work on a problem similar to ours: they analyze Steam reviews using a Support Vector Machine (SVM) for sentiment analysis, but instead of future sales, the authors forecast the game's churn rate. The developed SVM model achieved an accuracy of 89%, with precision, recall, and F1 scores all around 84-85%, indicating robust predictive capabilities. However, findings based only on Steam reviews may not be directly applicable to other gaming platforms with different user demographics and behaviors.
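Accuracy and F1 can differ noticeably when classes are imbalanced, which is one possible reading of the gap between the reported figures. The sketch below computes the four metrics from a confusion matrix; the counts are hypothetical and chosen only to show a similar pattern, not taken from [15].

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts with twice as many negatives as positives.
metrics = classification_metrics(tp=84, fp=16, fn=16, tn=184)
print(metrics)
```

With these counts the many correctly classified negatives lift accuracy above the precision, recall, and F1 of the positive class.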
In the study [16] by Sivakumar and Uyyala (2021), sentiment analysis is itself the primary task. The model's output is not applied to a further purpose; the focus of the research is the model itself and its complexity. The proposed system was evaluated on three benchmark datasets: Amazon cell phone reviews, Amazon video game reviews, and consumer reviews of Amazon products. While the integration of LSTM networks with fuzzy logic enables the system to capture complex patterns in textual data, its applicability to domains or review types beyond the Amazon stores remains uncertain without further validation.
Studies [17, 18, 19] aim to develop models that generate ratings for video games by performing sentiment analysis on public opinion data extracted from microblogging platforms. Their methods can be applied to the same step in our initial problem, and they offer valuable insights into potential difficulties with gathering or preprocessing the data.
Li et al. (2024) [20] and Xia et al. (2023) [21] aim to develop specialized sentiment analysis models capable of detecting sarcasm in video game reviews. These studies may prove useful given the stereotype that the gaming community tends to be toxic and sarcastic. A negative social media message may be disguised as a compliment, which would feed mislabeled data into the model training process. The insights from these articles can give us hints about detecting sarcasm in the data used.
Another significant concern is the specialized vocabulary of gamers. As a result, standard sentiment analysis tools might struggle to accurately determine the mood of a message. This issue was examined by Thompson et al. (2017) in their article [22]. The study aims to adapt and enhance the lexicon-based sentiment analysis tool, SO-CAL (Sentiment Orientation CALculator), for use in the context of video game player communications. The researchers gathered instant messaging data from 1,000 games of StarCraft 2, capturing a wide variety of player interactions. This dataset was used to assess the performance of the enhanced SO-CAL model in classifying sentiment and identifying toxic behavior. However, the study is limited to StarCraft 2, a real-time strategy game. Therefore, the findings may not be directly applicable to other game genres or platforms without further adaptation and validation.
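The effect of a domain lexicon can be shown with a toy lexicon-based scorer. This is a deliberately simplified sketch and not SO-CAL: the word lists and weights are hypothetical, and negation is handled by a plain sign flip, which is only one of several possible treatments.

```python
# Hypothetical general-purpose polarity lexicon.
BASE_LEXICON = {"good": 2, "great": 3, "bad": -2, "terrible": -4, "sick": -2}
# Hypothetical gaming-specific entries; "sick" is praise in gamer slang.
GAMING_LEXICON = {"gg": 2, "op": 1, "noob": -3, "lag": -2, "sick": 3}

NEGATORS = {"not", "no", "never"}
INTENSIFIERS = {"very": 1.5, "so": 1.3}


def score(text, lexicon):
    """Sum word polarities, applying simple intensifier and negation rules."""
    tokens = [tok.strip(".,!?") for tok in text.lower().split()]
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in lexicon:
            continue
        value = float(lexicon[tok])
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            value *= INTENSIFIERS[tokens[i - 1]]
        # Flip polarity if a negator appears in the two preceding tokens.
        if any(word in NEGATORS for word in tokens[max(0, i - 2):i]):
            value = -value
        total += value
    return total


adapted_lexicon = {**BASE_LEXICON, **GAMING_LEXICON}  # domain entries override
message = "gg that play was sick"

base_score = score(message, BASE_LEXICON)
adapted_score = score(message, adapted_lexicon)
print(base_score, adapted_score)
```

The same message flips from negative to positive once the gaming entries are added, which is the kind of misclassification a domain-adapted lexicon is meant to prevent.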
A task similar to that of [20, 21], detecting aggression rather than sarcasm in gamers' speech, was addressed by Stepanova et al. (2021) in [23]. The initial goal of the study is to address the issue of cyberbullying in gaming environments by identifying and mitigating aggressive behaviors as they occur. The researchers collected and processed a dataset comprising chat logs from live video gaming sessions. The main insight of this article for our purposes is that detecting verbal aggression accurately requires understanding context, sarcasm, and cultural nuances, which can be challenging for NLP models and may lead to false positives or negatives.
In conclusion, 131 articles were published in this field of sentiment analysis in video games. However, only nine publications have a connection to the sales level, and none of them focus on forecasting its level. Thus, we may suggest that our study may discover new topics of research and may be useful in the future.
C. Sentiment Analysis for Sales Prediction
In this subsection, we finally will explore how the sentiment analysis methods are used for completing the sales prediction task.
Huang et al. (2019) [24] aim to enhance the accuracy of online sales predictions by developing a novel model that integrates sentiment analysis and topic modeling of online textual reviews. The authors propose the Dependency SCOR-Topic Sentiment (DSTS) model, which combines sentiment analysis with topic modeling to analyze how the distribution of sentiment-laden topics within reviews influences sales performance. By incorporating both sentiment and topic information from reviews, the DSTS model provides a more nuanced understanding of factors influencing sales, leading to improved prediction accuracy.
The study [25] by Zhang et al. (2022) aims to improve the accuracy of intelligent vehicle sales predictions by integrating online public opinion and online search index data into predictive models. Specifically, the authors investigate the impact of Key Opinion Leaders (KOLs) on online public sentiment and its subsequent effect on sales performance. The main advantage of this study is that by differentiating between general consumer sentiment and KOL-driven opinions, it highlights the growing role of influencers in shaping consumer behavior. However, while it demonstrates strong predictive performance, challenges related to data reliability and model scalability must be addressed for broader applications.
A very similar study was conducted by Du et al. (2022) in [26]. The primary difference lies in the modeling methods used. While [25] utilizes a Long Short-Term Memory (LSTM) neural network model that integrates online public opinion sentiment and online search index data, [26] employs a combination of Random Forest Regression (RFR) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models. The SARIMA model forecasts influencing factors such as the number of charging stations and gasoline prices, which are then input into the trained RFR model to predict monthly vehicle sales. In summary, while both studies leverage online data to improve new energy vehicle (NEV) sales predictions, Du et al. (2022) [26] adopt a hybrid statistical and machine learning approach focusing on a wide range of influencing factors, whereas Zhang et al. (2022) [25] employ a deep learning model centered on online public opinion and search trends.
Souza et al. (2024) [27] aim to improve the accuracy of iron ore price forecasting by integrating market sentiment analysis with a Weighted Fuzzy Time Series (WFTS) model. This model combines sentiment data from textual sources, such as news articles, with quantitative indicators like production volumes and pricing trends. The main strength of the suggested model is its ability to capture complex, nonlinear factors influencing market trends. The proposed WFTS model demonstrated a significant reduction in forecasting errors compared to traditional time series and multivariate models.
The study [28] by Li et al. (2012) aims to predict sales performance by developing a sentiment autoregressive model that integrates sentiment analysis from online reviews and other time-dependent factors affecting sales trends. Although this is a very early article on the use of sentiment analysis, it remains useful for new attempts. By incorporating sentiment analysis, the model improves the accuracy of predicting demand fluctuations and provides valuable insights for marketing strategies and pricing decisions.
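The general idea of a sentiment autoregressive model can be sketched as a regression of current sales on lagged sales and lagged sentiment. The form below, sales_t = c + phi * sales_{t-1} + rho * sentiment_{t-1}, with a single lag for each term, is an assumption chosen for brevity and not the exact specification in [28]; the data are synthetic.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_sentiment_ar(sales, sentiment):
    """Least-squares fit of sales_t on [1, sales_{t-1}, sentiment_{t-1}]."""
    rows = [[1.0, sales[t - 1], sentiment[t - 1]] for t in range(1, len(sales))]
    targets = sales[1:]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)  # [c, phi, rho]


# Synthetic data generated from known coefficients (c=10, phi=0.5, rho=20).
sentiment = [0.2, -0.1, 0.5, 0.3, -0.4, 0.1, 0.6, 0.0]
sales = [100.0]
for t in range(1, len(sentiment)):
    sales.append(10 + 0.5 * sales[t - 1] + 20 * sentiment[t - 1])

c, phi, rho = fit_sentiment_ar(sales, sentiment)
next_sales = c + phi * sales[-1] + rho * sentiment[-1]  # one-step forecast
print(c, phi, rho, next_sales)
```

Because the synthetic series is noise-free, the fit recovers the generating coefficients; real sales data would also need an error term and validation on held-out periods.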
Here it is worth making a remark about the specifics of the market we are studying. All the studies discussed so far in this subsection [24, 25, 26, 27, 28] relate to products that can be described as ‘physical’. That is, the product is a real-life object, which imposes numerous limitations on its distribution: the number of copies is limited and cannot be duplicated indefinitely, logistics problems arise, the product must be stored while unsold, and then it has to be delivered to the client.
None of these problems is critical for the video game market, because the copies sold are digital: they can be easily duplicated and sold in almost unlimited quantities. For example, a significant part of the study [28] addresses inventory management, which is not relevant to our topic. All these specifics should be taken into account in our future study.
A problem of a similar, non-physical nature is discussed in [29] by Zhang et al. (2011). The study investigates the relationship between collective public sentiment on Twitter and stock market performance. The authors aim to determine whether sentiment analysis of tweets can serve as a predictive indicator for stock market fluctuations. This is not strictly a sales prediction task, yet the nature of the question is quite similar: the study examines how users’ messages on social media correlate with the market, which resembles the correlation with clients’ willingness to buy a product. The findings suggest a statistically significant correlation between Twitter sentiment trends and stock market movements, with certain sentiment metrics showing predictive potential. However, the article acknowledges limitations in accuracy and highlights the need for further refinement of sentiment analysis techniques for financial forecasting.
Arias et al. (2013) [30] studied similar but broader questions. The study explores the use of Twitter data for forecasting various types of events, such as product sales, stock market trends, and consumer behavior. The authors aim to understand how real-time sentiment and content from Twitter can be leveraged to improve prediction models. Specifically, the article discusses predicting the box office of films using sentiment analysis, and the characteristics of the movie market are similar to those of the video game market. This study demonstrates that Twitter data, when processed and analyzed correctly, can provide useful signals for forecasting various events. The combination of tweet volume and sentiment polarity showed a positive impact on prediction accuracy, though challenges related to data noise and sentiment interpretation remain.
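Volume and polarity signals of this kind are typically derived by aggregating classified posts per time window. The following is a minimal sketch of such an aggregation, assuming each post has already been labeled with a polarity of +1, -1, or 0; the feature names and the sample posts are hypothetical and not taken from [30].

```python
from collections import defaultdict


def daily_features(posts):
    """Aggregate (day, polarity) pairs into per-day volume and net sentiment.

    polarity is assumed to be +1 (positive), -1 (negative), or 0 (neutral).
    """
    counts = defaultdict(lambda: {"volume": 0, "positive": 0, "negative": 0})
    for day, polarity in posts:
        bucket = counts[day]
        bucket["volume"] += 1
        if polarity > 0:
            bucket["positive"] += 1
        elif polarity < 0:
            bucket["negative"] += 1
    features = {}
    for day, bucket in counts.items():
        net = (bucket["positive"] - bucket["negative"]) / bucket["volume"]
        features[day] = {"volume": bucket["volume"], "net_sentiment": net}
    return features


# Hypothetical labeled posts about an upcoming title.
posts = [
    ("2025-03-01", 1), ("2025-03-01", 1), ("2025-03-01", -1), ("2025-03-01", 0),
    ("2025-03-02", -1), ("2025-03-02", -1),
]
features = daily_features(posts)
print(features)
```

The resulting per-day rows could then serve as extra input columns alongside historical sales in a forecasting model.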
In conclusion to this subsection, we can say that sentiment analysis techniques are commonly utilized for sales prediction tasks, typically in hybrid systems that combine them with traditional forecasting methods. However, as we observed in earlier subsections, this methodology has yet to be effectively employed for predicting video game sales, so our initial study may prove to be innovative and valuable.
2.4. Collective Keyword Search
Searching the Scopus database with all three keywords together indicates that studies on our particular topic do not exist or, at least, are not indexed by Scopus. Several results are returned, yet none of them actually addresses forecasting video game sales using sentiment analysis. The results do include the studies [15] by Abdul-Rahman et al. (2024) and [22] by Thompson et al. (2017), along with similar papers, which have already been discussed.
3. Discussion
In this paper, we have reviewed articles that investigate aspects of our initial research topic: predicting video game sales based on sentiment analysis of social media data. This problem contains three key terms: ‘Sentiment analysis’, ‘Sales prediction’ and ‘Video games’. These gave us three vectors of investigation: (i) predicting video game sales; (ii) using sentiment analysis in the context of the video game market; (iii) using sentiment analysis for predicting product sales.
All three topics from which the keywords are derived are extensively covered in the published literature, with over a thousand publications on each per year in the computer science field alone.
Video games are among the few markets where the product is entirely digital. While some percentage of video game sales still comes from physical copies sold as discs for consoles, the majority is sold through online markets. Thus, we aim to answer the main question raised in the introduction of this paper: is it possible to use sentiment analysis tools for predicting video game sales, and can these methods prove useful?
Examining the published work on predicting video game sales, we have noted that this topic is widely discussed, with various approaches employed to address it. Study [2] uses the ARIMA model for this purpose, while [3] and [4] apply statistical methods based on the number of search queries. In [5], graphical neural networks are implemented to forecast game sales. However, none of these studies has applied natural language processing and sentiment analysis techniques in this context.
Nevertheless, sentiment analysis is widely used in papers related to video games. These methods generate ratings for games [17, 18], detect sarcasm [20] and aggression [23] among gamers, and enhance lexicon-based tools [22]. Even though social media serves as the primary platform for gamers to express their opinions about both released and upcoming games, none of these publications have utilized this data to gauge gamer interest in future titles and predict potential sales levels.
On the other hand, sentiment analysis is employed for sales prediction in various fields. Models are built to forecast vehicle [25, 26] and food stock [28] sales, movie box office revenue [30], and stock market fluctuations [29]. Most of these studies propose hybrid models that combine traditional historical data on the investigated variables with real-time sentiment analysis results from social media, reflecting the current mood of prospective clients.
Particular attention should be given to studies that explore the application of social media message analysis in evaluating the electoral prospects of political figures, the dissemination of products by specific brands, and so forth. For example, studies [31] and [32] have shown that such evaluations closely correspond to the results of sociological polls concerning the French presidential elections and the electoral popularity of the President of Ukraine, respectively. These findings suggest that electoral assessments can, to some extent, be conceptualized as forecasting the 'sale' of a candidate's promises to voters. Consequently, the approaches to sentiment analysis [31] and user reaction analysis [32] in social networks presented in these studies also hold significant potential for forecasting trends in video game sales.
From the above, we can conclude that both of our hypotheses were confirmed. First, even though using sentiment analysis for sales prediction in various markets is a common research problem, video game sales have received little attention in this context. Second, hybrid models that incorporate data gathered from social media might lead to an effective forecasting model.
The main next step is to gather social media data, process it using sentiment analysis tools, and develop a hybrid model that combines traditional statistical approaches with real-time social media data.
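As a rough illustration of what such a hybrid model could look like, the sketch below scales a simple statistical baseline (a moving average of past sales) by a factor driven by mean sentiment polarity. This multiplicative adjustment is an illustrative assumption rather than a method taken from the reviewed studies; the function names, the `alpha` weight, and the sample figures are all hypothetical.

```python
def moving_average(history, window=3):
    """Baseline forecast: mean of the last `window` observations."""
    if not history:
        raise ValueError("history must not be empty")
    recent = history[-window:]
    return sum(recent) / len(recent)


def hybrid_forecast(sales_history, mean_polarity, alpha=0.25, window=3):
    """Scale the statistical baseline by a sentiment-driven factor.

    mean_polarity is assumed to lie in [-1, 1]. alpha is a hypothetical
    weight controlling how strongly sentiment shifts the baseline; in a
    real study it would be estimated from held-out data.
    """
    if not -1.0 <= mean_polarity <= 1.0:
        raise ValueError("mean_polarity must lie in [-1, 1]")
    baseline = moving_average(sales_history, window)
    return baseline * (1.0 + alpha * mean_polarity)


# Invented weekly sales figures (in thousands of units), for illustration only.
weekly_sales = [100.0, 120.0, 110.0, 130.0]
neutral = hybrid_forecast(weekly_sales, mean_polarity=0.0)
optimistic = hybrid_forecast(weekly_sales, mean_polarity=0.5)
```

With neutral sentiment the forecast equals the moving-average baseline, while positive or negative polarity shifts it up or down in proportion to `alpha`. A fuller model would replace the moving average with an established time-series method and learn the sentiment weight from data.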
4. Conclusions
This work presents a review of sentiment analysis approaches proposed to address the problem of predicting future sales and how this topic is explored in the video game market. This survey encompasses primary studies published from 2011 to the present. Papers focusing on the collection and preprocessing of social media data, as well as forecasting future sales levels, have been investigated and discussed in this work.
The primary contribution of this work is a survey of recent literature on this topic. While several other reviews have been published recently, they differ from this work in their overall investigative objectives or in the primary studies they encompass. The publications are organized by their initial tasks and examined for their differences, advantages, and disadvantages.
A second important contribution of this work is a discussion of the fundamental concepts surrounding this topic. Our survey has confirmed the main hypotheses formulated in the introduction. The discussion results presented in this paper are summarized in two main theses: (i) it is feasible to use sentiment analysis tools to predict video game sales; (ii) this approach has not yet been thoroughly researched in scientific publications. Furthermore, the future directions of the initial study have been outlined.