
Multimodal stock price prediction: a case study of the Russian securities market

Authors: Khubiyev K.U., Semenov M.E.

Journal: Program Systems: Theory and Applications (Программные системы: теория и приложения)

Section: Artificial intelligence and machine learning

Issue: No. 1 (64), Vol. 16, 2025.

Free access

Classical asset price forecasting methods primarily rely on numerical data, such as price time series, trading volumes, limit order book data, and technical analysis indicators. However, the news flow plays a significant role in price formation, which makes multimodal approaches that combine textual and numerical data for improved prediction accuracy highly relevant. This paper addresses the problem of forecasting financial asset prices using a multimodal approach that combines candlestick time series and textual news flow data. A unique dataset was collected for the study, which includes time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. For processing textual data, the pre-trained models RuBERT and Vikhr-Qwen2.5-0.5b-Instruct (a large language model) were used, while time series and vectorized text data were processed using an LSTM recurrent neural network. The experiments compared models based on a single modality (time series only) and two modalities, as well as various methods for aggregating text vector representations. Prediction quality was estimated using two key metrics: Accuracy (direction of price movement prediction: up or down) and Mean Absolute Percentage Error (MAPE), which measures the deviation of the predicted price from the true price. The experiments showed that incorporating the textual modality reduced the MAPE value by 55%. The resulting multimodal dataset holds value for the further adaptation of language models in the financial sector. Future research directions include optimizing textual modality parameters, such as the time window, sentiment, and chronological order of news messages.


Keywords: multimodal forecasting, quantitative finance, machine learning

Short URL: https://sciup.org/143184154

IDR: 143184154 | UDC: 004.832:336.761 | DOI: 10.25209/2079-3316-2025-16-1-83-130


Building a price forecast for an asset is a crucial task for financial market participants, as it enables strategic planning, optimal investment portfolio management, and risk assessment. Numerous attempts have been made to apply machine learning methods to construct such forecasts [1-3].

With the growing popularity of deep learning models, researchers have shifted their focus toward the application of neural networks. At the same time, the problem of accurately accounting for the news flow as a key factor influencing market behavior is being reconsidered with the rapid development of generative artificial intelligence models and large language models (LLMs) such as ChatGPT, FinGPT, GigaChat, LLama, and others. In financial economics, LLMs are still rarely used, and their full potential remains untapped.

Researchers are exploring the use of natural language processing models to enhance the accuracy of asset price forecasts and investment portfolio management strategies.

The study [4] describes the use of sentiment analysis of news as an additional parameter. The authors employed the FinBERT model, trained on financial data, to classify the sentiment of news articles as positive, negative, or neutral. The study used candlestick time series of the U.S. stock market index Standard & Poor’s 500 (S&P 500). A random forest model was used for price prediction. The study concluded that incorporating sentiment analysis of the news flow improves prediction accuracy.

In the study [5], the authors aimed to develop a multimodal artificial intelligence model capable of providing well-founded and accurate forecasts for time series data. They implemented a model that generates predictions of an asset’s monthly or weekly returns, accompanied by a textual explanation from a language model based on the user’s input query.

The study [6] proposed an approach for fine-tuning instructions to interpret numerical values and contextualize financial data.

Kulikova et al. [7] examined the effect of classifying news into thematic groups. The authors demonstrated that, in most cases, it is advisable to use a single thematic group of news for the deep learning models considered (Temporal Convolutional Network, D-Linear, Transformer, and Temporal Fusion Transformer). They also determined the probabilities of forecast improvement for the 20 thematic groups analyzed.

In all the aforementioned studies, the models were implemented using a multimodal approach for the U.S. stock market, with English as the language of the text modality. Notably, the news flow was not integrated directly into the predictor’s input vector but rather through a preprocessing block in the form of an additional parameter, such as sentiment, the frequency of news related to the asset, or news classification.

The objective of the current study is to demonstrate the advantages of a new multimodal method over predictions based solely on numerical data and to present a Russian-language financial news dataset.

To achieve this objective, we formulated the following key tasks:

(1) Construct a multimodal dataset consisting of time series data and news articles.
(2) Develop a predictive model capable of utilizing one or two modalities.
(3) Train the predictive model and analyze the values of accuracy functions and metrics, specifically Accuracy and MAPE.

In this study, we propose a new multimodal approach for integrating news flow into time series numerical data. The text of the news articles is converted into a vector representation and fed into the model alongside the time series vector.

Our hypothesis is that the multimodal approach will enable predictive models to extract semantic information from the text, thereby improving the accuracy of asset price forecasts.

1. Data Collection and Structuring

Multimodality implies the use of more than one data modality, which affects both the data structure and the logic of predictive model development. We utilize two types of modalities:

numerical: time series of stock prices;
textual: news streams.

To train the predictive model and analyze its performance, we collected an original dataset.

The time series, represented as candlestick data with open, close, high, and low prices, were obtained through the Algopack API of the Moscow Exchange (MOEX). For the numerical experiment, we selected stock time series data spanning from July 7, 2022, to August 30, 2024, covering 176 companies. During this period, the Russian stock market experienced phases of rapid growth and decline, with the IMOEX index rising from 2,213.81 to 2,650.32 points (+19.72%).

Table 1. Statistical features of the dataset after tokenization, RuBERT

Source	Mean	Std	Min	Max	Q25	Q50	Q75
RDV	134	88	8	512	65	123	187
Finam	221	135	18	512	116	178	284
BCS Express	20	10	4	82	13	17	26
BCS Technical Analysis	502	37	29	512	512	512	512
RBC	43	7	16	75	39	44	48
SmartLab	21	8	5	82	15	19	25

We collected 79,555 news articles from various sources, including the online publication “RBC” (1,823 articles), “BCS Express” (11,331), and “BCS Technical Analysis” (9,670), the investment company website “Finam” (20,647), the trader community website “SmartLab.ru” (30,857), as well as the Telegram channel “RDV” (5,227).

Several factors justify the selection of these sources. First, they provide news coverage for the required time period. Second, the institutional differences between sources, along with variations in writing style and levels of expertise, contribute to a more objective representation of events related to the analyzed time series.

News messages were tokenized using two models: RuBERT [8] and Vikhr-Qwen2.5-0.5b-Instruct [9] (hereafter Qwen). In the context of tokenized text, a word refers to a token, i.e. an element of the vector space represented as an index in the tokenizer’s vocabulary.

Descriptive statistics of the dataset (in tokens), including the mean, standard deviation, minimum, maximum, and quartiles of the token count, are presented in Tables 1 and 2. Note that tokenization can increase the word count of a text, for example, by splitting words into smaller components.

Table 3 provides examples of how a phrase changes after tokenization. For instance, the word «открывает» is split into three subcomponents: «от», «##к», and «##рывает», where the “##” prefix indicates that the token is a continuation of the previous token.

Table 2. Statistical features of the dataset after tokenization, Qwen

Source	Mean	Std	Min	Max	Q25	Q50	Q75
RDV	215	157	3	1324	92	187	304
Finam	453	405	35	5732	211	319	501
BCS Express	36	19	5	163	23	32	47
BCS Technical Analysis	1493	310	40	2221	1448	1545	1665
RBC	75	12	28	105	68	77	83
SmartLab	33	12	7	120	25	31	39

Table 3. Original and tokenized texts examples

Original text	Tokenized text
Доллар снова ниже 69 рублей	До ##лла ##р снова ниже 69 рублей
Москвич банкрот?	Москви ##ч банк ##рот ?
НПО Наука Отчет РСБУ	Н ##П ##О Наука От ##чет Р ##С ##Б ##У
T-банк это желтый банк	T - банк это же ##лт ##ый банк
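The “##” continuation convention shown in Table 3 can be illustrated with a minimal sketch. It does not call the RuBERT tokenizer; the helper name `merge_wordpieces` is hypothetical, and the token list is copied from the first row of Table 3.

```python
def merge_wordpieces(tokens):
    """Join WordPiece-style tokens back into words.

    A token that starts with '##' continues the previous token.
    """
    words = []
    for tok in tokens:
        if tok.startswith("##") and words:
            words[-1] += tok[2:]   # glue the continuation onto the last word
        else:
            words.append(tok)      # a new word starts here
    return words


# First row of Table 3, already tokenized.
tokens = ["До", "##лла", "##р", "снова", "ниже", "69", "рублей"]
words = merge_wordpieces(tokens)

print(len(tokens), "tokens ->", len(words), "words")
print(" ".join(words))
```

The example makes the point of the text concrete: the token count (7) exceeds the word count (5) because one word was split into three subword units.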

News article characteristics. On the “BCS Technical Analysis” platform, news articles tend to be lengthy, which imposes limitations on tokenizers. Specifically, as shown in Tables 1 and 2, the RuBERT model truncates the tokenized vector for longer texts. Additionally, the average length of tokenized text using the Qwen model exceeds that of RuBERT, indicating that Qwen has a broader vocabulary and a stronger text decomposition capability.

Furthermore, we collected data on 176 companies, forming a dataset consisting of tuples in the format:

(ticker, company name, company activity description).

Such data are essential in our case for:

(a) extracting keywords from company descriptions,
(b) improving the language model’s ability to link events described in news articles to specific companies and assess the impact of news on price dynamics.

Table 4. Examples of news articles (header snippet) and assigned tags

Source	Article fragment (heading)	Tags
RDV	Сегежа (SGZH): таргет 16.2 руб., апсайд +102...	SGZH
RDV	Артген биотех (ABIO) завершил доклинические...	аналитика, ABIO
Finam	Индекс МосБиржи восстанавливает позиции и приб...	ФосАгро, ВСМПО-АВСМ, CNYRUB
Finam	«Ашинский метзавод» назвал АО "Урал-ВК" своим ...	АшинскийМЗ
BCS Express	«Восходящее окно»: в каких бумагах замечен это...	Селигдар SELG, ЕвроТранс EUTR
BCS Express	«Сила Сибири» выйдет на максимальную мощность...	Газпром GAZP
BCS Technical Analysis	Мечел. Что ждать от бумаг на следующей неделе	Мечел
BCS Technical Analysis	На предыдущей торговой сессии акции Норникеля ...	ГМК Норникель

The dataset of news articles includes the following parameters: publication date, source, title, article body, and tags (keywords). For sources such as “RDV” and “SmartLab”, article titles are absent, and the corresponding fields are filled with the label “no title”.

In our case, tags may include the full or abbreviated company name along with the corresponding ticker, the name of the market sector, and similar information. Tags in news articles were assigned by the article authors.

For the “RDV” source, tags were marked by authors in the form of hashtags (e.g. #цифры, #аналитика). In “BCS Express” and “BCS Technical Analysis”, tags were specified in dedicated fields at the beginning or end of the news article (e.g. PhosAgro, Russian market) and were extracted from the page’s HTML code using the corresponding HTML tags. When tags were absent (“RBC”, “SmartLab”), the parameter in the dataset remained empty.

Table 4 provides examples of news articles (headline fragments) along with their assigned tags.

To validate our hypothesis regarding the advantages of the multimodal approach, we have planned a series of experiments.

The first series of experiments focused on predicting prices using only numerical time series of candlestick characteristics (close, open, high, and low prices). The quality metrics obtained from this experiment serve as baseline values against which improvements in price prediction accuracy using the proposed multimodal approach will be evaluated.

The second series of experiments aims to generate predictions and compute accuracy metrics (Accuracy, MAPE) using the multimodal approach while exploring different aggregation methods (Sum, Mean) for the vectorized news stream.

2.1. The Single-Modality Approach

We first conducted a series of experiments on asset price prediction using only time series data. For this, we applied classical machine learning models to the daily price values (close, open, high, low), including linear regression (LinReg), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), and the boosting algorithm XGBoost (XGB). Among deep learning models, we utilized a long short-term memory recurrent neural network (LSTM).

Conceptually, the experiment consists of two tasks:

(a) predicting the price movement direction (increase or decrease), which is a binary classification task;
(b) predicting the actual price, which is a regression task.

At this stage of the experiment, 176 companies were grouped into 23 industry sectors. We randomly selected 9 economic sectors and, within each sector, randomly chose two companies. Table 5 lists the selected sectors and companies (tickers) that participated in the computational experiment.

Table 7 provides statistical data on the closing price time series of the selected assets. Table 6 shows the distribution of news by companies after filtering. The correlation heat map of the closing price time series is shown in Figure 1. An interesting feature of the examined period is that the market underwent two phase shifts, from a general price decline to growth and back again, as indicated by the vertical lines in Figure 2.

Table 5. Economic sectors and companies (tickers) included into the dataset

Sector	Company (ticker)
Metals and Mining	Mechel (MTLR), TMK-Group (TRMK)
Oil and Gas	Surgutneftegas (SNGS), Gazpromneft (SIBN)
Consumer sector	Magnit (MGNT), Lenta (LENT)
Construction	PIK (PIKK), Samolet (SMLT)
Telecommunications	MTS (MTSS), Rostelecom (RTKMP)
Transport	Aeroflot (AFLT), Sovcomflot (FLOT)
Finance	Bank Saint-Petersburg (BSPB), SFI (SFIN)
Chemical Industry	Phosagro (PHOR), Kazanorgsintez (KZOSP)
Power Engineering	Rushydro (HYDR), Rosseti Center (MRKC)

Table 6. Number of news items per company after filtering

Company (ticker)	Number of news items
Mechel (MTLR)	4258
Trubnaya Metallurgical Company (TRMK)	11739
Surgutneftegaz (SNGS)	12674
Gazpromneft (SIBN)	11421
Magnit (MGNT)	1236
Lenta (LENT)	311
PIK (PIKK)	897
Samolet (SMLT)	3392
MTS (MTSS)	1101
Rostelecom (RTKMP)	628
Aeroflot (AFLT)	1429
Sovcomflot (FLOT)	14476
Bank Saint-Petersburg (BSPB)	14278
SFI (SFIN)	1647
PhosAgro (PHOR)	2773
Kazanorgsintez (KZOSP)	168
RusHydro (HYDR)	1921
MRSK Center (MRKC)	1576

Table 7. Descriptive characteristics for company shares

Ticker	Mean	Std	Min	Max	Q25	Q50	Q75
MTLR	191.8245	72.5652	81.2800	332.8800	123.8500	187.6700	251.6400
TRMK	153.1245	64.9362	55.8200	271.0000	87.1400	166.4200	218.7800
SNGS	27.0104	4.0119	17.3500	36.9600	23.7750	27.3300	30.0250
SIBN	601.5097	163.9205	335.5500	934.2500	452.0500	582.6500	748.9000
MGNT	5691.6429	1161.7684	4040.0000	8444.0000	4665.0000	5495.0000	6375.0000
LENT	814.3870	154.9502	650.0000	1263.0000	716.5000	749.0000	843.5000
PIKK	732.6617	94.8650	518.0000	955.5000	656.7000	732.9000	811.5000
SMLT	3120.8996	594.1018	1926.5000	4145.5000	2572.0000	3045.0000	3713.0000
MTSS	264.5382	32.0791	183.0000	346.9500	239.0000	266.2500	289.7500
RTKMP	68.1797	9.2753	52.2500	92.1000	60.4500	68.0000	74.7000
AFLT	38.1316	10.3131	22.4400	64.4000	27.9700	38.8800	44.1200
FLOT	88.0111	39.5834	29.9200	149.3000	42.1000	97.2000	124.1800
BSPB	211.1501	101.2533	67.5700	387.6800	100.8400	210.9900	295.3400
SFIN	762.9939	428.5679	425.8000	1975.0000	497.4000	518.0000	992.0000
PHOR	6774.6040	618.1977	4997.0000	8153.0000	6416.0000	6763.0000	7278.0000
KZOSP	25.8603	5.2029	15.3500	40.5700	21.9400	27.0700	29.8500
HYDR	0.7697	0.0810	0.5178	1.0278	0.7318	0.7721	0.8210
MRKC	0.5247	0.2382	0.2025	1.0745	0.2735	0.5550	0.7475

Figure 1. The correlations heatmap for 18 assets (close price)

Figure 2. Normalized close prices of assets (x-axis: date; y-axis: normalized stock prices, from 0.0 to 1.0). Market phase transition dates are denoted by vertical dashed lines


Figure 3. Pipeline for single- and dual-modality models: the input vector is fed to the model, which predicts the instrument return (e.g. 0.067); the return is then converted into a price (₽)

To evaluate prediction quality in the classification task, we used the Accuracy metric, while for regression, we employed MAPE (Mean Absolute Percentage Error). The choice of these metrics is justified by the nature of the tasks. In classification, the model must accurately predict the price movement direction: either an increase (denoted by «+») or a decrease (denoted by «-»). The MAPE metric is well suited for assessing regression quality within the financial domain: it represents the average deviation from the asset’s actual price in percentage terms, making it easily interpretable in monetary terms.

Figure 3 illustrates the model development process for utilizing one and two modalities.

As the input parameter, the model received a return vector of the asset, calculated based on the closing price (close) over the previous five trading sessions:

(1) $\mathrm{Return}(d+1) = \dfrac{\mathrm{close}(d+1)}{\mathrm{close}(d)} - 1.$

The model’s output was a prediction for the next trading session.

To assess the accuracy of predicting the price movement direction, the predicted class was determined by the sign (±) of the forecasted return value, as the return of an asset represents the relative rate of change. Thus, a positive return indicates a price increase, while a negative return signifies a decline. To evaluate the quality of the asset price forecast, the predicted return vector was converted into price (in Russian rubles):

(2) $\mathrm{price}(d+1) = \left(\mathrm{Return}(d+1) + 1\right) \cdot \mathrm{price}(d).$

2.2. The Dual-Modality Approach

The pointwise predicted price vector, obtained through transformation, was compared to the historical price vector of assets using the MAPE metric.
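The chain from returns to prices to quality metrics can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the closing prices, and the predicted returns are all hypothetical, and a zero return is counted as "not an increase".

```python
def returns(close):
    """Eq. (1): Return(d+1) = close(d+1) / close(d) - 1."""
    return [close[i + 1] / close[i] - 1.0 for i in range(len(close) - 1)]


def to_price(pred_return, prev_price):
    """Eq. (2): price(d+1) = (Return(d+1) + 1) * price(d)."""
    return (pred_return + 1.0) * prev_price


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def direction_accuracy(actual_returns, predicted_returns):
    """Share of days (in percent) where the predicted sign matches the actual one."""
    hits = sum((a > 0) == (p > 0) for a, p in zip(actual_returns, predicted_returns))
    return 100.0 * hits / len(actual_returns)


close = [100.0, 102.0, 101.0, 103.0]   # hypothetical closing prices
true_ret = returns(close)              # actual next-day returns
pred_ret = [0.01, -0.02, 0.03]         # hypothetical model outputs

# Convert each predicted return into a price using the previous day's close.
pred_price = [to_price(r, p) for r, p in zip(pred_ret, close[:-1])]

print("MAPE, %:", round(mape(close[1:], pred_price), 3))
print("Accuracy, %:", direction_accuracy(true_ret, pred_ret))
```

In this toy example every predicted sign is correct, so Accuracy is 100%, while MAPE stays near 1% because the predicted magnitudes differ from the actual ones. The two metrics measure different things, which is why the paper reports both.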

The choice of return (rather than price) as the target variable for the predictive model is justified by the fact that when prices exceed historical highs (or fall below historical lows) during market growth (or decline), the applicability of traditional methods becomes limited.

Based on this reasoning, candlestick characteristics (close, open, high, and low prices) were considered in the form of relative price changes, calculated using a formula similar to (1).

Next, a rolling window of five trading days was applied to the relative price changes to form a row vector, which was then fed into the predictive model. As a result, the model receives a vector of 20 parameters (four relative price changes for each of five days) as input and predicts a single output value: the return of the instrument at the end of the next trading session.
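A minimal sketch of how such 20-element inputs could be assembled is given below. The layout of the vector (day by day, with open, high, low, close inside each day), the helper names, and the synthetic candles are assumptions made for illustration; the paper does not specify the ordering.

```python
def relative_changes(series):
    """Relative change between consecutive sessions, as in Eq. (1)."""
    return [series[i + 1] / series[i] - 1.0 for i in range(len(series) - 1)]


def build_samples(candles, window=5):
    """Turn OHLC series into (input vector, target return) pairs.

    candles: dict with 'open', 'high', 'low', 'close' lists of equal length.
    Each input holds `window` days x 4 fields = 20 values for window=5;
    the target is the close-to-close return of the following session.
    """
    fields = ("open", "high", "low", "close")
    rel = {f: relative_changes(candles[f]) for f in fields}
    n = len(rel["close"])
    samples = []
    for t in range(window, n):
        x = [rel[f][i] for i in range(t - window, t) for f in fields]
        y = rel["close"][t]
        samples.append((x, y))
    return samples


# Synthetic candles for 8 sessions (hypothetical values).
base = [100.0 + i for i in range(8)]
candles = {
    "open": base,
    "high": [p + 2.0 for p in base],
    "low": [p - 2.0 for p in base],
    "close": [p + 1.0 for p in base],
}

samples = build_samples(candles)
print(len(samples), "samples, input length", len(samples[0][0]))
```

Eight sessions yield seven relative changes per field, so a five-day window leaves two training samples, each with 20 inputs and one target.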

For the experiment involving news flow, we selected news articles relevant to the analyzed assets based on keyword matching (Table 5). The keywords were chosen as the top 30 words extracted using the TF-IDF method. This method determines the importance of words in a text by considering their frequency of occurrence and uniqueness across the entire corpus. An example of keywords extracted using TF-IDF is presented in Table 8.
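The idea behind TF-IDF keyword selection can be shown with a small pure-Python sketch. The exact TF-IDF variant used in the study is not stated, so the unsmoothed form below (term frequency times log(N / document frequency)) is an assumption, as are the function name and the toy company descriptions.

```python
import math
from collections import Counter


def top_tfidf_keywords(docs, doc_index, k=30):
    """Return the k highest-scoring tokens of docs[doc_index] by TF-IDF.

    docs: list of token lists, one per company description.
    Ties are broken alphabetically to keep the output deterministic.
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))            # document frequency of each token

    tf = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in tf.items()
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Toy descriptions (hypothetical, not taken from the dataset).
docs = [
    "mining coal ore company steel coal".split(),
    "oil gas company drilling oil".split(),
    "mobile internet company provider".split(),
]

print(top_tfidf_keywords(docs, 0, k=3))
```

The word "company" occurs in every description, so its inverse document frequency is zero and it never ranks among the keywords, whereas "coal" ranks first because it is both frequent in its own description and absent from the others.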

After obtaining the list of keywords using the TF-IDF method, we further expanded it with the help of the ChatGPT-4o model. This allowed us to increase keyword variability through permutations, letter substitutions, and modifications of word endings (Table 9). The selected news articles for each company (ticker) were converted into vectors and filtered to remove duplicates.

Table 8. Keywords by companies extracted from their descriptions

Ticker	Keywords
MTLR	mechel, mining, ore, raw materials, energy, ferroalloys, coal
SNGS	gas, geological exploration, oil, Surgutneftegas, petroleum products, electricity, drilling
SMLT	rent, development, developer, real estate, construction, Moscow region, residential areas
MTSS	subscriber, automation, internet, mobile communications, provider, communications
BSPB	bank, deposit, dividends, financial services, kaliningrad, spbank, saint-petersburg

Table 9. Complementary generated keywords

Ticker	Keywords
MTLR	мечел, метчел, мечал, mechel, Mchel, ферросплавы, фурросплав
SNGS	сургутнефтегаз, surgutneftegaz, surgut, сурнефтегаз, сургаз, cургут, сур-нфтгз
SMLT	самолет, smlt, samolet, samalet, Самлет
RTKMP	ростелеком, телеком, rostelecom, telecom, rtkm, ртк, r-telecom, растелком
HYDR	русгидро, rushydro, rshydro, r-gidro, гидрорус, гидра, русгидра

Figure 4 presents a distribution chart of the news articles for the companies after filtration.

As a vectorizer for the Russian-language news stream, we employed two models: RuBERT [8] and Qwen [9].

While working with the news stream, we encountered two main challenges. The first challenge is the problem of news rewriting, which necessitates filtering out duplicate articles. To ensure that our model accounts for each news article only once, it is essential to implement a duplicate identification algorithm.

The second challenge is to determine which asset a news article affects. This problem can be framed as a classification task, where tickers serve as class labels.

Figure 4. The distribution of news articles by company after filtration (x-axis: company ticker; numbers on the diagram show the percentage of news about the company in the dataset)

To address the issue of news rewriting, we designed a Siamese neural network. We constructed a training dataset using the GigaChat API as follows: for each article, three paraphrased versions of both the title and body were generated. Then, pairs were randomly formed in equal proportion from the original and paraphrased news articles and their titles.
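One possible reading of this pairing step is sketched below. It assumes that "equal proportion" means one duplicate pair (label 1) for every non-duplicate pair (label 0); the paper does not spell this out. The paraphrases are placeholder strings rather than GigaChat output, and `build_pairs` is a hypothetical helper.

```python
import random


def build_pairs(articles, paraphrases, seed=0):
    """Build a balanced list of (text_a, text_b, label) training pairs.

    articles: list of original texts (at least two).
    paraphrases[i]: list of rewrites of articles[i].
    label 1 = the two texts describe the same news, label 0 = different news.
    """
    rng = random.Random(seed)
    pairs = []
    for i, text in enumerate(articles):
        for rewrite in paraphrases[i]:
            pairs.append((text, rewrite, 1))               # duplicate pair
            # Pick a different article, then one of its versions.
            j = rng.choice([k for k in range(len(articles)) if k != i])
            other = rng.choice([articles[j]] + paraphrases[j])
            pairs.append((text, other, 0))                 # non-duplicate pair
    rng.shuffle(pairs)
    return pairs


# Placeholder data: 3 articles, 3 paraphrases each (hypothetical).
articles = ["news A", "news B", "news C"]
paraphrases = [[f"{a} (rewrite {n})" for n in range(1, 4)] for a in articles]

pairs = build_pairs(articles, paraphrases)
positives = sum(label for _, _, label in pairs)
print(len(pairs), "pairs,", positives, "duplicates")
```

With three articles and three rewrites each, the sketch yields 18 pairs, 9 of each class, so a classifier trained on them cannot score well by always predicting one label.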

The Siamese neural network was designed as follows: a pair of news articles is fed as input, and vector representations of the articles are extracted using the RuBERT model [8]. The two vectors are then concatenated, and the resulting vector is passed through a fully connected neural network (MLP). To determine the optimal depth of the MLP model, we conducted a series of experiments, evaluating both prediction accuracy and news stream processing time. Based on the results, we selected the MLP architecture with three layers.

The filtered news articles are then converted into vectors so that duplicate classification can be performed in a one-shot mode when new articles arrive. This approach reduces both the processing time of the news stream and the computational resources required (in our case, a GPU V100).

To address the second challenge — matching news article samples by date and utilizing them for price forecasting — it is essential to formalize the data selection and prediction process. We assume that the closing price prediction for an asset is made for each trading day at the market opening. In this case, only news articles published before the start of the current trading day are included in the dataset.

The dataset is formed by grouping news articles based on their publication date. For predicting the price on a given day, only articles published on the previous trading day are used. For example, analytical articles such as those under the “Technical Analysis” section from the “BCS” source, which are published daily before the market opens, are included in the dataset for forecasting the prices of assets analyzed in those reports. This approach ensures that the most relevant information is considered, thereby improving prediction accuracy.
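The date-based selection rule can be sketched as follows. This is a simplified illustration: it works at day granularity only (publication times before the market open are ignored), and the function name, trading calendar, and news items are hypothetical.

```python
from bisect import bisect_left
from datetime import date


def news_for_prediction(trading_days, news, target_day):
    """Return texts published on the trading day that precedes target_day.

    trading_days: sorted list of datetime.date objects.
    news: list of (publication_date, text) tuples.
    """
    idx = bisect_left(trading_days, target_day)
    if idx == 0:
        return []                      # no earlier trading day is known
    prev_day = trading_days[idx - 1]
    return [text for published, text in news if published == prev_day]


trading_days = [date(2024, 3, 25), date(2024, 3, 26), date(2024, 3, 27)]
news = [
    (date(2024, 3, 26), "article a"),
    (date(2024, 3, 27), "article b"),
    (date(2024, 3, 26), "article c"),
]

print(news_for_prediction(trading_days, news, date(2024, 3, 27)))
```

Restricting the selection to dates strictly before the target day is what keeps information from the forecast day itself out of the model input.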

For the dual-modality approach, training sequences were formed by concatenating price return vectors from the previous five days with news stream vectors. The relative price return vectors were constructed similarly to the single-modality experiment, while news articles were selected from the previous trading day based on the chosen asset. These news articles were then transformed into vectors and aggregated.

If no publications were available on the previous day or before the market opened on the current day, a zero vector was concatenated with the relative price return vector; its length was 768 for the RuBERT model and 896 for the Vikhr-Qwen2.5-0.5b-Instruct (Qwen) model. Otherwise, the aggregated news vector of the same length was appended. These vector lengths correspond to the output sizes of the pretrained RuBERT and Qwen models.

In this study, we explored two approaches for aggregating news vectors: vector summation (Sum) and averaged summation (Mean). By vector summation, we mean summing the values of corresponding vector coordinates. In the averaged summation approach, each coordinate of the aggregated vector is assigned the arithmetic mean of the corresponding coordinates across all aggregated vectors.
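The two aggregation schemes and the zero-vector fallback can be written down in a few lines. The sketch below uses plain Python lists and tiny vectors for readability; the function names are hypothetical, and real embeddings would have 768 (RuBERT) or 896 (Qwen) coordinates.

```python
def aggregate(vectors, dim, method="mean"):
    """Combine the news vectors of one day into a single vector of length dim.

    Returns a zero vector when no news is available for that day.
    """
    if not vectors:
        return [0.0] * dim
    summed = [sum(column) for column in zip(*vectors)]   # coordinate-wise sum
    if method == "sum":
        return summed
    if method == "mean":
        return [s / len(vectors) for s in summed]
    raise ValueError(f"unknown aggregation method: {method}")


def build_input(price_features, news_vectors, dim, method="mean"):
    """Concatenate the price-return features with the aggregated news vector."""
    return list(price_features) + aggregate(news_vectors, dim, method)


day_news = [[1.0, 2.0], [3.0, 4.0]]          # two toy news vectors
print(aggregate(day_news, 2, "sum"))         # coordinate-wise sum
print(aggregate(day_news, 2, "mean"))        # coordinate-wise mean

# A day without news: 20 price features plus a 768-long zero vector.
x = build_input([0.0] * 20, [], dim=768)
print(len(x))
```

One practical difference between the two schemes is that the norm of a Sum vector grows with the number of articles published that day, while a Mean vector stays on the scale of a single embedding.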

The baseline RuBERT model has a limited context window of 512 tokens. As a result, articles exceeding this limit were either truncated or split for separate processing, meaning that a single news article could correspond to multiple vectors. In contrast, the Qwen model has a significantly larger context window of 32,768 tokens (64 times larger), allowing it to process entire articles without truncation. Next, we compare how different news vectorization methods impact the accuracy of price predictions.
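The difference between truncating and splitting a long article can be made concrete with a short sketch. It operates on a plain list of token ids and ignores special tokens such as [CLS] and [SEP], which a real tokenizer would add; the constants mirror the context sizes quoted above, and the 1,300-token article is hypothetical.

```python
RUBERT_MAX_TOKENS = 512
QWEN_MAX_TOKENS = 32_768


def truncate(token_ids, max_len=RUBERT_MAX_TOKENS):
    """Keep only the first max_len tokens; the rest of the article is lost."""
    return token_ids[:max_len]


def split_into_chunks(token_ids, max_len=RUBERT_MAX_TOKENS):
    """Split a long article into consecutive chunks of at most max_len tokens."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


token_ids = list(range(1300))                 # a hypothetical 1,300-token article
chunks = split_into_chunks(token_ids)

print(len(truncate(token_ids)))               # tokens kept after truncation
print([len(c) for c in chunks])               # chunk sizes after splitting
print(QWEN_MAX_TOKENS // RUBERT_MAX_TOKENS)   # ratio of the two context windows
```

After splitting, each chunk is embedded separately, so one article contributes several vectors to the daily aggregation, whereas a model with a 32,768-token window produces a single vector for the same article.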

The pointwise predicted return vectors were converted into asset prices using equation (2). The prediction quality was evaluated using two metrics: Accuracy and Mean Absolute Percentage Error (MAPE). Accuracy was measured as the proportion of correctly predicted signs (positive or negative) of the return vector elements. The MAPE metric indicates the average percentage deviation of the predicted price from the actual value. This allows us to assess the prediction quality not only in relative terms but also in absolute monetary units (rubles).

3. Computational experiment

In this section, we present the results of computational experiments for two predictive models (single- and dual-modalities). The predictive model was developed using the Transformers framework from the Hugging Face platform. All computations were performed on an NVIDIA V100 GPU.

3.1. The Single-Modality Approach Performance

The results of the experiment on predicting return vectors using only time series data for classical and deep learning models are presented in Table 10.

Table 11 provides the averaged prediction quality metrics for all models, sorted in ascending order of the mean absolute percentage error (column “MAPE, %”).

From the experiment results, it is evident that the recurrent model LSTM achieves the best classification performance (predicting upward or downward trends) and regression accuracy (smallest deviation of the predicted price from the actual price). However, it lags slightly in terms of the mean absolute error metric.

Table 10. Results of forecasting return vectors using only time series: accuracy (Acc., %) and price deviation (MAPE, %) for each model

Sector	Ticker	LSTM Acc.	LSTM MAPE	XGB Acc.	XGB MAPE	KNN Acc.	KNN MAPE	RF Acc.	RF MAPE	LinReg Acc.	LinReg MAPE	DT Acc.	DT MAPE
Metals and Mining	MTLR	56.364	0.410	40.000	2.089	42.273	2.050	50.909	2.020	50.000	2.029	42.727	2.679
Metals and Mining	TRMK	56.364	0.362	40.909	2.105	38.182	2.167	47.273	2.154	49.091	2.114	52.727	2.308
Oil and Gas	SNGS	50.303	0.352	49.091	1.776	48.182	1.775	50.000	1.735	60.909	1.744	52.727	1.857
Oil and Gas	SIBN	58.182	0.341	40.000	1.766	58.182	1.746	46.364	1.788	41.818	1.839	51.818	1.813
Consumer Sector	MGNT	46.667	0.331	39.091	1.517	43.636	1.493	49.091	1.519	40.000	1.709	60.000	1.672
Consumer Sector	LENT	56.364	0.371	54.546	2.202	39.091	2.178	52.723	2.145	51.818	2.220	51.818	2.589
Construction	PIKK	49.091	0.484	40.909	1.565	50.909	1.563	50.000	1.558	44.545	1.637	51.818	1.592
Construction	SMLT	53.939	0.328	42.727	1.577	38.182	1.552	46.364	1.539	49.091	1.536	41.818	1.683
Telecommunications	MTSS	56.970	0.541	42.727	1.290	40.000	1.306	45.455	1.520	53.636	1.419	50.000	1.395
Telecommunications	RTKMP	55.152	0.246	45.455	1.299	42.723	1.303	42.727	1.335	50.909	1.355	48.182	1.411
Transport	AFLT	55.152	0.419	46.364	2.079	57.273	2.017	52.727	2.062	60.909	1.976	51.818	2.194
Transport	FLOT	47.273	0.258	43.637	2.116	38.182	2.124	42.727	2.104	45.454	2.074	49.091	2.294
Finance	BSPB	46.061	0.410	49.091	1.612	50.909	1.695	50.909	1.598	54.545	1.602	45.455	1.829
Finance	SFIN	49.697	0.447	40.000	1.603	30.909	1.647	39.091	1.743	48.182	1.960	41.818	1.959
Chemical Industry	PHOR	41.818	0.231	42.727	1.194	52.723	1.149	48.182	1.168	50.000	1.227	45.455	1.218
Chemical Industry	KZOSP	57.576	0.458	49.091	1.198	42.723	1.237	49.091	1.210	46.364	1.217	54.545	1.581
Power Engineering	HYDR	59.394	0.380	51.182	1.124	60.000	1.130	48.182	1.214	45.455	1.151	49.091	1.355
Power Engineering	MRKC	40.000	0.768	51.182	1.182	49.091	1.225	54.545	1.214	50.000	1.224	55.455	1.403


Table 11. The Single-Modality approach forecast (time-series) inference metrics: Accuracy and MAPE in percentage

Model	Accuracy, %	MAPE, %
LSTM	52.020	0.397
XGB	45.000	1.627
KNN	46.010	1.631
RF	48.384	1.646
LinReg	50.152	1.669
DT	49.798	1.824

3.2. The Dual-Modality Approach Performance

The results of the second experiment, which involved merging the news stream with numerical time series data and comparing the proposed multimodal approach with a forecast based solely on candlestick time series, are presented in Table 12.

Table 13 provides the averaged prediction quality metrics for the considered models. The data in this table are sorted in ascending order of the mean absolute percentage error of the predicted price (column “MAPE, %”).

In this second experiment, the LSTM neural network was chosen as the baseline model. We compared different vectorization methods (RuBERT, Qwen) and aggregation techniques (Sum, Mean) to evaluate their impact on prediction performance.

Figure 5 shows the dependence of the mean squared error (MSE Loss) function values on the number of training iterations for different models, based on the training set (from July 7, 2022, to March 27, 2024) and the test set (from March 28 to August 30, 2024). The graph indicates that after 30 training epochs, the curves reach a stationary value.

Table 12. The Dual-Modality returns vector forecasting metrics: accuracy (Acc., %) and MAPE (%) for each model

Sector	Ticker	vanilla LSTM Acc.	vanilla LSTM MAPE	LSTM-RuBert-Sum Acc.	LSTM-RuBert-Sum MAPE	LSTM-RuBert-Mean Acc.	LSTM-RuBert-Mean MAPE	LSTM-Qwen-Sum Acc.	LSTM-Qwen-Sum MAPE	LSTM-Qwen-Mean Acc.	LSTM-Qwen-Mean MAPE
Metals and Mining	MTLR	56.364	0.410	39.394	0.409	38.788	0.410	45.455	0.522	52.121	0.246
Metals and Mining	TRMK	56.364	0.362	35.152	0.392	42.424	0.192	36.364	0.504	35.758	0.419
Oil and Gas	SNGS	50.303	0.352	53.939	0.865	58.182	1.824	44.848	0.307	49.697	0.106
Oil and Gas	SIBN	58.182	0.341	58.182	0.265	58.182	0.216	39.394	0.368	47.879	0.165
Consumer Sector	MGNT	46.667	0.331	53.333	0.417	47.879	0.299	46.061	0.307	48.485	0.235
Consumer Sector	LENT	56.364	0.371	49.091	0.400	50.909	0.359	53.333	0.346	52.121	0.331
Construction	PIKK	49.091	0.484	50.303	0.462	57.576	0.436	47.273	0.529	53.333	0.322
Construction	SMLT	53.939	0.328	38.788	0.200	46.061	0.270	36.364	0.311	43.030	0.241
Telecommunications	MTSS	56.970	0.541	53.939	0.473	55.152	0.368	47.879	0.316	45.455	0.193
Telecommunications	RTKMP	55.152	0.246	49.697	0.274	45.455	0.271	44.848	0.171	44.242	0.178
Transport	AFLT	55.152	0.419	51.515	0.641	50.303	0.348	45.455	0.259	52.121	0.182
Transport	FLOT	47.273	0.258	43.636	0.532	52.121	0.262	43.636	0.392	43.636	0.345
Finance	BSPB	46.061	0.410	47.879	0.406	50.909	0.326	47.879	0.369	52.121	0.227
Finance	SFIN	49.697	0.447	44.848	0.445	47.273	0.390	56.970	0.195	56.970	0.272
Chemical Industry	PHOR	41.818	0.231	53.333	0.264	55.152	0.238	60.000	0.354	44.848	0.219
Chemical Industry	KZOSP	57.576	0.458	42.424	0.492	41.212	0.491	48.485	0.369	49.697	0.352
Power Engineering	HYDR	59.394	0.380	58.788	0.326	55.758	0.321	47.879	0.292	61.212	0.178
Power Engineering	MRKC	40.000	0.768	42.424	0.742	43.030	0.839	42.424	0.660	41.818	0.543

Table 13. The Dual-Modality Approach forecast: Accuracy, MAPE

Model	Accuracy, %	MAPE, %
LSTM-Qwen-Mean	48.552	0.256
LSTM-Qwen-Sum	46.970	0.367
LSTM	52.020	0.397
LSTM-RuBert-Mean	49.798	0.437
LSTM-RuBert-Sum	48.148	0.445

The results in Tables 12 and 13 imply that the forecast based on the news stream vectorized with a large language model (Qwen) outperforms the forecast built solely on candlestick data in terms of MAPE, demonstrating the smallest deviation of the pointwise price prediction from the actual price vector. In directional Accuracy, however, the plain LSTM remains the highest on average (52.020%). Additionally, averaging the vectors (Mean) gives better results than summation (Sum) for both vectorization methods.
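The averaged MAPE values in Table 13 can be compared in two ways, and it helps to keep them apart. The sketch below (helper names are illustrative) computes both: relative to the LSTM baseline, the error of LSTM-Qwen-Mean is lower by about 35.5%, while relative to LSTM-Qwen-Mean the baseline error is about 55% larger.

```python
def relative_reduction(baseline, improved):
    """Reduction of `improved` relative to `baseline`, in percent of `baseline`."""
    return 100.0 * (baseline - improved) / baseline


def relative_excess(baseline, improved):
    """How much larger `baseline` is than `improved`, in percent of `improved`."""
    return 100.0 * (baseline - improved) / improved


mape_lstm = 0.397        # LSTM, Table 13
mape_qwen_mean = 0.256   # LSTM-Qwen-Mean, Table 13

reduction = relative_reduction(mape_lstm, mape_qwen_mean)  # about 35.5
excess = relative_excess(mape_lstm, mape_qwen_mean)        # about 55.1
```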

[Figure 5 plot: MSE Loss change during training. Vertical axis: MSE loss on a logarithmic scale (ticks at 10^1, 10^0, 10^-1); horizontal axis: number of training iterations (10 to 70). Curves: train and test for LSTM, LSTM-RuBert-Sum, LSTM-RuBert-Mean, LSTM-Qwen-Sum, and LSTM-Qwen-Mean.]

Figure 5. Dependence of the mean squared error function values on the number of training iterations for different models. Training and test sets

The dataset (176 stocks of Russian companies traded on the Moscow Exchange and 79,555 Russian-language financial news articles) collected for the study is available at [11].

Conclusion

As a result of the conducted experiments, we demonstrated that adding a textual modality (analysis of the news stream) improves the quality of price prediction as measured by MAPE. On average, the MAPE metric (the deviation of the predicted price from the actual price) decreases from 0.397 (LSTM model) to 0.256 (LSTM-Qwen-Mean model), a relative reduction of about 36%; equivalently, the baseline error is about 55% higher than that of the multimodal model. Additionally, predictions based on vectors obtained using the large language model Vikhr-Qwen2.5-0.5b-Instruct outperformed those based on RuBert. This can be partly attributed to the fact that the Qwen model has a significantly larger context window and is trained on a larger text corpus with support for Chain-of-Thought (CoT) reasoning. This enhances the model's ability to reason and capture complex semantic dependencies within the text. The experimental results indicate that the averaging method (Mean) performed better than summation (Sum) and is the preferred method for aggregating news stream vectors.

At the same time, it is important to note that the test data, on which the final metric values were calculated, covers the period from March 28 to August 30, 2024. During this period, the Russian securities market exhibited a general downward trend. The presence of a clear trend is a significant factor that simplifies the prediction task. However, even in this setting, the proposed multimodal approach proved to be the best among those considered.

The training and validation of the model for the rewriting task were conducted on news articles whose length did not exceed the context window of the RuBert model. As a result, artifacts related to the context window size only became apparent during the forecasting phase when the news dataset included articles averaging around 290 words in length. For future improvements in news filtering and classification by company, it is necessary to utilize models with a larger context window, such as Qwen.

The collected dataset [11] is well structured and can be used for fine-tuning large language models that are trained in Russian or adapted to the Russian language for applications in the financial sector.

Table 14. Forecasting metrics of the multimodal approach compared with the approach based on the news sentiment score (Baseline) proposed in [7]

Model	Ticker	R2	MAPE, %	MAE
LSTM-Qwen-Mean	AAPL	0.989	0.628	0.003
Baseline	AAPL	0.947	2.333	0.018
LSTM-Qwen-Mean	AMZN	0.968	1.601	0.013
Baseline	AMZN	0.870	1.730	0.015
LSTM-Qwen-Mean	GOOGL	0.935	1.394	0.008
Baseline	GOOGL	0.788	2.286	0.020
LSTM-Qwen-Mean	NFLX	0.955	2.361	0.076
Baseline	NFLX	0.919	2.512	0.019
LSTM-Qwen-Mean	TSLA	0.915	3.206	0.006
Baseline	TSLA	0.930	7.423	0.034

For a quantitative comparison of the proposed model, we conducted a computational experiment based on the approach and metrics from the study [7]. Following the methodology of [7], we used time series data of stock prices from five major American companies (AAPL, AMZN, GOOGL, NFLX, and TSLA), along with a dataset of English-language news articles labeled by company for the period from October 12, 2012 to January 31, 2020 (Table 14).

It is worth noting that the dataset used includes text data in English; therefore, we utilized the original Qwen2.5-0.5b-Instruct model [10] for news vectorization. To generate forecasts, we selected and trained the LSTM-Qwen-Mean model, as it demonstrated the best overall performance in our study. For evaluation, we used the coefficient of determination (R2), mean absolute error (MAE), and mean absolute percentage error (MAPE).
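For reference, the three evaluation metrics can be computed as in the following minimal sketch. It uses the standard textbook definitions; whether [7] applies them to raw or normalized prices is not stated here, so the toy inputs are purely illustrative and the function name is hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, MAE and MAPE (in %) for a pair of equally long series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = float(np.sum(residual ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))    # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": float(np.mean(np.abs(residual))),
        "MAPE": 100.0 * float(np.mean(np.abs(residual / y_true))),
    }


# Toy series (assumption: not taken from the datasets of the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
metrics = regression_metrics(y_true, y_pred)
```

R2 and MAPE are scale-free, whereas MAE depends on the units of the target, so MAE values are comparable only between models evaluated on the same series.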

Thus, we worked with the same time series and evaluation metrics. Across all metrics, except for MAE on NFLX and R2 on TSLA, the proposed multimodal approach with vector averaging outperformed the best results of the approach in [7]. Based on our computational experiments, we conclude that the proposed multimodal approach demonstrated superior forecasting quality and greater adaptability to both Russian and international markets.

In the future, it is necessary to explore how to incorporate the incoming news stream into the predictive model, specifically the optimal time window for using news data and the best approach for weighting news messages (e.g., adjusting the weight of a news article based on its chronological position in the dataset).
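One possible form of such weighting, given purely as a hypothetical illustration and not as a method evaluated in this study, is a weighted mean in which articles outside a time window are dropped and newer articles inside it receive exponentially larger weights. The window length, the half-life, and the function name are all assumptions of the sketch.

```python
import numpy as np


def weighted_news_vector(embeddings, ages_hours, window_hours=24.0, half_life_hours=6.0):
    """Weighted mean of news vectors; newer articles inside the window weigh more."""
    embeddings = np.asarray(embeddings, dtype=float)
    ages = np.asarray(ages_hours, dtype=float)
    in_window = ages <= window_hours
    if not in_window.any():
        # No usable news: fall back to a zero vector (assumption of this sketch).
        return np.zeros(embeddings.shape[1])
    # The weight halves every `half_life_hours` of article age.
    weights = 0.5 ** (ages[in_window] / half_life_hours)
    return np.average(embeddings[in_window], axis=0, weights=weights)


# Toy input: ages are hours before the forecast time; the last article is outside the window.
news = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
ages = [0.0, 6.0, 48.0]
vec = weighted_news_vector(news, ages)
```

Setting a very large half-life makes all weights nearly equal, which recovers the plain Mean aggregation over the articles inside the window.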