Multimodal stock price prediction: a case study of the Russian securities market


Classical asset price forecasting methods primarily rely on numerical data, such as price time series, trading volumes, limit order book data, and technical analysis indicators. However, the news flow plays a significant role in price formation, making the development of multimodal approaches that combine textual and numerical data for improved prediction accuracy highly relevant. This paper addresses the problem of forecasting financial asset prices using a multimodal approach that combines candlestick time series and textual news flow data. A unique dataset was collected for the study, which includes time series for 176 Russian stocks traded on the Moscow Exchange and 79,555 financial news articles in Russian. For processing textual data, the pre-trained models RuBERT and Vikhr-Qwen2.5-0.5b-Instruct (a large language model) were used, while time series and vectorized text data were processed using an LSTM recurrent neural network. The experiments compared models based on a single modality (time series only) and two modalities, as well as various methods for aggregating text vector representations. Prediction quality was estimated using two key metrics: Accuracy (direction of price movement prediction: up or down) and Mean Absolute Percentage Error (MAPE), which measures the deviation of the predicted price from the true price. The experiments showed that incorporating the textual modality reduced the MAPE value by 55%. The resulting multimodal dataset holds value for the further adaptation of language models in the financial sector. Future research directions include optimizing textual modality parameters, such as the time window, sentiment, and chronological order of news messages.


Multimodal forecasting, quantitative finance, machine learning

Short URL: https://sciup.org/143184154

IDR: 143184154   |   DOI: 10.25209/2079-3316-2025-16-1-83-130


Building a price forecast for an asset is a crucial task for financial market participants, as it enables strategic planning, optimal investment portfolio management, and risk assessment. Numerous attempts have been made to apply machine learning methods to construct such forecasts [1 -3] .

With the growing popularity of deep learning models, researchers have shifted their focus toward the application of neural networks. At the same time, the problem of accurately accounting for the news flow as a key factor influencing market behavior is being reconsidered with the rapid development of generative artificial intelligence models and large language models (LLMs) such as ChatGPT, FinGPT, GigaChat, LLama, and others. In financial economics, LLMs are still rarely used, and their full potential remains untapped.

Researchers are exploring the use of natural language processing models to enhance the accuracy of asset price forecasts and investment portfolio management strategies.

The study [4] describes the use of sentiment analysis of news as an additional parameter. The authors employed the FinBert model, trained on financial data, to assess the sentiment of news articles as positive, negative, or neutral. The study utilized time series data from candlestick charts of the U. S. stock market index, Standard & Poor’s 500 (S&P 500). A machine learning model — random forest —was used for price prediction. The study concluded that incorporating sentiment analysis of news flow improves prediction accuracy.

In the study [5] , the authors aimed to develop a multimodal artificial intelligence model capable of providing well-founded and accurate forecasts for time series data. They implemented a model that generates predictions of an asset’s monthly or weekly returns, accompanied by a textual explanation from a language model based on the user’s input query.

The study [6] proposed an approach for fine-tuning instructions to interpret numerical values and contextualize financial data.

Kulikova et al. [7] examined the effect of classifying news into thematic groups. The authors demonstrated that, in most cases, it is advisable to use a single thematic group of news for the deep learning models considered (Temporal Convolutional Network, D-Linear, Transformer, and Temporal Fusion Transformer). They also determined the probabilities of forecast improvement for the 20 thematic groups analyzed.

In all the aforementioned studies, the models were implemented using a multimodal approach for the U. S. stock market, with English as the modality language. Notably, the news flow was not integrated directly into the predictor’s input vector but rather through a preprocessing block in the form of an additional parameter, such as sentiment analysis, news frequency related to the asset, or news classification, etc.

The objective of the current study is to demonstrate the advantages of a new multimodal method over predictions based solely on numerical data and to present a Russian-language financial news dataset.

To achieve this objective, we formulated the following key tasks:

  • (1)    Construct a multimodal dataset consisting of time series data and news articles.

  • (2)    Develop a predictive model capable of utilizing one or two modalities.

  • (3)    Train the predictive model and analyze the values of accuracy functions and metrics, specifically Accuracy and MAPE.

In this study, we propose a new multimodal approach for integrating news flow into time series numerical data. The text of the news articles is converted into a vector representation and fed into the model alongside the time series vector.

Our hypothesis is that the multimodal approach will enable predictive models to extract semantic information from the text, thereby improving the accuracy of asset price forecasts.

1.    Data Collection and Structuring

Multimodality implies the use of more than one data modality, which affects both the data structure and the logic of predictive model development. We utilize two types of modalities:

  • numerical — time series of stock prices,
  • textual — news streams.

To train the predictive model and analyze its performance, we collected an original dataset.

The time series, represented as candlestick data with open, close, high, and low prices, were obtained through the Algopack API of the Moscow Exchange (MOEX). For the numerical experiment, we selected stock time series data spanning from July 7, 2022, to August 30, 2024, covering 176 companies. During this period, the Russian stock market experienced phases of rapid growth and decline, with the IMOEX index rising from 2,213.81 to 2,650.32 points (+19.72%).

Table 1. Statistical features of the dataset after tokenization, RuBERT

| Source | Mean | Std | Min | Max | Q25 | Q50 | Q75 |
|---|---|---|---|---|---|---|---|
| RDV | 134 | 88 | 8 | 512 | 65 | 123 | 187 |
| Finam | 221 | 135 | 18 | 512 | 116 | 178 | 284 |
| BCS Express | 20 | 10 | 4 | 82 | 13 | 17 | 26 |
| BCS Technical Analysis | 502 | 37 | 29 | 512 | 512 | 512 | 512 |
| RBC | 43 | 7 | 16 | 75 | 39 | 44 | 48 |
| SmartLab | 21 | 8 | 5 | 82 | 15 | 19 | 25 |

We collected 79,555 news articles from various sources, including the online publication “RBC” (1,823 articles), “BCS Express” (11,331) and “BCS Technical Analysis” (9,670), the investment company website “Finam” (20,647), the trader community website «SmartLab.ru» (30,857), and the Telegram channel “RDV” (5,227).

Several factors justify the selection of these sources. First, they provide news coverage for the required time period. Second, the institutional differences between sources, along with variations in writing style and levels of expertise, contribute to a more objective representation of events related to the analyzed time series.

News messages were tokenized using two models: RuBERT [8] and Vikhr-Qwen2.5-0.5b-Instruct [9] (further as Qwen). In the context of tokenized text, a word refers to a token — an element of the vector space represented as an index in the tokenizer’s vocabulary.

Descriptive statistics of the dataset (in tokens), including mean, standard deviation, minimum, maximum word count, and quartiles, are presented in Tables 1 and 2 . It is important to note that tokenization can increase the word count in a text, for example, by splitting words into smaller components.

Table 3 provides examples of how a phrase changes after tokenization. For instance, the word «открывает» is split into three subcomponents: «от», «##к», and «##рывает», where the “##” prefix indicates that the token is a continuation of the previous token.
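For illustration, the snippet below sketches how a news phrase could be tokenized with the two models. The Hugging Face checkpoint identifiers are assumptions made for the example, since the paper cites RuBERT [8] and Vikhr-Qwen2.5-0.5b-Instruct [9] without naming exact checkpoints.

```python
# Sketch: tokenizing one news phrase with the RuBERT and Qwen tokenizers.
# The checkpoint names below are assumptions, not taken from the paper.
from transformers import AutoTokenizer

rubert_tok = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
qwen_tok = AutoTokenizer.from_pretrained("Vikhrmodels/Vikhr-Qwen-2.5-0.5B-Instruct")

text = "Доллар снова ниже 69 рублей"
print(rubert_tok.tokenize(text))            # WordPiece pieces; '##' marks a continuation token
print(len(rubert_tok(text)["input_ids"]))   # token counts of this kind underlie Tables 1 and 2
print(len(qwen_tok(text)["input_ids"]))
```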

Table 2. Statistical features of the dataset after tokenization, Qwen

| Source | Mean | Std | Min | Max | Q25 | Q50 | Q75 |
|---|---|---|---|---|---|---|---|
| RDV | 215 | 157 | 3 | 1324 | 92 | 187 | 304 |
| Finam | 453 | 405 | 35 | 5732 | 211 | 319 | 501 |
| BCS Express | 36 | 19 | 5 | 163 | 23 | 32 | 47 |
| BCS Technical Analysis | 1493 | 310 | 40 | 2221 | 1448 | 1545 | 1665 |
| RBC | 75 | 12 | 28 | 105 | 68 | 77 | 83 |
| SmartLab | 33 | 12 | 7 | 120 | 25 | 31 | 39 |

Table 3. Original and tokenized texts examples

| Original text | Tokenized text |
|---|---|
| Доллар снова ниже 69 рублей | До ##лла ##р снова ниже 69 рублей |
| Москвич банкрот? | Москви ##ч банк ##рот ? |
| НПО Наука Отчет РСБУ | Н ##П ##О Наука От ##чет Р ##С ##Б ##У |
| T-банк это желтый банк | T - банк это же ##лт ##ый банк |

News articles characteristics

On the “BCS Technical Analysis” platform, news articles tend to be lengthy, which imposes limitations on tokenizers. Specifically, as shown in Table 1 and Table 2, the RuBERT model truncates the tokenized vector for longer texts. Additionally, the average length of tokenized text using the Qwen model exceeds that of RuBERT, indicating that Qwen has a broader vocabulary and a stronger text decomposition capability.

Furthermore, we collected data on 176 companies, forming a dataset consisting of tuples in the format:

(ticker, company name, company activity description).

Such data are essential in our case for:

  • (a)    extracting keywords from company descriptions,

  • (b)    improving the language model’s ability to link events described in news articles to specific companies and assess the impact of news on price dynamics.


Table 4. Examples of news articles (header snippet) and assigned tags

| Source | Article fragment (heading) | Tags |
|---|---|---|
| RDV | Сегежа (SGZH): таргет 16.2 руб., апсайд +102... | SGZH |
| RDV | Артген биотех (ABIO) завершил доклинические... | аналитика, ABIO |
| Finam | Индекс МосБиржи восстанавливает позиции и приб... | ФосАгро, ВСМПО-АВСМ, CNYRUB |
| Finam | «Ашинский метзавод» назвал АО "Урал-ВК" своим ... | АшинскийМЗ |
| BCS Express | «Восходящее окно»: в каких бумагах замечен это... | Селигдар SELG, ЕвроТранс EUTR |
| BCS Express | «Сила Сибири» выйдет на максимальную мощность... | Газпром GAZP |
| BCS Technical Analysis | Мечел. Что ждать от бумаг на следующей неделе | Мечел |
| BCS Technical Analysis | На предыдущей торговой сессии акции Норникеля ... | ГМК Норникель |

The dataset of news articles includes the following parameters: publication date, source, title, article body, and tags (keywords). For sources such as “RDV” and “SmartLab”, article titles are absent, and the corresponding fields are filled with the label “no title”.

In our case, tags may include the full or abbreviated company name along with the corresponding ticker, the name of the market sector, and similar information. Tags in news articles were assigned by the article authors.

For the “RDV” source, tags were marked by authors in the form of hashtags (e.g. #цифры, #аналитика). In “BCS Express” and “BCS Technical Analysis”, tags were specified in dedicated fields at the beginning or end of the news article (e.g. PhosAgro, Russian market) and were extracted from the HTML code of the page using the corresponding HTML tags. When tags were absent (“RBC”, “SmartLab”), the parameter in the dataset remained empty.

Table 4 provides examples of news articles (headline fragments) along with their assigned tags.

2.    Methods

To validate our hypothesis regarding the advantages of the multimodal approach, we have planned a series of experiments.

The first series of experiments focused on predicting prices using only numerical time series of candlestick characteristics (close, open, high, and low prices). The quality metrics obtained from this experiment serve as baseline values against which improvements in price prediction accuracy using the proposed multimodal approach will be evaluated.

The second series of experiments aims to generate predictions and compute accuracy metrics (Accuracy, MAPE) using the multimodal approach while exploring different aggregation methods (Sum, Mean) for the vectorized news stream.

2.1.    The Single-Modality Approach

We first conducted a series of experiments on asset price prediction using only time series data. For this, we applied classical machine learning models to the daily price values (close, open, high, low), including linear regression (LinReg), k-nearest neighbors (KNN), decision tree (DT), random forest (RF), and the boosting algorithm XGBoost (XGB). Among deep learning models, we utilized a long short-term memory recurrent neural network (LSTM).

Conceptually, the experiment consists of two tasks:

  • (a)    predicting the price movement direction (increase or decrease), which is a binary classification task;

  • (b)    predicting the actual price, which is a regression task.

At this stage of the experiment, 176 companies were grouped into 23 industry sectors. We randomly selected 9 economic sectors and, within each sector, randomly chose two companies. Table 5 lists the selected sectors and companies (tickers) that participated in the computational experiment.

Table 7 provides statistical data on the closing price time series of the selected assets. Table 6 shows the distribution of news by companies after filtering. The correlation heat map of the closing price time series is shown in Figure 1 . An interesting feature of the examined period is that the market underwent two phase shifts — from a general price decline to growth and back again — as indicated by the vertical lines in Figure 2 .

Table 5. Economic sectors and companies (tickers) included into the dataset

| Sector | Company (ticker) |
|---|---|
| Metal and Mining | Mechel (MTLR), TMK-Group (TRMK) |
| Oil and Gas | Surgutneftegas (SNGS), Gazpromneft (SIBN) |
| Consumer sector | Magnit (MGNT), Lenta (LENT) |
| Construction | PIK (PIKK), Samolet (SMLT) |
| Telecommunications | MTS (MTSS), Rostelecom (RTKMP) |
| Transport | AEROFLOT (AFLT), Sovcomflot (FLOT) |
| Finance | Bank Saint-Petersburg (BSPB), SFI (SFIN) |
| Chemical Industry | Phosagro (PHOR), Kazanorgsintez (KZOSP) |
| Power Engineering | Rushydro (HYDR), Rosseti Center (MRKC) |

Table 6. Number of news items per company after filtering

| Company (ticker) | Number of news items |
|---|---|
| Mechel (MTLR) | 4258 |
| Trubnaya Metallurgical Company (TRMK) | 11739 |
| Surgutneftegaz (SNGS) | 12674 |
| Gazpromneft (SIBN) | 11421 |
| Magnit (MGNT) | 1236 |
| Lenta (LENT) | 311 |
| PIK (PIKK) | 897 |
| Samolet (SMLT) | 3392 |
| MTS (MTSS) | 1101 |
| Rostelecom (RTKMP) | 628 |
| Aeroflot (AFLT) | 1429 |
| Sovcomflot (FLOT) | 14476 |
| Bank Saint-Petersburg (BSPB) | 14278 |
| SFI (SFIN) | 1647 |
| PhosAgro (PHOR) | 2773 |
| Kazanorgsintez (KZOSP) | 168 |
| RusHydro (HYDR) | 1921 |
| MRSK Center (MRKC) | 1576 |

Table 7. Descriptive statistics of the closing price time series (RUB)

| Ticker | Mean | Std | Min | Max | Q25 | Q50 | Q75 |
|---|---|---|---|---|---|---|---|
| MTLR | 191.8245 | 72.5652 | 81.2800 | 332.8800 | 123.8500 | 187.6700 | 251.6400 |
| TRMK | 153.1245 | 64.9362 | 55.8200 | 271.0000 | 87.1400 | 166.4200 | 218.7800 |
| SNGS | 27.0104 | 4.0119 | 17.3500 | 36.9600 | 23.7750 | 27.3300 | 30.0250 |
| SIBN | 601.5097 | 163.9205 | 335.5500 | 934.2500 | 452.0500 | 582.6500 | 748.9000 |
| MGNT | 5691.6429 | 1161.7684 | 4040.0000 | 8444.0000 | 4665.0000 | 5495.0000 | 6375.0000 |
| LENT | 814.3870 | 154.9502 | 650.0000 | 1263.0000 | 716.5000 | 749.0000 | 843.5000 |
| PIKK | 732.6617 | 94.8650 | 518.0000 | 955.5000 | 656.7000 | 732.9000 | 811.5000 |
| SMLT | 3120.8996 | 594.1018 | 1926.5000 | 4145.5000 | 2572.0000 | 3045.0000 | 3713.0000 |
| MTSS | 264.5382 | 32.0791 | 183.0000 | 346.9500 | 239.0000 | 266.2500 | 289.7500 |
| RTKMP | 68.1797 | 9.2753 | 52.2500 | 92.1000 | 60.4500 | 68.0000 | 74.7000 |
| AFLT | 38.1316 | 10.3131 | 22.4400 | 64.4000 | 27.9700 | 38.8800 | 44.1200 |
| FLOT | 88.0111 | 39.5834 | 29.9200 | 149.3000 | 42.1000 | 97.2000 | 124.1800 |
| BSPB | 211.1501 | 101.2533 | 67.5700 | 387.6800 | 100.8400 | 210.9900 | 295.3400 |
| SFIN | 762.9939 | 428.5679 | 425.8000 | 1975.0000 | 497.4000 | 518.0000 | 992.0000 |
| PHOR | 6774.6040 | 618.1977 | 4997.0000 | 8153.0000 | 6416.0000 | 6763.0000 | 7278.0000 |
| KZOSP | 25.8603 | 5.2029 | 15.3500 | 40.5700 | 21.9400 | 27.0700 | 29.8500 |
| HYDR | 0.7697 | 0.0810 | 0.5178 | 1.0278 | 0.7318 | 0.7721 | 0.8210 |
| MRKC | 0.5247 | 0.2382 | 0.2025 | 1.0745 | 0.2735 | 0.5550 | 0.7475 |

Figure 1. The correlation heatmap for 18 assets (close price)


Figure 2. Normalized close prices of assets. Market phase transition dates denoted by vertical dashed lines


Figure 3. Pipeline for the single- and dual-modality models


To evaluate prediction quality in the classification task, we used the Accuracy metric, while for regression, we employed MAPE (Mean Absolute Percentage Error). The choice of these metrics is justified by the nature of the tasks. In classification, the model must accurately predict the price movement direction: either an increase (denoted by «+») or a decrease (denoted by «−»). The MAPE metric is best suited for assessing regression quality within the financial domain: it represents the average deviation from the asset’s actual price in percentage terms, making it easily interpretable in monetary value.
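For concreteness, a minimal sketch of both metrics follows; all numeric values are illustrative.

```python
# Sketch: direction accuracy (sign of the return) and MAPE (percentage price deviation).
import numpy as np

def direction_accuracy(pred_returns: np.ndarray, true_returns: np.ndarray) -> float:
    """Share of sessions whose price movement direction (+/-) is predicted correctly."""
    return float(np.mean(np.sign(pred_returns) == np.sign(true_returns)))

def mape(pred_prices: np.ndarray, true_prices: np.ndarray) -> float:
    """Average percentage deviation of the predicted price from the actual price."""
    return float(np.mean(np.abs((pred_prices - true_prices) / true_prices)) * 100)

pred_r = np.array([0.010, -0.004, 0.007])
true_r = np.array([0.012, 0.002, 0.004])
pred_p = np.array([101.2, 100.8, 101.5])
true_p = np.array([101.4, 101.4, 101.8])
print(direction_accuracy(pred_r, true_r))   # 0.666...: two of three signs are correct
print(round(mape(pred_p, true_p), 3))       # mean deviation in percent
```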

Figure 3 illustrates the model development process for utilizing one and two modalities.

As the input parameter, the model received a return vector of the asset, calculated based on the closing price (close) over the previous five trading sessions:

(1)    Return(d + 1) = close(d + 1) / close(d) − 1.

The model’s output was a prediction for the next trading session.

To assess the accuracy of predicting the price movement direction, the predicted class was determined by the sign (±) of the forecasted return value, as the return of an asset represents the relative rate of change. Thus, a positive return indicates a price increase, while a negative return signifies a decline. To evaluate the quality of the asset price forecast, the predicted return vector was converted into price (in Russian rubles):

(2)    price(d + 1) = (Return(d + 1) + 1) · price(d).
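A small numeric sketch of equations (1) and (2) is given below; the closing prices and the predicted return are illustrative, not taken from the dataset.

```python
# Sketch: building the 5-day return window from closing prices (equation (1))
# and converting a predicted return back into a price (equation (2)).
closes = [100.0, 101.5, 99.8, 100.9, 102.0, 103.1]          # close(d-5) ... close(d)

returns = [closes[i + 1] / closes[i] - 1 for i in range(len(closes) - 1)]
print(returns)                        # five-element return vector fed to the model

predicted_return = 0.012              # hypothetical model output for day d + 1
predicted_price = (predicted_return + 1) * closes[-1]        # equation (2)
print(round(predicted_price, 2))      # 104.34
```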


The pointwise predicted price vector, obtained through transformation, was compared to the historical price vector of assets using the MAPE metric.

The choice of return (rather than price) as the target variable for the predictive model is justified by the fact that when prices exceed historical highs (or fall below historical lows) during market growth (or decline), the applicability of traditional methods becomes limited.

Based on this reasoning, candlestick characteristics (close, open, high, and low prices) were considered in the form of relative price changes, calculated using a formula similar to (1).

Next, a rolling window of five trading days was applied to the relative price changes to form a row vector, which was then fed into the predictive model. As a result, the model receives a vector of 20 parameters as input and predicts a single output value — the return of the instrument at the end of the next trading session.
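A sketch of assembling such an input row is shown below; the candle values are synthetic, and flattening the 5 x 4 window into 20 inputs follows the description above.

```python
# Sketch: relative changes of the four candlestick features over a 5-day rolling window.
import numpy as np

# candles: array of shape (T, 4) with columns [close, open, high, low]; synthetic values
candles = np.random.default_rng(0).uniform(95, 105, size=(30, 4))

rel_changes = candles[1:] / candles[:-1] - 1                 # relative change, as in (1)
window = 5
X = np.array([rel_changes[i:i + window].flatten()            # 5 days x 4 features -> 20 inputs
              for i in range(len(rel_changes) - window)])
y = rel_changes[window:, 0]                                  # next-day close return as the target
print(X.shape, y.shape)                                      # (24, 20) (24,)
```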

2.2.    The Dual-Modality Approach

For the experiment involving news flow, we selected news articles relevant to the analyzed assets based on keyword matching (Table 5). The keywords were chosen as the top 30 words extracted using the TF-IDF method. This method determines the importance of words in a text by considering their frequency of occurrence and uniqueness across the entire corpus. An example of keywords extracted using TF-IDF is presented in Table 8.
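A minimal sketch of this keyword extraction step is given below; the company descriptions and variable names are illustrative.

```python
# Sketch: top-30 TF-IDF keywords per company, computed over company descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer

descriptions = {
    "MTLR": "Мечел: горнодобывающая и металлургическая компания, уголь, руда, ферросплавы",
    "SNGS": "Сургутнефтегаз: добыча нефти и газа, геологоразведка, нефтепродукты, электроэнергия",
}

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(descriptions.values())
vocab = vectorizer.get_feature_names_out()

for ticker, row in zip(descriptions, matrix.toarray()):
    top = row.argsort()[::-1][:30]                       # indices of the 30 highest TF-IDF scores
    keywords = [vocab[i] for i in top if row[i] > 0]
    print(ticker, keywords)
```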

After obtaining the list of keywords using the TF-IDF method, we further expanded it with the help of the ChatGPT-4o model. This allowed us to increase keyword variability through permutations, letter substitutions, and modifications of word endings (Table 9) . The selected news articles for each company (ticker) were converted into vectors and filtered to remove duplicates.

Table 8. Keywords by companies extracted from their descriptions

| Ticker | Keywords |
|---|---|
| MTLR | mechel, mining, ore, raw materials, energy, ferroalloys, coal |
| SNGS | gas, geological exploration, oil, Surgutneftegas, petroleum products, electricity, drilling |
| SMLT | rent, development, developer, real estate, construction, Moscow region, residential areas |
| MTSS | subscriber, automation, internet, mobile communications, provider, communications |
| BSPB | bank, deposit, dividends, financial services, kaliningrad, spbank, saint-petersburg |

Table 9. Complementary generated keywords

| Ticker | Keywords |
|---|---|
| MTLR | мечел, метчел, мечал, mechel, Mchel, ферросплавы, фурросплав |
| SNGS | сургутнефтегаз, surgutneftegaz, surgut, сурнефтегаз, сургаз, cургут, сур-нфтгз |
| SMLT | самолет, smlt, samolet, samalet, Самлет |
| RTKMP | ростелеком, телеком, rostelecom, telecom, rtkm, ртк, r-telecom, растелком |
| HYDR | русгидро, rushydro, rshydro, r-gidro, гидрорус, гидра, русгидра |

Figure 4 presents a distribution chart of the news articles for the companies after filtration.

As a vectorizer for the Russian language news stream, we employed two models: RuBERT [8] and Qwen [9] .
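As an illustration, a single article could be mapped to a fixed-size vector as sketched below; the checkpoint name and mean pooling over token states are assumptions, since the paper does not specify the pooling scheme.

```python
# Sketch: one news article -> a 768-dimensional RuBERT vector (mean pooling is assumed).
import torch
from transformers import AutoModel, AutoTokenizer

name = "DeepPavlov/rubert-base-cased"                    # assumed RuBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

text = "Индекс МосБиржи восстанавливает позиции"
batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state            # (1, n_tokens, 768)
vector = hidden.mean(dim=1).squeeze(0)                   # article vector of length 768
print(vector.shape)                                      # torch.Size([768])
```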

While working with the news stream, we encountered two main challenges. The first challenge is the problem of news rewriting, which necessitates filtering out duplicate articles. To ensure that our model accounts for each news article only once, it is essential to implement a duplicate identification algorithm.

The second challenge is to determine which asset a given news article affects. This problem can be framed as a classification task in which tickers serve as class labels.

Figure 4. The distribution of news articles by company after filtration (numbers on the diagram show the percentage of news about each company in the dataset)

To address the issue of news rewriting, we designed a Siamese neural network. We constructed a training dataset using the GigaChat API as follows: for each article, three paraphrased versions of both the title and body were generated. Then, pairs were randomly formed in equal proportion from the original and paraphrased news articles and their titles.

The Siamese neural network was designed as follows: a pair of news articles is fed as input, and vector representations of the articles are extracted using the RuBERT model [8] . The two vectors are then concatenated, and the resulting vector is passed through a fully connected neural network (MLP). To determine the optimal depth of the MLP model, we conducted a series of experiments, evaluating both prediction accuracy and news stream processing time. Based on the results, we selected the MLP architecture with three layers.
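A minimal PyTorch sketch of such a Siamese duplicate detector is given below; the [CLS] pooling and the hidden sizes are assumptions made for the example.

```python
# Sketch: Siamese duplicate detector with a shared RuBERT encoder and a three-layer MLP head.
import torch
import torch.nn as nn
from transformers import AutoModel

class SiameseDuplicateClassifier(nn.Module):
    def __init__(self, encoder_name: str = "DeepPavlov/rubert-base-cased", hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # shared encoder for both articles
        dim = self.encoder.config.hidden_size                    # 768 for RuBERT
        self.mlp = nn.Sequential(                                 # three-layer MLP over the pair
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                                  # logit: duplicate or not
        )

    def embed(self, batch):
        return self.encoder(**batch).last_hidden_state[:, 0]      # [CLS] vector per article

    def forward(self, batch_a, batch_b):
        pair = torch.cat([self.embed(batch_a), self.embed(batch_b)], dim=-1)
        return self.mlp(pair).squeeze(-1)
```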

The filtered news articles are then converted into vectors so that duplicate classification can be performed in a one-shot mode when new articles arrive. This approach reduces both the processing time of the news stream and the computational resources required (in our case, a GPU V100).

To address the second challenge — matching news article samples by date and utilizing them for price forecasting — it is essential to formalize the data selection and prediction process. We assume that the closing price prediction for an asset is made for each trading day at the market opening. In this case, only news articles published before the start of the current trading day are included in the dataset.

The dataset is formed by grouping news articles based on their publication date. For predicting the price on a given day, only articles published on the previous trading day are used. For example, analytical articles such as those under the “Technical Analysis” section from the “BCS” source, which are published daily before the market opens, are included in the dataset for forecasting the prices of assets analyzed in those reports. This approach ensures that the most relevant information is considered, thereby improving prediction accuracy.
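The selection rule can be illustrated with a small sketch; the column names and dates below are illustrative.

```python
# Sketch: for the prediction made at the opening of trading day d, keep only articles
# about the ticker that were published on the previous trading day.
import pandas as pd

news = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-26", "2024-03-27", "2024-03-27", "2024-03-28"]),
    "ticker": ["SNGS", "SNGS", "PHOR", "SNGS"],
    "text": ["...", "...", "...", "..."],
})

def news_for_day(ticker: str, prev_trading_day: str) -> pd.DataFrame:
    mask = (news["ticker"] == ticker) & (news["date"] == pd.Timestamp(prev_trading_day))
    return news[mask]

print(news_for_day("SNGS", "2024-03-27"))    # articles usable for the 2024-03-28 forecast
```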

For the dual-modality approach, training sequences were formed by concatenating price return vectors from the previous five days with news stream vectors. The relative price return vectors were constructed similarly to the single-modality experiment, while news articles were selected from the previous trading day based on the chosen asset. These news articles were then transformed into vectors and aggregated.

If no publications were available on the previous day or before the market opened on the current day, a zero vector (of length 768 for the RuBERT model and 896 for the Vikhr-Qwen2.5-0.5b-Instruct (Qwen) model) was concatenated with the relative price return vector. Otherwise, the aggregated news vector of the same length was appended. These vector lengths correspond to the output sizes of the pretrained RuBERT and Qwen models.

In this study, we explored two approaches for aggregating news vectors: vector summation (Sum) and averaged summation (Mean). By vector summation, we mean summing the values of corresponding vector coordinates. In the averaged summation approach, each coordinate of the aggregated vector is assigned the arithmetic mean of the corresponding coordinates across all aggregated vectors.
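The assembly of the dual-modality input described in the last three paragraphs can be sketched as follows; the function and variable names are illustrative.

```python
# Sketch: concatenate the 20 price parameters with an aggregated news vector
# (768 coordinates for RuBERT, 896 for Qwen), or with a zero vector if no news exists.
import numpy as np

def build_input(price_window: np.ndarray, news_vectors: list, text_dim: int = 768,
                how: str = "mean") -> np.ndarray:
    if not news_vectors:
        text_part = np.zeros(text_dim)                 # no publications before the session
    elif how == "sum":
        text_part = np.sum(news_vectors, axis=0)       # Sum: add corresponding coordinates
    else:
        text_part = np.mean(news_vectors, axis=0)      # Mean: average corresponding coordinates
    return np.concatenate([price_window, text_part])   # 20 + 768 (or 896) model inputs

x = build_input(np.zeros(20), [np.ones(768), 2 * np.ones(768)])
print(x.shape)                                          # (788,)
```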

The baseline RuBERT model has a limited context window of 512 tokens. As a result, articles exceeding this limit were either truncated or split for separate processing, meaning that a single news article could correspond to multiple vectors. In contrast, the Qwen model has a significantly larger context window of 32,768 tokens (64 times larger), allowing it to process entire articles without truncation. Next, we compare how different news vectorization methods impact the accuracy of price predictions.

The pointwise predicted return vectors were converted into asset prices using equation (2) . The prediction quality was evaluated using two metrics: Accuracy and Mean Absolute Percentage Error (MAPE). Accuracy was measured as the proportion of correctly predicted signs of the return vector elements—either positive or negative. The MAPE metric indicates the average percentage deviation of the predicted price from the actual value. This allows us to assess the prediction quality not only in relative terms but also in absolute monetary units (rubles).

3.    Computational experiment

In this section, we present the results of computational experiments for two predictive models (single- and dual-modalities). The predictive model was developed using the Transformers framework from the Hugging Face platform. All computations were performed on an NVIDIA V100 GPU.
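A minimal PyTorch sketch of an LSTM predictor consistent with the description above is shown below; the hidden size and the exact point at which the news vector is fused are assumptions, as the paper does not report these details.

```python
# Sketch: LSTM over the 5-day window of candlestick returns; in the dual-modality case
# the aggregated news vector is concatenated before the output layer.
import torch
import torch.nn as nn

class ReturnPredictor(nn.Module):
    def __init__(self, n_features: int = 4, text_dim: int = 0, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden + text_dim, 1)        # single output: next-session return

    def forward(self, prices, news=None):                  # prices: (batch, 5, 4)
        out, _ = self.lstm(prices)
        h = out[:, -1]                                      # state after the last trading day
        if news is not None:                                # news: (batch, text_dim)
            h = torch.cat([h, news], dim=-1)
        return self.head(h).squeeze(-1)

single = ReturnPredictor()                                  # time series only
dual = ReturnPredictor(text_dim=768)                        # time series + RuBERT news vector
print(dual(torch.zeros(8, 5, 4), torch.zeros(8, 768)).shape)   # torch.Size([8])
```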

3.1.    The Single-Modality Approach Performance

The results of the experiment on predicting return vectors using only time series data for classical and deep learning models are presented in Table 10.

Table 11 provides the averaged prediction quality metrics for all models, sorted in ascending order of the mean absolute percentage error (the “MAPE, %” column).

From the experiment results, it is evident that the recurrent model LSTM achieves the best classification performance (predicting upward or downward trends) and regression accuracy (smallest deviation of the predicted price from the actual price). However, it lags slightly in terms of the mean absolute error metric.

Table 10. Results of forecasting return vectors using only time series. Accuracy (left) and deviation (right) in percent

| Sector | Ticker | LSTM | XGB | KNN | RF | LinReg | DT |
|---|---|---|---|---|---|---|---|
| Metals and Mining | MTLR | 56.364 / 0.410 | 40.000 / 2.089 | 42.273 / 2.050 | 50.909 / 2.020 | 50.000 / 2.029 | 42.727 / 2.679 |
| Metals and Mining | TRMK | 56.364 / 0.362 | 40.909 / 2.105 | 38.182 / 2.167 | 47.273 / 2.154 | 49.091 / 2.114 | 52.727 / 2.308 |
| Oil and Gas | SNGS | 50.303 / 0.352 | 49.091 / 1.776 | 48.182 / 1.775 | 50.000 / 1.735 | 60.909 / 1.744 | 52.727 / 1.857 |
| Oil and Gas | SIBN | 58.182 / 0.341 | 40.000 / 1.766 | 58.182 / 1.746 | 46.364 / 1.788 | 41.818 / 1.839 | 51.818 / 1.813 |
| Consumer Sector | MGNT | 46.667 / 0.331 | 39.091 / 1.517 | 43.636 / 1.493 | 49.091 / 1.519 | 40.000 / 1.709 | 60.000 / 1.672 |
| Consumer Sector | LENT | 56.364 / 0.371 | 54.546 / 2.202 | 39.091 / 2.178 | 52.723 / 2.145 | 51.818 / 2.220 | 51.818 / 2.589 |
| Construction | PIKK | 49.091 / 0.484 | 40.909 / 1.565 | 50.909 / 1.563 | 50.000 / 1.558 | 44.545 / 1.637 | 51.818 / 1.592 |
| Construction | SMLT | 53.939 / 0.328 | 42.727 / 1.577 | 38.182 / 1.552 | 46.364 / 1.539 | 49.091 / 1.536 | 41.818 / 1.683 |
| Telecommunications | MTSS | 56.970 / 0.541 | 42.727 / 1.290 | 40.000 / 1.306 | 45.455 / 1.520 | 53.636 / 1.419 | 50.000 / 1.395 |
| Telecommunications | RTKMP | 55.152 / 0.246 | 45.455 / 1.299 | 42.723 / 1.303 | 42.727 / 1.335 | 50.909 / 1.355 | 48.182 / 1.411 |
| Transport | AFLT | 55.152 / 0.419 | 46.364 / 2.079 | 57.273 / 2.017 | 52.727 / 2.062 | 60.909 / 1.976 | 51.818 / 2.194 |
| Transport | FLOT | 47.273 / 0.258 | 43.637 / 2.116 | 38.182 / 2.124 | 42.727 / 2.104 | 45.454 / 2.074 | 49.091 / 2.294 |
| Finance | BSPB | 46.061 / 0.410 | 49.091 / 1.612 | 50.909 / 1.695 | 50.909 / 1.598 | 54.545 / 1.602 | 45.455 / 1.829 |
| Finance | SFIN | 49.697 / 0.447 | 40.000 / 1.603 | 30.909 / 1.647 | 39.091 / 1.743 | 48.182 / 1.960 | 41.818 / 1.959 |
| Chemical Industry | PHOR | 41.818 / 0.231 | 42.727 / 1.194 | 52.723 / 1.149 | 48.182 / 1.168 | 50.000 / 1.227 | 45.455 / 1.218 |
| Chemical Industry | KZOSP | 57.576 / 0.458 | 49.091 / 1.198 | 42.723 / 1.237 | 49.091 / 1.210 | 46.364 / 1.217 | 54.545 / 1.581 |
| Power Engineering | HYDR | 59.394 / 0.380 | 51.182 / 1.124 | 60.000 / 1.130 | 48.182 / 1.214 | 45.455 / 1.151 | 49.091 / 1.355 |
| Power Engineering | MRKC | 40.000 / 0.768 | 51.182 / 1.182 | 49.091 / 1.225 | 54.545 / 1.214 | 50.000 / 1.224 | 55.455 / 1.403 |


Table 11. The Single-Modality approach forecast (time-series) inference metrics: Accuracy and MAPE in percentage

| Model | Accuracy, % | MAPE, % |
|---|---|---|
| LSTM | 52.020 | 0.397 |
| XGB | 45.000 | 1.627 |
| KNN | 46.010 | 1.631 |
| RF | 48.384 | 1.646 |
| LinReg | 50.152 | 1.669 |
| DT | 49.798 | 1.824 |

3.2.    The Dual-Modality Approach Performance

The results of the second experiment, which involved merging the news stream with numerical time series data and comparing the proposed multimodal approach with a forecast based solely on candlestick time series, are presented in Table 12.

Table 13 provides the averaged prediction quality metrics for the considered models. The data in this table is sorted by the “MAPE, %” column in ascending order, reflecting the mean absolute percentage error of the predicted price deviations.

In this second experiment, the LSTM neural network was chosen as the baseline model. We compared different vectorization methods (RuBert, Qwen) and aggregation techniques (Sum, Mean) to evaluate their impact on prediction performance.

Figure 5 shows the dependence of the mean squared error (MSE Loss) function values on the number of training iterations for different models, based on the training set (from July 7, 2022, to March 27, 2024) and the test set (from March 28 to August 30, 2024). The graph indicates that after 30 training epochs, the curves reach a stationary value.


Table 12. The Dual-Modality returns vector forecasting metrics: Accuracy (left) and MAPE (right) in percent

| Sector | Ticker | vanilla LSTM | LSTM_RuBert_SUM | LSTM_RuBert_MEAN | LSTM_QWEN_SUM | LSTM_QWEN_MEAN |
|---|---|---|---|---|---|---|
| Metals and Mining | MTLR | 56.364 / 0.410 | 39.394 / 0.409 | 38.788 / 0.410 | 45.455 / 0.522 | 52.121 / 0.246 |
| Metals and Mining | TRMK | 56.364 / 0.362 | 35.152 / 0.392 | 42.424 / 0.192 | 36.364 / 0.504 | 35.758 / 0.419 |
| Oil and Gas | SNGS | 50.303 / 0.352 | 53.939 / 0.865 | 58.182 / 1.824 | 44.848 / 0.307 | 49.697 / 0.106 |
| Oil and Gas | SIBN | 58.182 / 0.341 | 58.182 / 0.265 | 58.182 / 0.216 | 39.394 / 0.368 | 47.879 / 0.165 |
| Consumer Sector | MGNT | 46.667 / 0.331 | 53.333 / 0.417 | 47.879 / 0.299 | 46.061 / 0.307 | 48.485 / 0.235 |
| Consumer Sector | LENT | 56.364 / 0.371 | 49.091 / 0.400 | 50.909 / 0.359 | 53.333 / 0.346 | 52.121 / 0.331 |
| Construction | PIKK | 49.091 / 0.484 | 50.303 / 0.462 | 57.576 / 0.436 | 47.273 / 0.529 | 53.333 / 0.322 |
| Construction | SMLT | 53.939 / 0.328 | 38.788 / 0.200 | 46.061 / 0.270 | 36.364 / 0.311 | 43.030 / 0.241 |
| Telecommunications | MTSS | 56.970 / 0.541 | 53.939 / 0.473 | 55.152 / 0.368 | 47.879 / 0.316 | 45.455 / 0.193 |
| Telecommunications | RTKMP | 55.152 / 0.246 | 49.697 / 0.274 | 45.455 / 0.271 | 44.848 / 0.171 | 44.242 / 0.178 |
| Transport | AFLT | 55.152 / 0.419 | 51.515 / 0.641 | 50.303 / 0.348 | 45.455 / 0.259 | 52.121 / 0.182 |
| Transport | FLOT | 47.273 / 0.258 | 43.636 / 0.532 | 52.121 / 0.262 | 43.636 / 0.392 | 43.636 / 0.345 |
| Finance | BSPB | 46.061 / 0.410 | 47.879 / 0.406 | 50.909 / 0.326 | 47.879 / 0.369 | 52.121 / 0.227 |
| Finance | SFIN | 49.697 / 0.447 | 44.848 / 0.445 | 47.273 / 0.390 | 56.970 / 0.195 | 56.970 / 0.272 |
| Chemical Industry | PHOR | 41.818 / 0.231 | 53.333 / 0.264 | 55.152 / 0.238 | 60.000 / 0.354 | 44.848 / 0.219 |
| Chemical Industry | KZOSP | 57.576 / 0.458 | 42.424 / 0.492 | 41.212 / 0.491 | 48.485 / 0.369 | 49.697 / 0.352 |
| Power Engineering | HYDR | 59.394 / 0.380 | 58.788 / 0.326 | 55.758 / 0.321 | 47.879 / 0.292 | 61.212 / 0.178 |
| Power Engineering | MRKC | 40.000 / 0.768 | 42.424 / 0.742 | 43.030 / 0.839 | 42.424 / 0.660 | 41.818 / 0.543 |

Table 13. The Dual-Modality Approach forecast: Accuracy, MAPE

| Model | Accuracy, % | MAPE, % |
|---|---|---|
| LSTM-Qwen-Mean | 48.552 | 0.256 |
| LSTM-Qwen-Sum | 46.970 | 0.367 |
| LSTM | 52.020 | 0.397 |
| LSTM-RuBert-Mean | 49.798 | 0.437 |
| LSTM-RuBert-Sum | 48.148 | 0.445 |

The results from the tables imply that the forecast based on the vectorized news stream using a large language model outperforms the forecast built solely on candlestick data of assets, demonstrating the smallest deviation of the pointwise price prediction from the actual price vector. Additionally, averaging the vectors (Mean) provides the best results.


Figure 5. Dependence of the mean squared error function values on the number of training iterations for different models. Training and test sets

The dataset (176 stocks of Russian companies traded on the Moscow Exchange and 79,555 Russian-language financial news articles) collected for the study is available at [11] .

Conclusion

As a result of the conducted experiments, we demonstrated that adding a textual modality—analyzing the news stream—positively impacts the accuracy of price prediction. On average, the MAPE metric (the deviation of the predicted price from the actual price) decreases by 55%: from 0.397 (LSTM model) to 0.256 (LSTM-Qwen-Mean model). Additionally, predictions based on vectors obtained using the large language model Vikhr-Qwen2.5-0.5b-Instruct outperformed those based on RuBert. This can be partly attributed to the fact that the Qwen model has a significantly larger context window and is trained on a larger text corpus with support for «Chain-of-Thought» (CoT) reasoning. This enhances the model’s ability to reason and capture complex semantic dependencies within the text. The experimental results indicate that the averaging method (Mean) performed better than summation (Sum) and is the preferred method for aggregating news stream vectors.

At the same time, it is important to note that the test data, on which the final metric values were calculated, covers the period from March 28 to August 30, 2024. During this period, the Russian securities market exhibited a general downward trend. The presence of a clear trend is a significant factor that simplifies the prediction task. However, even in this setting, the proposed multimodal approach proved to be the best among those considered.

The training and validation of the model for the rewriting task were conducted on news articles whose length did not exceed the context window of the RuBert model. As a result, artifacts related to the context window size only became apparent during the forecasting phase when the news dataset included articles averaging around 290 words in length. For future improvements in news filtering and classification by company, it is necessary to utilize models with a larger context window, such as Qwen.

The collected dataset [11] is well structured and can be used for fine-tuning large language models in Russian, or models adapted for the Russian language, for applications in the financial sector.

Table 14. Multimodal approach forecasting metrics in comparison with the approach based on news sentiment score (Baseline) offered by [7]

| Model | Ticker | R2 | MAPE, % | MAE |
|---|---|---|---|---|
| LSTM-Qwen-Mean | AAPL | 0.989 | 0.628 | 0.003 |
| Baseline | AAPL | 0.947 | 2.333 | 0.018 |
| LSTM-Qwen-Mean | AMZN | 0.968 | 1.601 | 0.013 |
| Baseline | AMZN | 0.870 | 1.730 | 0.015 |
| LSTM-Qwen-Mean | GOOGL | 0.935 | 1.394 | 0.008 |
| Baseline | GOOGL | 0.788 | 2.286 | 0.020 |
| LSTM-Qwen-Mean | NFLX | 0.955 | 2.361 | 0.076 |
| Baseline | NFLX | 0.919 | 2.512 | 0.019 |
| LSTM-Qwen-Mean | TSLA | 0.915 | 3.206 | 0.006 |
| Baseline | TSLA | 0.930 | 7.423 | 0.034 |

For a quantitative comparison of the proposed model, we conducted a computational experiment based on the approach and metrics from the study [7]. Following the methodology of [7], we used time series data of stock prices from five major American companies: AAPL, AMZN, GOOGL, NFLX, and TSLA, along with a dataset of English-language news articles labeled by company for the period from October 12, 2012 to January 31, 2020 (Table 14).

It is worth noting that the dataset used includes text data in English; therefore, we utilized the original Qwen2.5-0.5b-Instruct model [10] for news vectorization. To generate forecasts, we selected and trained the LSTM-Qwen-Mean model, as it demonstrated the best overall performance in our study. For evaluation, we used the coefficient of determination (R2), mean absolute error (MAE), and mean absolute percentage error (MAPE).

Thus, we worked with the same time series and evaluation metrics. Across all metrics, except for MAE on NFLX and R2 on TSLA, the proposed multimodal approach with vector averaging outperformed the best-performing results from the approach in [7] . Based on our computational experiments, we conclude that the proposed multimodal approach demonstrated superior forecasting quality and greater adaptability to both Russian and international markets.

In the future, it is necessary to explore how to incorporate the incoming news stream into the predictive model—specifically, the optimal time window for using news data and the best approach for weighting news messages (e. g. adjusting the weight of a news article based on its chronological position in the dataset).
