Integration of missing data imputation tools for time series in real-time mode into a relational DBMS


Author: Yurtin A.A.

Journal: Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering @vestnik-susu-cmi

Published in: No. 1, Vol. 14, 2025.

Free access

The article addresses the problem of integrating time series imputation into relational database management systems (RDBMS). A method called ImputeDB is proposed, which enables the real-time integration of neural-network-based imputation models into the PostgreSQL RDBMS. The imputation of missing values is carried out through triggers (stored functions automatically executed by the RDBMS kernel when new data is inserted). When a trigger is activated, missing values are replaced by synthetic ones generated by a neural network model. Using the proposed method, a database application programmer can integrate the process of imputing missing values into the standard time series processing pipeline within the PostgreSQL RDBMS, without relying on external services. The proposed approach includes a set of components implemented as user-defined functions (UDFs) in Python and PL/Python: Trigger Constructor, Model Manager, Model Storage, and Imputer. The Trigger Constructor is used to create triggers that automatically perform imputation of missing values in inserted data. The Model Manager is responsible for training neural network models, while the Model Storage is used to save these models in a file-based repository. The Imputer, in turn, synthesizes missing values using the trained models. Experiments were conducted to evaluate the performance of the ImputeDB method. The experiments measured the processing time of data insertion with automatic gap imputation as a function of the time series dimensionality. Experiments were performed under two scenarios: single and multiple insertions. Neural network-based imputation models with various architectures, including recurrent neural networks, autoencoders, and transformers, were employed. The experimental results demonstrated that under conditions of increasing time series dimensionality and rising overhead from network requests and data transfer, ImputeDB exhibits superior performance. Specifically, the system achieved an efficiency gain of 22.5% compared to another approach, while maintaining the accuracy of the employed imputation methods.
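To make the trigger mechanism described in the abstract more concrete, here is a minimal, runnable sketch of the general idea: a trigger fires on insert and fills a missing value by calling a user-defined function registered in the database. This is not the ImputeDB implementation. It uses SQLite through Python's standard `sqlite3` module as a stand-in for PostgreSQL and PL/Python, and a mean over the last three observations as a stand-in for a trained neural network model. The names `readings`, `impute`, `impute_on_insert`, `WindowMean`, `make_db`, and `insert_series` are all hypothetical and chosen only for illustration.

```python
import sqlite3


class WindowMean:
    """Hypothetical stand-in for a trained imputation model:
    returns the mean of the values it is fed."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def step(self, value):
        # Called once per row of the window passed to impute().
        if value is not None:
            self.total += value
            self.count += 1

    def finalize(self):
        return self.total / self.count if self.count else None


def make_db():
    """Create an in-memory database with an imputing insert trigger."""
    conn = sqlite3.connect(":memory:")
    # Register the Python aggregate as a SQL user-defined function.
    conn.create_aggregate("impute", 1, WindowMean)
    conn.executescript(
        """
        CREATE TABLE readings (ts INTEGER PRIMARY KEY, value REAL);

        -- Fires only for rows inserted with a missing value.
        CREATE TRIGGER impute_on_insert
        AFTER INSERT ON readings
        WHEN NEW.value IS NULL
        BEGIN
            UPDATE readings
            SET value = (
                SELECT impute(value) FROM (
                    SELECT value FROM readings
                    WHERE ts < NEW.ts AND value IS NOT NULL
                    ORDER BY ts DESC LIMIT 3
                )
            )
            WHERE ts = NEW.ts;
        END;
        """
    )
    return conn


def insert_series(conn, rows):
    """Insert (ts, value) pairs and return the stored series."""
    conn.executemany("INSERT INTO readings (ts, value) VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT ts, value FROM readings ORDER BY ts").fetchall()


if __name__ == "__main__":
    db = make_db()
    series = [(1, 10.0), (2, 20.0), (3, 30.0), (4, None), (5, 50.0)]
    for row in insert_series(db, series):
        print(row)
```

In this sketch the gap at `ts = 4` is filled with the mean of the three preceding observations (10, 20, and 30), so the stored value is 20.0. The application code only issues plain `INSERT` statements; the imputation happens inside the database, which is the property the abstract attributes to the trigger-based design.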


PostgreSQL

Short URL: https://sciup.org/147248016

IDR: 147248016 | DOI: 10.14529/cmse250102

Research article