Performance comparison of the VAEX and DASK libraries
Authors: Palmov S.V., Shatalov N.V.
Journal: Инфокоммуникационные технологии (Infocommunication Technologies) @ikt-psuti
Section: New information technologies
Issue: no. 1 (85), vol. 22, 2024.
The purpose of the study was to compare the performance of the Vaex and Dask libraries, both of which are designed to make data processing more efficient. To this end, experiments were conducted to measure the time consumed by various classes of operations. The research included dataset preparation, data sampling, configuration of the execution environment, installation and setup of the two libraries, development of Python scripts, performance testing, and analysis of the results. Vaex was observed to perform well when processing large datasets of millions of objects on a single local machine, whereas Dask's performance metrics were inferior. This indicates that Vaex is the more efficient tool for processing large datasets under conditions similar to those of this study. The results and conclusions emphasize the importance of choosing an appropriate library when processing large volumes of data and confirm the advantages of Vaex in this context.
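The abstract describes measuring the time consumed by different classes of operations, but it does not include the authors' scripts. The following is a minimal, library-agnostic timing sketch written under that assumption. It is not the authors' method. Each operation is a zero-argument callable, and the harness reports the median of several wall-clock runs taken after a warm-up run. The names `time_operation`, `benchmark` and the `"builtin"` suite are hypothetical. For a lazy library such as Dask, the callable must itself call `.compute()` so that the measured time covers the actual work.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_operation(op: Callable[[], object], repeats: int = 5, warmup: int = 1) -> Dict[str, float]:
    """Run `op` several times and return min/median/max wall-clock time in seconds.

    `op` must force full evaluation; for a lazy library such as Dask the
    callable itself should call `.compute()` on the result.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        op()  # warm-up runs are discarded (file caches, lazy imports, etc.)
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        op()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def benchmark(
    suites: Dict[str, Dict[str, Callable[[], object]]], repeats: int = 5
) -> Dict[str, Dict[str, float]]:
    """Return the median time for every (library, operation class) pair."""
    return {
        lib: {name: time_operation(op, repeats)["median"] for name, op in ops.items()}
        for lib, ops in suites.items()
    }


if __name__ == "__main__":
    # Hypothetical stand-in operations in plain Python. In a real comparison
    # each suite would wrap the equivalent Vaex or Dask call on the same data.
    data = list(range(200_000))
    suites = {
        "builtin": {
            "filter": lambda: [x for x in data if x % 2 == 0],
            "aggregate": lambda: sum(data),
            "sort": lambda: sorted(data, reverse=True),
        },
    }
    for lib, results in benchmark(suites, repeats=3).items():
        for op_name, seconds in results.items():
            print(f"{lib:>8} {op_name:<10} {seconds:.4f} s")
```

Reporting the median rather than the mean limits the influence of one-off outliers such as a cold disk cache. Comparing two libraries fairly also requires giving each suite the same dataset and the same operation classes.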
Keywords: Vaex, Dask, Python, big data, data processing
Short URL: https://sciup.org/140307957
IDR: 140307957 | DOI: 10.18469/ikt.2024.22.1.12