Overview of modern time series management systems

Authors: Elena Vladimirovna Ivanova, Mikhail Leonidovich Zymbler

Journal: Bulletin of the South Ural State University. Series: Computational Mathematics and Software Engineering @vestnik-susu-cmi

Published in: Vol. 9, No. 4, 2020.

Free access

A time series is a sequence of chronologically ordered numerical values that reflect the course of some process or phenomenon. Applications of Industry 4.0 and the Internet of Things are currently among the most pressing classes of time series processing tasks. A typical task in these applications is to provide smart control and predictive maintenance of complex machines and mechanisms equipped with various sensors. Such sensors take readings at a high sampling rate and, within a relatively short time, produce time series ranging from tens of millions to billions of elements in length. The data obtained from the sensors is accumulated and mined to support strategically important decisions. Time series processing requires specialized system software distinct from existing relational DBMSs and NoSQL systems. Time series processing systems must provide, on the one hand, efficient operations for adding new atomic values arriving in streaming mode and, on the other hand, efficient data mining operations in which a time series is treated as a single whole. The article examines the specifics of time series processing in comparison with relational and non-relational data and gives formal definitions of the main time series mining tasks. It presents an overview of the main capabilities of the three most popular modern time series processing systems: InfluxDB, OpenTSDB, and TimescaleDB.


Keywords: time series processing and analysis, relational DBMS

Short URL: https://sciup.org/147234285

IDR: 147234285 | UDC: 004.62 | DOI: 10.14529/cmse200406

Overview of modern time series management systems

A time series is a sequence of chronologically ordered numerical values that reflect some process or phenomenon. Currently, Industry 4.0 and the Internet of Things are among the most topical application areas for time series processing. In these applications, the typical task is to provide intelligent control and predictive maintenance of complex machines and mechanisms that are equipped with various sensors. Such sensors have a high sampling frequency and, in a relatively short time interval, produce time series of tens of millions to billions of elements. The data obtained from the sensors is accumulated and mined to make strategic decisions. Time series processing requires specific system software that is different from the existing relational DBMSs and NoSQL systems. Time series database systems should provide, on the one hand, efficient operations for adding new atomic values arriving in streaming mode and, on the other hand, efficient mining operations in which a time series is considered as a whole. The paper discusses the features of time series processing in comparison with data of a relational and non-relational nature, and gives formal definitions of the basic tasks of time series mining. The paper also presents an overview of the three most popular modern time series database systems, namely InfluxDB, OpenTSDB, and TimescaleDB.
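
The two access patterns contrasted in the abstract, appending atomic values as they stream in and analyzing the series as a whole, can be sketched in Python. This is only an illustration under stated assumptions: the `TimeSeries` class and its `append` and `moving_average` methods are hypothetical names introduced here, and they do not reflect how InfluxDB, OpenTSDB, or TimescaleDB store or query data. A moving average stands in for the mining operations that the paper defines formally.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TimeSeries:
    """Hypothetical in-memory series of chronologically ordered values."""
    timestamps: List[int] = field(default_factory=list)
    values: List[float] = field(default_factory=list)

    def append(self, timestamp: int, value: float) -> None:
        # Streaming ingestion: one atomic value at a time, in time order.
        if self.timestamps and timestamp <= self.timestamps[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self.timestamps.append(timestamp)
        self.values.append(value)

    def moving_average(self, window: int) -> List[float]:
        # Whole-series operation: each output depends on a run of
        # neighbouring values, so the series is read as one object.
        if window <= 0 or window > len(self.values):
            raise ValueError("window must be in 1..len(series)")
        total = sum(self.values[:window])
        result = [total / window]
        for i in range(window, len(self.values)):
            # Slide the window: add the new value, drop the oldest one.
            total += self.values[i] - self.values[i - window]
            result.append(total / window)
        return result


# Simulate five sensor readings arriving one by one.
series = TimeSeries()
for t, v in enumerate([1.0, 2.0, 3.0, 4.0, 5.0]):
    series.append(t, v)

smoothed = series.moving_average(3)  # [2.0, 3.0, 4.0]
```

The sketch keeps everything in two Python lists, which makes appends cheap but would not scale to the billions of elements mentioned above; that gap is the motivation for the dedicated systems the paper surveys.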

