A Survey of Modern Time Series Processing Systems
Authors: Elena Vladimirovna Ivanova, Mikhail Leonidovich Zymbler
Published in: Vol. 9, No. 4, 2020.
Free access
A time series is a sequence of chronologically ordered numerical values that reflects the course of some process or phenomenon. Today, Industry 4.0 and Internet of Things applications are among the most pressing areas of time series processing. A typical task in these applications is to provide smart control and predictive maintenance of complex machines and mechanisms equipped with various sensors. Such sensors sample at a high rate and, in a comparatively short time, produce time series ranging from tens of millions to billions of elements. The sensor data is accumulated and mined to support strategically important decisions. Time series processing requires dedicated system software that differs from existing relational DBMSs and NoSQL systems. A time series processing system must provide, on the one hand, efficient insertion of new atomic values arriving as a stream and, on the other hand, efficient data mining operations that treat the time series as a single whole. The article discusses the specifics of time series processing in comparison with relational and non-relational data and gives formal definitions of the main time series mining tasks. It also surveys the key features of three of the most popular modern time series processing systems: InfluxDB, OpenTSDB, and TimescaleDB.
Keywords: time series processing and analysis, relational DBMS
Short URL: https://sciup.org/147234285
IDR: 147234285 | DOI: 10.14529/cmse200406