Input data cleaning in automatic systems for commercial measurement of power consumption
Authors: Fedosin Aleksander Sergeevich, Fedosin Sergey Alekseevich
Journal: Инфокоммуникационные технологии (Infocommunication Technologies) @ikt-psuti
Section: Computer Systems and Network Technologies
Published in: Issue 2, Vol. 14, 2016.
Quality of service is the main issue for modern large information systems, and it depends largely on the quality of their data sources. Power meter readings may be used both for billing and for data mining analysis, so errors in these time series are undesirable. Effective data cleaning should therefore be performed before the data are processed. Data quality problems in automatic systems for commercial measurement of power consumption can arise for various reasons. This work presents a classification of these problems and proposes a two-step procedure for cleaning time series that contain errors in such systems. The first step applies hierarchical clustering based on Euclidean distance to detect the most "unusual" profiles. The second step uses statistical data processing to identify outliers within the time series. We assume that an expert makes the final decision. The work compares two methods for detecting erroneous data: the SD method and the "Supersmoother" algorithm. The comparison was performed on 100 power usage profiles that had previously been analyzed by an expert.
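The two-step procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: average linkage is assumed for the hierarchical clustering, profiles that end up in singleton clusters are assumed to be the "unusual" ones, and the SD method is assumed to be the common rule "flag a reading that lies more than k standard deviations from the mean". The function names and the parameters `n_clusters`, `max_size` and `k` are hypothetical. The Supersmoother alternative is not shown.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length load profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(profiles, n_clusters):
    """Average-linkage agglomerative clustering (assumed linkage).

    Repeatedly merges the two closest clusters until n_clusters remain and
    returns the clusters as lists of profile indices.
    """
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average pairwise Euclidean distance between members of two clusters.
        return mean(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is still valid
    return clusters


def unusual_profiles(profiles, n_clusters=2, max_size=1):
    """Step 1 (hypothetical rule): profiles in clusters of at most
    max_size members are passed to the expert as 'unusual'."""
    return sorted(
        i
        for c in agglomerative(profiles, n_clusters)
        if len(c) <= max_size
        for i in c
    )


def sd_outliers(series, k=3.0):
    """Step 2 (assumed SD method): indices of readings that deviate from
    the mean by more than k population standard deviations."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) > k * sigma]
```

For example, with three similar four-interval profiles and a fourth containing a spike of 9.5 in its third interval, `unusual_profiles(profiles, n_clusters=2)` isolates the fourth profile, and `sd_outliers(profiles[3], k=1.5)` points to the spike. On such short series a small `k` is needed, because no reading can lie more than sqrt(n - 1) population standard deviations from the mean. In line with the abstract, the sketch only flags candidates; the final decision is left to the expert.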
Keywords: "Supersmoother" algorithm, data cleaning, automatic system for commercial measurement of power consumption, data store, hierarchical clustering, power usage profile, Euclidean distance
Short URL: https://sciup.org/140191825
IDR: 140191825 | DOI: 10.18469/ikt.2016.14.2.08