Empirical Study of Impact of Various Concept Drifts in Data Stream Mining Methods

Authors: Veena Mittal, Indu Kashyap

Journal: International Journal of Intelligent Systems and Applications (IJISA)

Published in: Vol. 8, No. 12, 2016.

Free access

In the real world, most applications are inherently dynamic, i.e., their underlying data distribution changes over time. As a result, concept drift occurs frequently in data streams. Concept drift makes learning more challenging and significantly decreases the accuracy of the classifier. Recently, however, many algorithms have been proposed that are designed specifically for data stream mining in the presence of drifting concepts. This paper presents an empirical evaluation of these algorithms on datasets exhibiting four types of concept drift: sudden, gradual, incremental, and recurring.
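The four drift types named in the abstract can be illustrated with a small synthetic stream. The following is a minimal sketch, not the datasets or generators used in the paper: the one-dimensional threshold concept, the names `OLD`, `NEW`, `threshold_at`, and `make_stream`, and the `width` and `period` parameters are all hypothetical choices made for illustration.

```python
import random

# Hypothetical decision thresholds for the old and the new concept.
OLD, NEW = 0.3, 0.7


def threshold_at(kind, t, n, rng, width=200, period=250):
    """Return the decision threshold in force at time step t (0 <= t < n)."""
    mid = n // 2
    # Progress through a transition window centred on mid, clipped to [0, 1].
    ramp = min(1.0, max(0.0, (t - (mid - width // 2)) / width))
    if kind == "sudden":
        # The concept switches abruptly at the midpoint.
        return NEW if t >= mid else OLD
    if kind == "gradual":
        # Old and new concepts alternate; the new one is picked
        # with increasing probability during the window.
        return NEW if rng.random() < ramp else OLD
    if kind == "incremental":
        # The concept itself moves slowly through intermediate states.
        return OLD + ramp * (NEW - OLD)
    if kind == "recurring":
        # The old concept reappears every other period.
        return NEW if (t // period) % 2 == 1 else OLD
    raise ValueError(f"unknown drift kind: {kind}")


def make_stream(kind, n=1000, seed=0):
    """Generate n (feature, label) pairs whose labelling rule drifts over time."""
    rng = random.Random(seed)
    stream = []
    for t in range(n):
        x = rng.random()
        label = int(x > threshold_at(kind, t, n, rng))
        stream.append((x, label))
    return stream
```

Calling `make_stream("recurring")`, for example, yields a stream in which the labelling rule flips every 250 examples, so a classifier that has forgotten the old concept must relearn it each time it returns.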


Keywords: concept drift, online learning, data stream mining, drift detection, ensembles

Short URL: https://sciup.org/15010886

IDR: 15010886

Research article