IGICA: A Hybrid Feature Selection Approach in Text Categorization

Authors: Mohammad Mojaveriyan, Hossein Ebrahimpour-Komleh, Seyed Jalaleddin Mousavirad

Journal: International Journal of Intelligent Systems and Applications (IJISA)

Published in: Issue 3, Vol. 8, 2016.

Open access

Feature selection is one of the most important problems in machine learning and statistical pattern recognition. It matters in many applications, such as text categorization, because these applications contain many redundant and irrelevant features that may reduce classification performance. Feature selection chooses an appropriate subset of features in order to improve the performance of learning algorithms. In text categorization in particular, there are many features, most of which are redundant. In this paper, a two-stage feature selection method, IGICA, based on the imperialist competitive algorithm (ICA) is proposed. ICA is a new metaheuristic inspired by imperialist competition among countries. In the first stage of the proposed algorithm, a filtering technique based on information gain is applied: features are ranked by their information gain values, and the top-ranked features are selected. In the second stage, ICA is applied to select the most effective features among them. The proposed method is evaluated on the Reuters-21578 dataset. The experimental results show that the proposed method selects effective features well compared to other methods.
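The first (filter) stage described above can be illustrated with a minimal sketch. The abstract does not give the exact formula or implementation, so this assumes the standard definition of information gain for a binary "term occurs in document" feature, IG(t) = H(C) - H(C | t). The function names (`entropy`, `information_gain`, `top_k_terms`) and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    Assumed definition: IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)].
    """
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_k_terms(docs, labels, k):
    """Rank every vocabulary term by information gain and keep the top k.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = sorted(set().union(*docs))
    scores = {t: information_gain(docs, labels, t) for t in vocab}
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy corpus (illustrative only): each document is a set of terms.
docs = [
    {"wheat", "price", "export"},
    {"wheat", "harvest", "price"},
    {"oil", "price", "barrel"},
    {"oil", "export", "barrel"},
]
labels = ["grain", "grain", "crude", "crude"]

selected, scores = top_k_terms(docs, labels, k=2)
print(selected)
```

In a two-stage scheme of the kind the abstract describes, the terms kept by such a filter would form the search space for the second-stage wrapper; the cutoff `k` here is an arbitrary illustrative value, not one reported by the authors.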


Text classification, Feature selection, Imperialist competition algorithm, Information gain

Short URL: https://sciup.org/15010804

IDR: 15010804

Research article