A New Evaluation Measure for Feature Subset Selection with Genetic Algorithm
Authors: Saptarsi Goswami, Sourav Saha, Subhayu Chakravorty, Amlan Chakrabarti, Basabi Chakraborty
Journal: International Journal of Intelligent Systems and Applications (IJISA)
Issue: Vol. 7, No. 10, 2015
Free access
Feature selection is one of the most important preprocessing steps in data mining, pattern recognition, and machine learning problems. Finding an optimal subset of features among all possible combinations is an NP-complete problem. A lot of research has been done on feature selection. However, because datasets keep growing and optimality is a subjective notion, further research is needed to find better techniques. This paper proposes a genetic-algorithm-based feature subset selection method that uses a novel feature evaluation measure as the fitness function. The measure differs from earlier ones in three primary ways: (a) it considers the information content of the features in addition to their relevance with respect to the target; (b) redundancy is considered only when it exceeds a threshold value; (c) the cardinality of the subset is penalized less heavily. Because the measure accepts a few parameters, it can be tuned to the needs of a particular problem domain. Experiments conducted on 21 well-known, publicly available datasets reveal superior performance. Hypothesis testing shows the accuracy improvement to be statistically significant.
Keywords: Feature Selection, Genetic Algorithm, Filter, Relevance, Redundancy
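The abstract describes the fitness function only in words, so the following is a minimal sketch of the general idea rather than the authors' measure. It assumes discrete features, uses Shannon entropy as a stand-in for "information content" and mutual information for relevance and redundancy, and drives the search with a plain generational genetic algorithm. The names `fitness` and `ga_select` and the parameters `info_weight`, `red_threshold`, and `size_weight` are hypothetical and chosen for illustration.

```python
import math
import random
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_info(a, b):
    """Mutual information I(a; b) = H(a) + H(b) - H(a, b) for discrete data."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def redundancy(a, b):
    """Mutual information scaled to [0, 1] by the smaller of the two entropies."""
    denom = min(entropy(a), entropy(b))
    return mutual_info(a, b) / denom if denom > 0 else 0.0


def fitness(mask, features, target,
            info_weight=0.1, red_threshold=0.5, size_weight=0.01):
    """Illustrative subset score; all three parameters are assumptions."""
    chosen = [f for bit, f in zip(mask, features) if bit]
    if not chosen:
        return float("-inf")  # an empty subset is never acceptable
    # (a) relevance to the target plus a small bonus for information content
    merit = sum(mutual_info(f, target) + info_weight * entropy(f)
                for f in chosen) / len(chosen)
    # (b) redundancy counts only for pairs above the threshold
    pairs = list(combinations(chosen, 2))
    penalty = 0.0
    if pairs:
        over = [r for r in (redundancy(a, b) for a, b in pairs)
                if r > red_threshold]
        penalty = sum(over) / len(pairs)
    # (c) a deliberately light penalty on subset size
    return merit - penalty - size_weight * len(chosen)


def ga_select(features, target, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Generational GA over bit masks: tournament selection, one-point
    crossover, bit-flip mutation, and one elite individual per generation."""
    rng = random.Random(seed)
    n = len(features)

    def score(mask):
        return fitness(mask, features, target)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randint(1, n - 1) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    target = [0, 1] * 10
    features = [target[:], target[:], [0] * 20, [0, 0, 1, 1] * 5]
    print(ga_select(features, target))
```

In this toy setup the second feature duplicates the first, so a mask selecting both is penalized for redundancy, while a constant feature contributes neither relevance nor information content. Raising `red_threshold` tolerates more overlap between features, and raising `size_weight` pushes the search toward smaller subsets.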
Short URL: https://sciup.org/15010757
IDR: 15010757