Features selection for text classification based on constraints for term weights
Authors: Sergienko R.B., Shan Ur rehman M., Khan A.E., Gasanova T.O., Minker W.
Journal: Siberian Aerospace Journal
Section: Mathematics, mechanics, informatics
Published in issue: No. 1, vol. 16, 2015.
Free access
Text classification is an important data analysis problem with applications in many domains, including the aerospace industry. This paper considers several text classification problems, among them opinion mining and topic categorization. Different text preprocessing techniques (TF-IDF, ConfWeight, and the novel TW) and machine learning algorithms for classification (Bayes classifier, k-NN, SVM, and artificial neural network) are applied. The main goal of the presented investigation is to reduce the dimensionality of text classification problems by selecting features based on constraints on term weights. Such feature selection significantly reduces both dimensionality and computation time. In addition, constraints on term weights can increase classification effectiveness: we observed such an increase for three of the five problems. For the remaining two problems, either no significant change or a decrease in classification effectiveness was observed.
Keywords: topic categorization, text classification, opinion mining, features selection, term weighting, constraint
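The abstract does not state the exact form of the constraints on term weights, so the following is only a minimal sketch of the general idea, assuming a plain TF-IDF weighting and a single lower bound on each term's largest weight in the corpus. The function names (tfidf_weights, select_terms), the min_weight parameter, and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def select_terms(weights, min_weight):
    """Keep terms whose largest weight over the corpus is at least min_weight.

    This lower bound is an assumed, illustrative constraint; the paper's
    actual constraints may differ.
    """
    best = {}
    for doc_weights in weights:
        for term, value in doc_weights.items():
            best[term] = max(best.get(term, 0.0), value)
    return sorted(t for t, v in best.items() if v >= min_weight)


# Toy corpus (invented for illustration only).
docs = [
    "the flight was great".split(),
    "the flight was delayed".split(),
    "the hotel was great".split(),
]
weights = tfidf_weights(docs)
selected = select_terms(weights, min_weight=0.05)
print(selected)
```

In this toy corpus, terms that occur in every document receive a TF-IDF weight of zero and are dropped, so the feature space shrinks from six terms to four. Raising min_weight removes more terms, which is the dimensionality versus effectiveness trade-off the abstract describes.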
Short URL: https://sciup.org/148177382
IDR: 148177382