Optimization of classifiers ensemble construction: case study of educational data mining
Choosing the best method for predicting educational outcomes is a major challenge of Educational Data Mining (EDM). This paper compares student performance forecasts produced by individual binary classifiers (the Naïve Bayes, Decision Tree, Multi-Layer Perceptron, Nearest Neighbors, and Support Vector Machine algorithms) and by their ensembles, trained and tested on a dataset containing up to 38 input attributes (weekly attendance in mathematics, intensity of study, interim assessments) for 84 training and 36 test secondary school students from Nasiriyah, Iraq. A two-class outcome was predicted: passing or failing the final exam. The comparison proceeded in three stages. At the first stage of the experiment, the dependence of the classifiers on the input attributes was investigated: forecast accuracy rose from 61.1–77.7% when all 38 attributes were used to 75.0–80.5% when the base classifiers were trained on five attributes pre-selected by the Ranker Search method. At the second stage, the AdaBoost.M1 procedure was applied to each base classifier, producing five homogeneous ensembles; only two of these showed a small accuracy gain of about 3% over the corresponding stand-alone classifier, and the overall maximum prediction accuracy remained 80.5%. Finally, comparing the accuracies of 77.7% and 83.3% achieved, respectively, by a heterogeneous ensemble of the five simple-voting base classifiers and by a heterogeneous meta-ensemble of the five simple-voting AdaBoost homogeneous ensembles, we conclude that improving the quality of individual classifiers or homogeneous ensembles makes it possible to construct more powerful EDM prediction methods.
Base classifiers, educational data mining, Ranker Search attribute selection method, adaptive boosting, AdaBoost, heterogeneous ensembles
Short address: https://sciup.org/147232279
IDR: 147232279 | DOI: 10.14529/ctcr190414
Educational Data Mining and Learning Analytics (EDM/LA) are promising scientific fields for enhancing the teaching and learning technologies of traditional and e-learning education [1–5] and for managing various forms of constructivist education [6]. The wide availability of data mining tools such as R, scikit-learn for Python, and Weka [7] allows us to solve one of the main tasks of EDM/LA: forecasting student performance and helping students in need [8, 9]. Most commonly, this task is solved with individual classifiers trained by the following algorithms [10]: Naïve Bayes (NB), Decision Tree (J48), Multi-Layer Perceptron (MLP), Nearest Neighbors (1NN), Support Vector Machine (SVM), and other algorithms from the top-10 list [11].
On the other hand, in pedagogical practice, collective expert decisions are used to identify problematic students and decide their fate. In this sense, in EDM/LA we should use one of the metalearning approaches consisting of "learning from base learners" [12, 13]. The general purpose of this paper is to compare the capacity of two types of heterogeneous ensembles. The first type of ensemble was created from base classifiers using the NB, J48, MLP, 1NN, and SVM algorithms, with the attribute structure improved by applying the Ranker Search method. The second consists of five homogeneous ensembles created by the AdaBoost.M1 procedure from each of the five base classifiers.
1. Data set
This student performance dataset was collected from a secondary school in Nasiriyah, Iraq, for the first semester of 2018–2019 in the subject of Mathematics. The educational dataset, containing up to 38 input attributes, is described in detail in Table 1. It includes student attributes such as name, age, and gender, internal assessment attributes (quizzes), the number of absences, two monthly exams, and finally the final exam of the first semester in mathematics; the data were collected from school reports and questionnaires covering the archives of 120 students.

2. Methods of analysis and results
Table 1
Description of the dataset
| # | Attributes | Description | Possible Values |
|---|---|---|---|
| 1 | Stu_N | Student name | String text |
| 2 | AG | Age of student | 1: 14–16, 2: 16–18, 3: >18 |
| 3 | Gender | Sex of student | 1: Male, 2: Female |
| 4–7 | Q1_M1, Q2_M1, Q3_M1, Q4_M1 | Quizzes 1–4 for weeks 1–4 of first month | 0 to 10 |
| 8–11 | A1_M1, A2_M1, A3_M1, A4_M1 | Number of absences in weeks 1–4 of first month | 0, 1, 2, 3, 4 |
| 12–15 | Q1_M2, Q2_M2, Q3_M2, Q4_M2 | As #4–7, second month | 0 to 10 |
| 16–19 | A1_M2, A2_M2, A3_M2, A4_M2 | As #8–11, second month | 0, 1, 2, 3, 4 |
| 20 | Exam1 | First-attempt exam for semester | 0 to 100 |
| 21–24 | Q1_M3, Q2_M3, Q3_M3, Q4_M3 | As #4–7, third month | 0 to 10 |
| 25–28 | A1_M3, A2_M3, A3_M3, A4_M3 | As #8–11, third month | 0, 1, 2, 3, 4 |
| 29–32 | Q1_M4, Q2_M4, Q3_M4, Q4_M4 | As #4–7, fourth month | 0 to 10 |
| 33–36 | A1_M4, A2_M4, A3_M4, A4_M4 | As #8–11, fourth month | 0, 1, 2, 3, 4 |
| 37 | Exam2 | Second-attempt exam for semester | 0 to 100 |
| 38 | AV_E1_E2 | Average of Exam1 & Exam2 | 0 to 100 |
| 39 | FS_Exam | First-semester final exam | 0 to 100 (Class) |
The output attribute is two-class, labeled "Pass" if the final exam grade was ≥ 50 and "Fail" otherwise. Thus two groups were observed: 80 students who passed and 40 students who failed the final exam. In Weka version 3.8, the NB, J48, MLP, 1NN, and SVM algorithms with default settings were trained on the randomly chosen data of 84 students (70% of the dataset) and then tested on the data of the remaining 36 students (30% of the dataset).
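The two-class labeling and the 70/30 hold-out split described above can be sketched in a few lines of Python. This is an illustration only: the grades below are synthetic placeholders, not the actual Nasiriyah data.

```python
import random

def label(grade):
    """Two-class output attribute: 'Pass' if the final grade is >= 50, else 'Fail'."""
    return "Pass" if grade >= 50 else "Fail"

# Synthetic stand-in for the 120 students' final grades.
random.seed(0)
records = [(g, label(g)) for g in (random.randint(0, 100) for _ in range(120))]

# Random 70/30 hold-out split: 84 students for training, 36 for testing.
random.shuffle(records)
split = int(0.7 * len(records))
train, test = records[:split], records[split:]
print(len(train), len(test))  # 84 36
```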
Three stages of the experiment can be distinguished.
Attribute selection stage. Initially, we found the accuracy of the base classifiers applied to all 37 input attributes. Then we used attribute selection methods to eliminate both irrelevant and redundant attributes. A simple idea is to rank the effectiveness of each attribute [14, 15]. We applied the Ranker Search Method (RSM) to the dataset to compare the accuracy of the resulting models for predicting student performance. RSM [14] combines the following four feature selection techniques:
1) Correlation Attribute Evaluation, which correlates each attribute of the dataset with the output class and chooses the most relevant attributes by the value of Pearson's correlation;

2) Information Gain Attribute Evaluation, an entropy measure introduced to machine learning by Quinlan [16];

3) Gain Ratio Attribute Evaluation, which overcomes the bias of Information Gain towards features with a large number of values;

4) Wrapper Subset Evaluation, introduced by Kohavi and John [17].
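To make the first two evaluators concrete, here is a minimal pure-Python sketch of how an attribute column can be scored against the class by Pearson's correlation and by information gain. This is our own illustration, not Weka's implementation; attributes are assumed non-constant and discrete for the information gain case.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between an attribute column and the (numeric) class."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither column is constant

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def info_gain(xs, ys):
    """Class entropy minus the expected entropy after splitting on the attribute."""
    n = len(ys)
    remainder = 0.0
    for v in set(xs):
        subset = [y for x, y in zip(xs, ys) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(ys) - remainder
```

Ranking then amounts to computing such a score for every attribute, sorting in descending order, and keeping the top five, as in Table 2.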
Table 2 shows the best five attributes chosen by these attribute evaluators.
Table 2
Attributes Selected by Ranker Search Method
| Attribute Evaluator | Attribute Rank Order | Attribute Names | Ranked Values |
|---|---|---|---|
| Correlation Attribute Evaluation | 37, 36, 19, 18, 29 | Exam2, AV_E1_E2, Exam1, A4_M2, Q2_M4 | 0.572, 0.543, 0.524, 0.348, 0.311 |
| Information Gain Attribute Evaluation | 36, 37, 19, 29, 18 | Exam2, AV_E1_E2, Exam1, Q2_M4, A4_M2 | 0.343, 0.295, 0.287, 0.094, 0.088 |
| Gain Ratio Attribute Evaluation | 37, 19, 36, 18, 29 | AV_E1_E2, Exam1, Exam2, A4_M2, Q2_M4 | 0.295, 0.288, 0.226, 0.095, 0.094 |
| Wrapper Subset Evaluation | 37, 19, 36, 29, 18 | AV_E1_E2, Exam1, Exam2, Q2_M4, A4_M2 | 0.326, 0.317, 0.295, 0.104, 0.101 |
It is easy to notice that, besides the same attributes related to the intermediate examinations, RSM also selected two attributes related to attendance and quizzes, although with a significant lag in rank value.
The second stage is boosting. Adaptive Boosting (AdaBoost) was introduced by Freund and Schapire [18] to improve the prediction ability of a single ("old") classifier: a new classifier based on the same algorithm is trained on a dataset reweighted by rules that increase the weights of the examples misclassified by the old classifier and decrease the weights of the correctly classified ones. Thus the weights tend to concentrate the weak classifier on the "hard" examples. At the end of the iterative procedure, an ensemble of classifiers is produced in which all classifiers vote according to their weights. We used AdaBoost.M1 for our purposes because, according to [5], AdaBoost.M1 is the more adequate classifier for EDM/LA mining.
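The reweighting rule described above can be sketched as a single AdaBoost.M1 round in pure Python. This is a simplified illustration of the weight-update scheme, not Weka's AdaBoostM1 code; it assumes the weighted error is in (0, 0.5), as required for boosting to proceed.

```python
import math

def adaboost_m1_round(weights, correct):
    """One AdaBoost.M1 round.

    weights -- current example weights (summing to 1)
    correct -- True/False per example: was it classified correctly?
    Returns the renormalized weights and the classifier's voting weight.
    """
    eps = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
    beta = eps / (1.0 - eps)                                   # beta in (0, 1)
    # Shrink the weights of correctly classified examples; misclassified ones
    # keep theirs, so after normalization the "hard" examples gain weight.
    new_w = [w * beta if ok else w for w, ok in zip(weights, correct)]
    z = sum(new_w)
    new_w = [w / z for w in new_w]
    alpha = math.log(1.0 / beta)  # vote weight of this classifier in the ensemble
    return new_w, alpha
```

For example, with four equally weighted examples and one misclassification, the misclassified example's weight grows from 0.25 to 0.5 after one round.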
At the third stage, we compared the improvement in overall accuracy (A) and in the F-measure for the minority class (F1) of the individual base classifiers after the feature selection and boosting stages. As can be seen from Table 3, the major advance in forecasting capacity was observed after the Ranker Search Method (RSM) was applied. In particular, the two algorithms J48 and 1NN, whose F1 was below 40%, increased their predictability of the minority class by up to 20% after RSM, enough to make useful forecasts (> 50%) and to permit their boosting.
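The two metrics compared in Table 3 can be computed as follows; this is a minimal sketch with our own function names, where the minority class defaults to "Fail" as in our data.

```python
def accuracy(y_true, y_pred):
    """Overall accuracy A: fraction of correctly classified examples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred, minor="Fail"):
    """F1 for the minority class: harmonic mean of precision and recall."""
    tp = sum(t == minor and p == minor for t, p in zip(y_true, y_pred))
    fp = sum(t != minor and p == minor for t, p in zip(y_true, y_pred))
    fn = sum(t == minor and p != minor for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

A high accuracy with a low minority-class F1 (as for J48 and 1NN before RSM) signals that the classifier mostly predicts the majority "Pass" class.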
Table 3
Accuracy and F-measure of different classifiers
| Classifier Algorithm | All attributes A, % | All attributes F1, % | RSM A, % | RSM F1, % | AdaBoost.M1 A, % | AdaBoost.M1 F1, % |
|---|---|---|---|---|---|---|
| Naïve Bayes, NB | 77.7 | 71.4 | 80.5 | 74.1 | 80.5 | 74.1 |
| Decision Tree, J48 | 61.1 | 36.4 | 75.0 | 52.6 | 75.0 | 57.1 |
| Multi-Layer Perceptron, MLP | 75.0 | 66.7 | 75.0 | 52.6 | 77.7 | 55.6 |
| Nearest Neighbors, 1NN | 72.2 | 37.5 | 75.0 | 57.1 | 75.0 | 57.1 |
| Support Vector Machine, SVM | 75.0 | 64.0 | 77.7 | 63.6 | 80.5 | 66.7 |
Evaluating the effectiveness of the AdaBoost homogeneous ensembles shows that accuracy rose by only 0–3%, compared with the 0–14% gain from RSM. At the same time, the capacity of the leading NB-based classifier remained unchanged.
Finally, we compared the accuracy of three simple-voting ensembles: one built from the base classifiers (72.2%); a second containing the classifiers obtained after the first (RSM) stage (77.7%); and the ensemble obtained by combining the AdaBoost ensembles (83.3%).
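The simple-voting combination used in all three ensembles can be sketched as a majority vote over the members' predictions. The member predictions below are hypothetical, purely to show the mechanics of combining five classifiers.

```python
from collections import Counter

def majority_vote(predictions):
    """Simple-voting ensemble: each member classifier casts one equal vote per
    test example; the most frequent label wins."""
    combined = []
    for votes in zip(*predictions):  # one tuple of member votes per example
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical votes of five classifiers on three test students.
members = [
    ["Pass", "Fail", "Pass"],
    ["Pass", "Fail", "Fail"],
    ["Fail", "Fail", "Pass"],
    ["Pass", "Pass", "Pass"],
    ["Pass", "Fail", "Pass"],
]
print(majority_vote(members))  # ['Pass', 'Fail', 'Pass']
```

The same function combines either the five base classifiers or the five AdaBoost homogeneous ensembles, since each member only needs to emit a class label per test example.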
Conclusion
In this study, the main focus has been the comparison of various machine learning models based on the NB, J48, MLP, 1NN, and SVM algorithms and their ensembles. We observed that applying the Ranker Search method to choose the best attributes has a major effect on the forecast as evaluated by the F-measure of the less represented data class, and consequently permits us to use the improved weak classifiers as the initial AdaBoost members. We also saw that the accuracy of the final heterogeneous meta-ensemble surpassed, for the first time and by a maximum of about 3%, both the best single-classifier performance and the homogeneous combinations of their ensembles.
References
- Romero C., Ventura S. Educational Data Mining: A Review of the State of the Art // IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 2010, vol. 40, no. 6, pp. 601-618. DOI: 10.1109/TSMCC.2010.2053532
- U.S. Department of Education, Office of Educational Technology // Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief. Washington, D.C., 2012, Available at: https://tech.ed.gov/learning-analytics/edm-la-brief.pdf (accessed: 03.07.2018).
- Baker R.S., Inventado P.S. Educational Data Mining and Learning Analytics // In: Larusson J., White B. (Eds.). Learning Analytics. Springer, New York, NY, 2014, pp. 61-75. DOI: 10.1007/978-1-4614-3305-7_4
- Calvet Liñán L., Juan Pérez Á.A. Educational Data Mining and Learning Analytics: Differences, Similarities, and Time Evolution // RUSC. Universities and Knowledge Society Journal, 2015, vol. 12, no. 3, pp. 98-112. DOI: 10.7238/rusc.v12i3.2515
- Jovanovic M., Vukicevic M., Milovanovic M., Minovic M. Using Data Mining on Student Behavior and Cognitive Style Data for Improving E-Learning Systems: a Case Study // I. Journal of Computational Intelligence Systems, 2012, vol. 5, no. 3, pp. 597-610. DOI: 10.1080/18756891.2012.696923
- Berland M., Baker R.S., Blikstein P. Educational Data Mining and Learning Analytics: Applications to Constructionist Research // Tech Know Learn., 2014, vol. 19, pp. 205-220. DOI: 10.1007/s10758-014-9223-7
- Slater S., Joksimovic S., Kovanovic V., et al. Tools for Educational Data Mining: A Review // Journal of Educational and Behavioral Statistics, 2017, vol. 42, no. 1, pp. 85-106. DOI: 10.3102/1076998616666808
- Castro-Wunsch K., Ahadi A., Petersen A. Evaluating Neural Networks as a Method for Identifying Students in Need of Assistance // SIGCSE '17, March 08-11, 2017, Seattle, WA, USA. DOI: 10.1145/3017680.3017792
- Hussain S., Fadhil M.Z., Salal Y.K., Theodoru P., Kurtoğlu F., Hazarika G.C. Prediction Model on Student Performance Based on Internal Assessment Using Deep Learning // I. Journal of Emerging Technologies in Learning, 2019, vol. 14, no. 8, pp. 4-22. DOI: 10.3991/ijet.v14i08.10001
- Wu X., Kumar V., Quinlan R.J. et al. Top 10 Algorithms in Data Mining // Knowl. Inf. Syst., 2008, vol. 14, pp. 1-37. DOI: 10.1007/s10115-007-0114-2
- Kumar M., Salal Y.K. Systematic Review of Predicting Student's Performance in Academics // I. J. of Engineering and Advanced Tech., 2019, vol. 8, no. 3, pp. 54-61.
- Smith-Miles K.A. Cross-Disciplinary Perspectives on Meta-Learning for Algorithm Selection // ACM Comput. Surv., 2008, vol. 41, no. 1, Article 6, 25 p. DOI: 10.1145/1456650.1456656
- Vilalta R., Giraud-Carrier C., Brazdil P. Meta-Learning - Concepts and Techniques // In: Data Mining and Knowledge Discovery Handbook, Springer, 2010, pp. 717-732. DOI: 10.1007/978-0-387-09823-4_36
- Salal Y.K., Abdullaev S.M., Kumar M. Educational Data Mining: Student Performance Prediction in Academic // I. J. of Engineering and Advanced Tech., 2019, vol. 8, no. 4C, pp. 54-59.
- Trabelsi M., Meddouri N., Maddouri M. A New Feature Selection Method for Nominal Classifier Based on Formal Concept Analysis // Procedia Computer Science, 2017, vol. 112, pp. 186-194. DOI: 10.1016/j.procs.2017.08.227
- Quinlan J.R. Induction of Decision Trees // Machine Learning, 1986, no. 1, pp. 81-106. DOI: 10.1007/BF00116251
- Kohavi R., John G.H. Wrappers for Feature Subset Selection // Artificial Intelligence, 1997, vol. 97, pp. 273-324. DOI: 10.1016/S0004-3702(97)00043-X
- Freund Y., Schapire R.E. A Short Introduction to Boosting // J. of Japanese Society for Artificial Intelligence, 1999, vol. 14, no. 5, pp. 771-780.