Optimization of classifiers ensemble construction: case study of educational data mining
Choosing the best method for predicting educational outcomes is a major challenge of Educational Data Mining (EDM). This paper compares student performance forecasts produced by individual binary classifiers (the Naïve Bayes, Decision Tree, Multi-Layer Perceptron, Nearest Neighbors, and Support Vector Machine algorithms) and by their ensembles, trained and tested on a dataset containing up to 38 input attributes (weekly attendance in mathematics, intensity of study, interim assessments) for 84 training and 36 test secondary school students from Nasiriyah, Iraq. A two-class outcome was predicted: passing or failing the final exam. The comparison proceeded in three stages. At the first stage of the experiment, the dependence of the classifiers on the input attributes was investigated: forecast accuracy rose from 61.1–77.7% when all 38 attributes were used to 75.0–80.5% when the base classifiers were trained on five attributes pre-selected by the Ranker Search method. At the second stage, the AdaBoost.M1 procedure was applied to each base classifier, producing five homogeneous ensembles; only two of these showed a small accuracy gain of about 3% over the corresponding stand-alone classifier, and the overall maximum prediction accuracy remained 80.5%. Finally, comparing the accuracies of 77.7% and 83.3% achieved, respectively, by a heterogeneous ensemble of the five simple-voting base classifiers and by a heterogeneous meta-ensemble of the five simple-voting AdaBoost homogeneous ensembles, we conclude that improving the quality of individual classifiers or homogeneous ensembles makes it possible to construct more powerful EDM prediction methods.
Base classifiers, educational data mining, Ranker Search attribute selection method, adaptive boosting, AdaBoost, heterogeneous ensembles
Short address: https://sciup.org/147232279
IDR: 147232279 | DOI: 10.14529/ctcr190414
Educational Data Mining and Learning Analytics (EDM/LA) are promising scientific fields for enhancing the teaching and learning technologies of traditional and e-learning education [1–5] and for managing various forms of constructivist education [6]. The wide availability of data mining tools such as R, scikit-learn for Python, and Weka [7] allows us to solve one of the main tasks of EDM/LA: forecasting student performance and helping students in need [8, 9]. Most commonly, this task is solved with individual classifiers trained by the following algorithms [10]: Naïve Bayes (NB), Decision Tree (J48), Multi-Layer Perceptron (MLP), Nearest Neighbors (1NN), Support Vector Machine (SVM), and other algorithms from the top-10 list [11].
On the other hand, in pedagogical practice, collective expert decisions are used to identify problematic students and decide their fate. In this sense, in EDM/LA we should use one of the metalearning approaches consisting of "learning from base learners" [12, 13]. The general purpose of this paper is to compare the capacity of two types of heterogeneous ensembles. The first type of ensemble was created from base classifiers using the NB, J48, MLP, 1NN, and SVM algorithms, with the attribute structure improved by applying the Ranker Search method. The second consists of five homogeneous ensembles created by the AdaBoost.M1 procedure from each of the five base classifiers.
1. Data set
This student performance dataset was collected from a secondary school in Nasiriyah, Iraq, for the first semester of 2018–2019 in the subject of Mathematics. The educational dataset, containing up to 38 input attributes, is described in detail in Table 1. It includes student attributes such as name, age, and gender, internal assessment attributes (quizzes), the number of absences, two monthly exams, and finally the final exam of the first semester in mathematics; the data were collected from school reports and questionnaires covering the archives of 120 students.

2. Methods of analysis and results
Table 1
Description of the dataset
| # | Attributes | Description | Possible Values |
|---|---|---|---|
| 1 | Stu_N | Student name | String text |
| 2 | AG | Age of student | 1: 14–16, 2: 16–18, 3: >18 |
| 3 | Gender | Sex of student | 1: Male, 2: Female |
| 4–7 | Q1_M1, Q2_M1, Q3_M1, Q4_M1 | Quizzes 1–4 for weeks 1–4 of first month | 0 to 10 |
| 8–11 | A1_M1, A2_M1, A3_M1, A4_M1 | Number of absences in weeks 1–4 of first month | 0, 1, 2, 3, 4 |
| 12–15 | Q1_M2, Q2_M2, Q3_M2, Q4_M2 | As #4–7, second month | 0 to 10 |
| 16–19 | A1_M2, A2_M2, A3_M2, A4_M2 | As #8–11, second month | 0, 1, 2, 3, 4 |
| 20 | Exam1 | First-attempt exam for semester | 0 to 100 |
| 21–24 | Q1_M3, Q2_M3, Q3_M3, Q4_M3 | As #4–7, third month | 0 to 10 |
| 25–28 | A1_M3, A2_M3, A3_M3, A4_M3 | As #8–11, third month | 0, 1, 2, 3, 4 |
| 29–32 | Q1_M4, Q2_M4, Q3_M4, Q4_M4 | As #4–7, fourth month | 0 to 10 |
| 33–36 | A1_M4, A2_M4, A3_M4, A4_M4 | As #8–11, fourth month | 0, 1, 2, 3, 4 |
| 37 | Exam2 | Second-attempt exam for semester | 0 to 100 |
| 38 | AV_E1_E2 | Average of Exam1 & Exam2 | 0 to 100 |
| 39 | FS_Exam | First-semester final exam | 0 to 100 (Class) |
The output attribute is two-class, labeled "Pass" if the final exam grade was ≥ 50 and "Fail" otherwise. Thus two groups were observed: 80 students who passed and 40 students who failed the final exam. In Weka version 3.8, the NB, J48, MLP, 1NN, and SVM algorithms with default settings were trained on the randomly chosen data of 84 students (70% of the dataset) and then tested on the data of the remaining 36 students (30% of the dataset).
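The two-class labeling and the 70/30 hold-out split described above can be sketched in a few lines of Python. This is an illustration only: the grades below are synthetic placeholders, not the actual Nasiriyah data.

```python
import random

def label(grade):
    """Two-class output attribute: 'Pass' if the final grade is >= 50, else 'Fail'."""
    return "Pass" if grade >= 50 else "Fail"

# Synthetic stand-in for the 120 students' final grades.
random.seed(0)
records = [(g, label(g)) for g in (random.randint(0, 100) for _ in range(120))]

# Random 70/30 hold-out split: 84 students for training, 36 for testing.
random.shuffle(records)
split = int(0.7 * len(records))
train, test = records[:split], records[split:]
print(len(train), len(test))  # 84 36
```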
Three stages of the experiment can be distinguished.
Attribute selection stage. Initially, we found the accuracy of the base classifiers applied to all 37 input attributes. Then we used attribute selection methods to eliminate both irrelevant and redundant attributes. A simple idea is to rank the effectiveness of each attribute [14, 15]. We applied the Ranker Search Method (RSM) to the dataset to compare the accuracy of the resulting models for predicting student performance. RSM [14] combines the following four feature selection techniques:
1) Correlation Attribute Evaluation, which correlates each attribute of the dataset with the output class and chooses the most relevant attributes by the value of Pearson's correlation;

2) Information Gain Attribute Evaluation, an entropy measure introduced to machine learning by Quinlan [16];

3) Gain Ratio Attribute Evaluation, which overcomes the bias of Information Gain towards features with a large number of values;

4) Wrapper Subset Evaluation, introduced by Kohavi and John [17].
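To make the first two evaluators concrete, here is a minimal pure-Python sketch of how an attribute column can be scored against the class by Pearson's correlation and by information gain. This is our own illustration, not Weka's implementation; attributes are assumed non-constant and discrete for the information gain case.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between an attribute column and the (numeric) class."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # assumes neither column is constant

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def info_gain(xs, ys):
    """Class entropy minus the expected entropy after splitting on the attribute."""
    n = len(ys)
    remainder = 0.0
    for v in set(xs):
        subset = [y for x, y in zip(xs, ys) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(ys) - remainder
```

Ranking then amounts to computing such a score for every attribute, sorting in descending order, and keeping the top five, as in Table 2.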
Table 2 shows the best five attributes chosen by these attribute evaluators.
Table 2
Attributes Selected by Ranker Search Method
| Attribute Evaluator | Attribute Rank Order | Attribute Names | Ranked Values |
|---|---|---|---|
| Correlation Attribute Evaluation | 37, 36, 19, 18, 29 | Exam2, AV_E1_E2, Exam1, A4_M2, Q2_M4 | 0.572, 0.543, 0.524, 0.348, 0.311 |
| Information Gain Attribute Evaluation | 36, 37, 19, 29, 18 | Exam2, AV_E1_E2, Exam1, Q2_M4, A4_M2 | 0.343, 0.295, 0.287, 0.094, 0.088 |
| Gain Ratio Attribute Evaluation | 37, 19, 36, 18, 29 | AV_E1_E2, Exam1, Exam2, A4_M2, Q2_M4 | 0.295, 0.288, 0.226, 0.095, 0.094 |
| Wrapper Subset Evaluation | 37, 19, 36, 29, 18 | AV_E1_E2, Exam1, Exam2, Q2_M4, A4_M2 | 0.326, 0.317, 0.295, 0.104, 0.101 |
It is easy to notice that, besides the same attributes related to the intermediate examinations, RSM also selected two attributes related to attendance and quizzes, although with a significant lag in rank value.
The second stage is boosting. Adaptive Boosting (AdaBoost) was introduced by Freund and Schapire [18] to improve the prediction ability of a single ("old") classifier: a new classifier based on the same algorithm is trained on a dataset reweighted by rules that increase the weights of the examples misclassified by the old classifier and decrease the weights of the correctly classified ones. Thus the weights tend to concentrate the weak classifier on the "hard" examples. At the end of the iterative procedure, an ensemble of classifiers is produced in which all classifiers vote according to their weights. We used AdaBoost.M1 for our purposes because, according to [5], AdaBoost.M1 is the more adequate classifier for EDM/LA mining.
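The reweighting rule described above can be sketched as a single AdaBoost.M1 round in pure Python. This is a simplified illustration of the weight-update scheme, not Weka's AdaBoostM1 code; it assumes the weighted error is in (0, 0.5), as required for boosting to proceed.

```python
import math

def adaboost_m1_round(weights, correct):
    """One AdaBoost.M1 round.

    weights -- current example weights (summing to 1)
    correct -- True/False per example: was it classified correctly?
    Returns the renormalized weights and the classifier's voting weight.
    """
    eps = sum(w for w, ok in zip(weights, correct) if not ok)  # weighted error
    beta = eps / (1.0 - eps)                                   # beta in (0, 1)
    # Shrink the weights of correctly classified examples; misclassified ones
    # keep theirs, so after normalization the "hard" examples gain weight.
    new_w = [w * beta if ok else w for w, ok in zip(weights, correct)]
    z = sum(new_w)
    new_w = [w / z for w in new_w]
    alpha = math.log(1.0 / beta)  # vote weight of this classifier in the ensemble
    return new_w, alpha
```

For example, with four equally weighted examples and one misclassification, the misclassified example's weight grows from 0.25 to 0.5 after one round.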
At the third stage, we compared the improvement in overall accuracy (A) and in the F-measure for the minority class (F1) of the individual base classifiers after the feature selection and boosting stages. As can be seen from Table 3, the major advance in forecasting capacity was observed after the Ranker Search Method (RSM) was applied. In particular, the two algorithms J48 and 1NN, whose F1 was below 40%, increased their predictability of the minority class by up to 20% after RSM, enough to make useful forecasts (> 50%) and to permit their boosting.
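The two metrics compared in Table 3 can be computed as follows; this is a minimal sketch with our own function names, where the minority class defaults to "Fail" as in our data.

```python
def accuracy(y_true, y_pred):
    """Overall accuracy A: fraction of correctly classified examples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred, minor="Fail"):
    """F1 for the minority class: harmonic mean of precision and recall."""
    tp = sum(t == minor and p == minor for t, p in zip(y_true, y_pred))
    fp = sum(t != minor and p == minor for t, p in zip(y_true, y_pred))
    fn = sum(t == minor and p != minor for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```

A high accuracy with a low minority-class F1 (as for J48 and 1NN before RSM) signals that the classifier mostly predicts the majority "Pass" class.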
Table 3
Accuracy and F-measure of different classifiers
| Classifier Algorithm | All attributes A, % | All attributes F1, % | RSM A, % | RSM F1, % | AdaBoost.M1 A, % | AdaBoost.M1 F1, % |
|---|---|---|---|---|---|---|
| Naïve Bayes, NB | 77.7 | 71.4 | 80.5 | 74.1 | 80.5 | 74.1 |
| Decision Tree, J48 | 61.1 | 36.4 | 75.0 | 52.6 | 75.0 | 57.1 |
| Multi-Layer Perceptron, MLP | 75.0 | 66.7 | 75.0 | 52.6 | 77.7 | 55.6 |
| Nearest Neighbors, 1NN | 72.2 | 37.5 | 75.0 | 57.1 | 75.0 | 57.1 |
| Support Vector Machine, SVM | 75.0 | 64.0 | 77.7 | 63.6 | 80.5 | 66.7 |
Evaluating the effectiveness of the AdaBoost homogeneous ensembles shows that accuracy rose by only 0–3%, compared with the 0–14% gain from RSM. At the same time, the capacity of the leading NB-based classifier remained unchanged.
Finally, we compared the accuracy of three simple-voting ensembles: one built from the base classifiers (72.2%); a second containing the classifiers obtained after the first (RSM) stage (77.7%); and the ensemble obtained by combining the AdaBoost ensembles (83.3%).
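The simple-voting combination used in all three ensembles can be sketched as a majority vote over the members' predictions. The member predictions below are hypothetical, purely to show the mechanics of combining five classifiers.

```python
from collections import Counter

def majority_vote(predictions):
    """Simple-voting ensemble: each member classifier casts one equal vote per
    test example; the most frequent label wins."""
    combined = []
    for votes in zip(*predictions):  # one tuple of member votes per example
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical votes of five classifiers on three test students.
members = [
    ["Pass", "Fail", "Pass"],
    ["Pass", "Fail", "Fail"],
    ["Fail", "Fail", "Pass"],
    ["Pass", "Pass", "Pass"],
    ["Pass", "Fail", "Pass"],
]
print(majority_vote(members))  # ['Pass', 'Fail', 'Pass']
```

The same function combines either the five base classifiers or the five AdaBoost homogeneous ensembles, since each member only needs to emit a class label per test example.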
Conclusion
In this study, the main focus has been the comparison of various machine learning models based on the NB, J48, MLP, 1NN, and SVM algorithms and their ensembles. We observed that applying the Ranker Search method to choose the best attributes has a major effect on the forecast as evaluated by the F-measure of the less represented data class, and consequently permits us to use the improved weak classifiers as the initial AdaBoost members. We also saw that the accuracy of the final heterogeneous meta-ensemble surpassed, for the first time and by a maximum of about 3%, both the best single-classifier performance and the homogeneous combinations of their ensembles.
References
- Romero C., Ventura S. Educational Data Mining: A Review of the State of the Art // IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., 2010, vol. 40, no. 6, pp. 601-618. DOI: 10.1109/TSMCC.2010.2053532
- U.S. Department of Education, Office of Educational Technology // Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief. Washington, D.C., 2012, Available at: https://tech.ed.gov/learning-analytics/edm-la-brief.pdf (accessed: 03.07.2018).
- Baker R.S., Inventado P.S. Educational Data Mining and Learning Analytics // In: Larusson J., White B. (Eds.). Learning Analytics. Springer, New York, NY, 2014, pp. 61-75. DOI: 10.1007/978-1-4614-3305-7_4
- Calvet Liñán L., Juan Pérez Á.A. Educational Data Mining and Learning Analytics: Differences, Similarities, and Time Evolution // RUSC. Universities and Knowledge Society Journal, 2015, vol. 12, no. 3, pp. 98-112. DOI: 10.7238/rusc.v12i3.2515
- Jovanovic M., Vukicevic M., Milovanovic M., Minovic M. Using Data Mining on Student Behavior and Cognitive Style Data for Improving E-Learning Systems: a Case Study // I. Journal of Computational Intelligence Systems, 2012, vol. 5, no. 3, pp. 597-610. DOI: 10.1080/18756891.2012.696923
- Berland M., Baker R.S., Blikstein P. Educational Data Mining and Learning Analytics: Applications to Constructionist Research // Tech Know Learn., 2014, vol. 19, pp. 205-220. DOI: 10.1007/s10758-014-9223-7
- Slater S., Joksimovic S., Kovanovic V., et al. Tools for Educational Data Mining: A Review // Journal of Educational and Behavioral Statistics, 2017, vol. 42, no. 1, pp. 85-106. DOI: 10.3102/1076998616666808
- Castro-Wunsch K., Ahadi A., Petersen A. Evaluating Neural Networks as a Method for Identifying Students in Need of Assistance // SIGCSE '17, March 08-11, 2017, Seattle, WA, USA. DOI: 10.1145/3017680.3017792
- Hussain S., Fadhil M.Z., Salal Y.K., Theodoru P., Kurtoğlu F., Hazarika G.C. Prediction Model on Student Performance Based on Internal Assessment Using Deep Learning // I. Journal of Emerging Technologies in Learning, 2019, vol. 14, no. 8, pp. 4-22. DOI: 10.3991/ijet.v14i08.10001
- Wu X., Kumar V., Quinlan R.J. et al. Top 10 Algorithms in Data Mining // Knowl. Inf. Syst., 2008, vol. 14, pp. 1-37. DOI: 10.1007/s10115-007-0114-2
- Kumar M., Salal Y.K. Systematic Review of Predicting Student's Performance in Academics // I. J. of Engineering and Advanced Tech., 2019, vol. 8, no. 3, pp. 54-61.
- Smith-Miles K.A. Cross-Disciplinary Perspectives on Meta-Learning for Algorithm Selection // ACM Comput. Surv., 2008, vol. 41, no. 1, Article 6, 25 p. DOI: 10.1145/1456650.1456656
- Vilalta R., Giraud-Carrier C., Brazdil P. Meta-Learning - Concepts and Techniques // In: Data Mining and Knowledge Discovery Handbook, Springer, 2010, pp. 717-732. DOI: 10.1007/978-0-387-09823-4_36
- Salal Y.K., Abdullaev S.M., Kumar M. Educational Data Mining: Student Performance Prediction in Academic // I. J. of Engineering and Advanced Tech., 2019, vol. 8, no. 4C, pp. 54-59.
- Trabelsi M., Meddouri N., Maddouri M. A New Feature Selection Method for Nominal Classifier Based on Formal Concept Analysis // Procedia Computer Science, 2017, vol. 112, pp. 186-194. DOI: 10.1016/j.procs.2017.08.227
- Quinlan J.R. Induction of Decision Trees // Machine Learning, 1986, no. 1, pp. 81-106. DOI: 10.1007/BF00116251
- Kohavi R., John G.H. Wrappers for Feature Subset Selection // Artificial Intelligence, 1997, vol. 97, pp. 273-324. DOI: 10.1016/S0004-3702(97)00043-X
- Freund Y., Schapire R.E. A Short Introduction to Boosting // J. of Japanese Society for Artificial Intelligence, 1999, vol. 14, no. 5, pp. 771-780.