Literature Survey on Student’s Performance Prediction in Education using Data Mining Techniques
Authors: Mukesh Kumar, A.J. Singh, Disha Handa
Journal: International Journal of Education and Management Engineering (IJEME)
Published in: Vol. 7, No. 6, 2017
Open access
Predicting students' academic performance is one of the most challenging tasks in the Indian education sector because of the huge volume of student data. In the Indian context, there is no existing system for analysing and monitoring the progress and performance of students, particularly in the higher education system. Every institution has its own criteria for analysing student performance. This is due to the lack of study of existing prediction techniques, and hence of the best methodology for predicting students' academic progress and performance. Another important reason is the lack of investigation into the factors that affect a student's academic performance and achievement in a particular course. To understand the problem in depth, a detailed literature survey on predicting student performance using data mining techniques is proposed. The main objective of this article is to provide an understanding of the different data mining techniques that have been used to predict student progress and performance, and of how these prediction techniques help to find the most important student attributes for prediction. Our aim is to improve students' academic performance by using the best data mining techniques. Finally, the survey could also benefit faculty, students, educators and the management of the institution.
Keywords: Educational Data Mining, Prediction Techniques, Student Attributes, Classification
Short URL: https://sciup.org/15014097
IDR: 15014097
Published Online November 2017 in MECS
Available online at
observed by using internal assessment and co-curriculum. In the Indian context, institutions with a higher reputation use a good academic record as the basic criterion for admission [1]. Many definitions of student academic performance prediction are given in the literature. Different authors use different student factors/attributes for analysing student performance. Most authors used CGPA, internal assessment, external assessment, final examination score and extra-curricular activities of the student as prediction criteria.
Most Indian institutions and universities use the student's final examination grade as the criterion of academic performance. The final grade of a student depends on different attributes such as internal assessment, external assessment, laboratory file work and viva-voce, and sessional tests. The performance of the student depends on the grade scored in the final examination. Norlida Buniyamin, Pauziah Mohd Arsad et al. (2013) discussed the significance of academic analytics for an educational institution and how it works to improve education. They also proposed an intelligent recommendation intervention system to improve students' performance and achievement in education.
Nomenclature
DT | Decision Tree
RB | Rule Based
NB | Naive Bayes
KNN | K-Nearest Neighbor
NN | Neural Network
This system uses two student attributes to measure achievement: student grade and student information [2]. Zaidah Ibrahim and Daliela Rusli et al. (2007) stated that predicting students' performance is critical for any educational institution because it informs the formation of new rules and standards for improving education and reputation. They used the CGPA and demographic attributes of first-year students to predict their results in the first year of engineering education [3].
Data mining techniques applied mostly in education are known as educational data mining. Many data mining techniques are available to predict student performance. Educational data mining helps to find hidden information in the huge databases of educational settings, because educational institutions currently generate a large amount of student-related data [4]. This hidden information can then be used to predict students' performance, dropout and final results. It also helps educators, management and faculty to work according to the learning standards of the students. Data mining thus helps in different areas of the education sector [5]. To properly understand the role of data mining in education, we need a systematic literature review of the work done by different researchers. The main objectives of this work are:
i. To understand, analyse and find the differences between the data mining prediction techniques used in education.
ii. To identify and understand the student attributes that are mainly used for predicting student performance.
iii. To identify and understand the prediction techniques that are mainly used for predicting student performance.
2. Research Questions Formation and Search Strategy for Literature Review
The above points are the main focus of our study. Section 2 describes the methodology adopted for forming the research questions of this paper and for the literature survey. Sections 3-4 identify the important factors in predicting students' performance and the prediction methods used. Sections 5-6 give an overall discussion of the results of the study, followed by the conclusion and the scope for future work.
The main purpose of a literature survey is to find new techniques for working on old data sets and to extract new information from them. For a relational survey, literature covering more than 10 years should be considered in order to find knowledge gaps between the works of different researchers. This helps to justify the research questions and gives direction for future research.
Formulation of Research Questions: Forming the research questions is one of the essential tasks when writing any research paper. Before forming a research question, one should understand and follow Kitchenham's steps. B. Kitchenham, R. Pretorius et al. (2010) stated that PIOC (Population, Intervention, Outcome, Context) are the most critical factors to consider when framing the research questions for a research paper [6].
Table 1. Research Question Formation Criteria
Criteria | Detail of targeted organisation
Population | University, Engineering Institution (Private/Govt), students
Intervention | Data mining techniques/methods used for prediction of student performance and progress in education
Outcome | Student performance prediction accuracy, finalise prediction techniques
Context | University, schools and colleges (Private and Government)
The above table makes clear the target organisations of the research, the techniques considered during the review, the related outcomes and the affected organisations. Keeping these criteria in mind when framing the research questions, we restricted the scope of this study to the following research questions.
i. Identify the student attributes that are helpful for predicting student academic performance.
ii. Identify the data mining techniques that are most often used for predicting student academic performance.
3. Important Factors of Students Used for Predicting Student’s Performance
The prediction of student academic performance (SAP) is based on different student factors such as individual, community, psychological and environmental variables. Over the last few years, much research has been carried out to predict students' academic performance. In this section, we take a few research articles into consideration and analyse them for the student factors that affect the prediction of academic performance. About 30-40 research papers, articles and book chapters were considered for review. Farhana Sarker and Hugh C. Davis (2013) showed that a model based on institutional internal data sources (IDS) together with external data sources (EDS) gave better results than a model based only on institutional internal student databases [4]. In another study, D. M. D. Angeline (2013) used internal assessment test grade, assignment submission and grade, correct response, self-confidence, interest in the particular course and degree ambition to predict students' academic performance [5]. Abeer Badr El Din Ahmed et al. (2014) used the student's course, HSD, mid-term marks, lab test grade, seminar performance, assignment, attendance, homework and student participation to predict SAP [6]. Fadhilah Ahmad and Azwa Abdul Aziz (2015) collected data from the database of the Academic Department, UniSZA, stored in an Informix Database Management System (DBMS). They used nine parameters: gender, race, hometown, GPA, family income, university entry mode, and grades in Malay Language, English and Mathematics [7]. Mashael A. Al-Barrak and Mona S. Al-Razgan (2015) collected a dataset of students from the Information Technology department at King Saud University, Saudi Arabia, for their analysis.
They used attributes such as student ID, student name, student grades in three quizzes, midterm 1, midterm 2, project, tutorial, final exam and total points obtained in the Data Structure course of the computer science department [8]. Edin Osmanbegović and Mirza Suljic (2012) collected data from surveys among first-year students and data taken during enrollment at the University of Tuzla. They used attributes such as gender, family, distance, high school, GPA, entrance exam, scholarships, time, materials, the Internet, grade importance and earnings [9]. Raheela Asif and Mahmood K. Pathan (2014) used four academic batches of the Computer Science & Information Technology (CS&IT) department at NED University, Pakistan. They used HSC marks, marks in MPC, Maths marks in HSC, and marks in various subjects studied in the regular course, such as programming language, CSA, Logic Design, OOP, DBMS, ALP, FAM, SAD and Data Structure, for their analysis [10]. Mohammed M. Abu Tair and Alaa M. El-Halees (2012) tried to extract useful information from the student data of the Science and Technology College, Khan Younis. They initially selected attributes such as gender, date of birth, place of birth, speciality, enrollment year, graduation year, city, location, address, telephone number, HSC marks, SSC school type, place where the HSC was obtained, HSC year and college CGPA for analysis. After preprocessing the data, they found that gender, speciality, city, HSC marks, SSC school type and college CGPA were the most significant attributes [11]. Azwa Abdul Aziz and H.I.F Ahmad (2014) used first-semester student data of the Bachelor of Computer Science programme at University Sultan Zainal Abidin (UniSZA) for analysis. They used attributes such as gender, race, hometown location, university entry mode and family income for data collection [12].
K.D Kolo and J.K Alhassan (2015) collected data on computer science students of Nigerian Colleges of Education. They considered the Data Structure course to be one of the most important subjects in computer science and hence collected data for this subject. They considered student attributes such as student's grade, student's status, student's gender, financial strength and attitude to learning as important factors for the prediction of SAP [13]. Jyoti Bansode (2016) collected data from Shah and Anchor Kutchhi Polytechnic, Chembur, Mumbai, to predict students' academic performance. The attributes considered most important were parent's education, parent's occupation, category, SSC board, admission type, SSC medium, SSC class, and the first-, second-, third-, fourth-, fifth- and sixth-semester results [15]. R. Sumitha and E.S. Vinoth Kumar (2016) collected data on around 350 BE (CSE) students of KLN College of Information Technology. Initially they selected 24 attributes for analysis, but finally only the attributes with the higher ranking were taken into consideration for classification. The selected attributes are CGPA, arrears, attendance, SSC marks, engineering cut-off, medium of education and type of board [16]. Mrinal Pandey and S. Taruna (2016) used datasets from an engineering institution. They included data on the students' academic attributes as well as their demographic information [18]. Maria Goga, Shade Kuyoro and Nicolae Goga (2015) used student data from Babcock University, Nigeria. On the basis of the reviewed literature, they considered age, gender, parents' marital status, parents' qualification, parents' occupation, SSC score, HSC score and first-year CGPA [19]. Maria Koutina and Katia Lida Kermanidis (2011) tried to find the best techniques for predicting the final grade of postgraduate students of Informatics at Ionian University, Greece. On the basis of the reviewed literature, they considered gender, age, marital status, number of children, occupation, job associated with computers, bachelor, another master, computer literacy and bachelor in informatics [24].
4. Different Data Mining Techniques used for Predicting Student’s Performance
After forming the research questions, we need to carry out a pilot study on the related topic and find the research gaps between the works of different researchers who used data mining techniques. Before starting the literature survey, the researcher should be clear about what to search for and how the search can be done.
Search strategy for literature review:
Searched databases: Springer Link, ResearchGate, IEEE Xplore, ACM Digital Library, Elsevier, Science Direct and other computer science journals.
Search sentences and keywords: Predicting student performance, Predicting student performance using data mining techniques, Application of data mining in education, Educational Data Mining methodology or techniques, Prediction of student result using data mining techniques.
Publication period considered: 2007 to July 2016.
Types of text searched: documents, PDF, full-length papers with abstract and keywords.
Search items: journal articles, conference papers, workshop papers, expert lectures or talks, topic-related blogs, topic-related communities (such as the Educational Data Mining community).
After reviewing about 20-25 research papers, we found that in most cases the student factors that affect SAP are gender, high school grade, parental education, financial background, living location, medium of teaching, family status, previous semester marks, class test grade, seminar performance, assignment performance, general proficiency, attendance in class and lab work, interest in a particular course, study behaviour, engage time and family support for study, admission type, previous school marks, accommodation type, parent's qualification and parent's occupation. All these attributes fall into different categories: personal, family, academic, institutional and social.
The most important personal attributes considered are gender, age, interest in study, admission type and study behaviour [7, 8, 9, 11, 12, 13, 18, 19, 24]. Family attributes such as parent's qualification, parent's occupation, family income, family status and family support for study are also taken as important for academic prediction [7, 9, 15, 19, 24]. The academic attributes considered are high school grade, students' previous semester marks, class test grade, seminar performance, assignment performance, attendance in class and lab work, and previous school marks [5, 6, 7, 8, 9, 10, 15, 16, 18, 19, 24]. For institutional attributes, most researchers considered medium of teaching, accommodation type, infrastructure, water and toilet facilities, teaching methodology and transportation facilities [4, 7, 9, 12, 16, 18, 24].
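The grouping of attributes into categories can be written down as a small lookup table. The following Python sketch is only an illustration: the dictionary name `ATTRIBUTE_CATEGORIES` and the helper `category_of` are hypothetical, the attribute spellings are normalised to lower case, and the social category is omitted because its attributes are not enumerated in the text.

```python
# Attribute groups taken from the survey text; all names here are illustrative.
ATTRIBUTE_CATEGORIES = {
    "personal": [
        "gender", "age", "interest in study",
        "admission type", "study behaviour",
    ],
    "family": [
        "parent's qualification", "parent's occupation", "family income",
        "family status", "family support for study",
    ],
    "academic": [
        "high school grade", "previous semester marks", "class test grade",
        "seminar performance", "assignment performance",
        "attendance in class and lab work", "previous school marks",
    ],
    "institutional": [
        "medium of teaching", "accommodation type", "infrastructure",
        "water and toilet facilities", "teaching methodology",
        "transportation facilities",
    ],
}


def category_of(attribute):
    """Return the category of an attribute, or None if it is not listed."""
    wanted = attribute.strip().lower()
    for category, attributes in ATTRIBUTE_CATEGORIES.items():
        if wanted in attributes:
            return category
    return None
```

A lookup of this kind makes it easy to check which category a candidate attribute belongs to before it is used in a prediction model.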
Predicting student academic performance is a common task in the field of educational data mining. To build a predictive model, different data mining techniques can be considered, such as classification, clustering, association rule mining and regression analysis. In almost every research paper, only classification algorithms are considered for predicting student academic performance. Many classification techniques are available for prediction, but we consider only Decision Tree, Naive Bayes, Support Vector Machine (SVM), Artificial Neural Networks (ANN), K-Nearest Neighbor, SMO, Linear Regression, Random Forest, Random Tree, REPTree, LADTree, J48, etc. Table 2 gives a brief summary of the findings of different research papers, with the authors' names, the main attributes that contribute to prediction accuracy and the data mining algorithms used.
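To illustrate how a classification technique turns student attributes into a predicted result, the sketch below implements a minimal K-Nearest Neighbor classifier in plain Python. It is not taken from any of the reviewed papers: the six student records, the two attributes (attendance percentage and internal marks out of 50) and the function name `knn_predict` are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training records.

    train: list of (features, label) pairs; query: a feature tuple.
    """
    # Sort training records by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical records: (attendance %, internal marks out of 50) -> result.
students = [
    ((92, 44), "pass"),
    ((85, 38), "pass"),
    ((78, 35), "pass"),
    ((60, 20), "fail"),
    ((55, 18), "fail"),
    ((48, 25), "fail"),
]

prediction = knn_predict(students, (80, 36), k=3)
```

In practice the attributes would usually be scaled to a common range before computing distances, so that no single attribute dominates the vote.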
Table 2. Different Data Mining Techniques used for Predicting Student’s Performance
From the above table, we find that Maria Koutina and Katia Lida Kermanidis obtained 100% accuracy with the Naive Bayes and K-Nearest Neighbor algorithms [24]. They reported this result in their Table 6 under “Total accuracy (%) of re-sample data and feature selection”. To predict student academic performance they used attributes such as gender, age, marital status, number of children, occupation, job associated with computers, bachelor, another master, computer literacy and bachelor in informatics.
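The accuracy figures quoted in this survey are percentages of correctly classified students. As a minimal sketch, assuming the common definition of accuracy as correct predictions divided by total predictions, the calculation can be written as follows; the function name `accuracy` and the sample labels are hypothetical.

```python
def accuracy(actual, predicted):
    """Return the percentage of predictions that match the actual labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return 100.0 * correct / len(actual)


# Hypothetical results for four students: three of four predictions match.
score = accuracy(["pass", "fail", "pass", "pass"],
                 ["pass", "fail", "fail", "pass"])
```

An accuracy of 100% therefore means that every student in the evaluated data was assigned the correct class.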
5. Discussion of This Survey on Predicting Student’s Performance
In this section, we discuss the main findings of our meta-analysis. We find that the most commonly used data mining algorithms for SAP are Decision Tree (DT), Naive Bayes (NB), Artificial Neural Networks (ANN), Rule-based (RB) and K-Nearest Neighbor (KNN). For the Decision Tree algorithm, the maximum and minimum accuracies for predicting students' academic performance are 99.9% and 66.8% respectively. To obtain the maximum prediction accuracy, Maria Goga, Shade Kuyoro and Nicolae Goga used a combination of student attributes such as family, PEP, EES and end of first session result [19]. For the Naive Bayes algorithm, the maximum and minimum accuracies are 100% and 63.3% respectively. Maria Koutina et al. obtained the maximum accuracy with a combination of student attributes such as gender, age, marital status, number of children, occupation, job associated with computers, bachelor, another master, computer literacy and bachelor in informatics [24]. For the rule-based algorithm, the maximum and minimum accuracies are 96.7% and 55.0% respectively. To obtain the maximum prediction accuracy, Maria Goga et al. used a combination of student attributes such as family, PEP, EES and end of first session result [19]. For the K-Nearest Neighbor algorithm, the maximum and minimum accuracies are 100% and 74% respectively [24]. For Artificial Neural Networks (ANN), the maximum and minimum accuracies are 89.8% and 67.6% respectively. To obtain the maximum prediction accuracy, Mashael A. Al-Barrak and Mona S. Al-Razgan used a combination of student attributes such as the first mid-term examination in the first-year course [8]. Table 3 gives a brief summary of the result analysis.
Table 3. Student Academic Performance Prediction Techniques with Their Accuracy
Data Mining Techniques | DT | NB | RB | KNN | NN
Highest Accuracy | 99.9% | 100% | 96.7% | 100% | 89.8%
Lowest Accuracy | 66.8% | 63.3% | 55.0% | 74% | 67.6%
Average Accuracy | 83.35% | 81.65% | 75.85% | 87% | 78.7%
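The "Average Accuracy" values in Table 3 coincide with the midpoint of the highest and lowest accuracy for each technique. This reading is inferred from the numbers rather than stated in the text, and the short Python check below reproduces it; the variable names are illustrative.

```python
# (highest, lowest) accuracy in percent, copied from Table 3.
table3 = {
    "DT": (99.9, 66.8),
    "NB": (100.0, 63.3),
    "RB": (96.7, 55.0),
    "KNN": (100.0, 74.0),
    "NN": (89.8, 67.6),
}

# Midpoint of highest and lowest accuracy, rounded to two decimals.
averages = {
    name: round((highest + lowest) / 2, 2)
    for name, (highest, lowest) in table3.items()
}
```

Because only the two extreme values enter this calculation, the figure is a midpoint of the reported range, not a mean over all reviewed studies.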
Fig. 1 shows the prediction accuracy of the classification methods used for predicting student performance from 2012 to 2016, grouped by algorithm.

Fig.1. Student Academic Performance Prediction Grouped By Algorithm used
6. Conclusion and Future Work
At present, research in educational data mining attracts a lot of interest in the research community, because predicting student academic performance, predicting students likely to drop out in the near future, and predicting institute placements and admissions for a new academic year are very useful for educators, management and educational policy makers. It is also used to improve the teaching-learning process in the institution. This paper has reviewed many research papers and articles on predicting students' academic performance, together with the attributes selected and the analytical algorithms used. In most cases, CGPA and the student's internal marks are important attributes for predicting the result. In one of the research papers, the authors obtained 100% accuracy with a combination of attributes such as gender, age, marital status, number of children, occupation, job associated with computers, bachelor, another master, computer literacy and bachelor in informatics. In data mining prediction, classification is the most frequently used technique. Most researchers used Decision Tree, Naive Bayes and Rule-based algorithms for predicting students' academic performance. In conclusion, this meta-analysis on predicting students' academic performance has motivated us to do further research in our own educational environment. It should help to improve our education system by checking the regular performance of students.
Acknowledgements
I am grateful to my guide Prof. A.J. Singh and to Dr Disha Handa for all the help and valuable suggestions they provided during the study.
References
- Mihai Dascalu and Elvira Popescu et al., Predicting Academic Performance Based on Students’ Blog and Microblog Posts, in: K. Verbert et al. (Eds.): EC-TEL 2016, LNCS 9891, pp. 370–376, Springer International Publishing Switzerland, 2016. DOI: 10.1007/978-3-319-45153-4_29.
- U. bin Mat, N. Buniyamin, P. M. Arsad, R. Kassim, An overview of using academic analytics to predict and improve students’ achievement: A proposed proactive intelligent intervention, in: Engineering Education (ICEED), 2013 IEEE 5th Conference on, IEEE, 2013, pp. 126–130.
- Randa Kh. Hemaid and Alaa M. El-Halees, Improving Teacher Performance using Data Mining, International Journal of Advanced Research in Computer and Communication Engineering Vol. 4, Issue 2, February 2015.
- Farhana Sarker, Thanassis Tiropanis and Hugh C Davis, Students’ Performance Prediction by Using Institutional Internal and External Open Data Sources, http://eprints.soton.ac.uk/353532/1/Students' mark prediction model.pdf, 2013.
- D. M. D. Angeline, Association rule generation for student performance analysis using an apriori algorithm, The SIJ Transactions on Computer Science Engineering & its Applications (CSEA) 1 (1) (2013) p12–16.
- Abeer Badr El Din Ahmed and Ibrahim Sayed Elaraby, Data Mining: A prediction for Student's Performance Using Classification Method, World Journal of Computer Application and Technology 2(2): 43-47, 2014.
- Fadhilah Ahmad, Nur Hafieza Ismail and Azwa Abdul Aziz, The Prediction of Students’ Academic Performance Using Classification Data Mining Techniques, Applied Mathematical Sciences, Vol. 9, 2015, no. 129, 6415-6426, HIKARI Ltd, http://dx.doi.org/10.12988/ams.2015.53289.
- Mashael A. Al-Barrak and Mona S. Al-Razgan, Predicting Students’ Performance Through Classification: A Case Study, Journal of Theoretical and Applied Information Technology, Vol. 75, No. 2, 20th May 2015.
- Edin Osmanbegović and Mirza Suljic, Data Mining Approach for Predicting Student Performance, Economic Review – Journal of Economics and Business, Vol. X, Issue 1, May 2012.
- Raheela Asif, Agathe Merceron, Mahmood K. Pathan, Predicting Student Academic Performance at Degree Level: A Case Study, I.J. Intelligent Systems and Applications, 2015, 01, 49-61 Published Online December 2014 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2015.01.05.
- Mohammed M. Abu Tair, Alaa M. El-Halees, Mining Educational Data to Improve Students’ Performance: A Case Study, International Journal of Information and Communication Technology Research, ISSN 2223-4985, Volume 2 No. 2, February 2012.
- Azwa Abdul Aziz, Nor Hafieza Ismail and Fadhilah Ahmad, First Semester Computer Science Students’ Academic Performances Analysis by Using Data Mining Classification Algorithms, Proceeding of the International Conference on Artificial Intelligence and Computer Science (AICS 2014), 15-16 September 2014, Bandung, Indonesia (e-ISBN 978-967-11768-8-7).
- Kolo David Kolo, Solomon A. Adepoju, John Kolo Alhassan, A Decision Tree Approach for Predicting Students Academic Performance, I.J. Education and Management Engineering, 2015, 5, 12-19 Published Online October 2015 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijeme.2015.05.02.
- Dr Pranav Patil, A Study of Student’s Academic Performance Using Data Mining Techniques, International Journal of Research in Computer Applications and Robotics, ISSN 2320-7345, Vol. 3, Issue 9, pp. 59-63, September 2015.
- Jyoti Bansode, Mining Educational Data to Predict Student’s Academic Performance, International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 4, Issue 1, 2016.
- R. Sumitha and E.S. Vinoth Kumar, Prediction of Students Outcome Using Data Mining Techniques, International Journal of Scientific Engineering and Applied Science (IJSEAS), Volume 2, Issue 6, June 2016, ISSN: 2395-3470.
- Karishma B. Bhegade and Swati V. Shinde, Student Performance Prediction System with Educational Data Mining, International Journal of Computer Applications (0975 – 8887) Volume 146 – No.5, July 2016.
- Mrinal Pandey and S. Taruna, Towards the integration of multiple classifiers pertaining to the Student's performance prediction, Elsevier GmbH, 2016, http://dx.doi.org/10.1016/j.pisc.2016.04.076.
- Maria Goga, Shade Kuyoro, Nicolae Goga, A recommender for improving the student academic performance, Social and Behavioural Sciences 180 (2015) 1481 – 1488.
- Anca Udristoiu, Stefan Udristoiu, and Elvira Popescu, Predicting Students’ Results Using Rough Sets Theory, E. Corchado et al. (Eds.): IDEAL 2014, LNCS 8669, pp. 336–343, 2014. © Springer International Publishing Switzerland 2014.
- Parneet Kaur, Manpreet Singh, Gurpreet Singh Josan, Classification and prediction based data mining algorithms to predict slow learners in education sector, Procedia Computer Science 57 (2015) 500 – 508.
- M. Durairaj and C. Vijitha, Educational Data mining for Prediction of Student Performance Using Clustering Algorithms, International Journal of Computer Science and Information Technologies, Vol. 5 (4), 2014, 5987-5991.
- Mohammed I. Al-Twijri and Amin Y. Noaman, A New Data Mining Model Adopted for Higher Institutions, Procedia Computer Science 65 ( 2015 ) 836 – 844, doi: 10.1016/j.procs.2015.09.037.
- Maria Koutina and Katia Lida Kermanidis, Predicting Postgraduate Students’ Performance Using Machine Learning Techniques, L. Iliadis et al. (Eds.): EANN/AIAI 2011, Part II, IFIP AICT 364, pp. 159–168, 2011. © IFIP International Federation for Information Processing 2011.