Visual Association Analytics Approach to Predictive Modelling of Students’ Academic Performance

Udoinyang G. Inyang; Imo J. Eyoh; Samuel A. Robinson; Edward N. Udo

doi:10.5815/ijmecs.2019.12.01

Journal: International Journal of Modern Education and Computer Science @ijmecs

Article in issue: 12 vol.11, 2019.

Persistent and quality graduation rates of students are increasingly important indicators of progressive and effective educational institutions. Timely analysis of students’ data to guide instructors in providing academic interventions to students who are at risk of performing poorly in their courses or dropping out is vital for academic achievement. In addition, there is a need to mine relationships among performance attributes in order to generate comprehensible patterns. However, there is a dearth of knowledge on predicting students’ performance from such patterns. This paper therefore adopts hierarchical cluster analysis (HCA) to analyze a students’ performance dataset, discovering the optimal number of failed-course clusters and partitioning the courses into groups, and association rule mining to extract interesting course-status associations. Agglomerative HCA with Ward’s linkage method produced the best clustering structure (five clusters), with a coefficient of 92% and a silhouette width of 0.57. The Apriori algorithm, with support (0.5%), confidence (80%) and lift (1) thresholds, was used to extract rules with the student’s status as the consequent. Out of the twenty-one courses offered by students in the first year, seven courses frequently occur together as failed courses, and their impact on the respective students’ performance status was assessed in the rules. It is conjectured that early intervention by instructors and management of educational activities on these seven courses will improve students’ learning outcomes, leading to an increased graduation rate within the minimum course duration, which is the overarching objective of higher educational institutions. As further work, other machine learning and nature-inspired tools will be integrated for the adaptive learning and optimization of rules, respectively.

Association rule mining, predictive analytics, students’ performance, hierarchical clustering, at-risk students

Published Online December 2019 in MECS DOI: 10.5815/ijmecs.2019.12.01

The increasing dependence of academic institutions on technology and IT-driven methods has produced an abundance of large educational data repositories. Moreover, educators and educational administrators have intensified their efforts to collect and store data on the functionality of their educational systems. These repositories can store large amounts of student-related data and information. Students’ data and information are key requirements in educational learning systems, especially in the planning, monitoring, and assessment of educational management systems (EMS). EMS is a platform for data acquisition and collection, validation and processing, analysis and communication of information relating to administrators, students, teachers, staff and infrastructure in the educational environment [1]. EMS is a rapidly progressing field of data mining which concentrates on the search and discovery of interesting new patterns, techniques, tools, and models for intelligent exploratory analysis and visualization of large educational datasets. EMS is aimed at the extraction of novel and interpretable structures that will enhance the understanding of students, their processes and environments [2,3]. Among the important modules of EMS are the students’ management (SM), human resource, infrastructure management, school management, and graduands’ management modules. The SM module captures and stores students’ data, with a unique student enrolment number as the primary key, together with demographic data and academic status, amongst others. Associated with EMS are learning analytics (LA) and educational data mining (EDM), which are concerned with the exploratory analysis of educational datasets and with applying the outcomes directly to the students, teachers and other components of the learning process.
EDM and LA try to interpret how students cope and interact with the educational resources at their disposal, their learning behavioural patterns, their likely final academic outcome, and most importantly, their likelihood of completing their programme within the minimum stipulated timeframe. EDM depends more on methods, tools, and techniques, while LA focuses on the description of data, knowledge and resultant patterns. Machine learning techniques, statistics, visual analytics, link analysis, and opinion and sentiment analysis are some of the widely used tools of LA, while classification, clustering, Bayesian modeling, relationship mining, discovery with models and predictive/prescriptive modeling are often associated with EDM [4]. Predictive analytics employs a range of statistical approaches, from machine learning and predictive modelling to data mining, to analyze historical and operational data and enable predictions about unknown future events. Its applications cut across several domains, including academic performance prediction.

Academic achievement is a key factor considered by recruiting organizations and motivates the monitoring of students’ performance during their academic pursuits. Students have to work hard for outstanding grades in order to measure up to the standards of recruiting organizations and meet the expectations of parents/guardians, educators and administrators. Persistent and quality graduation rates of students are increasingly important indicators of progressive and effective educational institutions. Any educational system characterized by high rates (frequency) of drop-outs (students who leave an institution prematurely without completing the desired programme of study), transfer-outs (students who start in one course of study or institution and thereafter move to another course or institution in order to graduate), stop-outs (students who voluntarily withdraw for a period of time and then re-enroll to complete their programme), and spill-overs (students who spend extra year(s) due to poor performance) is said to fail [5,6]. The early identification of students’ weaknesses during their academic career will guide the effective provision of necessary pedagogical interventions, suggest behavioural changes to enhance students’ learning processes and also ensure students’ on-time and satisfactory graduation [7-9]. However, educational systems in most developing countries lack facilities for automatically predicting the fail or pass percentages of students and cannot account for the number of drop-out, stop-out, transfer-out or spill-over students; rather, they concentrate on successful students. They have no information about the patterns that lead to students being at risk and cannot identify, at an early stage of their academic pursuits, students who are likely to struggle academically.
In consideration of these challenges, this paper employs EDM and LA methodologies (cluster analysis and association rule mining) to model students’ learning processes for informed decisions and timely pedagogical interventions. Association rule mining [1,10] attempts to extract relevant and interesting relationships among items in a database. This paper aims to identify relationships among the courses offered by the students and the effects of such relationships on learning and academic performance, as reflected in students’ status. It also employs cluster analysis to reveal the optimal number of course clusters and their association with a student’s status at the end of the minimum duration of the programme.
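
To make the rule measures concrete, the following is a minimal sketch of how support, confidence and lift could be computed for rules of the form "set of failed courses implies status". It is not the authors' implementation: the course codes, status labels and toy records are invented for illustration, the function names are hypothetical, and candidate antecedents are enumerated by brute force rather than with the level-wise pruning that the Apriori algorithm uses. The default thresholds mirror those stated in the abstract (support 0.5%, confidence 80%, lift 1); treating the lift threshold as "greater than or equal to 1" is an assumption.

```python
from itertools import combinations

# Hypothetical toy data: each record holds the set of first-year courses a
# student failed and that student's status. Codes and labels are invented.
records = [
    ({"MTH111", "PHY111"}, "at_risk"),
    ({"MTH111", "PHY111", "CHM111"}, "at_risk"),
    ({"MTH111"}, "on_track"),
    ({"PHY111"}, "on_track"),
    ({"MTH111", "PHY111"}, "at_risk"),
    (set(), "on_track"),
]


def rule_metrics(antecedent, consequent, records):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    # Records whose failed-course set contains every course in the antecedent.
    n_a = sum(1 for failed, _ in records if antecedent <= failed)
    # Records with the consequent status.
    n_c = sum(1 for _, status in records if status == consequent)
    # Records satisfying both sides of the rule.
    n_ac = sum(
        1 for failed, status in records
        if antecedent <= failed and status == consequent
    )
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


def mine_status_rules(records, min_support=0.005, min_confidence=0.8,
                      min_lift=1.0, max_len=2):
    """Brute-force search for course-set -> status rules meeting all thresholds."""
    courses = sorted(set().union(*(failed for failed, _ in records)))
    statuses = sorted({status for _, status in records})
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(courses, size):
            antecedent = frozenset(combo)
            for status in statuses:
                s, c, l = rule_metrics(antecedent, status, records)
                if s >= min_support and c >= min_confidence and l >= min_lift:
                    rules.append((combo, status, s, c, l))
    return rules
```

Calling `mine_status_rules(records)` returns one tuple per retained rule, holding the antecedent courses, the status, and the three measures. A lift above 1 indicates that failing the antecedent courses together occurs with the given status more often than would be expected if the two were independent.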

The rest of the paper is organized as follows. Section II presents a literature review with emphasis on cluster analysis and performance-course association rule mining. In Section III, the methodological framework for the implementation of the system is conceptualized. The course association rule mining procedure and results are described in Section IV. A discussion of results and the conclusions and further work are presented in Sections V and VI, respectively.

II. Literature Review

Learning analytics and EDM can discover and extract trends in data, and can also promote educational activities by identifying and avoiding failure (or poor performance) trends and patterns while exploiting success patterns. According to Reference [11], EDM and learning analytics promise to make a sustainable impact on learning and teaching, transforming slow learners into more effective learners [12]. Reference [13] points out that learning analytics involves two major operations, namely predicting student learning success and providing proactive feedback. Reference [14] proposed a multivariate method of predicting students’ results in web-enabled learning courses, while Reference [15] reported that, to make sense of large amounts of educational data, intelligent systems must be developed to automatically process the data and provide reports to stakeholders. In Reference [16], an LA dashboard to enhance students’ learning performance was developed. The system works by tracking and mining massive online student data and visualizing the results so that they can be comprehended at a glance. Experimental evaluation indicated that, although the LA model did not have a significant impact on student achievement, there was overall student satisfaction with the dashboard, which impacts students’ understanding level.

References

[1] Hussein, S., Dahan, N. A., Ba-Alwib, F. M. and Ribata, N. “Educational Data Mining and Analysis of Students’ Academic Performance Using WEKA”. Indonesian Journal of Electrical Engineering and Computer Science, 9(2), 2018, 447-459. DOI: 10.11591/ijeecs.v9.i2.pp447-459
[2] Ray, S., and M. Saeed. “Applications of educational data mining and learning analytics tools in handling big data in higher education”. In Applications of Big Data Analytics, 135-160, 2018. Springer, Cham.
[3] Inyang, U. G., Umoh, U. A., Nnaemeka, C. and S. Robinson. “Unsupervised Characterization and Visualization of Students’ Academic Performance Features”. 12(2), 103-105, 2019. doi.org/10.5539/cis.v12n2p103
[4] Romero, Cristóbal, and Sebastián Ventura. "Educational data mining: a review of the state of the art." IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40, no. 6 (2010): 601-618.
[5] Sklyar, Eduard. "Exploring First-Time Community College Transfer Students' Perception of Their Experience as They Transition to a Large Public Four-Year Institution." PhD diss., Northeastern University, 2017.
[6] O'Keeffe, Patrick. "A sense of belonging: Improving student retention." College Student Journal 47, no. 4 (2013): 605-613.
[7] K. E. Arnold, and M. D. Pistilli. “Course signals at Purdue: using learning analytics to increase student success”. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge, 267–270, 2012. doi:10.1145/2330601.2330666.
[8] U. G. Inyang and E. E. Joshua. “Fuzzy Clustering of Students' Data Repository for At-Risks Students Identification and Monitoring”. Computer and Information Science, 2013. 6(4), 37-50.
[9] J. Xu, K. H. Moon, and M. Van Der Schaar. “A machine learning approach for tracking and predicting student performance in degree programs”. IEEE Journal of Selected Topics in Signal Processing, 2017. 11(5), 742-753.
[10] R. Agrawal, T. Imielinski and A. Swami. “Mining association rules between sets of items in large databases”. In Proceedings of the ACM SIGMOD Conference on Management of Data, 1993, 207-216.
[11] D. Gašević, S. Dawson, T. Rogers, and D. Gasevic. “Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success”. The Internet and Higher Education, 2016. 28, 68-84.
[12] O. Zaïane. “Web usage mining for a better web-based learning environment”. In Proceedings of the 4th International Conference on Advanced Technology for Education (CATE’01), 27–28 June 2001, Banff, Canada.
[13] D. Gašević, N. Mirriahi, and S. Dawson. “Analytics of the effects of video use and instruction to support reflective learning”. In Proceedings of the Fourth International Conference on Learning Analytics and Knowledge, 2014, 123-132.
[14] N. Zacharis. “A multivariate approach to predicting student outcomes in web-enabled blended learning courses”. Internet and Higher Education, 2015, 27, 44–53.
[15] J. Ruipérez-Valiente, P. Muñoz-Merino, D. Leony, and Kloos Delgado. “ALAS-KA: A learning analytics extension for better understanding the learning process in the Khan Academy platform”. Computers in Human Behavior, 2015. 47, 139–148.
[16] Y. Park, and L. Jo. “Development of the Learning Analytics Dashboard to Support Students’ Learning Performance”. Journal of Universal Computer Science, 2015. 21(1), 110-133.
[17] A. Daud, N. Aljohani, R. Abbasi, M. Lytras, F. Abbas, and J. Alowibdi. “Predicting Student Performance using Advanced Learning Analytics”. International World Wide Web Conference Committee (IW3C2), 2017, 415-421.
[18] X. Wanli, G. Rui, P. Eva, and G. Sean. “Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory”. Computers in Human Behavior, 2015. 47, 168–181.
[19] J. Hair, R. Anderson, Tatham, and C. Black. Análise multivariada de dados, Bookman, 2005, Porto Alegre, Brazil.
[20] R. W. Sembiring, J. M. Zain, and A. Embong. “A comparative agglomerative hierarchical clustering method to cluster implemented course”. Journal of Computing, 2(12), December 2010, ISSN 2151-9617. arXiv preprint arXiv:1101.4270.
[21] Singh, E. Hjorleifsson, and G. Stefansson. “Robustness of fish assemblages derived from three hierarchical agglomerative clustering algorithms performed on Icelandic ground fish survey data”. Journal of Marine Science, 2011, 68(1), 189–200. doi:10.1093/icesjms/fsq144
[22] O. Yim, and K. T. Ramdeen. “Hierarchical cluster analysis: comparison of three linkage measures and application to psychological data”. The Quantitative Methods for Psychology, 11(1), 2015, 8-21.
[23] Z. Li, and M. de Rijke. “The impact of linkage methods in hierarchical clustering for active learning to rank”. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, 941-944. ACM.
[24] F. Murtagh, and P. Legendre. “Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion?” Journal of Classification, 31(3), 2014, 274-295.
[25] J. Vesanto, and E. Alhoniemi. “Clustering of the self-organizing map”. IEEE Transactions on Neural Networks, 11(3), 2000, 586-600.
[26] N. Ye. “Data mining: theories, algorithms, and examples”. 2014. CRC Press.
[27] P. D. McNicholas, T. B. Murphy, and M. O’Regan. “Standardizing the lift of an association rule”. Computational Statistics and Data Analysis, 52(10), 2008, 4712-4721.
[28] N. Hussein, A. Alashqur and B. Sowan. “Using the interestingness measure lift to generate association rules”. Journal of Advanced Computer Science & Technology, 4(1), 2015, 156.
[29] F. Verhein. “Frequent pattern growth (FP-growth) algorithm”. School of Information Studies, The University of Sydney, Australia, 2008, 1-16.
[30] J. Han, M. Kamber and J. Pei. “Data mining: Concepts and techniques” (3rd ed.). 2012, San Francisco: Morgan Kaufmann Inc.
[31] M. Y. Avcilar, E. Yakut. “Association Rules in Data Mining: An Application on a Clothing and Accessory Specialty Store”. Canadian Social Science, 10(3), 2014, 75-83. DOI: 10.3968/4541
[32] M. Dimitrijevic and Z. Bosnjak. “Pruning statistically insignificant association rules in the presence of high-confidence rules in web usage data”. Procedia Computer Science, 35, 2014, 271-280.
[33] Mandave, Pratibha, Megha Mane, and Sharada Patil. "Data mining using Association rule based on APRIORI algorithm and improved approach with illustration." International Journal of Latest Trends in Engineering and Technology (IJLTET), ISSN (2013).
[34] A. M. Shahiri, and W. A. Husain. “Review on predicting student's performance using data mining techniques”. Procedia Computer Science, 72, 2015, 414-422.
[35] Meng, Xue-Hui, Yi-Xiang Huang, Dong-Ping Rao, Qiu Zhang, and Qing Liu. "Comparison of three data mining models for predicting diabetes or prediabetes by risk factors." The Kaohsiung Journal of Medical Sciences 29, no. 2 (2013): 93-99.
[36] W. Venables and D. Smith. “The R Core Team, An introduction to R” (2017). https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf. Accessed on 6th June, 2019.
[37] R. Ihaka, and R. Gentleman. “R: A Language for Data Analysis and Graphics”. Journal of Computational and Graphical Statistics, 5(3), 299-314, 1996. DOI: 10.1080/10618600.1996.10474713
[38] I. Mohamad and D. Usman. “Standardization and Its Effects on K-Means Clustering Algorithm”. Research Journal of Applied Sciences, Engineering and Technology, 6(17), 2013, 3299-3303.
[39] M. Pathak. “Hierarchical Clustering in R”. (2018) https://www.datacamp.com/community/tutorials/hierarchical-clustering-R. Accessed June 28, 2019.
[40] A. Timofeeva. “Evaluating the robustness of goodness-of-fit measures for hierarchical clustering”. In Journal of Physics: Conference Series, January 2019, 1145(1), 012049. IOP Publishing.
[41] P. Carvalho, C. Munita, and A. Lapolli. “Validity Studies among Hierarchical Methods of Cluster Analysis Using Cophenetic Correlation Coefficient”. International Nuclear Atlantic Conference - INAC 2017, Belo Horizonte, MG, Brazil, October 22-27, 2017.
[42] R. Gove. “Using the elbow method to determine the optimal number of clusters for k-means clustering”. URL: https://blocks. org/rpgove/0060ff3b656618e9136b, 17-19. (2017)
[43] P. Bholowalia, and A. Kumar. “EBK-means: A clustering technique based on elbow method and k-means in WSN”. International Journal of Computer Applications, 105(9), 2014, 17-24.
[44] Wolzinger, Renah, and Henry O'Lawrence. "Student Characteristics and Enrollment in a CTE Pathway Predict Transfer Readiness." Pedagogical Research 3, no. 2 (2018): n2.
[45] M. Charrad, N. Ghazzali, V. Boiteau, and A. Niknafs. “NbClust: An R Package for Determining the Relevant Number of Clusters in a Data Set”. Journal of Statistical Software, 61(6), 2014, 1-36.
[46] T. Van Craenendonck and H. Blockeel. “Using internal validity measures to compare clustering algorithms”. In AutoML workshop at ICML 2015, 1-8.
[47] J. Deogun and L. Jiang. “Prediction mining–an approach to mining association rules for prediction”. In International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing, 98-108, (August 2005). Springer, Berlin, Heidelberg.
[48] J. Thakkar and M. Parikh. “An Efficient Approach for Accurate Frequent Pattern Mining Practicing Threshold Values”. International Journal of Engineering and Technology, 4(4), 2018, 2394-4099.
