Predicting and analyzing student absenteeism using machine learning algorithm

Authors: Mukli Lindita, Rista Amarildo

Journal: Integration of Education @edumag-mrsu

Section: International Experience in the Integration of Education

Published in issue: 2 (107), 2022.

Free access

Introduction. In a developed society, the state should invest in the education of the younger generation. In less developed countries, Albania included, there are no nationwide studies of the factors that keep students away from classrooms. The purpose of this study is to predict, analyze, and evaluate the possible causes of student absenteeism using machine learning algorithms. The attributes considered relate to family, demographic, social, university, and personal aspects according to academic criteria.

Materials and Methods. Student absenteeism covers any student who has not attended class, irrespective of the reason. The data set consists of 26 attributes and 210,000 records corresponding to the teaching hours of 500 students during an academic year at the Faculty of Information Technology. The students participating in the survey are of both genders and range from 18 to 25 years of age. The student questionnaire was compiled by reviewing the literature and analyzing 26 attributes, which we grouped into 5 categories.

Results. This paper contributes to the analysis and evaluation of the factors that lead students to miss lectures, using machine learning. The study was conducted only on students of this faculty, so the results may not generalize to all universities. Researchers are therefore encouraged to test the results achieved in this paper on other clusters.

Discussion and Conclusion. The paper provides recommendations based on the findings by offering different problem-solving strategies. The questionnaire, used here only for 500 Faculty of Information Technology students, can be widely applied in any educational institution in the region. However, the results of this study cannot be generalized to the student and youth population of other regions or countries. This paper provides an original and easily usable questionnaire suitable for various study programs and universities.


Keywords: student absenteeism, family, demographic, social, university, personal aspects, data mining, machine learning

Short URL: https://sciup.org/147237991

IDR: 147237991   |   DOI: 10.15507/1991-9468.107.026.202202.216-228

Full text of the article: Predicting and analyzing student absenteeism using machine learning algorithm

Original article

Many developed countries use assessment tools and national surveys to assess the quality of teaching and to determine the indicators that affect students' motivation to achieve the best possible results in their studies.

S. Larabi-Marie-Sainte, R. Jan, A. Al-Matouq, S. Alabduhadi have pointed out that students' academic performance can be affected by several factors, one of which is student absences [1]. Marsh, Paulsen, and Richardson suggest that “student ratings demonstrate acceptable psychometric properties which can provide important evidence for educational research” [2–4]. Despite awareness of the harmful effects of absenteeism on academic performance, absenteeism levels remain high. J. Childs, R. Lofton have shown that the root causes of chronic absenteeism are complex [5]. M.H. Bahadori, A. Salari, I. Alizadeh, F. Moaddab, L. Rouhi have recommended that educational planners and policymakers pay more attention to the factors mentioned by students as the most important causes of absenteeism [6]. The Faculty of Information Technology (FTI), part of UAMD (Aleksandër Moisiu University of Durrës, 2021), “the second-largest public academic institution of the Republic of Albania which enrolls about 500 students each year”, is experiencing high rates of absences. If not addressed, the problem of absenteeism may reduce academic performance and have an impact on many social issues.

Many factors influence student absenteeism, so predicting it often proves very challenging. Özcan found that “poor academic outcomes, parental involvement, school management, and school schedules, as well as health issues and a lack of social activities, are the main factors influencing student absenteeism” [7]. Additionally, Balkis et al. observed that the major reasons students gave for non-attendance related to attitudes towards teachers and school, lack of motivation, and parents' level of education [8]. Based on the work of I. Dey and Kassarnig et al., attendance is among the most crucial elements in determining a student’s academic performance and success [9; 10]. Wadesango and Machingambi found that auditorium conditions, socio-economic factors, and relations between students and lecturers are the main factors leading students toward non-attendance [11]. B.N. Young, W.O. Benka-Coker, Z.D. Weller, S. Oliver, J.W. Schaeffer, S. Magzamen have shown the connection between student absenteeism and test scores [12].

Given the complex factors that influence high student absenteeism, Data Mining (DM) and Machine Learning (ML) algorithms are a good way to analyze and predict it. Helm et al. refer to ML as “an application of artificial intelligence (AI) that provides to build a model based on training data to make predictions or decisions without being programmed” [13]. DM is a process that extracts and discovers patterns in a large dataset using intelligent methods [14]. Based on the nature of the study and the organization of the dataset, we chose classification methods to evaluate the data. Kantardzic states that “classification techniques are part of predictive methods and categorize a given dataset into classes” [15]. With these methods, unknown values can be predicted from known ones [15]. The dataset consists of 26 attributes and 210,000 records corresponding to the teaching hours of 500 students, ranging from 18 to 25 years of age, during an academic year at FTI.
The attributes analyzed refer to demographic, family, university, and personal factors according to academic performance. This study is most helpful to UAMD and can be easily utilized by other universities in Albania. It is also a valuable tool for universities worldwide, guiding management and clarifying the factors that keep students from attending. The rest of the paper is structured as follows: Section 2 gives an overview of the data mining classification techniques; Section 3 describes the methodology; Section 4 presents the findings of the study; Section 5 relates the discussion to the overall results observed and gives some recommendations.
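
The classification idea described in the introduction, predicting an unknown class from known attribute values, can be illustrated with a minimal sketch. It is not the authors' method or data: it is a small categorical naive Bayes classifier in plain Python, and the attribute names (`commute`, `employed`), their values, and the six toy records are hypothetical assumptions chosen only for illustration.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy records: (attributes, label). Attribute names and values
# are illustrative assumptions, not taken from the study's questionnaire.
TRAIN = [
    ({"commute": "long", "employed": "yes"}, "absent"),
    ({"commute": "long", "employed": "no"}, "absent"),
    ({"commute": "short", "employed": "yes"}, "absent"),
    ({"commute": "short", "employed": "no"}, "present"),
    ({"commute": "short", "employed": "no"}, "present"),
    ({"commute": "long", "employed": "no"}, "present"),
]


def fit(records):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (label, attribute) -> value counts
    domains = defaultdict(set)           # attribute -> set of observed values
    for attrs, label in records:
        for name, value in attrs.items():
            value_counts[(label, name)][value] += 1
            domains[name].add(value)
    return class_counts, value_counts, domains


def predict(model, attrs):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        for name, value in attrs.items():
            count = value_counts[(label, name)][value]
            # Add-one smoothing so unseen values do not zero out the score.
            score += math.log((count + 1) / (n + len(domains[name])))
        scores[label] = score
    return max(scores, key=scores.get)


model = fit(TRAIN)
prediction = predict(model, {"commute": "long", "employed": "yes"})
print(prediction)
```

On this toy data the known attribute values of a new record are enough to assign it to one of the two classes. A real study would use a dedicated toolkit and proper validation rather than a hand-written model on six records.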

References for Predicting and analyzing student absenteeism using machine learning algorithm

  • Larabi-Marie-Sainte S., Jan R., Al-Matouq A., Alabduhadi S. The Impact of Timetable on Student's Absences and Performance. PLoS ONE. 2021;16(6):e0253256. doi: https://doi.org/10.1371/journal.pone.0253256
  • Marsh H.W. Students' Evaluations of University Teaching: Research Findings, Methodological Issues, and Directions for Future Research. International Journal of Educational Research. 1987;11(3):253-388. doi: https://doi.org/10.1016/0883-0355(87)90001-2
  • Paulsen M.B. Evaluating Teaching Performance. New Directions for Institutional Research. Special Issue: Evaluating Faculty Performance. 2002;(114):5-18. doi: https://doi.org/10.1002/ir.42
  • Richardson J.T. Instruments for Obtaining Student Feedback: A Review of the Literature. Assessment & Evaluation in Higher Education. 2005;30(4):387-415. doi: https://doi.org/10.1080/02602930500099193
  • Childs J., Lofton R. Masking Attendance: How Education Policy Distracts from the Wicked Problem(s) of Chronic Absenteeism. Educational Policy. 2021;35(2):213-234. doi: https://doi.org/10.1177/0895904820986771
  • Bahadori M.H., Salari A., Alizadeh I., Moaddab F., Rouhi Balasi L., et al. The Root Causes of Absenteeism in Medical Students: Challenges and Strategies Ahead. Educational Research in Medical Sciences. 2020;9(2):e107120. doi: http://dx.doi.org/10.5812/erms.107120
  • Ozcan M. Student Absenteeism in High Schools: Factors to Consider. Journal of Psychologists and Counsellors in Schools. 2020. p. 1-17. doi: https://doi.org/10.1017/jgc.2020.22
  • Balkis M., Arslan G., Duru E. The School Absenteeism among High School Students: Contributing Factors. Educational Sciences: Theory and Practice. 2016;16(6):1819-1831. doi: https://doi.org/10.12738/estp.2016.6.0125
  • Dey I. Class Attendance and Academic Performance: A Subgroup Analysis. International Review of Economics Education. 2018;28:29-40. doi: https://doi.org/10.1016/j.iree.2018.03.003
  • Kassarnig V., Bjerre-Nielsen A., Mones E., Lehmann S., Lassen D.D. Class Attendance, Peer Similarity, and Academic Performance in a Large Field Study. PLoS ONE. 2017;12(11):e0187078. doi: https://doi.org/10.1371/journal.pone.0187078
  • Wadesango N., Machingambi S. Causes and Structural Effects of Student Absenteeism: A Case Study of Three South African Universities. Journal of Social Sciences. 2011;26(2):89-97. doi: https://doi.org/10.1080/09718923.2011.11892885
  • Young B.N., Benka-Coker W.O., Weller Z.D., Oliver S., Schaeffer J.W., Magzamen S. How Does Absenteeism Impact the Link between School's Indoor Environmental Quality and Student Performance? Building and Environment. 2021;203:108053. doi: https://doi.org/10.1016/j.buildenv.2021.108053
  • Helm J.M., Swiergosz A.M., Haeberle H.S., Karnuta J.M., Schaffer J.L., Krebs V.E., Ramkumar P.N. Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Current Reviews in Musculoskeletal Medicine. 2020;13(1):69-76. doi: https://doi.org/10.1007/s12178-020-09600-8
  • Schuh G., Reinhart G., Prote J.P., Sauermann F., Horsthofer J., Oppolzer F., Knoll D. Data Mining Definitions and Applications for the Management of Production Complexity. Procedia CIRP. 2019;81:874-879. doi: https://doi.org/10.1016/j.procir.2019.03.217
  • Kantardzic M. Data Mining: Concepts, Models, Methods, and Algorithms. 3rd ed. John Wiley & Sons; 2019. doi: https://doi.org/10.1002/9781119516057
  • Niedermayer D. An Introduction to Bayesian Networks and Their Contemporary Applications. In: Holmes D.E., Jain L.C. (eds.) Innovations in Bayesian Networks. Studies in Computational Intelligence. Springer, Berlin, Heidelberg; 2008. Vol. 156. p. 117-130. doi: https://doi.org/10.1007/978-3-540-85066-3_5
  • Bramer M. Principles of Data Mining. 3rd ed. London; 2016. doi: https://doi.org/10.1007/978-1-4471-7307-6
  • Maalouf M. Logistic Regression in Data Analysis: An Overview. International Journal of Data Analysis Techniques and Strategies. 2011;3(3):281-299. doi: https://doi.org/10.1504/IJDATS.2011.041335
  • Biau G., Scornet E. Rejoinder on: A Random Forest Guided Tour. TEST. 2016;25(2):264-268. doi: https://doi.org/10.1007/s11749-016-0488-0
  • Pfahringer B., Holmes G., Kirkby R. New Options for Hoeffding Trees. In: Orgun M.A., Thornton J. (eds.) AI 2007: Advances in Artificial Intelligence. AI 2007. Lecture Notes in Computer Science. Vol. 4830. Berlin, Heidelberg: Springer; 2007. doi: https://doi.org/10.1007/978-3-540-76928-6_11
  • Kalmegh S. Analysis of Weka Data Mining Algorithm Reptree, Simple Cart and Randomtree for Classification of Indian News. International Journal of Innovative Science, Engineering & Technology. 2015;2(2):438-446. Available at: http://iiiset.com/vol2/v2s2/IJISET_V2_I2_63.pdf (accessed 21.12.2021).
  • Mathuria M. Decision Tree Analysis on J48 Algorithm for Data Mining. International Journal of Advanced Research in Computer Science and Software Engineering. 2013;3(6). Available at: https://www.academia.edu/4375403/Decision_Tree_Analysis_on_J48_Algorithm_for_Data_Mining (accessed 21.12.2021).
  • Mohamed W.N.H.W., Salleh M.N.M., Omar A.H. A Comparative Study of Reduced Error Pruning Method in Decision Tree Algorithms. In: 2012 IEEE International Conference on Control System, Computing and Engineering. 2012. p. 392-397. doi: https://doi.org/10.1109/ICCSCE.2012.6487177
  • Srivastava S. Weka: A Tool for Data Preprocessing, Classification, Ensemble, Clustering and Association Rule Mining. International Journal of Computer Applications. 2014;88(10):26-29. Available at: https://research.ijcaonline.org/volume88/number10/pxc3893809.pdf (accessed 21.12.2021).
  • Powers D.M. Evaluation: From Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation. arXiv preprint arXiv:2010.16061. 2020. doi: https://doi.org/10.48550/arXiv.2010.16061
  • Arlot S., Celisse A. A Survey of Cross-Validation Procedures for Model Selection. Statistics Surveys. 2010;4:40-79. doi: https://doi.org/10.1214/09-SS054
Research article