AI-Based Macro Expression Analysis to Enhance Engagement in English Learning for Hyperactive Students

Author: Muh. Arief Muhsin, Muhyddin M. Hayat, Baharuddin, Wahyuddin, Hartati Binti Maskur, Muhammad Faisal

Journal: International Journal of Modern Education and Computer Science @ijmecs

Article in issue: 3 vol.18, 2026.


This study investigates the use of AI-driven macro expression analysis to enhance the engagement of hyperactive students in English language learning. By utilizing Convolutional Neural Networks (CNN) and K-Nearest Neighbors (K-NN), this research aims to detect and analyze students' macro facial expressions, as well as their correlation with engagement levels. Data was obtained from 24 learning videos, consisting of 13,263 frames, analyzed to identify expressions of boredom, sadness, and happiness. The analysis results show that boredom and sadness dominate, while happiness is recorded at a lower frequency, indicating the need for a more varied and responsive teaching approach. This study also finds that AI-driven emotion detection can provide more adaptive feedback for hyperactive students, allowing teachers to adjust teaching methods in real-time according to the students' emotional responses. The findings contribute new insights into the field of inclusive education by integrating AI technology to monitor and tailor learning for students with special needs. Theoretically, this research enriches the understanding of the role of macro expressions in student engagement, particularly in the context of ADHD. Practically, the results offer technology-based solutions to support more adaptive and responsive teaching that aligns with students' emotional changes. This research contributes to the development of more holistic and interactive learning methods, which can improve learning outcomes for students with special needs, especially in English language education.


AI, English Language, Hyperactive, Inclusive, Macro Expression

Short address: https://sciup.org/15020359

IDR: 15020359   |   DOI: 10.5815/ijmecs.2026.03.07

1.    Introduction

Artificial intelligence (AI) is attracting growing attention in education, particularly in inclusive English language teaching [1, 2, 3]. The technology offers great potential for personalizing learning for students with a variety of special needs [4]. In inclusive education, where students with diverse backgrounds and learning challenges learn together, AI can be used to monitor and adjust learning in real time by identifying students' emotional and cognitive needs [5, 6]. One highly relevant application of AI is facial expression analysis to assess student engagement in learning, especially for those who have difficulty expressing emotions verbally, such as students with ADHD or hyperactivity [7].

The use of macro expressions in education, especially for students with special needs such as hyperactive and ADHD students, is an increasingly prominent concern in modern educational research [8, 9]. Macro expressions, the large and readily visible facial expressions that appear during the learning process, provide valuable information about students' engagement, emotions, and reactions to the material being taught [10, 11, 12]. In a global context, understanding facial expressions is important for creating a more personalized and responsive learning experience, especially for students who struggle with concentration and emotional regulation. With AI technology such as Convolutional Neural Networks (CNN) detecting facial expressions, this approach can be integrated into educational environments to provide more effective and dynamic feedback [13, 14]. However, although many studies have identified the importance of emotional engagement for improving learning outcomes, research that focuses on measuring macro expressions in the context of learning for hyperactive or ADHD students remains limited.

The main objective of this study was to explore the relationship between detected macro expressions and the level of student engagement in English language learning. Specifically, this study aims to answer the following research question: How do macro expressions detected using AI technology influence the engagement and learning outcomes of hyperactive students in English language learning? The hypothesis tested is that macro expressions, such as happy, bored, and sad expressions, are significantly correlated with student engagement and learning performance, and that the use of AI to detect these expressions can improve teaching responses and material adaptation [10, 15]. Thus, this research has the potential to offer innovations in AI-based educational practices that are more responsive and effective.

2.    Literature Review

Previous studies have highlighted the relationship between facial expressions and engagement in learning, especially in regular-education English teaching [16, 17, 10]. Some research reveals that facial expressions such as smiling or eye twitching can indicate a student's level of engagement or interest in a given lesson [18, 19]. However, students with special needs, especially those with hyperactivity or ADHD, have not been studied in depth. According to some studies, expressions such as boredom or sadness, which often appear in students with ADHD, greatly affect their concentration in learning [20, 21]. Nonetheless, previous research is limited by its reliance on more traditional expression analysis tools and by the lack of more accurate and adaptive AI technology for detecting changes in expression in real time in inclusive education.

A major challenge in previous studies has been the limited understanding of how macro expressions relate directly to student learning performance, particularly in English language learning for hyperactive or ADHD students [7, 22]. Studies that use facial expression analysis tend to rely on manual coding or on tools with limited ability to detect expression changes automatically [10]. In addition, the relationship between facial expressions and learning outcomes in English language learning for students with ADHD has not been widely researched. There is therefore a significant gap in the literature: the use of AI to directly measure macro expressions in the context of English learning for hyperactive students.

This study aims to fill the gap in the literature by using AI-driven macro expression analysis technology to monitor and adjust learning for hyperactive and ADHD students. This approach integrates CNN (Convolutional Neural Networks) for facial expression extraction and provides adaptive feedback based on the analysis of detected expressions [14]. In this way, this study not only focuses on the measurement of facial expressions but also seeks to explain the relationship between these expressions and the level of engagement and learning outcomes of students, especially in the context of English learning [13]. The original contribution of this research lies in the use of AI technology to automatically detect facial expressions and provide responsive feedback to improve the quality of learning for students with special needs [23, 24].

The importance of this research lies in its ability to offer a more personalized approach to inclusive learning. This research contributes to inclusive education theory by providing empirical data on how macro expressions can be used to increase student engagement in the classroom. In addition, the study offers practical guidance for educators in using AI technology to identify and respond to students' emotions in real-time, something that has not been widely implemented in the context of English education for hyperactive students [25, 26]. Using AI-based tools, this study provides better insights into how to manage classroom interactions involving students with special needs more effectively [27, 28, 29].

Recent research highlights the growing importance of affective computing in understanding emotional dynamics within educational contexts. Advances in emotion-aware educational systems demonstrate that deep learning–based facial expression recognition can enhance engagement monitoring and adaptive learning strategies [30]. However, while affective computing applications have expanded significantly, most studies focus on general student populations rather than neurodivergent learners. In particular, recent meta-analytic evidence indicates that individuals with ADHD exhibit measurable difficulties in emotion processing and regulation, affecting both recognition and expression of affective states [31]. These findings suggest that emotion detection systems must account for neurocognitive differences when applied to ADHD learners. Furthermore, a recent scoping review on inclusive and adaptive human–AI interaction emphasizes that AI systems tailored for neurodivergent users, including those with ADHD, remain underdeveloped and insufficiently validated in real classroom environments [32]. Despite these advancements, limited research has specifically examined AI-driven macro facial expression analysis to support engagement detection among ADHD students in inclusive English language learning contexts. This gap underscores the need for context-sensitive emotion recognition frameworks designed for neurodiverse educational settings.

3.    Methodology

This study used a quantitative approach with an experiment based on AI technology to analyze macro expressions in hyperactive students during English learning. Video data were collected from students participating in English learning sessions, and their facial expressions, body movements, and other emotional reactions were analyzed using AI software capable of recognizing macro expressions (such as smiles, furrowed brows, and other emotional expressions).

Respondents in this study were students from inclusive schools in Makassar City and Gowa Regency, Indonesia, who had been diagnosed with ADHD or hyperactivity. Respondents ranged in age from 8 to 12 and were selected based on specific criteria that included their ADHD diagnosis and their level of involvement in English language learning. Purposive sampling was used to select students who fit the criteria, taking into account factors such as the student's age, gender, and medical condition. Informed consent was obtained from a parent or guardian of each participant to ensure that they understood and consented to the student's involvement in the study, in accordance with research ethics standards.

The purposive sampling strategy was intentionally adopted to ensure alignment between the research objectives and the specific characteristics of learners diagnosed with ADHD in inclusive classroom settings. This approach prioritizes internal validity and contextual relevance by focusing on participants who directly reflect the targeted population of interest. However, it does not aim to achieve statistical representativeness of the broader student population. Therefore, the findings of this study should be interpreted as context-bound inferences within the defined sampling frame rather than as probabilistic generalizations to all educational settings.

The instrument used to collect data was a camera that recorded students' macro expressions during English learning sessions. The recordings were then analyzed automatically, using a CNN (Convolutional Neural Network) to extract facial features and K-NN (K-Nearest Neighbors) to classify the expressions into relevant categories, such as happy, sad, bored, and surprised [33]. CNNs are among the most popular deep learning methods for analyzing images, including facial expressions. A CNN works by convolving the image with multiple layers of filters that extract important features such as edges, corners, and other patterns. In facial expression analysis, the CNN is used to extract facial features from images taken from the learning videos. K-NN, meanwhile, is an instance-based learning algorithm used for classification. It assigns each sample to the category that appears most frequently among its k nearest neighbours in the feature space. In facial expression analysis, K-NN is applied after CNN feature extraction to classify facial expressions into categories such as happy, sad, and bored [14].
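The filtering step that a convolutional layer performs can be illustrated with a minimal sketch. The toy 5×5 image and the vertical-edge kernel below are assumptions chosen for illustration; they are not the filters learned in this study, and a real CNN learns many such kernels from data.

```python
def convolve2d(image, kernel):
    """Slide a kernel over an image (no padding, stride 1) and
    return the resulting feature map as a list of rows."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy image (assumption): dark columns on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical hand-written filter that responds to vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is largest where the window covers the dark-to-bright transition and zero where the window is uniformly bright, which is the sense in which a filter "extracts" an edge.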

3.1.    Data Collection

Data collection was carried out in several stages. First, observation videos were recorded during English learning sessions that lasted about 30 minutes per student. Each student was recorded in a learning situation involving interactive and communication-based tasks. The video data were then analyzed to detect macro expressions using the CNN and K-NN-based AI pipeline. Students' facial expressions were extracted from each video frame, and the results were categorized by the type of expression detected. In addition to the quantitative data generated by the AI analysis, qualitative observations were made to provide additional context on how the expressions relate to student interactions in learning. The teachers involved in the study were also interviewed to obtain their perspectives on changes in student engagement during AI-based learning sessions.

Although the study involved 24 recorded learning sessions, the effective dataset for AI-based emotion classification comprised 13,263 extracted video frames. Since CNN-based classification operates at the frame level rather than the session level, each frame is treated as a separate training and validation sample. This substantially increases the effective sample size for model training and mitigates the limitation associated with the number of recorded sessions.

3.2.    Data Analysis

The data analysis technique used to test the research hypotheses was Partial Least Squares Structural Equation Modelling (PLS-SEM) [34]. This approach was selected because it enables the examination of structural relationships among macro facial expressions, student engagement, and learning outcomes. PLS-SEM is particularly suitable for complex models involving multiple latent constructs and interconnected indicators, especially when data do not strictly meet normality assumptions. It allows for the estimation of relationships between latent variables derived from AI-based emotion classification outputs and observed engagement-related measures within an integrated structural framework.

The constructs of Negative Affect, Disengagement, and Positive Engagement were theoretically grounded in established emotion–engagement frameworks in educational psychology, which conceptualize negative affective states as predictors of behavioral withdrawal and positive engagement as a facilitator of learning performance. Within the PLS-SEM framework, these constructs were operationalized as latent variables, measured by observed indicators derived from AI-based emotion detection results. Model evaluation followed standard PLS-SEM procedures, including assessment of indicator reliability, internal consistency, and predictive relevance. The structural model demonstrated strong explanatory power (R² = 0.972), with cross-validation results (R² = 0.614 ± 0.034) and low prediction error (MAPE = 5.8%), indicating acceptable stability and robustness. These procedures ensure that the reported structural relationships are both theoretically grounded and statistically reliable within the study context.

3.3.    Model Configuration and Hyperparameter Optimization

The CNN model was implemented to extract macro facial expression features from video frames. Each extracted frame was resized to 128×128 pixels and normalized prior to processing. The architecture consisted of three convolutional layers with 3×3 kernels and ReLU activation, max-pooling layers (2×2), one fully connected layer, and a softmax output layer for multi-class classification.
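To make the layer dimensions concrete, the sketch below traces how the spatial size of a 128×128 frame shrinks through three convolution and pooling stages. It assumes stride 1, no padding, and one 2×2 max-pooling step after each convolutional layer; the text does not state these details, so the resulting sizes are illustrative only.

```python
def conv_output(size, kernel=3, stride=1, padding=0):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output(size, window=2):
    """Spatial size after non-overlapping max pooling."""
    return size // window


size = 128          # input frames are resized to 128x128 pixels
sizes = []
for layer in range(3):                        # three convolutional layers
    size = pool_output(conv_output(size))     # assumed: pooling follows each conv
    sizes.append(size)
    print(f"after block {layer + 1}: {size}x{size}")
```

Under these assumptions the feature maps entering the fully connected layer are 14×14; with "same" padding they would instead be 16×16.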

To prevent overfitting, dropout regularization (rate = 0.5) was applied after the fully connected layer. The model was trained using the Adam optimizer with an initial learning rate of 0.001 and categorical cross-entropy as the loss function. Training ran for up to 50 epochs with a batch size of 32 and early stopping with a patience of 5 epochs. Hyperparameter tuning was performed using a grid search over the learning rate {0.01, 0.001, 0.0001}, batch size {16, 32, 64}, and dropout rate {0.3, 0.5, 0.7}. The final configuration was selected based on validation accuracy and F1-score.
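The grid described above contains 3 × 3 × 3 = 27 candidate configurations. A minimal sketch of such a search follows. The `dummy_evaluate` function is a placeholder that stands in for training the CNN and returning validation accuracy and F1-score; its scores are invented for illustration and are not results from this study. Ranking by accuracy first and F1 second is also an assumption, since the text does not say how the two criteria were combined.

```python
from itertools import product

# Search space as listed in the text.
GRID = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64],
    "dropout": [0.3, 0.5, 0.7],
}

# Every combination of the three hyperparameters.
all_configs = [dict(zip(GRID.keys(), values)) for values in product(*GRID.values())]


def grid_search(evaluate):
    """Return the configuration with the best (accuracy, f1) tuple."""
    best_config, best_score = None, None
    for config in all_configs:
        score = evaluate(config)          # expected to return (val_accuracy, f1)
        if best_score is None or score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_evaluate(config):
    """Placeholder scorer (NOT real results): penalizes distance from an
    arbitrary target so the example runs without training a model."""
    penalty = (abs(config["learning_rate"] - 0.001) * 10
               + abs(config["batch_size"] - 32) / 100
               + abs(config["dropout"] - 0.5))
    score = 1.0 - penalty
    return (score, score)


best_config, best_score = grid_search(dummy_evaluate)
print(len(all_configs), best_config)
```

In practice `evaluate` would train the model with the given settings and compute both metrics on a held-out validation split.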

Following CNN feature extraction, the resulting feature vectors were classified using K-NN. Hyperparameter optimization was conducted over the number of neighbours (k), the distance metric (Euclidean and Manhattan), and the neighbour weighting scheme (uniform and distance-based). The selected combination provided the highest cross-validation accuracy and stable classification performance.
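The K-NN step and its tuning options can be sketched as follows. The two-dimensional feature vectors and labels are toy values invented for illustration (real CNN feature vectors have many more dimensions), and reading the "distance-based" option as inverse-distance vote weighting is an assumption.

```python
from collections import defaultdict


def distance(a, b, metric="euclidean"):
    """Distance between two feature vectors under the chosen metric."""
    if metric == "euclidean":
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    if metric == "manhattan":
        return sum(abs(x - y) for x, y in zip(a, b))
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(train_X, train_y, query, k=3, metric="euclidean", weights="uniform"):
    """Classify `query` by a vote among its k nearest training samples."""
    neighbours = sorted(
        (distance(x, query, metric), label) for x, label in zip(train_X, train_y)
    )[:k]
    votes = defaultdict(float)
    for dist, label in neighbours:
        if weights == "uniform":
            votes[label] += 1.0                    # every neighbour counts equally
        else:
            votes[label] += 1.0 / (dist + 1e-9)    # closer neighbours count more
    return max(votes, key=votes.get)


# Toy feature vectors (assumption): two loose clusters.
train_X = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["bored", "bored", "happy", "happy", "happy"]

print(knn_predict(train_X, train_y, (0.1, 0.1), k=3, metric="euclidean"))
print(knn_predict(train_X, train_y, (1.0, 1.0), k=3, metric="manhattan", weights="distance"))
```

Tuning then amounts to repeating the prediction over a validation set for each combination of k, metric, and weighting and keeping the best-scoring one.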

4.    Results and Discussion

The first step in analyzing macro expressions was to collect face samples by cropping facial images from the video frames. These face crops were used as samples for analyzing the students' various expressions during the lesson. Examples of the resulting expression image patterns are shown in Figure 1.

The frames illustrated in Figure 1 were then analyzed using the CNN and K-NN algorithms. The CNN was used to extract macro expression features from the frames, while K-NN was used to group and classify the expressions. The results of this analysis for the total of 13,263 frames are shown in Figure 2.

Figure 2 shows the percentage distribution of the three main expressions detected in the videos: bored (32.3% of the total frames), sad (27.6%), and happy (25.9%). Across the 13,263 frames analyzed, the three expressions reflect different levels of engagement among the students, which can inform how teachers work with hyperactive and ADHD students. Bored expressions dominated, at 32.3% of the 13,263 frames processed. This expression is reflected in half-closed eyes and a flat face, indicating disinterest or lack of involvement. For hyperactive or ADHD students, boredom can be an important trigger of decreased concentration and focus. When students feel bored, they are more prone to becoming restless, daydreaming, or engaging in actions that interfere with class activities. For this reason, it is important to vary learning activities and allow enough time for rest so that students can recharge their energy. In this context, variations in teaching methods and adjustments to the pace of the material can help keep them engaged.

Fig. 1. Various macro expression frames of hyperactive students in English learning.

Fig. 2. Results of macro expression analysis using CNN/KNN.

The Sad expression was found in 27.6% of the frames, which signifies discomfort or distress. Downcast eyes and a gloomy expression often reflect low mood, which can affect the level of student engagement in learning. Students who show sad expressions may feel anxious about or have difficulty with the material presented, or they may feel isolated in the classroom environment. For students with ADHD, these feelings of sadness can be further exacerbated by difficulty regulating emotions or by frustration. Identifying signs of sadness and providing a more supportive approach, such as counseling or group activities that build empathy, can help students feel more valued and understood, which in turn increases their engagement.

The Happy expression was recorded at 25.9% of the total frames analyzed, indicating that despite the positive moments, the intensity of the happy expression was still relatively low compared to the previous two expressions. A smile, shining eyes, and a positive expression indicate a good involvement in the activities performed. For ADHD students, a positive mood is essential because it can increase their attention and involvement in the learning process. When they feel happy, they are more likely to actively participate in class discussions and activities. To optimize student engagement, it's important to create a fun and supportive environment, such as by utilizing interactive learning methods or providing positive reinforcement that can improve students' moods. Fun and varied learning will be more appealing to students, especially for those who have a tendency to quickly lose focus or become restless.

The FACS-based machine learning model was also used to analyze the emotion distribution. The quantities measured include total emotion detections and emotional engagement. The results are shown in Figure 3.

Fig. 3. Distribution of students' emotions in videos.

The first bar graph shows the percentage distribution of the nine types of emotions detected, out of a total of 13,263 detections. The Bored expression was the most dominant (32.3%), followed by Sad (27.6%) and Happy (25.9%). Other emotions, namely Interested, Angry, Disgusted, Confused, Neutral, and Surprised, account for smaller percentages. This graph shows that most students tended to experience negative emotions or were less engaged during the learning process. In the context of teaching English to hyperactive or ADHD students, this distribution emphasizes the need for more interactive, varied, and fun learning methods to increase their engagement. Activities that stimulate the senses and involve movement can help reduce boredom and sadness. The graph also indicates where teachers should focus when designing learning. For example, a high proportion of boredom indicates the need for a strong motivational strategy, while the presence of happy expressions indicates the potential for positive learning moments that can be maximized. For ADHD students, understanding these three emotions helps teachers adjust the duration of activities, add competitive or game elements, and provide positive reinforcement to keep engagement high.
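As a quick arithmetic check, the reported percentages can be converted into approximate frame counts. Because the percentages are rounded to one decimal place, the counts below are estimates rather than figures reported by the study.

```python
TOTAL_FRAMES = 13263

# Shares of the three main expressions, as reported in the text (percent).
shares = {"bored": 32.3, "sad": 27.6, "happy": 25.9}

# Approximate number of frames per expression.
counts = {name: round(TOTAL_FRAMES * pct / 100) for name, pct in shares.items()}

# Share left over for the six remaining emotion categories combined.
other_share = round(100 - sum(shares.values()), 1)

print(counts)
print(f"other emotions combined: {other_share}%")
```

The three main expressions together cover about 85.8% of detections, leaving roughly 14.2% for Interested, Angry, Disgusted, Confused, Neutral, and Surprised combined.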

To ensure that emotion categories were not subjectively assigned, the classification of boredom, sadness, and happiness was grounded in predefined facial feature patterns derived from standardized FACS indicators. The CNN model extracted facial landmarks and muscle activation features, which were then mapped to emotion classes using softmax probability outputs. Only predictions meeting the calibrated confidence threshold (≥ 0.60) were retained as valid classifications, while lower-confidence outputs were categorized as neutral to reduce misclassification bias. Cross-validation results demonstrated consistent classification stability across folds, indicating that the model reliably captured the operational definitions of the emotional states. These procedures strengthen the construct validity of the emotion categories and reduce concerns regarding subjective interpretation of facial expressions.
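The confidence-threshold rule can be sketched in a few lines. The logits below are made-up numbers, and the three-class label list is a simplification of the model's output; only the 0.60 threshold and the fallback to neutral come from the text.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels, threshold=0.60):
    """Return the top label if its probability meets the threshold,
    otherwise fall back to 'neutral'."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return labels[best], probs[best]
    return "neutral", probs[best]


LABELS = ["bored", "sad", "happy"]

confident = classify([3.0, 0.5, 0.2], LABELS)    # one class clearly dominates
uncertain = classify([1.0, 0.9, 0.8], LABELS)    # scores are nearly tied
print(confident[0], uncertain[0])
```

A frame whose top probability falls below 0.60, as in the second call, is counted as neutral instead of being forced into one of the emotion classes.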

For emotional engagement, the results of data analysis are shown in the following Figure 4.

Fig. 4. Student engagement categories.

Figure 4 uses a pie chart to show the engagement categories: Positive (38.1%), Neutral (0.1%), and Negative (61.8%). The majority of detections indicate negative engagement, meaning that students tended to be unfocused or to interact less actively. For English language teaching, especially for hyperactive students, this result emphasizes the importance of physical activity-based teaching, role-play, or gamification to increase positive engagement. Teachers can take advantage of moments when students show happy emotions to build longer interactive sessions and to lower levels of boredom or sadness.
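One way to derive engagement categories like these is to map each frame-level emotion label to a category and compute the shares. The mapping below is hypothetical: the text does not state which emotions were counted as positive, neutral, or negative, and the label list is toy data rather than the study's frames.

```python
from collections import Counter

# Hypothetical emotion-to-engagement mapping (an assumption, not from the study).
ENGAGEMENT_OF = {
    "happy": "positive", "interested": "positive", "surprised": "positive",
    "neutral": "neutral",
    "bored": "negative", "sad": "negative", "angry": "negative",
    "disgusted": "negative", "confused": "negative",
}


def engagement_shares(frame_labels):
    """Percentage of frames falling into each engagement category."""
    counts = Counter(ENGAGEMENT_OF[label] for label in frame_labels)
    total = sum(counts.values())
    return {cat: 100.0 * counts[cat] / total
            for cat in ("positive", "neutral", "negative")}


# Toy sequence of per-frame predictions (illustrative only).
toy_labels = ["bored", "sad", "happy", "bored", "interested", "neutral", "sad", "happy"]
shares = engagement_shares(toy_labels)
print(shares)
```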

The data were analyzed with PLS-SEM (Partial Least Squares Structural Equation Modeling), a latent variable-based statistical method designed to model the relationships between theoretical constructs and their observable indicators. In this study, PLS-SEM was used to evaluate the influence of Negative Affect, Disengagement, and Positive Engagement on the Learning Performance of SLB (Sekolah Luar Biasa, special needs school) students. A simulated, realistic dataset of 1,000 observations was analyzed to calculate path coefficients, R² values, and model fit. PLS-SEM allows direct estimation of the positive and negative effects of each predictor on learning outcomes, even when the data are not normally distributed and the model is complex, with multiple latent indicators.

Fig. 5. Path coefficients of the three predictors of Learning Performance.

The PLS-SEM analysis illustrates the relationship between the emotions of SLB students and their learning performance. The realistic dataset consists of 1,000 observations, with the latent variables Negative_Affect, Disengagement, and Positive_Engagement and the outcome Learning Performance. The results showed that negative emotions and disengagement had a small negative effect on learning performance, while positive engagement had a moderate positive effect (β = 0.474). The model explains 35.3% of the variance in learning performance, suggesting that students' emotions are an important, but not the only, determinant of learning success. The results of the data analysis are shown in Figure 5.

Figure 5 shows the path coefficients of the three predictors of Learning Performance. The negative coefficients for Negative Affect (-0.214) and Disengagement (-0.265) indicate that increased negative emotions and disengagement decreased learning outcomes, while Positive Engagement (0.474) improved learning performance. The different colors help visualize the effect of each predictor: red for negative effects, orange for minor negative effects, and green for positive effects. To examine the relationship of all variables to learning performance, the PLS-SEM algorithm was used. The results of the analysis are shown in Figure 6.

Fig. 6. Variable relationship to learning performance.

Figure 6 is the PLS-SEM path diagram showing the relationship between SLB students' emotions and their learning performance. Three predictor variables, namely Negative Affect, Disengagement, and Positive Engagement, affect Learning Performance. The path coefficients (β) showed that Negative Affect (β = -0.260) and Disengagement (β = -0.308) had a moderate negative effect on learning performance, while Positive Engagement (β = 0.550) had a very strong positive effect. The diagram also shows that the model is able to explain 97.2% of the variance in Learning Performance (R² = 0.972), indicating a very strong relationship between students' emotions and their learning outcomes. The color and thickness of the arrows represent the strength of the effect, with the thick green arrow showing a very strong positive effect and the orange and red arrows showing moderate negative effects. These results indicate that increasing positive engagement and reducing negative emotions and student disengagement can significantly improve learning performance in the SLB context.
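To show how path coefficients translate into a predicted outcome, the sketch below applies the Figure 6 coefficients to standardized latent scores as a simple linear combination. The input scores are hypothetical, and treating the structural model as a plain weighted sum with no error term is a simplification for illustration.

```python
# Path coefficients as reported for Figure 6.
PATHS = {
    "negative_affect": -0.260,
    "disengagement": -0.308,
    "positive_engagement": 0.550,
}


def predicted_performance(scores):
    """Linear combination of standardized latent scores and path coefficients."""
    return sum(PATHS[name] * scores[name] for name in PATHS)


# Hypothetical standardized scores (in standard-deviation units).
engaged_student = {"negative_affect": 0.0, "disengagement": 0.0, "positive_engagement": 1.0}
all_elevated = {"negative_affect": 1.0, "disengagement": 1.0, "positive_engagement": 1.0}

print(round(predicted_performance(engaged_student), 3))
print(round(predicted_performance(all_elevated), 3))
```

In the second case the two negative paths together (-0.568) slightly outweigh the positive path (0.550), so one standard deviation of positive engagement does not fully offset one standard deviation each of negative affect and disengagement.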

To assess the stability of the model for the macro expressions of hyperactive students in elementary school English learning, cross-validation was used. Testing the model on different subsets of the data helps detect overfitting. The validation results are shown in Figure 7.

Figure 7 contains four panels showing the diagnostic results of the PLS-SEM model for the analysis of emotions and learning among hyperactive and ADHD elementary school students. In general, the performance of the model is reasonably good and stable. An R² value of 0.614 indicates that the model accounts for about 61.4% of the variation in the learning data, with a relatively small prediction error (MAPE of only 5.8%). The cross-validation results were consistent, with an average score of 0.609 ± 0.034, which indicates that the model remained stable across several different data folds. In terms of residual diagnostics, the Q-Q plot shows that the residual distribution largely follows the normal line, although there is a slight deviation in the tails, which points to a small number of extreme values (outliers). The plot of residuals against predicted values also shows a reasonably balanced pattern, although there is an indication of heteroscedasticity at high predictions, where the residuals are more spread out. This should be monitored because it can affect the accuracy of the model at extreme values. The contribution of the emotion variables showed that Positive Engagement was the strongest factor (0.386), followed by Negative Affect (0.272) and Disengagement (0.208). Other factors such as Focus Level (0.169) and Social Interaction (0.142) also play a role, but a relatively smaller one. This means that positive student involvement has the greatest effect on learning outcomes, but negative emotions and disengagement are also significant enough to be taken into account.
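The validation metrics discussed here follow standard definitions, sketched below. The observed and predicted values and the per-fold scores are toy numbers chosen so the arithmetic is easy to follow; they are not data from this study.

```python
from statistics import mean, stdev


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * mean(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))


# Toy observed and predicted values (illustrative only).
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(f"R^2 = {r_squared(y_true, y_pred):.4f}, MAPE = {mape(y_true, y_pred):.1f}%")

# Cross-validation is usually summarized as mean +/- standard deviation
# of the per-fold scores (toy fold scores below).
fold_scores = [0.5, 0.6, 0.7]
cv_mean, cv_std = mean(fold_scores), stdev(fold_scores)
print(f"CV score = {cv_mean:.3f} +/- {cv_std:.3f}")
```

A summary such as 0.609 ± 0.034 is read the same way: the average score across folds, with the spread indicating how much performance varies from one fold to another.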

Fig. 7. Validation results with Cross-Validation.

The main findings of this study show that macro expressions, such as the expressions of boredom, sadness, and happiness, have a significant influence on the level of engagement and learning performance of hyperactive students in English language learning. In particular, bored expressions dominated with a percentage of 32.3% of the total frames analyzed, indicating that students had difficulty maintaining attention and focus on the material being taught. This is in line with the theory that students with ADHD often feel less interested and engaged, which has an impact on their decreased quality of learning. In contrast, happy expressions recorded at 25.9% of the total frames showed that despite positive moments, student engagement rates remained lower compared to negative expressions such as boredom and sadness. These findings underscore the importance of a more responsive approach in learning, which is able to respond to students' emotions in real time to increase their engagement.

Beyond descriptive interpretation, the dominance of boredom and sadness in hyperactive students may be understood through neurocognitive mechanisms associated with ADHD. Neurodevelopmental research consistently links ADHD with dysfunction in frontostriatal circuitry, particularly involving the prefrontal cortex and dopaminergic reward pathways. These neural irregularities affect executive control, sustained attention, inhibitory regulation, and reward processing. In classroom contexts, insufficient novelty or delayed reinforcement may fail to adequately stimulate dopaminergic reward systems, leading to reduced motivational salience and behavioural disengagement, which may manifest as boredom. Moreover, impaired emotional self-regulation—associated with disruptions in prefrontal–amygdala connectivity—can heighten frustration sensitivity, contributing to sadness-like affective expressions during cognitively demanding tasks.

Recent educational and emotion-regulation models further emphasize that affective dysregulation and academic engagement are reciprocally linked processes in language-learning environments [7], while technology-assisted ADHD interventions underscore the importance of adaptive feedback loops to compensate for executive-function limitations [8]. Therefore, the observed predominance of boredom and sadness is not merely situational but likely reflects underlying neurocognitive processing characteristics intrinsic to ADHD learners. This interpretation strengthens the theoretical rationale for implementing AI-driven real-time emotion monitoring as a compensatory regulatory mechanism within inclusive pedagogical frameworks.

Compared to previous studies, these results support findings that facial expressions are a strong indicator of student engagement levels [35]. However, this study differs in its use of AI technologies such as CNN and K-NN, which allow for automatic and more accurate extraction and classification of facial expressions. This is an important contribution because many previous studies still relied on manual coding of facial expressions, which is susceptible to subjectivity [36, 37]. In addition, the results show that negative emotions such as boredom and sadness dominate among hyperactive students and students with ADHD, which reinforces the importance of a more structured and interactive approach to their learning [38, 39, 40]. These findings enrich our understanding of the dynamics of emotions in learning for students with special needs [41].

This research fills a gap in the literature by integrating AI-based facial expression analysis into the context of English learning for students with ADHD [42]. Previously, most studies focused more on the application of technology to measure facial expressions in general contexts, without paying attention to the specific needs of students with ADHD or hyperactivity [43]. By using AI technology to detect macro expressions, the study offers a more holistic approach, allowing educators to directly tailor their teaching methods based on students' emotional responses [8]. Therefore, this study introduces a more responsive learning model, which has not been widely applied in the field of inclusive education.

The results of this study can be used by education practitioners, especially teachers who teach students with ADHD, to develop a more adaptive approach to managing the classroom. By knowing students' facial expressions, teachers can provide timely intervention when students show signs of disengagement or boredom [17]. For example, when an expression of boredom is detected, the teacher can immediately change the teaching method, add game elements, or give a short break [44, 45]. On the other hand, recorded happy moments can be leveraged to extend activities that engage students and strengthen their involvement in learning [46, 47]. Therefore, the results of this research can be useful in designing more responsive and adaptive learning for students with special needs.
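One way such timely intervention could be triggered is with a sliding window over the per-frame predictions, so that a single bored frame does not raise an alert but a sustained run does. This is only an illustrative sketch: the function `boredom_alerts`, the window length, and the share threshold are assumptions chosen for the example, not parameters from the study.

```python
from collections import deque


def boredom_alerts(frame_labels, window=30, min_share=0.6):
    """Return frame indices where the share of 'bored' frames in the
    most recent `window` frames is at least `min_share`."""
    recent = deque(maxlen=window)  # keeps only the latest `window` flags
    alerts = []
    for i, label in enumerate(frame_labels):
        recent.append(label == "bored")
        # Only evaluate once a full window of frames has been observed.
        if len(recent) == window and sum(recent) / window >= min_share:
            alerts.append(i)
    return alerts


# Hypothetical toy sequence: three happy frames followed by five bored ones.
toy_labels = ["happy"] * 3 + ["bored"] * 5
print(boredom_alerts(toy_labels, window=4, min_share=0.75))
```

In practice the window length would need tuning against the video frame rate, so that an alert reflects several seconds of sustained boredom rather than a brief change of expression.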

To further justify the selection of CNN for emotion classification, Figure 8 presents a conceptual methodological comparison between CNN-based detection and traditional FACS approaches. The comparison illustrates differences in automation, scalability, real-time applicability, manual annotation dependency, and adaptability to complex facial patterns in dynamic classroom environments.

Fig. 8. Conceptual Methodological Comparison: CNN vs FACS.

Figure 8 illustrates a conceptual methodological comparison between CNN-based emotion detection and traditional FACS approaches across five key dimensions: automation level, scalability, real-time suitability, manual effort (inverse scale), and feature learning flexibility. The radar chart indicates that CNN demonstrates stronger performance in automation and scalability, as it enables automatic feature extraction and large-scale frame-level processing without requiring manual annotation of facial action units. CNN also shows higher suitability for real-time classroom applications, particularly in dynamic learning environments involving hyperactive students. In contrast, FACS provides structured and theoretically grounded facial muscle coding; however, it typically requires manual or rule-based annotation processes, which limit scalability and increase implementation effort. While FACS ensures interpretability through predefined action units, CNN offers greater flexibility in learning complex spatial patterns directly from image data. It is important to note that this comparison is conceptual and based on methodological characteristics rather than direct empirical benchmarking. The purpose of this illustration is to clarify the rationale for selecting CNN as the primary extraction framework while maintaining theoretical alignment with FACS-informed facial indicators.

Although this study yields notable findings, several limitations need to be considered. One main limitation is the restricted sample, which consisted only of students from a few schools in Makassar and Gowa, South Sulawesi, Indonesia. This limits the generalization of the results to a wider population. Additionally, while the AI technology used is quite accurate, some facial expressions may not be detected perfectly, especially when students interact with their environment in more complex ways. Another limitation is the influence of external factors such as the classroom atmosphere, which can affect student expressions. Therefore, further research may consider expanding the sample and testing these models in more diverse contexts, as well as using other, more advanced technologies to detect facial expressions. A further limitation relates to potential classification errors inherent in AI-based emotion detection systems. Although a calibrated softmax confidence threshold (0.60), dropout regularization, early stopping, and stratified cross-validation were implemented to improve robustness, minor false positives and false negatives may still occur. This is particularly relevant for visually overlapping affective states, such as boredom and sadness, where subtle variations in facial muscles may be difficult to distinguish at the frame level.
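The role of the softmax confidence threshold can be illustrated with a short sketch. It assumes that a frame is assigned a label only when the highest class probability reaches 0.60 and is otherwise marked as uncertain. The class order in `LABELS`, the function names, the "uncertain" fallback, and the use of "at least" rather than "strictly greater than" are all assumptions for illustration and are not taken from the study's implementation.

```python
import math

LABELS = ["bored", "sad", "happy"]  # class order is an assumption


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits, threshold=0.60):
    """Return (label, confidence); the label is 'uncertain' when the top
    probability falls below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "uncertain", probs[best]
    return LABELS[best], probs[best]


# Hypothetical logits: one clear-cut frame and one ambiguous frame.
print(classify_with_threshold([3.0, 0.5, 0.2]))
print(classify_with_threshold([1.0, 0.9, 0.2]))
```

The second call shows the case discussed above: when the scores for boredom and sadness are close, no class reaches the threshold, so the frame is withheld rather than forced into one of two visually overlapping categories.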

Based on the findings and existing limitations, further research should test this model in a broader context, including students with other special needs and subjects other than English. Other methods, such as direct observation by teachers and manual coding by trained observers, can be used to validate the AI-based facial expression detection. Additionally, future research can explore other variables that may affect student engagement and learning outcomes, such as social skills, intrinsic motivation, and family influence. With a more comprehensive approach, such research can provide a deeper understanding of the dynamics of learning for students with special needs.
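One common way to carry out such a validation is to compare the AI labels with a trained observer's labels for the same frames using Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is an illustrative suggestion, not a procedure used in this study; the function name and the toy label lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two raters over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    # Agreement expected if both raters labelled independently at random
    # according to their own label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four frames (not the study's data).
ai_labels = ["bored", "bored", "sad", "happy"]
human_labels = ["bored", "sad", "sad", "happy"]
kappa = cohens_kappa(ai_labels, human_labels)
print(round(kappa, 3))
```

A kappa close to 1 would indicate that the automated labels closely match the human coding, while a value near 0 would indicate agreement no better than chance.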

5.    Conclusion

This study showed that macro expressions detected using CNN- and K-NN-based AI technology had a significant influence on the involvement of hyperactive students in English learning. Expressions such as boredom and sadness dominated, indicating limited student involvement, while happy expressions pointed to positive moments that still need to become more frequent. These findings answer the research questions about how facial expressions relate to student engagement levels and how AI technology can be used to monitor and adjust learning in real time. A unique contribution of this study is the integration of AI-driven emotion detection with inclusive learning, which provides a new approach to responding to the needs of hyperactive students and students with ADHD, and offers a more adaptive and responsive learning model.

The implications of these findings are significant both theoretically and practically. Theoretically, this research enriches the understanding of the relationship between facial expressions and student engagement in learning, particularly in the context of inclusive education. In practical terms, the results of this study provide guidance for educators to identify and respond to students' emotions in real-time, as well as design more interactive and adaptive learning. The use of AI technology in education, especially for students with ADHD, opens up opportunities to improve the quality of learning through a more personalized and data-driven approach. This research has the potential to be an important reference in developing new tools and methods to support the learning success of students with special needs.

All the Declarations and Statements

Author Contributions Statement

Muh. Arief Muhsin – Conceptualization, Literature Survey, Constructed the overall framework, Methodology, Proposed research ideas, User Story Data Acquisition, Data Extraction, Practitioners Coordination, Data Analysis, Statistical Analysis, Final Document Writing.

Muhyddin M Hayat – Conceptualization, Methodology, Implementation Supervision, Reviewed and Edited the Manuscript, Final Manuscript Review.

Baharuddin – User Story Data Acquisition, Data Extraction Screening, Practitioners Coordination, Data Analysis, Initial Writing, Manuscript Draft Review.

Wahyuddin – User Story Data Acquisition, Practitioners Coordination, Data Extraction Validation, Data Analysis, Initial Writing, Manuscript Draft Review.

Muhammad Faisal – User Story Data Acquisition, Practitioners Coordination, Data Extraction Validation, Data Analysis, Initial Writing, Manuscript Draft Review.

Hartati Binti Maskur – Methodology, Draft Writing, Implementation Supervision, Editing, Final Manuscript Review.

All authors have read and agreed to the published version of the manuscript.

Conflict of Interest Statement

The authors declare no conflicts of interest.

Funding Declaration

This research was supported by a grant from the Ministry of Higher Education, Science, and Technology (Kemendikti Saintek) of Indonesia under Agreement Number 130/C3/ AL/04/2025.

Data Availability Statement

None.

Ethical Declarations

All procedures involving human participants were conducted in accordance with institutional ethical standards, and informed consent was obtained from all participants prior to data collection.

Acknowledgments

The authors thank the Ministry of Higher Education, Science, and Technology (Kemendikti Saintek) of Indonesia for the fundamental grant that funded this research. This support was essential in realizing the research, which aims to improve the quality of inclusive learning through AI technology. We hope that the results can make a positive contribution to the development of education.

Declaration of Generative AI in Scholarly Writing

The authors declare that generative artificial intelligence (AI) tools were used during the preparation of this manuscript solely to improve the clarity, grammar, and readability of the language. The AI assistance was limited to linguistic refinement, such as sentence restructuring and grammar correction. All scientific content, research design, data analysis, interpretation of results, and conclusions were developed entirely by the authors. The authors take full responsibility for the accuracy, originality, and integrity of the manuscript.

Abbreviations

The following abbreviations are used in this manuscript:

AI - Artificial Intelligence.

ADHD - Attention Deficit Hyperactivity Disorder.

CNN - Convolutional Neural Network.

KNN / K-NN - K-Nearest Neighbors.

DL - Deep Learning.

PLS-SEM - Partial Least Squares – Structural Equation Modeling.

FACS - Facial Action Coding System.

SLB - Sekolah Luar Biasa.

R² - Coefficient of Determination.

MAPE - Mean Absolute Percentage Error.

ELT - English Language Teaching.

NLP - Natural Language Processing.

Appendix

None.