Detection of Student Engagement via Transformer-Enhanced Feature Pyramid Networks on Channel-Spatial Attention
- Authors: Naveen A., Jacob I., Mandava A.
- Affiliation: Gitam University-Bengaluru Campus
- Issue: Vol 24, No 2 (2025)
- Pages: 631-656
- Section: Artificial intelligence, knowledge and data engineering
- URL: https://journals.rcsi.science/2713-3192/article/view/289700
- DOI: https://doi.org/10.15622/ia.24.2.10
- ID: 289700
Abstract
One of the most important aspects of contemporary educational systems is student engagement detection: determining how involved, attentive, and active students are in class activities. For educators, this capability is essential, as it provides insight into students' learning experiences and enables tailored interventions and instructional improvements. Traditional techniques for evaluating student engagement are often time-consuming and subjective. This study proposes a novel real-time detection framework that combines Transformer-enhanced Feature Pyramid Networks (FPN) with Channel-Spatial Attention (CSA), referred to as BiusFPN_CSA. The proposed approach automatically analyses engagement cues such as body posture, eye contact, and head position from visual data streams by integrating deep learning and computer vision techniques. By combining the CSA attention mechanism with the hierarchical feature representation of the FPN, the model captures both contextual and spatial information in the input and can therefore detect engagement levels accurately. Incorporating the Transformer architecture further improves overall performance by capturing long-range dependencies and semantic relationships within the input sequences. Evaluation on the WACV dataset demonstrates that the proposed model outperforms baseline techniques in accuracy: the FPN_CSA_Trans_EH variant improves on FPN_CSA by 3.28% and 4.98%. These findings underscore the efficacy of the BiusFPN_CSA framework for real-time student engagement detection, offering educators a valuable tool for enhancing instructional quality, fostering active learning environments, and ultimately improving student outcomes.
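The channel-spatial attention described above can be illustrated with a minimal NumPy sketch. This is a hypothetical, simplified CBAM-style module, not the authors' implementation: channel attention squeezes the spatial dimensions and re-weights channels through a small two-layer MLP (weights `w1`, `w2` and reduction ratio `r` are illustrative assumptions), and spatial attention then squeezes the channel dimension to weight each location.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(x, w1, w2):
    """Simplified CBAM-style channel-spatial attention (illustrative only).

    x  : feature map of shape (C, H, W)
    w1 : (C, C // r) first layer of the shared channel-attention MLP
    w2 : (C // r, C) second layer of the shared channel-attention MLP
    """
    # Channel attention: pool over spatial dims, excite channels.
    avg = x.mean(axis=(1, 2))                          # (C,)
    mx = x.max(axis=(1, 2))                            # (C,)
    ca = sigmoid(np.maximum(avg @ w1, 0) @ w2 +
                 np.maximum(mx @ w1, 0) @ w2)          # (C,) in (0, 1)
    x = x * ca[:, None, None]
    # Spatial attention: pool over channels, weight each location.
    sa = sigmoid(np.stack([x.mean(axis=0), x.max(axis=0)]).mean(axis=0))  # (H, W)
    return x * sa[None, :, :]

# Toy usage on a random 8-channel feature map with reduction ratio r = 2.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((8, 4)) * 0.1
w2 = rng.standard_normal((4, 8)) * 0.1
out = channel_spatial_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because both attention maps are sigmoid-gated, the module only rescales features (each output magnitude is bounded by the corresponding input magnitude); in the full model such a block would sit on each level of the feature pyramid.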
About the authors
A. Naveen
Gitam University-Bengaluru Campus
Author for correspondence.
Email: a.naveen21@gmail.com
Nagadenehalli Doddaballapur, Taluk 207
I. Jacob
Gitam University-Bengaluru Campus
Email: ijacob@gitam.edu
Nagadenehalli Doddaballapur, Taluk 207
A. Mandava
Gitam University-Bengaluru Campus
Email: amandava@gitam.edu
Nagadenehalli Doddaballapur, Taluk 207
