Vision Transformer for Automatic Student Engagement Estimation
5th IEEE International Image Processing, Applications and Systems Conference, IPAS 2022; 2022.
Article in English | Scopus | ID: covidwho-2270648
ABSTRACT
The availability of the internet and the quality of online content have attracted more learners to online platforms, a shift further stimulated by COVID-19. Students with different cognitive capabilities join the learning process. However, it is challenging for an instructor to identify each learner's level of comprehension, particularly when learners waver in responding to feedback. A learner's facial expressions relate to content comprehension and engagement. This paper presents the use of the vision transformer (ViT) for automatic estimation of student engagement, learning features end-to-end from facial images. The ViT architecture enlarges the receptive field by exploiting multi-head attention operations. The model is trained with various loss functions to handle class imbalance. Evaluated on the Dataset for Affective States in E-Environments (DAiSEE), the ViT outperforms the frame-level baseline by approximately 8% and two video-level benchmarks by 8.78% and 2.78%, achieving an overall accuracy of 55.18%. In addition, the ViT trained with focal loss produced a well-balanced prediction distribution across classes, except for one minority class. © 2022 IEEE.
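The abstract names two technical ingredients, a ViT backbone whose multi-head attention gives a global receptive field over the face image, and focal loss to counter class imbalance, but the record carries no code. Below is a minimal sketch of how these pieces typically combine, assuming a PyTorch/timm setup and the four DAiSEE engagement levels; the vit_base_patch16_224 backbone and the gamma value are illustrative assumptions, not details taken from the paper.

# A minimal sketch (not the authors' code): a pretrained ViT backbone
# with a 4-class head for the DAiSEE engagement levels, trained with
# multi-class focal loss to counter class imbalance. Backbone name and
# hyperparameters are illustrative assumptions.
import timm
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    """Multi-class focal loss (Lin et al., 2017): scales cross-entropy
    by (1 - p_t)^gamma so easy, well-classified examples are
    down-weighted and minority classes contribute relatively more."""

    def __init__(self, gamma: float = 2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample CE
        pt = torch.exp(-ce)  # model's probability for the true class
        return ((1.0 - pt) ** self.gamma * ce).mean()


# ViT-Base/16 is an assumption; the record does not specify the variant.
model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=4)
criterion = FocalLoss(gamma=2.0)

faces = torch.randn(8, 3, 224, 224)   # batch of cropped face images
labels = torch.randint(0, 4, (8,))    # engagement levels 0..3
loss = criterion(model(faces), labels)
loss.backward()

Because the focal term (1 - p_t)^gamma shrinks toward zero for confidently correct predictions, gradient mass shifts to hard and minority-class examples, which is consistent with the abstract's observation of a better-balanced distribution across all but one minority class.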
Full text: Available | Collection: Databases of international organizations | Database: Scopus | Language: English | Journal: 5th IEEE International Image Processing, Applications and Systems Conference, IPAS 2022 | Year: 2022 | Document Type: Article