ABSTRACT
Knowing the difficulty of a given task is crucial for improving the learning outcomes. This paper studies the difficulty level classification of memorization tasks from pupillary response data. Developing a difficulty level classifier from pupil size features is challenging because of the inter-subject variability of pupil responses. Eye-tracking data used in this study was collected while students solved different memorization tasks divided as low-, medium-, and high-level. Statistical analysis shows that values of pupillometric features (as peak dilation, pupil diameter change, and suchlike) differ significantly for different difficulty levels. We used a wrapper method to select the pupillometric features that work the best for the most common classifiers; Support Vector Machine (SVM), Decision Tree (DT), Linear Discriminant Analysis (LDA), and Random Forest (RF). Despite the statistical difference, experiments showed that a random forest classifier trained with five features obtained the best F1-score (82%). This result is essential because it describes a method to evaluate the cognitive load of a subject performing a task using only pupil size features.
Subject(s)
Memory, Short-Term , Pupil , Humans , Memory, Short-Term/physiology , Pupil/physiology , LearningABSTRACT
BACKGROUND: A learning task recurrently perceived as easy (or hard) may cause poor learning results. Gamer data such as errors, attempts, or time to finish a challenge are widely used to estimate the perceived difficulty level. In other contexts, pupillometry is widely used to measure cognitive load (mental effort); hence, this may describe the perceived task difficulty. OBJECTIVE: This study aims to assess the use of task-evoked pupillary responses to measure the cognitive load measure for describing the difficulty levels in a video game. In addition, it proposes an image filter to better estimate baseline pupil size and to reduce the screen luminescence effect. METHODS: We conducted an experiment that compares the baseline estimated from our filter against that estimated from common approaches. Then, a classifier with different pupil features was used to classify the difficulty of a data set containing information from students playing a video game for practicing math fractions. RESULTS: We observed that the proposed filter better estimates a baseline. Mauchly's test of sphericity indicated that the assumption of sphericity had been violated (χ214=0.05; P=.001); therefore, a Greenhouse-Geisser correction was used (ε=0.47). There was a significant difference in mean pupil diameter change (MPDC) estimated from different baseline images with the scramble filter (F5,78=30.965; P<.001). Moreover, according to the Wilcoxon signed rank test, pupillary response features that better describe the difficulty level were MPDC (z=-2.15; P=.03) and peak dilation (z=-3.58; P<.001). A random forest classifier for easy and hard levels of difficulty showed an accuracy of 75% when the gamer data were used, but the accuracy increased to 87.5% when pupillary measurements were included. CONCLUSIONS: The screen luminescence effect on pupil size is reduced with a scrambled filter on the background video game image. Finally, pupillary response data can improve classifier accuracy for the perceived difficulty of levels in educational video games.