Automated assessment of non-verbal social behaviors in educational contexts using deep learning frameworks

Date
2022
Publisher
University of Delaware
Abstract
Non-verbal social behaviors, including gaze movements, facial expressions, and body gestures, help educators and trainers measure students' social interactions and evaluate their learning performance during the educational process. Collecting non-verbal signals between students and teachers or collaborators in a co-located learning setting demands substantial time and effort from researchers, who must manually collect, monitor, and analyze students' behaviors throughout the learning process. As educational technologies develop rapidly, it is critical to devise more efficient and reliable tools that reduce annotation costs and automatically recognize students' non-verbal social behavior states in educational environments. Over the past two decades, deep learning-based computer vision methods have made significant progress, exhibiting superior capability in visual feature extraction and driving intelligent applications in multiple disciplines without human intervention; this growth has been fueled by advances in big data, computational power, and algorithms.

This dissertation employs computer vision methods with deep learning-based frameworks to automatically extract non-verbal social interaction features from video recordings captured during the learning process, and analyzes students' learning performance based on the detected features in (special) educational contexts. The non-verbal cues we focus on are joint attention and mutual attention (collectively, social visual behaviors) in two target subgroups: university students and children with autism. The deep learning-based frameworks used in this dissertation are state-of-the-art methods that had not previously been applied to these educational domain problems. The research outcomes of our work include the following quantifiable, objective measures and computational models for social interaction measurement and learning performance analytics.

Firstly, we used an object detection method based on the Mask R-CNN framework to detect and locate students and their learning tools in image data collected from a team-based anatomy learning activity. We then proposed a method for quantifying physical proximity from the students' locations in the activity room and evaluated collaborative actions based on the team's physical proximity dynamics. Despite its strengths for small-group activity analysis, this proximity-based measure cannot handle static, seated settings such as standard classrooms, so we investigated social and collaborative metrics beyond physical proximity, including gaze-related indicators. As the next step, we examined a gaze point prediction method using the Gaze Following framework for joint visual attention (JVA) measurement and team performance evaluation, with image data from the collaborative anatomy learning activity as a test case. We found that the collaborators' JVA frequency reliably distinguished the study conditions in most educational scenarios. We later introduced an automated assessment tool for social visual behaviors based on a mutual gaze detection method, together with an in-house autism dataset collected from a therapeutic intervention in Dr. Bhat's research group. The mutual gaze ratio computed from the detection outcomes was comparable to the social visual behavior scores hand-coded by therapy experts.
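To make the proximity and JVA measures above concrete, here is a minimal sketch, not code from the dissertation: it assumes a detector (e.g., Mask R-CNN) has already produced per-frame person centroids and a gaze-following model has produced per-frame gaze points, and the function names, input formats, and the 50-pixel JVA threshold are all illustrative assumptions.

```python
import numpy as np

def pairwise_proximity(centers):
    """Mean pairwise Euclidean distance between detected person
    centroids in one frame. Lower values mean the team is
    physically closer together."""
    centers = np.asarray(centers, dtype=float)
    n = len(centers)
    if n < 2:
        return float("nan")  # proximity is undefined for a lone person
    dists = [np.linalg.norm(centers[i] - centers[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

def jva_frequency(gaze_points_per_frame, radius=50.0):
    """Fraction of frames in which at least two collaborators'
    predicted gaze points fall within `radius` pixels of each
    other (a simple proxy for joint visual attention)."""
    jva_frames = 0
    for points in gaze_points_per_frame:
        pts = np.asarray(points, dtype=float)
        pairs_close = any(
            np.linalg.norm(pts[i] - pts[j]) <= radius
            for i in range(len(pts)) for j in range(i + 1, len(pts)))
        jva_frames += int(pairs_close)
    return jva_frames / max(len(gaze_points_per_frame), 1)
```

Tracking `pairwise_proximity` over time gives the proximity dynamics used for the collaboration analysis, while `jva_frequency` aggregates per-frame gaze agreement into a JVA frequency measure; in practice, both the distance threshold and the aggregation window would need to be calibrated against the actual classroom footage.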
Lastly, we introduced a social visual behavior analytics approach based on an advanced mutual gaze detection method and an expanded autism dataset. We found that mutual gaze behavior can yield sound non-verbal social behavior measures for learning analytics, especially in special education contexts.

Our contributions include providing multiple automated non-verbal social behavior assessment tools that assist with, or replace, traditional hand-coded annotation; applying computer vision approaches with state-of-the-art deep learning-based frameworks to new domain problems in educational contexts; generating objective non-verbal social behavior indicators for education and social behavior analytics; and applying different learning analysis approaches to evaluate our proposed methods. Our findings have implications for teaching and learning technologies in educational environments, special training, and autism therapy analysis by offering novel assessment tools and analytic approaches for non-verbal behaviors. Beyond educational contexts, our work can also be applied to any scenario that demands reliable automatic behavior analytics from video or image data involving human-human or human-virtual-agent social interactions.
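As a companion to the sketch above, the mutual gaze ratio described in the abstract reduces to a simple proportion once a mutual gaze detector has labeled each frame; the sketch below only illustrates that reduction under an assumed interface (a per-frame binary sequence) and is not the dissertation's implementation.

```python
def mutual_gaze_ratio(frame_labels):
    """Proportion of frames labeled as containing mutual gaze.

    `frame_labels` is a per-frame binary sequence (1 = mutual gaze
    detected between the two interaction partners, 0 = not detected);
    the mutual gaze detector itself is assumed, not shown here.
    """
    if not frame_labels:
        return 0.0
    return sum(frame_labels) / len(frame_labels)

# Example: a 10-frame clip with mutual gaze detected in 4 frames
print(mutual_gaze_ratio([0, 1, 1, 0, 0, 1, 0, 0, 1, 0]))  # 0.4
```

A ratio of this form can then be compared directly against expert hand-coded social visual behavior scores, as in the evaluation summarized above.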
Keywords
Autism, Computer vision, Deep learning, Educational technology, Human computer interaction, Social behavior analysis