EXPLORING LARGE LANGUAGE MODELS IN AUTOMATED SCORING OF STUDENTS’ COGNITION AND AFFECT IN PROBLEM-POSING BASED LEARNING
Abstract
This dissertation investigates the potential of Large Language Models (LLMs) to support educational research by automatically analyzing students’ cognitive and affective data in mathematics collected within a problem-posing project. Two studies examine whether LLMs can approximate expert human judgments and how prompt design influences their performance. Study 1 evaluates the use of LLMs for quantitative scoring and qualitative coding of 1,116 student responses, covering holistic quality, mathematical complexity, linguistic structure, and unexpected responses. Baseline prompts produced moderate agreement with expert coders. After systematic prompt refinement, which included clarifying rubric language, resolving ambiguities, and incorporating implicit human coding patterns, performance improved substantially, reaching Cohen’s kappa values of 0.749 to 0.929. Study 2 investigates LLM-based scoring of 814 student descriptions of their affect toward mathematics. Three prompt conditions were tested, ranging from baseline rubrics to refined prompts that embedded common human scoring patterns. The most refined prompts achieved high agreement with human ratings (Cohen’s kappa = 0.819 to 0.845). Effective strategies included specifying the scoring rubrics precisely, attending to intensifier cues, applying majority–minority reasoning, and accommodating grammatical variation. Across both studies, the results show that LLM accuracy can be substantially improved through prompt engineering that mirrors human reasoning. The findings provide methodological guidance for educational researchers and foundational evidence for using LLMs to support near-real-time cognitive and affective analysis.
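Both studies report LLM-human agreement as Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of items on which the two raters agree and p_e is the agreement expected by chance from each rater's label frequencies. The abstract does not say whether an unweighted or a weighted variant was used, so the minimal sketch below assumes unweighted kappa. The function name `cohens_kappa` and the score lists are hypothetical illustrations, not data or code from the studies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters who labeled the same items (assumed variant)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same, non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: share of items that received identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_e == 1:
        # Kappa is undefined when chance agreement is 1 (both raters used one identical label).
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical rubric scores (0-2) from a human coder and an LLM on eight items.
human = [2, 1, 0, 2, 1, 1, 0, 2]
llm = [2, 1, 0, 1, 1, 1, 0, 2]
print(round(cohens_kappa(human, llm), 3))  # prints 0.81
```

Here the raters agree on 7 of 8 items (p_o = 0.875), but their label frequencies alone would produce p_e = 0.34375 by chance, so kappa (about 0.81) is lower than raw percent agreement. This chance correction is why kappa, rather than simple accuracy, is the usual yardstick for comparing an automated scorer with expert coders.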
