Exploring the effects of the statewide implementation of an automated writing evaluation system among K-12 students

University of Delaware
Automated writing evaluation (AWE) is a cutting-edge technology-based intervention designed to help teachers meet the challenges of writing instruction and improve students’ writing proficiency. The rapid development of AWE systems, along with the encouragement of technology use in U.S. K–12 education by the Common Core State Standards (CCSS), has led to increased adoption of AWE systems by schools and districts. However, implementation has largely outpaced research (Shermis et al., 2016; Xi, 2010), especially with regard to rigorous evaluations of AWE efficacy. This dissertation study investigates the naturalistic statewide implementation of an AWE system known as Utah Compose (UC) and uses quasi-experimental designs to rigorously explore the short-term and long-term effects of UC usage on students’ performance on the state English Language Arts (ELA) test.

Specifically, this study employed descriptive analyses and hierarchical linear modeling (HLM) to understand the users and usage of UC in Grades 3–11 across five school years, from 2015 to 2019. To compare the effects of greater UC usage on students’ ELA performance with those of less UC usage and no UC usage in a single year of implementation (school year 2015), this study used a propensity score weighting method. In addition, the inverse probability of treatment weighting (IPTW) method was adopted to examine the cumulative effects of UC implementation over four school years (2015–2018) on students’ ELA performance.

Findings showed that UC was widely used in the state in all years, but usage varied across grades. Most UC users used only its main feature, writing prompt practice, and did not use the other embedded features. UC usage was shaped by factors not only at the student level but also at the teacher and school levels. Regarding UC’s single-year effect, the results showed that students with greater UC usage performed better in ELA than they would have been expected to perform had they not been exposed to UC at all (effect size = 0.17) or had they used UC less frequently (effect size = 0.11). Furthermore, English language learners, students in special education programs, and low-income students with greater UC usage benefited more in ELA performance, but the improvements were not sufficient to close the performance gap between them and their peers. Additionally, there was a cumulative benefit to students who used UC repeatedly, but the added benefit diminished each year, with the cumulative effect peaking after three years of implementation.

This study is the largest evaluation of AWE effects to date in terms of both sample size and scope. The findings regarding AWE usage and its causal effects on students’ ELA performance, a distal and important outcome at the state level, collectively have significant implications for policy and practice regarding large-scale AWE implementation.
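For readers unfamiliar with the weighting approach named in the abstract, the following is a minimal single-period sketch of inverse probability of treatment weighting and a weighted standardized effect size. It is an illustration only and is not the dissertation's actual model: the function names, the toy scores, and the propensity values are all hypothetical, the propensity scores are assumed to have been estimated elsewhere, and the choice of an unweighted pooled standard deviation as the denominator is an assumption.

```python
from math import sqrt
from statistics import variance


def iptw_weights(treated, propensity):
    """Average-treatment-effect weights: 1/p if treated, 1/(1-p) otherwise."""
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / p if t else 1.0 / (1.0 - p))
    return weights


def weighted_mean(values, weights):
    """Mean of values where each observation counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_effect_size(outcome, treated, weights):
    """Weighted mean difference divided by the unweighted pooled SD (an assumption)."""
    t_idx = [i for i, t in enumerate(treated) if t]
    c_idx = [i for i, t in enumerate(treated) if not t]
    y_t = [outcome[i] for i in t_idx]
    y_c = [outcome[i] for i in c_idx]
    mean_t = weighted_mean(y_t, [weights[i] for i in t_idx])
    mean_c = weighted_mean(y_c, [weights[i] for i in c_idx])
    n_t, n_c = len(y_t), len(y_c)
    pooled_sd = sqrt(
        ((n_t - 1) * variance(y_t) + (n_c - 1) * variance(y_c)) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd


if __name__ == "__main__":
    # Hypothetical toy data: 1 = greater usage, 0 = comparison group.
    treated = [1, 1, 1, 0, 0, 0]
    propensity = [0.8, 0.6, 0.5, 0.4, 0.3, 0.2]  # assumed, estimated elsewhere
    scores = [310.0, 295.0, 300.0, 290.0, 285.0, 280.0]
    w = iptw_weights(treated, propensity)
    print(weighted_effect_size(scores, treated, w))
```

In a multi-year analysis such as the cumulative one described above, weights of this kind are typically built per period and multiplied across periods; the sketch covers only the single-period case.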
Discourse, Mathematics writing, Teachers' orientations, Assessing questions, Teacher noticing