Content related to the keyword

Human raters


1.

Combined Effects of Task Sequencing and Corrective Feedback on EFL Learners’ Writing: a comparison between human raters and ChatGPT (Ministry of Science accredited article)

Keywords: ChatGPT, Human raters, Correlation, corrective feedback, Automated Essay Scoring

The study, derived from a larger project, examined how effective ChatGPT is, compared to human raters, for scoring writing tasks when the tasks were arranged from simple to complex or vice versa. A correlational design was employed. The participants were 113 EFL learners. Two sets of writing tasks were designed based on the SSARC (simplify, stabilize, automatize, reconstruct, complexify) model. The participants were divided into two groups, took a pre-test, and completed the tasks in two different orders. The tasks were corrected by the researcher and returned to the participants, who then revised their texts based on the comments. After that, they took a posttest. Human raters and ChatGPT scored the pretests and posttests. A Pearson correlation test was run to obtain the correlation between the human raters and ChatGPT. The results indicated a strong positive correlation between scores assigned by human raters and those assigned by ChatGPT both when tasks were arranged from simple to complex (r = .968, p < .05) and from complex to simple (r = .860, p < .05). These findings suggest that ChatGPT can be an effective tool for writing assessments. Suggestions for further research are discussed.
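As a minimal illustration only (not the authors' scripts, and using invented placeholder scores rather than the study's data), the Pearson correlation analysis described in this abstract can be reproduced in Python with scipy.stats.pearsonr:

    # Sketch of the Pearson correlation between human-rater and ChatGPT scores.
    # The score lists are hypothetical placeholders, not data from the study.
    from scipy.stats import pearsonr

    human_scores   = [14.0, 15.5, 12.0, 16.5, 13.0, 17.0, 11.5, 15.0]  # hypothetical human-rater scores
    chatgpt_scores = [13.5, 15.0, 12.5, 16.0, 13.5, 17.5, 11.0, 14.5]  # hypothetical ChatGPT scores

    r, p = pearsonr(human_scores, chatgpt_scores)
    print(f"r = {r:.3f}, p = {p:.4f}")  # r close to 1 indicates strong rater-machine agreement

A value of r near 1 with p below .05, as reported in the abstract, would indicate that the two sets of scores rank the learners in essentially the same way.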
2.

Machine or Human?: An Inspection of the Scoring of Writing Essays


Keywords: automated writing evaluation, Essay Writing, Human raters, Machine Raters, Perceptions about AWE

Automated Writing Evaluation (AWE) systems are used to evaluate measurable characteristics of written texts, thereby creating a scoring model based on a compilation of essays. While considerable research has focused on the feedback provided by AWE systems, there is a conspicuous absence of studies examining these tools specifically in the Iranian context. Therefore, this research aimed to investigate the consistency of scores obtained from automated systems and human raters. Furthermore, it sought to explore the perceptions of EFL learners regarding the application of AWE in their writing practices. To facilitate this investigation, 30 male and female IELTS students participated, each writing two essays: one selected from topics provided by the AWE system and the other derived from Cambridge Official IELTS past papers. The essays were assessed by both My Access and three human raters. For the topics designated for the AWE system, a significant and robust positive correlation was identified between the ratings assigned by human raters and the machine. A similar significant and strong positive correlation was also found for the second essay, which did not utilize pre-defined topics. The results of two linear regression analyses demonstrated that the scores produced by the machine could significantly predict human scores for both pre-defined and non-pre-defined topics. Additionally, the findings indicated that My Access Home Edition is perceived to significantly enhance students' accuracy and autonomy, although it does not contribute to improved interaction. This study presents important implications for writing instructors and the field of second language education.
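As a hedged sketch of the regression analysis mentioned in this abstract (the machine and human scores below are hypothetical, not the study's My Access or rater data), a simple linear regression predicting human scores from machine scores can be run with scipy.stats.linregress:

    # Sketch of a simple linear regression: machine (AWE) score -> human score.
    # All values are invented placeholders.
    from scipy.stats import linregress

    machine_scores = [5.5, 6.0, 6.5, 7.0, 5.0, 7.5, 6.0, 8.0]  # hypothetical AWE band scores
    human_scores   = [5.5, 6.5, 6.5, 7.0, 5.5, 7.0, 6.5, 7.5]  # hypothetical averaged human-rater scores

    fit = linregress(machine_scores, human_scores)
    print(f"human ~ {fit.slope:.2f} * machine + {fit.intercept:.2f}")
    print(f"R^2 = {fit.rvalue**2:.3f}, p = {fit.pvalue:.4f}")  # a small p means machine scores significantly predict human scores

A significant slope with a high R^2, as the abstract reports for both pre-defined and non-pre-defined topics, would indicate that the machine scores are a useful predictor of the human ratings.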