Foreign Language World (外语界), 2024(2): 71-80.

A Task-Based Validity Argument for Human-Machine Collaborative Scoring of Speaking: The Case of CET-SET

张晓艺¹ 王伟² 杨浩然³

Author Information

  • 1. College of Foreign Languages and Literature, Fudan University, Shanghai 200433
  • 2. National Education Examinations Authority (教育部教育考试院), Beijing 100084
  • 3. School of Foreign Languages, Shanghai Jiao Tong University, Shanghai 200240

Abstract

Based on the interactionalist speaking construct, this study validates human-machine collaboration in rating open-ended speaking tasks with an automated scoring system for the College English Test-Spoken English Test (CET-SET), in which test takers' performances are scored both analytically and holistically on rating scales at the task level. The findings show that the automated scoring system assigns scores in relation to task features and test purposes, demonstrating a high level of rating accuracy. In the human-machine collaboration mode, a large portion of the score variance could be attributed to the speaking-ability factors deemed essential for task completion. When applying automated scoring to the large-scale rating of speaking tests, it is suggested that task-based analytic scoring be used in setting gold standards for machine learning, and that task-based holistic scoring be adopted for human-machine collaboration in large-scale rating sessions, in order to facilitate score interpretation and ensure rating efficiency.
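The validation the abstract describes rests on comparing machine-assigned scores against human gold-standard scores at the task level. A minimal, self-contained sketch of the kind of agreement statistics commonly reported in such studies (exact agreement, adjacent agreement, and Pearson correlation) is shown below; the function names and all score data are hypothetical illustrations, not the CET-SET system or the study's actual results.

```python
# Illustrative only: comparing machine scores against human "gold standard"
# scores for one speaking task, using agreement indices commonly reported
# in automated-scoring validation studies. All data are made up.

def exact_agreement(human, machine):
    """Proportion of responses where human and machine scores match exactly."""
    return sum(h == m for h, m in zip(human, machine)) / len(human)

def adjacent_agreement(human, machine, tolerance=1):
    """Proportion of responses where scores differ by at most `tolerance`."""
    return sum(abs(h - m) <= tolerance for h, m in zip(human, machine)) / len(human)

def pearson_r(x, y):
    """Pearson correlation between two score sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical 5-point holistic scores for ten responses to one task.
human   = [3, 4, 2, 5, 3, 4, 3, 2, 4, 5]
machine = [3, 4, 3, 5, 3, 4, 2, 2, 4, 4]

print(exact_agreement(human, machine))             # 0.7
print(adjacent_agreement(human, machine))          # 1.0
print(round(pearson_r(human, machine), 3))         # 0.852
```

In practice, studies of this kind often report quadratic weighted kappa as well, which penalizes large human-machine discrepancies more heavily than adjacent ones.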

Keywords

College English Test / speaking test / automated scoring / interactionalist construct theory


Funding

Youth Project of the Humanities and Social Sciences Research Fund of the Ministry of Education, 2022 (22YJC740102)

Publication Year: 2024
Journal: Foreign Language World (外语界)
Publisher: Shanghai International Studies University

Indexed in: CSTPCD, CSSCI, CHSSCD, PKU Core Journals (北大核心)
Impact Factor: 6.117
ISSN: 1004-5112
References: 27