Automatic Essay Scoring Method Based on Label Embedding

Song Chao (宋超)¹, Ren Ge (任鸽)¹, Song Yinzhong (宋银忠)¹, Liu Junjie (柳骏杰)¹, Yang Yong (杨勇)¹

Author information

1. College of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054, China

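Abstract

Current automatic essay scoring methods often rely on large pre-trained models to obtain semantic features. Because the pre-training corpora do not match the domain characteristics of essays, and because these models extract features poorly from long essays, such methods perform unsatisfactorily. This paper proposes an automatic essay scoring method based on label embedding. It uses an improved BiLSTM network and a BERT model to extract domain features and abstract features of essays, adjusts the influence of each on the score with a gating mechanism, and finally scores essays by fusing the features. Experimental results show that the proposed model performs significantly better on the automatic essay scoring dataset of the Kaggle ASAP competition, reaching an average QWK (quadratic weighted kappa) of 81.22%, which verifies the effectiveness of label embedding for the automatic essay scoring task.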
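The abstract does not give the equations of the gating mechanism. As a minimal sketch, assuming a common element-wise sigmoid gate computed from the concatenation of a BiLSTM (domain) feature vector and a BERT (abstract) feature vector, the fusion and a final linear scoring step could look like this. The names `gated_fusion`, `w_gate`, `b_gate` and `linear_score` are hypothetical, and plain Python lists stand in for trained tensors.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    h_domain: Sequence[float],
    h_abstract: Sequence[float],
    w_gate: Sequence[Sequence[float]],
    b_gate: Sequence[float],
) -> List[float]:
    """Fuse two equal-length feature vectors with an element-wise gate.

    Assumed form (an illustration, not taken from the paper):
        g = sigmoid(W_g [h_domain; h_abstract] + b_g)
        h = g * h_domain + (1 - g) * h_abstract
    w_gate holds one row of length 2 * dim per output dimension.
    """
    if len(h_domain) != len(h_abstract):
        raise ValueError("feature vectors must have the same dimension")
    concat = list(h_domain) + list(h_abstract)
    fused = []
    for i, (d, a) in enumerate(zip(h_domain, h_abstract)):
        # Gate value for dimension i, computed from both feature vectors.
        g = sigmoid(sum(w * x for w, x in zip(w_gate[i], concat)) + b_gate[i])
        # g -> 1 favours the domain feature, g -> 0 the abstract feature.
        fused.append(g * d + (1.0 - g) * a)
    return fused


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical regression head mapping fused features to a raw essay score."""
    return sum(w * f for w, f in zip(weights, features)) + bias


if __name__ == "__main__":
    h_d = [1.0, 2.0]  # stand-in for a BiLSTM (domain) feature vector
    h_a = [3.0, 4.0]  # stand-in for a BERT (abstract) feature vector
    zero_w = [[0.0] * 4 for _ in range(2)]
    fused = gated_fusion(h_d, h_a, zero_w, [0.0, 0.0])
    print(fused)  # [2.0, 3.0]: zero gate parameters give g = 0.5, an equal blend
    print(linear_score(fused, [1.0, 1.0], 0.5))  # 5.5
```

With all gate parameters at zero the gate sits at 0.5 and the two feature sources contribute equally; training would move each dimension's gate toward whichever source is more useful for scoring.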

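The reported metric, QWK, is Cohen's kappa with quadratic weights: with N score levels, observed count matrix O and expected matrix E (outer product of the two raters' score histograms divided by the number of essays), κ = 1 − Σ w_ij O_ij / Σ w_ij E_ij, where w_ij = (i − j)² / (N − 1)². An average QWK of 81.22% therefore corresponds to κ ≈ 0.8122. A minimal pure-Python sketch of this standard definition follows; the function name `quadratic_weighted_kappa` is hypothetical, integer scores are assumed, and the abstract does not say how the per-prompt values are averaged.

```python
from typing import List, Sequence


def quadratic_weighted_kappa(
    rater_a: Sequence[int], rater_b: Sequence[int], min_rating: int, max_rating: int
) -> float:
    """Cohen's kappa with quadratic weights w_ij = (i - j)^2 / (N - 1)^2."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if any(not min_rating <= r <= max_rating for r in list(rater_a) + list(rater_b)):
        raise ValueError("rating outside [min_rating, max_rating]")
    n = max_rating - min_rating + 1
    # Observed matrix O[i][j]: essays scored i by rater A and j by rater B.
    observed: List[List[int]] = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    # Marginal score histograms of each rater.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    total = len(rater_a)
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2 if n > 1 else 0.0
            # Expected count if the raters were independent with these marginals.
            expected = hist_a[i] * hist_b[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    # If no weighted disagreement is possible, treat agreement as perfect.
    return 1.0 - numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    human = [2, 3, 4, 4, 1]  # illustrative human scores on a 1-4 scale
    model = [2, 3, 3, 4, 2]  # illustrative model predictions
    print(round(quadratic_weighted_kappa(human, model, 1, 4), 4))  # 0.7917
```

The quadratic weights make a prediction that is off by two score points cost four times as much as one that is off by a single point, which is why QWK is preferred over plain accuracy for ordinal essay scores.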

Keywords

computer application technology/pre-trained embedding/label embedding/feature fusion/natural language processing

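Funding

Natural Science Foundation of Xinjiang Uygur Autonomous Region (2021D01B72)

National Natural Science Foundation of China (62066044)

National Natural Science Foundation of China (62167008)

Young Scientists Fund of the National Natural Science Foundation of China (62006130)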

Publication year

2024

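Journal: Information Technology (信息技术)

Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center of the Ministry of Information Industry of China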

CSTPCD

Impact factor: 0.413

ISSN: 1009-2552

Number of references: 23
