Automatic scoring method for essays based on label embedding
Song Chao 1, Ren Ge 1, Song Yinzhong 1, Liu Junjie 1, Yang Yong 1
Author information
- 1. School of Computer Science and Technology, Xinjiang Normal University, Urumqi 830054
Abstract
Current automatic essay scoring methods often rely on large pre-trained models to obtain semantic features. Because the pre-training corpus does not match the domain characteristics of essays, and because these models extract features poorly from long essays, the performance of such methods is unsatisfactory. This paper proposes an automatic essay scoring method based on label embedding. It uses an improved BiLSTM network and a BERT model to extract the domain features and abstract features of essays, applies a gating mechanism to adjust the influence of each on the score, and finally scores the essay after feature fusion. Experimental results show that the proposed model performs significantly better on the essay scoring dataset of the Kaggle ASAP competition, reaching an average QWK of 81.22%, which verifies the effectiveness of the label embedding approach for automatic essay scoring.
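The QWK figure reported in the abstract is the quadratic weighted kappa, an agreement measure between predicted and human-assigned scores that penalizes larger disagreements more heavily. The following is a minimal sketch of the standard definition, not the authors' evaluation code; the function name, its arguments, and the toy scores are assumptions made for illustration.

```python
import numpy as np


def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Quadratic weighted kappa between two lists of integer ratings.

    Hypothetical helper for illustration; the rating range is passed in
    explicitly because score ranges differ from one essay prompt to another.
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    n = max_rating - min_rating + 1
    if n < 2:
        # Only one possible rating: agreement is trivially perfect.
        return 1.0

    # Observed confusion matrix: rows are human scores, columns are predictions.
    observed = np.zeros((n, n), dtype=float)
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating, p - min_rating] += 1

    # Expected matrix under independence, scaled to the same total count.
    hist_true = observed.sum(axis=1)
    hist_pred = observed.sum(axis=0)
    expected = np.outer(hist_true, hist_pred) / len(y_true)

    # Quadratic disagreement weights: 0 on the diagonal, 1 at the far corners.
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

    denom = (weights * expected).sum()
    if denom == 0:
        # All ratings fall in a single category for both raters.
        return 1.0
    return 1.0 - (weights * observed).sum() / denom


if __name__ == "__main__":
    human = [1, 2, 3, 4, 2, 3]   # toy scores, not taken from the ASAP data
    model = [1, 2, 3, 3, 2, 4]
    print(quadratic_weighted_kappa(human, model, min_rating=1, max_rating=4))
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement. An average QWK such as the one in the abstract would be obtained by computing this value separately for each essay prompt and then averaging, assuming the usual ASAP evaluation protocol.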
Key words
computer application techniques/pre-trained embedding/label embedding/feature fusion/natural language processing
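Funding
Natural Science Foundation of Xinjiang Uygur Autonomous Region (2021D01B72)
National Natural Science Foundation of China (62066044)
National Natural Science Foundation of China (62167008)
National Natural Science Foundation of China, funding program for young researchers (62006130)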
Publication year
2024