
Multi-label Classification of Digital Archives Based on the ALBERT-Seq2Seq-Attention Model

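Existing multi-label classification methods for digital archives do not model the correlations between classification labels. To address this problem, this paper proposes ALBERT-Seq2Seq-Attention, a deep neural network model for multi-label archive classification. The model first uses the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) pre-trained language model to extract text feature vectors that carry contextual semantic information. These features are then used as the input sequence of a Seq2Seq-Attention (sequence-to-sequence with attention) model, and a label dictionary is constructed to capture the associations among multiple labels. In comparative experiments on three datasets, the model achieved an F1 score above 90% on every dataset. The model therefore improves multi-label classification of archive text while also accounting for the correlations between labels.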
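The abstract frames multi-label classification as sequence generation: ALBERT encoder features feed a Seq2Seq decoder with attention, and a label dictionary lets the decoder emit labels one at a time, so each prediction can depend on the labels already emitted. The Python sketch below illustrates only the parts that can be shown without a trained model: building a label dictionary, converting a label set into a decoder target sequence and back, one dot-product attention step over encoder vectors, and F1 scoring. It is not the authors' implementation. The special tokens, the frequency-based label ordering, the dot-product scoring, the micro-averaged F1, and the example category names are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical special tokens; the abstract does not list the real ones.
PAD, BOS, EOS = "<pad>", "<bos>", "<eos>"


def build_label_dict(label_sets):
    """Map each label to an integer id, most frequent labels first (assumed ordering)."""
    counts = Counter(label for labels in label_sets for label in labels)
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))  # ties broken alphabetically
    return {token: idx for idx, token in enumerate([PAD, BOS, EOS] + ordered)}


def labels_to_target(labels, label2id):
    """Turn an unordered label set into a decoder target: <bos>, labels..., <eos>."""
    ordered = sorted(labels, key=lambda lab: label2id[lab])
    return [label2id[BOS]] + [label2id[lab] for lab in ordered] + [label2id[EOS]]


def target_to_labels(ids, label2id):
    """Read generated ids back into a label set, stopping at the first <eos>."""
    id2label = {idx: token for token, idx in label2id.items()}
    labels = set()
    for idx in ids:
        token = id2label[idx]
        if token == EOS:
            break
        if token not in (PAD, BOS):
            labels.add(token)
    return labels


def attention(query, keys):
    """One dot-product attention step of a decoder state over encoder vectors.

    `keys` stands in for the per-token features an ALBERT encoder would produce.
    Returns the attention weights and the weighted context vector.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # shift by max for numerical stability
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(keys[0]))]
    return weights, context


def micro_f1(predicted, gold):
    """Micro-averaged F1 over paired predicted/gold label sets (averaging scheme assumed)."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / sum(len(p) for p in predicted)
    recall = tp / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy archive label sets; the category names are invented for illustration.
    train = [{"finance", "personnel"}, {"finance"}, {"teaching", "finance"}]
    label2id = build_label_dict(train)
    print(labels_to_target({"teaching", "finance"}, label2id))  # [1, 3, 5, 2]
    print(target_to_labels([3, 5, 2, 4], label2id))  # only labels before <eos>
    print(micro_f1([{"finance", "teaching"}], [{"finance", "personnel"}]))  # 0.5
```

Giving each label set a fixed target order makes the decoder's training target deterministic, and because generation stops at `<eos>`, the number of predicted labels varies per document instead of being set by a probability threshold.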

Keywords: ALBERT; Seq2Seq; Attention; multi-label classification; digital archives

Authors: Wang Shaoyang (王少阳), Cheng Xinmin (成新民), Wang Ruiqin (王瑞琴), Chen Jingwen (陈静雯), Zhou Yang (周阳), Fei Zhigao (费志高)


School of Information Engineering, Huzhou University (湖州师范学院), Huzhou, Zhejiang 313000, China


Funding: National Natural Science Foundation of China (62277016); Huzhou University Graduate Research and Innovation Project (2022KYCX45)

Journal of Huzhou University (湖州师范学院学报)
Sponsor: Huzhou University (湖州师范学院)
Year, Vol. (Issue): 2024, 46(2)
ISSN: 1009-1734
Impact factor: 0.301