Journal of Huzhou University, 2024, Vol. 46, Issue 2: 65-72.

Multi-label Classification of Digital Archives Based on the ALBERT-Seq2Seq-Attention Model

Wang Shaoyang, Cheng Xinmin, Wang Ruiqin, Chen Jingwen, Zhou Yang, Fei Zhigao

Author information

  • 1. School of Information Engineering, Huzhou University, Huzhou, Zhejiang 313000, China

Abstract

Existing multi-label classification methods for digital archives do not model the correlations between classification labels. To address this problem, a deep neural network model for multi-label archive classification, ALBERT-Seq2Seq-Attention, is proposed. The model first uses the multi-layer bidirectional Transformer structure inside the ALBERT (A Lite BERT) pre-trained language model to extract text feature vectors and capture contextual semantic information. The extracted text features then serve as the input sequence of the Seq2Seq-Attention (Sequence to Sequence-Attention) model, and a label dictionary is constructed to capture the correlations among labels. Comparative experiments on three datasets show that the model's F1 score exceeds 90% on each of them. The model thus not only improves multi-label classification of archive text but also accounts for the correlations between labels.
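The abstract mentions a label dictionary feeding a Seq2Seq decoder and reports results as F1 scores, without giving details. The sketch below is one possible reading, not the authors' implementation: it assumes labels are ordered by descending frequency, that hypothetical `<bos>`/`<eos>` tokens delimit each target sequence, and that F1 is micro-averaged. The label names are invented for illustration.

```python
from collections import Counter

BOS, EOS = "<bos>", "<eos>"  # hypothetical special tokens (assumption)


def build_label_dict(label_sets):
    """Map each label to an integer id; more frequent labels get smaller ids."""
    counts = Counter(label for labels in label_sets for label in labels)
    # Sort by descending frequency, then alphabetically for determinism.
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {tok: i for i, tok in enumerate([BOS, EOS] + ordered)}


def encode_labels(labels, label_dict):
    """Turn a label set into a decoder target sequence: <bos> l1 l2 ... <eos>."""
    body = sorted(set(labels), key=lambda lab: label_dict[lab])
    return [label_dict[BOS]] + [label_dict[lab] for lab in body] + [label_dict[EOS]]


def decode_labels(ids, label_dict):
    """Invert encode_labels: skip <bos>, stop at <eos>, drop repeated labels."""
    inverse = {i: tok for tok, i in label_dict.items()}
    labels = []
    for i in ids:
        tok = inverse[i]
        if tok == EOS:
            break
        if tok != BOS and tok not in labels:
            labels.append(tok)
    return labels


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over all (document, label) decisions."""
    tp = fp = fn = 0
    for true, pred in zip(true_sets, pred_sets):
        true, pred = set(true), set(pred)
        tp += len(true & pred)
        fp += len(pred - true)
        fn += len(true - pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy data, for illustration only.
train_labels = [["personnel", "finance"], ["finance"],
                ["finance", "contract"], ["personnel"]]
label_dict = build_label_dict(train_labels)
target = encode_labels(["personnel", "finance"], label_dict)
score = micro_f1([["finance", "personnel"], ["finance"]],
                 [["finance"], ["finance", "contract"]])
```

Treating the label set as a sequence lets each decoding step condition on the labels already emitted, which is one way a Seq2Seq decoder can capture label correlations.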

Key words

ALBERT/Seq2Seq/Attention/multi-label classification/digital archives

Funding

National Natural Science Foundation of China (62277016)

Postgraduate Research and Innovation Project of Huzhou University (2022KYCX45)

Publication year

2024
Journal of Huzhou University
Huzhou University

Impact factor: 0.301
ISSN: 1009-1734
Number of references: 12