Hierarchical label classification method for academic papers' subject domain
To address the problem of hierarchical label classification of academic papers within subject domains, a text hierarchical label classification model based on enhanced representation through knowledge integration (ERNIE) and graph attention networks, named GETHLC, is proposed. First, a hierarchical label extraction module extracts the structural features of the hierarchical labels under the subject domain, and a pre-trained model embeds the abstracts and titles of the academic papers together with the extracted label structure features. Then, in the classification stage, a hierarchical classifier is constructed level by level according to the structure of the hierarchical labels, and each paper is classified layer by layer into the best-matching category. Experimental results on CSL, a large-scale Chinese scientific literature dataset, show that, compared with the baseline ERNIE model, the GETHLC model improves accuracy, recall and F1 score by 5.78, 4.31 and 5.02 percentage points, respectively.
hierarchical label; text classification; graph attention mechanism; enhanced representation through knowledge integration (ERNIE); pre-training
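The layer-by-layer classification step described in the abstract can be illustrated with a minimal sketch. It assumes a two-level label tree and a toy keyword-count scorer; the label names, the `KEYWORDS` table, and the functions `keyword_score` and `classify_top_down` are hypothetical stand-ins, not the GETHLC components, which rely on ERNIE embeddings and graph attention networks.

```python
from typing import Callable, Dict, List

# Hypothetical two-level label hierarchy (discipline -> sub-discipline).
# This is an illustrative tree, not the label set used in the paper.
HIERARCHY: Dict[str, List[str]] = {
    "ROOT": ["Engineering", "Medicine"],
    "Engineering": ["Computer Science", "Civil Engineering"],
    "Medicine": ["Clinical Medicine", "Pharmacy"],
}

# Toy keyword lists standing in for a learned per-level classifier.
KEYWORDS: Dict[str, List[str]] = {
    "Engineering": ["network", "model", "bridge", "algorithm"],
    "Medicine": ["patient", "drug", "clinical"],
    "Computer Science": ["network", "algorithm", "model"],
    "Civil Engineering": ["bridge", "concrete"],
    "Clinical Medicine": ["patient", "clinical"],
    "Pharmacy": ["drug", "dose"],
}


def keyword_score(text: str, label: str) -> float:
    """Count how often the label's keywords occur in the text (assumed scorer)."""
    words = text.lower().split()
    return float(sum(words.count(k) for k in KEYWORDS[label]))


def classify_top_down(
    text: str,
    hierarchy: Dict[str, List[str]],
    score: Callable[[str, str], float],
    root: str = "ROOT",
) -> List[str]:
    """Descend the label tree, picking the best-scoring child at each level."""
    path: List[str] = []
    node = root
    while node in hierarchy:  # stop once a leaf label is reached
        node = max(hierarchy[node], key=lambda child: score(text, child))
        path.append(node)
    return path


if __name__ == "__main__":
    print(classify_top_down("a graph attention network model for text",
                            HIERARCHY, keyword_score))
```

In this sketch each level only scores the children of the label chosen at the previous level, which keeps the candidate set small but means an early mistake cannot be corrected further down the tree.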