
Predicate Center Word Recognition Model Fused with Multiscale Span Features

To address the lack of span length information and multiscale span correlation information in existing predicate center word recognition models, this study proposes a Chinese predicate center word recognition model fused with multiscale span features. First, the ChineseBERT pre-trained language model and a Bidirectional Long Short-Term Memory (BiLSTM) network extract a sequence of character vectors that carry contextual information from the text. Second, a linear neural network performs an initial recognition pass over the character vectors, producing a span masking matrix. Third, the character vector sequence is expanded into a two-dimensional span information matrix, which a Multiscale Convolutional Neural Network (MSCNN) processes to extract multiscale correlation information among spans. Finally, a feature-embedding neural network embeds the length of each span, enriching the span feature vectors used to recognize predicate center words. Experimental results show that the model effectively fuses multiscale span correlation information and span length information and improves predicate center word recognition: its F1 score is 0.43 percentage points higher than that of the best-performing comparable predicate center word recognition model.
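The pipeline in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract does not say how a span cell is built, how the mask is derived, or what the convolution kernels look like. The sketch assumes that a span cell concatenates its start and end character vectors, that the mask comes from hypothetical per-character start and end scores, that fixed mean filters of sizes 1, 3 and 5 stand in for the learned multiscale convolutions, and that span length is embedded through a lookup table. All function names and values are illustrative.

```python
import random

def build_span_matrix(chars):
    # Cell (i, j) describes the span from character i to character j.
    # Assumption: the cell is the concatenation of the start and end vectors;
    # cells with i > j are not valid spans and are filled with zeros.
    n, d = len(chars), len(chars[0])
    return [[chars[i] + chars[j] if i <= j else [0.0] * (2 * d)
             for j in range(n)] for i in range(n)]

def span_mask(start_scores, end_scores, threshold=0.5):
    # Assumption: the initial recognition step yields per-character start and
    # end scores; a span is kept when both of its boundaries pass the threshold.
    n = len(start_scores)
    return [[i <= j and start_scores[i] >= threshold and end_scores[j] >= threshold
             for j in range(n)] for i in range(n)]

def mean_filter(grid, k):
    # Stand-in for one learned k x k convolution: a zero-padded mean filter
    # that mixes each span cell with its neighbouring spans.
    n, d, r = len(grid), len(grid[0][0]), k // 2
    out = [[[0.0] * d for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for a in range(max(0, i - r), min(n, i + r + 1)):
                for b in range(max(0, j - r), min(n, j + r + 1)):
                    for c in range(d):
                        out[i][j][c] += grid[a][b][c] / (k * k)
    return out

def span_features(chars, mask, length_table, kernel_sizes=(1, 3, 5)):
    # Concatenate the outputs of all scales, then append the length embedding.
    grid = build_span_matrix(chars)
    scales = [mean_filter(grid, k) for k in kernel_sizes]
    feats = {}
    for i in range(len(chars)):
        for j in range(i, len(chars)):
            if not mask[i][j]:
                continue
            vec = []
            for scale in scales:
                vec += scale[i][j]
            length = min(j - i + 1, len(length_table))  # clip long spans
            feats[(i, j)] = vec + length_table[length - 1]
    return feats

# Toy input: 6 characters with 4-dimensional vectors (random placeholders for
# the encoder output) and a table of 3-dimensional length embeddings.
random.seed(0)
chars = [[random.random() for _ in range(4)] for _ in range(6)]
length_table = [[random.random() for _ in range(3)] for _ in range(8)]
start_scores = [0.9, 0.1, 0.8, 0.2, 0.1, 0.3]
end_scores = [0.2, 0.7, 0.1, 0.9, 0.1, 0.6]

mask = span_mask(start_scores, end_scores)
feats = span_features(chars, mask, length_table)
```

In this toy setup each surviving span ends up with a vector of 3 scales x 8 values plus a 3-value length embedding, which a classifier could then score. A trained model would replace the mean filters with learned kernels and the random vectors with encoder outputs.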

Keywords: predicate center word recognition; multiscale convolution; ChineseBERT pre-trained language model; span length information; multiscale span correlation information

Authors: Shi Junxiao (施竣潇), Chen Yanping (陈艳平), Mu Zhaonan (穆肇南)


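Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guiyang 550025, Guizhou, China

State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China

College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China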


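Funding: National Natural Science Foundation of China (62166007); Natural Science Foundation of Guizhou Province (黔科合基础-ZK[2022]027); Youth Science and Technology Talent Growth Project of the Guizhou Provincial Department of Education (黔教合KY字[2022]205号)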

2024

Journal: Computer Engineering (计算机工程)
Sponsors: East China Institute of Computing Technology; Shanghai Computer Society


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.581
ISSN: 1000-3428
Year, volume (issue): 2024, 50(10)