Research on Annotation Rules and Phrase Recognition Algorithm Based on Phrase and Dependency
At present, most syntactic dependency analysis is built on word segmentation results and relies mainly on end-to-end models trained with supervised learning. This approach has two main problems: the label schemes are numerous and relatively complex, and nested linguistic structures cannot be recognized. This paper proposes a dependency syntax labeling rule based on phrase windows, annotates the Chinese Phrase Window Dataset (CPWD) under this rule, and introduces a phrase window model. The labeling rule takes the phrase as its smallest unit, divides sentences into 7 types of nestable phrases, and annotates the syntactic dependencies between phrases. Inspired by object detection in computer vision, the phrase window model detects the start and end positions of phrases, recognizing nested phrases and their syntactic dependencies simultaneously. Experimental results show that on the self-built CPWD, the phrase window model improves F1 by more than 1 percentage point over the traditional end-to-end model. When applied to the CCL2018 Chinese Metaphor Sentiment Analysis Competition, the method improved F1 by more than 1 percentage point over the baseline and won first place.
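
The abstract describes the phrase window model only at a high level. Like an object detector, it locates where each phrase starts and ends, and phrases may nest. The Python sketch below shows one possible decoding step. It rests on stated assumptions and is not the authors' implementation. It assumes that a hypothetical model has already produced, for every token and phrase type, a score for starting a phrase and a score for ending one. The names `decode_windows`, `crosses`, `span_f1`, `threshold` and `max_len` are illustrative.

Each candidate window is scored as start × end. A greedy pass in the spirit of non-maximum suppression then keeps nested windows and drops any window that crosses one already accepted. The abstract does not name the 7 phrase types, so types here are plain integers. The dependency relations between phrases are not modelled. `span_f1` is an exact-match span F1 and is only an assumption about how the reported F1 might be computed.

```python
from typing import List, Sequence, Tuple

# A phrase window: (start, end, phrase_type), with `end` inclusive.
Window = Tuple[int, int, int]


def crosses(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if two windows partially overlap (neither disjoint nor nested)."""
    (i, j), (k, l) = a, b
    return (i < k <= j < l) or (k < i <= l < j)


def decode_windows(
    start_probs: Sequence[Sequence[float]],
    end_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,
    max_len: int = 10,
) -> List[Window]:
    """Pair start/end scores into phrase windows, keeping nested but not crossing ones.

    start_probs[i][t] / end_probs[i][t]: score that token i starts / ends a
    phrase of type t (outputs of a hypothetical model).
    """
    n = len(start_probs)
    num_types = len(start_probs[0]) if n else 0
    candidates = []
    for t in range(num_types):
        for i in range(n):
            # Windows of at most max_len tokens: j - i + 1 <= max_len.
            for j in range(i, min(n, i + max_len)):
                score = start_probs[i][t] * end_probs[j][t]
                if score >= threshold:
                    candidates.append((score, i, j, t))

    # Greedy, NMS-like filtering: best window first; reject a candidate only if
    # it crosses an accepted window or repeats an accepted span with another type.
    candidates.sort(key=lambda c: c[0], reverse=True)
    kept: List[Window] = []
    for _, i, j, t in candidates:
        if any(crosses((i, j), (a, b)) or (i, j) == (a, b) for a, b, _ in kept):
            continue
        kept.append((i, j, t))
    return sorted(kept)


def span_f1(pred: Sequence[Window], gold: Sequence[Window]) -> float:
    """Exact-match F1 over (start, end, type) windows."""
    p, g = set(pred), set(gold)
    tp = len(p & g)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # 5 tokens, 2 hypothetical phrase types: an outer type-0 phrase over
    # tokens 0..4 that contains a type-1 phrase over tokens 2..3.
    start = [[0.9, 0.0], [0.0, 0.0], [0.0, 0.8], [0.0, 0.0], [0.0, 0.0]]
    end = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.9], [0.9, 0.0]]
    pred = decode_windows(start, end)
    print(pred)  # [(0, 4, 0), (2, 3, 1)]
    print(span_f1(pred, [(0, 4, 0), (2, 3, 1)]))  # 1.0
```

NMS in object detection suppresses boxes by their overlap ratio. This filter is different: it never suppresses a window merely because it lies inside another. Only partial overlaps count as conflicts, and that is what lets nested phrases coexist in the output.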

Keywords: natural language processing; tagging system; phrase extraction; dependency parsing

Liu Guang (刘广), Tu Gang (涂刚), Li Zheng (李政), Liu Yijian (刘译键)

School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China

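Journal of Chinese Information Processing (中文信息学报), 2024, 38(2)

Sponsors: Chinese Information Processing Society of China; Institute of Software, Chinese Academy of Sciences

Indexed in: CSTPCD; CHSSCD; Peking University Core Journals (北大核心)

Impact factor: 0.8

ISSN: 1003-0077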