Chinese address resolution method based on RoBERTa-BiLSTM-SelfAttention-CRF
To address the low precision, inefficiency, and neglect of fine-grained address elements in current Chinese address parsing, a Chinese address resolution method integrating RoBERTa, BiLSTM, self-attention, and CRF is proposed. First, RoBERTa extracts deep semantic features and rich contextual information from address texts. Second, a BiLSTM network models the sequential relationships within address texts, capturing the dependencies among address elements. Then, a self-attention mechanism is introduced to establish effective correlations between different address elements, improving the model's performance on Chinese address parsing. Finally, a CRF labels the address sequence to achieve precise parsing. Experimental results show that introducing the self-attention mechanism improves Chinese address parsing. On a self-built dataset, the model achieves a precision of 0.9594, a recall of 0.9697, and an F1 score of 0.9645; on the public CCKS2021 dataset, it achieves a precision of 0.9080, a recall of 0.9158, and an F1 score of 0.9119, an improvement of 0.0069 in F1 over current state-of-the-art methods, demonstrating strong performance and generalization ability.
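
The abstract describes a four-stage tagging pipeline: a RoBERTa encoder feeding a BiLSTM, followed by a self-attention layer and a CRF decoder. As a rough illustration of how such a model is typically assembled, below is a minimal PyTorch sketch; the checkpoint name hfl/chinese-roberta-wwm-ext, the layer sizes, and the pytorch-crf dependency are illustrative assumptions, not the authors' reported configuration.

import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf


class AddressTagger(nn.Module):
    """RoBERTa -> BiLSTM -> self-attention -> CRF sequence labeler (sketch)."""

    def __init__(self, num_tags,
                 encoder_name="hfl/chinese-roberta-wwm-ext",  # assumed checkpoint
                 lstm_hidden=256, attn_heads=8):
        super().__init__()
        # 1) RoBERTa: deep semantic and contextual features per character.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        enc_dim = self.encoder.config.hidden_size
        # 2) BiLSTM: sequential dependencies among address elements.
        self.bilstm = nn.LSTM(enc_dim, lstm_hidden,
                              batch_first=True, bidirectional=True)
        # 3) Self-attention: correlations between distant address elements.
        self.attn = nn.MultiheadAttention(2 * lstm_hidden, attn_heads,
                                          batch_first=True)
        # 4) Per-token emission scores, decoded globally by a CRF.
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def _emissions(self, input_ids, attention_mask):
        x = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        x, _ = self.bilstm(x)
        # key_padding_mask is True at padding positions, which are ignored.
        x, _ = self.attn(x, x, x, key_padding_mask=(attention_mask == 0))
        return self.emit(x)

    def forward(self, input_ids, attention_mask, tags):
        # Training loss: negative CRF log-likelihood of the gold tag sequence.
        emissions = self._emissions(input_ids, attention_mask)
        return -self.crf(emissions, tags, mask=attention_mask.bool())

    def decode(self, input_ids, attention_mask):
        # Inference: Viterbi search for the best-scoring tag sequence.
        emissions = self._emissions(input_ids, attention_mask)
        return self.crf.decode(emissions, mask=attention_mask.bool())

The CRF stage is what makes the final labeling precise at the sequence level: Viterbi decoding scores whole tag sequences, so transitions that are invalid under a BIO-style scheme (e.g., an I- tag with no preceding B- tag) can be ruled out globally, which a per-character softmax cannot guarantee. Note that the reported F1 scores are consistent with F1 being the harmonic mean of the reported precision and recall (e.g., 2 x 0.9594 x 0.9697 / (0.9594 + 0.9697) = 0.9645).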

Keywords: Chinese address resolution; address elements; RoBERTa; BiLSTM; CRF; self-attention mechanism

Miao Jiachi, Chen Ying, Sheng Long, Wei Zhongcheng, Wang Wei

School of Information and Electrical Engineering, Hebei University of Engineering / Hebei Key Laboratory of Security Information Sensing and Processing, Handan, Hebei 056038, China

2024

Journal of the Hebei Academy of Sciences
Hebei Academy of Sciences

Impact factor: 0.176
ISSN: 1001-9383
Year, Volume (Issue): 2024, 41(6)