
System Report for CCL23-Eval Task 1: Named Entity Recognition in Ancient Chinese Books Based on Continual Pre-training and Context Augmentation Strategies

This paper describes the system submitted by our team "翼智团" in the CCL23-Eval ancient-book named entity recognition evaluation. The task aims to automatically identify important entities that form the basic elements of events in ancient texts, such as person names, book titles, and official titles, and is divided into an open track and a closed track according to whether the model used has more than 10B parameters. For this task, we first continually pre-train and fine-tune an open-source pre-trained model on ancient-book domain data and task data, which significantly improves the base model's performance on named entity recognition in ancient texts. Second, we propose an unconfident-entity filtering algorithm based on pair-wise voting to obtain candidate entities, and apply a context augmentation strategy to correct the recognition results for these candidates. In the final evaluation, our system ranked second on the closed track with an F1 score of 95.8727.
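The pair-wise voting filter described above can be illustrated with a minimal sketch. The idea, as stated in the abstract, is that entities on which predictors disagree are treated as unconfident candidates and routed to context-augmented re-checking. The data layout (span sets per predictor) and the exact agreement criterion below are assumptions for illustration, not the authors' published implementation.

```python
from itertools import combinations

def pairwise_unconfident_entities(predictions):
    """Split predicted entity spans into confident and candidate sets.

    `predictions` is a list of span sets, one per predictor
    (e.g. per model checkpoint), where each span is a
    (start, end, label) tuple for the same sentence.

    NOTE: illustrative sketch of a pair-wise voting filter;
    the agreement criterion is an assumption.
    """
    all_spans = set().union(*predictions)
    confident, candidates = set(), set()
    for span in all_spans:
        # One boolean vote per predictor: did it emit this span?
        votes = [span in p for p in predictions]
        # If any pair of predictors disagrees, the span is unconfident
        # and becomes a candidate for context-augmented correction.
        if any(a != b for a, b in combinations(votes, 2)):
            candidates.add(span)
        else:
            confident.add(span)  # unanimously predicted
    return confident, candidates
```

For example, with three predictors that all emit a person span but only two emit a book-title span, the person span is kept as confident while the book-title span is flagged as a candidate for re-checking with added context.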

Named entity recognition; continual pre-training; entity correction

王士权、石玲玲、蒲璐汶、方瑞玉、赵宇、宋双永


Digital Intelligence Technology Branch, China Telecom Co., Ltd.


Chinese national conference on computational linguistics

Harbin (CN)

22nd Chinese national conference on computational linguistics (CCL 2023): evaluations

14-22

2023