Named entity recognition for ancient Chinese texts (GuNER) is a basic step toward analyzing and processing these texts correctly. It is also an important prerequisite for mining and organizing human knowledge in depth. Ancient Chinese has high information entropy and is difficult to interpret, so technical research in this field has progressed slowly. Existing entity recognition models have two problems: they resist interference poorly, and they recognize entity boundaries inaccurately. To address these problems, this article proposes a method that combines NEZHA-TCN with a global pointer for ancient Chinese named entity recognition. The authors also constructed an ancient-text dataset for incremental pretraining of the NEZHA-TCN model. It draws on various ancient texts from the historical collection and totals 87M, comprising 397,995 texts. During training, the fast gradient method is used to add perturbations to the word embedding layer, which improves the model's robustness to interference. Experimental results show that the proposed method effectively identifies entities in ancient texts, achieving an F1 score of 95.34%.
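The fast gradient method step can be illustrated with a minimal sketch. It assumes the standard FGM formulation used in adversarial training for text: the perturbation is r = ε·g/‖g‖₂, where g is the gradient of the loss with respect to the embeddings. The abstract does not give ε or any other training details. A hypothetical toy quadratic loss on a two-dimensional "embedding" stands in for the NEZHA-TCN + global pointer loss. All names (`fgm_perturbation`, `FGM`, `toy_loss`, `toy_grad`, `adversarial_step`) are illustrative, not the authors' code. Plain Python keeps the example framework-free. A real implementation would apply the same attack/restore steps to the embedding weight tensor after the backward pass.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]


def fgm_perturbation(grad: Vector, epsilon: float = 1.0) -> Vector:
    """FGM step (assumed standard form): r_adv = epsilon * g / ||g||_2."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0 or math.isnan(norm):
        # No usable gradient direction, so add no perturbation.
        return [0.0] * len(grad)
    return [epsilon * g / norm for g in grad]


class FGM:
    """Adds and later removes an FGM perturbation on an embedding, in place."""

    def __init__(self, epsilon: float = 1.0) -> None:
        self.epsilon = epsilon
        self.backup: Optional[Vector] = None

    def attack(self, emb: Vector, grad: Vector) -> None:
        self.backup = list(emb)  # keep a copy of the clean embedding
        r = fgm_perturbation(grad, self.epsilon)
        emb[:] = [e + ri for e, ri in zip(emb, r)]

    def restore(self, emb: Vector) -> None:
        if self.backup is not None:
            emb[:] = self.backup  # put the clean values back
            self.backup = None


# Hypothetical toy loss standing in for the NER loss: 0.5 * ||e - t||^2.
def toy_loss(emb: Vector, target: Vector) -> float:
    return 0.5 * sum((e - t) ** 2 for e, t in zip(emb, target))


def toy_grad(emb: Vector, target: Vector) -> Vector:
    # Analytic gradient of toy_loss with respect to the embedding.
    return [e - t for e, t in zip(emb, target)]


def adversarial_step(emb: Vector, target: Vector,
                     fgm: FGM) -> Tuple[float, float, Vector]:
    """One step: clean gradient plus adversarial gradient, then restore."""
    clean_loss = toy_loss(emb, target)
    clean_grad = toy_grad(emb, target)
    fgm.attack(emb, clean_grad)        # move embedding along the gradient
    adv_loss = toy_loss(emb, target)
    adv_grad = toy_grad(emb, target)
    fgm.restore(emb)                   # optimizer should update clean values
    total_grad = [g + a for g, a in zip(clean_grad, adv_grad)]
    return clean_loss, adv_loss, total_grad


if __name__ == "__main__":
    emb, target = [1.0, 2.0], [0.0, 0.0]
    clean, adv, grad = adversarial_step(emb, target, FGM(epsilon=1.0))
    print(f"clean={clean:.3f} adv={adv:.3f} emb={emb}")
```

Running the script prints `clean=2.500 adv=5.236 emb=[1.0, 2.0]`. The perturbation raises the loss, and the embedding is restored before any update. Only the accumulated gradient `total_grad` carries the effect of the attack, which is the usual reason FGM is applied as a perturb, backpropagate, restore cycle.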
Li Jianlong (李剑龙), Yu Youren (于右任), Liu Xueyang (刘雪阳), Zhu Siwen (朱思文)
Industrial and Commercial Bank of China, Beijing; BISTU-IIIP, Beijing
BISTU-IIIP, Beijing
Ancient Chinese named entity recognition; incremental pretraining; fast gradient method
Chinese National Conference on Computational Linguistics
Harbin (CN)
22nd Chinese National Conference on Computational Linguistics (CCL 2023): Evaluations