Multi-input Fusion Spelling Error Correction Model Based on Contrast Optimization

WU Yaoyao¹, HUANG Ruizhang¹, BAI Ruina¹, CAO Junhang¹, ZHAO Jianhui¹

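Author information

1. Engineering Research Center of Text Computing and Cognitive Intelligence, Ministry of Education, Guizhou University, Guiyang 550025; State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025; College of Computer Science and Technology, Guizhou University, Guiyang 550025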

Abstract

Chinese spelling correction is essential in text editing. Most existing Chinese spelling correction models take a single input, which limits both the semantic information available to them and the quality of their corrections. This paper proposes MIF-SECCO, a multi-input fusion spelling error correction method based on contrast optimization. MIF-SECCO has two stages: multi-input semantic learning, and semantic fusion error correction driven by contrastive learning. The first stage integrates the preliminary corrections produced by multiple single-input models, providing sufficient complementary semantic information for semantic fusion. The second stage uses contrastive learning to optimize the multiple complementary sentence semantics, which keeps the model from over-correcting sentences, and fuses those complementary semantics to re-correct the erroneous sentence, easing the limitations of single-model correction results. Experiments on the public SIGHAN13, SIGHAN14 and SIGHAN15 datasets show that MIF-SECCO effectively improves correction performance.
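The abstract names contrastive optimization and semantic fusion but does not specify the loss or the fusion operator. The sketch below is only a toy illustration of those two ideas, not the authors' implementation: it assumes an InfoNCE-style loss over sentence embeddings and element-wise averaging as the fusion step. The function names `cosine`, `info_nce` and `fuse`, the temperature value, and the example vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed, not taken from the paper).

    The loss is small when the anchor embedding is closer to the
    positive embedding than to any of the negative embeddings.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]


def fuse(embeddings):
    """Element-wise mean of candidate embeddings (assumed fusion operator)."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


# Toy 2-d embeddings standing in for the outputs of two single-input models.
candidates = [[1.0, 0.0], [0.0, 1.0]]
fused = fuse(candidates)

# An anchor aligned with its positive yields a lower loss than a mismatched one.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
mismatched = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In a setting like the one the abstract describes, the positive would plausibly be the embedding of the correct sentence and the negatives embeddings of over-corrected variants, but that pairing is an assumption here.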

Key words

Chinese Spelling Error Correction/Multi-input Semantic Learning/Complementary Semantic Fusion/Contrastive Learning Optimization

Funding

National Natural Science Foundation of China (62066007)

Guizhou Provincial Science and Technology Support Program (2022277)

Year of publication

2024

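Pattern Recognition and Artificial Intelligence

Chinese Association of Automation; National Research Center for Intelligent Computing Systems; Hefei Institute of Intelligent Machines, Chinese Academy of Sciences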

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.954

ISSN: 1003-6059

Number of references: 23