Research on Data Augmentation Strategies and Grammatical Error Analysis Techniques in AI Translation

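Abstract (translated from the Chinese): The information-processing logic of artificial intelligence can learn and understand a language system and then produce optimal results in translation work, meeting the needs of practical applications. This study combines data augmentation strategies with corpora to train models for grammatical error generation, correction, and detection. It analyzes the data processing of a rule-based data augmentation strategy in order to improve the quality of the training data, and uses a learner corpus to evaluate grammatical error correction (GEC) models trained at different data scales. The GEC model trained on about 200 M of synthetic data reaches a precision of 45%, a highest recall of 24%, and a highest F_0.5 of 38%. After further training, the optimized GEC model scores 37%, 24%, and 34%, respectively. Finally, under a reranking strategy, the grammatical error model based on the data augmentation strategy achieves 75%, 43%, and 65%. These results indicate that the grammatical error model based on the data augmentation strategy has high detection precision and can provide technical support for AI translation technology.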
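The abstract names a rule-based data augmentation strategy but does not list its rules. As a rough illustration only, the sketch below shows one common way such a strategy can turn correct sentences into synthetic (erroneous source, correct target) training pairs for a GEC model. The rule tables, the function names `corrupt` and `make_pair`, and the `rate` parameter are all assumptions made for this example, not the author's method.

```python
import random

# Hypothetical substitution rules (not from the paper): each maps a correct
# token to a typical learner confusion.
ARTICLE_SWAPS = {"a": "the", "an": "a", "the": "a"}
PREPOSITION_SWAPS = {"in": "on", "on": "at", "at": "in"}
RULES = {**ARTICLE_SWAPS, **PREPOSITION_SWAPS}


def corrupt(tokens, rate, rng):
    """Return a noisy copy of `tokens`.

    Each token that matches a rule is replaced with probability `rate`.
    Replacements are lowercase, so capitalization is not preserved.
    """
    noisy = []
    for tok in tokens:
        replacement = RULES.get(tok.lower())
        if replacement is not None and rng.random() < rate:
            noisy.append(replacement)
        else:
            noisy.append(tok)
    return noisy


def make_pair(sentence, rate=0.3, seed=0):
    """Build one (erroneous source, correct target) training pair."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    tokens = sentence.split()
    return " ".join(corrupt(tokens, rate, rng)), sentence


source, target = make_pair("She lives in the city", rate=1.0)
print(source)
print(target)
```

With `rate=1.0` every matching token is replaced, so the source becomes "She lives on a city" while the target keeps the original sentence. A real pipeline would use a much richer rule set and a lower rate.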

Published English title: Research on Data Augmentation Strategies and Grammar Error Analysis Techniques in AI Artificial Intelligence Translation

English abstract: The information processing logic of artificial intelligence can learn and understand language systems and then provide optimal results in translation work to meet practical application needs. This research combines data augmentation strategies and corpora to train grammatical error generation, correction, and detection models. It analyzes a rule-based data augmentation strategy to improve the quality of the training data. Using a learner corpus to evaluate grammatical error correction (GEC) models of different scales, it finds that the GEC model trained with about 200 M of synthetic data has a precision of 45%, a highest recall of 24%, and a highest F_0.5 of 38%. Further training of the optimized GEC model yields values of 37%, 24%, and 34%, respectively. Finally, under a reranking strategy, the grammatical error model based on the data augmentation strategy achieves 75%, 43%, and 65%. This indicates that the grammatical error model based on the data augmentation strategy has high detection precision and provides technical support for artificial intelligence translation technology.
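Each group of three figures in the abstract is a precision, a recall, and an F_0.5 score. As a quick consistency check, the minimal sketch below computes F_β from precision and recall using the standard definition, where β = 0.5 weights precision more heavily than recall. The function name `f_beta` is illustrative, and pairing each reported precision with the reported recall is an assumption, since the abstract describes some of these as "highest" values.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision/recall pairs as reported in the abstract (assumed to belong together).
reported = {
    "synthetic-data GEC model": (0.45, 0.24),
    "optimized GEC model": (0.37, 0.24),
    "with reranking": (0.75, 0.43),
}

for name, (p, r) in reported.items():
    print(f"{name}: F0.5 = {f_beta(p, r):.3f}")
```

From the rounded inputs this gives roughly 0.38, 0.33, and 0.65. The first and third match the reported 38% and 65%; the small gap from the reported 34% in the second case is consistent with rounding of the published precision and recall.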

English keywords:

data augmentation strategy; learner corpus; grammar error correction; GEG model; reordering strategy

Author:

Li Xiao (李潇)

Affiliation:

Xianyang Normal University, Xianyang, Shaanxi 712000

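Keywords (translated from the Chinese):

data augmentation strategy; learner corpus; grammatical error correction; GEG model; reranking strategy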

Funding:

Special Scientific Research Program of the Shaanxi Provincial Department of Education

Project number:

23JK0245

Publication year:

2024

DOI:

10.14016/j.cnki.1001-9227.2024.07.243

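Journal:

Automation & Instrumentation (自动化与仪器仪表)

Chongqing Industrial Automation Instrumentation Research Institute; Chongqing Society of Automation and Instrumentation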

CSTPCD

Impact factor: 0.327

ISSN: 1001-9227

Year, volume (issue): 2024, (7)