Comparison and Evaluation of the Performance of Four Mainstream Bilingual Corpus Alignment Tools
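Wang Qin (王琴) 1, Wang Yuchun (王宇春) 1
Author information
- 1. School of Foreign Languages, Shanxi Technology and Business College, Taiyuan, Shanxi 030036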
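Abstract (translated from the Chinese)
Using an experimental method, and taking Chinese-English bilingual texts of four types (science and technology, tourism, literature, and politics) as samples, this study compares four corpus alignment tools. In terms of basic technical indicators, Matecat Aligner, ABBYY Aligner 2.0, and Tmxmall Aligner have the advantage. In terms of non-basic technical indicators, Matecat Aligner and Déjà Vu X3 Alignment stand out in segmentation accuracy, while ABBYY Aligner 2.0 outperforms the other tools in alignment accuracy; ABBYY Aligner 2.0 and Matecat Aligner also offer an error-correction function. Closer analysis shows that alignment quality varies with the type of text used, and that different corpus alignment tools suit the alignment of different texts.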
Abstract
This paper describes an experiment that compares and evaluates the performance of four bilingual corpus alignment tools, using English/Chinese texts on science and technology, tourism, literature, and politics as samples. The results show that Matecat Aligner and Tmxmall Aligner performed better on basic technical indicators such as the size of the aligned files, the types of alignment, and the supported text languages and file formats. On non-basic technical indicators, Matecat Aligner and Déjà Vu X3 Alignment were more prominent in segmentation accuracy, and ABBYY Aligner 2.0 outperformed the other tools in alignment quality. ABBYY Aligner 2.0 and Matecat Aligner could also align subsequent segments correctly after an earlier misaligned pair of source and target segments. Closer analysis found that alignment quality differed across text types, and that different corpus alignment tools were suitable for aligning different texts.
Key words
bilingual corpus alignment tools/performance comparison/evaluation
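Funding
China Association of Higher Education, Higher Education Research Planning Project (2023) (23XJH0410)
Shanxi Technology and Business College Teaching Reform and Innovation Project (2023) (JG202350)
Year of publication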
2024