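Exploring a Pathway for Applying Machine-Learning Text Classification to Translation Studies under the Digital Humanities Paradigm: The Case of Syntactic Features of Translated Chinese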

A Study of Text Classification Based on Machine Learning in Translation Studies

钟书能 (Zhong Shuneng)¹, 杨立汝 (Yang Liru)²

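Author information

1. School of Foreign Languages, South China University of Technology (Guangzhou 510640)
2. School of Foreign Languages, South China University of Technology (Guangzhou 510640)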

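Abstract (translated from the Chinese)

The application of big-data mining techniques such as text classification is one of the main features of translation studies under the digital humanities paradigm, and the study of the features of translated language is a foundational area of translation studies. This study proposes a "five-step" research pathway for applying machine-learning text classification to the study of translated-language features. The five steps are: distant reading of the text-classification data, middle-distance reading of the features ranked by contribution, close reading of randomly selected texts, summarizing linguistic regularities, and explaining the causes of those regularities. Following this pathway, the study examines the syntactic features of translated Chinese and finds that its most salient feature relative to original (non-translated) Chinese is the under-use of numerals in peripheral category members, such as "numeral + classifier used as a noun" expressing a noun phrase, idioms, and "numeral + classifier + noun" expressing vague meaning. The cognitive cause is that translators tend to overlook peripheral category members that are less salient in the semantic network. The case study shows that introducing machine-learning text-classification algorithms can make macro-level linguistic description more comprehensive, objective, and scientifically rigorous, while close reading of examples from texts randomly selected on the basis of the data findings helps uncover the finer-grained linguistic regularities hidden behind the formal data. The study aims to offer new methods and ideas for translation studies under the digital humanities paradigm.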

Abstract

The use of machine learning in translation studies is closely tied to the digital humanities. This paper proposes a five-step method for applying text classification in translation studies: distant reading of the classification data, middle-distance reading of the discriminative features, close reading of randomly selected texts, identification of linguistic patterns, and interpretation of those patterns. Applying the five-step method, the study finds that translated Chinese is characterized by less frequent use of two noun-phrase patterns, numeral + classifier + noun and numeral + classifier, both of which are peripheral members of their respective categories. The cognitive mechanism may lie in translators' tendency to overlook the less prominent members of the semantic network. The case study shows that text classification algorithms can make the macro-level description of translated language more comprehensive and objective, while close reading of texts selected on the basis of the data helps pin down finer-grained linguistic patterns. The paper aims to shed light on new methods for translation studies from the perspective of digital humanities.
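The first three steps of the five-step method (classification, ranking features by contribution, and randomly selecting texts for close reading) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not name the classifier, the feature set, or the contribution measure, so the sketch uses a multinomial naive Bayes model over part-of-speech tag counts and a smoothed log-likelihood ratio as a stand-in for feature contribution. The toy corpus is invented and is not data from the study.

```python
import math
import random
from collections import Counter

# Invented toy corpus (NOT data from the study). Each text is a list of POS tags:
# m = numeral, q = classifier, n = noun, v = verb, p = preposition.
texts = [
    ["m", "q", "n", "v"], ["m", "q", "v", "n"], ["n", "v", "m", "q"],  # original
    ["n", "v", "n", "p"], ["p", "n", "v", "n"], ["n", "v", "p", "m"],  # translated
]
labels = ["original"] * 3 + ["translated"] * 3


def train_naive_bayes(texts, labels, alpha=1.0):
    """Step 1 (assumed classifier): multinomial naive Bayes with add-alpha smoothing."""
    vocab = sorted({tag for text in texts for tag in text})
    classes = sorted(set(labels))
    tag_counts = {c: Counter() for c in classes}
    for text, label in zip(texts, labels):
        tag_counts[label].update(text)
    doc_counts = Counter(labels)
    log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in classes}
    log_like = {}
    for c in classes:
        total = sum(tag_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {t: math.log((tag_counts[c][t] + alpha) / total) for t in vocab}
    return vocab, log_prior, log_like


def predict(model, text):
    """Return the class with the highest log posterior; unseen tags are ignored."""
    _, log_prior, log_like = model
    scores = {
        c: log_prior[c] + sum(log_like[c][t] for t in text if t in log_like[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def rank_features(model, target="translated", reference="original"):
    """Step 2 (assumed measure): rank tags by the absolute log-likelihood ratio.

    A negative ratio means the tag is under-used in the target class.
    """
    vocab, _, log_like = model
    ratios = {t: log_like[target][t] - log_like[reference][t] for t in vocab}
    return sorted(ratios.items(), key=lambda item: abs(item[1]), reverse=True)


def sample_for_close_reading(n_texts, k, seed=0):
    """Step 3: pick k text indices at random, reproducibly, for close reading."""
    return random.Random(seed).sample(range(n_texts), k)


model = train_naive_bayes(texts, labels)
ranked = rank_features(model)
picked = sample_for_close_reading(len(texts), k=2, seed=42)

print(predict(model, ["m", "q", "n"]))
for tag, ratio in ranked:
    print(f"{tag}: {ratio:+.3f}")
print(picked)
```

Steps four and five (summarizing and explaining the patterns) are interpretive and are not modeled here. The fixed seed only makes the random selection reproducible; a real study would substitute its own corpus, features, and classifier.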

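Keywords (translated from the Chinese)

digital humanities / machine learning / text classification / translated-language feature research / five-step method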

Key words

digital humanities / machine learning / text classification / translation studies / five-step method

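Funding

First Batch of New Liberal Arts Research and Reform Practice Projects, Ministry of Education (2021110070)

Guangzhou Philosophy and Social Sciences Development "14th Five-Year Plan" 2023 Project (2023GZGJ229)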

Year of publication

2024

Journal of Shanghai Jiao Tong University (Philosophy and Social Sciences)

Shanghai Jiao Tong University

CSSCI / CHSSCD / Peking University Core Journals (北大核心)

Impact factor: 0.94

ISSN: 1008-7095
