
The Linguistic Value of Large Language Model Word Vectors and the Issues Facing Their Application

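Word vectors are mathematical representations of human language extracted with large language models, and they offer a scientific tool for language research. Their methodological value lies in bringing mathematical methods to the study of linguistic regularities, which lays a theoretical foundation for solving problems in the computation of language. Research shows that word vectors can be used to compute some semantic and grammatical relations, and therefore have practical value for solving linguistic problems. Although word-vector technology has reached a fairly advanced level, using it in linguistic research still raises several open issues, such as the paradigm for applying word vectors in linguistic research and the use of word vectors in cross-linguistic research.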
The Linguistic Significance of Word Vectors and Issues in the Applied Research
Word vectors, the mathematical expression of human languages extracted from large-scale corpora, are an important scientific achievement in computational linguistics. By applying word vectors to linguistic research, linguists can not only describe linguistic phenomena through morphological features but also gain a powerful means of grasping semantics. Word vectors provide a theoretical foundation for introducing mathematical methods to study and describe the laws of languages, and have been instrumental in solving the problem of computing linguistic phenomena. The methodological value of word vectors is that they open the door to explaining languages with mathematical methods, which can substantially address the dilemma in linguistic research of using language to explain the laws of language. Studies abroad based on an English word-vector model show that word vectors can be used to compute five semantic relations and nine grammatical relations in English. Our research based on a Chinese word-vector model shows that word vectors can be used not only to compute the semantic relationships between some words, but also to analyze the induction of word meanings and the use and distribution of Chinese words. These results show that word vectors have practical value in solving specific linguistic problems. As word-vector techniques continue to evolve, the opaque representation of word vectors has raised new issues for linguistic research, such as the linguistic meaning contained in a high-dimensional word-vector space and how to decompose and interpret it, the paradigm for applying word vectors in linguistic research, the size and content of the corpora required for training word-vector models, the application of word vectors in cross-linguistic research, and the training of word-vector models for specific periods in diachronic research.
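To make the idea of "computing" a semantic relation concrete, the following is a minimal sketch of the widely used vector-offset analogy test, in which the relation between two words is treated as the difference of their vectors. It is only an illustration: the three-dimensional vectors are invented toy values, the names `TOY_VECTORS`, `cosine` and `solve_analogy` are hypothetical, and nothing here is taken from the models or procedures used in the study itself.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
# Real word-vector models use hundreds of dimensions learned from corpora.
TOY_VECTORS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(a, b, c, vectors):
    """Answer "a is to b as c is to ?" with the vector-offset method.

    The target point is b - a + c; the answer is the remaining word
    whose vector has the highest cosine similarity to that target.
    """
    target = [
        vb - va + vc
        for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    ]
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


if __name__ == "__main__":
    # "man" is to "woman" as "king" is to ...
    print(solve_analogy("man", "woman", "king", TOY_VECTORS))
```

With these toy values the offset `woman - man + king` lies closest to `queen`, which is the sense in which a relation between words can be recovered by arithmetic on their vectors. Whether a given relation is recoverable this way in a real model depends on the training corpus and the model.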

Keywords: word vector; large language model; mathematical expression of language; computational linguistics

Shi Jianjun (施建军)


Institute of Linguistics, Shanghai International Studies University

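Keywords (translated from the Chinese): word vector; linguistic value; large language model; mathematical representation; computational linguistics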

2024

Contemporary Linguistics (当代语言学)
Institute of Linguistics, Chinese Academy of Social Sciences


Indexed in: CSTPCD; CSSCI; CHSSCD; Peking University Core Journals (北大核心)
Impact factor: 0.384
ISSN: 1007-8274
Year, volume (issue): 2024, 26(4)