Research on Category Knowledge Mining Based on an English-Chinese Bilingual Phrase-Level Parallel Corpus

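Abstract (from the Chinese): Building on existing clustering algorithms, this study runs category knowledge mining experiments on an English-Chinese phrase-level parallel corpus in the humanities and social sciences. The clustering algorithm and the English morphological conversion algorithm are chosen according to the experimental data and the specific research needs. Comparing the performance of word-level knowledge clustering for Chinese, English, and English-Chinese bilingual features shows that bilingual word features outperform monolingual ones. The acquired category knowledge can be applied directly to building knowledge bases and machine translation models, and the study also examines how English and Chinese words each behave during category knowledge acquisition.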

English title: Research of Mining the Category Knowledge Based on English-Chinese Humanities and Social Sciences Parallel Corpus in Phrase Level

English abstract: An experiment on mining category knowledge from an English-Chinese phrase-level parallel corpus in the humanities and social sciences is performed on the basis of established clustering algorithms. The clustering and morphological conversion algorithms are chosen according to the experimental data and specific research needs. Comparing the performance of Chinese, English, and English-Chinese word-level knowledge clustering shows that English-Chinese bilingual word features outperform monolingual ones. The category knowledge can be applied directly to building knowledge bases and machine translation systems, and the behavior of English and Chinese words in mining category knowledge is also explored.
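The pipeline the abstract describes (morphological conversion of English words, bilingual word features, Bisecting K-means) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the phrase pairs, the `LEMMAS` lookup table, the `en:`/`zh:` feature prefixes, and the choice of binary bag-of-words vectors with the largest-SSE split rule are all assumptions made for the example.

```python
from collections import Counter

# Toy lemma table standing in for the English morphological conversion step
# (hypothetical; the record does not say which conversion algorithm was used).
LEMMAS = {"models": "model", "economies": "economy",
          "methods": "method", "languages": "language"}

# Hypothetical phrase pairs: (English phrase, pre-segmented Chinese tokens).
PAIRS = [
    ("economic growth models", ["经济", "增长", "模型"]),
    ("economic development", ["经济", "发展"]),
    ("growth of economies", ["经济", "增长"]),
    ("language teaching methods", ["语言", "教学", "方法"]),
    ("teaching languages", ["语言", "教学"]),
    ("language education", ["语言", "教育"]),
]

def features(pair, mode="both"):
    """Word features for one phrase pair: English, Chinese, or both."""
    en, zh = pair
    feats = []
    if mode in ("en", "both"):
        feats += ["en:" + LEMMAS.get(w, w) for w in en.lower().split()]
    if mode in ("zh", "both"):
        feats += ["zh:" + w for w in zh]
    return feats

def vectorize(pairs, mode="both"):
    """Bag-of-words count vectors over the vocabulary seen in `pairs`."""
    docs = [Counter(features(p, mode)) for p in pairs]
    vocab = sorted({f for d in docs for f in d})
    return [[float(d[f]) for f in vocab] for d in docs]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def sse(idx, X):
    """Sum of squared distances of a cluster's members to its centroid."""
    c = mean([X[i] for i in idx])
    return sum(sq_dist(X[i], c) for i in idx)

def two_means(idx, X, iters=20):
    """Split one cluster in two with plain 2-means (deterministic seeds)."""
    c0 = X[idx[0]]
    c1 = max((X[i] for i in idx), key=lambda r: sq_dist(r, c0))
    left, right = list(idx), []
    for _ in range(iters):
        left = [i for i in idx if sq_dist(X[i], c0) <= sq_dist(X[i], c1)]
        right = [i for i in idx if sq_dist(X[i], c0) > sq_dist(X[i], c1)]
        if not left or not right:
            break
        c0, c1 = mean([X[i] for i in left]), mean([X[i] for i in right])
    return left, right

def bisecting_kmeans(X, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters."""
    clusters = [list(range(len(X)))]
    while len(clusters) < k:
        target = max(clusters, key=lambda idx: sse(idx, X))
        left, right = two_means(target, X)
        if not left or not right:  # nothing left to split
            break
        clusters.remove(target)
        clusters += [left, right]
    return clusters

X = vectorize(PAIRS, mode="both")
clusters = bisecting_kmeans(X, 2)
for members in clusters:
    print([PAIRS[i][0] for i in members])
```

Calling `vectorize(PAIRS, mode="en")` or `vectorize(PAIRS, mode="zh")` gives the monolingual feature sets, which is the kind of comparison the abstract reports. The toy data is far too small to say anything about which feature set performs better.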

English keywords:

CSSCI; English-Chinese parallel corpus in phrase level; Bisecting K-means clustering algorithm; Category knowledge

Authors:

Wang Dongbo (王东波), Han Pu (韩普), Shen Si (沈思), Wei Xiangqing (魏向清)

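Author affiliations:

College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095

School of Information Management, Nanjing University, Nanjing 210093

Bilingual Dictionary Research Center, Nanjing University, Nanjing 210093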

Keywords:

CSSCI; English-Chinese bilingual phrase-level parallel corpus; Bisecting K-means clustering algorithm; category knowledge

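Funding:

National High Technology Research and Development Program of China (863 Program); Key Project of the National Social Science Fund of China; Jiangsu Province Graduate Education Innovation Project (江苏省研究生培养创新工程)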

Project numbers:

2011AA01A206; 11AYY002; CXZZ12-0073

Publication year:

2012

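Data Analysis and Knowledge Discovery (数据分析与知识发现)

National Science Library, Chinese Academy of Sciences (中国科学院文献情报中心)

CSSCI; CHSSCD; Peking University Core Journals (北大核心)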

Impact factor: 1.452

ISSN: 2096-3467

Year, volume (issue): 2012, (11)

Citations: 1
References: 1