Bilingual Parallel Active Learning Between Chinese and English

Active learning is an effective machine learning paradigm that can significantly reduce the manual effort of annotating NLP corpora while still achieving competitive performance. Previous studies on active learning have focused on corpora in a single language or in two languages translated from each other. This paper proposes a Bilingual Parallel Active Learning paradigm (BPAL), in which an instance-level parallel Chinese-English corpus adapted from OntoNotes is augmented for relation extraction, and both the seed instances and the unlabeled instances jointly selected at each iteration are parallel between the two languages, in order to enhance active learning. Experimental results on relation classification over this corpus demonstrate that BPAL significantly outperforms monolingual active learning. Moreover, the success of BPAL suggests a new way of annotating parallel corpora for NLP tasks so as to induce two high-performance classifiers, one in each language.
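The abstract describes the BPAL loop only at a high level: parallel seeds, one classifier per language, and parallel unlabeled instances selected jointly at each iteration. The following is a minimal Python sketch of such a loop under stated assumptions, not the authors' implementation. The joint selection criterion (the sum of the two classifiers' prediction entropies), the toy nearest-centroid classifier standing in for a real relation classifier, the numeric feature vectors, and the names train_centroid and bilingual_parallel_al are all hypothetical. The sketch also assumes that a single annotation labels both sides of a parallel pair.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Proba = Dict[str, float]
Classifier = Callable[[Vector], Proba]


def train_centroid(data: List[Tuple[Vector, str]]) -> Classifier:
    """Hypothetical toy classifier: nearest centroid with a softmax over
    negative squared distances (a stand-in for a real relation classifier)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict_proba(x: Vector) -> Proba:
        scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c))
                  for y, c in centroids.items()}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        total = sum(exps.values())
        return {y: e / total for y, e in exps.items()}

    return predict_proba


def entropy(p: Proba) -> float:
    return -sum(v * math.log(v) for v in p.values() if v > 0)


def bilingual_parallel_al(
    seeds: List[Tuple[Vector, Vector, str]],  # parallel (zh, en, label) seeds
    pool: List[Tuple[Vector, Vector]],        # parallel (zh, en) unlabeled pairs
    oracle: Callable[[int], str],             # annotator: pool index -> label
    iterations: int,
    batch_size: int,
) -> Tuple[Classifier, Classifier, List[Tuple[Vector, Vector, str]]]:
    labeled = list(seeds)
    remaining = dict(enumerate(pool))
    for _ in range(iterations):
        if not remaining:
            break
        # One classifier per language, each trained on its side of the parallel data.
        clf_zh = train_centroid([(zh, y) for zh, _, y in labeled])
        clf_en = train_centroid([(en, y) for _, en, y in labeled])

        # Assumed joint criterion: sum of both classifiers' prediction entropies.
        def joint_uncertainty(i: int) -> float:
            zh, en = remaining[i]
            return entropy(clf_zh(zh)) + entropy(clf_en(en))

        chosen = sorted(remaining, key=joint_uncertainty, reverse=True)[:batch_size]
        for i in chosen:
            zh, en = remaining.pop(i)
            # Assumption: one annotation labels both sides of the parallel pair.
            labeled.append((zh, en, oracle(i)))
    clf_zh = train_centroid([(zh, y) for zh, _, y in labeled])
    clf_en = train_centroid([(en, y) for _, en, y in labeled])
    return clf_zh, clf_en, labeled


# Usage with toy 2-D "features" for each side of a parallel pair.
seeds = [((0.0, 0.0), (0.0, 0.0), "A"), ((10.0, 10.0), (10.0, 10.0), "B")]
pool = [((0.5, 0.5), (0.4, 0.6)), ((9.5, 9.5), (9.6, 9.4)), ((5.0, 5.0), (5.0, 5.0))]
gold = ["A", "B", "A"]
clf_zh, clf_en, labeled = bilingual_parallel_al(
    seeds, pool, gold.__getitem__, iterations=1, batch_size=1)
# The midpoint pair (index 2) is the most uncertain in both languages, so it is queried first.
print(labeled[-1])  # ((5.0, 5.0), (5.0, 5.0), 'A')
```

In this sketch, scoring each parallel pair jointly, rather than letting each language's learner choose its own instances, keeps the two classifiers' labeled sets parallel at every iteration, which is the property the abstract emphasizes.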

Active learning; Parallel corpus; Relation classification

Longhua Qian, JiaXin Liu, Guodong Zhou, Qiaoming Zhu


Natural Language Processing Lab, Soochow University, Suzhou 215006, Jiangsu, China; School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China

International Conference on Computer Processing of Oriental Languages; CCF Conference on Natural Language Processing and Chinese Computing

Kunming, China

Natural Language Understanding and Intelligent Applications

pp. 116-128

2016