Resolving Coordinate Structures for Chinese Constituent Parsing

Coordinate structures are linguistic structures consisting of two or more conjuncts, which usually combine into a larger constituent that functions as a single unit. However, the boundary of each conjunct is difficult to identify, which makes it hard to parse both the coordinate structure itself and the larger structures that contain it. In labeled data, such as the Penn Chinese Treebank (CTB), coordinate structures are not labeled explicitly, which further complicates the problem. In this paper, we treat resolving coordinate structures as an independent sub-problem of parsing. We first define coordinate structures explicitly and design rules to extract them from labeled CTB data. We then propose a specifically designed grammar for automatic parsing of coordinate structures, together with two groups of new features that better model coordinate structures in a shift-reduce parsing framework. Our approach achieves a 15% improvement in F1 score on resolving coordinate structures.

Coordinate structure, Grammar, Shift-reduce, Phrase similarity

Yichu Zhou, Shujian Huang, Xinyu Dai, Jiajun Chen

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

International Conference on Natural Language Processing and Chinese Computing

Nanchang (CN)

Natural Language Processing and Chinese Computing

353-361

2015