Creative natural language generation, such as poetry generation, writing lyrics, and storytelling, is appealing but difficult to evaluate. We take the application of image-inspired poetry generation as a showcase and investigate two problems in evaluation: (1) how to evaluate the generated text when there are no ground truths, and (2) how to evaluate nondeterministic systems that output different texts given the same input image. Regarding the first problem, we first design a judgment tool to collect ratings of a few poems for comparison with the inspiring image shown to assessors. We then propose a novelty measurement that quantifies how different a generated text is compared to a known corpus. Regarding the second problem, we experiment with different strategies to approximate evaluating multiple trials of output poems. We also use a measure for quantifying the diversity of different texts generated in response to the same input image, and discuss their merits.
Evaluation, Poetry generation, Natural language generation, AI-based creation, Image
Chao-Chung Wu, Ruihua Song, Tetsuya Sakai, Wen-Feng Cheng, Xing Xie, Shou-De Lin
National Taiwan University, Taipei, Taiwan
Microsoft XiaoIce, Beijing, China
Waseda University, Tokyo, Japan
Microsoft Research Asia, Beijing, China
CCF International Conference on Natural Language Processing and Chinese Computing