
VD-PCR: Improving visual dialog with pronoun coreference resolution

The visual dialog task requires an AI agent to interact with humans in multi-round dialogs grounded in a visual environment. Pronouns are a common linguistic phenomenon and are often used in dialogs to improve communication efficiency. As a result, resolving pronouns (i.e., grounding pronouns to the noun phrases they refer to) is an essential step towards understanding dialogs. In this paper, we propose VD-PCR, a novel framework to improve Visual Dialog understanding with Pronoun Coreference Resolution in both implicit and explicit ways. First, to implicitly help models understand pronouns, we design novel methods to jointly train the pronoun coreference resolution and visual dialog tasks. Second, after observing that the coreference relationship between pronouns and their referents indicates the relevance between dialog rounds, we propose to explicitly prune the irrelevant history rounds from the input of visual dialog models. With pruned input, the models can focus on the relevant dialog history and ignore distractions from irrelevant rounds. With the proposed implicit and explicit methods, VD-PCR achieves state-of-the-art experimental results on the VisDial dataset. © 2022 Elsevier Ltd. All rights reserved.
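The explicit pruning idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `coref_links` structure (a hypothetical output of a coreference model that maps a round index to the earlier rounds its pronouns refer to), and the choice to follow links transitively are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Round = Tuple[str, str]  # (question, answer) pair for one dialog round


def relevant_rounds(current: int, coref_links: Dict[int, Set[int]]) -> List[int]:
    """Return sorted indices of earlier rounds reachable from `current`
    by following coreference links (assumed transitive)."""
    seen: Set[int] = set()
    stack = [current]
    while stack:
        r = stack.pop()
        for ref in coref_links.get(r, set()):
            # Only earlier rounds can be part of the history.
            if ref < current and ref not in seen:
                seen.add(ref)
                stack.append(ref)
    return sorted(seen)


def prune_history(history: List[Round], current: int,
                  coref_links: Dict[int, Set[int]]) -> List[Round]:
    """Keep only the history rounds linked to the current question."""
    return [history[i] for i in relevant_rounds(current, coref_links)]


# Hypothetical dialog: the question at index 3 is "Is it wearing a collar?"
history = [
    ("Is there a dog?", "Yes"),
    ("What color is it?", "Brown"),
    ("Is the sky clear?", "Yes"),
]
# Assumed coreference output: round 1 refers to round 0, round 3 to round 1.
links = {1: {0}, 3: {1}}
pruned = prune_history(history, 3, links)
```

In this example the round about the sky has no coreference link to the current question, so it is dropped, while the two rounds about the dog are kept in their original order.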

Vision and language; Visual dialog; Pronoun coreference resolution

Yu, Xintong; Zhang, Hongming; Hong, Ruixin; Song, Yangqiu; Zhang, Changshui


Tsinghua Univ

Hong Kong Univ Sci & Technol

2022

Pattern Recognition


EI, SCI
ISSN:0031-3203
Year, volume (issue): 2022, 125