Grammar induction from visual, speech and text

Publisher: Elsevier

Abstract: Grammar Induction (GI) seeks to uncover the underlying grammatical rules and linguistic patterns of a language, making it a pivotal research topic within Artificial Intelligence (AI). Although extensive research in GI has predominantly focused on text or other single modalities, we reveal that GI can benefit significantly from rich heterogeneous signals such as text, vision, and acoustics, since features from distinct modalities play complementary roles. With this intuition, this work introduces a novel unsupervised visual-audio-text grammar induction task (named VAT-GI), which induces constituent grammar trees from parallel images, text, and speech inputs. Inspired by the fact that language grammar natively exists beyond text, we argue that text need not be the predominant modality in grammar induction. We therefore further introduce a textless setting of VAT-GI, in which the task relies solely on visual and auditory inputs. To approach the task, we propose a visual-audio-text inside-outside recursive autoencoder (VaTiora) framework, which leverages rich modality-specific and complementary features for effective grammar parsing. In addition, a more challenging benchmark dataset is constructed to assess the generalization ability of VAT-GI systems. Experiments on two benchmark datasets demonstrate that VaTiora incorporates the various multimodal signals more effectively and achieves new state-of-the-art performance on VAT-GI. Further in-depth analyses provide a deeper understanding of the VAT-GI task and of how VaTiora advances it.
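The abstract names an inside-outside recursive autoencoder as the backbone of VaTiora. As a rough, non-authoritative illustration of that general idea (in the style of DIORA, not the authors' VaTiora code), the sketch below runs only the inside pass: it fills a CKY-style chart bottom-up, softly mixing the compositions of every binary split of a span, and then reads out the highest-scoring binary constituency tree with a hard CKY pass. The composition function (`compose`), the dot-product split score, and the toy leaf vectors are assumptions made for illustration; the outside pass, the reconstruction objective, and the visual and audio feature fusion are all omitted. In the textless setting the leaf vectors would presumably come from speech segments rather than words, but that mapping is hypothetical here.

```python
import math
from typing import Dict, List, Tuple, Union

Vector = List[float]
Span = Tuple[int, int]  # (start, end) with end exclusive
Tree = Union[int, tuple]  # a leaf index or a (left, right) pair


def compose(left: Vector, right: Vector) -> Vector:
    # Hypothetical composition: elementwise tanh of the sum.
    # A trained model would use a learned MLP or TreeLSTM cell here.
    return [math.tanh(a + b) for a, b in zip(left, right)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def inside_pass(leaves: List[Vector]) -> Tuple[Dict[Span, Vector], Dict[Span, float]]:
    """Fill the chart bottom-up, softly mixing every binary split of each span."""
    n, dim = len(leaves), len(leaves[0])
    vec: Dict[Span, Vector] = {}
    score: Dict[Span, float] = {}
    for i, leaf in enumerate(leaves):
        vec[(i, i + 1)] = leaf
        score[(i, i + 1)] = 0.0
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            # Score each split point k by child compatibility plus child scores.
            cands = []
            for k in range(i + 1, j):
                left, right = vec[(i, k)], vec[(k, j)]
                s = dot(left, right) + score[(i, k)] + score[(k, j)]
                cands.append((s, compose(left, right)))
            # Softmax over splits, shifted by the max for numerical stability.
            m = max(s for s, _ in cands)
            weights = [math.exp(s - m) for s, _ in cands]
            z = sum(weights)
            mixed = [0.0] * dim
            for w, (_, c) in zip(weights, cands):
                for d in range(dim):
                    mixed[d] += w * c[d] / z
            vec[(i, j)] = mixed
            score[(i, j)] = sum(w * s for w, (s, _) in zip(weights, cands)) / z
    return vec, score


def best_tree(n: int, score: Dict[Span, float]) -> Tree:
    """CKY: pick the binary tree that maximizes the sum of its span scores."""
    best: Dict[Span, float] = {(i, i + 1): 0.0 for i in range(n)}
    back: Dict[Span, int] = {}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            k = max(range(i + 1, j), key=lambda split: best[(i, split)] + best[(split, j)])
            best[(i, j)] = score[(i, j)] + best[(i, k)] + best[(k, j)]
            back[(i, j)] = k

    def build(i: int, j: int) -> Tree:
        if j - i == 1:
            return i
        k = back[(i, j)]
        return (build(i, k), build(k, j))

    return build(0, n)


def tree_leaves(tree: Tree) -> List[int]:
    # Flatten a bracketed tree back into its leaf order.
    if isinstance(tree, int):
        return [tree]
    return tree_leaves(tree[0]) + tree_leaves(tree[1])
```

For example, with leaf vectors `[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]`, only the second and third leaves have a positive dot product, so `best_tree(3, score)` returns `(0, (1, 2))`. Mixing splits softly keeps the chart differentiable for training, while the hard CKY pass is used only to read out a tree.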

Keywords:

Grammar induction; Multimodal learning; Structure modeling

Authors:

Yu Zhao, Hao Fei, Shengqiong Wu, Meishan Zhang, Min Zhang, Tat-seng Chua


Affiliations:

Harbin Institute of Technology (Shenzhen), Shenzhen, 518055, China

National University of Singapore, Singapore, 118404, Singapore

Publication year:

2025

DOI:

10.1016/j.artint.2025.104306

Journal: Artificial Intelligence

Indexed in: SCI

ISSN: 0004-3702

Year, volume (issue): 2025, vol. 341 (Apr.)

Number of references: 64