Neural Networks, 2022, Vol. 146, 18. DOI: 10.1016/j.neunet.2021.11.019

Leveraging hierarchy in multimodal generative models for effective cross-modality inference

Vasco M. (1), Yin H. (2), Melo F.S. (1), Paiva A. (1)

Author information

  • 1. INESC-ID & Instituto Superior Técnico, University of Lisbon
  • 2. Division of Robotics, Perception and Learning, EECS, KTH Royal Institute of Technology

Abstract

© 2021 Elsevier Ltd. This work addresses the problem of cross-modality inference (CMI), i.e., inferring missing data of unavailable perceptual modalities (e.g., sound) using data from available perceptual modalities (e.g., image). We overview single-modality variational autoencoder methods and discuss three problems of computational cross-modality inference arising from recent developments in multimodal generative models. Inspired by neural mechanisms of human recognition, we contribute the NEXUS model, a novel hierarchical generative model that can learn a multimodal representation of an arbitrary number of modalities in an unsupervised way. By exploiting hierarchical representation levels, NEXUS is able to generate high-quality, coherent data of missing modalities given any subset of available modalities. To evaluate CMI in a natural scenario with a large number of modalities, we contribute the “Multimodal Handwritten Digit” (MHD) dataset, a novel benchmark dataset that combines image, motion, sound and label information from digit handwriting. We assess the key role of hierarchy in enabling high-quality samples during cross-modality inference and discuss how a novel training scheme enables NEXUS to learn a multimodal representation robust to missing modalities at test time. Our results show that NEXUS outperforms current state-of-the-art multimodal generative models with regard to cross-modality inference capabilities.

Key words

Cross-modality inference; Deep learning; Multimodal representation learning

Publication year

2022
Neural Networks

Indexed in: EI, SCI
ISSN: 0893-6080
Citations: 2
References: 42