Modality-experts coordinated adaptation for large multimodal models

Abstract: Driven by the expansion of foundation models and the increasing variety of downstream tasks, parameter-efficient fine-tuning (PEFT) methods have exhibited remarkable efficacy in the unimodal domain, effectively reducing the consumption of computational resources. Although recent research has shifted attention to the multimodal domain and achieved efficient parametric adaptation of large multimodal models (LMMs) for downstream tasks, these methods still encounter two limitations: (1) low performance and (2) poor compatibility. This work proposes a modality-experts coordinated adaptation (ModeX) method for the multimodal domain, offering an effective, plug-and-play, and lightweight adaptation architecture for diverse LMMs. Specifically, ModeX adaptively coordinates different modality experts according to the type of network structure and input data. In addition, an effective coordinator equipped with a routing algorithm is developed to generate the corresponding weights, with a focus on leveraging the synergy among multimodal data. Extensive experiments on 15 multimodal downstream benchmarks and five LMMs demonstrate that ModeX adapts seamlessly to diverse LMMs, outperforms state-of-the-art PEFT methods, and even exhibits superior performance compared with full fine-tuning methods. Notably, on the NLVR2 task, ModeX achieves 84.06% accuracy with only 12.0M trainable parameters, outperforming full fine-tuning by 1.63%. Moreover, ModeX demonstrates superior stability and offers higher training efficiency, in terms of both trainable parameters and training duration. The source code has been released at https://github.com/zhangy0822/ModeX.

Keywords:

large multimodal model; multimodal learning; vision-language pretraining; parameter-efficient fine-tuning; adapter; modality expert

Authors:

Yan ZHANG, Zhong JI, Yanwei PANG, Jungong HAN, Xuelong LI

Affiliations:

School of Electrical and Information Engineering, Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, Tianjin University, Tianjin 300072, China

Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China

Department of Automation, Tsinghua University, Beijing 100084, China

Institute of Artificial Intelligence (TeleAI), China Telecom Corporation Limited, Beijing 100033, China

Publication year:

2024

DOI:

10.1007/s11432-024-4234-4

Science China Information Sciences

Chinese Academy of Sciences

CSTPCD, EI

Impact factor: 0.715

ISSN: 1674-733X

Year, volume (issue): 2024, 67(12)