The impact of multi-modal large language models on open-source audio-visual information research

Wu Shuyi (吴叔義)¹, Guo Xiufeng (郭秀峰)¹, Hou Li (侯丽)¹

1. Military Science Information Research Center, Academy of Military Sciences, Beijing 100142

Abstract

Open-source audio-visual information research, as a component of defense science and technology information research, has become increasingly important in the current era of booming self-media and short video. Following the surge in large-model technology, it is important to analyze in depth how multimodal large language models affect open-source audio-visual information research. By surveying the technical characteristics and application scenarios of a range of multimodal large language models, this paper proposes potential application directions in open-source audio-visual information research and offers a reference for work in this field. At present, multimodal large language models are still some distance from direct deployment in practice, but they will be an important driver in reshaping and restructuring audio-visual information research. Their generative capabilities also pose major challenges to open-source audio-visual information research, which has now entered a period of strategic opportunity for transformation and upgrading.

Keywords

multimodal large language model/open-source audio-visual information/artificial intelligence

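Publication year: 2024

Journal: 国防科技 (National Defense Technology)

Institution: National University of Defense Technology (国防科学技术大学)

Indexed in: CSTPCD

Impact factor: 0.646

ISSN: 1671-4547

Number of references: 4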

段落导航