Visual intelligence evaluation techniques for single object tracking: a survey
Single object tracking (SOT), which aims to model the human dynamic vision system and achieve human-like object tracking ability in complex environments, has been widely used in real-world applications such as self-driving, video surveillance, and robot vision. Over the past decade, progress in deep learning has encouraged many research groups to design different tracking frameworks, such as correlation filters (CF) and Siamese neural networks (SNNs), which have facilitated the progress of SOT research. However, many factors in natural application scenes (e.g., target deformation, fast motion, and illumination changes) still challenge SOT trackers. Thus, algorithms with novel architectures have been proposed to achieve robust tracking and better performance in representative experimental environments. Nevertheless, several failure cases in natural application environments reveal a large gap between the performance of state-of-the-art trackers and human expectations, which motivates us to pay close attention to the evaluation aspects. Therefore, instead of the traditional reviews that mainly concentrate on algorithm design, this study systematically reviews the visual intelligence evaluation techniques for SOT, covering four key aspects: the task definition, evaluation environments, task executors, and evaluation mechanisms.

First, we present the development of the task definition, from the original short-term tracking, through long-term tracking, to the recently proposed global instance tracking. With the evolution of the SOT definition, research has progressed from perceptual to cognitive intelligence. We also summarize the challenging factors in the SOT task to help readers understand the research bottlenecks in actual applications.

Second, we compare representative experimental environments in SOT evaluation. Unlike existing reviews that mainly introduce datasets in chronological order, this study divides the environments into three categories (i.e., general datasets, dedicated datasets, and competition datasets) and introduces them separately.

Third, we introduce the executors of SOT tasks, which include not only tracking algorithms represented by traditional trackers, CF-based trackers, SNN-based trackers, and Transformer-based trackers but also human visual tracking experiments conducted in interdisciplinary fields. To our knowledge, none of the existing SOT reviews have included related work on human dynamic visual ability. Introducing interdisciplinary works can therefore support visual intelligence evaluation by comparing machines with humans and better reveal the degree of intelligence of existing algorithm modeling methods.

Fourth, we review the evaluation mechanisms and metrics, which encompass traditional machine-machine comparisons and novel human-machine comparisons, and analyze the target tracking capability of various task executors. We also provide an overview of the human-machine comparison known as the visual Turing test, including its application in many vision tasks (e.g., image comprehension, game navigation, image classification, and image recognition). In particular, we hope that this study can help researchers focus on this novel evaluation technique, better understand the capability bottlenecks, further explore the gaps between existing methods and humans, and ultimately achieve the goal of algorithmic intelligence.

Finally, we indicate the evolution trend of visual intelligence evaluation techniques: 1) designing more human-like task definitions, 2) constructing more comprehensive and realistic evaluation environments, 3) including human subjects as task executors, and 4) using human abilities as a baseline to evaluate machine intelligence. In conclusion, this study summarizes the evolution trend of visual intelligence evaluation techniques for the SOT task, further analyzes the existing challenging factors, and discusses possible future research directions.
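As context for the machine-machine comparison mentioned above, the sketch below illustrates how the commonly used one-pass evaluation metrics for SOT, a success score based on bounding-box overlap (IoU) and a precision score based on center location error, can be computed per sequence. It is a minimal illustrative sketch, not a metric definition taken from this survey; the function names and the 20-pixel precision threshold are assumptions.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    xi1, yi1 = max(xa, xb), max(ya, yb)
    xi2, yi2 = min(xa + wa, xb + wb), min(ya + ha, yb + hb)
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0

def evaluate_sequence(pred_boxes, gt_boxes, dist_threshold=20.0):
    """Return (success AUC, precision) for one sequence.

    Success AUC averages the fraction of frames whose IoU exceeds each
    threshold in [0, 1]; precision is the fraction of frames whose
    center location error stays within dist_threshold pixels
    (20 px is an assumed, commonly seen default).
    """
    overlaps, center_errors = [], []
    for p, g in zip(pred_boxes, gt_boxes):
        overlaps.append(iou(p, g))
        pc = (p[0] + p[2] / 2.0, p[1] + p[3] / 2.0)  # predicted box center
        gc = (g[0] + g[2] / 2.0, g[1] + g[3] / 2.0)  # ground-truth box center
        center_errors.append(np.hypot(pc[0] - gc[0], pc[1] - gc[1]))
    overlaps = np.asarray(overlaps)
    thresholds = np.linspace(0.0, 1.0, 21)            # IoU thresholds of the success plot
    success_curve = [(overlaps > t).mean() for t in thresholds]
    auc = float(np.mean(success_curve))               # area under the success curve
    precision = float((np.asarray(center_errors) <= dist_threshold).mean())
    return auc, precision
```

In a benchmark-style machine-machine comparison, a tracker's predicted boxes for each sequence would be scored against the ground-truth annotations with a routine like this, and the per-sequence scores averaged over the whole dataset.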