Research progress in human-like indoor scene interaction

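Human intelligence evolved through interaction with the environment, so enabling agents to interact autonomously with their environment is key to advancing the evolution of intelligence. Autonomous environment interaction is a research topic spanning computer graphics, computer vision, robotics, and other disciplines. It has attracted broad attention, and the academic community has carried out a series of studies on it from different perspectives and technical dimensions. This paper focuses on human-like interaction in indoor scenes and comprehensively reviews research progress on three basic elements that digital humans and robots need when learning to complete specific interaction tasks in indoor environments: simulation interaction platforms, scene interaction data, and interaction generation algorithms. On building simulated interaction environments, the paper surveys the simulation techniques involved and their research progress, and introduces representative human-like interaction simulation platforms. On constructing scene interaction data, it details the state of research in China and abroad from three angles: scene interaction perception datasets, scene interaction motion datasets, and efficient scaling of interaction data. On human-like interaction perception and generation, it introduces work on interaction-oriented scene affordance analysis and, taking interaction generation as the thread, reviews work on digital human-scene and robot-scene interaction generation. Building on this review and discussion, the paper finally summarizes the challenges the field still faces in four respects (interaction simulation, interaction data, interaction perception, and interaction generation) and looks ahead to future trends.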
Research progress in human-like indoor scene interaction
Human intelligence evolves through interactions with the environment, which makes autonomous interaction between intelligent agents and the environment a key factor in advancing intelligence. Autonomous interaction with the environment is a research topic that involves multiple disciplines, such as computer graphics, computer vision, and robotics, and has attracted significant attention and exploration in recent years. In this study, we focus on human-like interaction in indoor environments and comprehensively review the research progress in its fundamental components: simulation interaction platforms, scene interaction data, and interaction generation algorithms for digital humans and robots. Regarding simulation interaction platforms, we comprehensively review representative simulation methods for virtual humans, objects, and human-object interaction. Specifically, we cover critical algorithms for articulated rigid-body simulation, deformable-body and cloth simulation, fluid simulation, contact and collision, and multi-body multi-physics coupling. In addition, we introduce several popular simulation platforms that are readily available for practitioners in the graphics, robotics, and machine learning communities. We classify these popular simulation platforms into two main categories: simulators focusing on single-physics systems and those supporting multi-physics systems. We review typical simulation platforms in both categories and discuss their advantages in human-like indoor-scene interaction. Finally, we briefly discuss several emerging trends in the physical simulation community that inspire promising future directions: developing a full-featured simulator for multi-physics multi-body physical systems, equipping modern simulation platforms with differentiability, and combining physics principles with insights from learning techniques. Regarding scene interaction data, we provide an in-depth review of the latest developments and trends in datasets that support the understanding and generation of human-scene interactions. We focus on the need for agents to perceive scenes with a focus on interaction, assimilate interactive information, and recognize human interaction patterns to improve simulation and movement generation. Our review spans three areas: perception datasets for human-scene interaction, datasets for interaction motion, and methods for scaling data efficiently. Perception datasets facilitate a deeper understanding of 3D scenes, highlighting geometry, structure, functionality, and motion. They offer resources for interaction affordances, grasping poses, interactive components, and object positioning. Motion datasets, which are essential for crafting interactions, delve into interaction movement analysis, including motion segmentation, tracking, dynamic reconstruction, action recognition, and prediction. The fidelity and breadth of these datasets are vital for creating lifelike interactions. We also discuss scaling challenges, such as the limitations of manual annotation and specialized hardware, and explore current solutions like cost-effective capture systems, dataset integration, and data augmentation to enable the generation of extensive interactive models for advancing human-scene interaction research. For robot-scene interaction, this study emphasizes the importance of affordance, that is, the potential action possibilities that objects or environments can provide to users. It discusses approaches for detecting and analyzing affordance at different granularities, as well as affordance modeling techniques that combine multi-source and multimodal data. For digital human-scene interaction, this study provides a detailed introduction to the simulation and generation methods of human motion, focusing especially on recent technologies based on deep learning and generative models. Building on this foundation, the study reviews ways to represent a scene and recent successful approaches that achieve high-quality human-scene interaction simulation. Finally, we discuss the challenges and future development trends in this field.

environment interaction; interaction simulation; interaction data; interaction perception; interaction generation

Du Tao, Hu Ruizhen, Liu Libin, Yi Li, Zhao Hao

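Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084

Shanghai Artificial Intelligence Laboratory, Shanghai 200232

Shanghai Qi Zhi Institute, Shanghai 200232

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518061

School of Intelligence Science and Technology, Peking University, Beijing 100871

Institute for AI Industry Research, Tsinghua University, Beijing 100084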

environment interaction; interaction simulation; interaction data; interaction perception; interaction generation

National Natural Science Foundation of China project

62322207

2024

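Journal of Image and Graphics
Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Beijing Institute of Applied Physics and Computational Mathematics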

CSTPCD; Peking University Core Journals
Impact factor: 1.111
ISSN: 1006-8961
Year, volume (issue): 2024, 29(6)