Research progress in human-like indoor scene interaction
Human intelligence evolves through interactions with the environment, which makes autonomous interaction between intelligent agents and the environment a key factor in advancing intelligence. Autonomous interaction with the environment is a research topic that involves multiple disciplines, such as computer graphics, computer vision, and robotics, and has attracted significant attention and exploration in recent years. In this study, we focus on human-like interaction in indoor environments and comprehensively review the research progress in the fundamental components, including simulation interaction platforms, scene interaction data, and interaction generation algorithms for digital humans and robots. Regarding simulation interaction platforms, we review representative simulation methods for virtual humans, objects, and human-object interaction. Specifically, we cover critical algorithms for articulated rigid-body simulation, deformable-body and cloth simulation, fluid simulation, contact and collision, and multi-body multi-physics coupling. In addition, we introduce several popular simulation platforms that are readily available for practitioners in the graphics, robotics, and machine learning communities. We classify these popular simulation platforms into two main categories: simulators focusing on single-physics systems and those supporting multi-physics systems. We review typical simulation platforms in both categories and discuss their advantages in human-like indoor-scene interaction. Finally, we briefly discuss several emerging trends in the physical simulation community that inspire promising future directions: developing a full-featured simulator for multi-physics multi-body physical systems, equipping modern simulation platforms with differentiability, and combining physics principles with insights from learning techniques. Regarding scene interaction data, we provide an in-depth review of the latest developments and trends in datasets that support the understanding
and generation of human-scene interactions. We emphasize the need for agents to perceive scenes with a focus on interaction, assimilate interactive information, and recognize human interaction patterns to improve simulation and movement generation. Our review spans three areas: perception datasets for human-scene interaction, datasets for interaction motion, and methods for scaling data efficiently. Perception datasets facilitate a deeper understanding of 3D scenes, highlighting geometry, structure, functionality, and motion. They offer resources for interaction affordances, grasping poses, interactive components, and object positioning. Motion datasets, which are essential for crafting interactions, delve into interaction movement analysis, including motion segmentation, tracking, dynamic reconstruction, action recognition, and prediction. The fidelity and breadth of these datasets are vital for creating lifelike interactions. We also discuss scaling challenges, including the limitations of manual annotation and specialized hardware, and explore current solutions such as cost-effective capture systems, dataset integration, and data augmentation to enable the generation of extensive interactive models for advancing human-scene interaction research. For robot-scene interaction, this study emphasizes the importance of affordance, that is, the potential action possibilities that objects or environments can provide to users. It discusses approaches for detecting and analyzing affordance at different granularities, as well as affordance modeling techniques that combine multi-source and multimodal data. For digital human-scene interaction, this study provides a detailed introduction to the simulation and generation methods of human motion, focusing especially on recent technologies based on deep learning and generative models. Building on this foundation, the study reviews ways to represent a scene and recent successful approaches that achieve high-quality human-scene interaction
simulation. Finally, we discuss the challenges and future development trends in this field.