Robotics & Machine Learning Daily News, 2024, Issue (Jun. 24): 56-57.

New Findings from Tongji University in the Area of Robotics Described (Contrast, Imitate, Adapt: Learning Robotic Skills From Raw Human Videos)



Abstract

By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News: Investigators have published a new report on robotics. According to news reporting out of Shanghai, People's Republic of China, by NewsRx editors, the research stated, "Learning robotic skills from raw human videos remains a non-trivial challenge. Previous works tackled this problem by leveraging behavior cloning or learning reward functions from videos." Financial support for this research came from the National Natural Science Foundation of China (NSFC).

Our news journalists obtained a quote from the research from Tongji University: "Despite their remarkable performances, they may introduce several issues, such as the necessity for robot actions, requirements for consistent viewpoints and similar layouts between human and robot videos, as well as low sample efficiency. To this end, our key insight is to learn task priors by contrasting videos, to learn action priors through imitating trajectories from videos, and to utilize the task priors to guide trajectories to adapt to novel scenarios. We propose a three-stage skill learning framework denoted Contrast-Imitate-Adapt (CIA). An interaction-aware alignment transformer (IAAformer) is proposed to learn task priors by temporally aligning video pairs. Then a trajectory generation model is used to learn action priors. To adapt to novel scenarios different from human videos, the Inversion-Interaction method is designed to initialize coarse trajectories and refine them through limited interaction. In addition, CIA introduces an optimization method based on semantic directions of trajectories for interaction security and sample efficiency. The alignment distances computed by IAAformer are used as the rewards.

We evaluate CIA on six real-world everyday tasks, and empirically demonstrate that CIA significantly outperforms previous state-of-the-art works in terms of task success rate and generalization to diverse novel scene layouts and object instances. Note to Practitioners: This work studies robot skill learning from raw human videos. Compared with teleoperation or kinesthetic teaching in the laboratory, such a learning method can flexibly utilize the large-scale human videos available on the Internet, thereby improving the robot's ability to generalize to various complex scenarios. Previous works on learning from videos usually have some issues, including requirements for robot actions, consistent viewpoints, similar layouts, and low sample efficiency. To alleviate these issues, we propose the three-stage skill learning framework CIA. Temporal alignment is utilized to learn task priors through our proposed transformer-based model and self-supervised loss functions. A trajectory generation model is trained to learn the action priors. To further adapt to diverse scenarios, we propose a two-stage policy improvement method by initialization and interaction. An optimization method is introduced to ensure safe interaction and sample efficiency, where the optimization objective is guided by the learned task priors."
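The abstract's key mechanism is using a temporal-alignment distance between a robot's execution video and a human demonstration video as a reward signal for trajectory refinement. The report does not give implementation details of IAAformer, so the sketch below substitutes a plain dynamic-time-warping distance over frame embeddings as a stand-in; the function and variable names are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def dtw_align_distance(feats_a, feats_b):
    # Dynamic-time-warping distance between two videos represented as
    # (T_a x D) and (T_b x D) arrays of per-frame embeddings.
    # This is a generic stand-in for the alignment distance that the
    # paper's interaction-aware alignment transformer would compute.
    ta, tb = len(feats_a), len(feats_b)
    cost = np.full((ta + 1, tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            d = np.linalg.norm(feats_a[i - 1] - feats_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Normalize by total length so videos of different durations compare fairly.
    return cost[ta, tb] / (ta + tb)

def alignment_reward(robot_feats, human_feats):
    # Smaller alignment distance -> higher reward, mirroring how CIA
    # uses IAAformer's alignment distances to guide trajectory optimization.
    return -dtw_align_distance(robot_feats, human_feats)
```

A trajectory optimizer would then prefer candidate rollouts whose execution videos score a higher `alignment_reward` against the human demonstration, without ever needing robot action labels from the video.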

Key words

Shanghai/People's Republic of China/Asia/Emerging Technologies/Machine Learning/Robot/Robotics/Robots/Tongji University


Publication Year

2024
Robotics & Machine Learning Daily News
