Robotics & Machine Learning Daily News, 2024, Issue (Jun. 24): 56-57.

New Findings from Tongji University in the Area of Robotics Described (Contrast, Imitate, Adapt: Learning Robotic Skills From Raw Human Videos)



Abstract

By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News: Investigators have published a new report on robotics. According to news reporting out of Shanghai, People's Republic of China, by NewsRx editors, the research stated, "Learning robotic skills from raw human videos remains a non-trivial challenge. Previous works tackled this problem by leveraging behavior cloning or learning reward functions from videos." Financial support for this research came from the National Natural Science Foundation of China (NSFC).

Our news journalists obtained a quote from the research from Tongji University: "Despite their remarkable performances, they may introduce several issues, such as the necessity for robot actions, requirements for consistent viewpoints and similar layouts between human and robot videos, as well as low sample efficiency. To this end, our key insight is to learn task priors by contrasting videos, to learn action priors through imitating trajectories from videos, and to utilize the task priors to guide trajectories to adapt to novel scenarios. We propose a three-stage skill learning framework denoted Contrast-Imitate-Adapt (CIA). An interaction-aware alignment transformer (IAAformer) is proposed to learn task priors by temporally aligning video pairs. Then a trajectory generation model is used to learn action priors. To adapt to novel scenarios different from human videos, the Inversion-Interaction method is designed to initialize coarse trajectories and refine them through limited interaction. In addition, CIA introduces an optimization method based on semantic directions of trajectories for interaction security and sample efficiency. The alignment distances computed by IAAformer are used as the rewards.

We evaluate CIA on six real-world everyday tasks, and empirically demonstrate that CIA significantly outperforms previous state-of-the-art works in terms of task success rate and generalization to diverse novel scene layouts and object instances. Note to Practitioners: This work studies robot skill learning from raw human videos. Compared with teleoperation or kinesthetic teaching in the laboratory, such a learning method can flexibly utilize the large-scale human videos available on the Internet, thereby improving the robot's ability to generalize to various complex scenarios. Previous works on learning from videos usually have some issues, including requirements for robot actions, consistent viewpoints, similar layouts, and low sample efficiency. To alleviate these issues, we propose the three-stage skill learning framework CIA. Temporal alignment is utilized to learn task priors through our proposed transformer-based model and self-supervised loss functions. A trajectory generation model is trained to learn the action priors. To further adapt to diverse scenarios, we propose a two-stage policy improvement method by initialization and interaction. An optimization method is introduced to ensure safe interaction and sample efficiency, where the optimization objective is guided by the learned task priors."
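The abstract's key mechanism is using a temporal-alignment distance between a robot's execution video and a human demonstration video as a reward signal for trajectory refinement. The report does not give implementation details of IAAformer, so the sketch below substitutes a plain dynamic-time-warping distance over frame embeddings as a stand-in; the function and variable names are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def dtw_align_distance(feats_a, feats_b):
    # Dynamic-time-warping distance between two videos represented as
    # (T_a x D) and (T_b x D) arrays of per-frame embeddings.
    # This is a generic stand-in for the alignment distance that the
    # paper's interaction-aware alignment transformer would compute.
    ta, tb = len(feats_a), len(feats_b)
    cost = np.full((ta + 1, tb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, ta + 1):
        for j in range(1, tb + 1):
            d = np.linalg.norm(feats_a[i - 1] - feats_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Normalize by total length so videos of different durations compare fairly.
    return cost[ta, tb] / (ta + tb)

def alignment_reward(robot_feats, human_feats):
    # Smaller alignment distance -> higher reward, mirroring how CIA
    # uses IAAformer's alignment distances to guide trajectory optimization.
    return -dtw_align_distance(robot_feats, human_feats)
```

A trajectory optimizer would then prefer candidate rollouts whose execution videos score a higher `alignment_reward` against the human demonstration, without ever needing robot action labels from the video.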

Key words

Shanghai/People's Republic of China/Asia/Emerging Technologies/Machine Learning/Robot/Robotics/Robots/Tongji University


Publication Year

2024
Robotics & Machine Learning Daily News
