A Video Commands Learning Framework Based on a Multi-Stage Atrous Pyramid Network (基于多级空洞金字塔网络的视频指令学习框架)

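Abstract (translated from Chinese): To generate manipulation instructions from untrimmed videos, a video commands learning framework based on a multi-stage atrous pyramid network (MS-APN) is proposed. Specifically, an atrous convolution pyramid module captures multi-scale action features in the video, and a multi-stage network structure refines the segmentation results, so that the untrimmed video is divided into a series of video segments from which action features are extracted. An object detection model extracts object features, which are fused with the action features and fed into classifiers that recognize the subject and patient objects. Robot commands are generated by defining a command quadruplet. In experiments on the MPII Cooking 2 dataset, the accuracies of video action segmentation, manipulated-object classification, and manipulation command generation reached 84.1%, 76.5%, and 62.4%, respectively, and the system was successfully deployed on a Baxter robot for validation.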

English title (as published): A VIDEO COMMANDS LEARNING FRAMEWORK BASED ON MULTI-STAGE ATROUS PYRAMID NETWORK

English abstract: We propose a video commands learning framework based on a multi-stage atrous pyramid network (MS-APN) for generating robot manipulation instructions from untrimmed videos. Specifically, we introduce an atrous convolution pyramid module to capture multi-scale action features and a multi-stage architecture to refine the segmentation results. The untrimmed video is divided into a series of video segments, from which action features are extracted. An object detection model extracts object features, which are fused with the action features and fed into two classifiers that recognize the subject and patient objects. A command quadruplet is defined to represent robot commands. Experiments on the MPII Cooking 2 dataset show that the accuracies of action segmentation, object classification, and robot command generation reach 84.1%, 76.5%, and 62.4%, respectively. We also successfully deployed the system on a Baxter robot to further verify the effectiveness of the framework.
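The abstract's central idea, an atrous convolution pyramid that views the same frame sequence at several temporal scales, can be illustrated with a minimal pure-Python sketch. This is not the authors' MS-APN implementation: the function names `dilated_conv1d` and `atrous_pyramid`, the scalar per-frame signal, the dilation rates (1, 2, 4), and the fusion of branches by averaging are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float],
                   kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """1-D atrous (dilated) convolution with zero padding.

    Kernel taps are spaced `dilation` frames apart, so a 3-tap kernel
    covers a temporal window of 2 * dilation + 1 frames. The output has
    the same length as the input.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t + (k - half) * dilation
            if 0 <= idx < len(signal):  # positions outside the clip count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def atrous_pyramid(signal: Sequence[float],
                   kernel: Sequence[float] = (1.0, 1.0, 1.0),
                   dilations: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Run parallel dilated branches and fuse them by averaging.

    Averaging is a hypothetical fusion rule used only for this sketch.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) / len(branches) for values in zip(*branches)]


if __name__ == "__main__":
    # A single "action spike" at frame 4 of a 9-frame clip.
    clip = [0, 0, 0, 0, 3, 0, 0, 0, 0]
    print(dilated_conv1d(clip, (1.0, 1.0, 1.0), 2))
    print(atrous_pyramid(clip))
```

Larger dilation rates let a branch respond to frames further from the current one without adding kernel weights, which is why stacking several rates in parallel yields multi-scale temporal context.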

English keywords:

Video commands learning; Robot commands generation; Action segmentation; Atrous convolution

Authors:

Zhu Zhanmo (朱展模), Chen Junhong (陈俊洪), Yang Zhenguo (杨振国), Liu Wenyin (刘文印)

Affiliation:

School of Computer Science, Guangdong University of Technology (广东工业大学计算机学院), Guangzhou, Guangdong 510006

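Keywords (Chinese original):

视频指令学习 (video commands learning); 机器人指令生成 (robot commands generation); 动作分割 (action segmentation); 空洞卷积 (atrous convolution)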

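Funding:

National Natural Science Foundation of China; Guangdong Basic and Applied Basic Research Foundation; Guangdong Program for Introducing Innovative Research Teams; Guangdong Special Fund for Science and Technology Innovation Strategy

Project numbers:

91748107; 2020A1515010616; 2014ZT05G157; pdjh2020a0173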

Publication year:

2024

DOI:

10.3969/j.issn.1000-386x.2024.05.019

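Journal: Computer Applications and Software (计算机应用与软件)

Sponsors: Shanghai Institute of Computing Technology (上海市计算技术研究所); Shanghai Development Center of Computer Software Technology (上海计算机软件技术开发中心)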

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.615

ISSN: 1000-386X

Year, volume (issue): 2024, 41(5)

Number of references: 23