
Open-sourced data ecosystem in autonomous driving: the present and future

As autonomous driving technology matures and is deployed more widely, a systematic survey of open-source autonomous driving datasets helps sustain a healthy industry ecosystem. Existing autonomous driving datasets can be broadly divided into two generations. First-generation datasets have relatively simple sensor modalities and relatively small scale, and are mostly limited to perception-level tasks; KITTI, released in 2012, is the representative example. Second-generation datasets, by contrast, have more complex sensor modalities, greater scale and diversity, and tasks that extend beyond perception to prediction, planning, and control; nuScenes and Waymo, introduced around 2019, are the leading examples. Together with colleagues from academia and industry, this review presents the first systematic survey of more than 70 open-source autonomous driving datasets from China and abroad. It summarizes how to build high-quality datasets, the core role that data plays in the data-algorithm closed loop, and how generative foundation models can be used to produce data at scale. It further analyzes the characteristics and data scale that future third-generation autonomous driving datasets should have, together with the scientific and technical problems that remain to be solved. We hope these summaries and perspectives will support the construction of a new generation of autonomous driving datasets and ecosystems, and advance original innovation and technological self-reliance in key fields.

Keywords: autonomous driving; data pipeline; foundation model; dataset and challenge

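Hongyang Li, Yang Li, Huijie Wang, Jia Zeng, Huilin Xu, Pinlong Cai, Li Chen, Junchi Yan, Feng Xu, Lu Xiong, Jingdong Wang, Futang Zhu, Chunjing Xu, Tiancai Wang, Fei Xia, Beipeng Mu, Zhihui Peng, Dahua Lin, Yu Qiao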


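Shanghai AI Laboratory, Shanghai 200232

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240

School of Information Science and Engineering, Fudan University, Shanghai 200433

School of Automotive Studies, Tongji University, Shanghai 200092

Baidu, Beijing 100085

BYD, Shenzhen 518118

Huawei, Shenzhen 518129

MEGVII Technology, Beijing 100096

Meituan, Beijing 100102

AgiBot, Shanghai 201315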


Keywords (Chinese abstract): autonomous driving; data-algorithm closed loop; foundation model; dataset and challenge

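Funding:
Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0160104)
National Natural Science Foundation of China (NSFC) Young Scientists Fund (62206172)
NSFC Major Research Plan Key Project (92370201)
NSFC Excellent Young Scientists Fund (62222607)
Shanghai Rising-Star Program (22QA1412500)
China Postdoctoral Science Foundation (2023M741848)
Shanghai Sailing Program (23YF1462000)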

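Published: 2024

Journal: Science in China, Series F (中国科学F辑)
Sponsors: Chinese Academy of Sciences; National Natural Science Foundation of China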


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.438
ISSN:1674-5973
Year, volume (issue): 2024, 54(6)