SCIENCE CHINA Information Sciences, 2024, Vol. 67, Issue 5: 65-79. DOI: 10.1007/s11432-021-3688-y

Understanding adversarial attacks on observations in deep reinforcement learning

You QIAOBEN 1, Chengyang YING 1, Xinning ZHOU 1, Hang SU 2, Jun ZHU 2, Bo ZHANG 1

Author information

  • 1. Department of Computer Science and Technology, Beijing National Research Center for Information Science and Technology, Tsinghua-Bosch Joint Center for Machine Learning, Institute for Artificial Intelligence, Tsinghua University, Beijing 100084, China
  • 2. Department of Computer Science and Technology, Beijing National Research Center for Information Science and Technology, Tsinghua-Bosch Joint Center for Machine Learning, Institute for Artificial Intelligence, Tsinghua University, Beijing 100084, China; Peng Cheng Laboratory, Shenzhen 518055, China

Abstract

Deep reinforcement learning models are vulnerable to adversarial attacks that can decrease the cumulative expected reward of a victim by manipulating its observations. Despite the efficiency of previous optimization-based methods for generating adversarial noise in supervised learning, such methods might not achieve the lowest cumulative reward since they generally do not explore the environmental dynamics. Herein, a framework is provided to better understand the existing methods by reformulating the problem of adversarial attacks on reinforcement learning in the function space. The reformulation generates an optimal adversary in the function space of targeted attacks via a generic two-stage framework. In the first stage, a deceptive policy is trained by hacking the environment and discovering a set of trajectories routing to the lowest reward or the worst-case performance. Next, the adversary misleads the victim into imitating the deceptive policy by perturbing the observations. It is theoretically shown that, compared to existing approaches, this adversary is strong under an appropriate noise level. Extensive experiments demonstrate the superiority of the proposed method in terms of efficiency and effectiveness, achieving state-of-the-art performance in both Atari and MuJoCo environments.
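
The two-stage attack described in the abstract can be made concrete. Below is a minimal, hypothetical Python/PyTorch sketch, not the authors' implementation: the names `NegatedRewardEnv` and `perturb_observation` and all parameter values are illustrative, the environment is assumed to follow the Gymnasium reset/step API, both policies are assumed to be networks mapping batched observations to discrete-action logits, and projected gradient descent stands in for whichever optimizer the paper actually uses.

```python
# Hedged sketch of the two-stage attack idea; all names and defaults are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F


class NegatedRewardEnv:
    """Stage one: wrap a Gym-style environment and flip the reward sign, so a
    standard RL trainer run on the wrapper ends up discovering trajectories
    that route the victim toward the lowest reward (the deceptive policy)."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, -reward, terminated, truncated, info  # minimize victim reward


def perturb_observation(victim, deceiver, obs, eps=0.01, steps=10, lr=0.003):
    """Stage two: search, within an l_inf ball of radius eps, for a perturbed
    observation on which the victim picks the action the deceptive policy
    would take on the clean observation (a targeted attack, here via PGD)."""
    with torch.no_grad():
        target_action = deceiver(obs).argmax(dim=-1)  # action the deceiver wants

    delta = torch.zeros_like(obs, requires_grad=True)
    for _ in range(steps):
        # Cross-entropy toward the deceptive action on the perturbed input.
        loss = F.cross_entropy(victim(obs + delta), target_action)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # signed gradient step toward target
            delta.clamp_(-eps, eps)          # project back into the eps-ball
        delta.grad.zero_()
    return (obs + delta).detach()
```

At deployment, the adversary would run `perturb_observation` on every observation before the victim sees it, so a victim acting greedily on the perturbed inputs effectively imitates the deceptive policy learned in stage one.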

Key words

deep learning, reinforcement learning, adversarial robustness, adversarial attack


Funding

National Key Research and Development Program of China (2020AAA0104304, 2017YFA0700904)

National Natural Science Foundation of China (61620106010, 62061136001, 61621136008, 62076147, U19B2034, U1811461, U19A2081)

Beijing NSF Project (JQ19016)

Beijing Academy of Artificial Intelligence (BAAI)

Tsinghua-Huawei Joint Research Program

Tsinghua Institute for Guo Qiang

Tsinghua-OPPO Joint Research Center for Future Terminal Technology

Tsinghua-China Mobile Communications Group Co., Ltd. Joint Institute

Publication year

2024
SCIENCE CHINA Information Sciences, Chinese Academy of Sciences
ISSN: 1674-733X
Indexed in: CSTPCD, EI
Impact factor: 0.715
References: 33