Journal on Communications (通信学报), 2024, Vol. 45, Issue 12: 16-27. DOI: 10.11959/j.issn.1000-436x.2024264

Stealthy data poisoning attack method on offline reinforcement learning in unmanned systems

周雪¹, 苘大鹏¹, 许晨¹, 吕继光¹, 曾凡一¹, 高朝阳¹, 杨武¹

Author information

  • 1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150000

Abstract

To address the limited effectiveness and stealth of existing data poisoning attacks on offline reinforcement learning (RL), a critical time-step dynamic poisoning attack is proposed, which dynamically perturbs high-importance samples to achieve an efficient and covert attack. Specifically, theoretical analysis shows that the temporal difference (TD) error has an important influence on the model's learning process, so it is used as the criterion for selecting poisoning targets. A poisoning method based on bi-objective optimization is then introduced: it minimizes the perturbation magnitude while maximizing the negative impact of the attack on model performance, generating an optimal perturbation magnitude for each poisoned sample. Experimental results across multiple tasks and algorithms show that, with a poisoning rate of only 1% of the overall data, the proposed attack reduces the agent's average performance by 84%, revealing the sensitivity and vulnerability of offline RL models in unmanned systems.
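The target-selection idea in the abstract, ranking samples by temporal difference error and poisoning only a small fraction, can be illustrated with a minimal sketch. This is not the paper's implementation: the tabular Q-function, the synthetic dataset, the use of the absolute one-step TD error as the importance score, and the names `td_errors` and `select_poison_indices` are all assumptions made for illustration. The bi-objective perturbation step is not shown, since the abstract gives no details of it.

```python
import numpy as np

def td_errors(q, states, actions, rewards, next_states, dones, gamma=0.99):
    """One-step TD error of each transition under a tabular Q-function (assumed form)."""
    # Bootstrapped target r + gamma * max_a' Q(s', a'); no bootstrap at terminal steps.
    target = rewards + gamma * (1.0 - dones) * q[next_states].max(axis=1)
    return target - q[states, actions]

def select_poison_indices(errors, poison_rate=0.01):
    """Indices of the transitions with the largest |TD error| (hypothetical selection rule)."""
    k = max(1, int(round(poison_rate * len(errors))))
    # argpartition places the k largest magnitudes first without a full sort.
    return np.argpartition(-np.abs(errors), k - 1)[:k]

# Synthetic offline dataset; every value here is made up for the example.
rng = np.random.default_rng(0)
n_states, n_actions, n = 10, 4, 1000
q = rng.normal(size=(n_states, n_actions))
states = rng.integers(n_states, size=n)
actions = rng.integers(n_actions, size=n)
rewards = rng.normal(size=n)
next_states = rng.integers(n_states, size=n)
dones = (rng.random(n) < 0.05).astype(float)

errors = td_errors(q, states, actions, rewards, next_states, dones)
idx = select_poison_indices(errors, poison_rate=0.01)
print(f"selected {len(idx)} of {n} transitions as poisoning candidates")
```

With a 1% rate and 1000 transitions, the sketch marks the 10 transitions whose TD error has the largest magnitude. In a deep offline RL setting the Q-values would presumably come from a trained critic rather than a table.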

Key words

unmanned system / offline reinforcement learning / data poisoning attack / data security

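Publication year: 2024
Journal: Journal on Communications (通信学报)
Publisher: China Institute of Communications (中国通信学会)
Indexed in: CSTPCD, CSCD, 北大核心 (Peking University Core Journals)
Impact factor: 1.265
ISSN: 1000-436X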