
Reinforcement Learning Power Grid Dispatching Method Considering the Pre-state of Intelligent Agent and Adaptive Mechanism of Environmental Features
The integration of a high proportion of renewable energy makes power flows difficult to predict and control, posing new challenges to the safe and stable operation of the power grid. Compared with traditional dispatch and control modes, intelligent dispatch methods represented by reinforcement learning can cope with sequential decision-making problems in partially observable grid environments, but they tend to adapt poorly when the proportion of renewable energy in the grid changes. To address this issue, the Actor-Critic architecture is taken as the basic framework, the pre-state is used to represent the state of the agent, and an adaptive mechanism for environmental features is introduced; the resulting method is applied to power grid dispatch tasks in scenarios where the proportion of renewable energy changes. Because the grid state after a dispatch action is affected by exogenous random events such as source-load fluctuations, a state-space explosion can easily arise; representing the agent state by the pre-state taken before the power flow calculation effectively reduces the state space. The introduced adaptive mechanism for environmental features effectively avoids the "decision forgetting" problem, thereby improving the agent's adaptability to changes in the proportion of renewable energy in the grid. Simulation results show that the method performs well in terms of convergence speed and control stability on a 118-node power grid dispatch task with a dynamically changing proportion of renewable energy.
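The abstract gives only a high-level description of the method. As a rough, hypothetical sketch of the idea — an Actor-Critic network whose input is the pre-dispatch state (the observation taken before the power flow calculation), conditioned on an environment feature such as the current renewable-energy share — the following Python/PyTorch snippet illustrates one possible structure. All names and the FiLM-style conditioning layer are assumptions made for illustration, not the authors' implementation.

# Minimal, illustrative sketch only -- NOT the paper's implementation.
# Assumptions: the "pre-state" is the grid observation taken before the power
# flow calculation, and the "environment feature" is a scalar such as the
# current renewable-energy share; all names here are hypothetical.
import torch
import torch.nn as nn

class PreStateActorCritic(nn.Module):
    def __init__(self, pre_state_dim: int, env_feat_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        # Shared encoder over the pre-state (observation before power flow).
        self.encoder = nn.Sequential(
            nn.Linear(pre_state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Adaptive conditioning on the environment feature: a FiLM-style layer
        # produces a per-unit scale and shift so the same policy can adjust
        # when the renewable-energy proportion changes.
        self.film = nn.Linear(env_feat_dim, 2 * hidden)
        self.actor = nn.Linear(hidden, action_dim)   # policy logits
        self.critic = nn.Linear(hidden, 1)           # state value

    def forward(self, pre_state: torch.Tensor, env_feat: torch.Tensor):
        h = self.encoder(pre_state)
        scale, shift = self.film(env_feat).chunk(2, dim=-1)
        h = h * (1 + scale) + shift                  # feature-wise adaptation
        return self.actor(h), self.critic(h)

if __name__ == "__main__":
    # Usage example with dummy dimensions.
    net = PreStateActorCritic(pre_state_dim=64, env_feat_dim=1, action_dim=10)
    pre_state = torch.randn(4, 64)                   # batch of pre-dispatch states
    env_feat = torch.rand(4, 1)                      # renewable-energy share in [0, 1]
    logits, value = net(pre_state, env_feat)
    action = torch.distributions.Categorical(logits=logits).sample()
    print(action.shape, value.shape)

In this sketch the environment feature modulates the shared representation, so a single set of policy parameters can adapt to different renewable-energy proportions; this is one plausible way to realize the adaptive mechanism described in the abstract, used here purely for illustration.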

reinforcement learning; A3C algorithm; pre-state; adaptive mechanism; power grid scheduling

Yang Yanhong, Lu Xin, Zhang Leijie, Zhou Shiwei, Pei Wei, Zhu Dandan


Institute of Electrical Engineering, Chinese Academy of Sciences, Beijing 100190, China

College of Artificial Intelligence, China University of Petroleum (Beijing), Beijing 102249, China


National Natural Science Foundation of China (52277131, U2066211); Youth Innovation Promotion Association of the Chinese Academy of Sciences (2021136)

2024

High Voltage Engineering
China Electric Power Research Institute; Chinese Society for Electrical Engineering


CSTPCD; Peking University Core Journals
Impact factor: 2.32
ISSN:1003-6520
Year, volume (issue): 2024, 50(8)