A Novel Online Portfolio Management Strategy Based on Dynamic Multi-step Loss Aversion Reward
The design of the reward function is crucial to the performance of deep reinforcement learning algorithms. For the portfolio management task, this study identifies and addresses two major flaws in existing reward functions: first, they overemphasize short-term market fluctuations and neglect long-term trends; second, they reward gains and penalize losses symmetrically, which is inconsistent with investors' loss aversion. Drawing on loss aversion theory from behavioral finance, this paper proposes a multi-step loss aversion (MSLA) reward function that more accurately captures investors' behavior in trading, and builds an online portfolio management strategy on it. Three representative indices from the A-share market are selected to construct corresponding portfolios, and backtesting experiments are conducted on historical data from 2019 to 2023. The experimental results show that the MSLA reward function significantly improves the overall performance of the strategy, which generally outperforms existing algorithms in terms of cumulative return, Sharpe ratio, and maximum drawdown. Furthermore, the strategy is applicable to portfolios composed of stocks with different market capitalizations and maintains robust performance in rising, falling, and sideways market conditions, which demonstrates its effectiveness and practicality in portfolio management.
deep reinforcement learning; portfolio management; loss aversion theory; MSLA reward function
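
The abstract does not state the MSLA formula, so the following is only a minimal sketch of the two ideas it names: evaluating an action over several future steps rather than one, and penalizing losses more heavily than equal-sized gains. The function name, the use of a log return, the `horizon` parameter, and the loss-aversion coefficient of 2.25 (the estimate commonly cited from prospect theory) are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def multi_step_loss_averse_reward(
    portfolio_values: Sequence[float],
    t: int,
    horizon: int = 5,
    loss_aversion: float = 2.25,
) -> float:
    """Illustrative reward: loss-averse log return over a multi-step window.

    Assumed form (not the paper's definition): the log return of the
    portfolio value from step t to step t + horizon, where a negative
    return is multiplied by `loss_aversion` (> 1) so that a loss is
    penalized more than an equal-sized gain is rewarded.
    """
    # Clip the window at the end of the available history.
    end = min(t + horizon, len(portfolio_values) - 1)
    if end <= t:
        return 0.0  # no future step available to evaluate

    log_return = math.log(portfolio_values[end] / portfolio_values[t])

    # Asymmetric treatment: gains pass through, losses are amplified.
    if log_return >= 0:
        return log_return
    return loss_aversion * log_return


if __name__ == "__main__":
    values = [100.0, 102.0, 101.0, 105.0, 110.0, 108.0]
    print(multi_step_loss_averse_reward(values, t=0, horizon=3))
    print(multi_step_loss_averse_reward([110.0, 100.0], t=0, horizon=1))
```

Because the window looks ahead `horizon` steps, a reward of this kind can only be computed on historical data during training; at trading time the learned policy acts without it.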