TradeBot: Bandit learning for hyper-parameters optimization of high frequency trading strategy
Quantitative trading uses mathematical functions to make stock or futures trading decisions automatically. Specifically, the various trading strategies proposed by human experts are associated with weight hyper-parameters that determine the probability of selecting a specific strategy according to market conditions. Prior work adjusts these weight hyper-parameters manually, which is error-prone and forfeits the essential advantage of quantitative trading, i.e., automation. In this paper, we propose TradeBot, a dynamic parameter tuning algorithm based on bandit learning for quantitative trading. We treat the sequential selection of hyper-parameters for trading rules as a bandit game, in which a set of hyper-parameters of a trading rule is considered an action. A novel reward-agnostic Upper Confidence Bound bandit method is proposed to solve the automatic trading problem, with a reward function estimated by inverse reinforcement learning. Experimental results on China Commodity Futures Market Data show state-of-the-art performance. To the best of our knowledge, this is one of the first works in the published literature to be deployed in an online trading system via reinforcement learning. (c) 2021 Published by Elsevier Ltd.
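To make the bandit framing concrete, the following is a minimal sketch of the textbook UCB1 rule applied to a small set of candidate hyper-parameter vectors, each treated as one action. It is not the paper's reward-agnostic method: the abstract does not give that algorithm's details, so standard UCB1 is swapped in here. The reward is a synthetic Bernoulli signal rather than one estimated by inverse reinforcement learning, and all names (`ARMS`, `toy_reward`, `run_bandit`, `ucb1_select`) and numeric values are hypothetical.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Return the index of the arm with the largest UCB1 score at round t."""
    # Play every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(arms, reward_fn, rounds, seed=0):
    """Run UCB1 for a fixed number of rounds; return pull counts and mean rewards."""
    rng = random.Random(seed)
    counts = [0] * len(arms)
    means = [0.0] * len(arms)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fn(arms[arm], rng)
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


# Hypothetical actions: each tuple is one setting of two strategy weights.
ARMS = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]


def toy_reward(weights, rng):
    """Synthetic Bernoulli reward (an assumption, not market data)."""
    p = 0.1 + 0.8 * weights[1]  # more weight on the second strategy pays more
    return 1.0 if rng.random() < p else 0.0


counts, means = run_bandit(ARMS, toy_reward, rounds=5000)
best = ARMS[max(range(len(ARMS)), key=counts.__getitem__)]
print("pull counts:", counts)
print("most-played weights:", best)
```

In this toy setting the exploration bonus shrinks as an arm is pulled more often, so play concentrates on the weight vector with the highest average reward while the others are still sampled occasionally. In a real system the reward signal and the candidate set would come from the trading environment.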