TradeBot: Bandit learning for hyper-parameters optimization of high frequency trading strategy

Original article links:

NSTL
Elsevier

Abstract: Quantitative trading uses mathematical functions to make stock or futures trading decisions automatically. Specifically, the various trading strategies proposed by human experts are associated with weight hyper-parameters that determine the probability of selecting a specific strategy according to market conditions. Prior work adjusts these weight hyper-parameters manually, which is error-prone and gives up the essential advantage of quantitative trading, namely automation. In this paper, we propose TradeBot, a dynamic parameter tuning algorithm for quantitative trading based on bandit learning. We treat the sequential selection of trading-rule hyper-parameters as a bandit game, in which each set of hyper-parameters of a trading rule is an action. We propose a novel reward-agnostic Upper Confidence Bound bandit method to solve the automatic trading problem, with a reward function estimated by inverse reinforcement learning. Experimental results on China Commodity Futures Market Data show state-of-the-art performance. To the best of our knowledge, this is one of the first works in the published literature to be deployed in an online trading system via reinforcement learning. © 2021 Published by Elsevier Ltd.
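As a rough illustration of the bandit framing described in the abstract, the sketch below treats each candidate hyper-parameter set as an arm and selects among them with the textbook UCB1 rule. It is not the paper's method: the reward-agnostic UCB variant and the reward function estimated by inverse reinforcement learning are not specified in this record, so the sketch assumes an observable scalar reward. The names `ucb1_select`, `run_bandit`, and `toy_reward`, the three example weight vectors, and the exploration constant `c` are all hypothetical.

```python
import math
import random


def ucb1_select(counts, means, t, c=2.0):
    """Return the index of the arm with the highest UCB1 score at round t."""
    # Play every arm once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(arms, reward_fn, rounds, seed=0):
    """Run UCB1 over `arms`; reward_fn(arm, rng) must return a float reward."""
    rng = random.Random(seed)
    counts = [0] * len(arms)   # number of times each arm was played
    means = [0.0] * len(arms)  # running mean reward of each arm
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fn(arms[arm], rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def toy_reward(weights, rng):
    """Hypothetical stand-in reward: first weight plus Gaussian noise.

    A real system would use realized or estimated trading reward instead.
    """
    return weights[0] + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    # Hypothetical hyper-parameter sets: weights over two trading strategies.
    candidate_sets = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
    counts, means = run_bandit(candidate_sets, toy_reward, rounds=2000)
    print(counts, [round(m, 3) for m in means])
```

Under these assumptions, the arm with the highest average reward is played increasingly often as its confidence interval tightens, while the other arms are still sampled occasionally.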

Keywords:

High-frequency trading; Hyper-parameter optimization; Multi-armed bandit learning; Inverse reinforcement learning; EVOLUTIONARY ALGORITHMS; FEATURES; SYSTEM

Authors:

Zhang, Weipeng; Wang, Lu; Xie, Liang; Feng, Ke; Liu, Xiang

Author affiliations:

Shanghai Jiao Tong Univ

East China Normal Univ

Tencent Co Ltd

Dongguan Univ Technol

Publication year:

2022

DOI:

10.1016/j.patcog.2021.108490

Pattern Recognition

Indexed in: EI, SCI

ISSN: 0031-3203

Year, volume (issue): 2022, 124

Citations: 3
References: 82