Online opponent exploitation method based on particle swarm optimization for Texas Hold'em
In Texas Hold'em, opponent exploitation is more effective than Nash equilibrium approximation at extracting income from opponents with exploitable weaknesses. However, effectively exploiting a previously unseen opponent during online play remains a challenge. Existing methods typically sidestep this problem through offline training and online adaptation: learning or evolutionary methods are applied in massive offline training to obtain a model that can adapt to different opponents during play, rather than actively optimizing a policy against the new opponent online. To achieve effective opponent exploitation through online active policy optimization, a policy optimization method based on particle swarm optimization (PSO) is proposed to maximize competition income, introducing online optimization into Texas Hold'em, a game with strong randomness. Standard PSO faces two problems in this setting: fitness computation is distorted by random luck, and targeted policies against some opponents are hard to optimize. To address them, a modified PSO method called BR-PSO (best replacement PSO) is proposed, based on replacement of the local optimal solution and the global optimal solution. Experimental results indicate that the proposed method finds targeted policies that maximize exploitation of opponents which standard PSO fails to counter, and that the income of the optimized policy is comparable to that of an AI based on the hand prediction method.
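The abstract does not give the details of BR-PSO, but the core idea it names, re-examining and replacing the stored personal and global bests so that a single lucky fitness draw cannot lock in a poor solution, can be sketched as follows. This is a minimal illustration under assumed design choices (averaging `n_eval` noisy evaluations, re-scoring stored bests each iteration); the function names, parameters, and update rules are hypothetical, not the paper's actual algorithm.

```python
import random

def br_pso(fitness, dim, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5,
           n_eval=5, seed=0):
    """PSO sketch with best replacement for noisy fitness (maximization).

    `fitness(x)` returns a noisy scalar reward (e.g. income against an
    opponent over a few hands). Stored bests are re-evaluated each
    iteration so a luck-inflated score can be overturned.
    """
    rng = random.Random(seed)

    def avg_fit(x):
        # Average several noisy evaluations to damp random luck.
        return sum(fitness(x) for _ in range(n_eval)) / n_eval

    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [avg_fit(p) for p in pbest]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            # Best replacement: re-score the stored personal best so a
            # stale, luck-inflated value does not block a real improvement.
            pbest_f[i] = avg_fit(pbest[i])
            f = avg_fit(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
        # Re-score the stored global best for the same reason.
        gbest_f = avg_fit(gbest)
        g = max(range(n_particles), key=lambda i: pbest_f[i])
        if pbest_f[g] > gbest_f:
            gbest, gbest_f = pbest[g][:], pbest_f[g]

    return gbest, gbest_f
```

In a poker setting, `x` would encode policy parameters and `fitness` would play a batch of hands against the current opponent; here any noisy objective demonstrates the mechanism.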