A population diversity-based robust policy generation method in adversarial game environments
In adversarial game environments, the target agent aims to generate robust game policies that consistently achieve high returns against different opponent policies. Existing self-play-based policy generation methods often overfit to a specific opponent policy, resulting in low robustness and vulnerability to attacks from other opponent policies. In addition, existing methods that combine deep reinforcement learning with game theory to iteratively generate opponent policies converge slowly in complex adversarial scenarios with large decision spaces. To address these challenges, a population diversity-based robust policy generation method is proposed. In this method, each of the two adversaries maintains a policy population pool, and diversity within each population is maintained in order to generate a robust target policy. To ensure population diversity, policy diversity is measured from two perspectives: behavioral diversity and quality diversity. Behavioral diversity refers to the differences between the state-action trajectories of different policies, while quality diversity refers to the differences in the returns that different policies obtain when facing the same opponent. Finally, the robustness of the policies generated on the basis of population diversity is validated in typical adversarial environments with continuous state-action spaces.
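
The two diversity notions described above can be illustrated with a minimal Python sketch. The abstract does not give the actual metrics, so everything here is an assumption made for illustration: behavioral diversity is taken as the mean Euclidean distance between time-aligned state-action vectors, quality diversity as the mean absolute gap in returns against the same opponents, and the function names, the pairwise averaging, and the `weight` parameter are hypothetical rather than the authors' formulation.

```python
import math
from itertools import combinations


def behavioral_diversity(traj_a, traj_b):
    """Mean Euclidean distance between time-aligned state-action vectors.

    Each trajectory is a sequence of equal-length numeric tuples
    (state and action concatenated). Assumed metric, not the paper's.
    """
    horizon = min(len(traj_a), len(traj_b))
    if horizon == 0:
        return 0.0
    # zip stops at the shorter trajectory, matching `horizon`.
    total = sum(math.dist(sa, sb) for sa, sb in zip(traj_a, traj_b))
    return total / horizon


def quality_diversity(returns_a, returns_b):
    """Mean absolute gap in returns against the same opponents.

    returns_a[k] and returns_b[k] are the returns of two policies
    against opponent k. Assumed metric, not the paper's.
    """
    gaps = [abs(ra - rb) for ra, rb in zip(returns_a, returns_b)]
    return sum(gaps) / len(gaps) if gaps else 0.0


def population_diversity(trajectories, returns, weight=0.5):
    """Average pairwise mix of both diversity terms over a population.

    `weight` (hypothetical) trades off behavioral against quality diversity.
    """
    scores = []
    for i, j in combinations(range(len(trajectories)), 2):
        bd = behavioral_diversity(trajectories[i], trajectories[j])
        qd = quality_diversity(returns[i], returns[j])
        scores.append(weight * bd + (1.0 - weight) * qd)
    return sum(scores) / len(scores) if scores else 0.0


# Two toy policies: 2-D state plus 1-D action per step, two shared opponents.
trajectories = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
]
returns = [[1.0, 3.0], [2.0, 1.0]]
print(population_diversity(trajectories, returns))
```

In a population-based training loop, a score of this kind could serve as a regularizer or as a selection criterion when adding new policies to the pool; how the proposed method actually uses its diversity measures is not stated in the abstract.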