To address the limited receptive field and insufficient feature interaction in vision-language tracking, this paper proposes BPSVTrack, a framework combining Bi-level Routing Perception and Scattering Visual Transformation. First, a Bi-level Routing Perception Module (BRPM) is designed, which combines Efficient Additive Attention (EAA) and a Dual Dynamic Adaptive Module (DDAM) in parallel to enable bidirectional interaction and expand the receptive field. This allows the model to integrate features across different windows and sizes efficiently, improving its ability to perceive objects in complex scenes. Second, a Scattering Vision Transform Module (SVTM) based on the Dual-Tree Complex Wavelet Transform (DTCWT) is introduced to decompose the image into low-frequency and high-frequency information, capturing both the target structure and fine-grained details and thus improving the robustness and accuracy of the model in complex environments. The proposed framework achieves accuracies of 86.1%, 64.4%, and 63.2% on the OTB99, LaSOT, and TNL2K tracking datasets, respectively. Moreover, it attains an accuracy of 70.21% on the RefCOCOg dataset. In both tracking and localization, its performance surpasses that of the baseline model.
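
The low-frequency/high-frequency split mentioned above can be illustrated with a minimal sketch. It is not the SVTM itself: a single-level Haar transform is used here as a much simpler stand-in for the DTCWT, and the function names `haar_split` and `haar_merge` are hypothetical, chosen only for this illustration. The low band keeps the coarse structure of the image, while the three high bands keep the local differences (fine detail).

```python
import numpy as np


def haar_split(image):
    """Single-level 2D Haar split (a simple stand-in for DTCWT, by assumption).

    Returns one low-frequency band and a tuple of three high-frequency bands,
    each with half the height and width of the input.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")

    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    low = (a + b + c + d) / 2.0     # scaled block sum: coarse structure
    high_lr = (a - b + c - d) / 2.0  # left-minus-right differences
    high_tb = (a + b - c - d) / 2.0  # top-minus-bottom differences
    high_dg = (a - b - c + d) / 2.0  # diagonal differences
    return low, (high_lr, high_tb, high_dg)


def haar_merge(low, high):
    """Invert haar_split, recovering the original image."""
    high_lr, high_tb, high_dg = high
    out = np.empty((low.shape[0] * 2, low.shape[1] * 2), dtype=np.float64)
    out[0::2, 0::2] = (low + high_lr + high_tb + high_dg) / 2.0
    out[0::2, 1::2] = (low - high_lr + high_tb - high_dg) / 2.0
    out[1::2, 0::2] = (low + high_lr - high_tb - high_dg) / 2.0
    out[1::2, 1::2] = (low - high_lr - high_tb + high_dg) / 2.0
    return out
```

Calling `haar_split` on an image and then `haar_merge` on the result recovers the input, so no information is lost by the split. The DTCWT used in the SVTM additionally provides directional selectivity and approximate shift invariance, which this Haar sketch does not.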