In complex traffic environments, autonomous vehicles must thoroughly analyze the motion direction, speed, and other information of surrounding traffic objects in order to predict their future trajectories accurately. To address this problem, a network model based on a spatio-temporal Transformer is proposed. The framework first employs a spatial self-attention mechanism to capture the interactions between vehicles at the same moment, precisely modeling the spatial relationships among multiple vehicles. A temporal self-attention mechanism is then used to extract the temporal dependencies between consecutive frames, generating a set of spatio-temporal features that reflect the dynamic behavior of the vehicles. These features are fed into a decoder to predict the vehicles' motion trajectories over the next 5 s. The proposed model was trained and validated on the publicly available NGSIM dataset. Compared with other state-of-the-art schemes, the proposed scheme demonstrates greater accuracy and precision in trajectory prediction over the subsequent 5 s, and its long-term forecasting accuracy is increased by 14.6% relative to the advanced schemes.
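The two attention stages described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes single-head scaled dot-product attention with random, untrained weights, and it omits positional encodings, residual connections, normalization, and the decoder. The names `self_attention`, `spatio_temporal_features`, `spatial_w`, and `temporal_w`, as well as the tensor sizes, are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x has shape (batch, tokens, dim); attention is computed among the
    tokens of each batch entry independently.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def spatio_temporal_features(tracks, spatial_w, temporal_w):
    """Apply spatial attention, then temporal attention.

    tracks has shape (T, N, D): T frames, N vehicles, D features.
    Returns an array of shape (N, T, D).
    """
    # Spatial stage: within each frame, every vehicle attends to all vehicles.
    spatial = self_attention(tracks, *spatial_w)        # (T, N, D)
    # Temporal stage: for each vehicle, every frame attends to all frames.
    per_vehicle = np.swapaxes(spatial, 0, 1)            # (N, T, D)
    return self_attention(per_vehicle, *temporal_w)     # (N, T, D)


# Assumed toy sizes: 15 observed frames, 4 vehicles, 8 features per vehicle.
rng = np.random.default_rng(0)
T, N, D = 15, 4, 8
tracks = rng.normal(size=(T, N, D))
spatial_w = [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]
temporal_w = [rng.normal(scale=D ** -0.5, size=(D, D)) for _ in range(3)]

features = spatio_temporal_features(tracks, spatial_w, temporal_w)
print(features.shape)  # (4, 15, 8)
```

In this sketch the frame axis serves as the batch dimension during the spatial stage and the vehicle axis serves as the batch dimension during the temporal stage, so the same attention routine covers both. A decoder would then map the resulting features to future positions.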