At present, the main problems facing deep-learning-based algorithms for recognizing road diseases in real images are severe class imbalance, caused by the disproportion between the complex road background and the disease foreground, and the small scale of the diseases. Moreover, the low contrast between pavement diseases and the geometric structure of the road makes them difficult to recognize. To address these issues, we propose a two-branch network with a semantic prior that guides the Transformer backbone feature network in mining the complex relationship between the background and foreground of pavement diseases. It uses a high-efficiency self-attention mechanism and cross-covariance image transformers (XCiT) to extract semantic features from the two-dimensional space and the feature channels, respectively, and a semantic locally-enhanced feed-forward (SLeff) module to improve local feature aggregation. We also propose a new sparse subject sampling point stream module, which is combined with the traditional FPN structure to further alleviate the class imbalance problem of pavement diseases. Finally, we construct a road disease segmentation dataset from real scenes and compare the proposed model with multiple baseline models on this dataset and on publicly available data. The experimental results demonstrate the effectiveness of the proposed model.
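To illustrate the distinction between attending over the two-dimensional space and over the feature channels, the following is a minimal single-head NumPy sketch, not the implementation used in this work. Plain scaled dot-product attention stands in for the high-efficiency self-attention variant, whose details are not given here, and the channel branch follows the general form of cross-covariance attention from XCiT. The function names, the projection matrices `wq`, `wk`, `wv`, the temperature `tau`, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(x, wq, wk, wv):
    """Token (spatial) attention: the attention map is N x N.

    x has shape (N, d): N flattened spatial positions, d channels.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, N)
    return attn @ v, attn


def cross_covariance_attention(x, wq, wk, wv, tau=1.0):
    """Channel attention in the style of XCiT: the attention map is d x d.

    Queries and keys are L2-normalized per channel over the N tokens,
    so the map depends on channel statistics rather than token pairs.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-6)
    attn = softmax(q.T @ k / tau, axis=-1)  # (d, d), rows sum to 1
    return v @ attn.T, attn                 # output stays (N, d)


# Toy input: 16 spatial positions (e.g. a 4 x 4 feature map), 8 channels.
rng = np.random.default_rng(0)
N, d = 16, 8
x = rng.standard_normal((N, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))

out_sp, attn_sp = spatial_self_attention(x, wq, wk, wv)
out_ch, attn_ch = cross_covariance_attention(x, wq, wk, wv)
print(attn_sp.shape)  # (16, 16)
print(attn_ch.shape)  # (8, 8)
```

In this sketch the spatial map grows quadratically with the number of positions, whereas the channel map has a fixed size set by the feature dimension, which is one reason the two branches can be seen as complementary.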