Semantic segmentation technology for extracting objects from high-resolution remote sensing images has important application prospects. With the rapid development of multi-sensor technology, the complementary advantages of multimodal remote sensing images have received widespread attention, and their joint analysis has become a research hotspot. This article analyzes both optical remote sensing images and elevation data, and proposes a multi-task collaborative model based on multimodal remote sensing data (United Refined PSPNet, UR-PSPNet) to address the problem that fully registered elevation data are often scarce in real scenarios, which limits the accuracy of fusion-based classification of the two data types. The model extracts deep features from optical images, jointly predicts semantic labels and elevation values, and embeds elevation data as supervisory information to improve the accuracy of target segmentation. A comparative experiment designed on the ISPRS benchmark demonstrates that the proposed algorithm better fuses multimodal data features and improves the accuracy of object segmentation in optical remote sensing images.
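To make the multi-task idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a shared encoder with a PSPNet-style pyramid pooling module feeds both a segmentation head and an elevation-regression head, so elevation labels act as auxiliary supervision during training while inference needs only the optical image. The lightweight stand-in encoder, all module names, and the loss weight `lam` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling over the shared feature map."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(in_ch, out_ch, 1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ) for b in bins
        )

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(s(x), (h, w), mode="bilinear",
                                align_corners=False) for s in self.stages]
        return torch.cat([x] + pooled, dim=1)  # channels double: 2 * in_ch

class MultiTaskSegNet(nn.Module):
    """Shared encoder + pyramid pooling; a segmentation head and an
    elevation-regression head branch off the same fused features."""
    def __init__(self, num_classes, in_ch=3, feat_ch=128):
        super().__init__()
        self.encoder = nn.Sequential(  # stand-in for a deep backbone
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(feat_ch), nn.ReLU(inplace=True),
        )
        self.ppm = PyramidPooling(feat_ch)
        self.seg_head = nn.Conv2d(feat_ch * 2, num_classes, 1)
        self.elev_head = nn.Conv2d(feat_ch * 2, 1, 1)  # per-pixel elevation

    def forward(self, x):
        h, w = x.shape[2:]
        feats = self.ppm(self.encoder(x))
        seg = F.interpolate(self.seg_head(feats), (h, w),
                            mode="bilinear", align_corners=False)
        elev = F.interpolate(self.elev_head(feats), (h, w),
                             mode="bilinear", align_corners=False)
        return seg, elev

def joint_loss(seg_logits, elev_pred, seg_labels, elev_gt, lam=0.5):
    """Joint objective: elevation is used only as training-time supervision;
    lam is an illustrative weight balancing the two tasks."""
    seg_loss = F.cross_entropy(seg_logits, seg_labels)
    elev_loss = F.l1_loss(elev_pred.squeeze(1), elev_gt)
    return seg_loss + lam * elev_loss
```

In this setup the elevation branch shapes the shared features during training, which is one plausible reading of "embedding elevation data as supervisory information" when fully registered elevation data are unavailable at test time.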