Unsupervised semantic segmentation of single-frame aerial images based on pretrained models

To address the high cost, limited generalizability, and low accuracy of semantic segmentation for aerial images, a two-stage unsupervised semantic segmentation net (TUSSNet) is proposed. It is trained on a single-frame aerial image and generates the final semantic segmentation result. The algorithm has two stages. First, the contrastive language-image pretraining (CLIP) model generates coarse-grained semantic labels for the aerial image, which are then used for warm-up training of the network. Second, building on the first stage, the segment anything model (SAM) predicts fine-grained categories for the aerial image and generates refined category-mask pseudo-labels; the network is then iteratively optimized to obtain the final semantic segmentation result. Experimental results show that, compared with existing unsupervised semantic segmentation methods, the algorithm significantly improves segmentation accuracy on aerial images while also providing accurate semantic information.
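The abstract does not specify how the SAM masks and the CLIP-derived coarse labels are combined into refined pseudo-labels. As a minimal sketch of one plausible reading, the Python below gives every pixel inside a mask the majority coarse label found in that mask. The function name `refine_pseudo_labels`, the majority-vote rule, and the toy inputs are all assumptions made for illustration, not the authors' method; in practice the label map and masks would come from CLIP and SAM rather than being written by hand.

```python
from collections import Counter


def refine_pseudo_labels(coarse, masks):
    """Assign each mask region the majority coarse label found inside it.

    coarse: 2D list of integer class ids (stand-in for CLIP-derived coarse labels).
    masks:  list of 2D boolean lists (stand-in for SAM masks), same shape as coarse.
    Majority voting is an illustrative assumption, not taken from the paper.
    """
    # Pixels not covered by any mask keep their coarse label.
    refined = [row[:] for row in coarse]
    for mask in masks:
        pixels = [
            (r, c)
            for r, row in enumerate(mask)
            for c, inside in enumerate(row)
            if inside
        ]
        if not pixels:
            continue  # skip empty masks
        # Vote using the original coarse labels, not the partially refined map.
        majority = Counter(coarse[r][c] for r, c in pixels).most_common(1)[0][0]
        for r, c in pixels:
            refined[r][c] = majority
    return refined


# Toy 3x4 "image": the coarse map has one noisy pixel at row 1, column 1.
coarse = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
masks = [
    [[True, True, False, False],
     [True, True, False, False],
     [True, True, True, False]],
    [[False, False, True, True],
     [False, False, True, True],
     [False, False, False, True]],
]
refined = refine_pseudo_labels(coarse, masks)
```

In this toy case the noisy pixel is overwritten by the label that dominates its mask, which illustrates how mask boundaries could sharpen a coarse label map before the iterative optimization stage.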

Keywords: pretrained model; aerial image; semantic segmentation; unsupervised algorithm; clustering performance estimation; deep learning

Ren Yuedong, You Xindong, Teng Shangzhi, Lü Xueqiang

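Beijing Key Laboratory of Internet Culture and Digital Dissemination, Beijing Information Science and Technology University, Beijing 100101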

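Funding: National Natural Science Foundation of China (62202061, 62171043); Beijing Natural Science Foundation (4232025); Beijing Municipal Education Commission Scientific Research Project, Science and Technology General Project (KM202311232002)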

2024

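Journal of Beijing Information Science and Technology University (Natural Science Edition)
Beijing Information Science and Technology University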

Impact factor: 0.363
ISSN: 1674-6864
Year, volume (issue): 2024, 39(2)