Bi-aggregation and self-merging network for few-shot image semantic segmentation
Few-shot image semantic segmentation is a challenging task that aims to segment objects of novel classes using only a few labeled samples. Mainstream methods often suffer from weakly discriminative features and prototype bias. To alleviate these problems, a new few-shot image semantic segmentation method based on a bi-aggregation and self-merging network is proposed, which fully exploits the similarity between features and reduces prototype bias. First, we propose a feature-mask bi-aggregation module that provides global semantic information for both feature aggregation and mask aggregation by constructing dense similarity relations between the support features and the query features over all spatial locations. Specifically, by performing feature and mask bi-aggregation on the similarity matrices, an enhanced feature and an initial mask carrying guidance information are obtained for the query image. Then, a self-merging decoder is proposed, which reduces prototype bias by merging a self-prototype derived from the initial mask with the known support prototypes, and conveys rich category semantic information to the decoder by fusing the merged prototype with the enhanced feature. Finally, the predictions produced by the decoder are further refined using the predictions for the base classes. Our method achieves mIoU values of 68.3% and 71.5% on the PASCAL-5i dataset in the 1-shot and 5-shot settings, respectively, and 46.5% and 51.4% on the COCO-20i dataset in the 1-shot and 5-shot settings, respectively, outperforming mainstream methods and segmenting the target regions of novel classes more accurately.
few-shot semantic segmentation; similarity of features; bi-aggregation; intra-class diversity; self-merging
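To make the abstract's two core steps concrete, the following is a minimal PyTorch sketch of (1) feature-mask bi-aggregation through a dense query-support similarity matrix and (2) self-merging of a mask-based self-prototype with the support prototype. The function names, the additive feature fusion, the softmax normalization, and the fixed merging weight `alpha` are illustrative assumptions under this sketch, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def bi_aggregation(query_feat, support_feat, support_mask):
    """Feature-mask bi-aggregation via a dense similarity matrix (sketch).

    query_feat:   (B, C, Hq, Wq) query feature map
    support_feat: (B, C, Hs, Ws) support feature map
    support_mask: (B, 1, Hs, Ws) binary support mask
    Returns an enhanced query feature and an initial query mask.
    """
    B, C, Hq, Wq = query_feat.shape

    q = F.normalize(query_feat.flatten(2), dim=1)        # (B, C, Nq)
    s = F.normalize(support_feat.flatten(2), dim=1)      # (B, C, Ns)
    sim = torch.bmm(q.transpose(1, 2), s)                # (B, Nq, Ns) dense cosine similarity
    attn = sim.softmax(dim=-1)

    # Feature aggregation: pull support features into every query location.
    agg = torch.bmm(attn, support_feat.flatten(2).transpose(1, 2))      # (B, Nq, C)
    enhanced_feat = query_feat + agg.transpose(1, 2).view(B, C, Hq, Wq)

    # Mask aggregation: transfer the support mask through the same similarities.
    m = support_mask.flatten(2).transpose(1, 2)                          # (B, Ns, 1)
    init_mask = torch.bmm(attn, m).transpose(1, 2).view(B, 1, Hq, Wq)

    return enhanced_feat, init_mask

def self_merging_prototype(query_feat, init_mask, support_proto, alpha=0.5):
    """Merge a self-prototype (masked average of query features) with the
    support prototype to reduce prototype bias (sketch)."""
    w = init_mask.clamp(0, 1)
    self_proto = (query_feat * w).sum(dim=(2, 3)) / (w.sum(dim=(2, 3)) + 1e-6)  # (B, C)
    return alpha * self_proto + (1 - alpha) * support_proto
```

In this reading, the dense similarity matrix drives both branches: the same attention weights that enrich the query features also project the support mask onto the query, yielding the initial mask from which the self-prototype is computed.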