Mask encoding: A general instance mask representation for object segmentation


Full-text sources:

NSTL
Elsevier

Abstract: Instance segmentation is one of the most challenging tasks in computer vision, as it requires separating each object instance at the pixel level. To date, a low-resolution binary mask has been the dominant representation for instance masks; for example, the predicted mask in Mask R-CNN is usually 28 × 28. A low-resolution mask generally cannot capture object details well, while a high-resolution mask dramatically increases training complexity. In this work, we propose a flexible and effective approach that encodes a high-resolution structured mask into a compact representation, combining the advantages of high quality and low complexity. The proposed mask representation can be easily integrated into two-stage pipelines such as Mask R-CNN, improving mask AP by 0.9% on the COCO dataset, 1.4% on the LVIS dataset, and 2.1% on the Cityscapes dataset. Moreover, a novel single-shot instance segmentation framework can be constructed by extending an existing one-stage detector with a mask branch for this representation. Our model outperforms explicit contour-based pipelines in accuracy at similar computational complexity. We also evaluate our method on video instance segmentation, achieving promising results on the YouTube-VIS dataset. Code is available at: https://git.io/AdelaiDet

© 2021 Elsevier Ltd. All rights reserved.
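The abstract describes encoding a high-resolution binary mask into a compact vector that a network can regress, then decoding it back to pixels, but this record does not say which encoder the authors use. The sketch below assumes a linear, PCA-style encoder learned from training masks; it illustrates the general encode/decode idea, not the paper's exact method. All function names (`fit_mask_encoder`, `encode_mask`, `decode_mask`, `random_box_masks`), the code length `dim=60`, and the toy box-shaped masks are hypothetical choices for illustration.

```python
import numpy as np


def fit_mask_encoder(masks: np.ndarray, dim: int):
    """Fit a linear (PCA) mask encoder from N binary masks of shape (N, S, S).

    Returns the mean mask vector and a (k, S*S) basis with orthonormal rows,
    where k = min(dim, N, S*S). Assumption: PCA stands in for the paper's encoder.
    """
    n = masks.shape[0]
    x = masks.reshape(n, -1).astype(np.float64)  # flatten each S x S mask to a vector
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:dim]


def encode_mask(mask: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project one S x S binary mask onto the basis, giving a compact code vector."""
    return basis @ (mask.reshape(-1).astype(np.float64) - mean)


def decode_mask(code: np.ndarray, mean: np.ndarray, basis: np.ndarray, size: int) -> np.ndarray:
    """Map a code vector back to pixel space and binarize at 0.5."""
    recon = code @ basis + mean
    return (recon.reshape(size, size) >= 0.5).astype(np.uint8)


def random_box_masks(n: int, size: int, seed: int = 0) -> np.ndarray:
    """Toy data (hypothetical): n binary masks, each containing one random box."""
    rng = np.random.default_rng(seed)
    masks = np.zeros((n, size, size), dtype=np.uint8)
    for i in range(n):
        y0, x0 = rng.integers(0, size // 2, size=2)
        y1, x1 = rng.integers(size // 2 + 1, size + 1, size=2)
        masks[i, y0:y1, x0:x1] = 1
    return masks


if __name__ == "__main__":
    masks = random_box_masks(n=200, size=28)
    mean, basis = fit_mask_encoder(masks, dim=60)
    code = encode_mask(masks[0], mean, basis)
    recon = decode_mask(code, mean, basis, size=28)
    inter = np.logical_and(recon, masks[0]).sum()
    union = np.logical_or(recon, masks[0]).sum()
    print(f"{masks[0].size} pixels -> {code.size} coefficients, IoU = {inter / union:.3f}")
```

In this framing, a detector's mask branch would regress the short code vector (60 numbers here instead of 784 pixels), and `decode_mask` recovers the pixel mask at inference time. A linear basis keeps decoding to a single matrix product; whether the paper uses this or a different encoder should be checked against the full text.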

Keywords:

Mask encoding; Instance segmentation; Video instance segmentation

Authors:

Zhang, Rufeng; Kong, Tao; Wang, Xinlong; You, Mingyu

Affiliations:

Tongji University

ByteDance AI Lab

University of Adelaide

Publication year:

2022

DOI:

10.1016/j.patcog.2021.108505

Journal: Pattern Recognition

Indexed in: EI, SCI

ISSN: 0031-3203

Year, volume: 2022, vol. 124

Times cited: 2
References: 37