A lightweight dual-branch network incorporating equivariant cross-regularization for building change detection
Deep learning has recently driven rapid advances in building change detection from remote sensing images. Much of this progress can be attributed to capturing spatial details and contextual semantics simultaneously within single-branch architectures, which yields changed feature maps that are both fine-grained and semantically high-level. However, capturing spatial details requires convolutional layers with wide channels, while understanding contextual semantics requires a network of sufficient depth. Meeting both requirements in a single-branch architecture inevitably increases computational cost and model size. To address these challenges, this study proposes a lightweight dual-branch network architecture for efficient feature extraction and introduces an equivariant cross-regularization module for enhanced feature expression, with the aim of achieving effective building change detection. Specifically, the dual-branch network consists of a detail branch, a semantics branch, and a detail-semantics aggregation module. The detail branch adopts three simple convolution layers with wide channels and small receptive fields to maintain high-resolution spatial feature maps of changed buildings. In parallel, the semantics branch employs a fast down-sampling strategy based on a stem block, six gather-and-expansion blocks, and a context embedding block to efficiently capture high-level semantic feature maps of changed buildings. The detail-semantics aggregation module serves as a bridge, mitigating the gaps in spatial resolution and semantic level between the two types of feature maps to generate fine-grained and high-level semantic changed features. Additionally, the equivariant cross-regularization module constrains the changed features at both the semantic and spatial levels without increasing the number of network parameters, while enhancing the model's sensitivity to the scale and boundaries of changed buildings.
To evaluate the effectiveness of the proposed method, we compare it with numerous state-of-the-art lightweight and non-lightweight change detection networks on the WHU and LEVIR datasets. The results demonstrate that, with just 2.27 M parameters and 4.25 G floating-point operations, our approach attains intersection-over-union accuracies of 87.03% and 83.41% on the WHU and LEVIR datasets, respectively, surpassing both lightweight and non-lightweight change detection networks in comprehensive performance metrics. Furthermore, ablation experiments on the LEVIR dataset are conducted to analyze the effectiveness of the proposed dual-branch design and its modules. The results demonstrate that the dual-branch network incorporating the detail-semantics aggregation module effectively integrates the advantages of the detail branch and the semantics branch to generate fine-grained and high-level semantic changed features. In addition, integrating the equivariant cross-regularization module into the dual-branch network enhances the network's capacity to identify the scale and boundaries of changed buildings.
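For readers unfamiliar with the reported metric, the following is a minimal sketch of how intersection-over-union can be computed for a binary change mask. It is an illustration only and not the evaluation code used in this study: the function name `binary_iou`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
import numpy as np


def binary_iou(pred, target):
    """Intersection-over-union of two binary change masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    # Pixels marked as changed in both masks.
    intersection = np.logical_and(pred, target).sum()
    # Pixels marked as changed in at least one mask.
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return float(intersection / union)


# Toy 2x3 masks: 1 = changed building pixel, 0 = unchanged.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])

iou = binary_iou(pred, target)  # 2 shared pixels / 4 pixels in the union
print(f"IoU = {iou:.2f}")
```

In this toy case two pixels are marked as changed in both masks and four pixels are marked in at least one, giving an IoU of 0.5. Benchmark evaluations often accumulate the intersection and union counts over the whole test set before dividing; whether this study does so is not stated here.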