Metro Platform Foreign Object Detection Based on Dual-Channel Transformer
Recently, Transformers have achieved more competitive results than Convolutional Neural Networks (CNNs) in foreign object detection owing to their global self-attention. However, they still face problems such as high computational cost, a fixed scale of input image patches, and limited interaction between local and global information. To address these challenges, a DualFormer model is proposed that incorporates a dual-channel Transformer backbone, pyramid lightweight Transformer blocks, and a channel cross-attention mechanism. The model aims to detect foreign objects in the gap between the metro platform screen doors and the train doors. A dual-channel strategy addresses the fixed input patch size issue by designing two feature extraction channels that extract features from input image patches of different scales, improving the network's ability to capture both coarse-grained and fine-grained features and enhancing the recognition accuracy of multiscale targets. To reduce the computational cost, a pyramid lightweight Transformer block is proposed, which introduces cascaded convolution into the Multi-Head Self-Attention (MHSA) module and leverages the dimensionality compression capability of convolution to decrease the model's computation. To address the limited interaction between local and global information, a channel cross-attention mechanism is proposed that allows coarse-grained and fine-grained features to interact at the channel level, optimizing the weight allocation of local and global information in the network. The results demonstrate that DualFormer achieves a mean average precision of 89.7% on the standardized metro anomaly detection dataset, with a detection speed of 24 frames/s and 1.98 × 10^7 model parameters, outperforming existing Transformer-based detection algorithms.
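To make the dual-channel strategy concrete, the following is a minimal PyTorch sketch of a two-branch patch embedding. The module name `DualPatchEmbed`, the patch sizes (4 and 8), and the embedding width are illustrative assumptions, not the paper's reported configuration:

```python
import torch
import torch.nn as nn

class DualPatchEmbed(nn.Module):
    """Embed the same image twice with different patch sizes: a fine-grained
    channel (small patches) and a coarse-grained channel (large patches).
    Patch sizes 4 and 8 and dim=64 are illustrative choices only."""
    def __init__(self, in_ch=3, dim=64, fine_patch=4, coarse_patch=8):
        super().__init__()
        self.fine = nn.Conv2d(in_ch, dim, kernel_size=fine_patch, stride=fine_patch)
        self.coarse = nn.Conv2d(in_ch, dim, kernel_size=coarse_patch, stride=coarse_patch)

    def forward(self, x):
        # Each branch yields (B, N, dim) tokens; N differs with patch size,
        # so the fine channel keeps more spatial detail per token.
        f = self.fine(x).flatten(2).transpose(1, 2)
        c = self.coarse(x).flatten(2).transpose(1, 2)
        return f, c

x = torch.randn(1, 3, 224, 224)
fine_tokens, coarse_tokens = DualPatchEmbed()(x)
print(fine_tokens.shape, coarse_tokens.shape)  # (1, 3136, 64) (1, 784, 64)
```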
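The cost reduction of the pyramid lightweight block can be sketched along the same lines: keys and values are spatially compressed by cascaded strided convolutions before attention, shrinking the attention matrix. The module name `LightweightMHSA`, the number of reduction stages, and all dimensions are assumptions in the spirit of the described block, not its exact design:

```python
import torch
import torch.nn as nn

class LightweightMHSA(nn.Module):
    """Multi-head self-attention whose keys/values are compressed by
    cascaded stride-2 convolutions before attention, cutting the quadratic
    cost. Two stride-2 convs (16x fewer K/V tokens) is an assumed setting."""
    def __init__(self, dim=64, heads=4, reductions=2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cascade of stride-2 convs halves the K/V feature map at each step.
        self.compress = nn.Sequential(*[
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1)
            for _ in range(reductions)
        ])

    def forward(self, x, hw):
        # x: (B, N, dim) tokens laid out on an hw = (H, W) grid.
        B, N, C = x.shape
        H, W = hw
        kv = x.transpose(1, 2).reshape(B, C, H, W)
        kv = self.compress(kv).flatten(2).transpose(1, 2)  # far fewer tokens
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out

tokens = torch.randn(1, 56 * 56, 64)
y = LightweightMHSA()(tokens, (56, 56))
print(y.shape)  # (1, 3136, 64)
```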
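Cross-attention between the two granularities at the channel level can be sketched as attention over the channel axis, assuming the two token sequences have first been aligned to the same length (e.g., by pooling); `ChannelCrossAttention` and its projection layout are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

class ChannelCrossAttention(nn.Module):
    """Cross-attention along the channel axis: queries come from one
    granularity's channels, keys/values from the other's, so coarse and
    fine features reweight each other channel by channel. A minimal
    sketch only; token counts are assumed pre-aligned."""
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, fine, coarse):
        # fine, coarse: (B, N, C) with token counts already aligned.
        q, k, v = self.q(fine), self.k(coarse), self.v(coarse)
        # Channel-level attention: a C x C map aggregated across tokens.
        attn = torch.softmax(q.transpose(1, 2) @ k * self.scale, dim=-1)
        out = (attn @ v.transpose(1, 2)).transpose(1, 2)  # back to (B, N, C)
        return fine + out  # residual keeps the fine-grained stream intact

fine = torch.randn(1, 784, 64)
coarse = torch.randn(1, 784, 64)
print(ChannelCrossAttention()(fine, coarse).shape)  # (1, 784, 64)
```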