Existing open-vocabulary object detection algorithms tend to discard multi-scale information when handling image-text correspondence, which lowers their accuracy on small objects. To address this issue, the C-Baron algorithm is proposed. It combines a channel attention mechanism with a feature pyramid network to construct the C-FPN module. In the region selection stage, C-Baron adopts a region packing alignment method to handle image-text correspondence. Experimental results show that, compared with the baseline model, C-Baron improves recognition accuracy by 2% on novel categories and by 6.3% on base categories.
open-vocabulary object detection; multi-scale information; multi-modal processing; image-text alignment; C-FPN module
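
As a rough illustration of how channel attention can be combined with a feature pyramid, the following minimal sketch reweights the channels of every pyramid level with a per-channel gate. The abstract does not specify the form of attention used in C-FPN, so a squeeze-and-excitation style gate (global average pooling, one linear layer, sigmoid) is assumed here. The function names `channel_attention` and `reweight_pyramid`, and the `weights` and `bias` parameters, are hypothetical and are not taken from the paper.

```python
import math


def channel_attention(feature, weights, bias):
    """Apply an assumed squeeze-and-excitation style gate to one feature map.

    feature: list of C channels, each an H x W list of lists of floats.
    weights: C x C matrix standing in for learned parameters (hypothetical).
    bias:    list of C floats (hypothetical).
    Returns the reweighted feature map and the per-channel gates.
    """
    # Squeeze: global average pooling reduces each channel to one scalar.
    pooled = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature
    ]
    # Excite: a single linear layer followed by a sigmoid yields a gate in (0, 1).
    gates = []
    for c in range(len(feature)):
        z = sum(weights[c][k] * pooled[k] for k in range(len(pooled))) + bias[c]
        gates.append(1.0 / (1.0 + math.exp(-z)))
    # Rescale every value in a channel by that channel's gate.
    reweighted = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature, gates)
    ]
    return reweighted, gates


def reweight_pyramid(levels, weights, bias):
    """Apply the same channel gate computation to every pyramid level.

    levels: list of feature maps with the same channel count but
            different spatial sizes, as produced by a feature pyramid.
    """
    return [channel_attention(level, weights, bias)[0] for level in levels]
```

Sharing one set of gate parameters across levels is a simplification made for brevity; a separate set per level would be an equally plausible reading of the abstract.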