Cross-Modal Retrieval with Improved Graph Convolution
Existing image-text cross-modal retrieval methods struggle to fully exploit the local consistency within each modality in the common subspace. To address this problem, a cross-modal retrieval method based on an improved graph convolution is proposed. To improve local consistency within each modality, an intra-modal graph is constructed with each sample as a node, fully mining the interaction information between features. To overcome the limitation that graph convolutional networks support only shallow architectures, an initial residual connection and an identity mapping of the weights are added to every graph convolution layer. To update the central node features jointly from low-order and high-order neighbor information, an improvement is proposed that reduces the number of neighbor nodes while increasing the depth of the graph convolutional network. To learn common representations with high local consistency and semantic consistency, the weights of the common representation learning layer are shared across modalities, and the intra-modal semantic constraints and the inter-modal invariance constraints are jointly optimized in the common subspace. Experimental results on the Wikipedia and Pascal Sentence cross-modal datasets show that the average mAP over different retrieval tasks exceeds that of 11 existing methods by 2.2%–42.1% and 3.0%–54.0%, respectively.
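The abstract does not specify how the intra-modal graph is built over single samples; a minimal sketch, assuming a k-nearest-neighbor graph from cosine similarity between sample features (the choice of k and of cosine similarity are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def build_intra_modal_graph(features: torch.Tensor, k: int = 8) -> torch.Tensor:
    """Build a normalized adjacency over the samples of one modality.

    Each sample is one node; edges link each node to its k most similar
    neighbors by cosine similarity. k and the similarity measure are
    assumptions for illustration, not values from the paper.
    """
    f = F.normalize(features, dim=1)
    sim = f @ f.t()                            # pairwise cosine similarity
    _, idx = sim.topk(k + 1, dim=1)            # top-(k+1) includes the node itself
    n = features.size(0)
    adj = torch.zeros(n, n, device=features.device)
    adj.scatter_(1, idx, 1.0)                  # directed kNN edges (+ self-loops)
    adj = ((adj + adj.t()) > 0).float()        # symmetrize
    d_inv_sqrt = adj.sum(1).pow(-0.5)          # degree >= 1 thanks to self-loops
    return d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
```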
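The per-layer initial residual connection and weight identity mapping match the mechanism popularized by GCNII (Chen et al., 2020); a sketch of one such layer under that reading, with alpha and beta as assumed hyperparameters rather than the paper's settings:

```python
import torch
import torch.nn as nn

class DeepGCNLayer(nn.Module):
    """One graph-convolution layer with an initial residual connection and
    an identity mapping of the weights, as the abstract describes. alpha
    and beta are illustrative hyperparameters."""

    def __init__(self, dim: int, alpha: float = 0.1, beta: float = 0.5):
        super().__init__()
        self.weight = nn.Linear(dim, dim, bias=False)
        self.alpha = alpha  # strength of the initial residual link
        self.beta = beta    # strength of the learned transform vs. identity

    def forward(self, h, h0, adj_norm):
        # Propagate over the normalized adjacency, then mix in the layer-0
        # features (initial residual) so deep stacks keep the input signal.
        support = adj_norm @ h
        support = (1.0 - self.alpha) * support + self.alpha * h0
        # Identity mapping: interpolate between the identity and the learned
        # weight matrix, keeping each layer close to an identity transform.
        out = (1.0 - self.beta) * support + self.beta * self.weight(support)
        return torch.relu(out)
```

Stacking many such layers over a graph with few neighbors per node is one way to realize the abstract's idea of aggregating both low-order and high-order neighbor information without over-smoothing.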
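The joint objective is only named in the abstract, not specified. One plausible reading, assuming a label classifier shared across modalities for the intra-modal semantic constraint and a paired-distance term for the inter-modal invariance constraint (the trade-off weight lam is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonSubspaceLoss(nn.Module):
    """Hypothetical joint objective over the shared common subspace: a
    semantic term (both modalities must predict the sample's label) plus an
    invariance term (paired image/text representations should coincide).
    The concrete form of each term and the weight lam are assumptions."""

    def __init__(self, dim: int, num_classes: int, lam: float = 1.0):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)  # shared by both modalities
        self.lam = lam

    def forward(self, img_z, txt_z, labels):
        # intra-modal semantic constraint on each modality's common representation
        sem = F.cross_entropy(self.classifier(img_z), labels) + \
              F.cross_entropy(self.classifier(txt_z), labels)
        # inter-modal invariance constraint on paired image/text samples
        inv = F.mse_loss(img_z, txt_z)
        return sem + self.lam * inv
```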