To address the low retrieval accuracy caused by the inability of traditional unsupervised cross-modal retrieval algorithms to fully extract the correlated semantics between samples, an unsupervised cross-modal hash retrieval algorithm based on CLIP and an attention fusion mechanism, CAFM_Net, was proposed. The multimodal pre-trained model CLIP was applied in the sample feature extraction stage to mine similar information from different dimensions of the data. An attention fusion mechanism was used to process the extracted features and to increase the weight of salient regions. The idea of adversarial learning was introduced to design a modal classifier, generating cross-modal hash codes that tend toward semantic consistency. Compared with existing representative hashing methods, CAFM_Net improves accuracy by at least 11% and 9% on multimodal retrieval tasks.
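The following is a minimal sketch of how the described pipeline could be assembled: CLIP features pass through an attention fusion module, a hash head produces relaxed binary codes, and an adversarial modal classifier pushes image and text codes toward semantic consistency. All module designs, layer sizes, and names (AttentionFusion, HashHead, ModalClassifier) are illustrative assumptions; the abstract does not specify the actual architecture of CAFM_Net.

```python
# A hedged sketch of the CAFM_Net pipeline, assuming PyTorch and 512-d CLIP features.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Re-weights salient feature dimensions (assumed gating-style design)."""
    def __init__(self, dim: int):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                  nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.attn(x)  # element-wise attention weights enhance salient regions

class HashHead(nn.Module):
    """Maps fused features to K-bit codes via a tanh relaxation of sign()."""
    def __init__(self, dim: int, bits: int):
        super().__init__()
        self.fc = nn.Linear(dim, bits)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.fc(x))  # binarize with torch.sign() at retrieval time

class ModalClassifier(nn.Module):
    """Adversarial discriminator predicting the modality of a hash code;
    training the encoders to fool it encourages modality-invariant codes."""
    def __init__(self, bits: int):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(bits, bits // 2), nn.ReLU(),
                                nn.Linear(bits // 2, 2))  # 2 classes: image vs. text

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.fc(h)

# Usage with stand-ins for CLIP embeddings (e.g. 512-d from a ViT-B/32 backbone):
dim, bits = 512, 64
fusion, head, disc = AttentionFusion(dim), HashHead(dim, bits), ModalClassifier(bits)
img_feat = torch.randn(8, dim)  # placeholder for CLIP image features
txt_feat = torch.randn(8, dim)  # placeholder for CLIP text features
h_img, h_txt = head(fusion(img_feat)), head(fusion(txt_feat))
modal_logits = disc(torch.cat([h_img, h_txt]))  # adversarial modality prediction
```

In this reading, the hash encoders minimize a similarity-preserving loss while maximizing the modal classifier's error, so that codes from both modalities become indistinguishable and semantically aligned; the exact losses used by CAFM_Net are not given in the abstract.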