Semantic description of remote sensing images is a cross-modal task that explains or annotates the types, states, and features of ground objects and scenes in remote sensing images. It deepens the interpretation and understanding of these images and has become a research hotspot in the field of remote sensing. First, from the perspective of the technologies used in current research, work on the semantic description of remote sensing images is reviewed under two categories: pixel-based methods and target-based methods. Second, each of these two categories is further subdivided into CNN-RNN methods and CNN-Transformer methods according to the decoder used. Although research on remote sensing image description has made remarkable progress, many problems remain to be overcome, including complex background interference, variable scale, fuzzy targets, and inter-class similarity. In the future, research on the semantic description of remote sensing images should focus on the use of image visual information, feature enhancement, and the integration of large-scale models to improve the robustness and accuracy of models.