To address the high cost and limited accuracy of existing passive indoor positioning methods, which depend heavily on external devices such as Bluetooth, this paper proposes an in-station positioning framework that combines improved image retrieval with a bag-of-words-tree-based structure-from-motion (SfM) algorithm. First, image sets with spatial relevance are clustered into an image bag-of-words tree according to the statistical distribution of local image features in different areas of the station. Second, the tree structure is refined by pruning and merging, and the branch image sets corresponding to different station areas are selected to improve the hierarchical SfM algorithm, enabling three-dimensional reconstruction of the station and generating a three-dimensional point cloud. Finally, given an image taken from the passenger's perspective, a DenseNet-based image retrieval algorithm with locality-sensitive hash coding retrieves visually similar images; the spatial mapping between these images and the three-dimensional point cloud is then computed, and the passenger's position coordinates are output. A case study using images from Hengshui North Station shows that the proposed framework offers stronger three-dimensional reconstruction capability, low error, and high efficiency: the error relative to the actual coordinates remains below 1%, and retrieval efficiency increases by 3.21% to 5.61%. The framework provides effective architectural guidance and technical support for improving passenger travel service quality and transport efficiency in high-speed railway stations.
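The retrieval step hashes image descriptors into compact binary codes so that candidate images can be compared cheaply. The abstract does not specify the hashing scheme. The sketch below therefore assumes random-hyperplane (sign-of-projection) locality-sensitive hashing applied to global descriptors that have already been extracted, for example by DenseNet. Feature extraction and the later mapping to the point cloud are not shown, and every function name here (`make_hyperplanes`, `lsh_code`, `hamming`, `retrieve`) is hypothetical, not the authors' implementation.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in R^dim (assumed random-projection LSH)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_code(vec, planes):
    """Pack one bit per hyperplane: bit i is 1 if vec lies on its non-negative side."""
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two integer-packed codes."""
    return bin(a ^ b).count("1")


def retrieve(query_vec, db_codes, planes, k=3):
    """Return the ids of the k database images whose codes are closest in Hamming distance.

    db_codes maps a (hypothetical) image id to its precomputed LSH code;
    ties are broken by image id so the ranking is deterministic.
    """
    q = lsh_code(query_vec, planes)
    ranked = sorted(db_codes.items(), key=lambda item: (hamming(q, item[1]), item[0]))
    return [img_id for img_id, _ in ranked[:k]]
```

Because each bit depends only on the sign of a projection, vectors separated by a small angle tend to share most bits. Comparing codes then reduces to an XOR and a bit count, which is the kind of saving that hash coding is meant to provide in the retrieval stage.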