Balanced Clustering-based Index for Approximate Nearest Neighbor Retrieval
In the era of big data, deep learning has been widely applied in recommendation systems, user profiling, and data management: complex objects are represented as high-dimensional feature vectors, and their similarity is evaluated by vector distance. However, as data scale continues to grow, retrieving similar feature vectors from massive datasets faces significant challenges, notably the large memory consumption of retrieval models and the low recall of feature retrieval algorithms. Designing compact graph index structures that reduce memory consumption is therefore crucial for improving the efficiency of nearest neighbor search in large-scale data systems while preserving retrieval accuracy. To this end, a user feature binning approach based on balance-aware distributed K-means clustering and a compact index design algorithm for graph structures are proposed. First, a fast balance-aware K-means clustering algorithm is designed to achieve balanced binning of massive feature data during graph index construction, compressing high-dimensional vectors into a lightweight and compact graph index structure. Subsequently, a quantization operation is applied to further compress the high-dimensional vector samples and speed up nearest neighbor search over the dataset. Experimental results on benchmark datasets demonstrate that the proposed method effectively accelerates index construction while maintaining high accuracy, thus enabling efficient indexing and retrieval of massive data.
big data retrieval and analysis; nearest neighbor search; balanced perception
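
The two stages named in the abstract, balanced binning and quantization, can be illustrated with a minimal single-machine sketch. This is not the proposed algorithm: the greedy capacity-constrained assignment, the per-dimension 8-bit scalar quantizer, the bin-probing search (used here in place of a graph index), and all function and parameter names (`balanced_kmeans`, `quantize_uint8`, `search`, `nprobe`, `topk`) are assumptions chosen for illustration only.

```python
import numpy as np


def balanced_kmeans(x, k, iters=10, seed=0):
    """Greedy capacity-constrained k-means (illustrative assumption).

    No bin receives more than ceil(n / k) points.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    cap = -(-n // k)  # ceil(n / k)
    centroids = x[rng.choice(n, size=k, replace=False)].astype(np.float64)
    labels = np.full(n, -1)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        d = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = np.full(n, -1)
        counts = np.zeros(k, dtype=int)
        # Visit (point, centroid) pairs from closest to farthest and
        # assign a point to the first centroid that still has room.
        for flat in np.argsort(d, axis=None):
            i, c = divmod(int(flat), k)
            if labels[i] == -1 and counts[c] < cap:
                labels[i] = c
                counts[c] += 1
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = x[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids


def quantize_uint8(x):
    """Per-dimension min-max scalar quantization to 8-bit codes."""
    lo = x.min(axis=0)
    scale = (x.max(axis=0) - lo) / 255.0
    scale[scale == 0] = 1.0  # guard against constant dimensions
    codes = np.clip(np.round((x - lo) / scale), 0, 255).astype(np.uint8)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original vectors."""
    return codes.astype(np.float64) * scale + lo


def search(q, centroids, labels, codes, lo, scale, nprobe=2, topk=5):
    """Probe the nprobe nearest bins, then rank their quantized members."""
    bins = np.argsort(((centroids - q) ** 2).sum(axis=1))[:nprobe]
    cand = np.flatnonzero(np.isin(labels, bins))
    approx = dequantize(codes[cand], lo, scale)
    order = np.argsort(((approx - q) ** 2).sum(axis=1))[:topk]
    return cand[order]


# Synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 16))
labels, centroids = balanced_kmeans(data, k=8)
codes, lo, scale = quantize_uint8(data)
bin_sizes = np.bincount(labels, minlength=8)
result = search(data[0], centroids, labels, codes, lo, scale)
```

In this sketch, balance is enforced by a hard capacity per bin, so every probe touches a bounded number of candidates, and the 8-bit codes store each dimension in one byte instead of eight. A distributed variant or a graph index built over the bins, as the abstract describes, would require additional machinery not shown here.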