In the context of big data, traditional sampling survey techniques must be improved to cope with changes in data structure. Leverage importance sampling, which uses leverage scores as sampling probabilities, increases the probability that sample points with high leverage values are selected, but it also increases the risk that outliers enter the sampled subset, causing the sampling estimate to deviate from the true value. To reduce the influence of outliers and improve the robustness of subset-based estimation for big data, this paper proposes a two-stage Leverage importance sampling method based on threshold self-selection. In the first stage, the method identifies a robust subset by ordered clustering of sample distances, which makes the samples used for subsequent sampling more representative. In the second stage, a robust sampling estimate is obtained from the robust subset. Simulation results show that the proposed method improves the accuracy of linear regression coefficient estimation and is applicable to drift, fluctuation, and mixed outliers. In the empirical analysis, the method achieves a small mean squared error of the predicted values on the data from three cases, effectively reducing the influence of outliers.
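To make the outlier risk concrete, the following is a minimal sketch of plain leverage importance sampling for linear regression. It is not the two-stage method proposed here, only the baseline it builds on. The function names (`leverage_scores`, `leverage_sample_ols`), the simulated data, the planted outlier, and the use of inverse-probability weights in the subsample fit are all assumptions made for illustration.

```python
import numpy as np

def leverage_scores(X):
    # Thin QR factorization X = QR; the i-th leverage score is the
    # squared Euclidean norm of the i-th row of Q.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)

def leverage_sample_ols(X, y, r, rng):
    # Sampling probabilities proportional to leverage scores.
    h = leverage_scores(X)
    pi = h / h.sum()
    # Draw r row indices with replacement according to pi.
    idx = rng.choice(X.shape[0], size=r, replace=True, p=pi)
    # Weighted least squares with weights 1/pi, done by rescaling rows
    # by sqrt(1/pi) (an assumed, commonly used weighting choice).
    w = 1.0 / np.sqrt(pi[idx])
    beta, *_ = np.linalg.lstsq(X[idx] * w[:, None], y[idx] * w, rcond=None)
    return beta, idx, pi

# Hypothetical simulated data with one planted outlier in row 0.
rng = np.random.default_rng(0)
n, p = 2000, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)
X[0] = [25.0, 25.0, 25.0]
y[0] = 100.0

beta_hat, idx, pi = leverage_sample_ols(X, y, r=200, rng=rng)
print("sampling probability of the outlier:", pi[0])
print("uniform probability 1/n:", 1.0 / n)
print("subsample estimate:", beta_hat)
```

In this sketch the planted point receives the largest sampling probability of all rows, which is the behavior the abstract describes: favoring high-leverage points also favors outliers, so a screening stage before sampling is a natural remedy.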