Distributed Principal Component Analysis for Massive Data and Its Application in Measurement of Common Prosperity

In the distributed PCA algorithm based on the two-round method (TR-DPCA), each local machine first computes the sum vector of its own data and transmits it to the central machine, which computes the mean vector of the full sample and sends it back to every local machine. Each local machine then computes its scatter matrix and transmits it to the central machine, which computes the covariance matrix of the full sample. Finally, the eigenvectors are obtained by eigendecomposition of the covariance matrix. Numerical simulations show that TR-DPCA performs the same as full-sample PCA and better than distributed PCA based on the one-round method. In addition, applying TR-DPCA to the measurement of common prosperity in China shows that the level of common prosperity is rising and that individual disparities are steadily narrowing.
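The two communication rounds described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy data, and the split into three machines are hypothetical, the covariance uses an n - 1 denominator (the abstract does not specify one), and plain power iteration stands in for a full eigendecomposition so that the sketch needs only the standard library.

```python
import random


def local_sum(block):
    """Round 1 (local): column-wise sum vector and row count of one machine's data."""
    p = len(block[0])
    return [sum(row[j] for row in block) for j in range(p)], len(block)


def global_mean(local_results):
    """Round 1 (central): pool the local sums into the full-sample mean vector."""
    n_total = sum(n for _, n in local_results)
    p = len(local_results[0][0])
    return [sum(s[j] for s, _ in local_results) / n_total for j in range(p)]


def local_scatter(block, mean):
    """Round 2 (local): scatter matrix of one block around the GLOBAL mean."""
    p = len(mean)
    scatter = [[0.0] * p for _ in range(p)]
    for row in block:
        d = [row[j] - mean[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                scatter[a][b] += d[a] * d[b]
    return scatter


def global_covariance(scatters, n_total):
    """Round 2 (central): sum the scatter matrices, divide by n - 1 (assumed)."""
    p = len(scatters[0])
    return [[sum(s[a][b] for s in scatters) / (n_total - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenvector(cov, iters=500):
    """Power iteration for the first principal direction (illustrative stand-in)."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def tr_dpca(blocks):
    """Run both rounds over a list of data blocks, one block per local machine."""
    stats = [local_sum(b) for b in blocks]
    mean = global_mean(stats)
    n_total = sum(n for _, n in stats)
    cov = global_covariance([local_scatter(b, mean) for b in blocks], n_total)
    return mean, cov, leading_eigenvector(cov)


# Toy data: 300 rows, 3 columns, split unevenly across 3 hypothetical machines.
random.seed(0)
data = []
for _ in range(300):
    z = random.gauss(0, 1)
    data.append([z + random.gauss(0, 0.1), 2 * z + random.gauss(0, 0.1),
                 random.gauss(0, 1)])
blocks = [data[:50], data[50:180], data[180:]]

mean, cov, v1 = tr_dpca(blocks)
full_mean, full_cov, full_v1 = tr_dpca([data])  # one block = full-sample PCA
```

Because every local scatter matrix is centred on the global mean rather than on a local mean, the pooled covariance matches the full-sample covariance up to floating-point rounding. This is consistent with the abstract's finding that TR-DPCA agrees with full-sample PCA.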

principal component analysis; massive data; distributed; two-round method; common prosperity

XUE Wei (薛伟), WU Wenbin (吴文彬)

School of Statistics, Shandong Technology and Business University, Yantai, Shandong 264005, China

Stony Brook Institute at Anhui University, Hefei 230039, China

National Social Science Fund of China project

22BTJ038

2024

Journal of Shandong Technology and Business University (山东工商学院学报)
Shandong Technology and Business University

CHSSCD
Impact factor: 0.304
ISSN: 1672-5956
Year, volume (issue): 2024, 38(5)