Distributed Principal Component Analysis for Massive Data and Its Application in Measurement of Common Prosperity

Xue Wei¹, Wu Wenbin²

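Author information

1. School of Statistics, Shandong Technology and Business University, Yantai, Shandong 264005
2. Stony Brook Institute at Anhui University, Hefei 230039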

Abstract

In the distributed PCA algorithm based on the two-round method (TR-DPCA), each local machine computes its sum vector and transmits it to the central machine, which computes the mean vector of the full sample and sends it back to every local machine. Each local machine then computes its scatter matrix and transmits it to the central machine, which computes the covariance matrix of the full sample. Finally, the eigenvectors are obtained by eigendecomposition of the covariance matrix. Numerical simulations show that the performance of TR-DPCA is consistent with that of full-sample PCA and better than that of the distributed PCA algorithm based on the one-round method. In addition, applying TR-DPCA to the measurement of common prosperity in China shows that China's level of common prosperity is rising and that the gap between individuals is steadily narrowing.
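The two communication rounds described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names (`local_sum`, `global_mean`, `local_scatter`, `tr_dpca`) are hypothetical, the "machines" are simulated as in-memory blocks of one NumPy array with no real network transfer, and the `1/(n-1)` normalisation of the covariance matrix is an assumption, since the abstract does not state which normalisation is used.

```python
import numpy as np


def local_sum(X_k):
    """Round 1, local step: sum vector and row count of one machine's block."""
    return X_k.sum(axis=0), X_k.shape[0]


def global_mean(local_sums):
    """Round 1, central step: pool the sum vectors into the full-sample mean."""
    total = sum(s for s, _ in local_sums)
    n = sum(n_k for _, n_k in local_sums)
    return total / n, n


def local_scatter(X_k, mean):
    """Round 2, local step: scatter matrix centred on the global mean."""
    centred = X_k - mean
    return centred.T @ centred


def tr_dpca(blocks, n_components):
    """Two-round distributed PCA over a list of (n_k, p) arrays (hypothetical helper)."""
    # Round 1: local sums go to the centre, the global mean comes back.
    mean, n = global_mean([local_sum(X_k) for X_k in blocks])
    # Round 2: local scatter matrices go to the centre and are added up.
    scatter = sum(local_scatter(X_k, mean) for X_k in blocks)
    cov = scatter / (n - 1)  # assumption: unbiased 1/(n-1) normalisation
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return eigvals[order], eigvecs[:, order], cov


# Simulated data: 600 observations of 5 correlated variables, split over 4 machines.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5)) @ rng.normal(size=(5, 5))
blocks = np.array_split(X, 4)
eigvals, eigvecs, cov = tr_dpca(blocks, n_components=2)
```

Because every block is centred on the same global mean, the local scatter matrices add up to the full-sample scatter matrix, so `cov` in this sketch agrees with `np.cov(X, rowvar=False)` up to floating-point error. Each machine sends only a length-p vector and a p×p matrix, regardless of how many rows it holds.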

Keywords

principal component analysis / massive data / distributed / two-round method / common prosperity

Funding

National Social Science Fund of China (22BTJ038)

Publication year

2024

Journal of Shandong Technology and Business University

Shandong Technology and Business University

CHSSCD

Impact factor: 0.304

ISSN: 1672-5956
