Statistical Research (统计研究), 2024, Vol. 41, Issue 9: 150-160. DOI: 10.19343/j.cnki.11-1302/c.2024.09.011

基于混合成对惩罚的多个数据集效应异质性分析

Effect Heterogeneity Analysis of Multiple Datasets Based on a Hybrid Pairwise Penalty Method

Sun Yifan (孙怡帆)¹, Yao Yizhi (姚一枝)², Yu Xue (于雪)²

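Author information

  • 1. Center for Applied Statistics, School of Statistics, and Advanced Innovation Center for Future Blockchain and Privacy Computing, Renmin University of China
  • 2. Center for Applied Statistics and School of Statistics, Renmin University of China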

Abstract

Big data are usually formed by merging multiple datasets that cover different subjects or come from different sources. As a result, the effect of the same independent variable on the dependent variable may differ across datasets, a phenomenon known as effect heterogeneity. Uncovering latent effect heterogeneity has become one of the important goals of big data analysis. Integrative analysis methods based on the fusion penalty and on the pairwise penalty are currently the two mainstream approaches to effect heterogeneity analysis, but the former depends heavily on the ordering of the model coefficients, while the latter is computationally expensive. To address this, the paper proposes a new integrative analysis method based on a hybrid pairwise penalty. Compared with the fusion penalty-based method, the new method is much less sensitive to the coefficient ordering. Compared with the pairwise penalty-based method, the new method removes a large number of redundant penalty terms, which lowers the computational cost and improves the accuracy of the results. Extensive simulation studies and an application to the identification of pathogenic genes in melanoma both demonstrate the advantage of the new method in identifying effect heterogeneity.
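The contrast drawn in the abstract between the two existing penalties can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the paper: a single covariate, plain absolute differences with no tuning parameter, and made-up coefficient values. The function names `fused_penalty` and `pairwise_penalty` are hypothetical, and the hybrid pairwise penalty itself is not reproduced because the abstract does not define it.

```python
from itertools import combinations


def fused_penalty(coefs, order):
    """Sum of |difference| between coefficients that are adjacent in `order`."""
    ranked = [coefs[i] for i in order]
    return sum(abs(b - a) for a, b in zip(ranked, ranked[1:]))


def pairwise_penalty(coefs):
    """Sum of |difference| over every pair of datasets."""
    return sum(abs(a - b) for a, b in combinations(coefs, 2))


# Hypothetical effect of one covariate in four datasets (two latent groups).
coefs = [1.0, 3.0, 1.0, 3.0]

# An ordering that places equal coefficients next to each other ...
fused_good = fused_penalty(coefs, order=[0, 2, 1, 3])
# ... versus one that interleaves the two groups.
fused_bad = fused_penalty(coefs, order=[0, 1, 2, 3])

pairwise = pairwise_penalty(coefs)

# Number of penalty terms: M - 1 for the fused form, M(M - 1)/2 for the pairwise form.
m = len(coefs)
n_fused_terms = m - 1
n_pairwise_terms = m * (m - 1) // 2

print(fused_good, fused_bad, pairwise, n_fused_terms, n_pairwise_terms)
```

In this toy setting the fused penalty changes with the chosen ordering even though the coefficients are the same, while the pairwise penalty does not depend on ordering but its number of terms grows quadratically with the number of datasets.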

Key words

Big Data / Effect Heterogeneity / Hybrid Pairwise Penalty / Integrative Analysis

Funding

Research Funds of Renmin University of China (supported by the Fundamental Research Funds for the Central Universities), Project 23XNL014

Publication year

2024
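Journal: Statistical Research (统计研究)
Sponsors: Chinese Statistical Society; Institute of Statistical Science, National Bureau of Statistics
Indexed in: CSTPCD, CSSCI, CHSSCD, Peking University Core Journals (北大核心)
Impact factor: 2.019
ISSN: 1002-4565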