Application and Prospects of Polygenic Risk Score (PRS) in Genetic Disease Research: a Review of Data Analysis Methods
Lower-cost genotyping and evolving next-generation sequencing technology have promoted the generation of large genetic datasets. The emergence of genome-wide association studies (GWAS) has advanced researchers' understanding of common complex diseases. GWAS identify sequence variations present in the human genome and screen out disease-related single nucleotide polymorphisms (SNPs). These SNPs are considered the basis for assessing susceptibility to complex diseases. However, a single variant is not sufficient to assess an individual's risk of disease. The polygenic risk score (PRS) is an emerging genetic data analysis method that quantitatively estimates an individual's genetic risk for complex diseases by jointly considering multiple genetic variant sites. By building a PRS model, the cumulative impact of multiple genetic variants can be summarized as a single-value estimate of an individual's genetic risk for a given phenotype. The score is a weighted sum: the number of effect alleles an individual carries at each SNP is weighted by the strength and direction of that SNP's association with the phenotype. With the continuous development of PRS calculation methods and the steady accumulation of genomic data, PRS has received widespread attention in the field of genetics. To date, many domestic and international studies have shown that PRS is valuable for risk prediction across different human traits and complex diseases, and its effectiveness has been further verified in clinical applications. Many studies have established PRS models based on GWAS summary statistics to quantify the contribution of susceptibility loci and clinical characteristics to the risk of diseases such as lung cancer, breast cancer, coronary heart disease, diabetes and Alzheimer's disease. Disease-susceptible populations can be identified by stratifying the population by risk and comparing the relative and absolute risk of disease across the resulting risk groups. Individual-level genotype data and omics data can also serve as data sources for PRS analysis. Omics data in particular can dynamically reflect the short-term or long-term effects of environmental factors on human gene expression, and have potential application value in building early warning models to assess health risks. Because calculating a PRS involves analyzing a large amount of genomic data, methods for data selection, model building and validation differ considerably. Different PRS construction methods and software perform differently in disease risk prediction, and even the performance of the same algorithm varies across diseases. Notably, a PRS model often needs to be re-evaluated and validated for different populations, because PRS is affected by race and region. This review draws on currently published PRS-related research and algorithms to describe the basic principles of PRS, compare construction and validation methods, and discuss applications and prospects. As a powerful genetic risk assessment tool, PRS has great potential for deciphering the genetic basis of complex diseases and for achieving precise diagnosis and personalized treatment.
polygenic genetic risk score; genome-wide association study; genetics; complex disease
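The weighted-sum definition described in the abstract can be illustrated with a minimal Python sketch. It assumes the simplest additive form, in which the score of an individual is the sum over SNPs of the GWAS effect size multiplied by the number of effect alleles carried (0, 1 or 2), and it then assigns individuals to rank-based risk groups. The function names, effect sizes and genotypes below are hypothetical toy values, and the rank-based grouping is an assumed simplification, not a method taken from the reviewed studies or from any particular PRS software.

```python
from typing import List, Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele counts for one individual.

    dosages: effect-allele count per SNP (0, 1 or 2).
    weights: per-SNP effect size (e.g. a GWAS beta); the sign gives direction.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


def stratify(scores: Sequence[float], n_groups: int = 3) -> List[int]:
    """Assign each individual to a rank-based risk group (0 = lowest scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, idx in enumerate(order):
        groups[idx] = rank * n_groups // len(scores)
    return groups


# Hypothetical toy data: three SNPs, three individuals.
weights = [0.20, -0.10, 0.05]
genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 1],
]

scores = [polygenic_risk_score(g, weights) for g in genotypes]
groups = stratify(scores)

for person, (score, group) in enumerate(zip(scores, groups)):
    print(f"individual {person}: PRS = {score:.2f}, risk group = {group}")
```

In practice the weights would come from GWAS summary statistics after SNP selection or shrinkage, and the risk groups would be compared on relative and absolute disease risk in a validation cohort; this sketch omits both steps.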