Abstract: We describe how many dimension reduction strategies are connected conceptually and philosophically, paving the way for a unified approach to multivariate dimension reduction in statistics. Specific methods covered include envelopes, sufficient dimension reduction methods like SIR and SAVE, principal components, principal fitted components, and partial least squares. © 2021 Elsevier Inc. All rights reserved.
Abstract: In this paper, we give an up-to-date account of methods for comparing populations or treatment groups with high-dimensional data. We group the methods into three categories based on the hypothesis of interest and the model assumptions they make. We offer some perspectives on the connections and distinctions among the tests and discuss the ramifications of the model assumptions for practical applications. Among other things, we discuss the interpretation of the hypotheses and of the results of the appropriate tests, and how this determines which data types each method is suitable for. Further, we discuss computational complexity and list the available R package implementations and their limitations. Finally, we illustrate the numerical performance of the various tests in a simulation study. © 2021 Elsevier Inc. All rights reserved.
Abstract: We present, in three parts, different concepts of correlation and statistical association, with some historical notes, starting with Galton's notion of correlation, subsequently improved by Pearson. Continuing in this first part, we discuss the correlation ratio, the intraclass correlation, multiple correlation, and redundancy analysis. Throughout we use Galton's classic data set on the heights of parents and their children. In the second part we explain how these same data can be studied from a multivariate viewpoint, using canonical correlation analysis, Procrustes correlation and simple/multiple correspondence analysis. For correspondence analysis, we use the same data as categorized by Galton into intervals of heights for the parents and their children. In this part we also make an incursion into the continuous form of correspondence analysis. The third part is dedicated to bivariate distributions, where we give the main results on bivariate distributions with given marginals, commenting on the correlations of Spearman and Kendall. Since a bivariate distribution can be generated using a copula, we fit Galton's data to two copulas: the Gaussian copula and the copula which has the best fit. © 2021 Published by Elsevier Inc.
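To make the three correlation coefficients named in this abstract concrete, here is a minimal Python sketch. It does not use Galton's data: the sample is simulated from a bivariate normal distribution (that is, a Gaussian copula with normal marginals), and the correlation 0.5, the sample size and all function names are assumptions chosen for illustration. For this model the standard relations are Spearman's rho = (6/π) arcsin(ρ/2) and Kendall's tau = (2/π) arcsin(ρ), which the sample values should roughly reproduce.

```python
import numpy as np

def ranks(v):
    """Ranks 0..n-1 of a 1-D array (assumes no ties, as for continuous data)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def pearson(x, y):
    """Pearson's product-moment correlation."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman's rho: Pearson's correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall's tau: (concordant - discordant pairs) / total pairs, O(n^2)."""
    sx = np.sign(x[:, None] - x[None, :])
    sy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    # every unordered pair appears twice in the n x n sign matrices
    return float((sx * sy).sum() / (n * (n - 1)))

# Simulated data (illustrative assumption, not Galton's heights)
rng = np.random.default_rng(0)
rho, n = 0.5, 1000
z = rng.standard_normal((n, 2))
x = z[:, 0]
y = rho * z[:, 0] + np.sqrt(1 - rho**2) * z[:, 1]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
r_kendall = kendall(x, y)

# Theoretical values under the bivariate normal model
spearman_theory = 6 / np.pi * np.arcsin(rho / 2)
kendall_theory = 2 / np.pi * np.arcsin(rho)
```

Spearman's and Kendall's coefficients depend only on the ranks, so they are unchanged by monotone transformations of the marginals; this is why they are natural summaries of a copula.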
Abstract: Consider a measure $\mu$ on $\mathbb{R}^n$ generating a natural exponential family $F(\mu)$ with variance function $V_{F(\mu)}(m)$ and Laplace transform $\exp(\ell_\mu(s)) = \int_{\mathbb{R}^n} \exp(-\langle x, s\rangle)\,\mu(dx)$. A dual measure $\mu^*$ satisfies $-\ell_{\mu^*}'(-\ell_\mu'(s)) = s$. Such a dual measure does not always exist. One important property is $\ell_{\mu^*}''(m) = (V_{F(\mu)}(m))^{-1}$, leading to the notion of duality among exponential families (or rather among the extended notion of T exponential families TF obtained by considering all translations of a given exponential family F). © 2021 Elsevier Inc. All rights reserved.
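The property stated in this abstract follows from its defining relation in one step. The sketch below is a formal derivation, not the paper's own argument, and it assumes that $\ell_\mu$ and $\ell_{\mu^*}$ are twice differentiable on the relevant domains. It uses the standard facts that, with the sign convention above, the mean of the family member indexed by $s$ is $m = -\ell_\mu'(s)$ and its covariance is $\ell_\mu''(s)$.

```latex
\begin{align*}
m &= -\ell_\mu'(s), \qquad V_{F(\mu)}(m) = \ell_\mu''(s)
  && \text{(mean and variance of the family)} \\
-\ell_{\mu^*}'(m) &= s
  && \text{(defining relation at } m = -\ell_\mu'(s)\text{)} \\
\ell_{\mu^*}''(m)\,\ell_\mu''(s) &= I_n
  && \text{(differentiate in } s\text{, chain rule with } \partial m/\partial s = -\ell_\mu''(s)\text{)} \\
\ell_{\mu^*}''(m) &= \bigl(V_{F(\mu)}(m)\bigr)^{-1}.
\end{align*}
```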
Abstract: Since the introduction of Dyson's Brownian motion in the early 1960s, there have been many developments in the study of stochastic processes on the space of Hermitian matrices. Their properties, especially those of their eigenvalues, have been studied in great detail. In particular, the limiting behaviour of the eigenvalues has been established as the dimension of the matrix space tends to infinity, which connects with random matrix theory. This survey reviews a selection of results on the eigenvalues of such stochastic processes from the literature of the past three decades. For the most recent variants of these processes, such as matrix-valued processes driven by fractional Brownian motion or the Brownian sheet, the eigenvalues are also discussed. Finally, some open problems in the area are proposed. © 2021 Elsevier Inc. All rights reserved.
Abstract: Discrete approximations of continuous statistical distributions are widely needed in various fields. Using random samples generated by the Monte Carlo (MC) method to infer the population has been dominant in statistics. The empirical distribution of a random sample can be regarded as a discrete approximation of the population distribution in a certain statistical sense. However, MC performs poorly in many problems. This paper concerns some alternative methods, such as Quasi-Monte Carlo (QMC) F-numbers and Mean Square Error Representative Points (MSE-RPs), and constructs approximating distributions for elliptically contoured distributions and skew normal distributions. Numerical comparisons are given for two geometric probability problems and for estimation accuracy by resampling from the discrete approximating distributions obtained by MC, QMC and MSE-RPs. Our simulation results indicate that QMC and MSE-RPs perform better in most comparisons. These results show that QMC and MSE-RPs have high potential in statistical inference. In addition, we discuss the relationship between principal component analysis and MSE-RPs for elliptically contoured distributions, as well as its potential applications. © 2021 Elsevier Inc. All rights reserved.
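The idea of a discrete approximation can be illustrated in one dimension. The following Python sketch compares three ways of choosing k support points for a standard normal distribution: a random MC draw, equal-probability quantile points, and approximate MSE representative points. It is an illustration under stated assumptions, not the paper's procedure: the quantile rule $F^{-1}((2i-1)/(2k))$ is used as a simple deterministic stand-in (the paper's F-numbers may be defined differently), the representative points are approximated by Lloyd's iteration on a large simulated sample, and all function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def quantile_points(k):
    """Equal-probability quantile points F^{-1}((2i - 1) / (2k)) of N(0, 1)."""
    nd = NormalDist()
    return np.array([nd.inv_cdf((2 * i - 1) / (2 * k)) for i in range(1, k + 1)])

def lloyd_rep_points(sample, k, iters=100):
    """Approximate k MSE representative points of a 1-D sample (Lloyd's iteration)."""
    sample = np.sort(np.asarray(sample, dtype=float))
    points = np.quantile(sample, (np.arange(k) + 0.5) / k)  # starting values
    for _ in range(iters):
        # cell boundaries are midpoints between neighbouring points
        edges = (points[:-1] + points[1:]) / 2
        labels = np.searchsorted(edges, sample)
        for j in range(k):
            cell = sample[labels == j]
            if cell.size:
                points[j] = cell.mean()  # move each point to its cell mean
    edges = (points[:-1] + points[1:]) / 2
    labels = np.searchsorted(edges, sample)
    weights = np.bincount(labels, minlength=k) / sample.size
    return points, weights

def quantization_mse(sample, points):
    """Mean squared distance from each sample value to its nearest point."""
    nearest = np.abs(sample[:, None] - np.asarray(points)[None, :]).min(axis=1)
    return float(np.mean(nearest ** 2))

rng = np.random.default_rng(0)
population = rng.standard_normal(20_000)  # large sample standing in for N(0, 1)
k = 5
mc_points = rng.standard_normal(k)                        # random MC support
q_points = quantile_points(k)                             # deterministic quantiles
rp_points, rp_weights = lloyd_rep_points(population, k)   # approximate MSE-RPs

mse_mc = quantization_mse(population, mc_points)
mse_q = quantization_mse(population, q_points)
mse_rp = quantization_mse(population, rp_points)
```

The representative points come with unequal weights `rp_weights` (the probability of each cell), so the resulting discrete distribution is the pair (points, weights) rather than a uniform distribution on the points.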
Abstract: Since its introduction in the early 1990s, the Sliced Inverse Regression (SIR) methodology has evolved, adapting to increasingly complex data sets in contexts combining linear dimension reduction with nonlinear regression. The assumption that the response variable depends on only a few linear combinations of the covariates makes it appealing for many computational and real data applications. This work gives an overview of the most active research directions in SIR modeling, from multivariate regression models to regularization and variable selection. © 2021 Elsevier Inc. All rights reserved.
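The basic SIR estimator behind this overview can be sketched in a few lines of Python. This is a minimal version of the classical procedure (whiten the covariates, slice the sorted response, and take the leading eigenvectors of the weighted covariance of the slice means), not any of the regularized or variable-selection variants the overview surveys. The simulated single-index model, the cubic link and the function name are assumptions made for illustration.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Classical sliced inverse regression; returns unit-norm direction columns."""
    n, p = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    # whitening matrix Sigma^{-1/2} via the eigendecomposition of Sigma
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # slice the observations by the order of the response
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)                 # slice mean of whitened covariates
        M += (len(idx) / n) * np.outer(m, m)    # weighted covariance of slice means
    # leading eigenvectors of M span the estimated subspace (whitened scale)
    _, vecs = np.linalg.eigh(M)
    top = vecs[:, ::-1][:, :n_dirs]
    B = inv_sqrt @ top                          # map back to the original scale
    return B / np.linalg.norm(B, axis=0)

# Simulated single-index model (illustrative assumption)
rng = np.random.default_rng(0)
n, p = 2000, 5
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
X = rng.standard_normal((n, p))
y = (X @ beta) ** 3 + 0.5 * rng.standard_normal(n)

B = sir_directions(X, y)
cosine = abs(float(B[:, 0] @ beta))  # alignment with the true direction
```

A monotone link is used here on purpose: classical SIR can miss directions when the link is symmetric about zero, which is one motivation for methods such as SAVE.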
Abstract: This work gives an overview of statistical analysis for some models for multivariate discrete-valued (MDV) time series. We present observation-driven models and models based on higher-order Markov chains. Several extensions are highlighted including non-stationarity, network autoregressions, conditional non-linear autoregressive models, robust estimation, random fields and spatio-temporal models. © 2021 Elsevier Inc. All rights reserved.
Abstract: The classification of high-dimensional data is an important problem that has been studied for a long time. Many studies have proposed linear classifiers based on Fisher's linear discriminant rule (LDA), which requires estimating the unknown covariance matrix and the mean vector of each group. In particular, if the data dimension p is larger than the number of observations n (p > n), the sample covariance matrix cannot be a good estimator of the covariance matrix because of its well-known rank deficiency. To solve this problem, many studies have modified the LDA classifier through diagonalization or regularization of the covariance matrix. In this paper, we categorize existing methods into three cases and discuss the shortcomings of each. To compensate for these shortcomings, our basic idea is to estimate the high-dimensional mean vector and covariance matrix jointly, whereas existing methods focus on a shrinkage estimator of either the mean vector or the covariance matrix. We provide a theoretical result showing that the proposed method is successful under both sparse and dense structures of the mean vector. In contrast, some existing methods work well only in specific situations. We also present numerical studies showing that our methods outperform existing methods in various simulations and in real data examples such as electroencephalography (EEG), gene expression microarray, and Spectro datasets. © 2021 Elsevier Inc. All rights reserved.
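The rank deficiency mentioned in this abstract, and the simplest kind of regularization fix, can be seen in a short Python sketch. This is not the method proposed in the paper: it shows only a generic ridge-type modification, in which the pooled sample covariance S is replaced by S + ridge * I before solving for the discriminant direction. The simulated data, the value of `ridge` and the function names are assumptions for illustration.

```python
import numpy as np

def fit_regularized_lda(X0, X1, ridge=1.0):
    """Two-class LDA with the pooled covariance S replaced by S + ridge * I."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    n, p = centered.shape
    S = centered.T @ centered / (n - 2)      # pooled sample covariance
    # S is singular when p > n, but S + ridge * I is always invertible
    w = np.linalg.solve(S + ridge * np.eye(p), mu1 - mu0)
    b = -0.5 * float(w @ (mu0 + mu1))        # threshold at the midpoint of the means
    return w, b, S

def predict(X, w, b):
    """Assign class 1 when the linear score is positive, else class 0."""
    return (X @ w + b > 0).astype(int)

# Simulated p > n setting (illustrative assumption)
rng = np.random.default_rng(0)
p, n_per_class = 200, 20
delta = np.zeros(p)
delta[:10] = 2.0                              # sparse mean difference
X0 = rng.standard_normal((n_per_class, p))
X1 = rng.standard_normal((n_per_class, p)) + delta

w, b, S = fit_regularized_lda(X0, X1)
rank_S = int(np.linalg.matrix_rank(S))        # at most n - 2, far below p

# Independent test data from the same model
T0 = rng.standard_normal((500, p))
T1 = rng.standard_normal((500, p)) + delta
accuracy = float(np.mean(np.concatenate([predict(T0, w, b) == 0,
                                         predict(T1, w, b) == 1])))
```

Because the pooled covariance is built from 2 × 20 centered observations, its rank cannot exceed 38, so the plain LDA direction, which needs the inverse of S, is not defined here without some modification of this kind.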
Abstract: Although growth curve models play an important role in longitudinal data analysis, their use is constrained by the crucial assumption that the grouping design matrix is known. In this paper we propose a Gaussian mixture model within the framework of growth curve models which handles the problem caused by an unknown grouping matrix. This allows greater flexibility in specifying the model and frees the response matrix from following a single multivariate normal distribution. The new model is considered under two parsimonious covariance structures together with the unstructured covariance. Maximum likelihood estimation of the proposed model is studied using the ECM algorithm, which clusters the growth curve data at the same time. Data-driven methods are proposed to determine the various model parameters so as to create an appropriate model for complex growth curve data. Simulation studies are conducted to assess the performance of the proposed methods, and a real data analysis on gene expression clustering is carried out, showing that the proposed procedure works well in both model fitting and growth curve data clustering. © 2021 Elsevier Inc. All rights reserved.