Journal information
Computational statistics & data analysis
North-Holland Pub. Co.
ISSN: 0167-9473

Indexed in: SCI, ISTP, AHCI
Status: formally published

    Mallows model averaging with effective model size in fragmentary data prediction

    Yuan, Chaoxia; Fang, Fang; Ni, Lyu
    18 pages
    Abstract: Most existing model averaging methods assume fully observed data. Fragmentary data, in which not all covariates are available for many subjects, are becoming increasingly common as data sources multiply in areas such as economics, the social sciences and medical studies. The main challenge of model averaging with fragmentary data is that the samples used to fit the candidate models differ from the sample used for weight selection, which biases the Mallows criterion of classical Mallows Model Averaging (MMA). A novel Mallows model averaging method is proposed that uses an "effective model size" accounting for these different samples, and its asymptotic optimality is established. Empirical evidence from a simulation study and a real data analysis is presented. The proposed Effective Mallows Model Averaging (EMMA) method not only provides a novel solution to fragmentary data prediction but also sheds light on model selection when candidate models have different sample sizes, a setting rarely discussed in the literature. (C) 2022 Elsevier B.V. All rights reserved.
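As a point of reference for the bias described above, the following minimal Python sketch computes the classical MMA weight for two candidate models by minimising the Mallows criterion C(w) = ||y - w*mu1 - (1 - w)*mu2||^2 + 2*sigma2*(w*k1 + (1 - w)*k2) over w in [0, 1]. It assumes that all fitted values are evaluated on the same weight-selection sample and that k1, k2 are nominal parameter counts; the EMMA "effective model size" is not reproduced, and the function name is hypothetical.

```python
def mma_weight_two_models(y, fit1, fit2, k1, k2, sigma2):
    """Weight w on model 1 (1 - w on model 2) minimising the Mallows criterion
    C(w) = ||y - w*fit1 - (1 - w)*fit2||^2 + 2*sigma2*(w*k1 + (1 - w)*k2), w in [0, 1].

    Assumes fit1, fit2 are fitted values on the same sample as y (classical MMA)."""
    d = [a - b for a, b in zip(fit1, fit2)]       # fit1 - fit2
    r = [yi - b for yi, b in zip(y, fit2)]        # y - fit2
    dd = sum(di * di for di in d)
    if dd == 0.0:
        # Identical fitted values: only the penalty matters, so prefer the smaller model.
        return 1.0 if k1 <= k2 else 0.0
    dr = sum(di * ri for di, ri in zip(d, r))
    # C(w) is a convex quadratic in w; clip its unconstrained minimiser to [0, 1].
    w = (dr - sigma2 * (k1 - k2)) / dd
    return min(1.0, max(0.0, w))
```

Because C(w) is a convex quadratic, clipping the stationary point to [0, 1] gives the constrained optimum; with more than two candidate models the same criterion is minimised over the weight simplex, typically by quadratic programming.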

    Local and global topics in text modeling of web pages nested in web sites

    Wang, Jason; Weiss, Robert E.
    14 pages
    Abstract: Topic models assert that documents are distributions over latent topics and latent topics are distributions over words. A nested document collection has documents nested inside a higher-order structure, such as articles nested in journals, podcasts within authors, or web pages nested in web sites. In a single collection of documents, topics are global, i.e. shared across all documents. For web pages nested in web sites, topic frequencies likely vary across web sites, and within a web site, topic frequencies almost certainly vary from web page to web page. A hierarchical prior for topic frequencies models this structure with a global topic distribution, web site topic distributions varying around the global topic distribution, and web page topic distributions varying around the web site topic distribution. Web pages on one United States local health department web site often contain local geographic and news topics not found on the web pages of other local health department web sites. For web pages nested in web sites, some topics are therefore likely local, i.e. unique to an individual web site. Regular topic models ignore the nesting structure: they may identify local topics but can neither label them as local nor identify the owning web site. Explicitly modeling local topics identifies both the topic as local and its owning web site. In the US health web site data, topic coverage is defined at the web site level after local topic words are removed from pages. Hierarchical local topic models can be used to study how well health topics are covered.
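A minimal generative sketch of the kind of hierarchical prior described above, assuming Dirichlet distributions at every level (an assumption, not necessarily the authors' specification): a global topic distribution, site distributions varying around it, and page distributions varying around their site's distribution. Each site also carries one extra "local" topic slot that no other site shares. The concentration parameters site_conc, page_conc and local_conc and the function names are hypothetical.

```python
import random


def dirichlet(alphas, rng):
    """Draw from Dirichlet(alphas) by normalising independent Gamma(alpha, 1) draws."""
    # Guard against a concentration underflowing to exactly zero (gammavariate needs alpha > 0).
    g = [rng.gammavariate(max(a, 1e-8), 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def sample_topic_frequencies(n_topics, n_sites, pages_per_site,
                             site_conc=50.0, page_conc=20.0, local_conc=2.0, seed=0):
    """Return the global distribution and, per site, a pair (site_dist, page_dists).

    Site and page distributions have n_topics + 1 entries; the last entry is the
    site's own local topic (hypothetical parametrisation)."""
    rng = random.Random(seed)
    # Global frequencies over the shared topics.
    global_dist = dirichlet([1.0] * n_topics, rng)
    sites = []
    for _ in range(n_sites):
        # Site frequencies vary around the global ones; the extra slot is the local topic.
        site_alphas = [site_conc * p for p in global_dist] + [local_conc]
        site_dist = dirichlet(site_alphas, rng)
        # Page frequencies vary around their own site's frequencies.
        pages = [dirichlet([page_conc * p for p in site_dist], rng)
                 for _ in range(pages_per_site)]
        sites.append((site_dist, pages))
    return global_dist, sites
```

In a full topic model, each word on a page would then draw a topic from that page's distribution and a word from the chosen topic; words assigned to the local slot are the ones that point to the owning site.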

    Entropy-based test for generalised Gaussian distributions

    Cadirci, Mehmet Siddik; Evans, Dafydd; Makogin, Vitalii; Leonenko, Nikolai...
    20 pages
    Abstract: A proof is provided of the L2 consistency of the k-th nearest neighbour distance estimator of the Shannon entropy for an arbitrary fixed k >= 1. Based on the maximum entropy principle, a non-parametric goodness-of-fit test is constructed for a newly introduced class of generalised multivariate Gaussian distributions. The theoretical results are followed by numerical studies on simulated samples, which show that increasing k improves the power of the proposed goodness-of-fit tests. The asymptotic normality of the test statistics is confirmed experimentally. (C) 2022 Published by Elsevier B.V.
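The k-th nearest neighbour entropy estimator whose consistency is proved above can be sketched as follows, assuming the common Kozachenko-Leonenko form H_hat = log(N - 1) - psi(k) + log V_d + (d/N) * sum_i log rho_{i,k}, where rho_{i,k} is the distance from point i to its k-th nearest neighbour and V_d is the volume of the d-dimensional unit ball. This is a brute-force O(N^2) illustration in Python, not the authors' code, and the goodness-of-fit test built on it is not reproduced.

```python
import heapq
import math


def unit_ball_volume(d):
    """Volume of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def digamma_int(k):
    """Digamma function at a positive integer k: -gamma + sum_{j=1}^{k-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, k))


def knn_entropy(points, k=1):
    """k-NN estimate of Shannon entropy (in nats) for a list of d-dimensional tuples."""
    n = len(points)
    d = len(points[0])
    const = math.log(n - 1) - digamma_int(k) + math.log(unit_ball_volume(d))
    log_rho_sum = 0.0
    for i, p in enumerate(points):
        # Distance to the k-th nearest neighbour, excluding the point itself.
        dists = (math.dist(p, q) for j, q in enumerate(points) if j != i)
        rho = heapq.nsmallest(k, dists)[-1]
        log_rho_sum += math.log(rho)
    return const + d * log_rho_sum / n
```

For a sample from the standard normal distribution, the estimate should be close to the true entropy 0.5*log(2*pi*e), about 1.419 nats.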

    Feature screening and FDR control with knockoff features for ultrahigh-dimensional right-censored data

    Pan, Yingli
    16 pages
    Abstract: A model-free feature screening method for ultrahigh-dimensional right-censored data is proposed. A two-step approach using knockoff features is developed to choose the screening threshold so that the false discovery rate (FDR) is controlled at a prespecified level. The two-step approach simultaneously enjoys the sure screening property with high probability and FDR control, provided the prespecified FDR level is at least 1/s, where s is the number of active features. The finite-sample properties of the method are assessed through simulation studies, and an application to a mantle cell lymphoma (MCL) study demonstrates its practical utility.
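The threshold-selection step can be illustrated with the standard knockoff+ rule of Barber and Candès, assuming each feature j already has a statistic W_j whose large positive values favour the original feature over its knockoff. The paper's two-step screening statistic is not reproduced; this Python sketch only shows how a threshold that keeps the estimated FDR at level q is read off the W_j, and the function names are hypothetical.

```python
import math


def knockoff_plus_threshold(W, q):
    """Smallest t in {|W_j| : W_j != 0} with
    (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q; math.inf if none exists."""
    for t in sorted({abs(w) for w in W if w != 0}):
        false_est = 1 + sum(1 for w in W if w <= -t)   # knockoff-based estimate of false selections
        selected = max(1, sum(1 for w in W if w >= t))
        if false_est / selected <= q:
            return t
    return math.inf


def select_features(W, q):
    """Indices of features whose statistic reaches the knockoff+ threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]
```

Because the estimated ratio is at least 1/#{j : W_j >= t}, a threshold that selects only s features cannot pass when q < 1/s, which is in line with the abstract's requirement that the prespecified FDR level be at least 1/s.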

    A rank-based high-dimensional test for equality of mean vectors

    Ouyang, Yanyan; Liu, Jiamin; Tong, Tiejun; Xu, Wangli...
    21 pages
    Abstract: The Wilcoxon signed-rank test and the Wilcoxon-Mann-Whitney test are two commonly used rank-based methods for one- and two-sample tests when one-dimensional data are not normally distributed. New rank-based nonparametric tests for the equality of mean vectors are proposed for high-dimensional settings. To overcome the technical challenges of sorting high-dimensional data, the new statistics are constructed by summing the Wilcoxon signed-rank or Wilcoxon-Mann-Whitney statistics computed in each dimension of the data. The asymptotic properties of the proposed test statistics are investigated under the null and local alternative hypotheses. Simulation studies show that the new tests perform as well as state-of-the-art methods when the high-dimensional data are normally distributed, and are more powerful when the normality assumption is violated. Finally, the new tests are applied to a gene expression data set of human peripheral blood mononuclear cells to demonstrate their practical usefulness. (C) 2022 Elsevier B.V. All rights reserved.
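The construction "sum the per-dimension Wilcoxon-Mann-Whitney statistics" can be sketched in Python as below. Each dimension contributes its Mann-Whitney U statistic centred by its null mean n1*n2/2; the scaling and variance estimate that the paper uses to obtain an asymptotically normal test are not reproduced, and the function names are hypothetical.

```python
def mann_whitney_u(x, y):
    """U = #{(a, b) : x_a < y_b} + 0.5 * #{(a, b) : x_a == y_b}."""
    u = 0.0
    for a in x:
        for b in y:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def summed_wmw_statistic(X, Y):
    """Sum over dimensions of the centred Mann-Whitney statistics.

    X, Y: lists of samples, each sample a list of p coordinates."""
    n1, n2 = len(X), len(Y)
    p = len(X[0])
    total = 0.0
    for j in range(p):
        xj = [row[j] for row in X]
        yj = [row[j] for row in Y]
        # When both samples share the same distribution, E[U] = n1 * n2 / 2.
        total += mann_whitney_u(xj, yj) - n1 * n2 / 2.0
    return total
```

Large positive values indicate that the second sample tends to be larger across many coordinates; working coordinate by coordinate avoids having to order p-dimensional vectors.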

    Stochastic representation of FGM copulas using multivariate Bernoulli random variables

    Blier-Wong, Christopher; Cossette, Helene; Marceau, Etienne
    24 pages
    Abstract: A one-to-one correspondence is established between Fréchet's class of multivariate Bernoulli distributions with symmetric marginals and the well-known family of Farlie-Gumbel-Morgenstern (FGM) copulas. A new stochastic representation of the family of d-variate FGM copulas is introduced. The representation is bijective: from any such d-variate Bernoulli distribution one may define a corresponding d-variate FGM copula, and for any d-variate FGM copula one finds the corresponding d-variate Bernoulli distribution. The proposed stochastic representation has many advantages, notably for establishing stochastic orders, constructing subclasses of FGM copulas and sampling. In particular, it can be used to develop computational methods for sampling from subclasses of FGM copulas that scale well to large dimensions. (c) 2022 Elsevier B.V. All rights reserved.
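One way to realise such a representation, derived here independently and possibly differing from the paper's sign convention: draw a Bernoulli vector I with P(I_j = 1) = 1/2 for every j, and for each coordinate set U_j to the minimum of two independent uniforms if I_j = 1 and to their maximum otherwise. The conditional densities 2(1 - u) and 2u average to a uniform marginal, and the joint density becomes 1 + sum over subsets S with |S| >= 2 of theta_S * prod_{j in S} (1 - 2u_j), with theta_S = E[prod_{j in S} (2I_j - 1)], which is the FGM form. The Python sketch below uses a hypothetical bivariate Bernoulli sampler with theta = 2*P(I_1 = I_2) - 1.

```python
import random


def sample_fgm(bernoulli_sampler, n, d, rng):
    """Draw n points from the d-variate FGM copula induced by a symmetric Bernoulli vector."""
    out = []
    for _ in range(n):
        I = bernoulli_sampler(rng)
        u = []
        for j in range(d):
            a, b = rng.random(), rng.random()
            # I_j = 1 -> min(a, b) (density 2(1 - u)); I_j = 0 -> max(a, b) (density 2u).
            u.append(min(a, b) if I[j] == 1 else max(a, b))
        out.append(u)
    return out


def bivariate_bernoulli(theta):
    """Symmetric bivariate Bernoulli with E[(2I_1 - 1)(2I_2 - 1)] = theta, -1 <= theta <= 1."""
    p_same = (1.0 + theta) / 2.0

    def sampler(rng):
        i1 = 1 if rng.random() < 0.5 else 0
        i2 = i1 if rng.random() < p_same else 1 - i1
        return (i1, i2)

    return sampler
```

For a bivariate FGM copula Spearman's rho equals theta/3, so with theta = 0.9 the sample value of 12*mean(U1*U2) - 3 should be near 0.3.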

    On the semi-varying coefficient dynamic panel data model with autocorrelated errors

    Wei, Honglei; Zhang, Hongfan; Jiang, Hui; Huang, Lei...
    14 pages
    Abstract: In nonlinear time series modeling, autocorrelation of the random errors may cause critical problems in estimation and inference. The situation becomes even worse for panel data with a dynamic structure, yet most of the existing literature does not take this problem into account. The challenge comes from the fact that the expectation of the random errors conditional on the lagged variables is rarely zero. Based on an extension of the Whittle likelihood, a semi-parametric dynamic model with ARMA errors for panel data is proposed. Asymptotic normality is established for the estimators of both the finite-dimensional parameters and the varying coefficients. Simulations show that the proposed method can efficiently remove the estimation bias, and a real data analysis demonstrates that it can improve prediction when the errors are autocorrelated.
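The Whittle likelihood that the proposed model extends can be illustrated in its simplest form: estimating the coefficient phi of a single AR(1) series by minimising sum_j [log f(w_j) + I(w_j) / f(w_j)] over the Fourier frequencies w_j, where I is the periodogram and f(w) = sigma^2 / (2*pi*|1 - phi*exp(-i*w)|^2). This Python sketch profiles out sigma^2 and searches phi on a grid; the panel structure, semi-varying coefficients and general ARMA errors of the paper are not covered, and the function names are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """I(w_j) = |sum_t x_t exp(-i t w_j)|^2 / (2 pi n) at w_j = 2 pi j / n, 0 < w_j < pi."""
    n = len(x)
    freqs = [2.0 * math.pi * j / n for j in range(1, (n - 1) // 2 + 1)]
    values = []
    for w in freqs:
        s = sum(xt * cmath.exp(-1j * t * w) for t, xt in enumerate(x))
        values.append(abs(s) ** 2 / (2.0 * math.pi * n))
    return freqs, values


def whittle_ar1(x, grid_size=199):
    """Grid-search Whittle estimate of phi for a single AR(1) series."""
    freqs, I = periodogram(x)  # excluding w = 0 makes the estimate insensitive to the mean
    m = len(freqs)
    best_phi, best_obj = None, math.inf
    for k in range(1, grid_size + 1):
        phi = -1.0 + 2.0 * k / (grid_size + 1)  # grid strictly inside (-1, 1)
        # Spectral shape g(w); the AR(1) spectral density is f(w) = sigma^2 * g(w).
        g = [1.0 / (2.0 * math.pi * abs(1.0 - phi * cmath.exp(-1j * w)) ** 2) for w in freqs]
        sigma2 = sum(Ij / gj for Ij, gj in zip(I, g)) / m  # profiled-out innovation variance
        # Profiled Whittle objective, dropping the additive constant m.
        obj = m * math.log(sigma2) + sum(math.log(gj) for gj in g)
        if obj < best_obj:
            best_phi, best_obj = phi, obj
    return best_phi
```

Profiling out sigma^2 reduces the search to one dimension; a real implementation would use the FFT for the periodogram and a continuous optimiser instead of a grid.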

    Varying-coefficient hidden Markov models with zero-effect regions

    Liu, Hefei; Song, Xinyuan; Zhang, Baoxue
    17 pages
    Abstract: In psychological, social, behavioral, and medical studies, hidden Markov models (HMMs) have been extensively applied to the simultaneous modeling of longitudinal observations and the underlying dynamic transition process. However, existing HMMs are mainly constant-coefficient models. This study considers a varying-coefficient HMM, which enables simultaneous investigation of dynamic covariate effects and between-state transitions. Moreover, a soft-thresholding operator is introduced to detect zero-effect regions of the coefficient functions. A full Bayesian approach with a hybrid Markov chain Monte Carlo algorithm that combines B-spline approximation and a penalization technique is developed for statistical inference. The empirical performance of the proposed method is evaluated through simulation studies, and an application to the Alzheimer's Disease Neuroimaging Initiative dataset is presented. (c) 2022 Elsevier B.V. All rights reserved.
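The soft-thresholding operator mentioned above is S_lambda(x) = sign(x) * max(|x| - lambda, 0). The Python sketch below applies it pointwise to a coefficient function evaluated on a grid and reports the maximal runs of grid points where the thresholded value is exactly zero, which is how zero-effect regions appear. In the paper the operator sits inside a Bayesian B-spline model fitted by MCMC; the grid, the curve values, lambda and the function names here are hypothetical.

```python
def soft_threshold(x, lam):
    """S_lam(x) = sign(x) * max(|x| - lam, 0)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def zero_effect_regions(grid, beta_values, lam):
    """Maximal runs (t_start, t_end) of grid points where the thresholded coefficient is zero."""
    regions, start, end = [], None, None
    for t, b in zip(grid, beta_values):
        if soft_threshold(b, lam) == 0.0:
            if start is None:
                start = t
            end = t
        elif start is not None:
            regions.append((start, end))
            start = None
    if start is not None:
        regions.append((start, end))
    return regions
```

Unlike hard thresholding, the soft version shrinks surviving values towards zero continuously, so the estimated coefficient function has no jump at the boundary of a zero-effect region.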

    Marginal M-quantile regression for multivariate dependent data

    Merlo, Luca; Petrella, Lea; Salvati, Nicola; Tzavidis, Nikos...
    24 pages
    Abstract: An M-quantile regression model is developed for the analysis of multiple dependent outcomes by introducing the notion of directional M-quantiles for multivariate responses. To incorporate the correlation structure of the data into the estimation framework, a robust marginal M-quantile model is proposed that extends the well-known generalized estimating equations approach to regression M-quantiles with Huber's loss function. Estimation of the model and the asymptotic properties of the estimators are discussed. In addition, M-quantile contours are introduced to describe the dependence between the response variables and to investigate the effect of covariates on the location, spread and shape of the distribution of the responses. To examine their variability, confidence envelopes are constructed via the nonparametric bootstrap. The validity of the proposed methodology is explored through simulation studies and an application to educational data. (C) 2022 Elsevier B.V. All rights reserved.
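To make the role of Huber's loss concrete, the Python sketch below computes a univariate location M-quantile of order q, i.e. the root in theta of sum_i psi_q(y_i - theta) = 0 with psi_q(r) = 2 * psi_c(r) * (q if r > 0 else 1 - q) and psi_c the Huber influence function. It assumes a fixed residual scale of 1 and the common tuning constant c = 1.345; regression covariates, directional multivariate M-quantiles and the GEE-type working correlation of the paper are not reproduced, and the function names are hypothetical.

```python
def huber_psi(r, c=1.345):
    """Huber influence function: r clipped to [-c, c]."""
    return max(-c, min(c, r))


def m_quantile(y, q, c=1.345, tol=1e-10):
    """Location M-quantile of order q, assuming unit residual scale.

    Solves sum_i 2 * psi_c(y_i - theta) * (q if y_i - theta > 0 else 1 - q) = 0 by bisection;
    the score is non-increasing in theta, non-negative at min(y) and non-positive at max(y)."""
    def score(theta):
        s = 0.0
        for yi in y:
            r = yi - theta
            weight = q if r > 0 else 1.0 - q
            s += 2.0 * weight * huber_psi(r, c)
        return s

    lo, hi = min(y), max(y)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid   # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With q = 0.5 the weights cancel the factor 2 and the M-quantile reduces to Huber's M-estimator of location; moving q towards 0 or 1 tilts the weights and shifts the estimate into the corresponding tail.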

    Joint non-parametric estimation of mean and auto-covariances for Gaussian processes

    Serra, Paulo; Rosales, Francisco; Klockmann, Karolina; Krivobokova, Tatyana...
    17 pages
    Abstract: Gaussian processes that can be decomposed into a smooth mean function and a stationary autocorrelated noise process are considered, and a fully automatic nonparametric method for the simultaneous estimation of the mean and auto-covariance functions of such processes is developed. The proposed empirical Bayes approach is data-driven and numerically efficient, and it allows confidence sets to be constructed for the mean function. Performance is demonstrated in simulations and a real data analysis. The method is implemented in the R package eBsc. (c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).