Naive Bayes combined with partial least squares for classification of high dimensional microarray data


Full-text links: NSTL; Elsevier

Abstract: Technological advances make it possible to measure high-dimensional data sets with small sample sizes. For such data, both the consistency of estimates and the classification accuracy are called into question. Partial least squares (PLS) scores have traditionally been coupled with linear discriminant analysis (LDA), which assumes that the PLS scores follow a multivariate normal distribution. For the classification of high-dimensional data sets, we introduce PLS-NB, a classification strategy that combines PLS with a variant of Naive Bayes (NB). Four variants are considered: PLS-NB with standard NB, PLS-NB-G with Gaussian (G) kernel NB, PLS-NB-N with non-parametric (N) kernel NB, and PLS-NB-L with Laplace (L) correction. On gene expression data, they are compared with two reference approaches: PLS coupled with LDA (PLS-LDA) and PLS coupled with sparse LDA (SPLS-LDA). Cross-validation is combined with Monte Carlo simulation to avoid overfitting. The proposed PLS-NB classifiers were calibrated and validated against the reference classifiers. PLS-NB-N outperforms the other classifiers, achieving 89.1% accuracy on test data for embryonal cancer and 92.3% accuracy on test data for prostate cancer. The presented method appears to be a viable contender for high-dimensional data classification; its merits can be investigated further, and it can be applied to a variety of classification problems.
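The PLS-NB idea summarized in the abstract (compress thousands of genes into a few PLS scores, classify those scores with Naive Bayes, and estimate accuracy by Monte Carlo cross-validation) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes a binary 0/1 response used directly as the PLS target, uses the NIPALS PLS1 algorithm, fits a plain Gaussian density to each score (assumed here to be closest to the "standard NB" variant; the kernel and Laplace variants are not shown), fixes the number of components instead of tuning it, and runs on synthetic data. All function names (`pls1_fit`, `pls1_scores`, `gnb_fit`, `pls_nb_fit`, `monte_carlo_cv`, etc.) are hypothetical.

```python
import math
import random


def _mean(values):
    return sum(values) / len(values)


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on a single numeric response y; returns (x_mean, W, P)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = _mean(y)
    Xd = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered copy of X
    yd = [v - y_mean for v in y]
    W, P = [], []
    for _ in range(n_comp):
        w = [sum(Xd[i][j] * yd[i] for i in range(n)) for j in range(p)]  # X'y
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            break  # no covariance with y left to extract
        w = [v / norm for v in w]
        t = [sum(Xd[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(Xd[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yd[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the component just extracted.
        Xd = [[Xd[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        yd = [yd[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(load)
    return x_mean, W, P


def pls1_scores(x, x_mean, W, P):
    """Project one sample by repeating the training deflation sequence."""
    r = [x[j] - x_mean[j] for j in range(len(x))]
    scores = []
    for w, load in zip(W, P):
        t = sum(r[j] * w[j] for j in range(len(r)))
        scores.append(t)
        r = [r[j] - t * load[j] for j in range(len(r))]
    return scores


def gnb_fit(T, labels, var_floor=1e-9):
    """Gaussian NB: per class log prior, mean and ML variance of each score."""
    model, n = {}, len(T)
    for c in set(labels):
        rows = [t for t, lab in zip(T, labels) if lab == c]
        k = len(rows[0])
        mu = [_mean([r[j] for r in rows]) for j in range(k)]
        var = [max(_mean([(r[j] - mu[j]) ** 2 for r in rows]), var_floor) for j in range(k)]
        model[c] = (math.log(len(rows) / n), mu, var)
    return model


def gnb_predict(model, t):
    def log_posterior(c):  # up to a constant shared by all classes
        log_prior, mu, var = model[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
                               for x, m, v in zip(t, mu, var))
    return max(model, key=log_posterior)


def pls_nb_fit(X, labels, n_comp=2):
    y = [float(lab) for lab in labels]  # assumption: binary 0/1 labels as PLS response
    x_mean, W, P = pls1_fit(X, y, n_comp)
    T = [pls1_scores(x, x_mean, W, P) for x in X]
    return x_mean, W, P, gnb_fit(T, labels)


def pls_nb_predict(fitted, x):
    x_mean, W, P, nb = fitted
    return gnb_predict(nb, pls1_scores(x, x_mean, W, P))


def monte_carlo_cv(X, labels, n_comp=2, n_splits=50, test_frac=0.3, seed=0):
    """Mean test accuracy over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    n_test = max(1, int(round(test_frac * len(X))))
    accs = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        fitted = pls_nb_fit([X[i] for i in train], [labels[i] for i in train], n_comp)
        correct = sum(pls_nb_predict(fitted, X[i]) == labels[i] for i in test)
        accs.append(correct / n_test)
    return _mean(accs)


if __name__ == "__main__":
    # Synthetic "p >> n" data: 40 samples, 200 features, class 1 shifted on 10 features.
    rng = random.Random(1)
    labels = [i % 2 for i in range(40)]
    X = [[rng.gauss(0, 1) + (3.0 if (lab == 1 and j < 10) else 0.0) for j in range(200)]
         for lab in labels]
    acc = monte_carlo_cv(X, labels, n_comp=2, n_splits=20)
    print(f"Mean Monte Carlo CV accuracy: {acc:.3f}")
```

Projecting a new sample by repeating the training deflation (as `pls1_scores` does) reproduces the NIPALS training scores, and those training scores are mutually orthogonal. Naive Bayes then only has to model a handful of low-dimensional scores rather than thousands of correlated genes, which makes its class-conditional independence assumption less strained than it would be on the raw features.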

Keywords:

Microarray; High dimensional; Small sample size; Classification; Partial least squares; Naive Bayes; CANCER CLASSIFICATION; PREDICTION; PLS

Authors:

Mehmood, Tahir; Kanwal, Arzoo; Butt, Muhammad Moeen


Affiliations:

National University of Sciences and Technology (NUST)

University of Management and Technology

Publication year:

2022

DOI:

10.1016/j.chemolab.2022.104492

Chemometrics and Intelligent Laboratory Systems

Indexed in: EI, SCI

ISSN: 0169-7439

Year, volume: 2022, 222

Times cited: 1
Number of references: 29