Journal information
Chemometrics and Intelligent Laboratory Systems
Elsevier BV
ISSN: 0169-7439

Indexed in: SCI, ISTP, EI
Status: officially published

    Peak-aware guided filtering for spectrum signal denoising

    Liu, Donghong; He, Chuanjiang
    10 pages
    Abstract: In the analysis of spectrum signals, it is especially important to identify the peaks of the signal. Because of various kinds of noise, spectrum signals must be smoothed in advance while preserving peak information. In this paper, we propose a peak-aware guided filtering (PAGF) method for peak-preserving smoothing of a noisy spectrum signal. The filtering process takes into account the content of a guidance signal, which provides the smooth structure and the peak positions of the input signal. The guidance signal is generated from the input signal by nonlinear diffusion filtering (NDF), since the output of NDF provides a sufficiently smooth structure and accurate peak positions for the input signal. The proposed PAGF is validated by experiments on commonly tested signals, in comparison with previous peak-preserving smoothing methods. The results show that PAGF better preserves the peak height, width, and position of the input spectrum signal while removing the noise.
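
The idea of filtering a noisy signal under the control of a smoother guidance signal can be sketched as follows. This is not the authors' PAGF: it is a generic 1D guided filter whose guidance comes from a simple Perona-Malik-style diffusion, and the function names, the parameters `r`, `eps`, `kappa`, `dt`, and the synthetic test signal are all assumptions made for illustration.

```python
import numpy as np

def box_mean(x, r):
    """Sliding-window mean of radius r with edge-replicated padding."""
    padded = np.pad(x, r, mode="edge")
    kernel = np.full(2 * r + 1, 1.0 / (2 * r + 1))
    return np.convolve(padded, kernel, mode="valid")

def nonlinear_diffusion(x, n_iter=50, kappa=0.1, dt=0.2):
    """Perona-Malik-style diffusion: smoothing slows down where slopes exceed kappa."""
    u = np.asarray(x, dtype=float).copy()
    for _ in range(n_iter):
        grad = np.diff(u)                           # forward differences
        flux = grad / (1.0 + (grad / kappa) ** 2)   # edge-stopping flux
        u[:-1] += dt * flux                         # inflow from the right neighbour
        u[1:] -= dt * flux                          # outflow to the left neighbour
    return u

def guided_filter_1d(p, guide, r=8, eps=1e-3):
    """Fit p ~ a * guide + b in each window, then average the local models."""
    mean_I = box_mean(guide, r)
    mean_p = box_mean(p, r)
    var_I = box_mean(guide * guide, r) - mean_I ** 2
    cov_Ip = box_mean(guide * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return box_mean(a, r) * guide + box_mean(b, r)

# Synthetic single-peak signal (illustrative only, not data from the paper).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 400)
clean = np.exp(-((x - 0.5) / 0.03) ** 2)
noisy = clean + 0.05 * rng.standard_normal(x.size)

guide = nonlinear_diffusion(noisy)        # guidance generated from the input itself
denoised = guided_filter_1d(noisy, guide)
```

In this sketch `eps` sets how strongly flat regions are averaged, while windows with a large guidance variance (the peak flanks) follow the guidance closely.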

    A hybrid feature selection method for predicting lysine malonylation sites in proteins via machine learning

    Rajabiun, Hananeh; MohammadHoseini, Mahdis; Zarezadeh, Hadi; Delkhosh, Mehdi...
    10 pages
    Abstract: Providing efficient methods for analyzing medical data is one of the important needs of modern biological sciences. To this end, this paper introduces a new feature selection method that combines several existing feature selection methods. First, the EAAC, EGAAC, PKa, TF-IDF, TF-CRF, and PSSM algorithms, which are among the well-known methods for feature extraction, are described, and then three proposed models that combine these algorithms are presented. The proposed method has been applied to three lysine malonylation datasets from M. musculus, H. sapiens, and E. coli, and several machine learning methods have been used to classify the data. Finally, to show the efficiency of the proposed method, several important performance parameters have been calculated and compared with those of other feature extraction methods. Furthermore, the results have been compared with those of several well-known articles and are reported in tables and graphs.

    Pattern recognition method from hydrochemical parameters to predict uranium concentrations in groundwater

    Khurelbaatar, Luvsanbat; Batdelger, Ankhnybayar; Khinayat, Tsookhuu; Oyuntsetseg, Bolormaa...
    12 pages
    Abstract: Traditional methods for determining uranium in groundwater are spectroscopic methods that require time, money, and experienced chemists. This study developed a method for predicting uranium concentration from a few hydrochemical parameters that are related to uranium. Uranium, hydrochemical parameters, and trace elements were measured in groundwater around Ulaanbaatar, Mongolia. Inductively coupled plasma mass spectrometry was used to determine the uranium concentration in 135 samples. The relationship between uranium concentration and hydrochemical parameters was studied to determine whether the concentration in groundwater could be predicted by chemometric methods based on some of these parameters. The chemometric methods were implemented in the Python programming language. A pattern recognition method for classifying groundwater samples by a specific threshold uranium concentration uses principal component analysis (PCA) and a support vector machine (SVM). The average accuracy of the classification models was 88.21%. PCA is used to visualize the classification by uranium concentration and to show which hydrochemical parameters are crucial to predicting uranium concentration. A regression model was then developed to predict uranium concentration in groundwater using the hydrochemical parameters selected by the combined PCA-SVM method. The regression method combines polynomial regression and multiple linear regression. This combined regression method has shown good results for predicting uranium concentrations based on the selected hydrochemical parameters.
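
The PCA step mentioned in the abstract (scores for visualization, loadings for judging which parameters matter) can be sketched with NumPy alone. The sketch covers only PCA on autoscaled data, not the SVM classifier or the regression model; the function name `pca_autoscaled` and the synthetic four-column matrix are assumptions, and the data are random stand-ins rather than the study's measurements.

```python
import numpy as np

def pca_autoscaled(X, n_components=2):
    """PCA by SVD on mean-centred, unit-variance columns."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]       # sample coordinates
    loadings = Vt[:n_components].T                        # one column per component
    explained = (s ** 2 / np.sum(s ** 2))[:n_components]  # variance fractions
    return Z, scores, loadings, explained

# Synthetic stand-in: 135 rows, 4 hypothetical parameters.
# Columns 0 and 1 share a common latent factor; columns 2 and 3 are pure noise.
rng = np.random.default_rng(1)
latent = rng.standard_normal((135, 1))
X = np.hstack([
    latent + 0.1 * rng.standard_normal((135, 1)),
    2.0 * latent + 0.1 * rng.standard_normal((135, 1)),
    rng.standard_normal((135, 2)),
])

Z, scores, loadings, explained = pca_autoscaled(X)
```

Columns with large absolute loadings on the first components are the ones that drive the separation seen in a score plot.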

    The development of nano-QSPR models for viscosity of nanofluids using the index of ideality of correlation and the correlation intensity index

    Jafari, Kimia; Fatemi, Mohammad Hossein; Toropova, Alla P.; Toropov, Andrey A....
    9 pages
    Abstract: The use of nanofluids, suspensions of nanoparticles in an ordinary liquid, is a relatively new field that has received great attention in recent years. The main objective of this investigation is to model the effective viscosity of nanofluids using the nano-quantitative structure-property relationship (nano-QSPR) paradigm. Two distinct data sets were considered, containing four types of nanoparticles (Al2O3, CuO, SiO2, and ZnO) dispersed in water (the most common base fluid) at volume fractions of 1-5% and with various shapes (blades, bricks, cylindrical, spherical, and platelets). The Simplified Molecular Input-Line Entry System (SMILES) is a tool for representing molecular structure. Quasi-SMILES is a sequence of symbols that represents all available data, e.g. molecular structure together with physicochemical conditions. Because quasi-SMILES can encode eclectic data such as the size and shape of nanoparticles, this notation was chosen to represent the structure of the nanofluids. Notably, the proposed approach to generating nano-QSPR models includes a comparison between two criteria of predictive potential using the Monte Carlo technique. It was concluded that models developed with the correlation intensity index (CII) are statistically more reliable than models generated with the index of ideality of correlation (IIC).

    Optimization of ultrasound-assisted extraction of phenolic-saponin content from Carthamus caeruleus L. rhizome and predictive model based on support vector regression optimized by dragonfly algorithm

    Moussa, Hamza; Dahmoune, Farid; Hentabli, Mohamed; Remini, Hocine...
    12 pages
    Abstract: A Box-Behnken design and support vector regression optimized with the dragonfly algorithm were employed as chemometric techniques to optimize and predict total phenolic content (TPC) and total saponin content (TSC) from Carthamus caeruleus L. rhizome using ultrasound-assisted extraction. Moreover, a comparative study of the antioxidant activity of the rhizomes and leaves was performed using different assays, including free radical scavenging (ABTS•, DPPH•) activity, FRAP, and phosphomolybdenum assays. The Box-Behnken design was completed, and the optimal conditions for maximum recovery of TPC and TSC were obtained with a methanol concentration of 87.66%, a solvent-to-solid ratio of 23 mL g⁻¹, a temperature of 50 °C, and a sonication time of 26 min. The established SVR-DA model successfully predicted the extraction of TPC and TSC from C. caeruleus L. rhizome with a high R² = 0.99 and low error. A Matlab graphical user interface for the optimized SVR-DA model was developed to predict TPC and TSC and could be used for pharmaceutical purposes. Furthermore, the optimal rhizome extract and the leaf extract showed high antioxidant capacity; thus C. caeruleus L. can be a promising candidate for the cosmetic and pharmaceutical industries.
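
A Box-Behnken design for the four extraction factors named in the abstract can be generated in coded units (-1, 0, +1) as sketched below. The abstract gives neither the factor ranges nor the number of centre points, so the default `n_center=3` and the function name `box_behnken` are assumptions. The pairwise construction used here matches the standard design for 3 to 5 factors only.

```python
from itertools import combinations, product

def box_behnken(n_factors, n_center=3):
    """Coded Box-Behnken runs: every factor pair at +/-1, all other factors at 0."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Replicated centre points (count is an assumption, not from the abstract).
    runs.extend([(0,) * n_factors] * n_center)
    return runs

# Four factors: methanol concentration, solvent-to-solid ratio,
# temperature, sonication time (coded levels only).
design = box_behnken(4)
for run in design[:4]:
    print(run)
```

Each coded level would then be mapped linearly onto the low, middle, and high settings chosen for the corresponding factor.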

    A note on neighborhood first Zagreb energy and its significance as a molecular descriptor

    Mondal, Sourav; Barik, Sasmita; De, Nilanjan; Pal, Anita...
    10 pages
    Abstract: Graph spectra can be viewed as a way of applying linear algebra, in particular the well-developed theory of matrices, to problems in graph theory and its applications. A novel matrix based on the neighborhood degree sum is proposed as a modification of the classical adjacency matrix. Using the spectrum of this matrix, a graph energy and its Estrada index are introduced, and their role as molecular structural descriptors in chemical graph theory is investigated. An algorithm is designed to make the computation of the energy and its Estrada index convenient. The relationship between the newly proposed matrix and its associated graph invariant is studied using the spectral moment. Several sharp bounds for the spectral radius, energy, and Estrada index are computed, and the corresponding extremal graphs are characterized. The integral representation of the energy is also reported.
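
A spectrum-based energy and Estrada index can be computed as sketched below. The abstract does not define the matrix, so the sketch assumes that the entry for adjacent vertices i and j is δ(i) + δ(j), where δ(v) is the sum of the degrees of the neighbours of v, and zero otherwise. That definition and both function names are assumptions. The energy is taken as the sum of absolute eigenvalues and the Estrada index as the sum of their exponentials, which are the usual definitions of those two quantities.

```python
import numpy as np

def neighborhood_degree_sum_matrix(A):
    """Assumed definition: entry (i, j) = delta_i + delta_j if i ~ j, else 0."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    delta = A @ deg                      # sum of the neighbours' degrees
    return A * (delta[:, None] + delta[None, :])

def energy_and_estrada(M):
    """Energy = sum |lambda_i|; Estrada index = sum exp(lambda_i)."""
    eig = np.linalg.eigvalsh(M)          # M is real symmetric
    return float(np.abs(eig).sum()), float(np.exp(eig).sum())

# Path on three vertices (e.g. the carbon skeleton of propane).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
M = neighborhood_degree_sum_matrix(A)
energy, estrada = energy_and_estrada(M)
print(energy, estrada)
```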

    RoBoost-PLS2-R: An extension of RoBoost-PLSR method for multi-response

    Mas-Garcia, Silvia; Bendoula, Ryad; Dardenne, Pierre; Lesnoff, Matthieu...
    9 pages
    Abstract: Recently, a novel robust PLSR method was developed to address the problem of outliers in the data. In this paper, an extension of this method, called RoBoost-PLS2-R, is proposed to predict multi-response variables. The robustness and efficiency of this new approach have been validated on two simulated data sets and one real data set containing different outlier scenarios. Its performance was also compared with that of reference methods (PLS2-R and RSIMPLS) for predicting multi-response variables. The results confirm that RoBoost-PLS2-R greatly reduces prediction errors when the data contain outliers. The prediction performance of RoBoost-PLS2-R is close to that of the optimal model (PLS2-R calibrated without outliers) and to that of the RSIMPLS method. The method appears to be a reliable and competitive robust regression tool for predicting multi-response variables.

    Robust probabilistic principal component regression with switching mixture Gaussian noise for soft sensing

    Sadeghian, Anahita; Jan, Nabil Magbool; Wu, Ouyang; Huang, Biao...
    11 pages
    Abstract: In an era in which data collection is no longer as challenging as it once was, data-driven process modeling for the prediction of unmeasurable or expensive-to-measure variables is gaining popularity. Probabilistic principal component analysis has powerful modeling features, such as accounting for uncertainty and handling high-dimensional process data. Although data collection is more attainable these days, low data quality still diminishes model performance. High-fidelity modeling requires high-quality data. The focus of this work is to deal with outlying observations by developing a Robust Probabilistic Principal Component Regression (RPPCR). Here, we investigate a scenario of mixture Gaussian switching measurement noise to mimic a certain type of outlier, in a forward-looking approach that extends our previous work. A rigorous modeling approach that can handle switching noise, together with the solution methodology, is discussed in detail. Two case studies, a numerical illustrative example and a real industrial counterpart, are considered to verify the robustness of the proposed model.

    An alternative point of view on PLS

    Stocchero, Matteo; De Nardi, Martino; Scarpa, Bruno
    12 pages
    Abstract: PLS has been extensively studied in the past, but several issues about its theoretical foundation are still open. In this study, we derive an alternative formulation of PLS2 based on the gradient descent method. A new algorithm for solving the least squares problem, called "gradient descent" PLS2, is introduced. The "latent variable" language is not used in its formulation. Well-defined elements of linear algebra are used, and the method is developed within the field of non-linear programming rather than statistical modelling. The new algorithm is equivalent to the standard "eigenvalue" PLS2 algorithm. The use of "gradient descent" PLS2 to solve linear regression problems is discussed, and two data sets, one simulated and one real, are investigated to show the behaviour of the algorithm.
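
For orientation, plain steepest descent on the multi-response least squares objective can be sketched as follows. This is not the authors' "gradient descent" PLS2: run to convergence, the iteration below reaches the ordinary least squares solution, and the abstract does not describe how the authors' algorithm differs from it. The function name `steepest_descent_ls`, the stopping rule, and the random data are assumptions.

```python
import numpy as np

def steepest_descent_ls(X, Y, n_iter=500, tol=1e-20):
    """Minimise 0.5 * ||Y - X B||_F^2 by steepest descent with exact line search."""
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        R = Y - X @ B                    # current residuals
        G = X.T @ R                      # negative gradient with respect to B
        XG = X @ G
        denom = np.sum(XG * XG)
        if denom < tol:                  # gradient has (numerically) vanished
            break
        step = np.sum(G * G) / denom     # exact minimiser along direction G
        B += step * G
    return B

# Random two-response regression problem (illustrative only).
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [0.0, 3.0]])
Y = X @ B_true + 0.01 * rng.standard_normal((50, 2))

B_gd = steepest_descent_ls(X, Y)
B_ls = np.linalg.lstsq(X, Y, rcond=None)[0]
```

Stopping the loop after a few iterations gives a coefficient matrix that has not yet reached the least squares fit, which is one way an iterative scheme can act as a regularizer.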

    Naive Bayes combined with partial least squares for classification of high dimensional microarray data

    Mehmood, Tahir; Kanwal, Arzoo; Butt, Muhammad Moeen
    8 pages
    Abstract: Technological advances allow for the measurement of high-dimensional data sets with small sample sizes. When dealing with such high-dimensional data, the consistency of estimates and the classification accuracy are called into question. Partial least squares (PLS) scores have traditionally been coupled with linear discriminant analysis, which requires multivariate normally distributed PLS scores. For the classification of high-dimensional data sets, we introduce PLS-NB, a classification strategy that combines PLS with a variant of Naive Bayes (NB). PLS-NB with standard NB, PLS-NB-G with Gaussian (G) kernel NB, PLS-NB-N with non-parametric (N) kernel NB, and PLS-NB-L with Laplace (L) correction are compared on gene expression data with the reference approaches of PLS coupled with linear discriminant analysis (LDA) and with sparse LDA, namely PLS-LDA and SPLS-LDA, respectively. Cross-validation is used in conjunction with Monte Carlo simulation to avoid over-fitting. The suggested classifier PLS-NB has been validated and calibrated against the reference classifiers. PLS-NB-N performs best in classifying embryonal cancer, with 89.1% accuracy on test data, and in classifying prostate cancer, with 92.3% accuracy on test data. The presented method appears to be a viable contender for high-dimensional data classification; its merits can be investigated further, and it can be applied to a variety of classification problems.