Journal information
Chemometrics and Intelligent Laboratory Systems
Elsevier BV
ISSN: 0169-7439

Indexed in: SCI, ISTP, EI
Officially published
Years covered

    A generalized stability estimator based on inter-intrastability of subsets for high-dimensional feature selection

    Wahid A., Khan D.M., Iqbal N., Janjuhah H.T., ...
    14 pages
    Abstract: © 2021 Elsevier B.V. Feature selection is an important preprocessing step in high-dimensional regression and classification problems because it helps to avoid the effect of noisy, redundant, and irrelevant features on model performance. A variety of feature selection methods have been proposed in the literature. However, small perturbations in the training data may produce highly different feature subsets; this is known as instability. Evaluating the stability of feature selection approaches has grown in importance and popularity in recent years. This paper introduces a novel stability estimator for measuring the internal and external stability of feature subsets chosen by various methods in random subsampling experiments. The proposed estimator evaluates the similarity of features within a selected subset and measures the variation in the number of selected features between the subsets selected in different subsampling experiments. Furthermore, the asymptotic normality of the proposed stability estimator for a large number of subsamples is established. Experiments are carried out on both simulated and real-world datasets, and the results demonstrate the usefulness of the proposed stability estimator.
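
The abstract does not give the formula of the proposed estimator, so the following is only a minimal sketch of the general idea of scoring selection stability across subsampling runs. It uses the mean pairwise Jaccard similarity, a common baseline measure that is swapped in here for illustration and is not the estimator proposed in the paper. The function names and the example subsets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two feature subsets given as index collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty subsets are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets selected in three subsampling runs
runs = [{0, 1, 2}, {0, 1, 3}, {0, 1, 2}]
stability = mean_pairwise_jaccard(runs)
print(round(stability, 4))
```

A value near 1 means the runs selected almost the same features; a value near 0 means the selections barely overlap. Unlike the estimator described in the abstract, this baseline does not separately account for variation in subset size.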

    StackACPred: Prediction of anticancer peptides by integrating optimized multiple feature descriptors with stacked ensemble approach

    Arif M., Ge F., Yu D.-J., Ahmed S., ...
    10 pages
    Abstract: © 2021 Elsevier B.V. Anticancer peptides (ACPs) have emerged as a potentially safe therapeutic agent for treating cancer. Identifying novel ACPs is crucial for gaining deep insight into their functional mechanisms and for vaccine production. Conventional wet-lab methods for finding ACPs are expensive, slow, and resource-intensive. Fast and accurate prediction of ACPs through computational approaches is therefore highly desirable, given the massive number of peptide sequences accumulated in the post-genomic era. Recently, several intelligent statistical approaches have been designed for discriminating ACPs from non-ACPs. Although remarkable progress has been made, the available methods still rely on inadequate feature descriptors and learning algorithms, which restricts their predictive performance. To address this, we develop a novel predictor called StackACPred for the correct identification of ACPs. More specifically, the proposed method uses three feature encoding strategies that capture evolutionary-profile and physicochemical information: segmented position-specific scoring matrix (SegPSSM), pseudo position-specific scoring matrix (PsePSSM), and extended pseudo amino acid composition (PseAAC). The extracted features are serially fused and further optimized through a support vector machine recursive feature elimination and correlation bias reduction (SVM-RFE + CBR) algorithm. The optimal selected attributes are used to build the stacking-based ensemble model for identifying ACPs. The proposed StackACPred attained 84.45% and 86.21% accuracy on the ACP740 and ACP240 datasets, respectively, under 5-fold cross-validation, which is 2.97% and 0.79% higher than existing studies. The empirical outcomes of our automated tool demonstrate its excellent discriminative power for annotating large-scale ACPs in particular and other peptides in general.

    Dehydration as a Tool to improve predictability of sugarcane juice carbohydrates using near-infrared spectroscopy based PLS models

    Cardoso W.J., Gomes J.G.R., Roque J.V., Teofilo R.F., ...
    9 pages
    Abstract: © 2021 Elsevier B.V. The aim of this work was to study dehydration as a way to improve the prediction of sucrose, glucose, and fructose in sugarcane juice using near-infrared (NIR) spectroscopy and partial least squares (PLS) regression models. The temperature, time, and sample volume involved in the dehydration process were optimized using design of experiments. Six different sample supports were assessed, with thick couche paper proving the best support. NIR spectra from liquid sugarcane juice (LSJ) and dehydrated sugarcane juice (DSJ) were obtained. Sucrose, glucose, and fructose in LSJ were analyzed using high-performance liquid chromatography with an evaporative light scattering detector (HPLC-ELSD). Sucrose, glucose, and fructose ranged from 99.29 to 249.27 mg/mL, 5.96 to 14.94 mg/mL, and 3.99 to 16.10 mg/mL, respectively. PLS models were built using the sugar contents and NIR spectra collected from a benchtop and a portable instrument. Ordered predictors selection (OPS) was applied to select the most informative variables. The results indicated better predictions for all sugars using DSJ for both instruments, with the benchtop instrument performing statistically better than the portable one. On the benchtop instrument, the PLS-OPS models gave root mean square errors of prediction (RMSEP) for sucrose, glucose, and fructose of 7.98, 0.82, and 1.00 mg/mL, respectively, using DSJ, against 12.75, 1.00, and 1.35 mg/mL using LSJ. For the portable instrument, the RMSEP values were 15.90, 1.18, and 1.65 mg/mL, respectively, using DSJ, against 23.23, 1.40, and 2.08 mg/mL using LSJ. To sum up, dehydration proved to be an effective technique for improving the predictability of PLS-OPS models for sugarcane juice sugars using NIR spectra, by removing the water and concentrating the analytes.
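
The abstract compares models by their root mean square error of prediction (RMSEP). As a small illustration of how that figure is computed from reference and predicted concentrations, here is a sketch using the standard definition, the square root of the mean squared prediction error. The function name and the sample values are hypothetical and are not data from the paper.

```python
import math


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical reference (e.g. HPLC) and predicted (e.g. PLS) values in mg/mL
reference = [100.0, 150.0, 200.0]
predicted = [103.0, 146.0, 200.0]
print(round(rmsep(reference, predicted), 3))
```

RMSEP is expressed in the same unit as the measured quantity, which is why the abstract can report it directly in mg/mL for each sugar.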

    Gas sensors data analysis system: A user-friendly interface for fast and reliable response-recovery analysis

    de Lima B.S., Silva W.A.S., Mastelaro V.R., Ndiaye A.L., ...
    7 pages
    Abstract: © 2021 Elsevier B.V. Semiconductor-based gas sensors have been commercially available since the early seventies. Over the past decade, the development of nanotechnology and new carbon nanomaterials has further increased both fundamental research and commercial innovation in such materials and devices. Each sensing element is expected to exhibit a signal for a given gas concentration that is described by three parameters: 1) the response or sensitivity, 2) the response time, and 3) the recovery time. A typical calibration or characterization procedure involves exposing several samples or devices simultaneously to different concentrations of a gas of interest. The response is then dynamically measured over time, and these three parameters can be calculated for each exposure cycle. Within this context, we present an open-source graphical user interface (GUI) that aims to facilitate the analysis of dynamic response-recovery curves of resistive semiconductor-based gas sensors. The code was written in Python, and it uses the open-source libraries matplotlib, pandas, NumPy, and SciPy for data visualization, handling, and fitting. PyQt is the library used for the graphical elements because it offers excellent flexibility and compatibility with different operating systems. Our software can analyze eight samples simultaneously that share the same time data, shortening the analysis process to a couple of minutes. Its source code is available on GitHub. This article describes its main features and workflow and presents three data analysis examples whose data tables are available for user testing.
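
To make the three parameters concrete, the sketch below extracts them from a single exposure cycle. It is not taken from the software described in the abstract. It assumes a sensor whose resistance rises under gas exposure, defines the response as the ratio of peak to baseline resistance, and uses a 90% threshold for the response and recovery times, which is a common convention that the abstract does not state. All function names, the `fraction` parameter, and the data are hypothetical.

```python
def first_crossing(times, values, start, target, rising):
    """Return the first time >= start at which the signal reaches target, or None."""
    for t, v in zip(times, values):
        if t < start:
            continue
        if (v >= target) if rising else (v <= target):
            return t
    return None


def analyze_cycle(times, resistance, t_on, t_off, fraction=0.9):
    """Estimate response, response time and recovery time for one exposure cycle."""
    # Baseline: last reading taken at or before the gas is switched on.
    baseline = [r for t, r in zip(times, resistance) if t <= t_on][-1]
    # Peak: largest reading while the gas is on (assumes resistance rises).
    peak = max(r for t, r in zip(times, resistance) if t_on <= t <= t_off)
    delta = peak - baseline
    # Response time: gas-on until `fraction` of the total change is reached.
    t_resp = first_crossing(times, resistance, t_on,
                            baseline + fraction * delta, rising=True)
    # Recovery time: gas-off until `fraction` of the change has been recovered.
    t_rec = first_crossing(times, resistance, t_off,
                           baseline + (1 - fraction) * delta, rising=False)
    return {
        "response": peak / baseline,
        "response_time": None if t_resp is None else t_resp - t_on,
        "recovery_time": None if t_rec is None else t_rec - t_off,
    }


# Hypothetical cycle: gas on at t=1, gas off at t=5 (arbitrary time units)
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
resistance = [100, 100, 150, 195, 200, 200, 150, 120, 105, 101, 100]
result = analyze_cycle(times, resistance, t_on=1, t_off=5)
print(result)
```

This sketch reads the times off the sampled points directly. A real analysis would typically interpolate between samples or fit the curve, which is where a fitting library such as SciPy becomes useful.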

    Prediction of f-CaO content in cement clinker: A novel prediction method based on LightGBM and Bayesian optimization

    Hao X., Zhang Z., Xu Q., Huang G., ...
    11 pages
    Abstract: © 2021 Elsevier B.V. The content of free calcium oxide (f-CaO) in cement clinker is an important index affecting the quality of cement clinker. Because the f-CaO content in cement clinker cannot be measured directly, it is of great significance to predict it accurately and quickly. However, it is difficult to establish an accurate prediction model of f-CaO because of time-varying delay, coupling between variables, and uncertainty in the cement calcination production process. To solve these problems, we propose a Bayesian Optimization Light Gradient Boosting Machine (BO-LightGBM) prediction model for cement clinker f-CaO based on a time series input window. First, a time series input window containing time-varying delay information is designed according to the production process to form a high-dimensional time series data matrix. Then the LightGBM histogram algorithm is used to extract time-varying delay features from the high-dimensional time series matrix. Finally, because LightGBM has multiple hyperparameters, the Bayesian optimization algorithm is used to perform global hyperparameter tuning. The experimental results show that, compared with support vector regression (SVR), back propagation (BP), gradient boosting decision tree (GBDT), eXtreme gradient boosting (XGBoost), and LightGBM, our method BO-LightGBM has better prediction accuracy, robustness, and generalization ability.
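
The first step described above, turning process measurements into a time series input window, can be sketched as follows. The abstract does not specify the window layout, so this is an assumed construction: each training row stacks the last `window` time steps of every process variable, and the target is the value at the final step of the window. The function name, the parameters, and the data are hypothetical.

```python
def build_windows(series, targets, window):
    """Flatten the last `window` time steps of all variables into one feature row.

    series  : list of rows, one per time step, each a list of process variables
    targets : list of target values aligned with `series`
    Returns (X, y); X[i] covers steps i .. i+window-1, y[i] is the target at
    step i+window-1.
    """
    if len(series) != len(targets):
        raise ValueError("series and targets must have the same length")
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    X, y = [], []
    for end in range(window, len(series) + 1):
        block = series[end - window:end]
        X.append([value for row in block for value in row])
        y.append(targets[end - 1])
    return X, y


# Hypothetical data: two process variables over four time steps
series = [[1, 10], [2, 20], [3, 30], [4, 40]]
targets = [0.1, 0.2, 0.3, 0.4]
X, y = build_windows(series, targets, window=2)
print(X)
print(y)
```

Because every lagged value becomes its own column, a tree-based learner fitted on `X` can pick whichever lags carry the delay information, at the cost of a feature count that grows with the window length.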

    BRNS + SSFSM-DTI: A hybrid method for drug-target interaction prediction based on balanced reliable negative samples and semi-supervised feature selection

    Sharifabad M.M., Gharaghani S., Sheikhpour R.
    11 pages
    Abstract: © 2021 Elsevier B.V. De novo drug discovery is a time-consuming and costly process. Drug repositioning, which means finding new applications for existing drugs, is one of the most effective approaches to reduce the time and cost of discovering a new drug. Predicting drug-target interactions (DTIs) can facilitate drug repositioning and consequently accelerate the process of drug discovery. The diversity of drug descriptors and protein features, as well as the lack of access to experimentally confirmed non-interacting drug-target pairs as negative samples, are the major challenges in predicting new DTIs. In this study, we present a modified algorithm for extracting balanced reliable negative samples, named BRNS. Moreover, we propose a semi-supervised feature selection method called SSFSM-DTI and compare the performance of our hybrid predictive model with models constructed by random selection of negative samples and with other well-known feature selection methods on benchmark DTI datasets. The results show that the combination of BRNS and SSFSM-DTI is superior to the random selection of negative samples and to the state-of-the-art feature selection methods in most cases. Hence, it can be used as a guideline in the drug repositioning process. The source code and all datasets used in this study are available at https://github.com/LBDSoft/BRNS.

    A clustering group lasso method for quantification of adulteration in black cumin seed oil using Fourier transform infrared spectroscopy

    Zhu Y., Zou L., Tan T.L.
    11 pages
    Abstract: © 2021 Elsevier B.V. Black cumin seed oil (BCSO) contains a large number of bioactive compounds and thus has many medicinal health benefits and uses. Its high market value leads to frequent adulteration with cheaper edible oils such as grape seed oil and walnut oil. Adulteration is difficult to detect because the adulterant oil has similar physical characteristics and even a similar chemical composition to the authentic oil. The development of an accurate and rapid analytical method using attenuated total reflection Fourier transform infrared (ATR-FTIR) spectroscopy is essential for determining the authenticity of BCSO and quantifying oil adulterants. In this study, BCSO and grape seed oil (GSO) were mixed in various ratios to mimic adulteration. A clustering group lasso method was developed by incorporating both the high correlation structure of spectral variables and the underlying group features into the model. Instead of assuming that groups are known a priori, as ordinary group lasso does, the clustering group lasso infers groups of spectral features from the data and encourages spectral variables within a group to have a shared association with the response. The model using ATR-FTIR spectroscopy proved to be a powerful tool for quantifying BCSO adulteration with high accuracy and can accurately predict the quantity of adulterant at levels as low as 5%. With a substantial reduction in the number of spectral features, the clustering group lasso model shows a simple regression coefficient profile with improved interpretability compared to the ordinary group lasso model and other penalized models. The spectral regions automatically selected for quantification of BCSO adulteration can help interpret the major chemical constituents of BCSO regarding its anti-cancer and anti-inflammatory effects from a chemometric perspective.
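
For context on the ordinary group lasso that the abstract contrasts with, the sketch below evaluates the standard group lasso penalty, in which each group contributes the Euclidean norm of its coefficients weighted by the square root of the group size. The groups are supplied by hand here, whereas the clustering group lasso described in the abstract infers them from the data; that inference step is not shown, and the exact penalty used in the paper may differ. The function name, the `lam` parameter, and the example values are hypothetical.

```python
import math


def group_lasso_penalty(beta, groups, lam=1.0):
    """Standard group lasso penalty: lam * sum_g sqrt(p_g) * ||beta_g||_2."""
    total = 0.0
    for indices in groups:
        # Euclidean norm of the coefficients belonging to this group
        norm = math.sqrt(sum(beta[j] ** 2 for j in indices))
        # Weight by sqrt(group size) so larger groups are not favoured
        total += math.sqrt(len(indices)) * norm
    return lam * total


# Hypothetical coefficients for five spectral variables in three groups
beta = [3.0, 4.0, 0.0, 0.0, 1.0]
groups = [[0, 1], [2, 3], [4]]
penalty = group_lasso_penalty(beta, groups)
print(round(penalty, 4))
```

A group whose coefficients are all zero, like the second one here, adds nothing to the penalty. This is what lets the method drop whole spectral regions at once and yields the sparse, interpretable coefficient profile the abstract mentions.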