Journal information
Chemometrics and Intelligent Laboratory Systems
Elsevier BV
ISSN: 0169-7439

Indexed in: SCI, ISTP, EI
Status: officially published

    Naive Bayes classification model for isotopologue detection in LC-HRMS data

    van Herwerden D., Schoenmakers P.J., Samanipour S., O'Brien J.W., ...
    7 pages
    Abstract: © 2022 The Authors. Isotopologue identification or removal is a necessary step to reduce the number of features that need to be identified in samples analyzed with non-targeted analysis. Currently available approaches rely on either predicted isotopic patterns or an arbitrary mass tolerance, requiring information on the molecular formula or instrumental error, respectively. Therefore, a Naive Bayes isotopologue classification model was developed that does not depend on any thresholds or molecular formula information. This classification model uses the elemental mass defects of six elemental ratios and successfully identified isotopologues for both theoretical isotopic patterns and wastewater influent samples, outperforming one of the most commonly used approaches (i.e., the 1.0033 Da mass difference method - CAMERA). For the theoretical isotopologues, the classification model outperformed an "in-house" mass difference method with a true positive rate (TPr) of 99.0% and false positive rate (FPr) of 1.8% compared to a TPr of 16.2% and an FPr of 0.02%, assuming no error. As for the wastewater influent samples, the classification model, with a TPr of 99.8% and false detection rate (FDr) of 0.5%, again performed better than the mass difference method, with a TPr of 96.3% and FDr of 4.8%. Therefore, it can be concluded that the classification model can be used for isotopologue identification, requiring no thresholds or information on the molecular formula.
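For orientation, the baseline that the abstract compares against (pairing features whose masses differ by 1.0033 Da) can be sketched as below. This is a generic illustration, not CAMERA's implementation or the authors' "in-house" method; the function name, the example masses, and the tolerance `tol` are assumptions chosen only for the sketch.

```python
from itertools import combinations

C13_SHIFT = 1.0033  # Da, the mass difference quoted in the abstract


def find_isotopologue_pairs(masses, tol=0.005):
    """Return index pairs (lighter, heavier) whose mass difference
    lies within `tol` Da of 1.0033 Da.

    `tol` is a hypothetical tolerance, not a value from the paper.
    """
    pairs = []
    for i, j in combinations(range(len(masses)), 2):
        diff = abs(masses[j] - masses[i])
        if abs(diff - C13_SHIFT) <= tol:
            lighter, heavier = (i, j) if masses[i] < masses[j] else (j, i)
            pairs.append((lighter, heavier))
    return pairs


# Made-up feature masses (Da): only the first two differ by about 1.0033 Da.
features = [180.0634, 181.0667, 195.0877, 250.1200]
print(find_isotopologue_pairs(features))
```

The dependence on `tol` is exactly the kind of arbitrary threshold that the abstract says the Naive Bayes model avoids.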

    NMVI: A data-splitting based imputation technique for distinct types of missing data

    Bhagat H.V., Singh M.
    15 pages
    Abstract: © 2022 Elsevier B.V. In the IoT world, where small digital devices are used to sense data, a failure in such devices results in immense information loss, and insufficient information in datasets results in inappropriate decisions. Missing values within a dataset have an adverse effect on data analysis. Data analysts therefore perform data imputation in the pre-processing phase before analyzing the dataset. Traditional methods based on simple techniques (mean, case deletion, mode or median) perform poorly when estimating the missing values. In this paper, a novel splitting-based technique, Nullify the Missing Values before Imputation (NMVI), is proposed, in which the data is first split into complete and incomplete subsets and then an upper limit is set for every class having missing data, which helps the model to estimate missing values closer to the exact values. The proposed NMVI technique overcomes the constraint of existing imputation techniques that are completely dependent on complete variables within a class to estimate the missing values. The proposed NMVI technique has comparatively low computational time, which makes it beneficial for real-time problems. The experimental results show that the proposed NMVI technique estimates the missing values efficiently with respect to RMSE, adjusted coefficient of determination, accuracy and correlation coefficient, irrespective of the dimensionality and the missing rate within a dataset.

    Untargeted metabolomics based on nuclear magnetic resonance spectroscopy and multivariate classification techniques for identifying metabolites associated with breast cancer patients

    Esmaeili P., Khalilvand M., Tavakolizadeh H., Parastar H., ...
    8 pages
    Abstract: © 2022 Elsevier B.V. In this study, multivariate classification techniques combined with proton nuclear magnetic resonance (¹H NMR) spectroscopy are proposed to identify breast cancer biomarkers that can precisely distinguish between healthy controls and breast cancer (BC) patients. In this regard, the metabolite extraction procedure was first optimized using a Box-Behnken design (BBD). Then, a data-driven soft independent modeling of class analogy (DD-SIMCA) model and partial least squares-discriminant analysis (PLS-DA) were successfully utilized for separating the healthy and BC patient classes. Both the DD-SIMCA and PLS-DA models could successfully distinguish the healthy class from the BC class, with model sensitivity and specificity of 100%. The variable importance in projection (VIP) method was implemented to detect significant metabolites. Based on significant variables, 13 significant metabolites (e.g., lactic acid, cysteine, and glucose) were detected as the influential factors for this discrimination. Also, a heat map revealing the trend in metabolite levels was depicted and altered metabolic pathways were detected.

    A tutorial on automatic hyperparameter tuning of deep spectral modelling for regression and classification tasks

    Passos D., Mishra P.
    13 pages
    Abstract: © 2022 The Authors. Deep spectral modelling for regression and classification is gaining popularity in the chemometrics domain. A major topic in the deep learning (DL) modelling of spectral data is the choice and optimization of the deep neural network architecture suitable for the specific task of spectral modelling. Although there are several recent research articles already available in the chemometric domain showing advanced approaches to deep spectral modelling, currently, there is a lack of hands-on tutorial articles in this space that supply the non-expert user with practical tools to learn and implement advanced DL optimization methodologies aimed at spectral data. Hence, this tutorial article aims at reducing the gap between the non-expert user of DL in the chemometric community and the implementation of DL models for daily usage. This tutorial supplies a quick introduction to state-of-the-art deep spectral modelling and related DL concepts and presents a set of methodologies aimed at DL hyperparameter optimization. To this end, this tutorial shows two practical examples of how to implement and optimize two DL models for spectral regression and classification tasks. The models are implemented in Python and TensorFlow and the complete code is supplied in the form of two complementary notebooks.

    Avoiding misleading predictions in fluorescence-based soft sensors using autoencoders

    Ranzan L., Trierweiler L.F., Trierweiler J.O., Hitzmann B., ...
    9 pages
    Abstract: © 2022 Elsevier B.V. Excitation-emission matrix (EEM) fluorescence data can be explored by using deep convolutional neural networks to enhance the predictive performance of bioprocess variables. This article proposes the use of residual neural networks (ResNet) for the prediction of ethanol, glucose, and biomass concentrations of S. cerevisiae cultivations based on fluorescence data collected in situ. A trust screening for unknown samples, based on autoencoder (AE) reconstruction error, is also proposed. Its characteristic of reconstructing the inputs is the key feature to avoid misleading predictions and forecast if a new sample should be trusted as usual or flagged as abnormal. The 83-layer-deep ResNet successfully predicted the desired outputs, with R² higher than 0.98, in the test subset. The best-fitted autoencoder had a 3-layer architecture, with three neurons in the bottleneck and using rectified linear unit (ReLU) activation for the encoder and linear activation for the decoder. The mean reconstruction Root Mean Square Error (RMSE) for the fermentation's EEMs was 4.61 rel. fluorescence intensity units, representing an error smaller than 1% (of the total amplitude of change). To evaluate the AE capability to work as trust screening, random fluorescence intensity was added to the Ex450/Em530 fluorescence pair (related to flavins) in some samples, creating a defective dataset. The dataset was evaluated with the trained AE and the ResNet model to compare reconstruction errors and bioprocess concentrations. The AE was able to identify the samples with added errors, and, as expected, the defective samples also presented higher predictive errors in general. The higher the AE's reconstruction RMSE, the less the new sample should be trusted to avoid misleading predictions.
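The trust-screening idea in this abstract (flag a sample when its reconstruction RMSE is high) can be illustrated with a minimal sketch. The trained autoencoder is not available here, so `project_onto_reference` is a hypothetical stand-in that reconstructs a sample as a scaled copy of one reference profile (a linear, one-component bottleneck). The reference profile, sample values, and `threshold` are assumptions for illustration only.

```python
import math


def reconstruction_rmse(x, x_hat):
    """Root mean square error between a sample and its reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x))


def project_onto_reference(x, ref):
    """Hypothetical stand-in for an autoencoder: reconstruct x as c * ref,
    where c is the least-squares scaling factor."""
    c = sum(a * r for a, r in zip(x, ref)) / sum(r * r for r in ref)
    return [c * r for r in ref]


def screen_samples(samples, ref, threshold):
    """Return (rmse, flagged) per sample; flagged means 'do not trust'."""
    results = []
    for x in samples:
        rmse = reconstruction_rmse(x, project_onto_reference(x, ref))
        results.append((rmse, rmse > threshold))
    return results


ref = [1.0, 2.0, 3.0, 2.0, 1.0]          # assumed "normal" profile shape
normal = [2.0, 4.0, 6.0, 4.0, 2.0]       # same shape, different scale
defective = [2.0, 4.0, 6.0, 9.0, 2.0]    # one channel perturbed
for rmse, flagged in screen_samples([normal, defective], ref, threshold=0.5):
    print(f"rmse={rmse:.3f} flagged={flagged}")
```

In practice the threshold would be derived from reconstruction errors on trusted data; the value above is arbitrary.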

    Quality-related fault monitoring for multi-phase batch process based on multiway weighted elastic network

    Yao H., Zhao X., Li W., Hui Y., ...
    15 pages
    Abstract: © 2022 Elsevier B.V. A quality-related fault monitoring method for multi-phase batch processes based on a multiway weighted elastic network is proposed in this paper. Firstly, to make the phase division for the batch process more accurate, an improved affinity propagation clustering algorithm is developed. Secondly, a multiway weighted elastic network model is developed in each phase. On the one hand, a quality-related subspace and a quality-unrelated subspace are constructed in each phase to achieve dual monitoring of process faults and quality anomalies. On the other hand, kernel density estimation is used to measure the contribution of each element in each subspace to the fault. According to the difference in the contribution of each element to the fault, different weights are assigned to enhance the fault features and eliminate irrelevant features such as noise. Finally, support vector data description is used to establish monitoring indexes in both the quality-related subspace and the quality-unrelated subspace. Compared with traditional methods, the superiority and effectiveness of the proposed method have been verified by monitoring the penicillin fermentation process and the hot strip mill process.

    Multi-resolution transmission image registration based on “Terrace Compression Method” and normalized mutual information

    Li G., Ye Y., Yang Y., Ma S., ...
    10 pages
    Abstract: © 2022 Elsevier B.V. Multispectral transmission images provide the possibility for the early diagnosis of breast cancer, and promote the study of family self-screening of breast tumors. However, the scattering characteristic of biological tissue leads to transmission images with fuzzy boundaries and poor contrast. In addition, there are different offsets in the captured image sequences because of the instability of the human body, such as breathing and slight jitter, which affect the data precision. In view of the above problems, a multi-resolution image registration method combining the "Terrace Compression Method" and normalized mutual information (NMI) is proposed, and the effectiveness of the method is verified by taking the transmission image at a wavelength of 435 nm as an example. First of all, the gray-scale values of the selected boundary region of heterogeneity are sorted. According to the "Terrace Gradient" of gray-scale values, multiple gray-scale intervals are divided to enhance the gradient information of the image. Then, the image of the terraced shape of each gray-scale interval is extracted respectively, and the edge detection result is obtained by the Sobel operator. Finally, the Gaussian pyramid model is used for image down-sampling to achieve multi-resolution image registration by combining with the NMI similarity measure. Compared with the other registration methods, the method proposed in this paper can detect the transformation relationship between images more accurately and effectively, and the similarity of the registered image with the reference image is also higher. The combination of the "Terrace Compression Method" and NMI effectively improves the registration accuracy of transmission images, and provides favorable conditions for achieving heterogeneity detection in multispectral transmission images.
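As a reminder of what an NMI similarity measure computes, here is a minimal sketch using one common definition, NMI = (H(A) + H(B)) / H(A, B), over discrete gray levels. Whether the paper uses this exact variant is an assumption, and the flat-list image representation and function names are illustrative only.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (natural log) of a discrete distribution given counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_mutual_information(img_a, img_b):
    """NMI = (H(A) + H(B)) / H(A, B) for two equally sized images
    given as flat lists of discrete gray levels."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of pixels")
    n = len(img_a)
    h_a = entropy(Counter(img_a).values(), n)
    h_b = entropy(Counter(img_b).values(), n)
    h_ab = entropy(Counter(zip(img_a, img_b)).values(), n)
    if h_ab == 0.0:
        raise ValueError("NMI is undefined for two constant images")
    return (h_a + h_b) / h_ab


reference = [0, 0, 1, 1, 2, 2, 3, 3]
print(normalized_mutual_information(reference, reference))        # identical images
print(normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent gray levels
```

With this definition the score ranges from 1 (statistically independent gray levels) to 2 (one image fully determines the other), so a registration search would keep the transformation that maximizes it.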

    Resolution enhancement of angular plasmonic biochemical sensors via optimizing centroid algorithm

    Wang G., Shi J., Zhang Q., Wang R., ...
    10 pages
    Abstract: © 2022 Elsevier B.V. In this paper, a simple removable post-processing method was employed to improve the pixel-angle resolutions for different surface plasmon resonance (SPR) biochemical sensors. Compared with the conventional centroid algorithm, the dip span and height proportion of our enhanced algorithm were optimized simultaneously. To improve sensing stability when the resonance angle shifts, the calculation region varies with the resonance angle. In addition, an FFT algorithm and an adjacent-averaging smoothing algorithm were introduced into the proposed algorithm. It was found that the FFT algorithm could reduce the standard deviation of the resonance angle for the dynamic baseline algorithm. The adjacent-averaging smoothing algorithm with an average number of 10 points could also reduce the standard deviation of the resonance angle. The refractive index (RI) resolution could reach 1.55 × 10⁻⁶ RIU, 5.52 × 10⁻⁷ RIU and 8.47 × 10⁻⁷ RIU for the conventional Au-based SPR sensor and the MSF- and PAA-based plasmonic waveguide resonance (PWR) sensors, respectively. In biochemical sensing experiments, the limit of detection (LOD) of Zn²⁺ ions for the MSF-based PWR chemical sensor reached 14.86 nM. The LOD of BSA protein reached 5.68 pM for the PAA-based PWR biosensor. Further, an Ag-based SPR biosensor modified with rabbit IgG was used to detect goat anti-rabbit IgG, and an LOD of 9.93 pM was achieved.
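For context, a basic centroid estimate of a resonance dip can be sketched as follows: every point below a baseline threshold contributes its angle, weighted by how far its intensity lies below the threshold. This is a generic textbook-style version, not the authors' optimized algorithm; the function name, data, and `threshold` are assumptions.

```python
def dip_centroid(angles, intensities, threshold):
    """Weighted centroid of a reflectance dip.

    Only points with intensity below `threshold` contribute, each
    weighted by its depth (threshold - intensity).
    """
    num = 0.0
    den = 0.0
    for theta, inten in zip(angles, intensities):
        depth = threshold - inten
        if depth > 0:
            num += theta * depth
            den += depth
    if den == 0.0:
        raise ValueError("no points below the threshold")
    return num / den


# Made-up symmetric dip centered at 62 degrees.
angles = [60.0, 61.0, 62.0, 63.0, 64.0]
intensities = [1.0, 0.6, 0.2, 0.6, 1.0]
print(dip_centroid(angles, intensities, threshold=0.8))
```

Because the estimate averages over several pixels, it can resolve angle shifts smaller than the pixel spacing. The choice of threshold sets the width of the region used, which corresponds to the dip span and height proportion that the abstract says were optimized.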

    Soft variable selection combining partial least squares and attention mechanism for multivariable calibration

    Xiong Y., Yang W., Xu Z., Du Y., ...
    8 pages
    Abstract: © 2022 Elsevier B.V. Partial least squares (PLS) is a widely used algorithm for building a linear model between chemical properties and multiple variables. Due to the abundant features and relatively few calibration samples, variable selection is usually adopted to eliminate uninformative variables and restrain overfitting. In this study, a new variable selection method, called 'Attention-PLS', was proposed, combining PLS with the attention mechanism in a neural network. The attention mechanism tries to find a new combination of the variables, and owing to the property of the softmax function, only a few variables' weights are dominant in the new combination's weights. Attention-PLS is a soft way of variable selection, as it does not absolutely eliminate the influence of the unimportant variables but enlarges the differences between variables' weights by using the softmax function to normalize the weights. Attention-PLS is compared with some common methods such as ordinary partial least squares, Least Absolute Shrinkage and Selection Operator (LASSO), Ridge Regression (RR), Monte Carlo based uninformative variable elimination (MC-UVE), and Sparse Partial Least Squares (SPLS), which are applied to three near-infrared (NIR) spectral datasets. The results show that the proposed method has better prediction performance.
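The "soft" selection effect attributed to the softmax function can be seen in a small sketch: normalizing raw scores with softmax gives weights that sum to 1, stay strictly positive, and concentrate on the few largest scores. The scores below are made up, and this shows only the normalization step, not the Attention-PLS model.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw attention scores for five spectral variables.
scores = [0.1, 0.2, 3.0, 0.1, 2.5]
weights = softmax(scores)
for i, w in enumerate(weights):
    print(f"variable {i}: weight {w:.3f}")
```

No variable receives a weight of exactly zero, which is the sense in which the selection is soft rather than a hard elimination.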

    Fast partial quantile regression

    Lillo R.E., Mendez-Civieta A., Aguilera-Morillo M.C.
    8 pages
    Abstract: © 2022 Elsevier B.V. Partial least squares (PLS) is a dimensionality reduction technique used as an alternative to ordinary least squares (OLS) in situations where the data is collinear or high dimensional. Both PLS and OLS provide mean-based estimates, which are extremely sensitive to the presence of outliers or heavy-tailed distributions. In contrast, quantile regression is an alternative to OLS that computes robust quantile-based estimates. In this work, the multivariate PLS is extended to the quantile regression framework, obtaining a theoretical formulation of the problem and a robust dimensionality reduction technique that we call fast partial quantile regression (fPQR), which provides quantile-based estimates. An efficient implementation of fPQR is also derived, and its performance is studied through simulation experiments and the biscuit dough dataset, a real high-dimensional example that is well known in chemometrics.
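The robustness contrast between mean-based and quantile-based estimates can be made concrete with the pinball (check) loss that quantile regression minimizes. The sketch below fits only a constant, not a regression or fPQR, and the data are made up. It shows that the constant minimizing the pinball loss at τ = 0.5 is the median, which is unaffected by the outlier that drags the mean upward.

```python
def pinball_loss(y, q, tau):
    """Mean pinball (check) loss of predicting the constant q for data y."""
    total = 0.0
    for v in y:
        if v >= q:
            total += tau * (v - q)
        else:
            total += (1.0 - tau) * (q - v)
    return total / len(y)


y = [1.0, 2.0, 3.0, 4.0, 100.0]  # one large outlier

# Search the observed values for the constant minimizing the loss at tau = 0.5.
best_q = min(y, key=lambda q: pinball_loss(y, q, tau=0.5))
mean = sum(y) / len(y)

print("pinball-optimal constant (tau=0.5):", best_q)
print("mean:", mean)
```

Changing `tau` moves the minimizer to other quantiles, which is how quantile regression targets parts of the distribution other than its center.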