Abstract: © 2021. In many industries, researchers conduct experiments in which the experimental factors affect an ordinal outcome with three or more categories. A popular model for ordinal outcome variables is the cumulative logit model, also known as the proportional odds model. In this article, we explore locally and Bayesian D- and I-optimal experimental designs for the cumulative logit model. We perform an instructive sensitivity study to learn how D- and I-optimal designs depend on the values of the model parameters and on the number of outcome categories, and we use a polypropylene experiment as a proof-of-concept example.
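As a rough illustration of the cumulative logit (proportional odds) model named above, the sketch below computes category probabilities for one experimental run. The parameterization, the function name `cumulative_logit_probs`, and all numeric values are assumptions made for illustration; they are not taken from the article.

```python
import math

def cumulative_logit_probs(x, beta, thresholds):
    """Category probabilities under a proportional odds model.

    Assumed parameterization (not taken from the abstract):
        logit P(Y <= j | x) = thresholds[j] - sum_k beta[k] * x[k]
    with strictly increasing thresholds, so J = len(thresholds) + 1 categories.
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Cumulative probabilities P(Y <= j) for j = 1..J-1, then P(Y <= J) = 1.
    cumulative = [1.0 / (1.0 + math.exp(-(a - eta))) for a in thresholds]
    cumulative.append(1.0)
    # Differences of consecutive cumulative probabilities give P(Y = j).
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical factor settings and parameter values for a four-category outcome.
probs = cumulative_logit_probs(x=[0.5, -1.0], beta=[1.2, 0.4], thresholds=[-1.0, 0.0, 1.5])
print(probs)
```

The same slope vector is shared by every cumulative logit, which is what makes the odds proportional across categories; only the thresholds differ.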
Abstract: © 2021 Elsevier B.V. Chemometrics experiments, like most computer experiments, often face the difficulty that the true model is too complicated, so an approximate model must be explored instead. Space-filling designs, including uniform designs, are robust in such situations. When the underlying regression model is known, however, the D-optimal design (DOD) is the most effective for parameter estimation, but it is not robust to changes in the model. In this paper, we propose a new type of composite design that guarantees both robustness and effectiveness. We then compare the prediction performance of seven candidate composite designs across various case studies. For convenience of implementation, we use computer experiments rather than chemical experiments as illustrations, involving some popular models that have been widely used to evaluate the performance of optimization algorithms. Among all candidate composite designs, we recommend two based on their superior performance in all the cases we explored. Our recommendation is also suitable for most physical experiments.
Abstract: © 2021. In this paper, chemometrics and spectrophotometry methods that require no separation steps were used for the simultaneous measurement of Metformin (MET) and Pioglitazone (PIO) in the antidiabetic tablet Actoplus MET. The chemometrics methods applied were artificial neural network (ANN), partial least squares regression (PLS), and principal component regression (PCR). The ANN, consisting of two, four, and six layers with 2, 4, 6, 8, and 10 neurons, was trained using feed-forward back-propagation (FFBP) learning. The training algorithms were Levenberg-Marquardt (LM) and gradient descent with momentum and adaptive learning rate back-propagation (GDX). The mean square error (MSE) of the LM algorithm was 3.18 × 10^-30 and 1.58 × 10^-30 for MET and PIO, respectively, indicating that the LM algorithm performed better than the GDX algorithm. The PLS method gave a lower root mean square error (RMSE) (MET = 0.0558, PIO = 0.3981) than the PCR method (MET = 0.0559, PIO = 0.4048), indicating better performance. Finally, the results of the proposed methods were compared with those of high-performance liquid chromatography (HPLC), used as a reference approach, by a one-way ANOVA test at the 95% confidence level, which showed no significant difference between the data.
Abstract: © 2021 Elsevier B.V. Lectins are a diverse class of glycoproteins that play an important part in tumor discrimination because of their significant binding affinity for different types of saccharide (carbohydrate) groups of proteins. Cancerlectins are lectins that are closely associated with specific kinds of proteins that initiate cancer cell survival, growth, metastasis, and spread. Differentiating proteins by function remains a difficult task in the post-genomic era, and the study of protein-specific function differentiation plays an important role in therapeutic cancer studies. Laboratory-based methods have been proposed for identifying cancerlectins, but these approaches are expensive and time-consuming. Numerous computational sequence-based approaches have been developed to separate cancerlectins from non-cancerlectins. In this study, we design a fast deep learning model for discriminating cancerlectins from non-cancerlectins using sequence-based feature descriptors. The proposed model extracts intrinsic features with Conjoint Triad (CT), Pseudo Amino Acid Composition (PseAAC), and Position Specific Scoring Matrix (PSSM). The feature vectors of these descriptors were concatenated, and the best features were selected by Random Forest-Sequential Feature Selection (RF-SFS). Model training and prediction were performed with Decision Tree (DT), Random Forest Classifier (RFC), Support Vector Machine (SVM), and Deep Neural Network (DNN). The DNN showed the best performance, achieving 89.40% accuracy, 80.84% sensitivity, and 94.62% specificity. These experimental results show the robustness of the proposed approach, which outperforms the existing methods in the literature. We believe that the proposed strategy will be a helpful tool for cancer therapeutics research, drug design, and academic research.
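For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are computed from binary predictions. The function name `classification_metrics` and the toy labels are hypothetical; they do not reproduce the study's data or results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true positive rate), and specificity
    (true negative rate) for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Toy labels (1 = cancerlectin, 0 = non-cancerlectin), made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

When the negative class is larger than the positive class, as in the toy labels, accuracy is pulled toward specificity, which is one reason all three figures are usually reported together.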
Abstract: © 2021 Elsevier B.V. Visible and near-infrared (VIS-NIR) spectroscopy is increasingly being transferred from the laboratory to industry for in-line and portable applications in various domains. With intensive use of VIS-NIR spectroscopy, some abnormal observations inevitably arise, so outliers must be handled properly to build effective prediction models. The objective of this study is to investigate the potential of a robust method called RoBoost-PLSR to improve prediction model performance for a viticulture application. The work focuses on a case study predicting sugar content in grape berries of three varieties of Vitis vinifera in a maturity monitoring context. Hyperspectral images were acquired of grape berries of the Syrah, Fer-Servadou, and Mauzac varieties. Reference measurements of sugar levels were made in the laboratory using densimetric baths. The performance of RoBoost-PLSR models was compared with that of reference models built with Partial Least Squares Regression (PLSR). The reference PLSR models gave the following prediction criteria: Syrah (R²p = 0.971; RMSEp = 5.36 g/L), Fer-Servadou (R²p = 0.788; RMSEp = 11.69 g/L), and Mauzac (R²p = 0.690; RMSEp = 15.61 g/L). Prediction quality improved with RoBoost-PLSR: Syrah (R²p = 0.990; RMSEp = 3.14 g/L), Fer-Servadou (R²p = 0.848; RMSEp = 10.20 g/L), and Mauzac (R²p = 0.927; RMSEp = 7.58 g/L). The results confirm that the RoBoost-PLSR method handles outliers within the calibration set better.
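The two prediction criteria quoted above can be computed as in the short sketch below. It assumes that R²p is the coefficient of determination on the prediction set (some authors instead report a squared correlation), and the sugar values are invented for illustration rather than taken from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error of prediction, in the units of y."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference and predicted sugar contents in g/L.
reference = [180.0, 200.0, 220.0]
predicted = [182.0, 199.0, 219.0]
print(rmse(reference, predicted), r_squared(reference, predicted))
```

Because both criteria are built from squared residuals, a few outlying samples can dominate them, which is the motivation for robust variants of PLSR.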
Abstract: © 2021 Elsevier B.V. Partial least squares (PLS) regression is a linear regression technique that plays an important role in dealing with high-dimensional regressors. Unfortunately, PLS is sensitive to outliers in datasets and consequently produces a corrupted model. In this paper, we propose a robust method for PLS based on the idea of least trimmed squares (LTS), in which the objective is to minimize the sum of the smallest h squared residuals. However, solving an LTS problem is generally NP-hard. Inspired by the complementary idea of Sim and Hartley, we solve the inverse of the LTS problem instead and formulate it as a concave maximization problem, which is a convex optimization problem and can be solved in polynomial time. Classic PLS and two of the most efficient robust PLS methods, Partial Robust M (PRM) regression and RSIMPLS, are compared in this study. Results on both simulated and real data sets show the effectiveness and robustness of our approach.
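The LTS objective described above, the sum of the smallest h squared residuals, can be written in a few lines. This sketch only evaluates the objective for a given residual vector; it does not implement the paper's concave maximization reformulation, and the residual values are made up.

```python
def lts_objective(residuals, h):
    """Sum of the h smallest squared residuals (least trimmed squares)."""
    squared = sorted(r * r for r in residuals)
    return sum(squared[:h])

# Hypothetical residuals with one gross outlier (5.0).
residuals = [0.1, -0.2, 5.0, 0.3]
trimmed = lts_objective(residuals, h=3)                   # outlier is trimmed away
ordinary = lts_objective(residuals, h=len(residuals))     # ordinary sum of squares
print(trimmed, ordinary)
```

With h equal to the sample size the objective reduces to ordinary least squares; choosing a smaller h lets the fit ignore the largest residuals, at the price of a combinatorial search over which observations to keep.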
Abstract: © 2021 Elsevier B.V. The least absolute shrinkage and selection operator (LASSO) is an established sparse representation approach for variable selection, and its performance relies on finding a good value for the regularization parameter, typically through cross-validation. However, cross-validation is computationally intensive and requires a properly chosen search range and step size. In the present study, the ridge-adding homotopy (RAH) algorithm is applied with LASSO to overcome these shortcomings. The homotopy algorithm can fit the entire solution path of the LASSO problem by tracking the Karush-Kuhn-Tucker (KKT) conditions, and it yields a finite number of candidate regularization parameters. To handle singularities, an M×1 random ridge vector is added to the KKT conditions, which ensures that only one element is added to or removed from the active set at a time. Finally, the optimal regularization parameter is selected by traversing the candidate parameters and evaluating the resulting models with modelling and evaluation metrics. The selected variables are the nonzero elements of the sparse regression coefficient vector obtained with the optimal regularization parameter. The proposed method has been demonstrated on three near-infrared (NIR) datasets with regard to wavelength selection and calibration. The results suggest that "RAH-LASSO + PLS" outperforms "LASSO + PLS" and "full-wavelength PLS" in most cases. Importantly, the RAH method provides a systematic procedure, as opposed to trial and error, for determining the regularization parameter in LASSO.
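To make the KKT conditions mentioned above concrete, the sketch below checks whether a coefficient vector satisfies them for a given regularization parameter. It assumes the objective 0.5 * ||y - X b||^2 + lam * ||b||_1, which may be scaled differently in the paper, and it does not implement the ridge-adding homotopy algorithm itself; the function name and data are hypothetical.

```python
def lasso_kkt_satisfied(X, y, beta, lam, tol=1e-8):
    """Check the LASSO optimality (KKT) conditions for the assumed objective
    0.5 * ||y - X beta||^2 + lam * ||beta||_1.

    Active variables (beta_j != 0):  x_j' r = lam * sign(beta_j)
    Inactive variables (beta_j == 0): |x_j' r| <= lam
    where r = y - X beta is the residual vector.
    """
    n, p = len(y), len(beta)
    residual = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for j in range(p):
        corr = sum(X[i][j] * residual[i] for i in range(n))
        if beta[j] != 0.0:
            sign = 1.0 if beta[j] > 0 else -1.0
            if abs(corr - lam * sign) > tol:
                return False
        elif abs(corr) > lam + tol:
            return False
    return True

# Toy orthonormal design, where the LASSO solution is soft-thresholding of y.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5]
print(lasso_kkt_satisfied(X, y, beta=[2.0, 0.0], lam=1.0))   # soft-thresholded solution
print(lasso_kkt_satisfied(X, y, beta=[3.0, 0.5], lam=1.0))   # unpenalized least squares fit
```

In this toy case the selected variable is the first one, the only nonzero coefficient, which mirrors how the selected wavelengths are read off the sparse coefficient vector.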
Abstract: © 2021 Elsevier B.V. A novel discriminant analysis (DA) method is proposed, based on robust reweighted shrinkage estimators and a robust Mahalanobis distance with an adjusted quantile as threshold. A simulation study evaluates the performance of the proposed approach in comparison with classical DA and other robust alternatives from the literature. The approach is also illustrated on real datasets: a geochemical and environmental dataset known as the Kola Project, and a second dataset containing the spectra of different cultivars of a fruit. The results show that the method is appropriate and, at the same time, computationally efficient. Additional simulations show its further benefits for outlier detection.
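The idea of thresholding a Mahalanobis distance at a quantile can be sketched for two variables as follows. This is a simplified stand-in: the location and scatter are supplied directly rather than estimated with the paper's reweighted shrinkage estimators, and the cutoff is the plain chi-square quantile rather than the adjusted quantile; all names and values are hypothetical.

```python
import math

def mahalanobis_sq_2d(point, center, cov):
    """Squared Mahalanobis distance for two variables, using the closed-form
    inverse of the 2x2 scatter matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det

def chi2_quantile_2df(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom,
    whose CDF is 1 - exp(-x / 2)."""
    return -2.0 * math.log(1.0 - prob)

def flag_outliers(points, center, cov, prob=0.975):
    """Flag points whose squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_quantile_2df(prob)
    return [mahalanobis_sq_2d(p, center, cov) > cutoff for p in points]

# Hypothetical location and scatter; a robust method would estimate these
# so that the outliers themselves do not distort them.
center = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 1.0))
flags = flag_outliers([(0.5, 0.5), (3.0, 3.0)], center, cov)
print(flags)
```

The closed-form quantile only holds for two variables; in higher dimensions a statistics library would be needed for the chi-square quantile.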
Abstract: © 2021 Elsevier B.V. A visible near-infrared hyperspectral imaging system (HIS) with a xenon light source, a Pika XC2 camera with a spectral range of 400–1100 nm, and SpectrononPro software was used to visualize hypercube data of fresh and damaged rice grains. The linear stage control was set to a scanning speed of 0.79 cm/s, a homing speed of 0.77 cm/s, and a stepping mode of 0.60 cm/s. The captured images, in the form of RGB data cubes, were converted in MATLAB 2017a to grayscale images and then to binary images. Dimension reduction by PCA was first applied to the wavelength range 396.16–1003.71 nm to obtain graphs of the first and second principal components versus wavelength. The images were then cropped and masked in MATLAB to obtain plots of the first versus the second principal component for both damaged and healthy rice grains. The first and second principal components have a mean value of 699.9 nm and a mode value of 396.2 nm for the fresh rice grains, and a mean value of 700.1 nm and a mode value of 396.2 nm for the damaged rice grains. The images were then cropped at the significant wavelengths of 904.07, 914.90, 646.32, and 725.38 nm for the fresh rice grains, and 910.57, 916.49, 691.80, and 852.63 nm for the damaged rice grains. The standard error reported for the fresh rice was 1.34 on the X-axis (XF) and 0.17 on the Y-axis (YF), while for the damaged rice it was 1.15 (XI) and 0.15 (YI), respectively. These results indicate that fresh and damaged rice grains can be readily distinguished. Further, this approach can be applied to unknown samples to detect insect infestation in rice.
Abstract: © 2022 Elsevier B.V. Therapeutic peptides, as active substances involved in a variety of cell functions in the organism, are essential participants in the complex physiological activities of the body. The prediction of therapeutic peptides is therefore essential for research on peptide-based therapies. Biological experiments are considered time-consuming and labor-intensive. Deep learning, as a fast and accurate method, can process massive amounts of data on therapeutic peptides. In this research, we propose a deep learning model called Pep-CNN to accurately predict therapeutic peptides. First, we represent each peptide sequence as feature vectors based on sequence position, physicochemical properties, and evolutionary-derived features. After fusing the features, we use an improved Convolutional Neural Network classifier (imCNN) to classify and predict eight kinds of peptides. The results show that, compared with other models, Pep-CNN identifies peptides more accurately, which is conducive to further research on therapeutic peptides by biomedical scientists. The code and benchmark datasets are accessible at https://github.com/alivelxj/Pep-CNN.