Abstract: Microarray technology allows the simultaneous study of up to thousands of gene expressions, which is essential for understanding cancer, diagnosing disease, and developing drugs. However, the large number of irrelevant and redundant genes in the data challenges the performance and efficiency of feature selection methods and classification algorithms. Furthermore, the small sample size increases the risk of overfitting. An efficient, accurate, and robust feature selection pipeline is therefore necessary for gene selection. This paper proposes a new hybrid feature selection method, the Multi-Fitness RankAggreg Genetic Algorithm (MFRAG). Because a single model tends to overfit on small-sample datasets, MFRAG integrates nine feature selection methods that can assign feature weights, uses them to evaluate individuals, and calculates individual fitness with ensemble models. MFRAG more closely imitates the natural principle of "survival of the fittest": it improves the selection and mutation processes of genetic algorithms, enhances the stability and reliability of the selection process through fusion mechanisms and integrated models, and guides the evolutionary process through a set of lists generated by a feature fusion model. The experimental section compares MFRAG with six standard feature selection methods on ten publicly available microarray datasets and with 18 published state-of-the-art methods on three widely used datasets. The results show that MFRAG outperforms all standard methods in classification accuracy and number of selected features and is ahead of most advanced methods.
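The abstract above does not say how the nine feature rankings are fused, so the following is only a minimal sketch of one common rank-aggregation scheme, a Borda count, and not the MFRAG procedure itself. The function name `borda_aggregate` and the gene labels are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Fuse several best-first feature rankings into one consensus ranking.

    Assumption: a plain Borda count. A feature at position p (0-based) in a
    ranking of length n earns n - p points; points are summed over rankings.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] += n - position
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Three hypothetical rankings of four genes, each produced by a different
# feature selection method.
rankings = [
    ["g3", "g1", "g2", "g4"],
    ["g1", "g3", "g4", "g2"],
    ["g3", "g2", "g1", "g4"],
]
consensus = borda_aggregate(rankings)
```

In a genetic algorithm, a consensus list of this kind could bias which genes are switched on during mutation; how MFRAG actually uses its fused lists is described only at a high level in the abstract.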
Abstract: Swarm-intelligence (SI) algorithms have received great attention for binary optimization problems such as feature selection. In this article, a new time-varying modified Sigmoid transfer function with two time-varying updating schemes is proposed as the binarization method for particle swarm optimization (PSO), the grey wolf optimization algorithm (GWO), the whale optimization algorithm (WOA), Harris hawks optimization (HHO), and manta-ray foraging optimization (MRFO). The resulting binary algorithms, BPSO, BGWOA, BWOA, BHHO, and BMRFO, are used to solve the descriptor selection problem in a supervised Amphetamine-type Stimulants (ATS) drug classification task. The goal of this study is to improve convergence speed and classification accuracy. To evaluate the proposed methods, experiments were carried out on a chemical dataset containing molecular descriptors of ATS and non-ATS drugs. The results show that the proposed methods are promising on this dataset in terms of near-optimal convergence, fast computation, increased classification accuracy, and a large reduction in the number of descriptors.
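To make the binarization step concrete, here is a minimal sketch of a time-varying Sigmoid transfer function. The abstract does not give the paper's modified function or its two updating schemes, so the linear schedule for the steepness parameter `tau` and the default values `tau_max=4.0` and `tau_min=1.0` are assumptions chosen only for illustration.

```python
import math
import random


def time_varying_sigmoid(x, t, t_max, tau_max=4.0, tau_min=1.0):
    """Probability that a bit is set to 1 for continuous position x.

    Assumption: tau decreases linearly from tau_max at t=0 to tau_min at
    t=t_max, so the curve is flat early (exploration) and steep late
    (exploitation).
    """
    tau = tau_max - (tau_max - tau_min) * (t / t_max)
    z = max(-60.0, min(60.0, x / tau))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def binarize(position, t, t_max, rng):
    """Map a continuous position vector to a 0/1 feature mask."""
    return [
        1 if rng.random() < time_varying_sigmoid(x, t, t_max) else 0
        for x in position
    ]


rng = random.Random(0)
mask = binarize([-2.0, 0.0, 3.5, 1.2], t=10, t_max=100, rng=rng)
```

Any of the listed optimizers could call `binarize` after its usual continuous update; the 1-bits then mark the descriptors kept for the classifier.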
Abstract: The computation of lower and upper band boundaries for the feasible solutions of multivariate curve resolution problems is an important and well-understood methodology. These techniques assume rank-regular spectral data matrices, that is, matrices whose rank equals the number of chemical species involved. For rank-deficient problems, which include linear dependencies within the pure component factors, band boundary calculations are much more complex. This paper deals with rank-deficient problems for which the rank-deficient factor is known and describes how to calculate band boundaries for the dual factor. The key tools for these band boundary computations are polytope constructions and linear programming problems that are solved for each spectral channel. Numerical studies are presented for a model problem and for two experimental data sets.
Abstract: Anthropogenic activities, such as sewage irrigation and the application of pesticides and fertilizers, are the main causes of cadmium (Cd) pollution, which reduces soil quality and threatens the environment and human health. Although traditional cadmium measurement methods are accurate, they involve complicated sample processing and laboratory analysis, which are time-consuming, costly, and often unfriendly to the environment. X-ray fluorescence (XRF) and visible near-infrared (vis-NIR) spectroscopy have been recognized as alternatives for measuring soil heavy metal contamination in a cheap, fast, non-destructive, and environmentally conscious manner. In this study, 370 paddy soil samples from the Nanji area of Poyang Lake were taken as the research object. The feasibility and effectiveness of XRF and vis-NIR spectroscopy for estimating soil chromium content were discussed separately, and the combination of the two was used to estimate soil Cd content. Combined with several spectral preprocessing methods, a least squares support vector machine (LS-SVM) quantitative analysis model with leave-one-out cross validation (LOOCV) was established for three data fusion schemes (Equal Weight Fusion, Coaddition Fusion, and Outer Product Fusion). The results showed that the accuracy and stability of Equal Weight Fusion and Outer Product Fusion were better than those of the single-spectrum quantitative analysis models. The Outer Product Fusion model performed best, with a determination coefficient (R2) of 0.91, a root mean square error of cross validation (RMSECV) of 0.12 mg/kg, and a relative percent deviation (RPD) of 3.27 in LOOCV, and with R2 = 0.90, RMSEP = 0.13 mg/kg, and RPD = 3.11 on the prediction set, which satisfies the detection requirements. The method is accurate and reliable and can provide a reference for research on methods for investigating soil heavy metal distribution.
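The fusion and evaluation steps named in the abstract above can be sketched as follows. This is a hedged illustration, not the authors' code: outer product fusion is taken here to be the flattened outer product of one sample's two spectra, and RPD is computed as the standard deviation of the reference values divided by the RMSE, which is the definition commonly used in chemometrics. Both are assumptions, since the abstract gives no formulas, and the function names are hypothetical.

```python
import math


def outer_product_fusion(xrf, nir):
    """Fuse one sample's XRF and vis-NIR spectra into a single vector.

    Assumption: every XRF channel is multiplied by every vis-NIR channel
    and the resulting matrix is flattened row by row.
    """
    return [a * b for a in xrf for b in nir]


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, and RPD for paired reference and predicted values."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation
    return {"r2": 1.0 - ss_res / ss_tot, "rmse": rmse, "rpd": sd / rmse}


fused = outer_product_fusion([1.0, 2.0], [3.0, 4.0, 5.0])
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Under leave-one-out cross validation, `y_pred` would hold the prediction for each sample from a model trained on all the other samples, and the resulting RMSE is the RMSECV reported in the abstract.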
Stehlik, M.; Sabolova, R.; Seckarova, V.; Soza, L. Nunez...
13 pages
Abstract: Contamination by arsenic and arsenic compounds is at present a topic of great importance in the field of water quality. Simply applying ANOVAs to test the hypothesis of mean heterogeneity of arsenic contamination in water may lead to oversimplifications, since the data can be skewed and the power of many tests is affected by this skewness. To overcome this problem, we introduce a novel heterogeneity measure of arsenic contamination. The measure is based on the correlation between two univariate statistics decomposed from the Kullback-Leibler divergence of the sampled vector as compared to the canonical parameter. For the Gamma distribution, the mere detection of heterogeneity in the means or variances is akin to an omnibus test for mean differences in a standard ANOVA, so our method is useful in this regard. We illustrate the measure's applicability and usefulness in the assessment of pollutant compounds. Subsampling and resampling algorithms are developed to facilitate a study in a one-sample setting. The method is applied to a heteroscedasticity assessment of the arsenic contamination of potable water from the rural area of the region of Arica and Parinacota, Chile.
Abstract: Owing to their antimicrobial and insecticidal properties, natural compounds such as essential oils and their active components have proven to be an effective alternative to synthetic chemicals in fields ranging from drug delivery to agriculture and from nutrition to food preservation. Their application is limited by high volatility and poor water solubility, but it can be expanded by crystal engineering approaches that tune properties of the active molecule by combining it with a suitable partner molecule (coformer). However, the selection of coformers and the experimental effort required to discover cocrystals are the bottleneck of cocrystal engineering. This study explores the use of chemometrics to aid the discovery of cocrystals of active ingredients suitable for various applications. Partial Least Squares-Discriminant Analysis is used to distinguish cocrystals from binary mixtures based on the molecular features of the coformers. For the first time, a dataset that also includes failed cocrystallization experiments and a variety of chemically diverse compounds was used. The proposed methodology achieved a successful prediction rate of 85% for the test set in the model validation phase and of 74% for the external validation set.
Abstract: For the first time, a novel methodology has been developed based on the fabrication of an electrochemical biosensor assisted by multi-way calibration methods for the simultaneous determination of cholesterol (CL) and cholestanol (CS). A screen-printed carbon electrode (SPCE) was chosen as the platform, and gold nanoparticles (Au NPs) were electrodeposited onto its surface. Molecularly imprinted polymers (MIPs) were synthesized from methacrylic acid, ethylene glycol methacrylate, 2,2-dimethoxy-2-phenylacetophenone, CL, and CS by photopolymerization. The MIPs were then integrated with multiwalled carbon nanotubes (MWCNTs) and cast onto the surface of the Au NPs/SPCE to give the biosensor its final structure. The modifications applied to the SPCE were characterized by electrochemical and spectroscopic methods. When the biosensor was in contact with a binary solution of CL and CS, the CL and CS molecules were embedded within the MIP structure and clogged its pathways. As expected, the differential pulse voltammetric (DPV) response of the biosensor in the electrochemical probe solution changed after its incubation in a binary solution of CL and CS. Therefore, by immersing the biosensor in an electrochemical probe solution, its second-order DPV responses were recorded and used for the simultaneous determination of CL and CS with three-way calibration models constructed by unfolded partial least squares/residual bilinearization (U-PLS/RBL) and multi-way partial least squares/residual bilinearization (N-PLS/RBL). N-PLS/RBL performed better in predicting the concentrations of CL and CS in synthetic samples, which motivated us to couple it with the biosensor for the simultaneous determination of CL and CS in real matrices. The combination of the MIP and three-way calibration yielded an efficient biosensor with good practical performance for the simultaneous determination of CL and CS in real matrices.
Prats-Montalban, J. M.; Duchesne, C.; Ferrer, A.; Sanz-Requena, R....
13 pages
Abstract: In current radiology practice, multi-parametric magnetic resonance imaging (mpMRI) has recently become a key tool in diagnostic and therapeutic decisions. Although it is based on the subjective assessment of T2-weighted images, as well as perfusion-weighted and diffusion-weighted sequences, further quantitative parameters can also be derived from them to improve lesion phenotyping. Although these parameters are usually exploited in a univariate way, ignoring the benefits of a truly multivariate approach, mpMRI is still the gold standard imaging technique for assessing prostate cancer location and probability of malignancy. In this paper, pharmacokinetic (perfusion) and exponential (diffusion) clinical models, as well as latent variable-based multivariate statistical models such as multivariate curve resolution-alternating least squares (MCR-ALS), have been calculated and analyzed with sequential multi-block partial least squares discriminant analysis (SMB-PLS-DA), including technique-block differentiation, in order to better assess cancer aggressiveness based on Gleason scales. The best prediction result was achieved by the ordered combination of diffusion blocks (MCR-ALS and exponential models) and normalized T2 values. The perfusion blocks did not improve the results obtained with diffusion and T2-weighted parameters alone, so they can be removed from the SMB-PLS-DA model.
Abstract: Cancer is the most dangerous disease of humans, causing countless deaths and suffering. Treating cancer with peptides is an exciting prospect because they have many attractive benefits. In recent years, many researchers have focused on anticancer peptides (ACPs), which are critical for the advancement of novel cancer therapies. Identifying ACPs by experimental methods is costly and laborious and often gives unsatisfactory results, so advanced algorithms for identifying ACPs are in high demand. In this study, we present a novel deep learning-based method named ACP-2DCNN for improving the prediction of anticancer peptides. The important features are extracted by Dipeptide Deviation from Expected Mean (DDE), while model training and prediction are performed by a two-dimensional convolutional neural network (2D CNN). The empirical results demonstrate that the proposed method achieves the best performance and predicts ACPs more accurately than existing methods in the literature.
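As an illustration of the feature extraction step, the sketch below computes DDE features using the commonly cited definition: the observed dipeptide frequency is compared with a theoretical mean derived from codon counts and scaled by the theoretical standard deviation. The abstract does not state the formula or how the features are fed to the 2D CNN, so the formula as written here, the reshaping into a 20 x 20 grid, and the names `dde_features` and `as_grid` are assumptions.

```python
import math
from itertools import product

# Number of codons encoding each amino acid (61 sense codons in total).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}
AMINO_ACIDS = sorted(CODONS)
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dde_features(sequence):
    """Return the 400 DDE values for a peptide, ordered as in DIPEPTIDES."""
    n_pairs = len(sequence) - 1
    if n_pairs < 1:
        raise ValueError("sequence must contain at least two residues")
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(n_pairs):
        pair = sequence[i:i + 2]
        if pair in counts:  # skip pairs with non-standard residues
            counts[pair] += 1
    features = []
    for dp in DIPEPTIDES:
        dc = counts[dp] / n_pairs                         # observed frequency
        tm = (CODONS[dp[0]] / 61) * (CODONS[dp[1]] / 61)  # theoretical mean
        tv = tm * (1 - tm) / n_pairs                      # theoretical variance
        features.append((dc - tm) / math.sqrt(tv))
    return features


def as_grid(features):
    """Reshape 400 features into 20 rows of 20 (an assumed 2D CNN input)."""
    return [features[r * 20:(r + 1) * 20] for r in range(20)]


grid = as_grid(dde_features("FLPLLAGLAANFLPKIFCKITRKC"))
```

The example peptide string is arbitrary and is used only to show the call; each row of `grid` corresponds to the first residue of the dipeptide and each column to the second.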
Abstract: When designing experiments, one objective is to build a model for predicting future observations. If the underlying phenomenon is complex, one option is to employ machine learning (ML) models for data analysis and to build accurate emulators that function as virtual representations of the physical process and can be used in lieu of further evaluations of the actual physical system. However, to obtain accurate models, informative data must be provided to train the algorithms, and a typical approach is the sequential collection of data via active learning (AL) strategies. Most existing literature on AL focuses on computer experiments. In this paper, we introduce an AL algorithm for Physical Experiments based on nonparametric Ranking and Clustering (ALPERC) that can be used for sequential data collection in noisy settings when three or more responses are investigated in the same experiment. We inspect the performance of the ALPERC algorithm through simulations and a case study on the prediction of relevant properties of thermoelectric materials.