Abstract: Early diagnosis of Alzheimer’s disease (AD) is important for early intervention, but current diagnostic tools tend to use unimodal methods, processing either speech or text separately. Although models such as the ComParE Baseline for audio and BERT-based text classifiers have been successful, they do not take advantage of the complementary strengths of both modalities, which restricts their diagnostic power. To overcome this, we propose SPID-AD (Speech-Based Intelligent Detection of Alzheimer’s Dementia), a multimodal deep-learning approach that combines linguistic and acoustic features for the automated detection of Alzheimer’s. Our approach uses a BERT-based architecture to mine semantic patterns from transcripts and an augmented Convolutional Neural Network (CNN) to process Mel-spectrogram representations of speech. By combining these features in dense layers, the model retains both language-related and auditory biomarkers of cognitive impairment. Assessed on the DementiaBank Pitt Corpus, SPID-AD achieves 95.6% classification accuracy, surpassing state-of-the-art models in precision, recall, and F1-score. The findings demonstrate the strength of multimodal analysis in detecting dementia speech patterns, providing a non-invasive, AI-based diagnostic tool that may assist clinicians in the early detection of Alzheimer’s.
Abstract: Rapid and accurate prediction of the sectional velocity field of a channel is of great significance to the design and maintenance of open channels and to the improvement of irrigation efficiency. During water delivery in the Renmin Canal of the Dujiangyan irrigation system, the water level of the main canal changes rapidly and over a large range, which is the biggest difficulty in real-time prediction of its velocity field. Therefore, this paper proposes a new machine-learning-based method for constructing a real-time velocity field prediction model that directly predicts the velocity field of the channel from the water level. In this method, computational fluid dynamics (CFD) is used to simulate the target open channel, and a machine learning model that adaptively optimizes the characteristics of the velocity field data is designed as the velocity field prediction model. The model is tested on the main canal of the Renmin Canal of the Dujiangyan irrigation system. The results suggest that the predictions are in line with the general features of flow velocity distribution in open channels and have high precision. Therefore, this method is of high value for engineering application and theoretical research.
Abstract: This paper proposes a robust dual-integral structure zeroing neural network (ZNN) design framework, effectively overcoming the limitations of existing single-integral enhanced ZNN models in completely suppressing linear noise. Based on this design framework, a complex-type dual-integral structure ZNN (DISZNN) model with inherent linear noise suppression capability is constructed for computing dynamic complex matrix inversion (DCMI) online. The stability, convergence, and robustness of the proposed DISZNN model are ensured via rigorous theoretical analyses. In three distinct experiments involving DCMI (including cases with only imaginary parts, both real and imaginary parts, and high-dimensional scenarios), the state trajectories of the DISZNN model quickly and closely track the dynamic trajectories of the theoretical solutions, with very low residual errors in various linear noise environments. More specifically, the residual errors of the DISZNN model for online computation of DCMI under linear noise environments are consistently below the order of 10^-3, representing one-thousandth of the residual errors in existing noise-tolerant ZNN models. Finally, the DISZNN design framework is applied to construct a controlled chaotic system of a permanent magnet synchronous motor (PMSM) with uncertainties and external disturbances based on real-world modeling. Experimental results demonstrate that the three state errors of the controlled PMSM chaotic system converge to zero quickly and stably under various conditions (system parameters, external disturbances, and uncertainties), further highlighting the superiority and generalizability of the DISZNN design framework.
Raul Sena Ferreira, Joris Guerin, Kevin Delmas, Jeremie Guiochet...
pp. e70032.1-e70032.20
Abstract: Machine Learning (ML) models, such as deep neural networks, are widely applied in autonomous systems to perform complex perception tasks. New dependability challenges arise when ML predictions are used in safety-critical applications, like autonomous cars and surgical robots. Thus, the use of fault tolerance mechanisms, such as safety monitors, is essential to ensure the safe behavior of the system despite the occurrence of faults. This paper presents an extensive literature review on safety monitoring of perception functions using ML in a safety-critical context. In this review, we structure the existing literature to highlight key factors to consider when designing such monitors: threat identification, requirements elicitation, detection of failure, reaction, and evaluation. We also highlight the ongoing challenges associated with safety monitoring and suggest directions for future research.
Abstract: Transfer functions play a very important role in metaheuristic optimization-based feature selection algorithms, as these functions map the continuous search space into binary space. The U-shaped transfer function (UTF) is one of the transfer functions used to solve the problem of feature selection. However, the UTF requires the selection of parametric values, which can vary for different types of data. To address this issue, an approach to select the parameters of the UTF has been proposed based on a time-varying adaptation method, resulting in the modified U-shaped transfer function (MUTF). Furthermore, a methodology has been proposed to enhance feature selection and classification for Parkinson’s disease by utilizing z-score normalization in conjunction with a modified U-shaped transfer function and the binary self-adaptive bald eagle search (MUTF-SABES) optimization algorithm. The z-score normalization has been used to mitigate issues caused by outliers. Also, the performance of the k-nearest neighbor classifier is improved by selecting an optimal parameter value using the proposed MUTF-SABES algorithm. The effectiveness of the proposed methodology is validated on seven different Parkinson’s disease datasets and compared with five state-of-the-art optimization algorithms: Salp Swarm algorithm, Harris Hawks optimization, equilibrium optimizer, Aquila optimizer, and Honey Badger algorithm. The results achieved using the proposed approach are superior or comparable to those of the existing algorithms. Friedman’s mean rank test is used to check the statistical significance of the proposed approach. The lowest Friedman’s mean rank value obtained using the proposed approach indicates that it has the potential to become an alternative to other well-known strategies.
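The mapping from a continuous position to a binary feature mask can be illustrated with a minimal sketch. The abstract does not give the formula or the time-varying rule, so everything here is an assumption: `u_transfer` uses a commonly cited U-shaped form, alpha * |x|^beta capped at 1, and `linear_alpha` is a hypothetical linear schedule standing in for the paper's adaptation method.

```python
import random

def u_transfer(x, alpha=1.0, beta=2.0):
    """Assumed U-shaped form: alpha * |x|**beta, capped at 1 so it reads as a probability."""
    return min(1.0, alpha * abs(x) ** beta)

def linear_alpha(t, t_max, alpha_start=2.0, alpha_end=0.5):
    """Hypothetical schedule: alpha moves linearly from alpha_start to alpha_end over t_max iterations."""
    return alpha_start + (alpha_end - alpha_start) * t / t_max

def binarize(position, t, t_max, rng):
    """Turn a continuous position vector into a 0/1 feature mask at iteration t."""
    alpha = linear_alpha(t, t_max)
    # A feature is selected when a uniform random draw falls below its transfer value.
    return [1 if rng.random() < u_transfer(x, alpha) else 0 for x in position]

rng = random.Random(0)
mask = binarize([0.0, 0.3, -0.9, 2.0], t=0, t_max=100, rng=rng)
```

With this form, a position component of exactly 0 is never selected and a component whose transfer value reaches 1 is always selected; components in between are selected at random, with larger alpha making selection more likely.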
Abstract: Pedestrian detection is crucial in agricultural environments to ensure the safe operation of intelligent machinery. In orchards, pedestrians exhibit unpredictable behavior and can pose significant challenges to navigation and operation. This demands reliable detection technologies that ensure safety while addressing the unique challenges of orchard environments, such as dense foliage, uneven terrain, and varying lighting conditions. To address this, we propose ReB-DINO, a robust and accurate orchard pedestrian detection model based on an improved DINO. First, we improve the feature extraction module of DINO using structural re-parameterization, which decouples the training and inference structures to enhance the accuracy and speed of the model. In addition, a progressive feature fusion module is employed to fuse the extracted features and improve model accuracy. Finally, the network incorporates a convolutional block attention mechanism and an improved loss function to improve pedestrian detection rates. The experimental results demonstrate a 1.6% improvement in Recall on the NREC dataset compared to the baseline. Moreover, the results show a 4.2% improvement in mAP and a 40.2% decrease in the number of parameters compared to the original DINO. On the PiFO dataset, the mAP at a threshold of 0.5 reaches 99.4%, demonstrating high detection accuracy in realistic scenarios. Therefore, our model enhances both detection accuracy and real-time object detection capabilities in apple orchards while remaining lightweight, surpassing mainstream object detection models.
Abstract: The rapid growth of video data has resulted in an increasing need for surveillance and violence detection systems. Although violent events occur less frequently than normal activities, developing automated video surveillance systems for violence detection has become essential to minimize wasted labor and time. Detecting violent activity in videos is a challenging task due to the variability and diversity of violent behavior, which can involve a wide range of actions, motions, and interactions between people and objects. Currently, researchers employ deep learning models to detect violent behaviors. A large number of deep learning approaches are based on extracting spatio-temporal information from a video by exploiting a 3D Convolutional Neural Network (CNN). Despite their success, these techniques require many more parameters than 2D CNNs and have high computational complexity. Therefore, we focus on exploiting a 2D CNN to encode spatio-temporal information. Statistical features of the optical flow changes are used to give this ability to a 2D CNN. These features are designed to draw attention to the regions of a video clip with the most motion. Accordingly, the optical flow of an input video is calculated. To determine meaningful changes in the optical flow, the optical flow magnitude of the current frame is compared with that of its predecessor. After that, statistical features of these changes are extracted to summarize a video clip into a 2D template, which is fed to a 2D CNN. Experimental results on four benchmark datasets show that the suggested strategy outperforms the baselines. In particular, we obtain a better estimate of the spatio-temporal features in a video by condensing a video clip into a 2D template.
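The summarization step can be sketched as follows, assuming the per-frame optical flow magnitudes have already been computed. The abstract does not say which statistics are used or how a "meaningful" change is decided, so the per-pixel mean and maximum and the `threshold` rule below are illustrative assumptions, and `flow_change_template` is a hypothetical name.

```python
def flow_change_template(magnitudes, threshold=0.5):
    """Summarize per-pixel optical-flow magnitude changes over a clip.

    magnitudes: list of T frames, each an H x W list of flow magnitudes.
    Returns (mean_map, max_map), each an H x W list.
    """
    height, width = len(magnitudes[0]), len(magnitudes[0][0])
    n_changes = len(magnitudes) - 1
    mean_map = [[0.0] * width for _ in range(height)]
    max_map = [[0.0] * width for _ in range(height)]
    # Compare each frame with its predecessor.
    for prev, curr in zip(magnitudes, magnitudes[1:]):
        for i in range(height):
            for j in range(width):
                change = abs(curr[i][j] - prev[i][j])
                if change < threshold:  # assumed rule: ignore small changes
                    change = 0.0
                mean_map[i][j] += change / n_changes
                max_map[i][j] = max(max_map[i][j], change)
    return mean_map, max_map

# Toy clip: 3 frames of 2 x 2 flow magnitudes.
clip = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[2.0, 0.1], [0.0, 0.0]],
    [[4.0, 0.2], [0.0, 1.0]],
]
mean_map, max_map = flow_change_template(clip)
```

The two maps can be stacked as channels of a single image-like input, so a 2D CNN sees where motion changed most across the clip without processing the frames as a 3D volume.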
Abstract: Human activity recognition (HAR) technology plays a major role in today’s world and is used to detect human actions and poses in real time. In the past, researchers employed statistical machine learning methods and manually designed and extracted attributes of various movements. However, these conventional techniques are becoming increasingly ineffective in the face of exponentially increasing waveform data that lacks unambiguous principles. With the advancement of deep learning technology, manual feature extraction is no longer required, and performance on challenging human activity recognition problems can be improved. However, various deep learning models suffer from problems such as high time consumption, inaccuracy, and vanishing gradients. To solve these problems, the proposed study uses a deep convolutional attention-based bidirectional recurrent neural network to detect human activities in the provided samples. The input images are first pre-processed using an adaptive bilateral filtering approach to improve their quality and remove image noise. Then, the crucial features are extracted using a convolutional neural network (CNN) based encoder-decoder model. Finally, a deep convolutional attention-based bidirectional recurrent neural network is used to identify human activities. The model recognizes human actions with higher effectiveness and lower latency. Human behaviors are identified on the HMDB51 dataset. The proposed model achieves the highest accuracy of 95.46%, which is 10.51% superior to multi-layer perceptron (MLP), 6.99% superior to CNN, 12.76% superior to long short-term memory (LSTM), 5.59% superior to Bidirectional LSTM (BiLSTM), and 4.82% superior to CNN-LSTM.
Abstract: Land cover mapping and classification is one of the critical information-based tools for sustainable agricultural development, enabling relevant departments to carry out agricultural resource adjustments, yield predictions, and other tasks in advance. As a vital means of acquiring land cover and usage information, SAR sensors have become an important research direction due to their all-weather and all-day working capabilities. Nevertheless, traditional methods for PolSAR image classification often feed a combination of various scattering features, i.e., a high-dimensional feature combination, into classifiers, leading to mutual interference among different features and consequently degrading classification performance, especially for linear classifiers such as NRS and SVM. To mitigate this interference, this paper proposes an unsupervised feature selection method based on spectral clustering (FSSC) that leverages the linear expression capabilities of high-dimensional features. In this method, the linear relationships between different features are first analyzed, and the linear similarity between features is quantified using Pearson correlation coefficients, forming a feature similarity matrix. Subsequently, the similarity matrix is partitioned in an unsupervised manner through spectral clustering, dividing the features into distinct combinations. Features within a clustering subset can be considered combinations with high linear similarity. Therefore, KL divergence is applied to select the most representative features within each cluster, and the resulting representative features from different clustering subsets are combined to form an optimal feature set, achieving the purpose of feature selection.
This method maps high-dimensional feature combinations into low-dimensional ones while preserving the essential attributes of the original data, thereby retaining the valuable feature information and enhancing classification performance. Experimental outcomes conclusively show that the proposed method enhances the overall accuracy (OA) of SVM by 4.51% and the OA of NRS by 2.34% in the Flevoland Dataset, underscoring its efficacy in PolSAR image classification, especially for linear classifiers.
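The first step of the method, building a feature similarity matrix from Pearson correlation coefficients, can be sketched as below. This covers only that step: spectral clustering and KL-divergence selection are omitted. Taking the absolute value of the coefficient, so that strongly negative correlations also count as linearly similar, is an assumption not stated in the abstract, and the toy `features` are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def similarity_matrix(features):
    """Absolute Pearson correlation between every pair of features (assumed similarity measure)."""
    return [[abs(pearson(f, g)) for g in features] for f in features]

# Each inner list holds one feature's values across four samples (toy data).
features = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, 1.0, -1.0],
]
S = similarity_matrix(features)
```

Here the first two features are exact linear multiples of each other, so their similarity is 1 and a clustering step would be expected to group them, while the third feature is only weakly related to either.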
Abstract: This paper studies neural machine translation (NMT) of code-mixed (CM) text. Specifically, we generate synthetic CM data and study how it can be used to improve the translation performance of NMT through data augmentation. We conduct experiments on three data augmentation approaches, viz. CM-Augmentation, CM-Concatenation, and Multi-Encoder. The latter two approaches are inspired by document-level NMT: we use synthetic CM data as context to improve the performance of the NMT models. We conduct experiments on three language pairs, viz. Hindi–English, Telugu–English and Czech–English. Experimental results demonstrate that the proposed approaches significantly improve performance over the baseline model trained without data augmentation and over the existing data augmentation strategies. The CM-Concatenation model attains the best performance.