Journal information
International journal of pattern recognition and artificial intelligence
World Scientific Publishing Co. Pte. Ltd.
ISSN: 0218-0014

Indexed in SCI
Officially published

    Adaptive SURELET-Based Image Denoising in Wavelet Domain with Spatially Varying Noise

    Guang Yi Chen, Yaser Esmaeili Salehani, Sepehr Ghamari
    pp. 1.1-1.11
    Abstract: Image denoising is a critical task in numerous real-world applications. This paper presents an innovative method for image denoising in the wavelet domain, extending the SURELET approach to handle spatially varying noise levels. Traditional methods often assume a constant noise level across the entire image, which is unrealistic in practical scenarios. Our proposed method estimates the noise level locally within small neighborhoods in the wavelet domain, adapting well to images with spatially varying noise. This approach effectively reduces both uniform and spatially varying noise, as demonstrated through extensive experiments on six test images with five distinct noise patterns. The results, evaluated using peak signal-to-noise ratio (PSNR), show that our method outperforms existing denoising techniques, particularly in scenarios with spatially varying noise. This study not only advances the state-of-the-art in image denoising but also highlights the importance of adaptive noise estimation in real-world applications.
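The local noise-level estimation this abstract describes can be sketched with a robust median-absolute-deviation (MAD) estimator computed over small neighborhoods of a wavelet detail subband. This is a minimal numpy illustration; the window size, the non-overlapping block layout, and the function name are assumptions for illustration, not the paper's exact procedure:

```python
import numpy as np

def local_noise_sigma(subband, win=8):
    """Estimate a spatially varying noise level from a wavelet detail
    subband: robust MAD estimate in non-overlapping win x win blocks."""
    h, w = subband.shape
    sigma = np.zeros((h // win, w // win))
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            block = subband[i:i + win, j:j + win]
            # Classic robust estimator: sigma ~ MAD / 0.6745
            sigma[i // win, j // win] = np.median(np.abs(block)) / 0.6745
    return sigma

# Synthetic subband: left half low noise, right half high noise
rng = np.random.default_rng(0)
sub = np.hstack([rng.normal(0, 1.0, (32, 16)),
                 rng.normal(0, 4.0, (32, 16))])
s = local_noise_sigma(sub)
print(s[:, 0].mean(), s[:, -1].mean())  # left estimate ~1, right ~4
```

A per-block estimate like this is what lets a SURELET-style shrinkage adapt its threshold where a single global sigma would over- or under-smooth.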

    RFA: Regularized Feature Alignment Method for Cross-Subject Human Activity Recognition

    Zhe Yang, Hao Fu, Ruohong Huan, Mengjie Qu, et al.
    pp. 1.1-1.30
    Abstract: Cross-subject activity recognition is challenging in the human activity recognition field. Previous studies have often assumed that training and test data follow the same distribution, which is impractical in real-world applications. Thus, a model's performance will significantly decline when applied to data collected from new, unseen subjects because of differing physical conditions and human habits. To address these challenges, we propose the regularized feature alignment (RFA) network. RFA introduces a source domain selection mechanism (SDSM) based on calculating the Wasserstein distance between different subjects. Through SDSM, subjects with high similarity in the source domain can be retained, which implicitly compacts the feature subspace distribution. We implemented linear data augmentation on the retained subjects to mitigate the effects of the reduced training set. In addition, the regularized dropout method was adopted to explicitly compact the feature subspace distributions. Finally, multi-level feature alignment is performed via maximum mean discrepancy regularization to precisely match the source and target domain. To demonstrate the effectiveness of RFA, comprehensive experiments were conducted on four public datasets under the iterative leave-one-subject-out setting. The experimental results demonstrate that RFA outperformed the state-of-the-art methods on datasets with a large divergence between subjects and achieved performance comparable to the state-of-the-art methods on a subject-balanced dataset.
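The SDSM idea of keeping only source subjects close to the target in Wasserstein distance can be sketched with the closed form of the empirical 1D Wasserstein-1 distance for equal-size samples. The function names, the flattened 1D formulation, and the `keep` parameter are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def wasserstein_1d(x, y):
    """Empirical 1D Wasserstein-1 distance for equal-size samples:
    mean absolute difference of the sorted values."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def select_sources(target_feat, source_feats, keep=2):
    """Retain the `keep` source subjects whose flattened feature
    distributions are closest to the target subject."""
    d = [wasserstein_1d(target_feat.ravel(), s.ravel())
         for s in source_feats]
    return sorted(np.argsort(d)[:keep].tolist())

rng = np.random.default_rng(1)
target = rng.normal(0.0, 1.0, 500)
sources = [rng.normal(0.1, 1.0, 500),   # similar to target
           rng.normal(3.0, 1.0, 500),   # dissimilar
           rng.normal(0.0, 1.1, 500)]   # similar to target
print(select_sources(target, sources))  # keeps the two similar subjects
```

Discarding the dissimilar subject before training is what the abstract means by implicitly compacting the feature subspace distribution.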

    DRF-YOLO: A Transformer-Enhanced Framework for Underwater Image Enhancement and Object Detection

    Manikandan Sundaram, S. Vinoth Kumar, R. Lakshmana Kumar, P. Punitha, et al.
    pp. 1.1-1.29
    Abstract: Underwater aquatic systems play a crucial role in maintaining ecological balance and supporting marine biodiversity. However, due to low visibility, color distortion, and scattering effects caused by light absorption, efficient monitoring and object detection in such environments remain challenging. Deep learning-based image processing techniques have revolutionized underwater exploration by providing robust solutions for enhancing image quality, extracting meaningful features, and enabling precise classification. Integrating advanced image enhancement methods with deep learning architectures facilitates accurate detection and monitoring of aquatic species, objects, and anomalies. This study introduces a novel approach that synergistically combines the Multiscale Retinex (MSR) and Dark Channel Prior (DCP) approaches for underwater image enhancement in the form of the Dark Retinex Fusion (DRF) model. The DRF model is further integrated with a YOLO-based Transformer framework, leveraging attention mechanisms to enhance feature extraction and classification. The proposed DRF-YOLO-based Transformer framework effectively reduces haze, enhances contrast, and balances colors for an underwater environment. It incorporates advanced spatial precision features in the YOLO backbone and applies the attention module from the Transformer model that captures the long-range dependencies for better contextual understanding. The model was tested on underwater object datasets, achieving an accuracy of 98% and a loss of 0.2, outperforming traditional methods. Additionally, the framework demonstrated resilience to overfitting and local minima, maintaining consistent performance under varying conditions.
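Of the two components DRF fuses, the Dark Channel Prior half has a compact standard definition: the per-pixel minimum over color channels followed by a minimum filter over a local patch. A minimal numpy sketch (patch size and the toy image are illustrative, and this is the classic DCP computation, not the paper's fusion model):

```python
import numpy as np

def dark_channel(img, patch=3):
    """Dark channel prior: per-pixel minimum over color channels,
    followed by a minimum filter over a patch x patch neighborhood."""
    mins = img.min(axis=2)                 # channel-wise minimum
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

# Haze raises the dark channel everywhere; a region where even one
# color channel is dark keeps its dark channel near zero
img = np.full((8, 8, 3), 0.8)
img[2:6, 2:6, 0] = 0.0
dc = dark_channel(img)
print(dc.min(), dc.max())
```

In dehazing, a bright dark channel is read as haze, so an estimate like this drives the transmission map that the enhancement step inverts.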

    A Lightweight and Effective YOLO Model for Infrared Small Object Detection

    Shiyi Wen, Liangfu Li, Wenchao Ren
    pp. 1.1-1.24
    Abstract: Detecting small targets in infrared images presents significant challenges due to low resolution, lack of texture information, and high noise interference. To address these issues, this paper proposes an improved YOLO model aimed at enhancing the accuracy and efficiency of small target detection in infrared images. First, we integrate the Coordinate Attention (CA) mechanism into the backbone network to improve the model's feature extraction capability in complex scenes. Second, we introduce the Contextual Feature Aggregation (CFA) module into the neck network, effectively merging multi-level contextual information and enhancing the detection capability for small targets. To further optimize the model, we simplify the detection layers for large targets in YOLO, reducing the number of parameters while maintaining high detection accuracy. Finally, we incorporate the Normalized Wasserstein Distance (NWD) loss function, which is insensitive to target size and can accelerate model convergence while improving small target detection performance. We evaluated the model on public datasets such as VisDrone2019 and FLIR, using metrics like mean Average Precision (mAP) to assess performance. Experimental results indicate that the proposed method achieves higher detection accuracy and efficiency while maintaining a low parameter count compared to baseline models.
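The NWD term the abstract mentions is commonly formulated by modeling each box as a 2D Gaussian, for which the Wasserstein-2 distance has a closed form. A sketch under that common formulation (the constant C is dataset-dependent; the value and the function name here are illustrative, not this paper's settings):

```python
import numpy as np

def nwd(box1, box2, C=12.8):
    """Normalized Wasserstein distance between boxes (cx, cy, w, h),
    each modeled as a 2D Gaussian N([cx, cy], diag(w^2/4, h^2/4)).
    C is a dataset-dependent normalizing constant."""
    cx1, cy1, w1, h1 = box1
    cx2, cy2, w2, h2 = box2
    w2_sq = ((cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
             + ((w1 - w2) / 2) ** 2 + ((h1 - h2) / 2) ** 2)
    return np.exp(-np.sqrt(w2_sq) / C)

# Identical boxes give similarity 1.0; shifting a tiny 4x4 box by 6 px
# still yields a smooth nonzero similarity, whereas IoU drops to 0
same = nwd((10, 10, 4, 4), (10, 10, 4, 4))
shifted = nwd((10, 10, 4, 4), (16, 10, 4, 4))
print(same, shifted)
```

That smooth, scale-insensitive gradient for non-overlapping tiny boxes is why an NWD-based loss converges faster than IoU-based losses on small infrared targets.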

    EAUR-Net: Enhancing MRI Reconstruction with Edge-Aware Undersampling and Deep Learning

    Libya Thomas, Joseph Zacharias
    pp. 1.1-1.16
    Abstract: Magnetic resonance imaging (MRI) is an essential imaging technique used for detailed anatomical assessment and clinical decision-making. Conventional MRI acquisition methods, however, are often time-consuming and resource-demanding. To overcome these limitations, compressed sensing (CS) approaches have been developed to accelerate MRI data acquisition by exploiting image sparsity. In this work, we present the Edge-Aware Undersampling and Reconstruction Network (EAUR-Net), an innovative deep learning architecture designed to enhance MRI reconstruction by incorporating dynamic edge-based sampling strategies. EAUR-Net focuses on intelligently sampling data points based on edge information, which is critical for preserving key structural details and improving reconstruction quality while reducing the amount of acquired data. This paper provides a thorough evaluation of EAUR-Net, detailing its architectural components, training procedures, experimental outcomes, and potential future improvements.
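The edge-based sampling idea, budgeting acquisition toward locations with strong structure, can be sketched as a mask that keeps the top fraction of gradient-magnitude locations. This is only a toy image-domain illustration of the principle (the fraction, the gradient operator, and the function name are assumptions; the paper's sampling is dynamic and learned):

```python
import numpy as np

def edge_sampling_mask(img, frac=0.25):
    """Binary sampling mask keeping the `frac` fraction of locations
    with the largest gradient magnitude (edge strength)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    k = int(frac * img.size)
    thresh = np.sort(mag.ravel())[-k]
    return mag >= thresh

# A vertical step edge: the sampled points concentrate on the boundary
img = np.zeros((16, 16))
img[:, 8:] = 1.0
mask = edge_sampling_mask(img, frac=0.125)
print(mask[:, 7:9].all(), int(mask.sum()))
```

Concentrating the measurement budget on such high-gradient regions is what lets a reconstruction network preserve structural detail at aggressive undersampling rates.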

    BFC-Cap: Background and Frequency-guided Contextual Image Captioning

    Al Shahriar Rubel, Frank Y. Shih, Fadi P. Deek
    pp. 1.1-1.25
    Abstract: Effective image captioning relies on both visual understanding and contextual relevance. In this paper, we present two approaches to achieve these goals: BFC-Cap-b, a novel background-based image captioning method, and its frequency-guided extension, BFC-Cap-f. First, we develop an Object-Background Attention (OBA) module to capture the interaction and relationship between objects and background features. Then, we incorporate feature fusion with a spatial shift operation, enabling alignment with neighbors and avoiding potential redundancy. This framework is extended to transform grid features into the frequency domain and filter out low-frequency components to enhance fine details. Our approaches are evaluated using traditional and recent metrics on the MS COCO image captioning benchmark. Experimental results show the effectiveness of our proposed approaches, achieving better quantitative scores compared to relevant existing methods. Furthermore, our methods produce improved qualitative captions with more background and concise contextual information, including more accurate information regarding the objects and their attributes.
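The frequency-guided step, moving grid features to the frequency domain and suppressing the low-frequency band, can be sketched with a 2D FFT and a zeroed window around DC. The cutoff and function name are illustrative assumptions, not the paper's filter design:

```python
import numpy as np

def highpass_grid(feat, cutoff=2):
    """Move a 2D feature grid to the frequency domain, zero out the
    low-frequency band around DC, and transform back."""
    F = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    cy, cx = h // 2, w // 2
    F[cy - cutoff:cy + cutoff + 1, cx - cutoff:cx + cutoff + 1] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# A constant grid is pure low frequency, so the filter removes it all;
# only fine (high-frequency) detail would survive
flat = np.ones((16, 16))
out = highpass_grid(flat)
print(np.allclose(out, 0))  # → True
```

Removing the smooth component this way leaves the fine-detail structure that the captioner uses to describe object attributes.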

    Multimodal Pre-Trained Framework for Aligning Image–Text Relation Semantics

    Lin Sun, Yindu Su, Zhewei Zhou, Qingyuan Li, et al.
    pp. 1.1-1.27
    Abstract: Image–text relation (ITR) in social media plays a crucial role in mining the semantics of posts. Vision and language pre-trained models (PTMs) or multimodal PTMs have been used to create multimodal embeddings. The conventional practice of fine-tuning pre-trained models with labeled data for specific image–text relation tasks often falls short due to misalignment between general pre-training objectives and task-specific requirements. In this research, we introduce a cutting-edge pre-trained framework tailored for aligning image–text relation semantics. Our novel framework leverages unlabeled data to enhance learning of image–text relation representations through deep multimodal clustering and multimodal contrastive learning tasks. Our method significantly narrows the disparity between generic Vision-Language Pre-trained Models (VL-PTMs) and image–text relation tasks, showcasing an impressive performance boost of up to 10.4 points in linear probe tests. By achieving state-of-the-art results on image–text relation datasets, our pre-training framework stands out for its effectiveness in capturing and aligning image–text semantics. The visualizations generated by class activation maps (CAMs) also demonstrate that our models provide more accurate image–text semantic correspondence. The code is available at https://github.com/qingyuannk/ITR.
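The multimodal contrastive task can be sketched as a symmetric InfoNCE objective over a batch of image and text embeddings, where matched pairs share a row index. This is a generic numpy sketch of the loss family, not the paper's specific pre-training objective; the temperature and function names are assumptions:

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, tau=0.07):
    """Symmetric InfoNCE over a batch: matched image-text pairs
    (same row index) are positives; all other pairings are negatives."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / tau
    n = len(logits)

    def ce(lg):
        # Cross-entropy with the diagonal (matched pair) as the target
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    return 0.5 * (ce(logits) + ce(logits.T))

rng = np.random.default_rng(3)
a = rng.normal(size=(8, 16))
aligned = contrastive_loss(a, a + 0.01 * rng.normal(size=(8, 16)))
mismatched = contrastive_loss(a, np.roll(a, 1, axis=0))
print(aligned < mismatched)  # aligned pairs yield a much lower loss
```

Minimizing this pulls an image and its paired text together while pushing apart unrelated pairs, which is the alignment property the linear probe then measures.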

    Few-Shot Counting with Multi-Scale Vision Transformers and Attention Mechanisms

    Xiaopan Chen, Zhiwei Dong, Xiaoke Zhu, Fan Zhang, et al.
    pp. 1.1-1.22
    Abstract: Object counting is a fundamental task in computer vision, with critical applications in areas such as crowd monitoring and ecological conservation. Traditional methods typically rely on large-scale annotated datasets, which are costly and time-consuming to obtain. Few-shot object counting has emerged as a promising solution, enabling accurate counting with minimal annotated samples. However, in real-world scenarios, objects often exhibit significant scale variations due to factors such as view distortion, varying shooting distances, and inherent size differences. Existing few-shot methods usually struggle to address this challenge effectively. To address these issues, we propose a Scale-Aware Vision Transformer (SAViT) framework. Specifically, we design a multi-scale dilated convolution module in SAViT, which can adaptively adjust convolution kernel sampling rates to handle objects of varying sizes. Additionally, we incorporate a global channel attention mechanism to strengthen the model's ability to capture robust feature representations, thereby improving detection accuracy. For practical usability, we integrate the Segment Anything Model (SAM) to create an exemplar box selection module, simplifying the process by allowing users to generate precise exemplar boxes with a single line drawn on the target object. Extensive experiments on the FSC-147 dataset demonstrate the effectiveness of our approach, achieving a Mean Absolute Error (MAE) of 8.92 and a Root Mean Squared Error (RMSE) of 31.26. Compared to the state-of-the-art method, CACViT, our model reduces MAE by 0.21 (2.30% improvement) and RMSE by 17.7 (36.15% improvement). Our approach not only provides an effective solution for few-shot object counting but also offers a new practical paradigm for extending few-shot learning to complex vision tasks requiring multi-scale reasoning. The code of our paper is available at https://github.com/BlouseDong/SAViT.
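The key mechanism behind the multi-scale dilated convolution module, widening the receptive field by spacing kernel taps apart, can be shown in one dimension. A minimal numpy sketch (a real module would use learned 2D kernels at several dilation rates; the function name and box kernel are illustrative):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1D convolution with dilated taps: kernel elements are
    spaced `dilation` samples apart, enlarging the receptive field
    without adding parameters."""
    k = len(kernel)
    span = (k - 1) * dilation + 1
    return np.array([np.dot(x[i:i + span:dilation], kernel)
                     for i in range(len(x) - span + 1)])

x = np.arange(10.0)
box = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, box, 1))  # 3-sample receptive field
print(dilated_conv1d(x, box, 2))  # 5-sample span, same 3 parameters
```

Running the same kernel at several dilation rates in parallel gives features matched to small and large objects alike, which is the scale-awareness SAViT aims for.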

    Neural Network-Aided Multiple-Symbol Noncoherent Detection Scheme of LDPC Coded MPSK Receiver for Unmanned Aerial Vehicle Communications

    Di Wu, Gege Wei, Gaolei Song, Yongen Li, et al.
    pp. 1.1-1.32
    Abstract: Multiple-symbol noncoherent detection (MSND) with the aid of neural networks (NNs) for low-density parity-check (LDPC) coded multiple phase shift keying (MPSK) signals is studied for unmanned aerial vehicle (UAV) communications. In the traditional MSND scheme, the number of candidate sequences grows exponentially with the length of the symbol observation period. Implementing the optimal bit log-likelihood ratio (LLR) for decoding is challenging, even when the observed symbol period is two. In this paper, we first propose an improved scheme that reduces the number of candidate sequences through phase combination, in which the phase is uniformly quantized into L discrete values. We find that the performance requirements can be well met when the phase quantization order is only 4. We then utilize back-propagation neural networks (BPNs) to compute the bit LLR. To enhance the training efficiency of our NNs and achieve better performance, we also uniformly quantize the carrier phase offset (CPO) into discrete states. Decoding convergence is accelerated significantly compared to the improved traditional scheme, and complexity is reduced within an acceptable range of performance loss.
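The uniform phase quantization step the abstract describes is simple to state concretely: round each carrier phase to the nearest multiple of 2π/L. A minimal numpy sketch (the function name and sample phases are illustrative; the paper's candidate-sequence construction around this step is not shown):

```python
import numpy as np

def quantize_phase(phases, L=4):
    """Uniformly quantize carrier phases to L discrete values in
    [0, 2*pi): round to the nearest multiple of 2*pi/L (mod 2*pi)."""
    step = 2 * np.pi / L
    return (np.round(np.asarray(phases) / step) % L) * step

# With L = 4, every candidate phase collapses onto {0, pi/2, pi, 3*pi/2},
# shrinking the MSND candidate-sequence search space
q = quantize_phase([0.1, 1.6, 3.2, 4.6], L=4)
print(np.round(q, 3))
```

Restricting the continuum of phases to L discrete states is what keeps the number of candidate sequences, and hence the LLR computation, tractable.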

    Enhancing Stability of Multi-Underwater Robot Swarms Based on the Fusion of Social Force and Vicsek Models

    Qiang Zhao, Bing Li, Gang Wang
    pp. 1.1-1.18
    Abstract: As the demand for marine environmental monitoring and deep-sea operations continues to grow, autonomous underwater vehicle (AUV) swarm systems have become a research hotspot due to their high flexibility. This paper proposes a navigation control method for multi-AUV swarms that integrates the Social Force Model (SFM) with the Vicsek model to improve collaboration efficiency and motion stability in dynamic underwater environments. The proposed control strategy incorporates both physical forces and alignment mechanisms to achieve dynamic behavioral coordination among the robots in the swarm. The SFM is used to characterize interactions between individuals, ensuring collision avoidance, while the Vicsek model provides a neighborhood-based velocity alignment mechanism to enhance swarm coherence. Additionally, the control framework introduces a leader-follower structure, effectively integrating local perception with global navigation information. Simulation results indicate that the robot swarm can maintain effective obstacle avoidance and structural stability even in complex environments with obstacles. In the absence of the Vicsek model, navigation tasks may fail or take significantly longer. Even with low alignment strength and higher speeds, navigation time can be reduced by 35%, while higher alignment strength combined with lower speeds can shorten navigation time by up to 50%. Tests with large-scale swarms demonstrate that the proposed method exhibits good scalability and effectively prevents excessive aggregation within the group. Future research will focus on integrating intelligent optimization methods, such as deep reinforcement learning, to enhance the generalization ability of the control strategy in unstructured and dynamic environments.
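The Vicsek alignment mechanism at the core of this fusion can be sketched in its standard 2D form: each agent adopts the mean heading of its neighbors plus noise, then steps forward. This is the classic Vicsek update only, with illustrative parameters; the paper's SFM forces and leader-follower structure are not modeled here:

```python
import numpy as np

def vicsek_step(pos, theta, r=2.0, eta=0.1, v=0.05, rng=None):
    """One Vicsek update: each agent adopts the mean heading of all
    neighbors within radius r, perturbed by uniform noise of strength
    eta, then moves a step of length v."""
    rng = rng if rng is not None else np.random.default_rng()
    n = len(pos)
    new_theta = np.empty(n)
    for i in range(n):
        nbr = np.linalg.norm(pos - pos[i], axis=1) <= r
        # Average headings through the mean direction vector
        new_theta[i] = np.arctan2(np.sin(theta[nbr]).mean(),
                                  np.cos(theta[nbr]).mean())
    new_theta += rng.uniform(-eta / 2, eta / 2, n)
    step = v * np.stack([np.cos(new_theta), np.sin(new_theta)], axis=1)
    return pos + step, new_theta

rng = np.random.default_rng(2)
pos = rng.uniform(0, 1, (20, 2))
theta = rng.uniform(-np.pi, np.pi, 20)
for _ in range(50):
    pos, theta = vicsek_step(pos, theta, rng=rng)
# Polar order parameter near 1 means the swarm headings are aligned
order = np.abs(np.exp(1j * theta).mean())
print(order)
```

The polar order parameter is the standard measure of the swarm coherence that the abstract credits the Vicsek term with providing; fusing SFM repulsion into the same update would add collision avoidance on top of this alignment.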