Abstract: Kernel possibilistic fuzzy C-means with local information (KWPFLICM) is an important method for image segmentation, but it is very sensitive to heavy noise and outliers. To enhance its segmentation performance, this paper proposes a kernelized total Bregman divergence-driven possibilistic fuzzy clustering algorithm with local information (TKWPFLICM). First, a polynomial kernel function is introduced to kernelize the total Bregman divergence (TBD), and the local neighborhood information of each pixel is used to modify it, which overcomes the rotation variance of the Bregman divergence (BD). Second, the modified kernelized TBD is combined with possibilistic typicality to further enhance the noise robustness of the algorithm. Finally, the modified kernelized TBD is introduced into the objective function of the KWPFLICM algorithm, and a novel robust fuzzy clustering algorithm is derived by optimization theory. Experimental results show that, compared with existing fuzzy clustering algorithms, TKWPFLICM improves the average SA by 0.791% to 33.237%. TKWPFLICM therefore offers better noise robustness and segmentation accuracy. (c) 2022 Elsevier Ltd. All rights reserved.
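As a rough illustration of the divergences named in this abstract, the sketch below computes the plain Bregman divergence and the total Bregman divergence for one assumed generator, f(v) = ||v||^2. It is not the paper's modified kernelized TBD: the polynomial kernel, the neighborhood weighting, and the function names are all left out or hypothetical.

```python
import numpy as np

def bregman_divergence(x, y):
    """Bregman divergence for the assumed generator f(v) = ||v||^2.

    BD(x, y) = f(x) - f(y) - <x - y, grad f(y)>, which for this
    generator reduces to the squared Euclidean distance ||x - y||^2.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grad_y = 2.0 * y  # gradient of ||v||^2 evaluated at y
    return float(x @ x - y @ y - (x - y) @ grad_y)

def total_bregman_divergence(x, y):
    """Total Bregman divergence: BD scaled by 1 / sqrt(1 + ||grad f(y)||^2)."""
    grad_y = 2.0 * np.asarray(y, dtype=float)
    return bregman_divergence(x, y) / float(np.sqrt(1.0 + grad_y @ grad_y))

if __name__ == "__main__":
    x, y = [0.0, 0.0], [3.0, 4.0]
    print(bregman_divergence(x, y))        # squared Euclidean distance
    print(total_bregman_divergence(x, y))  # same value divided by sqrt(1 + 100)
```

The normalizing factor in the denominator is what distinguishes TBD from BD; how the paper kernelizes and modifies it is not reproduced here.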
Abstract: Corner detection algorithms based on multi-scale analysis have attracted growing attention due to their promising performance. However, they consider only amplitude information, neglect phase information, and use only part of the multi-scale decomposition coefficients to detect corners. This limits their detection accuracy, repeatability and localization ability. This paper describes a new corner detector based on multi-scale analysis. To overcome the problems of bilateral margin responses, edge extension and lack of phase information in traditional shearlets, a novel complex shearlet transform is proposed to better localize distributed discontinuities and, in particular, to extract phase information from geometrical features. Moreover, a new rotary phase congruence tensor is proposed to utilize all amplitude and phase information for corner detection. Its noise tolerance and corner localization ability are further improved by screening and normalizing the amplitude information. Experimental results demonstrate that the localization ability and detection accuracy of the proposed method are superior to those of current detectors, and that its repeatability is generally higher than that of current detectors and recent machine learning based interest point detectors. (c) 2022 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/ )
Abstract: We present two novel shape signature-based reflection symmetry detection methods with their theoretical underpinning and empirical evaluation. The LIP-signature and the R-signature share similar beneficial properties that allow reflection symmetry directions to be detected with high performance. For the shape signature of a given shape, its merit profile is constructed to detect candidate symmetry directions. A verification process based on Radon projections is used to eliminate false candidates. The proposed methods can effectively deal with compound shapes, which are challenging for traditional contour-based methods. To quantify the symmetric efficiency, a new symmetry measure is proposed over the range [0, 1]. Furthermore, we introduce two symmetry shape datasets with a new evaluation protocol and a lost measure for evaluating symmetry detectors. Experimental results using standard and new datasets suggest that the proposed methods perform strongly compared with the state of the art. (c) 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license ( http://creativecommons.org/licenses/by-nc-nd/4.0/ )
Abstract: Weakly supervised object localization locates objects based on the localization map generated from a classification network. However, most existing methods use the information of the target class to locate objects based on the feature map of a single image, which ignores both inter-class and intra-class relationships. In this work, we propose a Gradient-based Refined Class Activation Map (GRCAM) approach to achieve more accurate localization. Two kinds of gradients are applied to reveal the inter-class and intra-class relationships during the testing stage. First, we exploit the gradients of the classification loss function with respect to the feature map to enhance class-specific information. The gradients of the classification loss reveal the connection among the predicted probabilities of all classes. Second, we design a regression function that measures the loss between the pseudo-bounding box coordinates containing category consistency and the predicted coordinates generated from the localization map. The predicted coordinates are revised by the gradients of the regression function, which reveal the consistency within a class. Despite its apparent simplicity, we demonstrate the advantages of GRCAM on ILSVRC and CUB-200-2011 in extensive experiments. In particular, on the ILSVRC dataset, the proposed GRCAM achieves a new state-of-the-art Top-1 localization error of 42.94%. (c) 2022 Elsevier Ltd. All rights reserved.
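For readers unfamiliar with localization maps, the sketch below shows a basic class activation map and a thresholded bounding box. It illustrates only the generic starting point that such methods refine, not GRCAM's gradient-based refinement. The array shapes, the clipping at zero, the 0.5 threshold and the function names are assumptions made for the example.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Weighted sum of feature maps for one class, scaled to [0, 1].

    features:   array of shape (C, H, W), the last convolutional features
    fc_weights: array of shape (num_classes, C), the classifier weights
    """
    # Contract the channel axis: (C,) with (C, H, W) gives (H, W).
    cam = np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)  # assumption: keep positive evidence only
    peak = cam.max()
    return cam / peak if peak > 0 else cam

def bounding_box(cam, threshold=0.5):
    """Tightest box (x_min, y_min, x_max, y_max) around cells >= threshold."""
    ys, xs = np.where(cam >= threshold)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((8, 7, 7))
    weights = rng.standard_normal((10, 8))
    print(bounding_box(class_activation_map(feats, weights, class_idx=3)))
```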
Abstract: Owing to the strong nonlinear representation capabilities of deep neural networks and the low storage cost and high efficiency of hash learning, deep cross-modal hashing has moved to the forefront of research. How to better bridge semantic relevance, and thereby narrow the semantic modality gap, is the key bottleneck in improving model performance. For samples with rich semantics, how to comprehensively explore the hidden correlations and establish more precise modality relationships is the primary issue to be solved. In this work, we propose a novel deep hashing method called Multi-Label Semantic Supervised Graph Attention Hashing (MS²GAH), an end-to-end framework that integrates graph attention networks (GATs). It constructs graph features through the adjacency of nodes and assigns different weights to adjacent edges to enhance the robustness of the model. At the same time, multi-label annotations are used to bridge the semantic relevance between modalities in a more fine-grained manner. To make better use of rich semantic information, an end-to-end label encoder is designed to mine high-level semantics from multi-label annotations and guide the feature extraction process of the modality-specific networks, thereby further narrowing the modality gap. Finally, extensive experiments on four datasets show that MS²GAH outperforms the other baselines. (c) 2022 Elsevier Ltd. All rights reserved.
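The low storage cost and high efficiency attributed to hash learning come from comparing short binary codes instead of real-valued features. The sketch below shows that retrieval step only, under the common assumption that codes are obtained by taking the sign of a network's output; the networks, the graph attention layers and the label encoder are not modeled, and the function names are hypothetical.

```python
import numpy as np

def to_hash_codes(features):
    """Binarize real-valued features into {-1, +1} codes by their sign."""
    return np.where(np.asarray(features) >= 0, 1, -1).astype(np.int32)

def hamming_distances(query_code, database_codes):
    """Hamming distance from one code to every row of a code matrix.

    For codes in {-1, +1} with K bits, distance = (K - <q, d>) / 2.
    """
    n_bits = query_code.shape[-1]
    return (n_bits - database_codes @ query_code) // 2

if __name__ == "__main__":
    # Hypothetical image-side and text-side features in a shared 4-bit space.
    text_db = to_hash_codes([[0.9, -0.2, 0.4, 0.1],
                             [-0.5, 0.3, -0.8, -0.1],
                             [0.2, 0.6, 0.7, 0.3]])
    image_query = to_hash_codes([0.7, -0.1, 0.5, 0.2])
    dist = hamming_distances(image_query, text_db)
    print(np.argsort(dist))  # database items ranked by closeness to the query
```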
Abstract: Accurate segmentation of brain magnetic resonance (MR) images is a key step in the quantitative analysis of brain images. The finite mixture model is one of the most widely used methods in brain MR image segmentation. However, due to intensity inhomogeneity artifacts and noise, the histogram of a brain MR image may follow a heavy-tailed or asymmetric distribution, which makes it hard for traditional finite mixture models, such as the Gaussian mixture model, to achieve accurate segmentation results. To alleviate these problems, a novel spatially constrained finite skew Student's-t mixture model is proposed in this paper. First, we propose anisotropic two-level spatial information, which combines the prior and posterior probabilities, to reduce the impact of noise. The proposed spatial information can preserve rich details, such as edges and corners. Second, we couple the anisotropic spatial information into the skew Student's-t distribution to fit observed intensity data that follow a heavy-tailed or asymmetric distribution. Third, we use a linear combination of a set of orthogonal basis functions to model the intensity inhomogeneities. Finally, the objective function integrates both tissue segmentation and bias field estimation. In the implementation, we use an improved expectation maximization (EM) algorithm to estimate the model parameters. The experimental results of our model on synthetic data and brain MR images are better than those of other state-of-the-art segmentation methods. (c) 2022 Elsevier Ltd. All rights reserved.
Abstract: Segmentation is essential for medical image analysis to identify and localize diseases, monitor morphological changes, and extract discriminative features for further diagnosis. Skin cancer is one of the most common types of cancer globally, and its early diagnosis is pivotal for the complete elimination of malignant tumors from the body. This research develops an Artificial Intelligence (AI) framework for supervised skin lesion segmentation using a deep learning approach. The proposed framework, called MFSNet (Multi-Focus Segmentation Network), uses differently scaled feature maps to compute the final segmentation mask from raw input RGB images of skin lesions. The images are first preprocessed to remove unwanted artifacts and noise. MFSNet employs the Res2Net backbone, a recently proposed convolutional neural network (CNN), to obtain deep features, which are used in a Parallel Partial Decoder (PPD) module to produce a global map of the segmentation mask. At different stages of the network, convolution features and multi-scale maps are used in two boundary attention (BA) modules and two reverse attention (RA) modules to generate the final segmentation output. When evaluated on three publicly available datasets (PH2, ISIC 2017, and HAM10000), MFSNet outperforms state-of-the-art methods, supporting the reliability of the framework. The code for the proposed approach is available at https://github.com/Rohit-Kundu/MFSNet .
Abstract: Analyzing the layout of a document to identify headers, sections, tables, figures, etc. is critical to understanding its content. Deep learning based approaches for detecting the layout structure of document images have been promising. However, these methods require a large number of annotated examples during training, which are both expensive and time consuming to obtain. We describe here a synthetic document generator that automatically produces realistic documents with labels for the spatial positions, extents and categories of the layout elements. The proposed generative process treats every physical component of a document as a random variable and models their intrinsic dependencies using a Bayesian Network graph. Our hierarchical formulation using stochastic templates allows parameter sharing between documents to retain broad themes, while the distributional characteristics produce visually unique samples, thereby capturing complex and diverse layouts. We empirically illustrate that a deep layout detection model trained purely on the synthetic documents can match the performance of a model that uses real documents. (c) 2022 Elsevier Ltd. All rights reserved.
Abstract: Occlusion is a severe problem for pedestrian detection in crowded scenes. The diversity of pedestrian postures and occlusion forms leads to false detections and missed detections. In this paper, we propose a pedestrian detection algorithm based on high quality proposal feature generation to improve detection performance. First, Dual-Region Feature Generation (DRFG) is proposed to generate high quality proposal features. Specifically, visible regions with less occlusion are introduced, and low-precision proposals are generated for both the full-body and visible regions. Proposals are then selected from these two kinds of proposals and matched in pairs, so as to guarantee a strong correspondence in information between the two proposals. Afterwards, the successfully matched proposal features are fused by Selective Kernel Feature Fusion (SKFF) to generate high quality proposal features. Second, Paired Multiple Instance Prediction (PMIP) is performed on the fused features to generate multiple prediction branches, each of which generates a full-body and a visible prediction box. Finally, Paired Non-Maximum Suppression (PNMS) is applied to the prediction boxes to reduce false positives. Experiments have been conducted on the CrowdHuman [1] and CityPersons [2] datasets. Compared with the baseline, our method achieves improvements of 5.9% AP and 1.5% MR^-2 on the above two datasets, verifying its effectiveness in crowded pedestrian detection. (c) 2022 Elsevier Ltd. All rights reserved.
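The final step above builds on non-maximum suppression. As a point of reference, the sketch below implements the standard greedy, single-box NMS, not the paired variant (PNMS) proposed in the paper. The box format (x1, y1, x2, y2), the 0.5 IoU threshold and the function names are assumptions for the example.

```python
import numpy as np

def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        if rest.size == 0:
            break
        # Drop every remaining box that overlaps the kept box too much.
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

if __name__ == "__main__":
    boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
    print(greedy_nms(boxes, scores=[0.9, 0.8, 0.7]))
```

In crowded scenes this greedy rule can suppress a true but heavily overlapped pedestrian, which is the kind of failure that motivates using visible-region boxes alongside full-body boxes.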
Beeche, Cameron; Singh, Jatin P.; Leader, Joseph K.; Gezer, Naciye S.; ...
12 pages
Abstract: Objective: To develop and validate a novel convolutional neural network (CNN) termed Super U-Net for medical image segmentation. Methods: Super U-Net integrates a dynamic receptive field module and a fusion upsampling module into the classical U-Net architecture. The model was developed and tested to segment retinal vessels, gastrointestinal (GI) polyps, and skin lesions on several image types (i.e., fundus images, endoscopic images, dermoscopic images). We also trained and tested the traditional U-Net architecture, seven U-Net variants, and two non-U-Net segmentation architectures. K-fold cross-validation was used to evaluate performance. The performance metrics included Dice similarity coefficient (DSC), accuracy, positive predictive value (PPV), and sensitivity. Results: Super U-Net achieved average DSCs of 0.808 ± 0.0210, 0.752 ± 0.019, 0.804 ± 0.239, and 0.877 ± 0.135 for segmenting retinal vessels, pediatric retinal vessels, GI polyps, and skin lesions, respectively. Super U-Net consistently outperformed U-Net, seven U-Net variants, and two non-U-Net segmentation architectures (p < 0.05). Conclusion: Dynamic receptive fields and fusion upsampling can significantly improve image segmentation performance. (c) 2022 Elsevier Ltd. All rights reserved.
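The four metrics listed in the Methods have standard definitions in terms of true and false positives and negatives. The sketch below computes them for a pair of binary masks; the function name and the choice to return 0.0 when a denominator is zero are assumptions, and the paper's exact evaluation code is not reproduced.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """DSC, accuracy, PPV and sensitivity for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # predicted foreground, truly foreground
    fp = int(np.sum(pred & ~truth))   # predicted foreground, truly background
    fn = int(np.sum(~pred & truth))   # missed foreground
    tn = int(np.sum(~pred & ~truth))  # correctly predicted background

    def ratio(num, den):
        # Assumption: report 0.0 when the denominator is zero.
        return num / den if den > 0 else 0.0

    return {
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "ppv": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
    }

if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 1, 0]]
    truth = [[1, 0, 0], [0, 1, 1]]
    print(segmentation_metrics(pred, truth))
```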