Abstract: This paper reports a new method (abbreviated AE-ELM-SynMin) to create synthetic minority-class samples for imbalanced classification based on an AutoEncoder Extreme Learning Machine (AE-ELM). AE-ELM-SynMin first trains an AE-ELM, a special ELM whose output target is identical to its input, i.e., the original minority-class samples. Second, crossover, mutation, and filtration operations are conducted on the hidden-layer output of the AE-ELM to obtain a synthetic hidden-layer output. Third, synthetic minority-class samples are created by decoding the synthetic hidden-layer output with the output-layer weights of the AE-ELM. AE-ELM-SynMin guarantees that the synthetic minority class carries more information than the original minority class while keeping a probability distribution consistent with it. The experimental results demonstrate better imbalanced-classification performance for AE-ELM-SynMin in comparison with the regular synthetic minority over-sampling technique (Regular-SMOTE) and its variants, e.g., Borderline-SMOTE, Random-SMOTE, and SMOTE-IPF. (c) 2021 Elsevier Ltd. All rights reserved.
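The encode, perturb-in-hidden-space, decode pipeline described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the data, layer sizes, mutation scale, and the use of plain least squares for the ELM output weights are all illustrative assumptions, and the filtration step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy minority-class data (hypothetical stand-in for the original samples).
X = rng.normal(size=(20, 5))

# ELM-style encoder: fixed random input weights + sigmoid activation.
W_in = rng.normal(size=(5, 8))
b = rng.normal(size=8)
H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))          # hidden-layer output

# Output-layer weights by least squares so that H @ W_out ~ X
# (autoencoder: the target equals the input).
W_out = np.linalg.lstsq(H, X, rcond=None)[0]

# Crossover: mix hidden codes of random sample pairs; mutation: small noise.
i = rng.integers(0, 20, size=30)
j = rng.integers(0, 20, size=30)
alpha = rng.uniform(size=(30, 1))
H_syn = alpha * H[i] + (1 - alpha) * H[j]
H_syn += rng.normal(scale=0.01, size=H_syn.shape)   # mutation

# Decode the synthetic hidden-layer output back to the input space.
X_syn = H_syn @ W_out
print(X_syn.shape)
```

Because the crossover is a convex combination of real hidden codes, the decoded samples stay close to the manifold of the original minority class, which is the intuition behind the distribution-consistency claim.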
Abstract: Most existing approaches for single image super-resolution (SISR) resort to high-quality low-resolution-high-resolution (LR-HR) pairs and available degradation kernels to train networks for a specific task at hand in a fully supervised manner. The labeled data used for training, however, are usually limited in terms of both the quantity and the diversity of degradation kernels. SR networks learned with one degradation kernel (e.g., bicubic) do not generalize well, and their performance deteriorates sharply on other kernels (e.g., blurring or noise). In this paper, we address this critical challenge for SISR: limited labeled LR images and degradation kernels. We propose a novel Semi-supervised Student-Teacher Super-Resolution approach, called S²TSR, that super-resolves both labeled and unlabeled LR images via adversarial learning. To better exploit the information from labeled LR images, we propose a student-teacher (S-T) framework with knowledge transfer from supervised learning (T) to unsupervised learning (S). Specifically, the S-T knowledge transfer is based on a shared SR network, partial weight sharing of dual discriminators, and a pair-matching network that also serves as a `latent discriminator'. Lastly, to learn better features from the limited labeled LR images, we propose a new SR network via non-local and attention mechanisms. Experiments demonstrate that our approach substantially improves on unsupervised methods and performs favorably against fully supervised methods.
Abstract: State-of-the-art object detection approaches are often composed of two stages, namely, proposing a number of regions on an image and classifying each of them into one class. Both stages share a network backbone that builds visual features in a bottom-up manner. In this paper, we advocate equipping two-stage detectors with top-down signals, which provide high-level contextual cues to complement the low-level features. In practice, this is implemented by adding a side path in the detection head to predict all object classes present in the image; the side path is co-supervised by image-level semantics and requires little extra overhead. Our approach is easily applied to two popular object detection algorithms and achieves consistent performance gains on the MS-COCO dataset.
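The image-level co-supervision signal can be sketched as a multi-label head on a backbone feature, trained with binary cross-entropy against the set of classes present in the image. Everything here is a hypothetical stand-in (feature size, the linear head, the 80-class label space borrowed from MS-COCO); the actual side path in the paper sits inside a full detection head.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(5)

# Hypothetical backbone feature for one image, and a linear side path
# that predicts which of C object classes appear anywhere in the image.
feat = rng.normal(size=64)
W = rng.normal(size=(64, 80)) / 8.0           # 80 classes, as in MS-COCO
probs = sigmoid(feat @ W)                     # image-level class probabilities

# Image-level multi-label BCE: the co-supervision loss for the side path.
target = np.zeros(80)
target[[0, 17]] = 1.0                         # e.g. two classes are present
bce = -np.mean(target * np.log(probs + 1e-9)
               + (1 - target) * np.log(1 - probs + 1e-9))
print(bce > 0.0)
```

The predicted class vector can then be fed back as a top-down cue to the per-region classifier, which is the complementary signal the abstract argues for.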
Abstract: The classification of hyperspectral images is a challenging task due to the high-dimensional space, the large number of spectral bands, and the low number of labeled training samples. To overcome these challenges, we propose a novel methodology for hyperspectral image classification based on multi-view deep neural networks that fuses spectral and spatial features using only a small number of labeled samples. Firstly, we process the initial hyperspectral image to extract a set of spectral and spatial features. Each spectral vector is the spectral signature of a pixel of the image. The spatial features are extracted using a simple deep autoencoder, which reduces the high dimensionality of the data while taking into account the neighborhood region of each pixel. Secondly, we propose a multi-view deep autoencoder model that fuses the spectral and spatial features extracted from the hyperspectral image into a joint latent representation space. Finally, a semi-supervised graph convolutional network is trained on the fused latent representation space to perform the hyperspectral image classification. The main advantage of the proposed approach is that it allows the automatic extraction of relevant information while preserving the spatial and spectral features of the data, and it improves the classification of hyperspectral images even when the number of labeled samples is low. Experiments are conducted on three real hyperspectral images: the Indian Pines, Salinas, and Pavia University datasets. The results show that the proposed approach is competitive with the state-of-the-art in classification performance.
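The two-view fusion step can be reduced to a linear sketch: each view gets its own encoder, and the encoded views are combined into one shared latent code per pixel. The dimensions, random weights, and additive fusion below are illustrative assumptions, not the paper's architecture (which uses deep autoencoders trained end-to-end).

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical per-pixel features: spectral signature + spatial code
# produced by a separate autoencoder over the pixel's neighborhood.
spectral = rng.normal(size=(100, 200))   # 100 pixels, 200 spectral bands
spatial = rng.normal(size=(100, 16))     # 16-dim spatial code per pixel

# One encoder per view, then an additive joint latent representation.
W_spec = rng.normal(size=(200, 32)) / np.sqrt(200)
W_spat = rng.normal(size=(16, 32)) / np.sqrt(16)
joint = np.tanh(spectral @ W_spec) + np.tanh(spatial @ W_spat)
print(joint.shape)
```

The joint code is what the downstream semi-supervised graph convolutional network would consume, one row per pixel.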
Abstract: In this paper, a user modeling task is examined by processing a mobile device's gallery of photos and videos. We propose a novel engine for preference prediction based on scene recognition, object detection, and facial analysis. At first, all faces in a gallery are clustered, and all private photos and videos with faces from large clusters are processed on the embedded system in offline mode. Other photos may be sent to a remote server to be analyzed by very deep, sophisticated neural networks. The visual features of each photo are obtained from scene recognition and object detection models. These features are aggregated into a single descriptor in a neural attention unit. The proposed pipeline is implemented in a mobile Android application. Experimental results on the Photo Event Collection, the Web Image Dataset for Event Recognition, and Amazon Fashion data demonstrate that images can be processed efficiently without significant accuracy degradation.
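The attention-based aggregation of per-photo features into a single gallery descriptor can be sketched as softmax-weighted pooling. The feature dimension and the fixed query vector below are placeholders; in the paper the attention unit is learned.

```python
import numpy as np

# Hypothetical per-photo visual features (e.g. scene + object descriptors).
feats = np.random.default_rng(1).normal(size=(6, 4))   # 6 photos, 4-dim each

# Attention query (learned in practice; fixed here for illustration).
q = np.ones(4)

# Softmax attention weights over photos, then a weighted sum
# collapses the gallery into one descriptor.
scores = feats @ q
w = np.exp(scores - scores.max())
w /= w.sum()
descriptor = w @ feats
print(descriptor.shape)
```

Unlike plain averaging, the softmax weights let a few highly relevant photos dominate the user descriptor.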
Abstract: We propose a novel solution that classifies very similar images (fine-grained classification) of variants of retail products displayed on supermarket racks. The proposed scheme simultaneously captures object-level and part-level cues of the product images. The object-level cues are captured with our novel reconstruction-classification network (RC-Net). For annotation-free modeling of part-level cues, the discriminative parts of the product images are identified around keypoints. The ordered sequences of these discriminative parts, encoded using a convolutional LSTM, describe the products uniquely. Finally, the part-level and object-level models jointly determine the products, explicitly providing coarse-to-fine descriptions of them. This bi-level architecture is embedded in an R-CNN for recognizing variants of retail products on the rack. We perform extensive experiments on one in-house and three benchmark datasets. The proposed scheme outperforms competing methods in almost all evaluations.
Abstract: Dimensionality reduction (DR) is a key preprocessing stage in high-dimensional data classification. Traditional linear DR algorithms, e.g., Linear Discriminant Analysis, transform the original data into a low-dimensional subspace with a linear transformation matrix. However, these methods cannot handle complex nonlinearly separable data. Although some nonlinear DR methods, e.g., Locally Linear Embedding, have been proposed to solve this problem, most of them are unsupervised: they focus only on the data structure hidden in the original high-dimensional space rather than maximizing the inter-class separability of the transformed data, which reduces classification accuracy. To tackle this challenge, a novel supervised nonlinear DR algorithm, distance metric restricted mixture factor analysis (DMR-MFA), is proposed for high-dimensional data classification. In DMR-MFA, the original data are divided into several clusters, and the generation of the original data in each cluster is described by a factor analysis model. Meanwhile, a distance metric constraint (DMC) is used to maximize the separability of the transformed low-dimensional data from different classes. Moreover, the optimal model parameters are learned via joint optimization of the log-likelihood function and the DMC loss function, which enables DMR-MFA to obtain more separable low-dimensional embeddings while accurately describing the original data. Experimental results on synthetic data, benchmark datasets, and high-resolution range profile data demonstrate that our method can handle nonlinearly separable data and improves the classification accuracy of high-dimensional data.
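The spirit of the distance metric constraint can be shown with a toy two-class loss: compress within-class scatter of the low-dimensional embeddings while forcing the class means at least a margin apart. The hinge form, the margin value, and the random linear projection standing in for the factor-analysis loadings are all assumptions for illustration; the paper's DMC is defined over its mixture model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two toy classes in the original 4-D space.
X0 = rng.normal(loc=0.0, size=(30, 4))
X1 = rng.normal(loc=2.0, size=(30, 4))

def dmc_loss(Z0, Z1, margin=1.0):
    """Toy distance-metric constraint: small within-class scatter,
    between-class mean distance at least `margin` (hinge penalty)."""
    within = (np.mean(np.linalg.norm(Z0 - Z0.mean(0), axis=1) ** 2)
              + np.mean(np.linalg.norm(Z1 - Z1.mean(0), axis=1) ** 2))
    between = np.linalg.norm(Z0.mean(0) - Z1.mean(0))
    return within + max(0.0, margin - between)

# A linear projection to 2-D stands in for the learned FA loadings.
A = rng.normal(size=(4, 2))
loss = dmc_loss(X0 @ A, X1 @ A)
print(np.isfinite(loss))
```

In DMR-MFA this penalty is minimized jointly with the log-likelihood of the mixture factor analysis model, so the embedding is both generative and discriminative.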
Abstract: In Internet of Things-enabled intelligent transportation systems, a huge amount of vehicle video data is generated, and real-time, accurate video analysis is important and challenging work, especially in complex street scenes. Therefore, we propose edge-computing-based video pre-processing to eliminate redundant frames, migrating part or all of the video-processing task to the edge, thereby reducing the computing, storage, and network-bandwidth requirements of the cloud center and enhancing the effectiveness of video analysis. To eliminate redundancy in the traffic video, motion-magnitude detection based on spatio-temporal interest points (STIP) and a multi-modal linear feature combination are presented, which split a video into super-frame segments of interest. After that, we select key frames from these interesting segments of the long videos via the design and detection of prominent regions. Finally, extensive numerical experiments show that our methods are superior to previous algorithms at the different stages of redundancy elimination, video segmentation, key-frame selection, and vehicle detection.
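The super-frame segmentation step can be sketched by thresholding a per-frame motion score: contiguous runs of high motion become segments of interest, and everything else is treated as redundant. The frame-difference norm below is a simple stand-in for the STIP-based motion magnitude, and the toy video and threshold are illustrative.

```python
import numpy as np

def super_frame_segments(frames, thresh=0.5):
    """Split a video into (start, end) segments of interest by
    thresholding a per-frame motion magnitude (frame-difference norm
    standing in for the STIP-based measure)."""
    motion = np.linalg.norm(np.diff(frames, axis=0), axis=(1, 2))
    active = motion > thresh
    segments, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t
        elif not a and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(active)))
    return segments

# Toy video: static 4x4 frames with a burst of change in the middle.
frames = np.zeros((10, 4, 4))
frames[4:7] += 1.0                       # frames 4-6 differ from neighbours
print(super_frame_segments(frames))      # [(3, 4), (6, 7)]
```

Only frames inside the returned segments would then be considered for key-frame selection, so static stretches never leave the edge device.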
Abstract: In this paper, we propose head pose estimation using deep neural networks and 3D point clouds. Unlike existing methods that take either a 2D RGB image or a 2D depth image as input, we adopt 3D point cloud data generated from depth to estimate 3D head poses. To further improve the robustness and accuracy of head pose estimation, we discretize the 3D angles of head poses into 36 classes at 5-degree intervals and predict the probability of each angle class with a multi-layer perceptron (MLP). While traditional iterative methods for head model construction require high computation and memory costs, the proposed method is lightweight and computationally efficient, utilizing a sampled 3D point cloud as input combined with a graph convolutional neural network (GCNN). Experimental results on the Biwi Kinect Head Pose dataset show that the proposed method achieves outstanding performance in head pose estimation and outperforms state-of-the-art methods in terms of accuracy.
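The 36-class, 5-degree discretization implies an angle range of 180 degrees per axis; assuming the conventional [-90, 90) range, the binning and the expected-angle decoding from class probabilities look like this (the range assumption and bin-center decoding are illustrative, not taken from the paper):

```python
import numpy as np

def angle_to_class(angle_deg, lo=-90.0, step=5.0, n_classes=36):
    """Map a head-pose angle in [-90, 90) degrees to one of 36 classes
    (5-degree bins), clamping out-of-range values to the edge bins."""
    idx = int((angle_deg - lo) // step)
    return min(max(idx, 0), n_classes - 1)

def class_probs_to_angle(probs, lo=-90.0, step=5.0):
    """Decode an angle as the probability-weighted mean of bin centers."""
    centers = lo + step * (np.arange(len(probs)) + 0.5)
    return float(probs @ centers)

print(angle_to_class(-90.0))            # 0
print(angle_to_class(12.3))             # 20
probs = np.zeros(36)
probs[20] = 1.0
print(class_probs_to_angle(probs))      # 12.5
```

Predicting a distribution over bins rather than regressing a raw angle is what lets the MLP express uncertainty between neighboring poses.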
Abstract: In this paper, we propose convex cone-based frameworks for image-set classification. Image-set classification aims to classify a set of images, usually obtained from video frames or multi-view cameras, as a target object. To classify a set accurately and stably, it is essential to represent the structural information of the set accurately. There are various image features, such as histogram-based features and convolutional neural network features; most of them are non-negative and thus can be effectively represented by a convex cone. This leads us to introduce the convex cone representation to image-set classification. To establish a convex cone-based framework, we mathematically define multiple angles between two convex cones and then use the angles to define the geometric similarity between them. Moreover, to enhance the framework, we introduce two discriminant spaces. We first propose a discriminant space that maximizes the gaps between cones and minimizes the within-class variance. We then extend it to a weighted discriminant space by introducing weights on the gaps to deal with complicated data distributions. In addition, to reduce the computational cost of the proposed methods, we develop a novel strategy for fast implementation. The effectiveness of the proposed methods is demonstrated experimentally on five databases.
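A simplified, linear stand-in for the cone-angle similarity: take the canonical (principal) angles between the subspaces spanned by each set's non-negative feature vectors, and average the squared cosines. The paper defines angles between the cones themselves (which respects the non-negativity constraint); this subspace version only conveys the geometric idea of comparing sets by multiple angles.

```python
import numpy as np

def principal_angle_similarity(A, B):
    """Mean squared cosine of the canonical angles between the subspaces
    spanned by the columns of A and B (a linear simplification of the
    cone-to-cone angles used in the paper)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)   # cosines of angles
    return float(np.mean(np.clip(s, 0.0, 1.0) ** 2))

rng = np.random.default_rng(3)
# Non-negative features (e.g. histograms) span a convex cone.
A = np.abs(rng.normal(size=(10, 3)))
B = np.abs(rng.normal(size=(10, 3)))
print(principal_angle_similarity(A, A))   # identical sets -> similarity 1
print(principal_angle_similarity(A, B) <= 1.0)
```

Classification then amounts to assigning a query set to the reference cone with the highest similarity.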