Abstract: How to effectively fuse inter- and intra-frame spatio-temporal information plays a key role in video super-resolution (VSR). Most existing works rely heavily on the accuracy of motion estimation and compensation for spatio-temporal feature alignment. However, they cannot perform well when suffering from large-scale and complex motions. To this end, this paper introduces an efficient and effective Interlaced Sparse Sinkhorn Matching (ISSM) network for VSR, which aligns supporting frames with the reference one in the feature space by learning optimal matching between image regions across frames. Specifically, the ISSM divides the input dense affinity matrix into two sparse block matrices: one can match long-distance regions while the other can match short-distance regions, and then we leverage an efficient Sinkhorn method on each block to learn optimal matching. Moreover, we insert a residual atrous spatial pyramid pooling module before the ISSM, which can flexibly generate multi-scale features frame by frame to capture the multi-scale context information in images. The aligned features of each adjacent frame are then fed to a bidirectional temporal fusion module to capture the rich temporal information. Finally, the fused features are sent into a frame-wise dynamic reconstruction network to produce an HR frame. Extensive evaluations on three benchmark datasets demonstrate the superiority of our method over the state-of-the-art methods in terms of PSNR and SSIM. (c) 2021 Elsevier Ltd. All rights reserved.
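For intuition, the Sinkhorn step at the heart of the matching described above can be sketched in plain NumPy (a minimal illustration, not the authors' implementation; the regularization strength `eps` and the iteration count are assumed values): alternating row and column normalizations of a Gibbs kernel turn an affinity matrix into a near doubly-stochastic soft matching between regions.

```python
import numpy as np

def sinkhorn(affinity, eps=0.5, iters=100):
    """Entropy-regularized Sinkhorn: alternately normalize the rows and
    columns of the Gibbs kernel exp(affinity / eps) so the result is a
    near doubly-stochastic soft matching."""
    K = np.exp(affinity / eps)
    for _ in range(iters):
        K /= K.sum(axis=1, keepdims=True)   # rows sum to 1
        K /= K.sum(axis=0, keepdims=True)   # columns sum to 1
    return K

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))             # one dense region-affinity block
P = sinkhorn(A)                             # rows and columns both sum to ~1
```

In the paper's interlaced-sparse setting this normalization would be applied independently to each sparse block (long-distance and short-distance) rather than to the full dense matrix.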
Abstract: Craniofacial reconstruction is applied to identify human remains in the absence of identification data (e.g., fingerprints, dental records, radiological materials, or DNA) by predicting the likeness of the unidentified remains based on the internal relationship between the skull and face. Conventional 3D methods are usually based on statistical models with limited capacity, which restricts the description of such a complex relationship. Moreover, the required high-quality data are difficult to collect. In this study, we present a novel craniofacial reconstruction paradigm that synthesizes craniofacial images from 2D computed tomography (CT) scans of skull data. The key idea is to recast craniofacial reconstruction as an image translation task, with the goal of generating corresponding craniofacial images from 2D skull images. To this end, we design an automatic skull-to-face transformation system based on deep generative adversarial nets. The system was trained on 4551 paired skull-face images obtained from 1780 CT head scans of the Han Chinese population. To the best of our knowledge, this is the only database of this magnitude in the literature. Finally, to accurately evaluate the performance of the model, a face recognition task employing five existing deep learning algorithms (FaceNet, SphereFace, CosFace, ArcFace, and MagFace) was tested on 102 reconstruction cases in a face pool composed of 1744 CT-scan face images. The experimental results demonstrate that the proposed method can be used as an effective forensic tool. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: As a sub-branch of behavioral biometrics, online signature verification systems deal with unique signing characteristics, which can be better differentiated by extracting habitual signing styles instead of geometric features in the case of perfect forgery. Even if the signatures are geometrically identical, the speed and frequency components of the signing process might vary significantly. Therefore, a novel framework is introduced as a new signature verification protocol for touchscreen devices using barcodes containing the dominant frequency component of the speed signals. A special interface is designed as a signature tracker to extract the displacement data sampled from the signing process. The speed signals are interpolated from the displacement data, and the frequency components of the signals are computed by scalogram analysis governed by continuous wavelet transforms (CWT). The signature barcodes are generated as 4-scale scalograms and classified by support vector machines (SVM). Among several compatible wavelets, the Gaussian derivative wavelet is selected for generating scalograms, and the results of the process are 2.25% FAR, 2.75% FRR, and 2.81% EER on our dataset. The framework is also tested on the SVC2004 dataset, where we achieved 0% FAR, 9.33% FRR, and 8% EER, as well as on the SUSIG-Visual, SUSIG-Blind, and MOBISIG databases, where we reached average EERs between 1.22% and 3.62%, which are competitive among the relevant results. Given the promising outcomes, signature barcoding is a very reliable method that can be executed by a simple touchscreen interface collecting the barcodes for storage and benchmarking when needed.
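The displacement-to-scalogram pipeline can be sketched as follows (a toy illustration, not the paper's implementation: the first-derivative-of-Gaussian mother wavelet and the four scales are assumed choices, and the barcode encoding and SVM stage are omitted):

```python
import numpy as np

def gaus1(t):
    """First derivative of a Gaussian, used here as the mother wavelet."""
    return -t * np.exp(-t**2 / 2)

def scalogram(signal, scales=(1, 2, 4, 8)):
    """4-scale scalogram: |CWT| of the signal at each scale, computed
    by convolving with the scaled, normalized wavelet."""
    rows = []
    for s in scales:
        t = np.arange(-4 * s, 4 * s + 1) / s
        w = gaus1(t) / np.sqrt(s)
        rows.append(np.abs(np.convolve(signal, w, mode="same")))
    return np.vstack(rows)

# Toy displacement samples from a signing trace; speed is its derivative.
x = np.cumsum(np.sin(np.linspace(0, 6 * np.pi, 200)))
speed = np.gradient(x)
S = scalogram(speed)        # shape (4, len(signal)): one row per scale
```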
Abstract: The person representation problem is a critical bottleneck in the player identification task. However, current approaches for player identification that utilize only whole-image features are not sufficient to preserve identities, due to their reliance on visible visual representations. In this paper, we propose a novel player representation method using a graph-powered pose representation to resolve this bottleneck. Our framework consists of three modules: (i) a novel pose-guided representation module that is able to capture pose changes dynamically and their associated effects; (ii) a pose-guided graph embedding module using both the image deep features and the pose structure information for better player representation inference; (iii) an identification module as a player classifier. Experimental results on real-world sports game scenarios demonstrate that our method achieves state-of-the-art identification performance, together with a better player representation. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: There is an urgent need for automated methods to assist accurate and effective assessment of COVID-19. Radiology and nucleic acid testing (NAT) are complementary COVID-19 diagnosis methods. In this paper, we present an end-to-end multitask learning (MTL) framework (COVID-MTL) that is capable of automated and simultaneous detection (against both radiology and NAT) and severity assessment of COVID-19. COVID-MTL learns different COVID-19 tasks in parallel through our novel random-weighted loss function, which assigns learning weights under a Dirichlet distribution to prevent task dominance; our new 3D real-time augmentation algorithm (Shift3D) introduces space variances for 3D CNN components by shifting low-level feature representations of volumetric inputs in three dimensions; thereby, the MTL framework is able to accelerate convergence and improve joint learning performance compared to single-task models. Using only chest CT scans, COVID-MTL was trained on 930 CT scans and tested on 399 separate cases. COVID-MTL achieved AUCs of 0.939 and 0.846, and accuracies of 90.23% and 79.20%, for detection of COVID-19 against radiology and NAT, respectively, which outperformed the state-of-the-art models. Meanwhile, COVID-MTL yielded AUCs of 0.800 +/- 0.020 and 0.813 +/- 0.021 (with transfer learning) for classifying control/suspected, mild/regular, and severe/critically-ill cases. To decipher the recognition mechanism, we also identified high-throughput lung features that were significantly related (P < 0.001) to the positivity and severity of COVID-19. (c) 2021 Elsevier Ltd. All rights reserved.
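The random-weighted loss idea described above can be illustrated with a minimal sketch (not the COVID-MTL implementation; the concentration parameter `alpha` and the example loss values are assumptions): per-task losses are combined with weights drawn from a symmetric Dirichlet distribution, so every combination is a convex one and no task can permanently dominate training.

```python
import numpy as np

def random_weighted_loss(task_losses, rng, alpha=1.0):
    """Combine per-task losses with weights sampled from a symmetric
    Dirichlet(alpha) distribution. The weights sum to 1, so the result
    is a convex combination of the individual task losses."""
    w = rng.dirichlet(alpha * np.ones(len(task_losses)))
    return float(np.dot(w, task_losses))

rng = np.random.default_rng(42)
# e.g. detection-vs-radiology, detection-vs-NAT, severity (toy values)
losses = [0.9, 0.4, 1.3]
total = random_weighted_loss(losses, rng)   # always within [min, max] of losses
```

Resampling the weights at every step is what varies the per-task emphasis over training while keeping the expected weight of each task equal.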
Abstract: This paper introduces a new DBSCAN-based method for boundary detection and plane segmentation in 3D point clouds. The proposed method is based on candidate sample selection in 3D space and plane validity detection, revising the classical DBSCAN clustering algorithm to obtain a valid fitting plane. Technically, a coplanarity threshold is designed as an additional clustering condition to group 3D points whose distances to the fitting plane satisfy the constraint of the threshold into one cluster. The threshold value is automatically adjusted to fit the local distribution of samples in the input dataset, which frees the method from parameter tuning. Planar objects can be detected by the proposed method since a cluster contains only data points belonging to one plane, and the boundaries among different planes can be correctly detected. Experimental evaluations are performed on both synthetic and real point cloud datasets. Results show that the proposed approach is effective for plane segmentation and yields high-quality segmentation of intersection boundaries. (c) 2021 Elsevier Ltd. All rights reserved.
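The extra coplanarity condition can be sketched as follows (an illustrative fragment, not the paper's method: here the threshold `tau` is fixed, whereas the paper adjusts it automatically to the local sample distribution): a candidate point joins a cluster only if its distance to the cluster's least-squares fitting plane stays within the threshold.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through points: returns (centroid, unit normal).
    The normal is the right-singular vector of the centered points with
    the smallest singular value."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    return c, vt[-1]

def coplanar(cluster_points, p, tau=0.05):
    """Additional DBSCAN clustering condition: accept p into the cluster
    only if its point-to-plane distance is within tau."""
    c, n = fit_plane(cluster_points)
    return bool(abs(np.dot(p - c, n)) <= tau)

cluster = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0.01]])
on_plane = coplanar(cluster, np.array([0.5, 0.5, 0.0]))   # lies on the plane
off_plane = coplanar(cluster, np.array([0.5, 0.5, 1.0]))  # far from the plane
```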
Abstract: Domain adaptation is proposed to generalize learning machines and address the performance degradation of models that are trained on one specific source domain but applied to novel target domains. Existing domain adaptation methods focus on transferring holistic features, whose discriminability is generally tailored to be source-specific and too weakly generic to be transferable. As a result, standard domain adaptation on holistic features usually damages feature structures, especially local feature statistics, and deteriorates the learned discriminability. To alleviate this issue, we propose to transfer primitive local feature patterns, whose discriminability is shown to be inherently more sharable, and to perform hierarchical feature adaptation. Concretely, we first learn a cluster of domain-shared local feature patterns and partition the feature space into cells. Local features are adaptively aggregated inside each cell to obtain cell features, which are further integrated into holistic features. To achieve fine-grained adaptation, we simultaneously perform alignment on local features, cell features, and holistic features; during this process, the local and cell features are aligned independently inside each cell to maintain the learned local structures and prevent negative transfer. Experimenting on typical one-to-one unsupervised domain adaptation for both image classification and action recognition tasks, partial domain adaptation, and domain-agnostic adaptation, we show that the proposed method achieves more reliable feature transfer by consistently outperforming state-of-the-art models, and that the learned domain-invariant features generalize well to novel domains. (c) 2021 Elsevier Ltd. All rights reserved.
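The cell-aggregation step can be sketched as follows (a simplified illustration, not the paper's method: pattern learning and the adaptive weighting are replaced by fixed prototypes and plain averaging): each local feature is assigned to its nearest domain-shared pattern, partitioning the feature space into cells, and the features inside each cell are aggregated into one cell feature.

```python
import numpy as np

def cell_features(local_feats, patterns):
    """Assign each local feature to its nearest shared pattern ("cell"),
    then mean-aggregate the features inside each cell."""
    # Pairwise distances: (num_feats, num_patterns)
    d = np.linalg.norm(local_feats[:, None, :] - patterns[None, :, :], axis=-1)
    assign = d.argmin(axis=1)
    cells = np.zeros_like(patterns)
    for k in range(len(patterns)):
        members = local_feats[assign == k]
        if len(members):
            cells[k] = members.mean(axis=0)
    return cells

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 8))   # local features from one sample
protos = rng.standard_normal((5, 8))    # stand-ins for learned shared patterns
C = cell_features(feats, protos)        # one aggregated feature per cell
```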
Abstract: Existing work on Automated Machine Learning (AutoML) is mainly based on predefined search spaces. This paper seeks the synergetic automation of two ingredients, i.e., the search space and the search strategy. Specifically, we formulate the automation of search spaces and search strategies as a combinatorial optimization problem. Our empirical study on many architecture benchmarks shows that identifying a suitable search space has a greater effect than choosing a sophisticated search strategy. Motivated by this, we attempt to leverage a machine learning method to solve the discrete optimization problem, and thus develop a Layered Architecture Search Tree (LArST) approach to synergize these two components. In addition, we use a probe model-based method to extract dataset-wise features, i.e., meta-features, which facilitates the estimation of a proper search space and search strategy for a given task. Experimental results show the efficacy of our approach under different search mechanisms and across various datasets and hardware platforms. (c) 2021 Elsevier Ltd. All rights reserved.
Abstract: Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have been primarily adopted on top of standard convolutional or recurrent layers, improving the overall generalization capability. In this work, we introduce the Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborate networks that mix convolutional, recurrent, and attentive layers. In order to limit computational and energy requirements, building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time, short-time HAR. The proposed methodology was extensively tested on MPOSE2021 and compared to several state-of-the-art architectures, proving the effectiveness of the AcT model and laying the foundations for future work on HAR. (c) 2021 Elsevier Ltd. All rights reserved.
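The core building block of such a fully self-attentional model over a pose window can be sketched in a few lines (a deliberately minimal illustration, not the AcT architecture: projections are identity, there is a single head, and the window/joint sizes are assumed): each frame's pose vector attends to every other frame in the temporal window.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention over a sequence of pose tokens, with
    identity Q/K/V projections for brevity."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)
    # Numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

# A 30-frame window of 2D poses with 13 joints -> 26-dim tokens (assumed sizes)
poses = np.random.default_rng(0).standard_normal((30, 26))
out = self_attention(poses)   # same shape as the input window
```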
Abstract: This paper aims to provide a compact but accessible introduction to Conformal Predictors (CP), a Machine Learning method with the distinguishing property of producing predictions that exhibit a chosen error rate. This property, referred to as validity, is backed not only by asymptotic but also by finite-sample probabilistic guarantees. CPs differ from the conventional approach to prediction in that they introduce hedging in the form of set-valued predictions. The CP validity guarantees do not require assumptions such as priors, but are of broad applicability as they rely solely on exchangeability. The CP framework is universal in the sense that it operates on top of virtually any Machine Learning method. In addition to the formal definition, this introduction discusses CP variants that can be computed efficiently (Inductive or "split" CP) or that are suitable for imbalanced data sets (class-conditional CP). Finally, a short survey of the field provides references for relevant research and highlights the variety of domains in which CPs have found valuable application. (c) 2021 Elsevier Ltd. All rights reserved.
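The Inductive ("split") CP variant mentioned above can be sketched as follows (a minimal illustration under the standard split-CP recipe; the nonconformity scores and significance level are assumed toy values): a candidate label stays in the prediction set when its nonconformity score is not too extreme relative to a held-out calibration set, which is what yields the finite-sample error-rate guarantee under exchangeability.

```python
import numpy as np

def in_prediction_set(cal_scores, test_score, alpha=0.1):
    """Split conformal prediction: keep the candidate if its conformal
    p-value exceeds the significance level alpha. The +1 terms account
    for the test example itself."""
    n = len(cal_scores)
    p_value = (np.sum(cal_scores >= test_score) + 1) / (n + 1)
    return bool(p_value > alpha)

rng = np.random.default_rng(1)
cal = rng.standard_normal(99)            # calibration nonconformity scores
typical = in_prediction_set(cal, 0.0)    # unremarkable score: kept
extreme = in_prediction_set(cal, 10.0)   # highly nonconforming: excluded
```

With this construction the prediction set misses the true label at most an alpha fraction of the time, on average, for any underlying model that produced the scores.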