Abstract: Background Synthesizing dance motions to match musical inputs is a significant challenge in animation research. Compared to functional human motions, such as locomotion, dance motions are creative and artistic, often influenced by music, and can be independent body language expressions. Dance choreography requires motion content to follow a general dance genre, whereas dance performances under musical influence are infused with diverse impromptu motion styles. Considering the high expressiveness and variations in space and time, providing accessible and effective user control for tuning dance motion styles remains an open problem. Methods In this study, we present a hierarchical framework that decouples the dance synthesis task into independent modules. We use a high-level choreography module built as a Transformer-based sequence model to predict the long-term structure of a dance genre and a low-level realization module that implements dance stylization and synchronization to match the musical input or user preferences. This novel framework allows the individual modules to be trained separately. Because of the decoupling, dance composition can fully utilize existing high-quality dance datasets that do not have musical accompaniments, and the dance implementation can conveniently incorporate user controls and edit motions through a decoder network. Each module is replaceable at runtime, which adds flexibility to the synthesis of dance sequences. Results Synthesized results demonstrate that our framework generates high-quality, diverse dance motions that are well adapted to varying musical conditions and user controls.
Abstract: We introduce CURDIS, a template for algorithms to discretize arcs of regular curves by incrementally producing a list of support pixels covering the arc. In this template, algorithms proceed by finding the tangent quadrant at each point of the arc and determining through which side the curve exits the pixel according to a tailored criterion. These two elements can be adapted for any type of curve, leading to algorithms dedicated to the shape of specific curves. While the calculation of the tangent quadrant for various curves, such as lines, conics, or cubics, is simple, it is more complex to analyze how pixels are traversed by the curve. In the case of conic arcs, we found a criterion for determining the pixel exit side. This leads us to present a new algorithm, called CURDIS-C, specific to the discretization of conics, for which we provide all the details. Surprisingly, the criterion for conics requires between one and three sign tests and four additions per pixel, making the algorithm efficient for resource-constrained systems and feasible for fixed-point or integer arithmetic implementations. Our algorithm also perfectly handles the pathological cases in which the conic intersects a pixel twice or changes quadrants multiple times within this pixel, achieving this generality at the cost of potentially computing up to two square roots per arc. We illustrate the use of CURDIS for the discretization of different curves, such as ellipses, hyperbolas, and parabolas, even when they degenerate into lines or corners.
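The abstract notes that the tangent quadrant of a conic is simple to compute. As a rough illustration only, and not the CURDIS-C procedure itself, the sketch below assumes the conic is given implicitly as a*x^2 + b*x*y + c*y^2 + d*x + e*y + f = 0 and obtains a tangent direction by rotating the gradient by 90 degrees. The function names, the choice of orientation, and the half-open quadrant convention are all assumptions made for this example.

```python
def conic_gradient(coeffs, x, y):
    """Gradient of F(x, y) = a*x^2 + b*x*y + c*y^2 + d*x + e*y + f."""
    a, b, c, d, e, _f = coeffs
    return (2 * a * x + b * y + d, b * x + 2 * c * y + e)


def tangent_quadrant(coeffs, x, y):
    """Quadrant (1 to 4) of one tangent direction at a point on the conic.

    Assumption: the tangent is the gradient rotated by +90 degrees; the
    opposite orientation is equally valid and flips the quadrant.
    Axis-aligned directions are assigned with a half-open convention.
    """
    gx, gy = conic_gradient(coeffs, x, y)
    tx, ty = -gy, gx  # perpendicular to the gradient
    if tx == 0 and ty == 0:
        raise ValueError("singular point: the gradient vanishes")
    if tx > 0 and ty >= 0:
        return 1
    if tx <= 0 and ty > 0:
        return 2
    if tx < 0 and ty <= 0:
        return 3
    return 4


# Circle x^2 + y^2 - 25 = 0, evaluated at the point (3, 4).
circle = (1, 0, 1, 0, 0, -25)
print(tangent_quadrant(circle, 3, 4))
```

Only sign tests on two linear expressions are needed here, which is consistent with the abstract's remark that this step is the easy part; the pixel exit criterion is the harder element and is not attempted in this sketch.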
Robert KOSK, Richard SOUTHERN, Lihua YOU, Shaojun BIAN...
pp. 383-395
Abstract: Background Deep 3D morphable models (deep 3DMMs) play an essential role in computer vision. They are used in facial synthesis, compression, reconstruction and animation, avatar creation, virtual try-on, facial recognition systems and medical imaging. These applications require high spatial and perceptual quality of synthesised meshes. Despite their significance, these models have not been compared with different mesh representations and evaluated jointly with point-wise distance and perceptual metrics. Methods We compare the influence of different mesh representation features on the spatial and perceptual fidelity of meshes reconstructed by various deep 3DMMs. This paper proves the hypothesis that building deep 3DMMs from meshes represented with global representations leads to lower spatial reconstruction error, measured with L1 and L2 norm metrics, but underperforms on perceptual metrics. In contrast, using differential mesh representations, which describe differential surface properties, yields lower perceptual FMPD and DAME errors but higher spatial fidelity error. The influence of mesh feature normalisation and standardisation is also compared and analysed from perceptual and spatial fidelity perspectives. Results The results presented in this paper provide guidance in selecting mesh representations to build deep 3DMMs according to spatial and perceptual quality objectives and propose combinations of mesh representations and deep 3DMMs which improve either perceptual or spatial fidelity of existing methods.
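The spatial error in this abstract is measured with L1 and L2 norm metrics. The abstract does not give the exact definitions, so the following is a minimal sketch under assumptions: both meshes share the same vertex ordering, and the per-vertex L1 and L2 distances are averaged over all vertices. The function name `mesh_errors` is hypothetical.

```python
import math


def mesh_errors(original, reconstructed):
    """Mean per-vertex L1 and L2 distance between two meshes.

    Assumption: vertices are (x, y, z) tuples in one-to-one correspondence.
    """
    if len(original) != len(reconstructed) or not original:
        raise ValueError("meshes must be non-empty with equal vertex counts")
    l1_total = 0.0
    l2_total = 0.0
    for p, q in zip(original, reconstructed):
        diffs = [a - b for a, b in zip(p, q)]
        l1_total += sum(abs(d) for d in diffs)            # L1 norm of the offset
        l2_total += math.sqrt(sum(d * d for d in diffs))  # L2 norm of the offset
    n = len(original)
    return l1_total / n, l2_total / n


original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
print(mesh_errors(original, reconstructed))
```

Point-wise metrics of this kind ignore surface properties such as curvature, which is why the abstract evaluates them jointly with the perceptual metrics FMPD and DAME; those are not sketched here.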
Abstract: Background Co-salient object detection (Co-SOD) aims to identify and segment commonly salient objects in a set of related images. However, most current Co-SOD methods encounter issues with the inclusion of irrelevant information in the co-representation. These issues hamper their ability to locate co-salient objects and significantly restrict the accuracy of detection. Methods To address this issue, this study introduces a novel Co-SOD method with iterative purification and predictive optimization (IPPO) comprising a common salient purification module (CSPM), predictive optimizing module (POM), and diminishing mixed enhancement block (DMEB). Results These components are designed to explore noise-free joint representations, assist the model in enhancing the quality of the final prediction results, and significantly improve the performance of the Co-SOD algorithm. Furthermore, through a comprehensive evaluation of IPPO and state-of-the-art algorithms focusing on the roles of CSPM, POM, and DMEB, our experiments confirmed that these components are pivotal in enhancing the performance of the model, substantiating the significant advancements of our method over existing benchmarks. Experiments on several challenging benchmark co-saliency datasets demonstrate that the proposed IPPO achieves state-of-the-art performance.
Abstract: Background Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because of the diversity in the shapes and sizes of tables, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detection results might lead to the loss of critical information. Methods Therefore, we propose a novel end-to-end trainable deep network combined with a self-supervised pretraining transformer for feature extraction to minimize incorrect detections. To better deal with table areas of different shapes and sizes, we added a dual-branch context content attention module (DCCAM) to high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replaced the original 3×3 convolution with a multilayer residual module, which contains enhanced gradient flow information to improve the feature representation and extraction capability. Results We evaluated our method on public document datasets and compared it with previous methods; it achieved state-of-the-art results in terms of evaluation metrics such as recall and F1-score. https://github.com/YongZ-Lee/TD-DCCAM
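The evaluation above reports recall and F1-score. The abstract does not describe the matching protocol, so the sketch below assumes a common convention for detection tasks: a predicted box counts as correct when its intersection over union (IoU) with a not-yet-matched ground-truth box reaches a threshold. The function names, the greedy matching, and the default threshold of 0.5 are assumptions for illustration and are not taken from the paper or its repository.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_scores(predicted, ground_truth, threshold=0.5):
    """Precision, recall and F1 with greedy one-to-one IoU matching."""
    unmatched = list(ground_truth)
    true_pos = 0
    for box in predicted:
        # Pick the best remaining ground-truth box for this prediction.
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            true_pos += 1
    false_pos = len(predicted) - true_pos
    false_neg = len(ground_truth) - true_pos
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


tables = [(0, 0, 10, 10), (20, 20, 30, 30)]       # ground-truth table boxes
detections = [(0, 0, 10, 10), (50, 50, 60, 60)]   # one hit, one false alarm
print(detection_scores(detections, tables))
```

Recall penalizes missed tables, which corresponds to the loss of critical information the abstract warns about, while F1 also accounts for false detections.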