Abstract: In recent years, the demand for real-time data processing has been increasing, and various stream processing systems have emerged. When the amount of data input to a stream processing system fluctuates, the computing resources required by the stream processing job also change. The resources used by stream processing jobs therefore need to be adjusted according to load changes to avoid wasting computing resources. Existing works adjust stream processing jobs under the assumption that there is a linear relationship between operator parallelism and operator resource consumption (e.g., throughput), which leads to significant deviation as operator parallelism increases. This paper proposes a nonlinear model to represent operator performance. We divide operator performance into three stages: the Non-competition stage, the Non-full competition stage, and the Full competition stage. Using the proposed performance model, given the parallelism of an operator, we can accurately predict its CPU utilization and throughput. In our experiments, the prediction error of the model is below 5%. We also propose a quick accurate auto-scaling (QAAS) method that uses the operator performance model to automatically scale the operator parallelism of a Flink job. Compared to previous work, QAAS maintains stable job performance under load changes, minimizes the number of job adjustments, and reduces data backlogs by 50%.
Abstract: Data hierarchy, as a hidden property of data structure, exists in a wide range of machine learning applications. A common practice for classifying such hierarchical data is to first encode the data in Euclidean space and then train a Euclidean classifier. However, this paradigm leads to a performance drop due to the distortion of data embeddings in Euclidean space. To relieve this issue, hyperbolic geometry has been investigated as an alternative space for encoding hierarchical data because of its greater ability to capture hierarchical structures. Existing hyperbolic methods, however, cannot exploit the full potential of hyperbolic geometry, because they define the hyperbolic operations in the tangent plane, which distorts the data embeddings. In this paper, we develop two novel kernel formulations in hyperbolic space, one positive definite (PD) and the other indefinite, to solve classification tasks in hyperbolic space. The PD kernel is defined by mapping the hyperbolic data to the Drury-Arveson (DA) space, which is a special reproducing kernel Hilbert space (RKHS). To further increase the discriminative power of the classifier, an indefinite kernel is defined in Kreĭn spaces. Specifically, we design a 2-layer nested indefinite kernel that first maps hyperbolic data into the DA space, followed by a mapping from the DA space to a Kreĭn space. Extensive experiments on real-world datasets demonstrate the superiority of the proposed kernels.
Abstract: Heterogeneous information networks (HINs) have recently been widely adopted to describe complex graph structures in recommendation systems, proving their effectiveness in modeling complex graph data. Although existing HIN-based recommendation studies have achieved great success by performing message propagation between connected nodes on predefined metapaths, they have the following major limitations. First, existing works mainly convert heterogeneous graphs into homogeneous graphs by defining metapaths, which is not expressive enough to capture the more complicated dependency relationships involved along a metapath. Second, the heterogeneous information is mostly provided by item attributes, while social relations between users are not adequately considered. To tackle these limitations, we propose a novel social recommendation model, MPISR, which models MetaPath Interaction for Social Recommendation on heterogeneous information networks. Specifically, our model first learns initial node representations through a pretraining module and then identifies potential social friends and item relations based on their similarity to construct a unified HIN. We then develop a two-way encoder module, consisting of a similarity encoder and an instance encoder, to capture the similarity-based collaborative signals and the relational dependencies on different metapaths. Extensive experiments on five real datasets demonstrate the effectiveness of our method.
Abstract: Commonsense question answering (CQA) requires understanding and reasoning over the QA context and related commonsense knowledge, such as a structured knowledge graph (KG). Existing studies combine language models and graph neural networks to model inference. However, traditional knowledge graphs are mostly concept-based and ignore the direct path evidence necessary for accurate reasoning. In this paper, we propose MRGNN (Meta-path Reasoning Graph Neural Network), a novel model that comprehensively captures sequential semantic information from concepts and paths. In MRGNN, meta-paths are introduced as direct inference evidence, and an original graph neural network is adopted to aggregate features from both concepts and paths simultaneously. We conduct extensive experiments on the CommonsenseQA and OpenBookQA datasets, showing the effectiveness of MRGNN. We also conduct ablation experiments and explain the reasoning behavior through a case study.
Abstract: Generating photo-realistic images from a text description is a challenging problem in computer vision. Previous works have shown promising performance in generating synthetic images conditioned on text with Generative Adversarial Networks (GANs). In this paper, we focus on category-consistent and relativistic diverse constraints to optimize the diversity of synthetic images. Based on those constraints, a category-consistent and relativistic diverse conditional GAN (CRD-CGAN) is proposed to synthesize K photo-realistic images simultaneously. We use an attention loss and a diversity loss to improve the sensitivity of the GAN to word attention and noise. Then, we employ a relativistic conditional loss to estimate the probability that a synthetic image is relatively real or fake, which improves on the basic conditional loss. Finally, we introduce a category-consistent loss to alleviate over-category issues among the K synthetic images. We evaluate our approach on the Caltech-UCSD Birds-200-2011, Oxford 102 Flower, and MS COCO 2014 datasets. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods in terms of the photorealism and diversity of the generated synthetic images.
Abstract: Traditional image-sentence cross-modal retrieval methods usually aim to learn consistent representations of heterogeneous modalities, so that similar instances in one modality can be retrieved according to a query from another modality. The basic assumption behind these methods is that parallel multi-modal data (i.e., different modalities of the same example are aligned) can be obtained in advance. In other words, the image-sentence cross-modal retrieval task is a supervised task with the alignments as ground truth. However, in many real-world applications, it is difficult to realign a large amount of parallel data for new scenarios due to the substantial labor costs. The result is non-parallel multi-modal data, to which existing methods cannot be applied directly. On the other hand, auxiliary parallel multi-modal data with similar semantics often exists and can help the non-parallel data learn consistent representations. Therefore, in this paper, we aim at "Alignment Efficient Image-Sentence Retrieval" (AEIR), which resorts to auxiliary parallel image-sentence data as the source domain data and takes the non-parallel data as the target domain data. Unlike single-modal transfer learning, AEIR learns consistent image-sentence cross-modal representations for the target domain by transferring the alignments of existing parallel data. Specifically, AEIR learns consistent image-sentence representations in the source domain with parallel data, while transferring the alignment knowledge across domains by jointly optimizing a newly designed cross-domain cross-modal metric-learning-based constraint with an intra-modal domain adversarial loss. Consequently, we can effectively learn consistent representations for the target domain, considering both structure and semantic transfer. Furthermore, extensive experiments on different transfer scenarios validate that AEIR achieves better retrieval results than the baselines.
Abstract: Federated learning is a promising learning paradigm that allows collaborative training of models across multiple data owners without sharing their raw datasets. To enhance privacy in federated learning, multi-party computation can be leveraged for secure communication and computation during model training. This survey provides a comprehensive review of how to integrate mainstream multi-party computation techniques into diverse federated learning setups for guaranteed privacy, as well as the corresponding optimization techniques for improving model accuracy and training efficiency. We also pinpoint future directions for deploying federated learning to a wider range of applications.
Abstract: Reachability queries play a vital role in many graph analysis tasks. Previous research has proposed many methods to efficiently answer reachability queries between vertex pairs. Since many real graphs are labeled graphs, there is strong demand for Label-Constrained Reachability (LCR) queries, in which the constraint includes a set of labels in addition to the vertex pair. Recent research has proposed several methods for answering LCR queries that require the labels specified in the constraint to appear in the path. Besides a label set, the query constraint may also be an ordered list of labels, giving OLCR (Ordered-Label-Constrained Reachability) queries, which retrieve paths matching a sequence of labels. Currently, no solutions are available for OLCR. Here, we propose DHL, a novel Bloom-filter-based indexing technique for answering OLCR queries. DHL can be used to check reachability between vertex pairs. If the index does not rule out reachability, a constrained DFS is then performed. In other words, we employ DHL followed by a constrained DFS to answer OLCR queries. We show that DHL has a bounded false positive rate and that it substantially reduces indexing time and space. Extensive experiments on 10 real-life graphs and 12 synthetic graphs demonstrate that DHL achieves about 4.8-22.5 times smaller index space and 4.6-114 times less index construction time than two state-of-the-art techniques for LCR queries, while achieving comparable query response time. The results also show that our algorithm can answer OLCR queries effectively.
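The two-step query procedure described in this abstract (an index check that can answer "no", followed by a constrained DFS) can be illustrated with a minimal Python sketch. This is not the DHL index itself: the function name `olcr_reachable`, the `may_reach` callback standing in for the Bloom-filter index, the edge-triple graph format, and the assumption that a path matches when its edge labels spell out the label sequence exactly are all hypothetical choices made for illustration.

```python
def olcr_reachable(edges, source, target, labels, may_reach=lambda s, t: True):
    """Return True if some path from source to target has edge labels
    that spell out `labels` in order (an assumed OLCR semantics).

    edges     : iterable of (u, label, v) triples (hypothetical format)
    may_reach : stand-in for an index lookup; it may return False only
                when target is definitely unreachable from source
    """
    # Step 1: index check. A definite "no" ends the query early.
    if not may_reach(source, target):
        return False

    # Build an adjacency map keyed by (vertex, label).
    adj = {}
    for u, label, v in edges:
        adj.setdefault((u, label), []).append(v)

    # Step 2: constrained DFS over (vertex, position-in-sequence) states.
    stack = [(source, 0)]
    seen = {(source, 0)}
    while stack:
        vertex, pos = stack.pop()
        if pos == len(labels):
            if vertex == target:
                return True
            continue
        # Only follow edges whose label equals the next required label.
        for nxt in adj.get((vertex, labels[pos]), ()):
            state = (nxt, pos + 1)
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return False


if __name__ == "__main__":
    g = [("a", "x", "b"), ("b", "y", "c"), ("a", "y", "c")]
    print(olcr_reachable(g, "a", "c", ["x", "y"]))
    print(olcr_reachable(g, "a", "c", ["y", "x"]))
```

Tracking visited `(vertex, position)` pairs rather than vertices alone keeps the search bounded by the number of vertices times the sequence length, even when the graph contains cycles.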
Abstract: As a carrier of knowledge, papers have been a popular choice since ancient times for documenting everything from major historical events to breakthroughs in science and technology. With the booming development of science and technology, the number of papers has been growing exponentially. Just as the Internet of Things (IoT) allows the world to be connected in a flatter way, what will the network formed by massive numbers of academic papers look like? Most existing visualization methods can only handle up to hundreds of thousands of nodes, far fewer than academic networks, which are usually composed of millions of nodes or more. In this paper, we are thus motivated to break this scale limit and design a new visualization method specifically for super-large-scale academic networks (VSAN). Nodes can represent papers or authors, while edges represent the relations (e.g., citation, coauthorship) between them. To comprehensively improve the visualization effect, three levels of optimization are taken into account progressively in the design of VSAN, i.e., supported scale, loading speed, and the effect of layout details. Our main contributions are twofold: 1) We design an equivalent segmentation layout method that goes beyond the limit encountered by state-of-the-art methods, making it possible to visually reveal the correlations among larger-scale academic entities. 2) We further propose a hierarchical slice loading approach that enables users to observe the visualized graphs of the academic network at both macroscopic and microscopic levels, with the ability to quickly zoom between levels. In addition, we propose a "jumping between nebula graphs" method that connects the static pages of many academic graphs and helps users form a more systematic and comprehensive understanding of various academic networks. Applying our methods to three academic paper citation datasets in the AceMap database confirms the visualization scalability of VSAN, in the sense that it can visualize academic networks with more than 4 million nodes. The super-large-scale visualization not only reveals a galaxy-like scholarly picture that had not been seen before, but also yields some interesting observations that may draw further attention from scientists.