Abstract: The flourishing of deep learning frameworks and hardware platforms has created demand for an efficient compiler that can shield the diversity of both software and hardware to provide application portability. Among existing deep learning compilers, TVM is well known for its efficient code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor has emerged as a competitive candidate thanks to its attractive computational power for both scientific computing and deep learning workloads. This paper combines these two trends. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures that require cross-compilation, such as Sunway. In addition, we leverage architectural features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, to generate efficient code for deep learning workloads on Sunway. Experimental results show that, across eight representative benchmarks, the code generated by swTVM improves inference latency by 1.79x on average compared with the state-of-the-art deep learning framework on Sunway. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and the Sunway processor, with both productivity and efficiency in mind. We believe this work will encourage more people to embrace the power of deep learning and the Sunway many-core processor.
Abstract: Container-based virtualization is becoming increasingly popular in cloud computing due to its efficiency and flexibility. Resource isolation is a fundamental property of containers. Existing work has shown that weak resource isolation can cause significant performance degradation for containerized applications, and has proposed ways to enhance resource isolation. However, current studies have barely discussed the isolation problems of the page cache, which is a key resource for containers. Containers rely on the memory cgroup to control page cache usage. Unfortunately, the existing policy introduces two major problems in a container-based environment. First, containers can use more memory than their cgroup limit allows, effectively breaking memory isolation. Second, the OS kernel has to evict page cache to make space for newly arrived memory requests, which slows down containerized applications. This paper performs an empirical study of these problems and demonstrates their performance impact on containerized applications. We then propose pCache (precise control of page cache), which addresses the problems by dividing the page cache into private and shared parts and controlling each kind separately and precisely. To do so, pCache leverages two new techniques: fair account (f-account) and evict on demand (EoD). F-account splits the charge for shared page cache according to each container's share, preventing containers from using memory for free and thus enhancing memory isolation. EoD reduces unnecessary page cache evictions to avoid the performance impact. The evaluation results demonstrate that our system effectively enhances memory isolation for containers and achieves substantial performance improvement over the original page cache management policy.
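The f-account idea of splitting a shared page-cache charge by per-container share can be illustrated with a small proportional-split sketch. It is not pCache's kernel code. The function name `split_shared_charge` is hypothetical, and so is the notion of a "share" as a simple integer such as an access count. Largest-remainder rounding is one way to make the per-container charges add up exactly to the shared pages, so that no shared page is left uncharged.

```python
def split_shared_charge(shared_pages: int, shares: dict[str, int]) -> dict[str, int]:
    """Hypothetical sketch: charge `shared_pages` pages of shared page cache to
    containers in proportion to each container's share (e.g. access counts).
    When at least one share is nonzero, largest-remainder rounding keeps the
    charges summing exactly to `shared_pages`, so no shared page is used "for free"."""
    total = sum(shares.values())
    if total == 0:
        # No container has a recorded share: charge nothing rather than divide by zero.
        return {name: 0 for name in shares}
    charge: dict[str, int] = {}
    remainder: dict[str, int] = {}
    for name, share in shares.items():
        # Floor of shared_pages * share / total, plus the leftover numerator.
        charge[name], remainder[name] = divmod(shared_pages * share, total)
    leftover = shared_pages - sum(charge.values())
    # Hand the remaining pages to the containers with the largest remainders.
    # The sort is stable, so ties go to the container listed first.
    for name in sorted(shares, key=lambda n: remainder[n], reverse=True)[:leftover]:
        charge[name] += 1
    return charge
```

For example, `split_shared_charge(100, {"a": 1, "b": 1, "c": 1})` returns `{'a': 34, 'b': 33, 'c': 33}`. A plain floor division would have charged only 99 of the 100 shared pages.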
Abstract: Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often use page migration to take full advantage of the different memory media. Most previous proposals migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM resources. In this paper, we propose Mocha, a non-hierarchical architecture that physically organizes DRAM and NVM in a flat address space but manages them as a cache/memory hierarchy. The commercial NVM device, the Intel Optane DC Persistent Memory Module (DCPMM), actually accesses its physical media at a granularity of 256 bytes (an Optane block). We therefore manage the DRAM cache at 256-byte granularity to match this feature of Optane. This design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification on Intel Optane DCPMM. We also add an Indirect Address Cache (IAC) to the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in DRAM to speed up address translation and cache replacement. Moreover, we use a utility-based caching mechanism to filter out cold blocks in the NVM, which further improves the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that, compared with a typical hybrid memory architecture, HSCC, Mocha improves application performance by 8.2% on average (up to 24.6%). It also reduces energy consumption by 6.9% and data migration traffic by 25.9% on average.
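Why migration granularity matters can be seen with simple address arithmetic. The sketch below counts the bytes that would move if every unit touched by a short access trace were migrated once. It compares 256-byte Optane blocks with 4 KB pages. The access trace and the helper names are hypothetical. The sketch does not model Mocha's IAC, its reverse mapping table, or its utility-based filtering.

```python
OPTANE_BLOCK = 256  # bytes; Optane DCPMM media access granularity (from the abstract)
PAGE = 4096         # bytes; conventional page-migration granularity


def units_touched(addr: int, size: int, granularity: int) -> range:
    """Indices of the granularity-sized units covered by [addr, addr + size)."""
    first = addr // granularity
    last = (addr + size - 1) // granularity
    return range(first, last + 1)


def migration_bytes(accesses: list[tuple[int, int]], granularity: int) -> int:
    """Bytes moved if every unit touched by `accesses` is migrated exactly once."""
    units: set[int] = set()
    for addr, size in accesses:
        units.update(units_touched(addr, size, granularity))
    return len(units) * granularity


# Hypothetical trace of (address, size) pairs; the last access straddles two 256 B blocks.
accesses = [(0x1000, 64), (0x1100, 64), (0x50F0, 32)]
fine = migration_bytes(accesses, OPTANE_BLOCK)
coarse = migration_bytes(accesses, PAGE)
```

For this trace, block-granularity migration moves 1024 bytes (four 256-byte blocks). Page-granularity migration moves 8192 bytes (two 4 KB pages) for the same 160 bytes actually accessed.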
Abstract: In software development, the ability to localize faults is crucial for efficient debugging. Generally speaking, detecting and repairing errant behavior early in the development cycle considerably reduces cost and development time. Researchers have used various methods to locate faulty code. However, failing test cases usually account for only a small portion of the test suite, which inevitably leads to class imbalance and hampers the effectiveness of fault localization. Accordingly, in this work, we propose a new fault localization approach named ContextAug. After obtaining dynamic executions through test cases, ContextAug traces these executions to build an information model. It then constructs a failure context with propagation dependencies. This context is intersected with new model-domain failing test samples, which are synthesized using the minimum variability of the minority feature space. Traditional test generation works directly in the input domain. ContextAug instead takes a new perspective and synthesizes failing test samples in the model domain, which makes augmenting test suites much easier. We conducted empirical studies on large real-world programs with 13 state-of-the-art fault localization approaches. ContextAug significantly improves fault localization effectiveness, by up to 54.53%. These results confirm that ContextAug improves fault localization effectiveness.
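The class-imbalance remedy described above, which synthesizes extra failing samples in a feature space rather than generating new test inputs, can be illustrated with a swapped-in, well-known technique. The sketch uses SMOTE-style interpolation between minority samples. It is not ContextAug's synthesis procedure, because the abstract does not specify how the "minimum variability of the minority feature space" is computed. Representing each failing test as a statement-coverage vector is also an assumption, and `synthesize_minority` is a hypothetical helper.

```python
import random


def synthesize_minority(samples: list[list[float]], n_new: int, seed: int = 0) -> list[list[float]]:
    """SMOTE-style oversampling (a stand-in, not ContextAug's method): each
    synthetic vector lies on the segment between a randomly chosen minority
    sample and its nearest minority neighbor."""
    if len(samples) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(samples))
        base = samples[i]
        others = samples[:i] + samples[i + 1:]
        # Nearest other minority sample by squared Euclidean distance.
        neighbor = min(others, key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical coverage vectors of failing tests (1 = statement executed).
failing = [[1, 0, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1]]
augmented = synthesize_minority(failing, 5)
```

Every synthetic value stays between the corresponding minimum and maximum of the real failing vectors. For example, a statement executed by all failing tests keeps the value 1.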
Abstract: Spreadsheets are widely used for information processing to support decision making by both professional developers and non-technical end users. Moreover, business intelligence and artificial intelligence are increasingly popular in industry, where spreadsheets have been used as, or integrated into, intelligent or expert systems in various application domains. However, faults have repeatedly been reported in operational spreadsheets, and they can severely compromise the quality of conclusions and decisions based on those spreadsheets. To examine this problem systematically through a survey of existing work, we conducted a comprehensive literature review of the quality issues and related techniques of spreadsheets. The review covers a 35.5-year period (January 1987 to June 2022) for target journals and a 10.5-year period (January 2012 to June 2022) for target conferences. Two of our major findings are: (a) spreadsheet quality is best addressed throughout the whole spreadsheet life cycle, rather than by focusing on only a few specific stages; (b) relatively more studies focus on spreadsheet testing and debugging (related to fault detection and removal) than on spreadsheet specification, modeling, and design (related to development). As prevention is better than cure, more research should address the early stages of the spreadsheet life cycle. Informed by our comprehensive review, we identify the major research gaps and highlight key directions for future work in the area.
Abstract: Instance co-segmentation aims to segment the instances that co-occur in two images. This task relies heavily on instance-related cues provided by co-peaks, which are generally estimated by exhaustively examining all paired candidates in point-to-point patterns. However, such patterns can yield many false-positive co-peaks, resulting in over-segmentation whenever instances occlude each other. To tackle this issue, this paper proposes an instance co-segmentation method via tensor-based salient co-peak search (TSCPS-ICS). The proposed method explores high-order correlations via triple-to-triple matching among feature maps, using co-saliency detection to find reliable co-peaks. The method captures more accurate intra-peaks and inter-peaks among feature maps, which reduces the false-positive rate of co-peak search. Given accurate co-peaks, the responses of the targeted instance can be inferred efficiently. Experiments on four benchmark datasets validate the superior performance of the proposed method.
Abstract: Moving target detection is one of the most basic tasks in computer vision. Conventionally, the problem is solved by iterative optimization under either the Matrix Decomposition (MD) or the Matrix Factorization (MF) framework. MD uses foreground information to facilitate background recovery, while MF uses noise-based weights to fine-tune the background. Thus both noise and foreground information contribute to recovering the background. Inspired by the complementary characteristics of the two frameworks, we propose to exploit the advantages of both optimization approaches simultaneously in a unified framework called Joint Matrix Decomposition and Factorization (JMDF). To improve background extraction, we design a fuzzy factorization. During factorization, the fuzzy membership of each background/foreground association is computed to distinguish their respective contributions to background estimation. To describe the spatio-temporal continuity of the foreground more accurately, we incorporate the first-order temporal difference into the group sparsity constraint and adjust this temporal constraint adaptively. The foreground and background are jointly estimated through an effective alternating optimization process, and the noise can be modeled with a specific probability distribution. Experimental results on a large number of real videos demonstrate the effectiveness of our method. Compared with current state-of-the-art methods, our method usually produces a clearer background and a more accurate foreground. Anti-noise experiments show the noise robustness of our method.
Abstract: Federated learning (FL) has emerged as a way to break down data silos and protect clients' privacy in artificial intelligence. However, the deep leakage from gradients (DLG) attack can fully reconstruct clients' data from their submitted gradients, which threatens the fundamental privacy of FL. Cryptography and differential privacy can prevent privacy leakage from gradients, but they increase communication overhead or degrade model performance. Moreover, these schemes change the original distribution of local gradients, which makes it difficult to defend against adversarial attacks. In this paper, we propose a novel federated learning framework with model decomposition, aggregation, and assembling (FedDAA), along with an algorithm for training the federated model. In FedDAA, each local gradient is decomposed into multiple blocks, which are sent to different proxy servers for aggregation. To strengthen FedDAA's privacy protection, we design an indicator based on image structural similarity to measure privacy leakage under the DLG attack. We also give an optimization method that protects privacy with the fewest proxy servers. In addition, we present defense schemes against adversarial attacks in FedDAA and design an algorithm to verify the correctness of aggregated results. Experimental results demonstrate that FedDAA reduces the structural similarity between the reconstructed image and the original image to 0.014 while maintaining a model convergence accuracy of 0.952. FedDAA thus achieves the best privacy protection performance and model training effect. More importantly, the defense schemes against adversarial attacks are compatible with FedDAA's privacy protection, and their defense effects are no weaker than in traditional FL. Finally, the algorithm for verifying aggregation results adds negligible overhead to FedDAA.
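The decompose, aggregate, and assemble flow can be sketched with plain lists. Each client splits its flattened gradient into k contiguous blocks. Proxy j averages only block j across clients, and the results are concatenated back into one global gradient. Equal-size contiguous blocks and FedAvg-style plain averaging are assumptions, and all function names are hypothetical. The sketch does not reproduce FedDAA's DLG-leakage indicator, its proxy-count optimization, or its verification algorithm.

```python
def decompose(grad: list[float], k: int) -> list[list[float]]:
    """Split a flattened gradient into k contiguous blocks (the last may be shorter)."""
    size = -(-len(grad) // k)  # ceiling division
    return [grad[j * size:(j + 1) * size] for j in range(k)]


def proxy_aggregate(blocks: list[list[float]]) -> list[float]:
    """Runs on one proxy server: average the same block received from every client."""
    n = len(blocks)
    return [sum(vals) / n for vals in zip(*blocks)]


def assemble(aggregated: list[list[float]]) -> list[float]:
    """Concatenate the aggregated blocks back into one global gradient."""
    return [x for block in aggregated for x in block]


def federated_round(client_grads: list[list[float]], k: int) -> list[float]:
    per_client = [decompose(g, k) for g in client_grads]
    # Proxy j only ever sees block j from each client, never a full local gradient.
    aggregated = [proxy_aggregate([blocks[j] for blocks in per_client]) for j in range(k)]
    return assemble(aggregated)
```

With two clients whose gradients are `[1, 2, 3, 4, 5]` and `[3, 4, 5, 6, 7]`, `federated_round(..., k=2)` yields `[2.0, 3.0, 4.0, 5.0, 6.0]`. This is the same result as averaging the full gradients centrally, even though no proxy receives a complete gradient.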
Abstract: The Internet of Things (IoT) enables the interconnection of people, machines, and things anytime and anywhere. Most existing research focuses on practical applications of IoT, and little work has modeled and reasoned about IoT systems from the perspective of formal methods. The Calculus of the Internet of Things (CaIT) was therefore proposed to specify and analyze IoT systems before their actual implementation, which can effectively improve development efficiency and enhance system quality and reliability. To verify the correctness of IoT systems described in CaIT, this paper presents a proof system for CaIT in which specification and verification are based on Hoare logic extended with time. Furthermore, we explore cooperation between isolated proofs to validate the postconditions of the communication actions occurring in these proofs, with a particular focus on broadcast communication. We also demonstrate the soundness of our proof system. A simple "smart home" example illustrates the applicability of our proof system.
Abstract: In this paper, we introduce a sub-Nyquist sampling-based receiver architecture and method for wideband spectrum sensing. Instead of recovering the original wideband analog signal, the proposed method directly reconstructs the power spectrum of the wideband analog signal from sub-Nyquist samples. This is sufficient because the power spectrum alone is enough for wideband spectrum sensing. Since only the covariance matrix of the wideband signal is needed, the proposed method, unlike compressed sensing-based methods, does not impose any sparsity requirement in the frequency domain. The proposed method is based on a multi-coset sampling architecture. By exploiting the inherent sampling structure, we propose a fast compressed power spectrum estimation method whose main computational task consists of fast Fourier transforms (FFTs). Simulation results show the effectiveness of the proposed method.
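The multi-coset sampling pattern can be sketched by decimating a simulated Nyquist-rate sequence. Coset c keeps the samples x[mL + c] for m = 0, 1, and so on. With p distinct offsets out of L, the average rate is p/L of the Nyquist rate. A real receiver samples the analog signal directly, so simulating it by decimation is an assumption, and the function names are hypothetical. The zero-lag correlation between coset outputs is shown only as an example of the second-order statistics a covariance-based estimator consumes. This is not the paper's FFT-based estimator.

```python
def multicoset_sample(x: list[float], L: int, offsets: list[int]) -> dict[int, list[float]]:
    """Coset c keeps x[m*L + c] for m = 0, 1, ...; x stands in for a Nyquist-rate grid."""
    if len(set(offsets)) != len(offsets) or not all(0 <= c < L for c in offsets):
        raise ValueError("offsets must be distinct integers in [0, L)")
    return {c: x[c::L] for c in offsets}


def coset_correlation(y_i: list[float], y_j: list[float]) -> float:
    """Sample estimate of E[y_i[m] * y_j[m]] (zero lag) over the common length."""
    n = min(len(y_i), len(y_j))
    return sum(a * b for a, b in zip(y_i[:n], y_j[:n])) / n


# Toy example: L = 8 with 3 cosets, i.e. an average rate of 3/8 of the Nyquist rate.
cosets = multicoset_sample(list(range(24)), 8, [0, 3, 5])
```

Here `cosets` is `{0: [0, 8, 16], 3: [3, 11, 19], 5: [5, 13, 21]}`. Each branch runs at 1/8 of the Nyquist rate, and the estimator works only with the correlations among these branch outputs.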