Journal information
Journal of Computer Science and Technology (English edition)

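Editor-in-chief: 李国杰

Frequency: Bimonthly

ISSN: 1000-9000

Email: jcst@ict.ac.cn

Phone: 010-62610746

Postal code: 100080

Address: Editorial Office of the Journal of Computer Science and Technology, No. 6 Kexueyuan South Road, Zhongguancun, Beijing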

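Journal of Computer Science and Technology (计算机科学技术学报(英文版)). Indexed in: CSCD, CSTPCD, 北大核心 (Peking University Core Journals), EI, SCI
Journal of Computer Science and Technology (JCST) is an international academic journal in the field of computer science and technology in China. JCST was founded in 1986, is published bimonthly, is distributed publicly in China and abroad, and is published and distributed internationally through Springer Science + Business Media. JCST is the official journal of the China Computer Federation and is hosted by the Institute of Computing Technology, Chinese Academy of Sciences. It is jointly edited and reviewed by dozens of renowned international experts and scholars in computing, keeping pace with the latest trends in computer science and technology worldwide. JCST brings together instructive and pioneering scholarly works in computer science and technology from China and abroad, regularly organizes special issues or special sections on hot topics, and invites world-renowned computer scientists to write some of its articles.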

    Technical Perspective: Research on General-Purpose Brain-Inspired Computing Systems

    Oliver Rhodes
    pp. 1-3

    Research on General-Purpose Brain-Inspired Computing Systems

    渠鹏纪兴龙陈嘉杰庞猛...
    pp. 4-21
    Abstract: Brain-inspired computing is a new technology that draws on the principles of brain science and is oriented to the efficient development of artificial general intelligence (AGI), and a brain-inspired computing system is a hierarchical system composed of neuromorphic chips, basic software and hardware, and algorithms/applications that embody this technology. While the system is developing rapidly, it faces various challenges and opportunities brought by interdisciplinary research, including the issue of software and hardware fragmentation. This paper analyzes the status quo of brain-inspired computing systems. Enlightened by some design principles and methodologies of general-purpose computers, it is proposed to construct "general-purpose" brain-inspired computing systems. A general-purpose brain-inspired computing system refers to a brain-inspired computing hierarchy constructed based on the design philosophy of decoupling software and hardware, which can flexibly support various brain-inspired computing applications and neuromorphic chips with different architectures. Further, this paper introduces our recent work in these aspects, including the ANN (artificial neural network)/SNN (spiking neural network) development tools, the hardware-agnostic compilation infrastructure, and the chip micro-architecture with high flexibility of programming and high performance; these studies show that the "general-purpose" system can remarkably improve the efficiency of application development and enhance the productivity of basic software, thereby being conducive to accelerating the advancement of various brain-inspired algorithms and applications. We believe that this is the key to the collaborative research and development, and the evolution of applications, basic software and chips in this field, and conducive to building a favorable software/hardware ecosystem of brain-inspired computing.

    VPI: Vehicle Programming Interface for Vehicle Computing

    吴宝福仲任王昱心万健...
    pp. 22-44
    Abstract: The emergence of software-defined vehicles (SDVs), combined with autonomous driving technologies, has enabled a new era of vehicle computing (VC), where vehicles serve as a mobile computing platform. However, the interdisciplinary complexities of automotive systems and diverse technological requirements make developing applications for autonomous vehicles challenging. To simplify the development of applications running on SDVs, we propose a comprehensive suite of vehicle programming interfaces (VPIs). In this study, we rigorously explore the nuanced requirements for application development within the realm of VC, centering our analysis on the architectural intricacies of the Open Vehicular Data Analytics Platform (OpenVDAP). We then detail our creation of a comprehensive suite of standardized VPIs, spanning five critical categories: Hardware, Data, Computation, Service, and Management, to address these evolving programming requirements. To validate the design of VPIs, we conduct experiments using the indoor autonomous vehicle, Zebra, and develop the OpenVDAP prototype system. Compared with the industry-influential AUTOSAR interface, our VPIs demonstrate significant enhancements in programming efficiency, marking an important advancement in the field of SDV application development. We also show a case study and evaluate its performance. Our work highlights that VPIs significantly enhance the efficiency of developing applications on VC. They meet both current and future technological demands and propel the software-defined automotive industry toward a more interconnected and intelligent future.

    10-Million Atoms Simulation of First-Principle Package LS3DF

    严昱瑾李海波赵曈汪林望...
    pp. 45-62
    Abstract: The growing demand for semiconductor device simulation poses a big challenge for large-scale electronic structure calculations. Among various methods, the linearly scaling three-dimensional fragment (LS3DF) method exhibits excellent scalability in large-scale simulations. Based on algorithmic and system-level optimizations, we propose a highly scalable and highly efficient implementation of LS3DF on a domestic heterogeneous supercomputer equipped with accelerators. In terms of algorithmic optimizations, the original all-band conjugate gradient algorithm is refined to achieve faster convergence, and mixed-precision computing is adopted to increase overall efficiency. In terms of system-level optimizations, the original two-layer parallel structure is replaced by a coarse-grained parallel method. Optimization strategies such as multi-stream, kernel fusion, and redundant computation removal are proposed to further increase utilization of the computational power provided by the heterogeneous machines. As a result, our optimized LS3DF can scale to a system of 10 million silicon atoms, attaining a peak performance of 34.8 PFLOPS (21.2% of the peak). All the improvements can be adapted to the next-generation supercomputers for larger simulations.

    Approximate Similarity-Aware Compression for Non-Volatile Main Memory

    陈章玉华宇左鹏飞孙园园...
    pp. 63-81
    Abstract: Image bitmaps, i.e., data containing pixels and visual perception, have been widely used in emerging applications for pixel operations while consuming lots of memory space and energy. Compared with legacy DRAM (dynamic random access memory), non-volatile memories (NVMs) are suitable for bitmap storage due to the salient features of high density and intrinsic durability. However, writing NVMs suffers from higher energy consumption and latency compared with read accesses. Existing precise or approximate compression schemes in NVM controllers show limited performance for bitmaps due to the irregular data patterns and variance in bitmaps. We observe the pixel-level similarity when writing bitmaps due to the analogous contents in adjacent pixels. By exploiting the pixel-level similarity, we propose SimCom, an approximate similarity-aware compression scheme in the NVM module controller, to efficiently compress data for each write access on-the-fly. The idea behind SimCom is to compress continuous similar words into the pairs of base words with runs. The storage costs for small runs are further mitigated by reusing the least significant bits of base words. SimCom adaptively selects an appropriate compression mode for various bitmap formats, thus achieving an efficient trade-off between quality and memory performance. We implement SimCom on GEM5/zsim with NVMain and evaluate the performance with real-world image/video workloads. Our results demonstrate the efficacy and efficiency of our SimCom with an efficient quality-performance trade-off.
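The idea of compressing consecutive similar words into (base word, run) pairs can be illustrated with a minimal Python sketch. This is not SimCom itself: the function names, the integer word model, and the absolute-difference `threshold` test are assumptions made here for illustration, and the reuse of least significant bits for small runs is not modeled.

```python
from typing import List, Tuple


def compress_similar_runs(words: List[int], threshold: int) -> List[Tuple[int, int]]:
    """Collapse consecutive words that lie within `threshold` of the
    current base word into (base, run_length) pairs (lossy if threshold > 0)."""
    pairs: List[Tuple[int, int]] = []
    for word in words:
        if pairs and abs(word - pairs[-1][0]) <= threshold:
            base, run = pairs[-1]
            pairs[-1] = (base, run + 1)  # extend the current run
        else:
            pairs.append((word, 1))      # start a new run with this word as base
    return pairs


def decompress_runs(pairs: List[Tuple[int, int]]) -> List[int]:
    """Expand (base, run_length) pairs; every word in a run becomes its base."""
    out: List[int] = []
    for base, run in pairs:
        out.extend([base] * run)
    return out


if __name__ == "__main__":
    pixels = [100, 101, 99, 100, 180, 181, 20]  # hypothetical pixel words
    pairs = compress_similar_runs(pixels, threshold=2)
    print(pairs)
    print(decompress_runs(pairs))
```

With `threshold=0` the sketch degenerates to exact run-length encoding; a larger threshold trades reconstruction error (bounded by `threshold` per word) for fewer pairs, which is the quality versus performance trade-off the abstract refers to.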

    DIR: Dynamic Request Interleaving for Improving the Read Performance of Aged Solid-State Drives

    聂世强张驰伍卫国
    pp. 82-98
    Abstract: Triple-level cell (TLC) NAND flash is increasingly adopted to build solid-state drives (SSDs) for modern computer systems. While TLC NAND flash effectively improves storage density, it faces severe reliability issues; in particular, the pages exhibit different raw bit error rates (RBERs). Integrating strong low-density parity-check (LDPC) code helps to improve reliability but suffers from prolonged and proportional read latency due to multiple read retries for worse pages. The straightforward idea is that dispersing page-size data across several pages of different types can achieve a lower average RBER and reduce the read latency. However, directly implementing this simple idea in the flash translation layer (FTL) induces the read amplification issue, as one logical page residing in more than one physical page requires several read operations. In this paper, we propose the Dynamic Request Interleaving (DIR) technology for improving the performance of TLC NAND flash-based SSDs, in particular, the aged ones with large RBERs. DIR exploits the observation that the latency of an I/O request is determined, without considering the queuing time, by the access of the slowest device page, i.e., the page that has the highest RBER. By grouping consecutive logical pages that have high locality and interleaving their encoded data in different types of device pages that have different RBERs, DIR effectively reduces the number of read retries for LDPC with limited read amplification. To meet the requirement of allocating hybrid page types for interleaved data, we also design a page-interleaving-friendly page allocation scheme, which splits all the planes into multi-plane regions for storing the interleaved data and single-plane regions for storing the normal data. The pages in the multi-plane region can be read/written in parallel by the proposed multi-plane command and avoid the read amplification issue. Based on the DIR scheme and the proposed page allocation scheme, we build a DIR-enabled FTL, which integrates the proposed schemes into the FTL with some modifications. Our experimental results show that adopting DIR in aged SSDs exploits nearly 33% locality from I/O requests and, on average, reduces read latency by 43% over conventional aged SSDs.

    CFP: A Coherence-Free Processor Design

    杨光诺
    pp. 99-102
    Abstract: This paper presents the design of a Coherence-Free Processor (CFP) that enables a scalable multiprocessor by eliminating cache coherence operations in both hardware and software. The CFP uses a coherence-free cache (CFC) that can improve the cost-effectiveness and performance-effectiveness of the existing multiprocessors for commonly used workloads. The CFC is feasible because not all program data that reside in a multiprocessor cache need to be accessed by other processors, and private caches at level 1 (L1) and level 2 (L2) facilitate this method of sharing. Reentrant programs are specifically designed to protect their data from modification by other tasks. Program data that are modified but not shared with other tasks do not require a coherence protocol. Adding processors reduces the multitasking queue, reducing elapsed time. Simultaneous execution replaces concurrent execution.

    An Online Algorithm Based on Replication for Using Spot Instances in IaaS Clouds

    许志伟潘丽刘士军
    pp. 103-115
    Abstract: Infrastructure-as-a-Service (IaaS) cloud platforms offer resources with diverse buying options. Users can run an instance on the on-demand market, which is stable but expensive, or on the spot market with a significant discount. However, users have to carefully weigh the low cost of spot instances against their poor availability. Spot instances will be revoked when the revocation event occurs. Thus, an important problem that an IaaS user faces now is how to use spot instances in a cost-effective and low-risk way. Based on the replication-based fault tolerance mechanism, we propose an online termination algorithm that optimizes the cost of using spot instances while ensuring operational stability. We prove that in most cases, the cost of our proposed online algorithm will not exceed twice the minimum cost of the optimal offline algorithm that knows the exact future a priori. Through a large number of experiments, we verify that our algorithm in most cases has a competitive ratio of no more than 2, and in other cases it can also reach the guaranteed competitive ratio.
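What a competitive ratio of 2 means can be shown with the classic ski-rental problem, a textbook online problem. The sketch below is only an analogy and is not the paper's termination algorithm: the per-period cost `rent`, the one-off cost `buy`, and the break-even rule are assumptions chosen for illustration. The online rule does not know how long the job lasts, while the offline optimum does.

```python
def online_cost(periods: int, rent: int, buy: int) -> int:
    """Break-even rule: keep paying `rent` per period while the total rent
    paid stays below `buy`; pay `buy` at the start of the next period."""
    switch_period = -(-buy // rent)          # ceil(buy / rent)
    if periods < switch_period:
        return periods * rent                # job ended before the switch
    return (switch_period - 1) * rent + buy  # rented, then paid the one-off cost


def offline_cost(periods: int, rent: int, buy: int) -> int:
    """Optimal cost when the job length is known in advance."""
    return min(periods * rent, buy)


def worst_ratio(rent: int, buy: int, max_periods: int) -> float:
    """Largest online/offline cost ratio over job lengths 1..max_periods."""
    return max(
        online_cost(n, rent, buy) / offline_cost(n, rent, buy)
        for n in range(1, max_periods + 1)
    )


if __name__ == "__main__":
    print(worst_ratio(rent=3, buy=50, max_periods=200))
```

Because the rent paid before switching is always strictly less than `buy`, the online cost stays below `2 * buy`, and the offline optimum equals `buy` once the job is long enough to trigger the switch; the ratio therefore never exceeds 2 in this toy setting.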

    Online Nonstop Task Management for Storm-Based Distributed Stream Processing Engines

    张洲金培权谢希科王晓亮...
    pp. 116-138
    Abstract: Most distributed stream processing engines (DSPEs) do not support online task management and cannot adapt to time-varying data flows. Recently, some studies have proposed online task deployment algorithms to solve this problem. However, these approaches do not guarantee the Quality of Service (QoS) when the task deployment changes at runtime, because the task migrations caused by the change of task deployments will impose an exorbitant cost. We study one of the most popular DSPEs, Apache Storm, and find that when a task needs to be migrated, Storm has to stop the resource (implemented as a Worker process in Storm) where the task is deployed. This will lead to the stop and restart of all tasks in the resource, resulting in the poor performance of task migrations. Aiming to solve this problem, in this paper, we propose N-Storm (Nonstop Storm), which is a task-resource decoupling DSPE. N-Storm allows tasks allocated to resources to be changed at runtime, which is implemented by a thread-level scheme for task migrations. Particularly, we add a local shared key/value store on each node to make resources aware of the changes in the allocation plan. Thus, each resource can manage its tasks at runtime. Based on N-Storm, we further propose Online Task Deployment (OTD). Differing from traditional task deployment algorithms that deploy all tasks at once without considering the cost of task migrations caused by a task re-deployment, OTD can gradually adjust the current task deployment to an optimized one based on the communication cost and the runtime states of resources. We demonstrate that OTD can adapt to different kinds of applications including computation- and communication-intensive applications. The experimental results on a real DSPE cluster show that N-Storm can avoid the system stop and save up to 87% of the performance degradation time, compared with Apache Storm and other state-of-the-art approaches. In addition, OTD can increase the average CPU usage by 51% for computation-intensive applications and reduce network communication costs by 88% for communication-intensive applications.

    Federated Dynamic Client Selection for Fairness Guarantee in Heterogeneous Edge Computing

    毛莺池沈莉娟吴俊平萍...
    pp. 139-158
    Abstract: Federated learning has emerged as a distributed learning paradigm by training at each client and aggregating at a parameter server. System heterogeneity hinders stragglers from responding to the server in time with huge communication costs. Although client grouping in federated learning can solve the straggler problem, the stochastic selection strategy in client grouping neglects the impact of data distribution within each group. Besides, current client grouping approaches make clients suffer unfair participation, leading to biased performances for different clients. In order to guarantee the fairness of client participation and mitigate biased local performances, we propose a federated dynamic client selection method based on data representativity (FedSDR). FedSDR clusters clients into groups correlated with their own local computational efficiency. To estimate the significance of client datasets, we design a novel data representativity evaluation scheme based on local data distribution. Furthermore, the two most representative clients in each group are selected to optimize the global model. Finally, the DYNAMIC-SELECT algorithm updates local computational efficiency and data representativity states to regroup clients after periodic average aggregation. Evaluations on real datasets show that FedSDR improves client participation by 27.4%, 37.9%, and 23.3% compared with FedAvg, TiFL, and FedSS, respectively, taking fairness into account in federated learning. In addition, FedSDR surpasses FedAvg, FedGS, and FedMS by 21.32%, 20.4%, and 6.90%, respectively, in local test accuracy variance, balancing the performance bias of the global model across clients.