Journal information

IEEE transactions on computers
Institute of Electrical and Electronics Engineers

Monthly

ISSN: 0018-9340

Indexed in SCI, EI
Formally published

    Hardware Trojan Detection Methods for Gate-Level Netlists Based on Graph Neural Networks

    Peijun Ma, Jie Li, Hongjin Liu, Jiangyi Shi...
    pp. 1470-1481
    Abstract: Untrusted third-party entities are increasingly involved in the various stages of IC design and manufacturing, and the hardware Trojans (HTs) they may insert pose a significant threat to the reliability and security of SoCs. In this paper, gate-level HT detection methods based on graph neural networks (GNNs) are developed to overcome a shortcoming of existing machine-learning approaches, which struggle to characterize circuit connectivity. We introduce harmonic centrality into the feature engineering of gate-level HT detection; it reflects the positional information of nodes and their neighbors in the graph, thereby improving feature quality. We use a golden-section weight optimization algorithm to configure penalty weights and alleviate the extreme data imbalance. On the SAED database, the GraphSAGE-LSTM model obtains a TPR of 88.06% and an average F1 score of 90.95%. On the combinational HT netlists of the LEDA datasets, the GraphSAGE-POOL model obtains a TPR of 88.50% and the best F1 score of 92.17%. On sequential HT netlists, the GraphSAGE-LSTM model performs best, with a TPR of 98.25% and an average F1 score of 98.59%. Compared to existing detection models, the F1 score is improved by 8.86% and 2.48% on the combinational and sequential HT datasets, respectively.
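The harmonic centrality feature mentioned above is a standard graph metric: for a node v, it sums the reciprocal shortest-path distances to all other reachable nodes. A minimal sketch (illustrative only, not the paper's implementation, which operates on gate-level netlist graphs):

```python
from collections import deque

def harmonic_centrality(adj, v):
    # BFS from v; harmonic centrality is the sum of reciprocal
    # shortest-path distances to every other reachable node.
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    return sum(1.0 / d for node, d in dist.items() if node != v)

# Path graph 0 - 1 - 2: the middle node reaches both ends at distance 1.
path = {0: [1], 1: [0, 2], 2: [1]}
```

On the path graph, the middle node scores 1/1 + 1/1 = 2.0 while an end node scores 1/1 + 1/2 = 1.5, reflecting how a more central position raises the metric.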

    A Complexity-Effective Local Delta Prefetcher

    Agustín Navarro-Torres, Biswabandan Panda, Jesús Alastruey-Benedé, Pablo Ibáñez...
    pp. 1482-1494
    Abstract: Data prefetching is crucial for performance in modern processors, effectively masking long-latency memory accesses. Over the past decades, numerous data prefetching mechanisms have been proposed, continuously reducing the effective access latency of the memory hierarchy. Several state-of-the-art prefetchers, namely the Instruction Pointer Classifier Prefetcher (IPCP) and Berti, target the first-level data cache and are thus able to completely hide the miss latency of timely prefetched cache lines. Berti exploits timely local deltas to achieve high accuracy and performance. This paper extends Berti with a larger evaluation and additional optimizations on top of the previous conference paper. The result is a complexity-effective version of Berti that outperforms the original on a large number of workloads and simplifies its control logic. The key to these advances is a simple mechanism for learning timely deltas without tracking the fetch latency of each cache miss. Our experiments with a wide range of workloads (CVP traces by Qualcomm, SPEC CPU2017, and GAP) show performance improvements of 4.0% over a mainstream stride prefetcher and a non-negligible 1.4% over the previously published version of Berti, with similar storage requirements.
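The local-delta idea can be illustrated with a toy per-IP predictor (a sketch of the general technique, not Berti's actual mechanism, which additionally selects deltas by timeliness and coverage):

```python
from collections import defaultdict, Counter

class LocalDeltaPrefetcher:
    """Toy per-IP local-delta predictor: remember the last address seen
    by each instruction pointer, count observed address deltas, and
    prefetch using the most frequent one."""

    def __init__(self):
        self.last_addr = {}
        self.deltas = defaultdict(Counter)

    def access(self, ip, addr):
        prefetch = None
        if ip in self.last_addr:
            self.deltas[ip][addr - self.last_addr[ip]] += 1
            best_delta, _ = self.deltas[ip].most_common(1)[0]
            prefetch = addr + best_delta
        self.last_addr[ip] = addr
        return prefetch
```

A stream 0, 64, 128 from one instruction pointer yields prefetches for 128 and then 192 once the 64-byte stride has been observed.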

    DC-ORAM: An ORAM Scheme Based on Dynamic Compression of Data Blocks and Position Map

    Chuang Li, Changyao Tan, Gang Liu, Yanhua Wen...
    pp. 1495-1509
    Abstract: Oblivious RAM (ORAM) is an efficient cryptographic primitive that prevents leakage of memory access patterns. It has been adopted by modern secure processors and plays an important role in memory security protection. Although state-of-the-art ORAM designs have made great progress in performance optimization, the data-access overhead (for data blocks) and the on-chip storage overhead (for the PosMap) remain too high, degrading system performance. To overcome these challenges, we propose DC-ORAM, a system that reduces both the data-access overhead and the on-chip PosMap storage overhead through dynamic compression. Specifically, we use byte-stream redundancy compression to compress data blocks on the ORAM tree, and in the PosMap we use a high-bit multiplexing strategy to compress leaf labels (or path labels) whose high-order bits repeat. With these compression techniques, and compared with conventional Path ORAM, the compression rate of the ORAM tree is $52.9\%$ and the compression rate of the PosMap is $40.0\%$. In terms of performance, our DC-ORAM system reduces average latency by $33.6\%$ relative to conventional Path ORAM. In addition, we apply the proposed compression techniques to the Ring ORAM system and find that, at the same compression ratio as Path ORAM, our design still reduces latency by an average of $21.5\%$.
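The high-bit multiplexing idea for the PosMap can be sketched as prefix sharing: when all leaf labels in a block share the same high-order bits, store that prefix once and keep only the low-order bits per entry. A hypothetical sketch (the `compress_labels` helper and field widths are illustrative, not the paper's actual format):

```python
def compress_labels(labels, bits, prefix_bits):
    # If every label in the block shares the same top prefix_bits,
    # return (shared_prefix, low_bits_per_label); otherwise None,
    # meaning the block must be stored verbatim.
    low = bits - prefix_bits
    prefixes = {label >> low for label in labels}
    if len(prefixes) == 1:
        return prefixes.pop(), [label & ((1 << low) - 1) for label in labels]
    return None

# Three 4-bit leaf labels sharing the high bits '10' compress to one
# 2-bit prefix plus three 2-bit remainders.
```

Storing one 2-bit prefix plus three 2-bit remainders takes 8 bits instead of 12, which is the kind of saving a repeated-high-bit strategy exploits.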

    Dynamic Caching Dependency-Aware Task Offloading in Mobile Edge Computing

    Liang Zhao, Zijia Zhao, Ammar Hawbani, Zhi Liu...
    pp. 1510-1523
    Abstract: Mobile Edge Computing (MEC) is a distributed computing paradigm that provides computing capabilities at the periphery of mobile cellular networks. This architecture empowers Mobile Users (MUs) to offload computation-intensive applications to large-scale computing nodes at the edge, reducing application latency. Resource allocation and task offloading in MEC have been widely studied. However, the burgeoning complexity of modern applications, often represented as Directed Acyclic Graphs (DAGs) comprising many interdependent subtasks, poses major challenges for application offloading and resource allocation. Meanwhile, previous work has neglected the impact of edge caching on the offloading and execution of dependent tasks. This paper therefore introduces a novel dynamic caching dependency-aware task offloading (CachOf) scheme. First, to improve the rationality of cache and computing resource allocation, we develop a subtask priority computation scheme based on DAG dependencies, covering both the execution order of subtasks on a single MU and the offloading order of subtasks from multiple MUs. Second, we propose a dynamic caching scheme designed for dependent tasks, which not only assists offloading decisions but also contributes to load balancing by harmonizing caching resources among edge servers. Finally, based on the prioritization and caching results, we present a Deep Reinforcement Learning (DRL)-based offloading scheme that judiciously allocates resources and improves application execution efficiency. Extensive simulation experiments demonstrate that CachOf outperforms baseline schemes, achieving higher execution efficiency for applications.
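Dependency-based subtask priorities of the kind described are commonly computed as a HEFT-style upward rank over the DAG (shown here only to illustrate the idea; CachOf's exact priority formula is defined in the paper):

```python
def upward_rank(succ, cost):
    # Upward rank of v = cost(v) + max rank over v's successors; sink
    # nodes rank at their own cost. Scheduling subtasks in decreasing
    # rank order always respects the DAG dependencies.
    memo = {}

    def rank(v):
        if v not in memo:
            memo[v] = cost[v] + max((rank(s) for s in succ.get(v, [])),
                                    default=0)
        return memo[v]

    order = sorted(cost, key=rank, reverse=True)
    return order, {v: rank(v) for v in cost}

# Diamond DAG: a -> {b, c}, b -> d, c -> d.
succ = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d']}
cost = {'a': 1, 'b': 2, 'c': 3, 'd': 1}
```

On the diamond DAG the source 'a' ranks highest (5) and the costlier branch 'c' is ordered before 'b', so the critical path is served first.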

    The Graph Structure of Baker's Maps Implemented on a Computer

    Chengqing Li, Kai Tan
    pp. 1524-1537
    Abstract: The complex dynamics of the baker's map and its variants in infinite-precision mathematical domains and quantum settings have been extensively studied over the past five decades. However, their behavior in finite-precision digital computing remains largely unknown. This paper addresses this gap by investigating the graph structure of the generalized two-dimensional baker's map and its higher-dimensional extension, referred to as HDBM, as implemented in the discrete setting of a digital computer. We provide a rigorous analysis of how the map parameters shape the in-degree bounds and distribution within the functional graph, revealing that fractal-like structures intensify as the parameters approach each other and as arithmetic precision increases. Furthermore, we demonstrate that recursive tree structures can characterize the functional graph of HDBM in the fixed-point arithmetic domain. As in the 2-D case, the degree of any non-leaf node of the functional graph under floating-point arithmetic is determined solely by its last component. We also reveal the relationship between the functional graphs of HDBM across the two arithmetic domains. These findings lay the groundwork for dynamic analysis, effective control, and broader application of the baker's map and its variants in diverse domains.
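The functional-graph analysis can be reproduced in miniature: quantize the classic (equal-partition) baker's map to n-bit fixed point, iterate it over every state, and tally in-degrees. This toy discretization is illustrative only; the paper's generalized map uses unequal partitions, which skew the in-degree distribution:

```python
from collections import Counter

def baker_step(x, y, n):
    # Classic baker's map on the unit square, quantized to n-bit
    # fixed point: x, y are integers in [0, 2**n).
    half = 2 ** (n - 1)
    if x < half:
        return (2 * x) % (2 ** n), y // 2
    return (2 * x) % (2 ** n), y // 2 + half

def in_degree_histogram(n):
    # Apply the map to every state once and tally how many
    # predecessors each image state has in the functional graph.
    indeg = Counter()
    for x in range(2 ** n):
        for y in range(2 ** n):
            indeg[baker_step(x, y, n)] += 1
    return Counter(indeg.values())
```

For the equal-partition map every reachable node has in-degree exactly 2; the paper's point is precisely that unequal parameters and higher precision break this uniformity into fractal-like in-degree patterns.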

    Pruning-Based Adaptive Federated Learning at the Edge

    Dongxiao Yu, Yuan Yuan, Yifei Zou, Xiao Zhang...
    pp. 1538-1548
    Abstract: Federated Learning (FL) is a learning framework in which $s$ clients collaboratively train a model under the guidance of a central server. Meanwhile, with the advent of the era of large models, model parameter counts are growing explosively, so it is important to design federated learning algorithms suited to edge environments. However, the edge environment is severely limited in computing, storage, and network bandwidth resources. Concurrently, adaptive gradient methods outperform constant learning rates in non-distributed settings. In this paper, we propose a pruning-based distributed Adam (PD-Adam) algorithm, which combines model pruning with adaptive learning steps to achieve an asymptotically optimal convergence rate of $O(1/\sqrt[4]{K})$. At the same time, the algorithm achieves convergence consistent with the centralized model. Finally, extensive experiments confirm the convergence of our algorithm and demonstrate its reliability and effectiveness across various scenarios. Specifically, our proposed algorithm is $2$% and $18$% more accurate than the state-of-the-art FedAvg algorithm on ResNet models over the CIFAR datasets.
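The pruning half of PD-Adam can be illustrated with plain magnitude pruning (a generic sketch of the technique; the paper's pruning criterion and schedule may differ):

```python
def prune_mask(weights, sparsity):
    # Zero out the `sparsity` fraction of weights with the smallest
    # magnitudes, keeping the rest untouched. This shrinks the update
    # each client must communicate to the server.
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [w if abs(w) > threshold else 0.0 for w in weights]
```

Pruning half of a four-weight vector keeps only the two largest-magnitude entries, so a sparse (index, value) encoding of the survivors needs roughly half the bandwidth.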

    Small Hazard-Free Transducers

    Johannes Bund, Christoph Lenzen, Moti Medina
    pp. 1549-1564
    Abstract: In digital circuits, hazardous input signals result from spurious operation of bistable elements; the problem occurs, for example, in circuits with asynchronous inputs or clock domain crossings. Marino (TC’81) showed that hazards in bistable elements are inevitable. Hazard-free circuits compute the “most stable” output possible on hazardous inputs, under the constraint that they return the same output as the circuit on stable inputs. Ikenmeyer et al. (JACM’19) proved an unconditional exponential separation between the hazard-free complexity and (standard) circuit complexity of explicit functions. Despite that, asymptotically optimal hazard-free sorting circuits are possible (Bund et al., TC’19). This raises the question: which classes of functions permit efficient hazard-free circuits? We prove that circuit implementations of transducers with small state spaces are such a class. A transducer is a finite state machine that transcribes, symbol by symbol, an input string of length n into an output string of length n. We present a construction that transforms any function arising from a transducer into an efficient circuit that computes the hazard-free extension of the function. For transducers with constant state space, the circuit has asymptotically optimal size, with small constants when the state space is small.
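Hazard-freeness is usually formalized via Kleene's three-valued logic, where 'u' marks an unstable signal. The classic multiplexer example shows the gap this line of work builds on: the naive two-term mux outputs 'u' when the select is unstable even though both data inputs agree, while adding the redundant consensus term makes it hazard-free:

```python
def t_not(a):
    return {0: 1, 1: 0}.get(a, 'u')

def t_and(a, b):
    # Kleene three-valued AND over {0, 1, 'u'}.
    if a == 0 or b == 0:
        return 0
    return 1 if a == 1 and b == 1 else 'u'

def t_or(a, b):
    if a == 1 or b == 1:
        return 1
    return 0 if a == 0 and b == 0 else 'u'

def mux_naive(s, a, b):
    # Two-term multiplexer: (not s and a) or (s and b).
    return t_or(t_and(t_not(s), a), t_and(s, b))

def mux_hazard_free(s, a, b):
    # Adding the redundant consensus term (a and b) keeps the output
    # stable when both data inputs agree, even for an unstable select.
    return t_or(mux_naive(s, a, b), t_and(a, b))
```

With a = b = 1 and s = 'u', the naive mux glitches to 'u' while the hazard-free version stays at 1, which is exactly the "most stable output" requirement from the abstract.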

    Computing Tasks Saving Schemes Through Early Exit in Edge Intelligence-Assisted Systems

    Xin Niu, Xianwei Lv, Wang Chen, Chen Yu...
    pp. 1565-1576
    Abstract: Edge intelligence (EI) is a promising paradigm in which end devices collaborate with edge servers to provide artificial intelligence services to users. In most realistic scenarios, end devices move unpredictably, resulting in frequent computation migrations. Moreover, a surge in computing tasks offloaded to edge servers significantly prolongs queuing latency. These two issues obstruct the timely completion of computing tasks in EI-assisted systems. In this paper, we formulate an optimization problem that maximizes computing task completion under latency constraints. To address it, we first categorize computing tasks into new computing tasks (NCTs) and partially completed computing tasks (PCTs). Then, based on model partitioning, we design a new computing task saving scheme (NSS) that optimizes early exit points for NCTs and for tasks waiting in the queue. Furthermore, we propose a partially completed computing task saving scheme (PSS) that sets early exit points for PCTs during computation migrations. Extensive experiments show that the computing saving schemes achieve a computing task completion rate of at least 90% and a latency reduction of up to 61.81% compared to other methods.
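Early exit as used here is the standard idea of attaching side classifiers to intermediate layers of a partitioned model and stopping at the first confident one. A toy sketch (function names and the confidence threshold are illustrative):

```python
def early_exit_infer(blocks, exit_heads, x, threshold=0.9):
    # Run model blocks in sequence; after each one, ask its side
    # classifier for (label, confidence) and stop at the first exit
    # that is confident enough, trading accuracy for latency.
    label = None
    for i, (block, head) in enumerate(zip(blocks, exit_heads)):
        x = block(x)
        label, confidence = head(x)
        if confidence >= threshold:
            return label, i
    return label, len(blocks) - 1

# Stand-in "model": each block adds 1; confidence grows with depth.
blocks = [lambda v: v + 1, lambda v: v + 1]
heads = [lambda v: ('low', 0.5), lambda v: ('high', 0.95)]
```

Lowering the threshold makes the pipeline exit earlier, which is how a saving scheme can trade a small accuracy loss for meeting a latency constraint.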

    Slack Time Management for Imprecise Mixed-Criticality Systems With Reliability Constraints

    Yi-Wen Zhang, Hui Zheng
    pp. 1577-1588
    Abstract: A Mixed-Criticality System (MCS) integrates multiple applications with different criticality levels on the same hardware platform. For power- and energy-constrained systems such as Unmanned Aerial Vehicles, it is important to minimize the energy consumption of the computing system while meeting reliability constraints. In this paper, we first determine the number of tolerated faults according to a given reliability target. Second, we propose a schedulability test for MCSs with semi-clairvoyance and checkpointing. Third, we propose the Energy-Aware Scheduling with Reliability Constraint (EASRC) scheduling algorithm for MCSs with semi-clairvoyance and checkpointing, consisting of an offline phase and an online phase. In the offline phase, we determine the offline processor speed by reclaiming static slack time. In the online phase, we adjust the processor speed by reclaiming dynamic slack time to save further energy. Finally, we show the performance of the proposed algorithm through experimental evaluations; the results show that it saves an average of 9.67% of energy consumption compared with existing methods.
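Slack reclamation with DVFS commonly boils down to stretching the remaining workload over the time left before the deadline, clamped to the processor's speed range. A minimal sketch (illustrative of the general technique, not the EASRC algorithm itself):

```python
def reclaim_slack(work_remaining, deadline, now, s_max=1.0, s_min=0.1):
    # Stretch the remaining (worst-case) workload over the time left
    # before the deadline, clamped to the processor's speed range.
    # Lower speed means quadratic-or-better dynamic power savings.
    slack = deadline - now
    if slack <= 0:
        return s_max  # deadline passed or imminent: run at full speed
    return min(s_max, max(s_min, work_remaining / slack))
```

With 5 units of work and 10 time units of slack the processor can halve its speed; dynamic slack freed by early task completions drives the speed, and hence energy, lower still.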

    Energy-Delay-Aware Joint Microservice Deployment and Request Routing With DVFS in Edge: A Reinforcement Learning Approach

    Liangyuan Wang, Xudong Liu, Haonan Ding, Yi Hu...
    pp. 1589-1604
    Abstract: The emerging microservice architecture offers opportunities for accommodating delay-sensitive applications at the edge. However, such applications are computation-intensive and energy-consuming, imposing great difficulties on edge servers with limited computing resources, energy supply, and cooling capabilities. To reduce delay and energy consumption at the edge, efficient microservice orchestration is necessary but significantly challenging. Because of frequent communication among microservices, service deployment and request routing are tightly coupled, motivating a complex joint optimization problem; multi-instance modeling and fine-grained orchestration for massive numbers of microservices enlarge the difficulty further. Previous work failed to address these difficulties and also neglected to balance delay against energy, in particular lacking dynamic energy-saving abilities. Therefore, this paper minimizes energy and delay by jointly optimizing microservice deployment and request routing via multi-instance modeling, fine-grained orchestration, and dynamic adaptation. Our queuing network model enables accurate end-to-end time analysis covering queuing, computing, and communication delays. We then propose a delay-aware reinforcement learning algorithm that derives static service deployment and routing decisions. Moreover, we design an energy-aware dynamic frequency scaling algorithm that saves energy under fluctuating request patterns. Experimental results demonstrate that our approaches significantly outperform baseline algorithms in both delay and energy consumption.
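Queuing-network delay models like the one described are typically assembled from per-station formulas such as the M/M/1 mean sojourn time (a standard building block shown for illustration; the paper's model may differ in its assumptions):

```python
def mm1_sojourn(arrival_rate, service_rate):
    # Mean time a request spends at an M/M/1 station (queuing wait
    # plus service), valid only while the queue is stable.
    assert arrival_rate < service_rate, "queue must be stable"
    return 1.0 / (service_rate - arrival_rate)

def chain_delay(stations):
    # End-to-end delay of a microservice chain: per-station sojourn
    # times plus a fixed communication delay after each station.
    # Each station is a (arrival_rate, service_rate, comm_delay) tuple.
    return sum(mm1_sojourn(lam, mu) + comm for lam, mu, comm in stations)
```

Summing sojourn and communication terms along the chain is what makes deployment and routing tightly coupled: moving an instance changes both its station's service rate and the communication delays around it.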