Abstract: As the end of Moore’s law approaches, chiplet integration technology (chiplet technology for short) has emerged to revolutionize future semiconductor chip design. Chiplet technology provides unique advantages over 3-D-stacking technology, including more cost-efficient and thermal-friendly integration of heterogeneous technologies. Although chiplet technologies have already begun to be used in the latest commercial chips, they have not yet been explored for commodity dynamic random access memory (DRAM) design. Harnessing its advantages for DRAM for the first time, this article evaluates the feasibility of a chiplet-based DRAM architecture, considering the various physical and electrical constraints imposed by a standard chiplet interface [i.e., universal chiplet interconnect express (UCIe)]. We further explore DIMM architectures that simplify module packaging and assembly, leading to reductions in total die size and overall cost. A comprehensive cross-level analysis (i.e., device, circuit, chip, and system levels) shows that chiplet-based DRAM reduces $t_{\text{RCD}} + t_{\text{CAS}}$, the latency-critical DRAM timing parameters, by $1.32\times$–$1.39\times$ at the same energy consumption. In addition, a $1.39\times$–$2.28\times$ improvement in $t_{\text{RRD}}$ is obtained. The reduced DRAM timing parameters improve overall system performance by up to 8.8%–24.7% (geomean 3.4%–8.4%) on real-life benchmarks. The chiplet-based heterogeneous integration achieves a $1.27\times$ higher chip-level yield compared with the monolithic chip, along with up to a 10% reduction in overall cost compared with traditional DIMMs at emerging process technologies.
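The yield and cost claims above rest on the well-known relationship between die area and yield. The paper's own cost model is not reproduced here, but a minimal sketch using the classic Poisson yield model (all numbers, names, and the flat cost-per-area assumption are hypothetical) illustrates why splitting a large die into smaller chiplets lowers the silicon cost per known-good die:

```python
import math

def poisson_yield(area_cm2, d0):
    """Classic Poisson die-yield model: Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * d0)

def cost_per_good(area_cm2, d0, cost_per_cm2=1.0):
    """Silicon cost per known-good die (test/assembly costs ignored)."""
    return area_cm2 * cost_per_cm2 / poisson_yield(area_cm2, d0)

# Hypothetical comparison: one 1.0 cm^2 monolithic die vs.
# four 0.25 cm^2 chiplets, at D0 = 0.5 defects/cm^2.
mono = cost_per_good(1.0, 0.5)            # a defect wastes the whole die
chiplets = 4 * cost_per_good(0.25, 0.5)   # a defect wastes only one chiplet
```

Under this model a defect discards only the small die it lands on, so the cost per good assembled part drops even though the total silicon area is unchanged.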
Abstract: Due to the immaturity of the manufacturing process, numerous faults often occur in through-silicon vias (TSVs). Prebond TSV testing is crucial for enhancing the performance and yield of chiplet-based integrated chips. However, most existing test methods suffer from limited test resolution and struggle to detect weak faults. A novel pulse-based prebond TSV test method is proposed to improve the test circuit. By introducing a pMOS transistor as a driver in pulse detection, TSV leakage faults can be tested directly, improving the resolution of leakage-fault detection. In addition, the range of the test pulsewidth-to-digital-code conversion is effectively extended by using a ring oscillator (RO) for coarse detection and pulse shrinking for fine detection, avoiding the large overheads that would result from solely lengthening the pulse-shrinking chain. Results validated by HSPICE simulation show that the method can detect open faults, resistive open faults with $R_{\text{open}} > 0.9~\mathrm{k\Omega}$, leakage faults with $R_{\text{leak}} < 30~\mathrm{G\Omega}$, and compound faults consisting of resistive open faults and leakage faults.
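The coarse/fine conversion described above (RO counting for coarse range, pulse shrinking for fine resolution) can be illustrated with an idealized behavioral sketch. The function name and parameter values are hypothetical, and circuit non-idealities are ignored:

```python
def two_stage_tdc(pulse_width_ps, ro_period_ps=1000.0, shrink_step_ps=10.0):
    """Idealized two-stage pulsewidth-to-digital conversion.

    Coarse code: number of whole RO periods counted during the pulse.
    Fine code: residue measured by how many pulse-shrinking stages
    the remaining pulse survives before vanishing.
    """
    coarse = int(pulse_width_ps // ro_period_ps)     # RO cycles counted
    residue = pulse_width_ps - coarse * ro_period_ps
    fine = int(residue // shrink_step_ps)            # shrink stages survived
    return coarse, fine
```

Splitting the range this way keeps the shrinking chain short: the chain only needs to cover one RO period, not the full pulsewidth range.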
Abstract: Machine learning-based alternate test of analog/mixed-signal integrated circuits (ICs) has been widely studied in the last decade, as it simplifies test equipment and decreases test costs. However, due to low reliability and accuracy, the alternate test technique has been hard to adopt in industry. In this article, a model splitting approach (MDSP approach) is proposed to improve the reliability and accuracy of the alternate test. The machine learning-based estimation model is “split” into two models with “complementary” performance: a “positive” model whose estimations are no smaller than the label values and a “negative” model whose estimations are no larger than the label values. Estimation pairs with excessive differences between the two models are identified as suspected large-error estimations and filtered out, and the remaining results of the two “complementary” models are averaged to generate the final estimations. By comparing the estimations of the two models, large-error estimations are filtered out effectively, and estimation accuracy is improved significantly by fusing the two estimators’ results. The MDSP approach is evaluated with data from a commercial analog-to-digital converter and an operational amplifier (OP). Results demonstrate that the proposed approach improves test reliability and accuracy significantly.
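The filtering-and-fusion step of the MDSP approach can be sketched behaviorally. The function name and `diff_threshold` are hypothetical (a real deployment would tune the threshold on validation data), and the two prediction arrays stand in for the trained "positive" and "negative" models:

```python
def mdsp_estimate(pos_pred, neg_pred, diff_threshold):
    """Fuse "complementary" model outputs, MDSP-style (illustrative sketch).

    pos_pred: "positive" model outputs (>= label values by construction)
    neg_pred: "negative" model outputs (<= label values by construction)
    Pairs disagreeing by more than diff_threshold are flagged as
    suspected large-error estimations; the rest are averaged.
    """
    fused, suspect = [], []
    for p, n in zip(pos_pred, neg_pred):
        if p - n > diff_threshold:          # excessive disagreement
            suspect.append(True)
        else:
            suspect.append(False)
            fused.append((p + n) / 2.0)     # fuse the complementary pair
    return fused, suspect
```

Because the true value is bracketed between the two outputs, a large gap between them directly bounds the worst-case error of either estimate, which is what makes the disagreement a usable rejection criterion.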
Abstract: This article presents the conception, design, and realization of a fully differential two-stage CMOS amplifier that is unconditionally stable for any value of capacitive load. This is achieved simply by feeding a scaled replica of the output-stage current back to the amplifier’s virtual ground, creating a left half-plane (LHP) zero in the loop gain that either cancels or tracks the output pole across all process, voltage, and temperature (PVT) conditions. Consequently, from a stability point of view, the amplifier behaves like a single-pole OTA. Starting from an existing two-stage gain-programmable amplifier, designed in a 0.18-$\mu$m bipolar-CMOS-DMOS (BCD) process and able to drive only 10 pF without encountering stability issues, a simple circuit has been added to extend stability to any capacitive load value. An interesting and unusual method, based on the frequency behavior of the unloaded closed-loop amplifier’s output impedance, is introduced to further verify the unconditional stability of this solution. Measurements show a high degree of stability under all load conditions. In the 0.18-$\mu$m BCD technology used, the silicon area and current consumption of the extra circuit are only 0.0004 mm$^2$ and $2~\mu$A, respectively, with a 5-V power supply.
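The pole-zero cancellation argument can be made explicit with a simple two-pole loop-gain model (the symbols below are generic, not the paper's notation). If the replica current places an LHP zero at $\omega_z$ in

$$T(s) = A_0\,\frac{1 + s/\omega_z}{\left(1 + s/\omega_{p1}\right)\left(1 + s/\omega_{p2}\right)},$$

then setting $\omega_z = \omega_{p2}$ gives

$$T(s) = \frac{A_0}{1 + s/\omega_{p1}},$$

a single-pole response. Since the output pole scales as $\omega_{p2} \propto 1/C_L$, the zero must track $C_L$ for the cancellation to hold at any load, which is what the scaled output-current replica provides.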
Abstract: We propose a low-power voltage reference that enables independent adjustment of temperature sensitivity and output level. In contrast to previous methods, this design enhances temperature sensitivity without affecting the output-level distribution. The proposed circuit achieves this by integrating a separate control system that uses diode-connected pMOS transistors and an analog multiplexer for output-level adjustment, along with biasing-current control to improve temperature sensitivity. In a 180-nm CMOS process, the prototype circuit generates a stable reference voltage averaging 192 mV, maintaining an accuracy of ±8.8 mV ($\pm 3\sigma$) from 0 °C to 75 °C across ten samples. In addition, it consumes only 35.8 pW at 0.6 V and 25 °C.
Abstract: Delay lines often face challenges from input-output nonlinearity and excessive voltage-to-time gain, leading to inaccurate voltage indications and a limited input voltage range. This article presents a complementary voltage-to-time converter (VTC) with an optimized voltage-scaling circuit to address these issues. The complementary VTC uses both input-voltage-sourced and input-voltage-referenced delay lines. Although each delay line is inherently nonlinear, the opposite signs of their voltage-to-time gains effectively reduce the overall nonlinearity. To further enhance performance, an optimized voltage-scaling circuit is incorporated, improving linearity and expanding the input voltage range. Experimental results in UMC 0.18-$\mu$m technology demonstrate that the proposed circuit achieves excellent linearity, an extended, nearly rail-to-rail input voltage range, and robustness against process variations. The VTC achieves a voltage-to-time gain of 13.27 ps/mV and a signal-to-noise-and-distortion ratio (SNDR) of 32.4 dB, and it maintains stable dynamic performance across the working frequency band.
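The cancellation idea can be illustrated with a toy behavioral model: two delay lines whose voltage-to-time gains have opposite signs but whose even-order error terms are similar, so the time difference between them is (ideally) linear. All coefficients and function names below are hypothetical, not the paper's circuit model:

```python
def delay_sourced(v, g=1.0, e=0.05):
    """Toy delay-line model: offset + positive gain + quadratic error (ns)."""
    return 10.0 + g * v + e * v * v

def delay_referenced(v, g=1.0, e=0.05):
    """Complementary line: same offset and error, negative gain (ns)."""
    return 10.0 - g * v + e * v * v

def vtc_output(v):
    """Time difference: linear terms add, matched quadratic terms cancel."""
    return delay_sourced(v) - delay_referenced(v)   # ideally 2*g*v
```

In practice the two error terms match only approximately, so the residual nonlinearity is reduced rather than eliminated, which is why the abstract pairs this scheme with an additional voltage-scaling circuit.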
Abstract: Recognizing the explosive growth of artificial intelligence (AI)-based applications, several industrial companies have developed custom application-specific integrated circuits (ASICs) (e.g., Google TPU, IBM RaPiD, and Intel NNP-I/NNP-T) and constructed hyperscale cloud infrastructure with them. These ASICs perform the inference or training operations of AI models requested by users. Since AI models use different data formats and types of operations, the ASICs need to support diverse data formats and various operation shapes; however, previous ASIC solutions fulfill these requirements only partially or not at all. To overcome these limitations, we first present an area-efficient multiplier, named the all-in-one multiplier, which supports multiple bit-widths for both integer (INT) and floating-point (FP) data types. We then build a multiply-and-accumulate (MAC) array from these multiformat multipliers. In addition, the MAC array can be partitioned into multiple blocks that can be flexibly fused to support various deep neural network (DNN) operation types. We evaluate the practical effectiveness of the proposed MAC array by building an accelerator around it, named All-rounder. According to our evaluation, the proposed all-in-one multiplier occupies a $1.49\times$ smaller area than baselines with dedicated multipliers for each data format. We then compare the performance and energy efficiency of All-rounder with three different accelerators, showing consistent speedups and higher efficiency across AI benchmarks ranging from vision to large language model (LLM)-based language tasks.
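The abstract does not detail the all-in-one multiplier's internal structure, but a common principle behind bit-width-flexible multipliers is composing a wide multiply from narrow partial products that can otherwise operate independently. A minimal unsigned sketch of that principle (function names hypothetical; a real design also handles signed INT and the FP mantissa path):

```python
def mul4(a, b):
    """Stand-in for a 4x4-bit unsigned multiplier primitive."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b

def mul8_from_mul4(a, b):
    """Compose an 8x8-bit unsigned multiply from four 4x4 partial products.

    With a = aH*16 + aL and b = bH*16 + bL:
      a*b = (aH*bH << 8) + ((aH*bL + aL*bH) << 4) + aL*bL
    """
    aH, aL = a >> 4, a & 0xF
    bH, bL = b >> 4, b & 0xF
    return ((mul4(aH, bH) << 8)
            + ((mul4(aH, bL) + mul4(aL, bH)) << 4)
            + mul4(aL, bL))
```

The same four narrow multipliers can instead serve four independent 4-bit multiplies in low-precision mode, which is the area-reuse idea that makes a multiformat multiplier cheaper than one dedicated multiplier per data format.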