MEASURING THE WEIGHTED POWER OF CMOS LATCHING CIRCUITS
Dušan Raič, Faculty of Electrical Engineering, Ljubljana, Slovenia
Key words: microelectronics, IC, integrated circuits, integrated circuits design, weighted powers, power measuring, CMOS digital integrated circuits, latching circuits, CMOS latching circuits, low-power design, circuit optimization, performance evaluation, PDP, Power-Delay Product, EDP, Energy-Delay Product, flip-flop circuits, Hspice simulators, energy metrics
Abstract: We propose a method for evaluating the power consumption of CMOS latching circuits based on weighting the individual energy-related parameters. By assigning appropriate weighting factors to the clock and data inputs, the circuit can be evaluated in the context of overall system performance. The clock weighting factor is defined as the ratio between the power of the complete clocking system and the power needed to drive the clock inputs of the circuit. This factor is found to be approximately 1.8 for a representative CMOS technology with an optimally designed clock driver. We show how the power parameters of a circuit can be measured and weighted in the Hspice environment to evaluate the circuit power and the PDP or EDP products. Finally, we present comparative results for some well-known types of CMOS latching circuits.
Measuring Power Consumption in CMOS Memory Structures
Key words: microelectronics, IC integrated circuits, IC integrated circuit design, weighted powers, power measuring, CMOS digital integrated circuits, latching circuits, CMOS latching circuits, low-power design, circuit optimization, performance evaluation, PDP power-delay product, EDP energy-delay product, flip-flop circuits, Hspice simulators, energy metrics
Summary: We propose a method for estimating the power consumption of CMOS memory structures based on weighting factors for the individual power categories. The use of weighting factors makes it possible to evaluate the circuit in the light of the properties of the whole system.
The weighting factor for the clock input is defined as the ratio between the power consumption of the complete clock-driving subsystem and the power needed to drive all clock inputs. We find that this factor is around 1.8 for a typical CMOS technology with an optimally designed clock buffer. The presented technique for measuring the weighted average power consumption and the PDP and EDP factors is illustrated with Hspice simulator statements. Finally, we give measured values for some well-known implementations of CMOS memory cells.
1. Introduction
Memory cells largely determine the speed and power characteristics of digital systems. They are relatively large cells that are repeated many times, they consume considerable power because of the clock activity, and they underlie key architectural decisions of the system (clock distribution scheme, static/dynamic operation, pipelining, etc.). To improve speed and power consumption, a number of CMOS latching schemes have been analyzed and new design concepts have been proposed in recent years. However, it is frequently very difficult to compare the design efficiency of these solutions, because they involve many parameters that are hard to match. The ultimate common cost function is therefore the energy that must be spent to complete the desired function within the limits of the design specification.
Similar problems arise when we try to formulate the cost function for circuit optimization; in fact, the technique described in this work can be applied in both cases. When authors compare new concepts with known circuit techniques, they rely either on hand-calculated data /5/,/10/ or on a number of simulation runs with different data statistics /12/,/13/. The details of the power measuring technique are frequently not well documented, so the results are hard to reproduce.
In the present work we try to provide a realistic measure of circuit performance based on power consumption that is weighted against the overall system performance. Different circuit techniques can therefore be compared with a single quantitative measure. The next section presents a short discussion of performance measurement based on power-delay and energy-delay products. We then proceed to the individual power consumption components of CMOS circuits and show how they can be monitored in the Hspice environment. Further attention is devoted to the power needed to drive the clock and data inputs of the latching cells. We calculate the clock weighting factor and show how the data activity can be used as the data weighting factor, so that signal statistics are reflected in the cell performance evaluation. In the last section we show the implementation details and Hspice code fragments to calculate the weighted PDP, EDP or average circuit power. Experimental results are presented for some representative latching circuits.
2. The energy metrics
The most common quality measure for logic gates is the energy consumed by the circuit per switching event, usually called the Power-Delay Product (PDP):
$$PDP = E_{sw} = \int_0^{T} P(t)\,dt \tag{1}$$
where T is the duration of the switching event. The switching event is normally defined as a low-to-high followed by a high-to-low signal transition /1/,/6/. In an ideal CMOS gate without second-order effects this equals
$$PDP = C_L V_{dd}^2 \tag{2}$$
The problem with PDP is that it ignores the actual speed of the circuit. For a circuit designer the name "power-delay" can be misleading, as the calculated value has no direct relation to the signal delay in the circuit; it is related to the duration of the switching event, which sets the limits of the power integration interval. Faster circuits have larger PDP values because of the higher currents involved in the switching process.
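To make equations (1) and (2) concrete, the Python sketch below integrates P(t) = Vdd·Idd(t) over one switching event of an idealized load. It assumes a single-pole RC model with a hypothetical driver resistance r_drive charging CL; this model is not taken from the paper. Under that assumption the supply delivers current only during the 0→1 transition and none during 1→0, and the integral comes out as CL·Vdd² for any driver resistance. This illustrates why the ideal PDP carries no speed information; the speed dependence of real PDP values comes from the second-order effects the model omits.

```python
import math

def trapezoid(ys, dt):
    """Trapezoidal integral of uniformly sampled values ys with step dt."""
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

def ideal_pdp(c_load, vdd, r_drive, n=20000, span=20.0):
    """PDP (J) of one 0->1->0 switching event of an ideal RC-driven load.

    Assumption: single-pole RC model. The supply current flows only while
    the load charges (0->1); the 1->0 discharge draws nothing from Vdd.
    The rising edge is integrated over `span` time constants.
    """
    tau = r_drive * c_load
    dt = span * tau / n
    # Supply current during the rising edge: Idd(t) = (Vdd/R) * exp(-t/tau)
    idd = [(vdd / r_drive) * math.exp(-k * dt / tau) for k in range(n + 1)]
    power = [vdd * i for i in idd]        # P(t) = Vdd * Idd(t)
    e_rise = trapezoid(power, dt)         # eq. (1) restricted to the rising edge
    e_fall = 0.0                          # no supply current while discharging
    return e_rise + e_fall
```

For a 10 fF load at Vdd = 3 V, ideal_pdp returns approximately 9.0e-14 J both for r_drive = 1 kΩ and for r_drive = 10 kΩ, although the second driver is ten times slower.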
PDP can be used as an adequate circuit quality measure only under the assumption that the duration of the switching event meets the design specifications and needs no modification. A typical example is the ring oscillator, which is commonly used to measure the PDP and speed limits of a logic family. To combine the circuit speed (or the "specification" speed), characterized by the signal delay Tspec, with the power consumption, it is better to use the Energy-Delay Product (EDP), as proposed in /4/:
$$EDP = (\text{switching event energy}) \cdot (\text{signal delay}) = PDP \cdot T_{spec} \tag{3}$$
The EDP definition implies that the quality of a circuit with PDPa and Tsa is equal to the quality of another circuit with PDPb = PDPa/k and Tsb = k·Tsa. In general, the speed/power trade-offs are not linear /11/, which allows various circuit techniques with different topologies to compete for the best solution of a design case.
As the power supply voltage is constant, the switching energy can be calculated from the total charge flow through the power supply:
$$PDP = \int_0^{T} P(t)\,dt = V_{dd}\int_0^{T} I_{dd}(t)\,dt = V_{dd}\,Q_{sw} \tag{4}$$
The procedure to measure Qsw in Hspice simulation is presented in section 4. Tspec is measured with standard .measure statements for the delays relevant to the circuit application.
The power consumption of a CMOS cell can be considered as the sum of three components:
• internal power (node parasitics and DC currents)
• power to drive the inputs
• power to drive the loads
Of course, it is possible to measure the sum of all power components from the power supply current Idd, but this would eliminate the possibility of weighting individual components. As we deal with latching circuits, it is of particular interest to weight the power consumed by the clock and data inputs separately. This can be done by monitoring some internal circuit currents. In Hspice the easiest way to do this is to insert dummy voltage sources of 0 V DC in the circuit branches, as shown in Fig. 1.
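Equations (3) and (4) reduce the energy metrics to one integral of the supply current. The following minimal Python sketch assumes that the supply current is available as uniformly spaced samples (for example, exported from a transient simulation). It computes Qsw with the trapezoidal rule and derives PDP and EDP from it. The function names are illustrative and are not part of any Hspice interface.

```python
def switching_charge(idd, dt):
    """Qsw (C): trapezoidal integral of supply-current samples idd (A) spaced dt (s)."""
    return dt * (sum(idd) - 0.5 * (idd[0] + idd[-1]))

def pdp_from_charge(vdd, q_sw):
    """Eq. (4): with a constant supply voltage, PDP = Vdd * Qsw (J)."""
    return vdd * q_sw

def edp(pdp, t_spec):
    """Eq. (3): EDP = PDP * Tspec (J*s)."""
    return pdp * t_spec
```

As a hypothetical example, a triangular current pulse with a 100 µA peak and a 2 ns width, sampled every 10 ps, gives Qsw = 1e-13 C. With Vdd = 3 V this yields PDP = 3e-13 J, and with Tspec = 1 ns it yields EDP = 3e-22 Js. Halving PDP while doubling Tspec leaves edp() unchanged, which is exactly the equivalence implied by the EDP definition.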
All loads associated with core CMOS logic cells are combinations of parasitic capacitances to Vdd (Cp) and to Vss (Cn). Figure 2 shows the currents associated with the switching event on such loads. As we see from Table 1, the power to drive the load can be calculated either from the power supply current Idd or from the measurement of the driver current Id. The third possibility is to measure the input current Ii of the load. Because this current reverses direction between the two transitions, its average value is 0. However, Hspice can also measure the absolute value of a current, which in our case gives twice the load power. As the capacitive load power is reflected in the cell supply current, the measurement of Ipp in Fig. 1 provides the sum of the internal cell power and the load power. The measurement of Iid is therefore not needed.

[Fig. 1: Dummy voltage sources (Vdr, Vpp, Vin, Vid) to determine individual cell power components]

Table 1: Power consumption on the CMOS load (energy per transition)
Power monitor | 0→1 | 1→0 | E (both transitions)
Idd·Vdd | Vdd²·Cn | Vdd²·Cp | Vdd²·(Cn + Cp)
Id·Vdd | Vdd²·(Cn + Cp) | 0 | Vdd²·(Cn + Cp)
Ii·Vdd | Vdd²·(Cn + Cp) | -Vdd²·(Cn + Cp) | 0
|Ii|·Vdd | Vdd²·(Cn + Cp) | Vdd²·(Cn + Cp) | 2·Vdd²·(Cn + Cp)

[Fig. 2: Current flows on the capacitive CMOS load (driver with supply current Id; load with Cp to Vdd and Cn to Vss; load input current Ii). 1→0 transition: Ii = -Ip - In, Id = 0, Idd = -Ip. 0→1 transition: Ii = Ip + In, Id = Ip + In, Idd = -In.]

To calculate the last component, the cell input power, we have two possibilities. One is to build an ideal driving stage without parasitics that consumes no internal power, and to calculate the power from the supply current Idr of that driver. The second possibility is to measure the absolute value of the input current Iin and divide the result by 2. This is more convenient, as we need no extra device models and can use real circuit components to simulate the switching event.
3. Weighting the input power components
If we look at the circuit performance from the system level, we must recognize that the price of the various power components is not the same.
To reflect global system features such as the clock distribution scheme and signal statistics, the circuit evaluation function must be formulated with weighted input power components. To drive the clock inputs in synchronous systems we normally need special drivers and distribution networks. The system power penalty is therefore much higher than the sum of the locally measured powers on the clock inputs. To assess the real power requirement, consider the internal capacitance of an optimally designed N-stage buffer /6/ with tapering factor u and input capacitance Ci. The equivalent gate capacitance is
$$C_{gate} = C_i\,(1 + u + \dots + u^{N-1}) = C_i\,\frac{u^N - 1}{u - 1} \tag{5}$$
We get a similar expression for the internal diffusion capacitance if we assume that it is equal to γ times the buffer gate capacitance:
$$C_{diff} = \gamma\,C_{gate} \tag{6}$$
The optimal buffer drives the load CL = u^N·Ci and dissipates the power
$$P_{driver} = (C_{gate} + C_{diff} + C_L)\,V_{dd}^2 f = \left[\frac{u^N - 1}{u - 1}\,(1 + \gamma) + u^N\right] C_i V_{dd}^2 f \tag{7}$$
The input power measured in the cells is that of the driver load CL:
$$P_{cells} = C_L V_{dd}^2 f = u^N C_i V_{dd}^2 f \tag{8}$$
The ratio between the total power consumed by the driver and the measured power on the clock inputs is defined as the clock weighting factor wc:
$$w_c = \frac{P_{driver}}{P_{cells}} = 1 + \frac{(u^N - 1)(1 + \gamma)}{u^N\,(u - 1)} \tag{9}$$
The number of stages N depends on the size of the clock network, while the optimum tapering factor u is known to be close to 3.6 for technologies with γ = 1. For N greater than 2 we can assume u^N - 1 ≈ u^N, so that the clock weighting factor can be approximated by
$$w_c \approx 1 + \frac{1 + \gamma}{u - 1} \tag{10}$$
With γ = 1 and u = 3.6 the calculated value is 1.77. It must be pointed out that this is still an optimistic value, since we have neglected the wiring capacitance of the clock distribution network and have assumed an optimal driver.
Another important issue regarding the input power measurements is the signal activity α, defined as the number of complete signal transitions per clock cycle /13/.
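Equations (9) and (10), together with the activity α just defined, can be evaluated numerically. In the Python sketch below, wc_exact and wc_approx implement (9) and (10) directly. optimal_taper is an assumption taken from standard buffer-sizing theory (/6/): the optimum tapering factor is taken to satisfy u = exp(1 + γ/u), which gives u ≈ 3.59 for γ = 1, in line with the value 3.6 used above. activity counts complete transitions as 0→1 edges per clock cycle on a sample sequence treated as periodic. This counting convention is our interpretation, chosen because it gives α = 1 for a clock and α = 0.5 for data that toggles every cycle.

```python
import math

def optimal_taper(gamma, iters=60):
    """Optimum tapering factor u from u = exp(1 + gamma/u).

    Assumption: textbook buffer-chain result, not derived in this paper.
    """
    u = math.e                        # the gamma = 0 solution as a start value
    for _ in range(iters):
        u = math.exp(1.0 + gamma / u)  # fixed-point iteration (contracts for gamma ~ 1)
    return u

def wc_exact(u, n_stages, gamma):
    """Eq. (9): clock weighting factor of an N-stage buffer with taper u."""
    un = u ** n_stages
    return 1.0 + (un - 1.0) * (1.0 + gamma) / (un * (u - 1.0))

def wc_approx(u, gamma):
    """Eq. (10): large-N limit obtained with u^N - 1 ~ u^N."""
    return 1.0 + (1.0 + gamma) / (u - 1.0)

def activity(samples, samples_per_cycle=1):
    """Signal activity alpha: rising (0->1) edges per clock cycle.

    The sequence is treated as periodic, so samples[-1] precedes samples[0].
    """
    n = len(samples)
    rising = sum(1 for i in range(n) if samples[i - 1] == 0 and samples[i] == 1)
    return rising / (n / samples_per_cycle)
```

With these definitions, wc_approx(3.6, 1.0) is about 1.769, the 1.77 quoted above. By contrast, wc_exact(3.6, 3, 1.0) is about 1.753, and the exact value approaches the approximation as N grows. A clock sampled twice per cycle, activity([0, 1] * 8, samples_per_cycle=2), returns 1.0. The periodic pattern [0, 0, 1, 1], which contains every two-bit combination once, returns 0.25, the average value for a random signal.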
The clock itself has α = 1, while other input signals may have various values depending on the nature of the system. If the signal comes from another latching cell, the maximum value of α is 0.5, as the data can change at most once per clock cycle. A random binary signal has α = 0.25. In binary-coded signals, high-order bits naturally tend to have lower activity than low-order bits. The signal activity also depends on the circuit structure; adder outputs, for example, may have activities much higher than 1 due to carry-propagation transients. If such signals are connected to the data inputs of flip-flops, the toggle power can significantly influence the selection of the optimal latching circuit.
When we simulate the switching event of a latching circuit, we apply one logic-high and one logic-low state to the data input for each clock cycle. The signal activity of the data input in the switching event is therefore equal to 0.5 (Fig. 5). To compensate for the difference between the real application and the calculated input power in the switching event, we define the data input weighting factor as
$$w_d = \frac{\alpha_{system}}{0.5} \tag{11}$$
Clearly, some statistical properties of the system signals must be known to make better cell comparisons. If they are not known, wd = 1 can be used as the worst-case input power weight. Methods to determine α are described elsewhere /14/, /15/.
4. The power measurement technique
According to (4), measuring the average power consumption of CMOS cells with a constant power supply reduces to calculating the equivalent charge flow through the power supply. Figure 3 shows the setup needed to automate this procedure in the Hspice simulator /6/. We insert a new node qt, connected to the measuring capacitor Cq and to the current-controlled current source Fp. The latter is controlled by the cell supply current (measured on Vp), multiplied by the gain factor.
Cq and the gain must be set so that the voltage Vqt on node qt stays within a range of reasonable values for the given circuit type; otherwise we will experience difficulties with the numerical precision of the simulator. A good choice is gain = 1 and Cq = 1 fF, which scales Vqt to 1 V per 1 fC of charge flow. The voltage on node qt is then given by
$$V_{qt}(t) = \frac{gain}{C_q}\int_0^{t} I_p(\tau)\,d\tau = \frac{gain}{C_q}\,Q_p(t) \tag{12}$$
The total charge flow through Vp during the switching event of duration Tfin is
$$Q_{sw} = \int_0^{T_{fin}} I_p(t)\,dt = Q_p(T_{fin}) = \frac{C_q}{gain}\,V_{qt}(T_{fin}) \tag{13}$$
As we see, Qsw can be measured as the voltage on node qt at the end of the switching event, multiplied by the scaling factor Cq/gain. One should not forget to set the initial condition on Cq to 0 V. The whole procedure requires four Hspice statements:

    Cq qt 0 1f
    Fp 0 qt Vp 'gain'
    .ic V(qt)=0
    .measure tran Qsw max V(qt) from=0 to='Tfin'

[Fig. 3: Measuring the equivalent charge flow of a cell (cell supply current sensed by Vp and integrated on Cq at node qt)]

A similar technique can be used to measure the charge flow into the cell inputs. In this case we use the absolute value of the controlling current and divide the result by 2. This is done inside the F statement with the 'abs=1' modifier and with the gain multiplied by 0.5. Referring to Fig. 4, the charge flow into the clock input would be modeled by

    Fc 0 qt Vc 'gain*0.5*wc' abs=1

We get the sum of all charge flows needed for the cell power calculation if we connect the relevant current-controlled current sources to the same node qt. When weighting is required, the gain in the F statement is multiplied by the corresponding weighting factor. In this case the voltage Vqt(Tfin) represents the total weighted switching charge of the circuit. Once Qsw is known, one can calculate the average power, PDP or EDP from (3) and (4).
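The weighted charge integration on node qt can be mimicked outside the simulator to check the bookkeeping. The Python sketch below assumes that the currents through the dummy sources Vp, Vc and Vd are available as uniformly spaced samples. It accumulates gain·Ip plus the halved absolute input currents, weighted by wc and wd, onto a capacitor Cq that starts at 0 V, in the same way as the F sources described above. It then applies the scaling factor Sq = Cq/gain used in the .measure expressions. The function names and the sample waveforms are hypothetical.

```python
def weighted_vqt(ip, ic, idat, dt, cq=1e-15, gain=1.0, wc=1.77, wd=1.0):
    """Voltage on node qt (V) at the end of the sampled event.

    Mirrors Fp (gain*Ip), Fc (gain*0.5*wc*|Ic|) and Fd (gain*0.5*wd*|Id|)
    all feeding Cq, with the initial condition V(qt) = 0.
    """
    def node_current(k):
        return gain * (ip[k] + 0.5 * wc * abs(ic[k]) + 0.5 * wd * abs(idat[k]))

    v = 0.0
    for k in range(1, len(ip)):
        # Trapezoidal step of dV = I * dt / Cq
        v += 0.5 * (node_current(k - 1) + node_current(k)) * dt / cq
    return v

def weighted_metrics(v_qt, vdd, cq, gain, t_fin, t_spec):
    """Average power, PDP and EDP from the measured V(qt), following the .measure expressions."""
    sq = cq / gain              # scaling factor Sq
    pdp = vdd * v_qt * sq       # 'Vdd*Qsw*Sq'
    return {"PWR": pdp / t_fin, "PDP": pdp, "EDP": pdp * t_spec}
```

As a hypothetical example, take a constant 10 µA supply current, a clock current of ±20 µA and no data activity over 1 ns. The node then reaches 27.7 V: 10 fC comes from Ip and 0.5·1.77·20 fC = 17.7 fC from Ic. At Vdd = 3 V this corresponds to a weighted PDP of 8.31e-14 J.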
The appropriate .measure statements would be

    .measure tran PWR param='Vdd*Qsw*Sq/Tfin'
    .measure tran PDP param='Vdd*Qsw*Sq'
    .measure tran EDP param='Vdd*Qsw*Sq*Tspec'

With gain = 1 the scaling factor Sq is equal to Cq (in general, Sq = Cq/gain). Tfin is the duration of the simulated switching event. If EDP is needed, Tspec is measured within the same simulation as some combination of the delay times of interest.
The power measuring setup for a CMOS latching cell is presented in Fig. 4. Typical waveforms of the signals, the power supply current and the simulated switching charge are presented in Fig. 5. Table 2 presents illustrative results for four different master-slave (M/S) static flip-flop structures: the switched-inverter or C²MOS type /6/ (Fig. 6a), the pass-transistor type /17/ (Fig. 6b), a variant of the RAM-cell type /18/ (Fig. 6c) and the SSTC type /10/ (Fig. 6d). All circuits were simulated with minimum-size transistors. To make meaningful evaluations, some optimization should be done in the given system environment; this is especially true for flip-flop types C and D.

[Fig. 4: Measuring the weighted charge flow of a CMOS latch. Statements shown in the figure:
    .ic v(qt)=0
    Cq qt 0 1f
    Fp 0 qt Vp 1
    Fd 0 qt Vd 'wd*0.5' abs=1
    Fc 0 qt Vc 'wc*0.5' abs=1
    .measure tran Qsw max v(qt) from=0 to='Tfin']

Table 2: Simulation results for some minimum-size static flip-flops
Conditions: technology 0.6 µm, typical mean; w/l = 0.8 µm/0.6 µm for all transistors; Vdd = 3 V; T = 25 °C; Ci = 8 fF; wc = 1.77
Measured value | type A | type B | type C | type D
avg. power | 12.46E-6 W | 10.62E-6 W | 15.48E-6 W | 42.10E-6 W
PDP | 1.12E-12 J | 0.95E-12 J | 1.39E-12 J | 3.79E-12 J
EDP | 1.40E-21 Js | 0.93E-21 Js | 1.85E-21 Js | 3.50E-21 Js

To characterize flip-flops and latches, the switching experiment must ensure both high-to-low and low-to-high transitions of the outputs, which means that the input signals must follow certain waveforms. We recommend the following rules for the switching experiment:
• The circuit must be loaded with realistic loads.
• The initial states of all nodes must be equal to the final states, so that an even number of state transitions is involved.
• Inputs must be driven by realistic drivers. Although the input currents are measured and included in the total power consumption, the rise and fall times of ideal voltage sources do not change with loading. Measuring circuit speed with heavily loaded ideal input sources would give unrealistic EDP values because of the nonlinear speed-power trade-off in real drivers.
5. Conclusion
The proposed evaluation technique for CMOS latching circuits is based on weighting the energy-related parameters. Special attention has been paid to the power required to drive the clock and data inputs. We show how this power can be weighted so that the evaluation of the circuit under investigation reflects the overall system performance. Fragments of Hspice code illustrate the calculation of the average power and of the PDP and EDP products. All these parameters are measured by simulating a simple switching event, together with the delay times. The technique can be used to simplify the optimization process and to improve the comparison of different latching schemes.

[Fig. 5: Simulation of the weighted charge flow in a CMOS latch. Panels versus time (0 to 90 ns): clock v(cp), data v(d), output v(q), supply current i(vp) and weighted charge voltage v(qt).]

[Fig. 6: Examples of CMOS static MS flip-flops (inputs d, clk, clkn; output q): a) switched-inverter type, b) pass-transistor type, c) a variant of the RAM-cell type, d) SSTC type]

6. References
/1/ M. Shoji, CMOS Digital Circuit Technology, Prentice Hall, Englewood Cliffs, NJ, 1988, ISBN 0-13-138850-9.
/2/ J. Yuan and C. Svensson, "High-speed CMOS circuit technique," IEEE Journal of Solid-State Circuits, vol. 24, 1989, pp. 62-70.
/3/ N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, Addison-Wesley, 1993, ISBN 0-201-53376-6.
/4/ M. Horowitz, T. Indermaur, and R. Gonzalez, "Low-Power Digital Design," in IEEE Symp. Low Power Electronics, Oct. 1994, pp. 8-11.
/5/ P. Day and J. V. Woods, "Investigation into Micropipeline Latch Design Styles," IEEE Transactions on VLSI Systems, vol. 3, 1995, pp. 264-272.
/6/ J. M. Rabaey, Digital Integrated Circuits: A Design Perspective, Prentice-Hall, Englewood Cliffs, NJ, 1996, ISBN 0-201-53376-6.
/7/ C. Tretz and C. Zukowski, "CMOS Transistor Sizing for Minimization of Energy-Delay Product," Proceedings, Sixth Great Lakes Symposium on VLSI, 1996, pp. 168-173.
/8/ R. Gonzalez and M. Horowitz, "Energy Dissipation in General Purpose Microprocessors," IEEE Journal of Solid-State Circuits, vol. 31, 1996, pp. 1277-1283.
/9/ M. Afghahi, "A Robust Single Phase Clocking for Low Power, High-Speed VLSI Applications," IEEE Journal of Solid-State Circuits, vol. 31, 1996, pp. 247-253.
/10/ J. Yuan and C. Svensson, "New single-clock CMOS latches and flipflops with improved speed and power savings," IEEE Journal of Solid-State Circuits, vol. 32, 1997, pp. 62-69.
/11/ R. Gonzalez, B. Gordon, and M. Horowitz, "Supply and Threshold Voltage Scaling for Low Power CMOS," IEEE Journal of Solid-State Circuits, vol. 32, 1997, pp. 1210-1216.
/12/ R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE Journal of Solid-State Circuits, vol. 32, 1997, pp. 1079-1090.
/13/ C. Svensson and J. Yuan, "Latches and Flip-flops for Low Power Systems," in Low-Power CMOS Design, A. Chandrakasan and R. Brodersen, editors, IEEE Press, 1998, ISBN 0-7803-3429-9.
/14/ C. Tretz and C. Zukowski, "Conservative Modeling of the Contribution of Spurious Transitions to Power Dissipation in Digital CMOS VLSI Circuits," Proceedings, IEEE 39th Midwest Symposium on Circuits and Systems, 1997, pp. 317-320.
/15/ D. Singh, J. Rabaey, M. Pedram, F. Catthoor, S. Rajgopal, N. Sehgal, and T. Mozdzen, "Power Conscious CAD Tools and Methodologies: A Perspective," Proceedings of the IEEE, vol. 83, 1995, pp. 570-592.
/16/ Meta-Software, HSPICE User's Manual, Meta-Software, Inc., Campbell, CA, 1992.
/17/ G. Gerosa et al., "A 2.2 W, 80 MHz Superscalar RISC Microprocessor," IEEE Journal of Solid-State Circuits, vol. 29, no. 12, December 1994, pp. 1440-1452.
/18/ L. Wai et al., "A 1-V Programmable DSP for Wireless Communications," IEEE Journal of Solid-State Circuits, vol. 32, no. 11, November 1997, pp. 1766-1774.

Dr. Dušan Raič, dipl. ing.
University of Ljubljana, Faculty of Electrical Engineering
Tržaška 25, 1000 Ljubljana, Slovenia
email: dusan.raic@fe.uni-lj.si
Received: 06.10.1998
Accepted: 12.10.1998