# Half V<sub>DD</sub> Clock-Swing Flip-Flop with Reduced Contention for up to 60% Power Saving in Clock Distribution

David Levacq<sup>1</sup>, Muhammad Yazid<sup>2</sup>, Hiroshi Kawaguchi<sup>3</sup>, Makoto Takamiya<sup>1</sup>, Takayasu Sakurai<sup>1</sup>

<sup>1</sup> levacq@iis.u-tokyo.ac.jp Center for Collaborative Research University of Tokyo Tokyo, Japan <sup>2</sup>Fujitsu Microsolutions Kanagawa, Japan <sup>3</sup>Kobe University Kobe, Japan

*Abstract*— A new low clock-swing flip-flop (F/F) is proposed. Existing low clock-swing F/Fs consume high power, incur a speed penalty due to contention currents, or require a large silicon area because of the separate well needed for substrate biasing. By reducing contention currents, our proposal efficiently mitigates those issues. Measurements and simulations are carried out based on a 90 nm CMOS process, demonstrating reductions of active power by 71%, area by 36% and delay by 35% compared to previous proposals. It is shown that the combination of a low clock-swing distribution tree with the new F/F can save up to 60% of the total clock system power.

## I. INTRODUCTION

Reduction of power consumption of VLSI circuits is a growing concern. A direct solution is to reduce the system supply voltage  $V_{DD}$ . However, this can only be done at the expense of speed degradation, which can be unacceptable in high-performance systems. Another solution is to reduce the clock voltage swing  $V_{CK}$  without reducing  $V_{DD}$ . This has been shown to be an efficient approach to reduce power dissipation because clock distribution is a major contributor to the power dissipation in VLSI circuits (20 to 45% of the total chip power) [1]. If  $V_{DD}$  for logic circuits is kept high, this technique has little impact on speed, provided that the flip-flops (F/Fs) can operate efficiently under reduced  $V_{CK}$ .

A first simple idea is to insert low-to-high level converters in front of conventional F/Fs to regenerate a full-swing clock signal [2][3] (Low-to-High converter D-F/F (LHDFF), Fig. 1a). Theoretically, neglecting clock skew issues, this technique does not have any impact on the system performance since the logic critical path is not affected. However, it does not translate into large power savings: voltage swings are reduced on the clock-tree distribution lines only, while the large number of low-to-high level converters consumes considerable power. Therefore, a more efficient approach is to implement F/Fs that can directly receive a reduced-swing clock. In [4], two separate half-swing clock signals are distributed across the chip: the first swinging from zero to half  $V_{DD}$  to control NMOSFETs, the second swinging from half  $V_{DD}$  to  $V_{DD}$  to control PMOSFETs. While this technique has little impact on speed, the requirement to distribute two clock signals presents some difficulties regarding routing and skew adjustment.

The previously proposed Reduced Clock Swing F/F [1] (RCSFF, Fig. 1b) requires only one clock signal swinging between 0 and a low voltage V<sub>CK</sub> < V<sub>DD</sub>. However, it is clear from Fig. 1b that when the clock signal is high, the clocked PMOSFETs cannot be efficiently turned off, resulting in a direct current path from V<sub>DD</sub> to ground and high power dissipation. This difficulty can be partially circumvented by connecting the n-well of the clocked PMOSFETs to a high voltage bias to increase their threshold voltage (V<sub>th</sub>) and thereby reduce the leakage. But this requires laying out those transistors in a separate well and generating and distributing a voltage bias above the standard  $V_{DD}$ , which complicates the design and increases the circuit area. Moreover, the voltage bias that can be applied to the separate well is limited by reliability constraints. Last but not least, the precharge-discharge cycles inside the F/F result in unnecessary power dissipation when the input signal is kept constant over several clock cycles. The high power consumption of RCSFF therefore strongly reduces the benefit of a low clock swing for the chip power dissipation.

The NAND-type Keeper F/F [5] (NDKFF, Fig. 1c) does not require a separate well and eliminates unnecessary signal transitions inside the F/F for constant input. However, two internal nodes are subject to contention. When node QQ is low while D is high, the ON NMOSFET pull-down network must fight (contend with) the ON PMOSFET (in bold in Fig. 1c) to discharge node X at the clock rising edge. Similarly, changing the state of node QQ requires fighting the positive feedback of the latch formed by inverters I6,7. Our simulations show that the sizing required to guarantee functionality across all process corners despite those contentions results in suboptimal speed performance.





Figure 1. Low Clock-Swing F/F circuits (a: LHDFF [2],[3]; b: RCSFF [1]; c: NDKFF [5]; d: proposed CRFF). The numbers indicate the transistor widths in µm (unless otherwise explicitly stated, NMOSFET widths = 0.54 µm, PMOSFET widths = 0.82 µm, lengths = 0.1 µm).

In this paper we propose a new low clock-swing F/F that can operate under half- $V_{DD}$  clock swing. The F/F architecture results in reduced contention currents, which guarantees high speed and low power operation.

The rest of the paper is organized as follows. In Section II we describe our new proposal. Simulation results are presented in Section III, where the properties of different low clock-swing F/Fs are compared. Experimental results are discussed in Section IV.

## II. REDUCED CONTENTION FLIP-FLOP

The new proposal, the low clock-swing Contention-Reduced F/F (CRFF), is shown in Fig. 1d. Its operation is illustrated by the timing diagram in Fig. 2. After the clock signal CK goes high, there is a short time window where pass transistors N1,2,3,4 are all ON (transparency window). Depending on the input data *D*, either node QQ or node  $\overline{QQ}$  is discharged. When N1,2 are turned off at the end of the transparency window, the CMOS latch made of inverters I6,7 is activated in order to store the data. Contention currents are reduced in two ways. Firstly, the pull-up circuit (P1,2) is controlled by the input *D* through P3,4 to reduce the contention with the NMOS pass transistors. Secondly, the clock-driven P5 and P6 disconnect the I6,7 CMOS latch from  $V_{DD}$  during the transparency window.
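The sampling behaviour described above can be summarised with a purely behavioural sketch. The Python model below is an illustration only, not a circuit-level description of CRFF: it assumes discrete time steps and a transparency window of `window` steps opened by each rising clock edge, and it ignores contention, propagation delays and the differential nodes QQ and  $\overline{QQ}$ . The names `PulsedFF`, `run` and `window` are hypothetical.

```python
class PulsedFF:
    """Behavioural sketch of a pulsed flip-flop (hypothetical model, not the CRFF netlist)."""

    def __init__(self, window=1):
        self.window = window   # transparency window length in time steps (assumed)
        self.remaining = 0     # steps left in the current transparency window
        self.prev_ck = 0       # previous clock sample, used for edge detection
        self.q = 0             # stored output value

    def step(self, ck, d):
        # A rising clock edge opens the transparency window.
        if ck == 1 and self.prev_ck == 0:
            self.remaining = self.window
        self.prev_ck = ck
        # While the window is open the output follows D; afterwards the latch holds.
        if self.remaining > 0:
            self.q = d
            self.remaining -= 1
        return self.q


def run(ff, ck_seq, d_seq):
    """Apply clock and data sample sequences and return the list of outputs."""
    return [ff.step(ck, d) for ck, d in zip(ck_seq, d_seq)]


ck = [0, 1, 1, 0, 0, 1, 1, 0]
d = [1, 1, 0, 0, 1, 0, 1, 1]
narrow = run(PulsedFF(window=1), ck, d)  # [0, 1, 1, 1, 1, 0, 0, 0]
wide = run(PulsedFF(window=2), ck, d)    # [0, 1, 0, 0, 0, 0, 1, 1]
```

With the wider window, a change on *D* one step after the clock edge is still captured, which is the behavioural counterpart of a longer transparency window requiring the data to be held stable for longer.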

## III. SIMULATION RESULTS

A conventional D-F/F preceded by a low-to-high level converter (LHDFF), RCSFF, NDKFF and CRFF have been designed in a 90 nm bulk CMOS technology (Fig. 1). The F/Fs have been sized to provide the highest speed while minimizing their area and guaranteeing proper operation at all process corners. Since high-V<sub>th</sub> (±0.3 V) and low-V<sub>th</sub> (±0.2 V) transistors were available in this particular process, they have been combined for the best delay/power dissipation tradeoff. However, the F/Fs can be designed in single-V<sub>th</sub> technologies as well. NDKFF and CRFF are both pulsed F/Fs. The clock pulse width must be long enough to guarantee that the NMOS drivers have enough time to develop a sufficient signal on the internal nodes to flip the state of the F/F. On the other hand, a long pulse width results in a longer transparency window of the latch and a longer hold time. The optimal pulse width was obtained with three cascaded high-V<sub>th</sub> inverters. The different F/Fs have been simulated according to the method described in [6]. 20 fF capacitive loads at the data input and at the inverting and non-inverting outputs emulate the signal degradation due to other stages.



Figure 2. Timing diagram of CRFF

The static power dissipation and the active power dissipation of the different F/Fs have been extracted for different input data activity ratios  $\alpha$ , considering a 100 MHz clock frequency. The results are plotted in Fig. 3. They are scaled to the power dissipation of a conventional D-F/F (i.e. LHDFF of Fig. 1a without the Low-to-High converter) operating under full clock swing. For  $\alpha = 0.5$ , a pseudo-random data input sequence has been applied. The extracted power includes the power dissipated by the buffers in switching the data and clock input capacitances of the F/Fs, and the internal power dissipation of the F/F; it excludes the power dissipated in switching the output load capacitance [6].

A first observation is that the insertion of a level converter in front of a conventional D-F/F results in a considerable power increase, which offsets much of the power reduction expected from the low clock-swing scheme. In all conditions, the new CRFF features the lowest power dissipation. By contrast, RCSFF demonstrates a high power dissipation due to the precharge cycles. Its very high static power dissipation when the clock is high is due to the leakage of the clocked PMOSFETs, which are not sufficiently turned off. For low activity ratios, the power dissipation of CRFF is significantly reduced compared to the conventional high clock swing D-F/F. For minimum  $\alpha = 0$ , its power dissipation is less than 50% of that of the conventional D-F/F and 71% lower than that of RCSFF. This benefit adds to the power reduction in the clock tree resulting from the low voltage swing on the clock distribution lines (divided by 4 in the case of half- $V_{DD}$  clock swing).
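The factor of 4 quoted for the clock tree is consistent with the usual first-order model of dynamic power,  $P = C V^2 f$ , under the assumption that the clock drivers are supplied at the reduced swing voltage  $V_{CK}$  (if they were supplied from  $V_{DD}$ , the power would scale only linearly with the swing). A minimal sketch of that scaling follows; the capacitance value `C_TREE` is an arbitrary illustrative number, not a value from this work.

```python
def dynamic_power(c_load, v_swing, f_clk):
    """First-order dynamic power C * V^2 * f, assuming the driver supply equals the swing."""
    return c_load * v_swing ** 2 * f_clk


C_TREE = 1e-12  # hypothetical clock-tree capacitance in farads (illustrative only)
F_CK = 100e6    # clock frequency in Hz, as used in the simulations
V_DD = 1.0      # supply voltage in volts

p_full = dynamic_power(C_TREE, V_DD, F_CK)        # full-swing clock tree
p_half = dynamic_power(C_TREE, 0.5 * V_DD, F_CK)  # half-V_DD clock swing
ratio = p_full / p_half                           # close to 4 under this model
```

The ratio does not depend on the chosen capacitance or frequency, since both cancel out.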

The timing characteristic that best characterizes the speed performance of an F/F is the minimum D-to-Q delay ( $t_{D-Q,min}$ , [6]). Indeed,  $t_{D-Q,min}$  represents the minimum portion of time that the F/F takes out of the clock cycle in a pipeline chain. The extracted  $t_{D-Q,min}$  of the different F/Fs are plotted in Fig. 4. As discussed before, the insertion of an LH converter does not degrade the optimal delay of the conventional D-F/F. RCSFF features high speed performance as well, at the price of high power dissipation. The delay of CRFF is kept to 1.6 times (i.e. 378 ps) that of the conventional high swing F/F, which is reasonable. It is 35% lower than the NDKFF delay.

## IV. EXPERIMENTAL RESULTS

To verify the functionality of the CRFF and validate the simulation results, test structures for RCSFF and CRFF have been implemented in a 1 V 90 nm CMOS technology. The areas of RCSFF and CRFF are 39.23  $\mu$m<sup>2</sup> and 25.25  $\mu$m<sup>2</sup> respectively. The larger area of RCSFF is a consequence of the separate well for the clocked PMOSFETs (Fig. 1b). The new proposal therefore results in 36% area savings. A photograph of the test chip is shown in Fig. 5.

The clock-to-Q delay ( $t_{CQ}$ ) has been extracted using ring oscillator structures as illustrated in Fig. 6. During one period, every F/F undergoes both high-to-low and low-to-high transitions. The oscillation period is therefore determined by the average  $t_{CQ}$  of the F/F and by the delay introduced by the XOR gates. The latter is measured with another ring oscillator made of XOR gates only, and subtracted. Fig. 7 shows the extracted average  $t_{CQ}$  of the F/Fs as a function of  $V_{CK}$ . Simulation results are plotted as well and agree well with the measured data. This validates the simulation results of the previous section.
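The subtraction step can be sketched numerically. The model below is an assumption made for illustration, since the exact topology of Fig. 6 is not reproduced here: it supposes a ring of `n_stages` identical stages, each made of one F/F and one XOR gate, in which every stage contributes one rising and one falling transition per period, and a reference ring with the same number of XOR-only stages. Under these assumptions  $T_{FF} = 2N(t_{CQ} + t_{XOR})$  and  $T_{XOR} = 2N\,t_{XOR}$ . The function name and the numbers in the example are hypothetical, not measured values.

```python
def average_tcq(t_ff_ring, t_xor_ring, n_stages):
    """Average clock-to-Q delay under the assumed model
    T_ff_ring = 2*N*(t_CQ + t_XOR) and T_xor_ring = 2*N*t_XOR."""
    if n_stages <= 0:
        raise ValueError("n_stages must be positive")
    return (t_ff_ring - t_xor_ring) / (2 * n_stages)


# Illustrative (made-up) numbers: 10 stages, 300 ps per F/F, 50 ps per XOR gate.
N = 10
t_ff_ring = 2 * N * (300e-12 + 50e-12)   # period of the F/F ring oscillator
t_xor_ring = 2 * N * 50e-12              # period of the XOR-only reference ring
t_cq = average_tcq(t_ff_ring, t_xor_ring, N)
```

Because the result is an average over both transitions of every stage, this method cannot separate the rising and falling  $t_{CQ}$  of an individual F/F.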



Figure 3. Simulated power dissipation of low clock swing F/Fs scaled to the power dissipation of a conventional D-F/F operating under full clock swing ( $P_{DFF}$ ), for different activity ratios  $\alpha$ ;  $V_{DD}$ =1V,  $V_{CK}$ =0.5V,  $f_{CK}$ =100MHz.





Figure 5. Microphotograph of test chip with power measurement and ring oscillators (RO) structures.



Figure 6. Ring oscillator structure for t<sub>CQ</sub> measurement.







Figure 8. Measured power dissipation of F/F for different input data D conditions ( $V_{DD}$ =1V,  $V_{CK}$ =0.6V, Vwell in RCSFF = 2V).



Figure 9. Different contributions to the clock distribution system power for conventional case (i.e. full clock swing distribution with conventional F/Fs) and for Half-V<sub>DD</sub> clock swing distribution with the proposed CRFF.

The power dissipation has been extracted by measuring 60 F/Fs in parallel. Fig. 8 shows the measured power supplied by  $V_{DD}$  for one F/F as a function of the clock frequency ( $V_{CK}$ =0.6V) for maximum-activity input data *D* (different input on every clock cycle) and for constant input (*D*=0). Despite a high voltage bias ( $V_{well}$ =2V) for the separate well in RCSFF, its power dissipation is much larger than that of CRFF. Moreover, as expected, RCSFF dissipates unnecessary power when the input is constant. By contrast, the power dissipation of CRFF is efficiently reduced when the input data is kept constant.

## V. CONCLUSION

A new F/F with low clock swing voltage has been demonstrated and its performance compared with previously proposed solutions by experimental measurements and simulations. Former proposals either consume relatively high power (LHDFF, RCSFF), or are slower due to contention currents that require suboptimal sizing to be robust against process variations (NDKFF). The new proposal demonstrates 36% area reduction, and up to 71% power savings compared to previous solutions.

Typically, the clock tree and the F/Fs account for 60% and 30%, respectively, of the total power of the clock distribution system of a VLSI chip [2]. According to the results extracted in this paper, CRFF with half- $V_{DD}$  clock swing can reduce clock system power by 50-60% depending on the data activity ratio (Fig. 9). Considering a typical contribution of 40% for the clock distribution circuits to the total chip power, this translates into a reduction of total chip power by 20-25%. The performance price is low: the insertion delay of the CRFF is increased by less than 60% compared with a conventional high clock swing F/F, while the logic critical path remains unchanged.
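The arithmetic behind these figures can be checked with a short sketch. It relies on assumptions that are not stated explicitly above: the clock-tree power is divided by 4 at half- $V_{DD}$  swing, the remaining 10% of the clock system power (neither tree nor F/Fs) is unchanged, and `ff_scale` denotes the F/F power relative to a conventional full-swing D-F/F. The function names are hypothetical.

```python
def clock_system_power(ff_scale, tree_share=0.60, ff_share=0.30, tree_scale=0.25):
    """Clock-system power relative to the conventional case (1.0).

    ff_scale: F/F power relative to a conventional full-swing D-F/F.
    The remaining share (1 - tree_share - ff_share) is assumed unchanged.
    """
    other_share = 1.0 - tree_share - ff_share
    return tree_share * tree_scale + ff_share * ff_scale + other_share


def chip_power_saving(clock_saving, clock_fraction=0.40):
    """Total chip power saving when the clock system is clock_fraction of chip power."""
    return clock_fraction * clock_saving


# ff_scale = 0.5 corresponds to the alpha = 0 bound (F/F power below 50% of the D-F/F).
clock_saving = 1.0 - clock_system_power(ff_scale=0.5)   # about 0.60
chip_saving = chip_power_saving(clock_saving)           # about 0.24
```

Under these assumptions, an F/F power ratio of 0.5 gives a clock system saving of about 60% and a chip-level saving of about 24%; larger ratios at higher activity move the result toward the lower end of the quoted ranges.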

## ACKNOWLEDGMENTS

This work is partially supported by MEXT and Hitachi, Ltd. The test chip has been fabricated through the chip fabrication program of VLSI Design and Education Center, the University of Tokyo, with the collaboration of STARC.

## REFERENCES

- [1] H. Kawaguchi and T. Sakurai, "A reduced clock-swing flip-flop (RCSFF) for 63% power reduction," IEEE J. Solid-State Circuits, vol. 33, no. 5, pp. 807-811, May 1998.
- [2] M. Igarashi et al., "A low-power design method using multiple supply voltages," Proc. of the ACM International Symposium on Low Power Electronics and Design, pp. 36-41, 1997.
- [3] J. Pangjun and S. S. Sapatnekar, "Low-power clock distribution using multiple voltages and reduced swings," IEEE Trans. VLSI Syst., vol. 10, no. 3, June 2002.
- [4] H. Kojima, S. Tanaka and K. Sasaki, "Half-swing clocking scheme for 75% power saving in clocking circuitry," IEICE Trans. Electron., vol. E78-C, no. 6, June 1995.
- [5] M. Tokomasu, H. Fujii, M. Ohta, T. Fuse and A. Kameyama, "A new reduced clock-swing flip-flop: NAND-type keeper flip-flop (NDKFF)," Proc. IEEE Custom Integrated Circuits Conference, pp. 129-132, 2002.
- [6] V. Stojanovic and V. G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol. 34, no. 4, pp. 536-548, April 1999.