# Intermittent Resonant Clocking Enabling Power Reduction at Any Clock Frequency for Near/Sub-Threshold Logic Circuits

Hiroshi Fuketa, Member, IEEE, Masahiro Nomura, Member, IEEE, Makoto Takamiya, Member, IEEE, and Takayasu Sakurai, Fellow, IEEE

Abstract—To eliminate the narrow clock-frequency range of conventional resonant clocking, intermittent resonant clocking (IRC) is proposed for near/sub-threshold logic circuits. In this paper, IRC is applied to a 0.37 V 32-bit adder array with latches and to an adder array with flip-flops, both fabricated in a 40 nm CMOS process. Measurement results show that, when IRC is applied to the adder array with latches, it reduces the clock power by 36% at 980 kHz and the clock leakage power by 81% compared with conventional non-resonant clocking. A comparable power reduction (27% in clock power and 84% in clock leakage power) is achieved when IRC is applied to the adder array with flip-flops. IRC can reduce the clock power at any clock frequency, which enables flexible selection of the clock frequency.

*Index Terms*—LC resonance, near-threshold circuit, resonant clocking, subthreshold circuit, ultra-low power.

## I. INTRODUCTION

Currently, near/sub-threshold circuits are drawing considerable attention for ultra-low power applications, because operation at an ultra-low supply voltage ($V_{DD}$) near or below the threshold voltage ($V_{TH}$) achieves minimum-energy operation [1]–[3]. The clock power occupies a large portion of the total chip power [4], and hence reducing the clock power is effective for attaining energy-efficient near/sub-threshold logic circuits. Resonant clocking [5]–[11] is one possible solution for reducing the clock power. In this paper, a new resonant clocking scheme for near/sub-threshold circuits is proposed.

Fig. 1(a) shows conventional non-resonant clocking with a buffered H-tree. For non-resonant clocking with a buffered tree, however, the skew variation due to manufacturing variability increases significantly as $V_{DD}$ is reduced [12]; hence, non-resonant clocking with an unbuffered H-tree, illustrated in Fig. 1(b), is a better clocking technique for near/sub-threshold circuits.

Digital Object Identifier 10.1109/JSSC.2013.2294172

In contrast to these conventional non-resonant clocking techniques, Fig. 1(c) shows conventional resonant clocking with an unbuffered H-tree [7], [8]. This conventional resonant clocking can reduce the clock power, but it has the following three problems.

- 1) The clock frequency ($f_{CLK}$) range is narrow, typically $\pm 20\%$ [6]–[8]. At $f_{CLK}$ lower than the resonant frequency ($f_{RES}$), the power becomes higher than that of non-resonant clocking and functional errors due to ringing are observed [5], which prohibits dynamic frequency scaling (DFS) and low-speed testing.
- 2) The $f_{CLK}$ values in previous papers [5]–[10] range from 100 MHz to 5 GHz, and operation below 1 MHz is difficult to achieve because a large inductance is required for the resonance; this prevents the application of resonant clocking to near/sub-threshold logic circuits.
- 3) A sinusoidal clock waveform increases the clock skew due to its low slew rate. In addition, such a sinusoidal clock increases the short-circuit power [11].

To solve these problems, intermittent resonant clocking (IRC) is proposed in this paper. The implementation of IRC is shown in Fig. 1(d). In conventional resonant clocking, the clock period ( $T_{\rm CLK}$ ) is equal to the resonant period ( $T_{\rm RES}$ ). In contrast, in the proposed IRC,  $T_{\rm CLK}$  is larger than  $T_{\rm RES}$ . Therefore, IRC can solve the abovementioned three problems as follows.

- 1) The selection of $f_{\rm CLK}$ is flexible, and power reduction is achieved at any clock frequency. Thus, DFS and low-speed testing are possible, and IRC can be applied to near/sub-threshold logic circuits.
- 2) The inductance required for the resonance is reduced to $(T_{\rm RES}/T_{\rm CLK})^2$ of that of conventional resonant clocking, because at a fixed capacitance the resonant period scales as $\sqrt{L}$. For example, if $T_{\rm RES}$ is 1/20 of $T_{\rm CLK}$, the required inductance is reduced to 1/400.
- 3) A small clock skew is obtained owing to a high clock slew rate: the slew rate is increased by a factor of $T_{\rm CLK}/T_{\rm RES}$ compared with conventional resonant clocking. Thus, the short-circuit power can also be reduced.
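The inductance scaling in item 2) follows from $f_{\rm RES} = 1/(2\pi\sqrt{LC})$ at a fixed capacitance. The short Python sketch below checks it numerically; the 10 pF total capacitance and the 1 MHz clock are illustrative assumptions, not design values from this paper.

```python
import math


def resonant_inductance(f_res_hz: float, c_total_f: float) -> float:
    """Inductance that resonates with c_total_f at f_res_hz (ideal LC tank)."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * c_total_f)


C_TOTAL = 10e-12  # assumed clock-node + parasitic capacitance (10 pF)
F_CLK = 1e6       # assumed clock frequency (1 MHz)
RATIO = 20        # T_CLK / T_RES, as in the example of item 2)

# Conventional resonant clocking resonates at the clock frequency itself.
l_conv = resonant_inductance(F_CLK, C_TOTAL)
# IRC resonates RATIO times faster than the clock.
l_irc = resonant_inductance(RATIO * F_CLK, C_TOTAL)

print(f"conventional resonant clocking: L = {l_conv * 1e3:.2f} mH")
print(f"IRC:                            L = {l_irc * 1e6:.2f} uH")
print(f"inductance ratio: 1/{l_conv / l_irc:.0f}")
```

Running it prints about 2.53 mH for conventional resonant clocking versus 6.33 µH for IRC, a ratio of 1/400.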

In IRC, the clock is a pulse signal as shown in Fig. 1(d), and hence latches are used. Although the clock is not a signal with a 50% duty cycle, IRC can also be applied to circuits with flip-flops (FFs). A concept similar to the proposed IRC was proposed in [10]. However, the resonant clocking proposed in [10] does not focus on application to near/sub-threshold circuits, and its effectiveness has not been validated in silicon.

0018-9200 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

Manuscript received July 20, 2013; revised October 20, 2013; accepted November 24, 2013. Date of publication January 02, 2014; date of current version January 24, 2014. This paper was approved by Associate Editor Stefan Rusu. This work was carried out as a part of the Extremely Low Power (ELP) project supported by the Ministry of Economy, Trade and Industry (METI) and the New Energy and Industrial Technology Development Organization (NEDO).

H. Fuketa is with the Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan.

M. Nomura was with the Semiconductor Technology Academic Research Center (STARC), Kanagawa 222-0033, Japan. He is now with Renesas Electronics Corporation, Kanagawa 211-8668, Japan.

M. Takamiya is with the VLSI Design and Education Center, University of Tokyo, Tokyo 153-8505, Japan.

T. Sakurai is with the Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan.

Fig. 1. Block diagram of (a) conventional non-resonant clocking with buffered tree, (b) conventional non-resonant clocking with unbuffered tree [12], (c) conventional resonant clocking [7], [8], and (d) proposed intermittent resonant clocking (IRC).

In this paper, the proposed IRC is applied to a 32-bit adder array fabricated in a 40 nm CMOS process. Preliminary results of this work were presented in [13]. Beyond that preliminary work, the contributions of this paper are 1) an analytical evaluation of the clock power reduction by IRC and 2) measurement of the clock power reduction by IRC for an adder array with FFs, in addition to one with latches. Measurement results show that IRC can reduce the clock power at any clock frequency. When IRC is applied to the adder array with latches, it reduces the clock power by 36% at 980 kHz and the clock leakage power by 81% at $V_{\rm DD} = 0.37$ V compared with conventional non-resonant clocking. A comparable power reduction is achieved when IRC is applied to the adder array with FFs.

This paper is organized as follows. Section II explains the details of the proposed IRC. Measurement results are described in Section III. Finally, Section IV concludes this paper.



Fig. 2. Schematic of IRC buffer shown in Fig. 1(d).

## II. INTERMITTENT RESONANT CLOCKING (IRC)

### A. Overview

The implementation of IRC is shown in Fig. 1(d). As shown in this figure, the IRC buffer drives the clock node (CKB); its circuit schematic is shown in Fig. 2. Pulse and $C_{CLK}$ denote the input signal of the IRC buffer and the capacitance of CKB, respectively. Compared with conventional resonant clocking, the inductor is moved from the output of the buffer (Fig. 1(c)) to the bottom of the buffer (Fig. 1(d)). A voltage doubler is added to overdrive the gate of transistor M1; it is described in Section II-C.

Fig. 3. Timing chart of (a) IRC at step input and (b) actual IRC operation at pulse input.

Fig. 3 shows timing waveforms for IRC. To illustrate the operating principles of IRC, Fig. 3(a) shows timing waveforms when a step input is applied. Fig. 3(b) shows actual IRC operation with a pulse input. When Pulse is high, IRC is in a resonant mode. In contrast, when Pulse is low, IRC is in a non-resonant mode. In IRC, the resonant mode and the non-resonant mode both occur within each clock period ($T_{\rm CLK}$), and $T_{\rm CLK}$ is larger than $T_{\rm RES}$, thereby enabling the flexible selection of $f_{\rm CLK}$.

The pulse width ($T_{PW}$) of Pulse shown in Fig. 3(b) is an important parameter for minimizing the clock power in IRC. In Fig. 3(a), after the step input, CKB rings around 0 V. The best timing to end the pulse is at the highest voltage in the ringing, shown as "Point A" in Fig. 3(a), because the power required to pull CKB back up to $V_{DD}$ is minimized at Point A. The timing of Point A corresponds to $T_{RES}$, as shown in Fig. 3(b). In contrast, the lowest voltage in the ringing, at $T_1$, is the worst timing, at which the clock power is maximized. Similarly, $T_2$ is the second-worst timing.

### B. Analytical Evaluation of Power Reduction by IRC

Fig. 4. The equivalent circuit of IRC buffer in resonant mode.

Fig. 5. Waveform of $V_{CKB}$ in Fig. 4 when transistor M1 is turned on at t = 0.

Fig. 4 shows an equivalent circuit of the resonant mode of IRC. In this equivalent circuit, transistor M1 of the IRC buffer (Fig. 2) is represented by an ideal switch and an equivalent resistance ($R_{\rm ON}$). $V_{\rm CKB}(t)$ denotes the voltage of the clock node (CKB in Fig. 1(d)). Fig. 5 shows $V_{\rm CKB}(t)$ when the ideal switch in Fig. 4 is turned on at $t=0$. In this case, the circuit is equivalent to a series RLC circuit, and hence $V_{\rm CKB}(t)$ can be expressed as

$$V_{\rm CKB}(t) = V_0\, e^{-\frac{R_{\rm ON} + R_{\rm CLK}}{2L} t} \cos (2\pi f_0 t) \tag{1}$$

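where $V_0$ is the initial voltage of CKB at $t = 0$, $R_{\rm CLK}$ is the resistance of the clock network, $L$ is the inductance of the resonant inductor, and $f_0$ is the ringing frequency given in (5).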

$V_0$ is ideally equal to $V_{\rm DD}$. For near/sub-threshold circuits, however, the clock frequency and the resonant frequency are much lower, which requires a large inductance. The inductor must therefore be implemented off-chip, which introduces a large parasitic capacitance from the off-chip inductor itself and from the pad required to connect it. Fig. 6 explains the influence of this parasitic capacitance on $V_0$. In the non-resonant mode, the capacitance of the clock node ($C_{\rm CLK}$) is charged up to $V_{\rm DD}$. When the mode is switched to the resonant mode at $t=0$, the charge of $C_{\rm CLK}$ is shared with the parasitic capacitance ($C_{\rm PAR}$). Thus, the initial voltage of $V_{\rm CKB}$ in the resonant mode ($V_0$) decreases and is given by

$$V_0 = \frac{C_{\rm CLK}}{C_{\rm PAR} + C_{\rm CLK}} V_{\rm DD}. \tag{2}$$
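As a numerical illustration of (2), the sketch below assumes a hypothetical split of the 10 pF total capacitance reported in Section III-B into $C_{\rm CLK} = 8$ pF and $C_{\rm PAR} = 2$ pF; the paper does not report this split.

```python
def initial_voltage(vdd: float, c_clk: float, c_par: float) -> float:
    """Eq. (2): CKB voltage after lossless charge sharing of C_CLK with C_PAR."""
    return c_clk / (c_par + c_clk) * vdd


VDD = 0.37      # supply voltage used throughout the paper (V)
C_CLK = 8e-12   # hypothetical clock-node capacitance (F)
C_PAR = 2e-12   # hypothetical inductor + pad parasitic capacitance (F)

v0 = initial_voltage(VDD, C_CLK, C_PAR)
print(f"V_0 = {v0:.3f} V ({v0 / VDD:.0%} of V_DD)")
```

Under these assumptions the ringing starts from 0.296 V rather than 0.37 V, which is why $C_{\rm PAR}$ reappears in the power-reduction limit (8).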

Equation (1) describes an underdamped (ringing) response, which the IRC clock waveform follows only when the following condition is satisfied:

$$R_{\rm ON} + R_{\rm CLK} < 2\sqrt{\frac{L}{C_{\rm CLK} + C_{\rm PAR}}}. \tag{3}$$



Fig. 6. Voltage reduction due to parasitic capacitance (C<sub>PAR</sub>) of off-chip inductor. Charge of C<sub>CLK</sub> is shared with C<sub>PAR</sub>.

When (3) is satisfied, the resonant frequency ($f_{RES}$) is given by

$$f_{\rm RES} = \frac{1}{2\pi\sqrt{L\left(C_{\rm CLK} + C_{\rm PAR}\right)}} \tag{4}$$

$f_0$ in (1) is given by the following equation and can be approximated as $f_{RES}$ when the resistance is sufficiently small, i.e., when the left-hand side of (3) is much smaller than the right-hand side:

$$f_0 = \frac{1}{2\pi} \sqrt{\frac{1}{L \left(C_{\rm CLK} + C_{\rm PAR}\right)} - \frac{1}{4} \left(\frac{R_{\rm ON} + R_{\rm CLK}}{L}\right)^2} \approx f_{\rm RES}. \tag{5}$$
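The following sketch evaluates (3)–(5) for the inductance and total capacitance reported in Section III-B (7 µH and 10 pF). The total series resistance of 120 Ω is a hypothetical value, since $R_{\rm ON}$ and $R_{\rm CLK}$ are not reported.

```python
import math

L = 7e-6          # off-chip inductance of the latch test case (H)
C_TOTAL = 10e-12  # C_CLK + C_PAR reported for the latch test case (F)
R_TOTAL = 120.0   # hypothetical R_ON + R_CLK (ohm)

# Right-hand side of (3): the critical-damping resistance.
r_crit = 2.0 * math.sqrt(L / C_TOTAL)
underdamped = R_TOTAL < r_crit

# Eq. (4): undamped resonant frequency.
f_res = 1.0 / (2.0 * math.pi * math.sqrt(L * C_TOTAL))

# Eq. (5): damped ringing frequency; the square root is real only when (3) holds.
f_0 = math.sqrt(1.0 / (L * C_TOTAL) - 0.25 * (R_TOTAL / L) ** 2) / (2.0 * math.pi)

print(f"R_crit = {r_crit:.0f} ohm, underdamped = {underdamped}")
print(f"f_RES = {f_res / 1e6:.2f} MHz, f_0 = {f_0 / 1e6:.2f} MHz")
print(f"relative difference = {(f_res - f_0) / f_res:.2%}")
```

The computed $f_{\rm RES}$ of about 19 MHz agrees with the measured value, and for resistances well below the critical value of roughly 1.7 kΩ the approximation $f_0 \approx f_{\rm RES}$ in (5) holds to well under 1%.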

When the ideal switch in Fig. 4 is turned off at $t = T_{PW}$, CKB is pulled up to $V_{DD}$ by the IRC buffer, which consumes clock power, as shown in Fig. 5. Therefore, the dynamic power dissipation of the clock node in IRC ($P_{IRC}$) is a function of $T_{PW}$ and is expressed as

$$P_{\rm IRC}(T_{\rm PW}) = C_{\rm CLK} V_{\rm DD} \left( V_{\rm DD} - V_{\rm CKB}(T_{\rm PW}) \right) f_{\rm CLK} \tag{6}$$

where  $f_{CLK}$  is the clock frequency. This equation indicates that the clock power of IRC is proportional to  $f_{CLK}$ , and is minimized when  $V_{CKB}$  ( $T_{PW}$ ) is maximum.

In contrast, the dynamic clock power of conventional non-resonant clocking shown in Fig. 1(a) and (b) is given by

$$P_{CONV} = C_{\rm CLK} V_{\rm DD}^2 f_{\rm CLK}. \tag{7}$$
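Combining (1), (2), and (4)–(7), the sketch below evaluates $P_{\rm IRC}/P_{CONV}$ at $T_{\rm PW} = T_1$, $T_{\rm RES}$, and $T_2$, taking $T_1 = T_{\rm RES}/2$ and $T_2 = 3T_{\rm RES}/2$ as the first two troughs of the ringing (a reading of Fig. 3), and searches for the best pulse width between them. The 8 pF/2 pF capacitance split and the 120 Ω resistance are hypothetical; only the 7 µH inductance, the 0.37 V supply, and the 980 kHz clock come from this paper.

```python
import math

# Parameters: L, VDD and F_CLK follow Section III-B; the rest are hypothetical.
VDD = 0.37                      # supply voltage (V)
F_CLK = 980e3                   # clock frequency (Hz)
L = 7e-6                        # off-chip inductance (H)
C_CLK, C_PAR = 8e-12, 2e-12     # hypothetical capacitance split (F)
R_TOTAL = 120.0                 # hypothetical R_ON + R_CLK (ohm)

C_TOT = C_CLK + C_PAR
F_RES = 1 / (2 * math.pi * math.sqrt(L * C_TOT))                              # (4)
F_0 = math.sqrt(1 / (L * C_TOT) - 0.25 * (R_TOTAL / L) ** 2) / (2 * math.pi)  # (5)
T_RES = 1 / F_RES
V_0 = C_CLK / (C_PAR + C_CLK) * VDD                                           # (2)
P_CONV = C_CLK * VDD ** 2 * F_CLK                                             # (7)


def v_ckb(t: float) -> float:
    """Ringing voltage of CKB during the resonant mode, eq. (1)."""
    return V_0 * math.exp(-R_TOTAL / (2 * L) * t) * math.cos(2 * math.pi * F_0 * t)


def p_irc(t_pw: float) -> float:
    """Dynamic clock power of IRC for pulse width t_pw, eq. (6)."""
    return C_CLK * VDD * (VDD - v_ckb(t_pw)) * F_CLK


for name, t_pw in (("T1", 0.5 * T_RES), ("T_RES", T_RES), ("T2", 1.5 * T_RES)):
    print(f"T_PW = {name:5s}: P_IRC / P_CONV = {p_irc(t_pw) / P_CONV:.2f}")

# Coarse grid search over pulse widths between T1 and T2.
grid = [T_RES * (0.5 + k / 1000) for k in range(1001)]
t_best = min(grid, key=p_irc)
print(f"best T_PW = {t_best / T_RES:.3f} x T_RES")
```

With these values the minimum falls within about 1% of $T_{\rm RES}$ (damping shifts the peak slightly earlier), and both $T_1$ and $T_2$ cost more power than non-resonant clocking, with $T_1$ the worst, matching the ordering later measured in Fig. 12.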

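$V_{CKB}(t)$ in (1) is the ringing signal shown in Fig. 3(a) and reaches its highest value at $T_{PW} = T_{RES}$. Therefore, the clock power of IRC ($P_{IRC}$ in (6)) is minimized at $T_{PW} = T_{RES}$. The resulting minimum power ratio, i.e., the maximum power reduction by the proposed IRC, is obtained by substituting (1), (2), and (4) into (6), using $f_0 \approx f_{\rm RES}$ from (5), and dividing by (7):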

$$
\begin{aligned}
\frac{P_{\rm IRC}(T_{\rm PW} = T_{\rm RES})}{P_{CONV}} &= 1 - \frac{V_{\rm CKB}(T_{\rm RES})}{V_{\rm DD}} \\
&= 1 - \frac{C_{\rm CLK}}{C_{\rm PAR} + C_{\rm CLK}} \, e^{-\pi (R_{\rm ON} + R_{\rm CLK})\sqrt{(C_{\rm CLK} + C_{\rm PAR})/L}}.
\end{aligned}
\tag{8}
$$

This equation indicates that a smaller resistance and capacitance and a larger inductance are preferable for obtaining a larger power reduction by IRC. In particular, the resistance is the most important parameter, and hence a technique to reduce $R_{\rm ON}$ is discussed in Section II-C.
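To see how strongly (8) depends on the series resistance, the sketch below sweeps $R_{\rm ON} + R_{\rm CLK}$ for the 7 µH inductance of Section III-B and a hypothetical 8 pF/2 pF capacitance split. Even at zero resistance, the ratio cannot fall below $C_{\rm PAR}/(C_{\rm PAR} + C_{\rm CLK})$, the loss set by charge sharing in (2).

```python
import math


def min_power_ratio(r_total: float, l: float, c_clk: float, c_par: float) -> float:
    """Eq. (8): P_IRC(T_PW = T_RES) / P_CONV for series resistance r_total."""
    c_tot = c_clk + c_par
    return 1 - c_clk / c_tot * math.exp(-math.pi * r_total * math.sqrt(c_tot / l))


L = 7e-6                      # inductance of the latch test case (H)
C_CLK, C_PAR = 8e-12, 2e-12   # hypothetical capacitance split (F)

for r in (0, 60, 120, 240, 480):
    ratio = min_power_ratio(r, L, C_CLK, C_PAR)
    print(f"R_ON + R_CLK = {r:3d} ohm -> P_IRC / P_CONV = {ratio:.2f}")
```

The ratio grows monotonically with resistance, which is the quantitative reason for lowering $R_{\rm ON}$ by gate boosting.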

The above discussion focuses only on the dynamic power consumed to charge the capacitance of the clock network. In practice, the power dissipated in driving the clock buffers and the IRC buffer must also be taken into account. In this case, the power ratio in (8) is rewritten as

$$\frac{P_{\rm D,IRC} + P_{\rm IRC} \left( T_{\rm PW} = T_{\rm RES} \right)}{P_{\rm D,CONV} + P_{CONV}} \tag{9}$$

where  $P_{D,CONV}$  and  $P_{D,IRC}$  denote the dynamic power consumed to drive the clock buffers in conventional non-resonant clocking and the IRC buffer, respectively.
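Equation (9) is a simple ratio of sums. The sketch below evaluates it with made-up power values (in µW) chosen only to show how the driver terms dilute the reduction obtained from (8); none of these numbers are measurements from this paper.

```python
def total_clock_power_ratio(p_d_irc: float, p_irc: float,
                            p_d_conv: float, p_conv: float) -> float:
    """Eq. (9): (P_D,IRC + P_IRC(T_RES)) / (P_D,CONV + P_CONV)."""
    return (p_d_irc + p_irc) / (p_d_conv + p_conv)


# Hypothetical values in uW, for illustration only.
P_CONV, P_D_CONV = 0.8, 0.3   # conventional: network charging, buffer drive
P_IRC = 0.4                   # IRC network charging at T_PW = T_RES, i.e. (8) = 0.5

for p_d_irc in (0.3, 0.03):   # IRC buffer drive power: large vs. small
    ratio = total_clock_power_ratio(p_d_irc, P_IRC, P_D_CONV, P_CONV)
    print(f"P_D,IRC = {p_d_irc:.2f} uW -> clock power ratio = {ratio:.2f}")
```

If $P_{D,IRC}$ is as large as $P_{D,CONV}$, the driver terms dilute the reduction; shrinking $P_{D,IRC}$ is what gate boosting targets in Section II-C.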

As (8) shows, the clock power reduction by the proposed IRC increases as $R_{ON}$ decreases. Since $R_{ON}$ is reduced in more advanced process technologies, IRC benefits from such technologies. If the proposed IRC is applied to larger circuits, the resistance and capacitance of the clock network ($R_{CLK}$ and $C_{CLK}$) increase, and hence the condition for LC resonance in (3) becomes more difficult to satisfy. In addition, when IRC is applied to circuits that operate at a higher supply voltage and frequency, the resonant frequency must also increase; as indicated by (4), $L$ must then be reduced for the same $C_{CLK}$, which again makes the condition in (3) more difficult to satisfy. Therefore, applying the proposed IRC to larger and/or faster (higher-supply-voltage) circuits may pose design challenges. One solution is to distribute the inductors, as described in [5], [6], [10].
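The frequency argument above follows directly from (3) and (4). Solving (4) for $L$ and substituting it into the right-hand side of (3) gives a resonance condition in terms of $f_{\rm RES}$ alone:

$$
L = \frac{1}{(2\pi f_{\rm RES})^2 (C_{\rm CLK} + C_{\rm PAR})}
\;\;\Rightarrow\;\;
R_{\rm ON} + R_{\rm CLK} < 2\sqrt{\frac{L}{C_{\rm CLK} + C_{\rm PAR}}} = \frac{1}{\pi f_{\rm RES} \left(C_{\rm CLK} + C_{\rm PAR}\right)}.
$$

The allowed series resistance therefore shrinks in inverse proportion to both $f_{\rm RES}$ and the total capacitance. As a worked example, for the latch test case in Section III-B ($f_{\rm RES} = 19$ MHz, $C_{\rm CLK} + C_{\rm PAR} = 10$ pF), the bound is about $1/(\pi \times 19\,\text{MHz} \times 10\,\text{pF}) \approx 1.7$ kΩ.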

### C. Gate Boosting for IRC

As indicated in (3) and (8), the ON resistance of M1 in Fig. 2 ($R_{ON}$) must be reduced to minimize the clock power. In near/sub-threshold logic circuits, however, the ON resistance is very large. Reducing it by widening M1 alone would require a much larger gate width, which greatly increases the power dissipation. Gate boosting of M1 is therefore an effective way to decrease the ON resistance with small area and power overhead. In this work, gate boosting is realized by the voltage doubler whose circuit schematic is shown in Fig. 7 [14].

Table I summarizes a simulated comparison of the gate width and power of M1 with and without gate boosting at the same ON resistance and a supply voltage of 0.37 V. At the same ON resistance, gate boosting with the voltage doubler reduces the gate width of M1 and the power dissipation to drive M1 by 97% and 91%, respectively. This means that gate boosting can significantly reduce $P_{D,IRC}$ relative to $P_{D,CONV}$ in (9), which leads to a further reduction in the clock power. Gate boosting is therefore very effective for near/sub-threshold logic circuits.



Fig. 7. Circuit schematic of voltage doubler in Fig. 2 [14].

TABLE I SIMULATED COMPARISON OF GATE WIDTHS AND POWER OF M1 (FIG. 2) WITH AND WITHOUT THE GATE BOOSTING AT THE SAME ON-RESISTANCE AT $V_{DD} = 0.37$ V.

|                                                  | Gate width of M1<br>(@ same ON resistance) | Power dissipation to<br>drive M1 |
|--------------------------------------------------|--------------------------------------------|----------------------------------|
| Conventional (without gate boosting)             | 1                                          | 1                                |
| Proposed (with gate boosting by voltage doubler) | 0.03 (-97%)                                | 0.09 (-91%)                      |

Fig. 8. Block diagram of test chip.

## III. MEASUREMENT RESULTS

### A. Measurement Circuit

In this paper, the proposed IRC is applied to an adder array. Fig. 8 shows a block diagram of the test chip. The adder array consists of 32 cores of 32-bit full adders with input/output latches. For comparison, conventional non-resonant clocking with an unbuffered clock tree (Fig. 1(b)) is also implemented on the test chip, because clocking with an unbuffered tree is more energy-efficient than that with a buffered tree in near/sub-threshold circuits [12].

Static CMOS latches are used instead of FFs because CKB is a pulse signal rather than a signal with a 50% duty cycle, as shown in Fig. 3(b). Nevertheless, IRC can also be applied to circuits with FFs, so a version of IRC with FFs is implemented as well.

The test chip is fabricated in a 40 nm CMOS process. A die micrograph of the test chip is illustrated in Fig. 9. The area overhead due to the voltage doubler is 0.6%.

Fig. 10 shows the measured energy of the adder array at the maximum frequency as a function of the supply voltage ($V_{DD}$). The minimum-energy operation is achieved at $V_{DD} = 0.37$ V, so $V_{DD} = 0.37$ V is used throughout this paper.

Fig. 9. Chip micrograph fabricated in 40-nm CMOS process.

Fig. 10. Measured energy of adder array at the maximum frequency as a function of the supply voltage ($V_{\rm DD}$). Minimum energy operation is achieved at $V_{\rm DD} = 0.37$ V.

### B. IRC With Latches

In this section, measurement results of IRC with latches are shown. Fig. 11(b) shows the measured waveform of CKB in the proposed IRC with latches at $V_{DD} = 0.37$ V and $f_{CLK} = 980$ kHz. This measured waveform validates the qualitative waveform in Fig. 3(b). For comparison, the measured clock waveform of conventional non-resonant clocking with an unbuffered clock tree is shown in Fig. 11(a). For a fair power comparison, both schemes use the same unbuffered clock tree, and the gate widths of the clock drivers are designed to give the same slew rate. Compared with conventional non-resonant clocking, the total gate width of the proposed IRC is reduced by 84%.

The measured slew rates in Fig. 11(a) and (b) are similar. In IRC, the measured resonant frequency ($f_{RES}$) is 19 MHz with an off-chip inductor of 7 $\mu$H, which, according to (4), corresponds to $C_{CLK} + C_{PAR}$ of 10 pF. In this implementation, $T_{CLK}/T_{RES} \approx 19$. Therefore, the inductance required for the resonance in IRC is reduced to $1/19^2 = 1/361$ of that of conventional resonant clocking, and the slew rate is increased 19-fold compared with conventional resonant clocking.
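The capacitance quoted above follows from inverting (4). The sketch below repeats that arithmetic for the latch case and, as an inference not reported in this paper, for the FF case of Section III-C (40 µH, 6.3 MHz, 947 kHz). Note that the text rounds $T_{\rm CLK}/T_{\rm RES}$ to 19; the unrounded ratio at 980 kHz is about 19.4.

```python
import math


def implied_capacitance(l_h: float, f_res_hz: float) -> float:
    """Invert (4): C_CLK + C_PAR = 1 / ((2*pi*f_RES)^2 * L)."""
    return 1.0 / ((2.0 * math.pi * f_res_hz) ** 2 * l_h)


# (inductance in H, measured f_RES in Hz, f_CLK in Hz) per test case
DESIGNS = {
    "latches": (7e-6, 19e6, 980e3),
    "FFs": (40e-6, 6.3e6, 947e3),
}

for name, (l_h, f_res, f_clk) in DESIGNS.items():
    c_total = implied_capacitance(l_h, f_res)
    period_ratio = f_res / f_clk  # equals T_CLK / T_RES
    print(f"{name:8s} C_CLK + C_PAR = {c_total * 1e12:.1f} pF, "
          f"T_CLK/T_RES = {period_ratio:.1f}, "
          f"inductance scale = 1/{period_ratio ** 2:.0f}")
```

The FF case implies roughly 16 pF of total capacitance, which is at least consistent with the statement in Section III-C that the two arrays have different clock-node capacitances.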

To validate the discussion of the optimum $T_{PW}$ in Fig. 3, Fig. 12 shows the measured $T_{PW}$ dependence of the clock power of IRC. The clock power of conventional non-resonant clocking (Fig. 1(b)) is also shown for comparison. As explained in Section II-B, the minimum clock power is achieved at $T_{PW} = T_{RES}$, with a power reduction of 36%, whereas the maximum and second-maximum clock powers are observed at $T_{PW} = T_1$ and $T_{PW} = T_2$, with power increases of 16% and 5%, respectively.

Fig. 11. Two clocking circuits in test chip for comparison. (a) Conventional non-resonant clocking [12]. (b) Proposed IRC with latches.

Fig. 12. Measured dependence of clock power on pulse width of Pulse signal ($T_{PW}$). Clock power is minimum at $T_{PW} = T_{RES}$.

To demonstrate power reduction at any clock frequency in the proposed IRC, Fig. 13 shows the measured $f_{\rm CLK}$ dependence of the clock power of conventional non-resonant clocking and the proposed IRC with latches. The maximum clock frequencies of non-resonant clocking and IRC are identical. In IRC, the clock power is proportional to $f_{\rm CLK}$, as predicted by (6), unlike the behavior observed in conventional resonant clocking [5]–[10]. The clock power of IRC is lower than that of conventional non-resonant clocking at all $f_{\rm CLK}$ values, which is the most important advantage of the proposed IRC. At $f_{\rm CLK} = 980$ kHz, the clock powers of conventional non-resonant clocking and the proposed IRC are 1.1 $\mu$W and 0.7 $\mu$W, respectively, and the power dissipation of the adder array excluding the clock power is 2.7 $\mu$W. This corresponds to a 36% reduction in the clock power and a 10% reduction in the total power including the adder array. This clock power reduction is achieved not only by resonant clocking, which reduces the power consumed to charge the capacitance of the clock network (Section II-B), but also by gate boosting, which reduces the power consumed to drive the IRC buffer (Section II-C). From the measurement results, the power reductions by resonant clocking and gate boosting are estimated to be 20% and 16%, respectively, out of the total 36% clock power reduction. When the clock is stopped, the clock leakage power is reduced by 81% owing to the 84% gate width reduction described with Fig. 11. This leakage power reduction is achieved by gate boosting alone, since resonant clocking reduces only the dynamic power dissipation.

Fig. 13. Measured $f_{\rm CLK}$ dependence of clock power of conventional non-resonant clocking and proposed IRC with latches. At $f_{\rm CLK} = 980$ kHz, clock power is reduced by 36%. Clock leakage power is reduced by 81%.

TABLE II
COMPARISON WITH PREVIOUS WORKS

|                             | JSSC09 [5]               | ISSCC12 [6]               | CICC07 [7]               | ESSCIRC09 [8]            | JSSC04 [9]          | LASCAS13 [10]   | This work (Latch)    | This work (FF)       |
|-----------------------------|--------------------------|---------------------------|--------------------------|--------------------------|---------------------|-----------------|----------------------|----------------------|
| Process                     | 90nm SOI                 | 32nm SOI                  | 130nm bulk               | 130nm bulk               | 130nm SOI           | 45nm bulk       | 40nm bulk            | 40nm bulk            |
| Supply voltage              | 1.2V                     | 1.2V                      | 1.03 ~ 1.21V             | 1.2V                     | 0.6 ~ 0.9V          | 0.5 ~ 1V        | 0.37V                | 0.37V                |
| Clock frequency ($f_{CLK}$) | 1.6 ~ 5GHz               | < 4.25GHz*                | 800MHz ~ 1.2GHz          | 100 ~ 200MHz             | 147MHz              | 500MHz ~ 2GHz   | DC ~ 980kHz          | DC ~ 947kHz          |
| Resonant frequency          | 3.2GHz                   | 3.3GHz                    | 1.03GHz                  | 125MHz                   | 147MHz              | -               | 19MHz                | 6.3MHz               |
| Freedom of $f_{CLK}$        | Poor ($P \not\propto f$) | Fair ($P \not\propto f$)* | Poor ($P \not\propto f$) | Poor ($P \not\propto f$) | Poor ($f$ is fixed) | Good            | Good ($P \propto f$) | Good ($P \propto f$) |
| Clock slew rate             | Fair                     | Fair                      | Low                      | Low                      | Low                 | High            | High                 | High                 |
| Local clock buffer          | Yes                      | Yes                       | No                       | No                       | No                  | Yes             | No                   | No                   |
| Inductor                    | On-chip                  | On-chip                   | On-chip                  | Off-chip                 | On-chip             | On-chip         | Off-chip             | Off-chip             |
| Latch/FF                    | FF                       | FF                        | Latch                    | FF                       | FF                  | FF              | Latch                | FF                   |
| Power reduction             | 25%                      | 24%                       | 76%                      | 85%                      | 35%                 | 25%**           | 36 ~ 81%             | 27 ~ 84%             |

(\*) Inductors can be disconnected when the clock frequency is low (< 2.9 GHz). In this case, conventional clocking is used, and hence the clock power is proportional to the clock frequency.

(\*\*) Not validated in silicon.
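The percentages quoted for the latch case at 980 kHz follow from the three measured powers. The sketch below recomputes them; because 1.1 µW and 0.7 µW are themselves rounded, the recomputed values differ slightly from the quoted 36% and 10%.

```python
# Measured values at f_CLK = 980 kHz and V_DD = 0.37 V (rounded, in uW).
P_CLK_CONV = 1.1   # clock power, conventional non-resonant clocking
P_CLK_IRC = 0.7    # clock power, IRC with latches
P_ADDER = 2.7      # adder array power excluding the clock

clock_reduction = 1 - P_CLK_IRC / P_CLK_CONV
total_reduction = 1 - (P_CLK_IRC + P_ADDER) / (P_CLK_CONV + P_ADDER)

print(f"clock power reduction: {clock_reduction:.1%}")
print(f"total power reduction: {total_reduction:.1%}")
```

This prints 36.4% and 10.5%, consistent with the rounded figures in the text.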

### C. IRC With FFs

In this section, measurement results of IRC with FFs are shown. IRC with FFs is also implemented in this paper, since IRC can be applied to circuits with FFs even though CKB is not a signal with a 50% duty cycle. Fig. 14 shows the measured waveform of CKB in IRC with FFs at $V_{\rm DD} = 0.37$ V and $f_{\rm CLK} = 947$ kHz. Here, an off-chip inductor of 40 $\mu$H is used. This inductance is chosen so that the adder array with FFs is functional at $V_{\rm DD} = 0.37$ V, and it is larger than that used for the adder array with latches described in Section III-B. In this case, the resonant frequency $f_{\rm RES}$ is 6.3 MHz, and the duty cycle is 80% when $f_{\rm CLK}$ is 947 kHz.

Fig. 15 shows the measured $f_{CLK}$ dependence of the clock power of conventional non-resonant clocking and IRC with FFs. The clock power of non-resonant clocking in Fig. 15 differs from that in Fig. 13, since the clock-node capacitance ($C_{CLK}$) of the adder array with latches differs from that of the adder array with FFs. The clock power of IRC is reduced at all $f_{\rm CLK}$ values compared with that of conventional non-resonant clocking. The clock power is reduced by 27% at $f_{\rm CLK} = 947$ kHz, and the clock leakage power is reduced by 84%. In IRC, the resonant frequency of the adder array with FFs is lower than that of the adder array with latches described in Section III-B, while the clock buffer of the adder array with FFs in conventional non-resonant clocking is the same size as that of the adder array with latches. Therefore, in the adder array with FFs, the slew rate of IRC is lower than that of conventional non-resonant clocking, which lowers the maximum $f_{\rm CLK}$ of IRC at $V_{\rm DD} = 0.37$ V by 6% compared with that of conventional non-resonant clocking.

Fig. 14. Measured clock waveform of IRC with FFs. Duty cycle is 80%.

### D. Comparison With Previous Works

Table II compares this work with the previous works [5]–[10]. The $f_{\rm CLK}$ range of the previous works is narrow because, in conventional resonant clocking, the clock power does not always decrease as $f_{\rm CLK}$ decreases, whereas in the proposed IRC the clock power can be reduced at any clock frequency. Therefore, the selection of $f_{\rm CLK}$ is flexible, which is the most important advantage of the proposed IRC.

Fig. 15. Measured $f_{\rm CLK}$ dependence of clock power of conventional non-resonant clocking and proposed IRC with FFs. At $f_{\rm CLK}=947$ kHz, clock power is reduced by 27%. Clock leakage power is reduced by 84%. On the other hand, maximum clock frequency of IRC at $V_{\rm DD}=0.37~\rm V$ decreases by 6%.

## IV. CONCLUSIONS

In this paper, intermittent resonant clocking (IRC) was proposed to reduce the clock power of near/sub-threshold logic circuits. Conventional resonant clocking has three problems: 1) the clock frequency ($f_{\rm CLK}$) range is narrow, 2) a huge inductor is required for slow operation, such as below 1 MHz, and 3) the clock skew is large. The proposed IRC solves these problems, since the resonant frequency is chosen independently of $f_{CLK}$ and is much higher than $f_{CLK}$. IRC was applied to a 32-bit adder array with latches and to one with FFs, both fabricated in a 40 nm CMOS process. Measurement results showed that IRC can reduce the clock power at any clock frequency: when IRC is applied to the adder array with latches, the clock power is reduced by 36% at 980 kHz and the clock leakage power by 81% compared with conventional non-resonant clocking. A comparable power reduction was achieved when IRC was applied to the adder array with FFs.

## REFERENCES

- [1] A. Wang, B. Calhoun, and A. Chandrakasan, *Sub-Threshold Design for Ultra Low-Power Systems*. New York, NY, USA: Springer, 2006.
- [2] H. Kaul, M. Anders, S. Mathew, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "A 320 mV 56 μW 411 GOPS/Watt ultra-low voltage motion estimation accelerator in 65 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 107–114, Jan. 2009.
- [3] K. Hirairi, Y. Okuma, H. Fuketa, T. Yasufuku, M. Takamiya, M. Nomura, H. Shinohara, and T. Sakurai, "13% power reduction in 16 b integer unit in 40 nm CMOS by adaptive power supply voltage control with Parity-Based Error Prediction and Detection (PEPD) and fully integrated digital LDO," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 486–487.

- [4] H. Kawaguchi and T. Sakurai, "A reduced clock-swing flip-flop (RCSFF) for 63% power reduction," *IEEE J. Solid-State Circuits*, vol. 33, no. 5, pp. 807–811, May 1998.
- [5] S. Chan, P. Restle, T. Bucelot, J. Liberty, S. Weitzel, J. Keaty, B. Flachs, R. Volant, P. Kapusta, and J. Zimmerman, "A resonant global clock distribution for the cell broadband engine processor," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 64–72, Jan. 2009.
- [6] V. Sathe, S. Arekapudi, C. Ouyang, M. Papaefthymiou, A. Ishii, and S. Naffziger, "Resonant clock design for a power-efficient high-volume x86-64 microprocessor," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2012, pp. 68–69.
- [7] V. Sathe, J. Kao, and M. Papaefthymiou, "A 0.8–1.2 GHz single-phase resonant-clocked FIR filter with level-sensitive latches," in *Proc. IEEE Custom Integrated Circuits Conf. (CICC)*, Sep. 2007, pp. 583–586.
- [8] A. Ishii, J. Kao, V. Sathe, and M. Papaefthymiou, "A resonant-clock 200 MHz ARM926EJ-S™ microcontroller," in *Proc. IEEE Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2009, pp. 356–359.
- [9] A. Drake, K. Nowka, T. Nguyen, J. Burns, and R. Brown, "Resonant clocking using distributed parasitic capacitance," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1520–1528, Sep. 2004.
- [10] I. Bezzam, S. Krishnan, T. Raja, and C. Mathiazhagan, "Low power low voltage wide frequency resonant clock and data circuits for power reductions," in *Proc. Latin Amer. Symp. Circuits Syst. (LASCAS)*, Mar. 2013, pp. 1–4.
- [11] X. Hu and M. Guthaus, "Distributed LC resonant clock grid synthesis," *IEEE Trans. Circuits Syst. I*, vol. 59, no. 11, pp. 2749–2760, Nov. 2012.
- [12] M. Seok, D. Blaauw, and D. Sylvester, "Clock network design for ultra-low power applications," in *Proc. IEEE Int. Symp. Low Power Electron. Design (ISLPED)*, Aug. 2010, pp. 271–276.
- [13] H. Fuketa, M. Nomura, M. Takamiya, and T. Sakurai, "Intermittent resonant clocking enabling power reduction at any clock frequency for 0.37 V 980 kHz near-threshold logic circuits," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2013, pp. 436–437.
- [14] N. Verma and A. Chandrakasan, "A 65 nm 8T sub-Vt SRAM employing sense-amplifier redundancy," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2007, pp. 328–329.



**Hiroshi Fuketa** (S'07–M'10) received the B.E. degree from Kyoto University, Kyoto, Japan, in 2002 and the M.E. and Ph.D. degrees in information systems engineering from Osaka University, Osaka, Japan, in 2008 and 2010, respectively.

He is currently a Research Associate with the Institute of Industrial Science, the University of Tokyo. His research interests include ultra-low-power circuit design and large-area electronics with organic transistors. He is a member of IEICE and IPSJ.



**Masahiro Nomura** (M'96) received the B.E. and M.E. degrees in electrical engineering and the Ph.D. degree in information sciences from Tohoku University, Sendai, Japan, in 1989, 1991, and 2007, respectively.

He joined NEC Corporation, Sagamihara, Japan, in 1991. He was initially engaged in the research and development of high-speed video signal processors, high-speed microprocessors, and low-power multiprocessors. He transferred to NEC Electronics Corporation and Renesas Electronics Corporation in 2008 and 2010, respectively. From 2009 to 2013, he concurrently served as a manager of logic design at the Semiconductor Technology Academic Research Center (STARC). He is currently a section manager at the Incubation Center, Renesas Electronics Corporation, Kawasaki, Japan. His current research interests include low-power, high-speed circuits, reconfigurable circuits, and image recognition processor technologies.

Dr. Nomura is a member of the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan and the IEEE Solid-State Circuits Society. He served as a technical program committee member of the A-SSCC from 2005 through 2009, and the Symposium on VLSI Circuits in 2009 and 2010.



**Makoto Takamiya** (S'98–M'00) received the B.S., M.S., and Ph.D. degrees in electronic engineering from the University of Tokyo, Japan, in 1995, 1997, and 2000, respectively.

In 2000, he joined NEC Corporation, Japan, where he was engaged in the circuit design of high-speed digital LSIs. In 2005, he joined the University of Tokyo, Japan, where he is an Associate Professor with the VLSI Design and Education Center. Since 2013, he has been a visiting scholar with the University of California, Berkeley, CA, USA. His research interests include the design of low-power RF circuits, ultra-low-voltage logic circuits, low-voltage power management circuits, and large-area flexible electronics with organic transistors.

Dr. Takamiya is a member of the technical program committee for the IEEE Symposium on VLSI Circuits. He received the 2009 and 2010 IEEE Paul Rappaport Awards and the Best Paper Award at the 2013 IEEE Wireless Power Transfer Conference.



**Takayasu Sakurai** (S'77–M'78–SM'01–F'03) received the Ph.D. degree in electrical engineering from the University of Tokyo, Japan, in 1981.

In 1981, he joined Toshiba Corporation, where he designed CMOS DRAM, SRAM, RISC processors, DSPs, and SoC solutions. He has worked extensively on interconnect delay and capacitance modeling, known as the Sakurai model, and on the alpha-power-law MOS model. From 1988 through 1990, he was a visiting researcher at the University of California, Berkeley, CA, USA, where he conducted research in the field of VLSI CAD. Since 1996, he has been a Professor at the University of Tokyo, working on low-power high-speed VLSI, memory design, interconnects, ubiquitous electronics, organic ICs, and large-area electronics. He has published more than 400 technical publications, including 100 invited presentations, and several books, and has filed more than 200 patents.

Dr. Sakurai has been an executive committee chair for the VLSI Symposia and a steering committee chair for IEEE A-SSCC since 2010. He served as a conference chair for the Symposium on VLSI Circuits and ICICDT, a vice chair for ASPDAC, a TPC chair for the A-SSCC and the VLSI Symposium, an executive committee member for ISLPED, and a program committee member for ISSCC, CICC, A-SSCC, DAC, ESSCIRC, ICCAD, ISLPED, and other international conferences. He was a recipient of the 2010 IEEE Donald O. Pederson Award in Solid-State Circuits, the 2009 and 2010 IEEE Paul Rappaport Awards, the 2010 IEICE Electronics Society Award, the 2009 IEICE Achievement Award, the 2005 IEEE ICICDT Award, the 2004 IEEE Takuo Sugano Award, the 2005 P&I Patent of the Year Award, and four product awards. He has given keynote speeches at more than 50 conferences, including ISSCC, ESSCIRC, and ISLPED. He was an elected AdCom member for the IEEE Solid-State Circuits Society and an IEEE CAS and SSCS distinguished lecturer. He is an IEICE Fellow and IEEE Fellow.