BRIEF PAPER Special Section on Analog Circuits and Related SoC Integration Technologies

# Power Supply Voltage Dependence of Within-Die Delay Variation of Regular Manual Layout and Irregular Place-and-Route Layout

Tadashi YASUFUKU<sup>†a)</sup>, Member, Yasumi NAKAMURA<sup>†</sup>, Zhe PIAO<sup>†</sup>, Nonmembers, Makoto TAKAMIYA<sup>†</sup>, Member, and Takayasu SAKURAI<sup>†</sup>, Fellow

key words: within-die delay variation, design methodology, low voltage

## 1. Introduction

Reduction of power supply voltage ($V_{DD}$) is an effective method for achieving ultra-low-power logic circuits, and maximum energy efficiency is achieved at low $V_{DD}$ (e.g., 340 mV [1]). Thus, many works have been carried out on the low-$V_{DD}$ operation of logic circuits [1]–[4]. Transistor variation is, however, more pronounced at low $V_{DD}$ [2]. This makes the design of logic circuits more difficult at low $V_{DD}$, because transistor variation makes it harder for logic gate paths to meet timing constraints. Therefore, this paper discusses the $V_{DD}$ dependence of within-die delay variation for several types of devices under test (DUT's).

Many works on within-die delay variation have been carried out, such as [5]. The impact of the physical layout methodology on within-die delay variation, however, is not clear. The layout of logic circuits is usually generated by place and route (P&R) tools. In this case, the interconnect capacitance is larger than in a manual layout. The effect of the auto P&R layout on delay variation at low voltages is not clear. Thus, this paper mainly discusses the dependence of DUT delay on the physical layout methodology.

Section 2 presents an all digital delay measurement circuit and explains its operation. Section 3 shows the experimental results. The dependence of delay variation on several types of DUT's is discussed, and the manual layout and the auto P&R layout are compared. Section 4 concludes this paper.

DOI: 10.1587/transele.E94.C.1072

## 2. All Digital Delay Measurement Circuit

Figure 1(a) shows a single unit of the all digital delay measurement circuit. DUT delay is measured by sweeping the clock frequency. The left part of the schematic diagram is the $V_{DD}$ region (0.4 V to 1.2 V), and the right part is the $V_{DDH}$ region (1.2 V). DUT's are placed in the $V_{DD}$ region. The delay of one DUT is measured in this unit. The error memory and the scan chain for reading results are placed in the $V_{DDH}$ region.

Figure 1(b) shows a timing chart of the delay measurement circuit. All F/F's are reset before starting a measurement. When the DUT delay is shorter than the clock period (T), no error occurs. On the other hand, when the DUT delay is longer than T, an error is detected and the Error Out signal changes from low to high. The Error Out signal is converted to the $V_{DDH}$ level by the level shifter and is stored in an error memory. The output of this error memory can be read using the scan chain. In this way, DUT delay is measured digitally. Figure 2 shows the chip micrograph. The die size



**Fig. 1** Schematic and timing chart of proposed tester-friendly all digital delay measurement circuit. (a) Schematic diagram. (b) Timing chart.

Manuscript received January 21, 2011.

<sup>†</sup>The authors are with The University of Tokyo, Tokyo, 153-8505 Japan.

a) E-mail: tdsh@iis.u-tokyo.ac.jp



Fig. 2 Chip micrograph of delay measurement circuit fabricated in 65 nm standard CMOS process.

Table 1 Key features of manual layout and auto place and route layout.

| Feature                                                           | Manual layout | Auto P&R layout |
|-------------------------------------------------------------------|---------------|-----------------|
| Random delay variation due to transistor variations               | Included      | Included        |
| Systematic delay variation due to interconnect length difference  | Not included  | Included        |
| Interconnection delay                                             | Small         | Large           |

is $1500\,\mu\text{m} \times 490\,\mu\text{m}$, and the chip is fabricated in a 65 nm standard CMOS process. The test chip has two types of DUT's and each DUT type consists of 128 delay measurement units of the kind shown in Fig. 1(a). This enables the measurement of within-die delay variation of 128 DUT's.
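The sweep-based measurement described in Section 2 can be sketched in software as follows. This is only an illustrative model, not the on-chip implementation: the function names `measure_delay` and `measure_all`, the picosecond units, and the list of clock periods are assumptions introduced here. The sketch treats an error as occurring when the DUT delay exceeds the clock period T and reports the shortest error-free period as the measured delay, so the resolution is set by the sweep step.

```python
def measure_delay(dut_delay_ps, clock_periods_ps):
    """Model of one delay measurement unit (hypothetical helper).

    Sweeps the clock period from long to short and returns the shortest
    period at which no error is flagged, or None if every period fails.
    """
    shortest_passing = None
    for period in sorted(clock_periods_ps, reverse=True):
        error_out = dut_delay_ps > period  # error when the delay exceeds T
        if error_out:
            break  # every shorter period would fail as well
        shortest_passing = period
    return shortest_passing


def measure_all(dut_delays_ps, clock_periods_ps):
    """Apply the same sweep to every unit, e.g. the 128 units of one DUT type."""
    return [measure_delay(d, clock_periods_ps) for d in dut_delays_ps]
```

In this model the true delay lies between the returned period and the next shorter period in the sweep, so a finer sweep step gives a tighter estimate.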

Various types of DUT's are implemented to survey the dependence of DUT delay on gate type, gate width, and layout methodology. Regarding the gate type, a ×1 size inverter and a ×1 size NAND are implemented. Regarding the gate width, a ×0.5 size inverter and a ×1 size inverter are implemented. Regarding the layout methodology, the layout of the ×0.5 size inverter is designed using an auto P&R tool and compared with the manual layout. Each DUT includes 100 stages. Assuming the random variations of the individual stages are independent, they average out along a path, so the measured relative within-die delay variation of a DUT is $1/\sqrt{100}$ of the relative delay variation of a single stage.
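The $1/\sqrt{100}$ averaging effect of a 100-stage path can be checked with a small Monte Carlo sketch. It assumes that the stage delays are independent and Gaussian, which is a simplification introduced here; the function name `path_delay_stats` and all numerical values are hypothetical.

```python
import random
import statistics


def path_delay_stats(n_paths, n_stages, stage_mu, stage_sigma, seed=0):
    """Return (mu, sigma, sigma/mu) of the total delay over n_paths paths.

    Each path delay is the sum of n_stages independent Gaussian stage
    delays (an assumption made for illustration only).
    """
    rng = random.Random(seed)
    delays = [
        sum(rng.gauss(stage_mu, stage_sigma) for _ in range(n_stages))
        for _ in range(n_paths)
    ]
    mu = statistics.mean(delays)
    sigma = statistics.stdev(delays)
    return mu, sigma, sigma / mu
```

With a per-stage relative variation of 10% (for example `stage_mu=10.0`, `stage_sigma=1.0`), the relative variation of a 100-stage path should come out close to 1%, up to sampling error that shrinks as `n_paths` grows.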

In the case of the manual layout, 128 paths of 100-stage inverters are arranged in a line and the layout of each path is completely regular. In contrast, in the auto P&R layout, $100 \times 128$ inverters are entangled and the layout of each path is irregular. Table 1 compares key features of the manual layout and the auto P&R layout. In the manual layout, the interconnect delay is small and there is no interconnect length difference. On the other hand, the auto P&R layout has a larger interconnect delay and the interconnect length of each path is different.

## 3. Experimental Results

Figure 3(a) shows the measured histogram of within-die DUT delay at $V_{DD}$=1.2 V. The auto P&R ×0.5 inverter has the largest variation of DUT delay because of its large interconnect variation. Figure 3(b) shows the measured histogram of DUT delay at $V_{DD}$=0.4 V. A comparison of Figs. 3(a) and (b) shows that both the DUT delay and its variation at 0.4 V are larger than those at 1.2 V. At 0.4 V, the difference of delay variation between the manual layout and the auto P&R layout is reduced compared with that at 1.2 V.

Figure 4 shows the dependence of the average delay of 128 DUT's on $V_{DD}$. The average DUT delay ($\mu$) increases as $V_{DD}$ is lowered. The delay of the manual layout



Fig. 3 Measured histogram of within-die DUT delay. (a)  $V_{DD}$ =1.2 V. (b)  $V_{DD}$ =0.4 V.



Fig. 4 Dependence of average DUT delay on $V_{DD}$.

inverter is always smaller than that of the auto P&R layout. The ×1 inverter is the fastest due to its high current drivability. Figure 5 shows the measured dependence of standard deviation ( $\sigma$ ) of 128 DUT's on V<sub>DD</sub>. Although the difference of  $\sigma$  between auto P&R layout and other gates is large at high V<sub>DD</sub>, the difference reduces as V<sub>DD</sub> is reduced. In order to fairly compare delay variation, relative delay variation ( $\sigma/\mu$ ) is introduced. Figure 6 shows the measured dependence of relative delay variation ( $\sigma/\mu$ ) on V<sub>DD</sub>. This shows



Fig. 5 Measured dependence of standard deviation ($\sigma$) of DUT delay on $V_{DD}$.



Fig. 6 Measured dependence of relative delay variation on $V_{DD}$.

that the difference of relative delay variation between the manual layout and the auto P&R layout decreases from 1.56% to 0.07% when V<sub>DD</sub> is reduced from 1.2 V to 0.4 V. This effect is explained by the following equations. For the manual layout, the average delay ( $\mu_M$ ), standard deviation ( $\sigma_M$ ) and relative delay variation ( $\sigma_M/\mu_M$ ) are expressed as Eqs. (1)–(3),

$$\mu_M = R_T \cdot C_T \tag{1}$$

$$\sigma_M = R_T \cdot \Delta C_T + \Delta R_T \cdot C_T \tag{2}$$

$$\frac{\sigma_M}{\mu_M} = \frac{\Delta C_T}{C_T} + \frac{\Delta R_T}{R_T} \tag{3}$$

where $R_T$ is the resistance of the transistor, $\Delta R_T$ is the standard deviation of $R_T$, $C_T$ is the transistor capacitance, and $\Delta C_T$ is the standard deviation of $C_T$. The resistance and capacitance of the interconnect are ignored in Eq. (1). For the auto P&R layout, the average delay ( $\mu_A$ ), standard deviation ( $\sigma_A$ ) and relative delay variation ( $\sigma_A/\mu_A$ ) are expressed as Eqs. (4)–(6),

$$\mu_A = R_T \cdot (C_T + C_I) \tag{4}$$

$$\sigma_A = R_T \cdot (\Delta C_T + \Delta C_I) + \Delta R_T \cdot (C_T + C_I) \tag{5}$$

$$\frac{\sigma_A}{\mu_A} = \frac{\Delta C_T + \Delta C_I}{C_T + C_I} + \frac{\Delta R_T}{R_T} \tag{6}$$

where $C_I$ is the capacitance of the interconnect and $\Delta C_I$ is the standard deviation of $C_I$. The resistance of the interconnect is ignored in Eq. (4). In Eqs. (3) and (6), the $\Delta R_T/R_T$ term dominates $\sigma_M/\mu_M$ and $\sigma_A/\mu_A$ at low V<sub>DD</sub>, because the transistor variation is larger than the interconnect length variation at low V<sub>DD</sub>. In this way, the difference of relative delay variation becomes small in the low V<sub>DD</sub> region.

Fig. 7 Measured spatial within-die DUT delay distributions normalized by the average DUT delay at 1.2 V and 0.4 V. (a) Manual layout. (b) Auto place and route layout.
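The argument based on Eqs. (3) and (6) can be illustrated numerically. The sketch below evaluates both expressions for a small and a large $\Delta R_T/R_T$; every numerical value and the function name `relative_variation` are hypothetical and are not taken from the measurements. The optional `quadrature` flag combines the two terms as a root sum of squares, which is an additional assumption not made in Eqs. (3) and (6).

```python
import math


def relative_variation(dR_over_R, dC_T, C_T, dC_I=0.0, C_I=0.0, quadrature=False):
    """sigma/mu following Eq. (3) (dC_I = C_I = 0) or Eq. (6).

    quadrature=True replaces the linear sum of the two terms by a root
    sum of squares (an assumption beyond the equations in the text).
    """
    cap_term = (dC_T + dC_I) / (C_T + C_I)
    if quadrature:
        return math.hypot(cap_term, dR_over_R)
    return cap_term + dR_over_R


# Hypothetical normalized values, chosen only for illustration.
C_T, dC_T, C_I, dC_I = 1.0, 0.01, 0.5, 0.03
R_VAR_HIGH_VDD, R_VAR_LOW_VDD = 0.02, 0.10


def manual_and_auto(r_var, quadrature=False):
    """Return (sigma_M/mu_M, sigma_A/mu_A) for a given dR_T/R_T."""
    manual = relative_variation(r_var, dC_T, C_T, quadrature=quadrature)
    auto = relative_variation(r_var, dC_T, C_T, dC_I, C_I, quadrature=quadrature)
    return manual, auto
```

With the linear sums of Eqs. (3) and (6), the ratio of $\sigma_A/\mu_A$ to $\sigma_M/\mu_M$ moves toward 1 as $\Delta R_T/R_T$ grows, while the absolute gap is set by the capacitance terms alone. If the terms are instead combined in quadrature, the absolute gap also shrinks.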

In Fig. 6, the NAND has the smallest $\sigma/\mu$ because its total gate size is the largest among the four types of DUT's. More specifically, the NAND consists of 4 transistors, whereas the other gates have 2 transistors. The results also show that $\sigma/\mu$ of the ×1 inverter is smaller than that of the ×0.5 manual layout inverter, because transistor variation is proportional to $1/\sqrt{LW}$ [6].
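The $1/\sqrt{LW}$ scaling from [6] can be written as a one-line helper. The matching coefficient and the transistor dimensions used below are placeholders chosen for illustration, not parameters of the 65 nm process used in this work, and the name `sigma_vt` is hypothetical.

```python
import math


def sigma_vt(a_vt_mv_um, width_um, length_um):
    """Threshold-voltage standard deviation in mV, A_VT / sqrt(W * L)."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)


# Placeholder values: halving the gate width at a fixed gate length.
A_VT = 3.0     # mV*um, assumed for illustration
LENGTH = 0.06  # um, assumed for illustration
ratio = sigma_vt(A_VT, 0.5, LENGTH) / sigma_vt(A_VT, 1.0, LENGTH)
```

Halving the width at a fixed length raises the threshold-voltage variation by a factor of $\sqrt{2}$, independent of the assumed coefficient, which is consistent with the ×0.5 inverter showing a larger $\sigma/\mu$ than the ×1 inverter.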

The spatial distribution of delay variation for the manual layout and the auto P&R layout is investigated. Figure 7 shows the measured spatial distribution of within-die DUT delay normalized by the average delay. This figure also shows the measured histogram of normalized DUT delay. Figure 7(a) shows the distribution of the manual layout inverter and Fig. 7(b) shows the distribution of the auto P&R layout inverter. In Fig. 7(a), DUT delay has only random variations caused by the random transistor variations. The magnitude of the DUT delay variation at 0.4 V is larger than that at 1.2 V. Figure 7(b) shows that the center location has a larger delay than both ends. This is caused by the interconnect length difference in the auto P&R layout. In order to check the randomness of the spatial delay distribution at 0.4 V in Fig. 7(a), the correlation coefficient between the measured delays of the 128 DUT's of two dies at 0.4 V is calculated. The correlation coefficient is 0.023. Therefore, the measured



**Fig. 8** Measured and simulated spatial within-die DUT delay distributions of auto P&R layout at 1.2 V.

variation of manual layout inverter is mainly caused by the random transistor variations.
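The randomness check based on the die-to-die correlation coefficient can be sketched as follows. The text does not state which correlation estimator was used, so the Pearson coefficient is assumed here. The data are synthetic, and the names `pearson` and `synthetic_die` are hypothetical helpers rather than part of the measurement flow.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def synthetic_die(profile, sigma, rng):
    """Normalized delays of one die: a shared spatial profile plus
    independent Gaussian noise per DUT (synthetic data)."""
    return [p + rng.gauss(0.0, sigma) for p in profile]
```

When two dies share no systematic spatial profile, the coefficient between their 128 delays stays near zero, as for the measured value of 0.023. A shared profile, such as a longer delay at the center, pushes the coefficient toward 1.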

Figure 8 shows the measured and simulated spatial within-die DUT delay distributions at 1.2 V. DUT is the auto P&R layout inverter. The simulated results are obtained by Monte Carlo SPICE simulation after LPE. In this simulation the threshold voltage of each transistor is changed according to standard deviation calculated from  $A_{VT}$  of this technology. The spatial delay distribution is similar between the measured and simulated results. This affirms that the longer delay seen at the center location is caused by the interconnect length difference.

## 4. Conclusion

The $V_{DD}$ dependence of within-die delay variation on layout methodology (manual layout and auto P&R layout) is investigated and measured for the first time. The difference in relative delay variation ($\sigma/\mu$) between the manual layout and the auto P&R layout at 1.2 V is large (1.56%), because the delay variation due to interconnect length difference is larger than the random delay variation due to random transistor variations. In contrast, the difference at 0.4 V is very small (0.07%), because the random delay variation due to random transistor variations increases as $V_{DD}$ is reduced and dominates the total delay variation.

## Acknowledgments

This work was carried out as a part of the next-generation circuit architecture technical development program supported by METI and STARC.

## References

- [1] A. Agarwal, S.K. Mathew, S.K. Hsu, M.A. Anders, H. Kaul, F. Sheikh, R. Ramanarayanan, S. Srinivasan, R. Krishnamurthy, and S. Borkar, "A 320 mV-to-1.2 V on-die fine-grained reconfigurable fabric for DSP/media accelerators in 32 nm CMOS," IEEE International Solid-State Circuits Conference, pp.328–329, Feb. 2010.
- [2] H. Kaul, M. Anders, S. Mathew, S. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "A 320 mV 56 μW 411GOPS/Watt ultra-low voltage motion estimation accelerator in 65 nm CMOS," IEEE International Solid-State Circuits Conference, pp.316–317, Feb. 2008.
- [3] Y. Pu, J.P. de Gyvez, H. Corporaal, and H. Yajun, "An ultra-low-energy/frame multi-standard JPEG co-processor in 65 nm CMOS with sub/near-threshold power supply," IEEE International Solid-State Circuits Conference, pp.146–147, Feb. 2009.
- [4] H. Kaul, M.A. Anders, S.K. Mathew, S.K. Hsu, A. Agarwal, R. Krishnamurthy, and S. Borkar, "A 300 mV 494GOPS/W reconfigurable dual-supply 4-Way SIMD vector processing accelerator in 45 nm CMOS," IEEE International Solid-State Circuits Conference, pp.260–261, Feb. 2009.
- [5] B.P. Das, B. Amrutur, H.S. Jamadagni, N.V. Arvind, and V. Visvanathan, "Within-die gate delay variability measurement using reconfigurable ring oscillator," IEEE Trans. Semicond. Manuf., vol.22, no.2, pp.256–267, May 2009.
- [6] M.P.M. Pelgrom, A.C.J. Duinmaijer, and A.P.G. Welbers, "Matching properties of MOS transistors," IEEE J. Solid-State Circuits, vol.24, no.5, pp.1433–1440, 1989.