## 11.2 An On-chip 100GHz-Sampling Rate 8-channel Sampling Oscilloscope with Embedded Sampling Clock Generator

Makoto Takamiya, Masayuki Mizuno, <sup>1</sup>Kazuyuki Nakamura

NEC Corp., Kanagawa, Japan

<sup>1</sup>now with Kyushu Institute of Technology, Fukuoka, Japan

Signal-integrity degradation caused by such factors as supply (di/dt) noise, substrate noise, and crosstalk between interconnects is a critical concern because it can restrict performance advances in LSIs. To maintain signal integrity, on-chip signal waveforms must first be measured accurately to gain insight into the complicated physical phenomena in LSIs. The bandwidth of on-chip signals, however, is too wide for an external oscilloscope alone to measure them reliably, making on-chip embedded measurement circuits, such as sampling oscilloscopes, indispensable. Conventional sampling-type measurement circuits require an off-chip pulse generator for the sampling clock [1,2]. In this case, it is difficult to maintain a high sampling rate, because the sampling rate is degraded by variations in the skew between the sampling clock signal and the signals to be measured. Furthermore, the input range of conventional sampling-type measurement circuits is limited to the span between the ground and supply voltage levels [1]. Such circuits are therefore not suitable for signal-integrity checks, because the signals of interest overshoot the supply-voltage level and undershoot the ground-voltage level. Conventional comparator-based sampling-type measurement circuits are also unsuitable, because they require complicated measurement procedures [2]. An easy-to-use 100GHz-sampling-rate sampling oscilloscope macro addresses these problems. It features 1) embedded small-area phase-interpolated sampling clock generators that achieve a 100GHz sampling rate, and 2) charge-sharing sampling heads that capture waveform overshoots and undershoots, covering a wide range of input-voltage levels.

Figure 11.2.1 shows a block diagram of the sampling oscilloscope macro, which contains eight sampling heads (SH), a sampling clock generator (SCG), and an output buffer. Compared with the SH in Reference [2], the improved SH adds S2, S3, and C2, which provide a wide input range of -0.3V to Vdd+0.3V. C1 samples the measured voltage ($V_{mea}$), C2 reduces the range of $V_{mea}$ to within the input range ($V_{amp}$) of the amplifier in the SH, and C3 holds the measured voltage level. When S1 and S3 are closed and S2 and S4 are open, the voltage on C1 becomes equal to $V_{mea}$, while that on C2 becomes equal to the bias voltage ($V_{bias} \approx Vdd/2$), which is the center value of the input range of the amplifier in the SH. When S1 and S3 are open and S2 and S4 are closed, C1, C2, and C3 share their charges with one another. By adjusting C1, C2, and C3, $V_{amp}$ can be made to lie within the input range of the amplifier in the SH. In this way, the SH is able to handle signals from $-|V_{tn}|$ to $Vdd+|V_{tp}|$. However, a calibration between $V_{mea}$ and the output voltage is necessary, because $V_{mea}$ may be distorted by switching-induced charge injection and by non-linearity of the amplifier.
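The range compression performed by charge sharing can be illustrated with a minimal sketch. It assumes ideal switches and capacitors that all share a common reference node, so that closing S2 and S4 simply places C1, C2, and C3 in parallel. The equal capacitor values and the assumption that C3 starts at $V_{bias}$ are hypothetical and are not taken from the design.

```python
def charge_share(v_mea, v_bias, v_hold, c1, c2, c3):
    """Voltage after C1, C2 and C3 are connected in parallel.

    Ideal charge conservation: total charge divided by total capacitance.
    v_hold is the voltage on C3 before sharing (an assumed initial state).
    """
    total_charge = c1 * v_mea + c2 * v_bias + c3 * v_hold
    return total_charge / (c1 + c2 + c3)


VDD = 1.2           # supply voltage of the 0.13um process in the text
V_BIAS = VDD / 2    # V_bias is approximately Vdd/2, as stated in the text
C1 = C2 = C3 = 1.0  # hypothetical equal capacitances (arbitrary units)

for v_mea in (-0.3, 0.6, VDD + 0.3):
    # Assumption: C3 is initially at V_bias.
    v_amp = charge_share(v_mea, V_BIAS, V_BIAS, C1, C2, C3)
    print(f"V_mea = {v_mea:+.2f} V -> V_amp = {v_amp:.2f} V")
```

Under these assumed values, the 1.8V input span from -0.3V to Vdd+0.3V is compressed by a factor of three and centered on $V_{bias}$. A different capacitor ratio would change the compression factor, which is the adjustment of C1, C2, and C3 that the text refers to.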

Figure 11.2.2 shows the frequency characteristics of the SH. The bandwidth, determined both by the sampling rate ($1/\Delta T$) and by the switches and capacitors in the SH, is 6.4GHz. The additional S2, S3, and C2 in Figure 11.2.1 yield a 2x bandwidth improvement. Figure 11.2.2 also shows the simulated operation of the sampling oscilloscope macro. As may be seen, 1) the sampling-clock cycle time must be $\Delta T$ longer than the input-signal cycle time ($T$), so that the output time scale is expanded by a factor of $T/\Delta T$ relative to the input signal, and 2) the output waveform that can be obtained at one time is only a portion of the entire input-signal waveform. Here, the sampling rate ($1/\Delta T$) must not depend on the input clock frequency. Instead of a PLL design for the SCG, a delay-line design as shown in Figure 11.2.3 is used, since it generates a constant $\Delta T$ irrespective of the input clock frequency, which cannot be done with a PLL design. Further, a PLL design offers less portability and requires a larger area for a loop filter.
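The time-scale expansion described above can be checked numerically with a short sketch of equivalent-time sampling. The sinusoidal test signal and the 2ns input period (a 500MHz input) are assumptions chosen for illustration; only the 10ps step comes from the text.

```python
import math

T_PS = 2000   # assumed input period: 2 ns (500 MHz), for illustration only
DT_PS = 10    # sampling step delta-T from the text


def waveform(t_ps):
    """Hypothetical periodic test signal with period T_PS."""
    return math.sin(2 * math.pi * t_ps / T_PS)


def equivalent_time_samples(n_samples):
    """Take one sample per sampling-clock cycle of length T + delta-T."""
    return [waveform(n * (T_PS + DT_PS)) for n in range(n_samples)]


n_points = T_PS // DT_PS  # samples needed to sweep one input period
samples = equivalent_time_samples(n_points)

# Reference: the same signal sampled directly every delta-T.
direct = [waveform(n * DT_PS) for n in range(n_points)]

max_err = max(abs(a - b) for a, b in zip(samples, direct))
scale = T_PS / DT_PS
print(f"time-scale factor T/dT = {scale:.0f}, max mismatch = {max_err:.2e}")
```

Because the signal is periodic, the n-th sample taken at n(T + ΔT) equals the waveform value at nΔT, so the slow output sequence traces the input with 10ps resolution while each sample is taken one full clock cycle apart.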

The measurements are carried out according to the following procedure: 1) The offset-delay generator (zero to 7ns in 1ns steps) sets an offset delay of 0ns. The fine-delay shifter clips the waveform from zero to 1.27ns (i.e., to 127 times the 10ps delay) in 10ps steps from the SH input waveform, as shown in Figures 11.2.2 and 11.2.3. The portion from 1ns to 1.27ns serves as a tab for joining adjacent segments. 2) The offset-delay generator sets an offset delay of 1ns. The fine-delay shifter clips the waveform from 1ns to 2.27ns from the SH input waveform. The two sequential measurements therefore overlap slightly (by 0.27ns). 3) The eight waveforms (A) through (H) in Figure 11.2.3 can be connected using these tabs, so that eight iterations expand the range of the measured waveform to 8.27ns (i.e., 1.27ns from the fine-delay shifter plus 7ns from the offset-delay generator). Here, the 10ps delay ($\Delta T$) is generated by interpolating two signals with a 160ps delay difference using two cascaded phase interpolators [3,4]. This 10ps delay shift corresponds to a 100GHz sampling rate. While the average $\Delta T$ may deviate slightly from 10ps due to wafer-to-wafer device variations or to environmental changes, this is not a serious concern because the average $\Delta T$ is derived experimentally by performing a calibration.

Figure 11.2.4 shows a micrograph of a chip fabricated using a 1.2V 0.13$\mu$m CMOS process with 6-layer Cu metallization. The test chip contains the sampling oscilloscope macro and a noise source. The noise source contains 4k flip-flops and 84k inverters, and its activation rate can be varied. Each SH probes a clock line, a supply line, or a ground line. To prevent supply noise and substrate noise from affecting the oscilloscope macro itself, its supply and ground lines are separated from those of the noise source, and 910pF of on-chip decoupling capacitance is added. The SH area is 1,550$\mu$m<sup>2</sup>, while the SCG area is 23,600$\mu$m<sup>2</sup>, which is only one-tenth to one-half that of conventional PLLs. The sampling oscilloscope macro consumes 32mW at 500MHz.

Figure 11.2.5 shows the measured calibration function between the SH input voltage and the oscilloscope output voltage. A wide input range is demonstrated. Figure 11.2.6 shows the measured supply voltage noise and ground voltage noise. Both overshoot and undershoot are observed at the clock edges. This waveform is obtained by adjusting the delay of the offset-delay generator so that multiple time-shifted outputs overlap. Figure 11.2.7 shows measured clock signals with and without the decoupling capacitors, showing that the capacitors successfully reduce the noise.
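One way a calibration function such as that of Figure 11.2.5 could be applied is to invert it, mapping each oscilloscope output voltage back to the SH input voltage. The sketch below assumes a monotonic calibration curve and uses piecewise-linear interpolation. The calibration points are hypothetical placeholders, not values read from the figure, and the paper does not state which interpolation method was used.

```python
from bisect import bisect_right

# Hypothetical (input V, output V) calibration pairs, sorted by output.
# These are placeholders for illustration, NOT measured data.
CAL_POINTS = [(-0.3, 0.35), (0.0, 0.45), (0.6, 0.62), (1.2, 0.80), (1.5, 0.88)]


def output_to_input(v_out, points=CAL_POINTS):
    """Invert a monotonic calibration curve by piecewise-linear interpolation."""
    outputs = [out for _, out in points]
    if not outputs[0] <= v_out <= outputs[-1]:
        raise ValueError("output voltage outside the calibrated range")
    # Index of the upper point of the segment containing v_out.
    i = min(bisect_right(outputs, v_out), len(points) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return x0 + (x1 - x0) * (v_out - y0) / (y1 - y0)


for v_out in (0.35, 0.535, 0.88):
    print(f"V_out = {v_out:.3f} V -> V_in = {output_to_input(v_out):+.2f} V")
```

A lookup of this kind would absorb both the charge-sharing attenuation and the amplifier non-linearity, provided the calibration points are dense enough where the curve bends.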

References:

<sup>[1]</sup> M. Nagata, et al., "Reduced Substrate Noise Digital Design for Improving Embedded Analog Performance," ISSCC Digest of Technical Papers, pp.224-225, Feb. 2000.

<sup>[2]</sup> R. Ho, et al., "Application of On-Chip Samplers for Test and Measurement of Integrated Circuits," Dig. of Symp. on VLSI Circuits, pp.138-139, June 1998.

<sup>[3]</sup> K. Yamaguchi, et al., "2.5GHz 4-phase Clock Generator with Scalable and No Feedback Loop Architecture," ISSCC Digest of Technical Papers, pp.398-399, Feb. 2001.

<sup>[4]</sup> B. W. Garlepp, et al., "A Portable Digital DLL for High-Speed CMOS Interface Circuits," IEEE J. of Solid-State Circuits, vol. 34, pp. 632-644, May 1999.







Figure 11.2.1: Sampling oscilloscope macro.



Figure 11.2.2: Frequency characteristics of SH and macro operation.



Figure 11.2.3: Sampling clock generator (SCG).



Figure 11.2.4: Chip micrograph.



Figure 11.2.5: Measured calibration function.



Figure 11.2.6: Measured on-chip supply and ground voltages.



Figure 11.2.7: Measured clock signals with and without decoupling capacitors.