# A Charge-Domain Auto- and Cross-Correlation Based Data Synchronization Scheme With Power- and Area-Efficient PLL for Impulse Radio UWB Receiver

Lechang Liu, Takayasu Sakurai, Fellow, IEEE, and Makoto Takamiya, Member, IEEE

*Abstract*—A 1.2 V 100 Mb/s 2.93 mW discrete-time charge-domain impulse radio ultra-wideband (IR-UWB) receiver is developed in 65 nm CMOS. In the charge-domain, the template of the correlator is represented by capacitance and the correlation is implemented by charge addition instead of voltage multiplication and voltage integration. Power consumption of the charge-domain receiver is minimized by a novel auto- and cross-correlation based synchronization scheme. To reduce the power consumption and the chip area of the PLL clock generator for the receiver, a dual charge-pump PLL is proposed to scale up the capacitance of the loop filter without extra charge-pump current. The developed UWB receiver with the area- and power-efficient PLL achieves the energy consumption of 29.3 pJ/bit with the 62.5-ps timing step for data synchronization.

*Index Terms*—Ultra-wideband, UWB, charge-domain, auto-correlation, cross-correlation, phase-locked loop, PLL, delay-locked loop, DLL, capacitance multiplier.

## I. INTRODUCTION

Impulse radio ultra-wideband (IR-UWB) refers to a radio technology that transmits information by means of extremely short pulses without radio-frequency modulation. The targets of an IR-UWB system are low power, low cost, high data rate, and extremely low interference, which make it an attractive option for ad hoc and sensor networks in which groups of wireless terminals are located in a limited area and communicate in an infrastructure-free fashion without any central coordinating unit or base station.

Fig. 1 shows the simulated UWB environments with narrowband interference and band-limited white noise in the frequency domain and the time domain, respectively. UWB technology differs from conventional narrowband wireless transmission: instead of broadcasting on separate frequencies, UWB spreads signals across a very wide range of frequencies. The typical sinusoidal radio wave is replaced by trains of pulses at hundreds of millions of pulses per second. The wide bandwidth and very low power make UWB transmissions appear as background noise.

Fig. 1. Simulated UWB environments with narrowband interference and white noise. (a) Frequency domain. (b) Time domain.

The authors are with the Institute of Industrial Science, University of Tokyo, Tokyo 153-8505, Japan (e-mail: llch@iis.u-tokyo.ac.jp).

Digital Object Identifier 10.1109/JSSC.2011.2128210

Receivers for IR-UWB can be broadly categorized as threshold or leading edge detectors (LED) [1], [2], correlation detectors [3]–[6], and RAKE receivers. A RAKE receiver is a bank of correlation detectors. In an environment with significant multipath and low SNR, the RAKE receiver is popular because it can combine signal components that have propagated through the channel along different paths. However, in a short-distance, high-SNR environment, a simple threshold detector or a single correlation detector can also be used.

Manuscript received October 20, 2010; revised January 12, 2011; accepted February 21, 2011. Date of publication April 25, 2011; date of current version May 25, 2011. This paper was approved by Associate Editor Hooman Darabi. This work was supported in part by CREST/JST. The VLSI chips were fabricated in the chip fabrication program of VLSI Design and Education Center (VDEC), University of Tokyo, in collaboration with e-Shuttle, Inc. and Fujitsu Ltd.



Fig. 2. Conventional IR-UWB receivers. (a) Threshold detector. (b) Correlation detector. (c) BER vs. $E_{\rm b}/N_0$.

Fig. 2(a) is the basic block diagram of a LED receiver [1], [2]. The LED receiver sets a threshold at the receiver, and any incoming pulse that crosses the threshold is detected and demodulated. The threshold receiver consists of a circuit fast enough to detect the presence of the IR-UWB pulse, and some type of pulse stretcher circuit that outputs a pulse which is long enough for slower logic or analog circuits to process. The LED receiver is advantageous in that it is simple to implement and can be used for the case where only one pulse per data bit is transmitted. The main disadvantage of the threshold receiver is that any interfering signals will either be sufficiently strong to cross the threshold and trigger numerous false detections or will cause the receiver to increase its threshold and reduce its range. To mitigate the problem of false detections, the receiver must continuously monitor the input noise signal and adaptively set a threshold such that only a small percentage of false detections will occur.

The correlation-based receiver is also known as a matched filter receiver and has been used in narrowband communication systems for several decades. A block diagram of a correlation receiver is shown in Fig. 2(b) [5], [6]. In the correlation-based receiver the received signal is first preprocessed by a correlator.



Fig. 3. Simulated BER dependency on timing accuracy.

The correlator consists of a template generator, a mixer and an integrator. The incoming pulse is multiplied by the template waveform and integrated by the integrator. The output from the correlator is a function of how well the incoming waveform matches the template waveform in time and shape. The correlation receiver is a matched filter system, and as such, it can provide the optimum detection SNR if the template waveform exactly matches the time and shape of the incoming waveform.

Fig. 2(c) shows the simulated bit error rate (BER) comparison between the threshold detector and the correlation detector. When the magnitude of the interference or the noise is much larger than that of the received UWB pulse, the threshold-detection-based receiver cannot operate at all. In contrast, the correlation-based UWB receiver can still attain a superior BER owing to the correlation between the received signal and the template of the correlator.

However, correlation-based receivers attain superior noise performance and robust narrowband interference suppression at the expense of increased power consumption and increased circuit complexity over their threshold detection-based counterparts. To achieve the correlation operation, the template of the correlator must be synchronized with the received signal. Conventionally, data synchronization is implemented with tapped delay-lines [5] or digital baseband [7]. To simplify the synchronization circuits and reduce the power consumption of the receiver, a 1.28 mW 100 Mb/s UWB receiver with discrete-time charge-domain correlator and embedded sliding scheme for data synchronization is developed in [3]. In the discrete-time receiver, there will always be a timing mismatch between the incoming signal and the template of the correlator even if the synchronization is achieved. Fig. 3 shows the simulated BER dependency on the timing mismatch. When the timing mismatch is reduced from 500 ps to 62.5 ps,  $E_{\rm b}/N_0$  can be improved by 7.8 dB for the white noise and 5.8 dB for the narrowband interference.

Fig. 4. (a) Proposed IR-UWB receiver with charge-domain auto- and cross-correlators for data synchronization. (b) Dependence of auto-correlator output on timing accuracy ( $\Delta T$ ) without antenna effect. (c) Dependence of cross-correlator output on timing accuracy ( $\Delta T$ ) without antenna effect.

To achieve a 62.5 ps timing mismatch, the conventional topology [3] requires a 16-GHz PLL clock generator and high-resolution comparators, so the receiver power consumption increases by eight times and the synchronization time increases from 19 periods to 159 periods. This work aims to minimize the power consumption with a 2-GHz 8-phase PLL clock generator by means of a novel auto- and cross-correlation based synchronization scheme [4]. To further reduce the power consumption of the capacitance multiplier used in the conventional area-efficient PLL, a dual charge-pump PLL is proposed to scale up the capacitance of the loop filter without extra charge-pump current.
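The clock and synchronization figures quoted above follow from simple arithmetic, sketched here in Python. The helper name `timing_budget` is hypothetical, and reading "19 periods" and "159 periods" as the number of timing positions in one 10 ns bit period minus one is an interpretation of the text, not something the paper states explicitly.

```python
def timing_budget(bit_period_s, step_s):
    """Return (single-phase clock frequency for this step, worst-case slides).

    Assumption: a sliding search visits every timing position in one bit
    period, so the worst case is (positions - 1) slides.
    """
    clock_hz = 1.0 / step_s
    worst_case_slides = round(bit_period_s / step_s) - 1
    return clock_hz, worst_case_slides


BIT_PERIOD_S = 10e-9  # 100 Mb/s data rate

coarse = timing_budget(BIT_PERIOD_S, 500e-12)   # 0.5 ns timing step
fine = timing_budget(BIT_PERIOD_S, 62.5e-12)    # 62.5 ps step from one clock

# A 2 GHz clock with 8 evenly spaced phases gives the same 62.5 ps resolution.
multiphase_step_s = (1.0 / 2e9) / 8

print(f"500 ps step : {coarse[0] / 1e9:.0f} GHz clock, {coarse[1]} slides")
print(f"62.5 ps step: {fine[0] / 1e9:.0f} GHz clock, {fine[1]} slides")
print(f"2 GHz, 8 phases: {multiphase_step_s * 1e12:.1f} ps per phase")
```

Under these assumptions, the single-phase approach needs a clock eight times faster and about eight times as many slides, whereas the multi-phase clock reaches the same step at 2 GHz.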

The overview of the proposed synchronization scheme is described in Section II. Section III presents the details of the circuit implementation. Experimental results are presented in Section IV, and Section V concludes the paper.

## II. SYNCHRONIZATION SCHEME

The architecture of the proposed IR-UWB receiver with charge-domain auto- and cross-correlators for data synchronization is shown in Fig. 4(a). It consists of one PLL for multi-phase clock generation and two coupled delay-locked loops (DLLs) for coarse and fine timing alignment. Timing discrimination in the coarse and fine timing alignment loops is implemented by auto-correlation and cross-correlation, respectively. Fig. 4(b) and (c) show the dependence of the correlator outputs on the timing mismatch $\Delta T$ between the incoming signal and the templates of the correlators for a Gaussian first-order derivative pulse without antenna effect. Synchronization is achieved in two steps. When the timing mismatch $\Delta T$ is larger than 0.5 ns, the sliding-scheme-based coarse timing alignment loop is activated and keeps reducing $\Delta T$ towards 0.5 ns. Once the timing mismatch reaches 0.5 ns, the phase-switching-based fine timing alignment loop is activated and reduces $\Delta T$ to 62.5 ps.

Fig. 5. Dependence of correlator output on timing accuracy $(\Delta T)$ with antenna effect.

In the upper coarse timing alignment loop, the incoming signal is correlated with a stored replica of the signal shape. The unidirectional sliding scheme keeps sliding the phase of the sampling clock in steps of 0.5 ns until the auto-correlation result $V_{\rm cor1}$ is higher than the threshold of the comparators. In the lower fine timing alignment loop, the incoming signal is correlated against a stored replica of the derivative of the signal pulse shape. The cross-correlation result is odd-symmetric about $\Delta T = 0$, and therefore this loop can keep switching the multi-phase output of the PLL bidirectionally until an accurate match between the incoming pulse and the template is achieved. The correlation result from the upper coarse timing alignment arm contains the data-sign information. As shown in Fig. 4(c), when the timing mismatch is less than 0.5 ns, the cross-correlation result is monotonically increasing for data "1" and monotonically decreasing for data "0". To remove the random data-sign information, the two arm outputs are multiplied, and the multiplication result provides a signal containing the timing error information. This is then treated in the same manner as the error voltage in an ordinary DLL. When the local timing exactly matches that of the incoming signal, the cross-correlation output $V_{\rm cor2}$ is equal to $V_{\rm DD}/2$. When there is a timing error, $V_{\rm cor2}$ deviates from $V_{\rm DD}/2$ and the deviation determines the direction of the phase switching.

Fig. 6. (a) Conventional continuous-time voltage correlator. (b) Discrete-time voltage domain correlator. (c) Proposed discrete-time charge domain correlator.

The UWB antenna is dispersive and is often approximated as a differentiator at both the transmitter and the receiver side. A Gaussian first-order derivative pulse is assumed as the received signal in Fig. 4(b) and (c), but after the signal passes through the antenna it becomes a second-order derivative pulse, and the local templates for auto- and cross-correlation should be changed to the second-order and third-order derivatives, respectively. Fig. 5 shows the dependence of the auto- and cross-correlator outputs on the timing mismatch $\Delta T$ for a Gaussian second-order derivative pulse. Compared with the first-order derivative case, the monotonic interval of the second-order derivative cross-correlator output is reduced, so the comparator threshold $V_{\rm TH0}$ for the auto-correlator output decision should be increased correspondingly. For still higher-order derivative Gaussian pulses, such as the third or fourth order, the monotonic interval of the cross-correlator output is further reduced to less than 0.5 ns. The proposed synchronization scheme can adapt to higher-order applications by increasing the PLL output frequency and the sampling rate of the correlators.
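The even and odd symmetries that the two loops rely on can be checked numerically. The sketch below, in plain Python, correlates a Gaussian first-order derivative pulse with a copy of itself (auto-correlation) and with its derivative (cross-correlation). The pulse width parameter `SIGMA_NS`, the integration grid and all function names are assumptions chosen for illustration; the antenna effect and the seven-point discrete templates of the actual circuit are not modeled.

```python
import math

SIGMA_NS = 0.5   # assumed Gaussian width parameter in ns (not from the paper)
DT_NS = 0.01     # integration grid step in ns
SPAN_NS = 4.0    # integrate over [-SPAN_NS, +SPAN_NS]


def gauss_d1(t):
    """First-order derivative of a Gaussian (odd in t)."""
    return -t / SIGMA_NS**2 * math.exp(-t * t / (2 * SIGMA_NS**2))


def gauss_d2(t):
    """Second-order derivative of a Gaussian, i.e. d/dt of gauss_d1 (even in t)."""
    return (t * t / SIGMA_NS**4 - 1 / SIGMA_NS**2) * math.exp(-t * t / (2 * SIGMA_NS**2))


def correlate(template, delta_t, data_sign):
    """Integrate data_sign * gauss_d1(t - delta_t) * template(t) over the grid."""
    n = int(round(2 * SPAN_NS / DT_NS))
    total = 0.0
    for i in range(n + 1):
        t = -SPAN_NS + i * DT_NS
        total += data_sign * gauss_d1(t - delta_t) * template(t)
    return total * DT_NS


def auto_corr(delta_t, data_sign=+1):
    """Coarse-loop discriminator: template is the pulse shape itself."""
    return correlate(gauss_d1, delta_t, data_sign)


def cross_corr(delta_t, data_sign=+1):
    """Fine-loop discriminator: template is the derivative of the pulse shape."""
    return correlate(gauss_d2, delta_t, data_sign)


def timing_error(delta_t, data_sign=+1):
    """Product of the two arms; the BPSK data sign cancels out."""
    return auto_corr(delta_t, data_sign) * cross_corr(delta_t, data_sign)


for dt in (-0.2, 0.0, 0.2):
    print(f"dT={dt:+.1f} ns  auto={auto_corr(dt):+.3f}  cross={cross_corr(dt):+.3f}")
```

In this model the auto-correlation peaks at zero mismatch and is even in $\Delta T$, the cross-correlation vanishes at zero mismatch and changes sign with $\Delta T$, and `timing_error` returns the same value for either data sign, which is the role of the multiplier described above.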

## III. CIRCUIT IMPLEMENTATION

### A. Auto-Correlator and Cross-Correlator

In the conventional IR-UWB receiver, correlators are designed in the continuous-time voltage domain [8]. As shown in Fig. 6(a), both the incoming signal $V_s(t)$ and the template $V_t(t)$ of the correlator are represented by voltages, and the correlation operation is implemented by voltage multiplication and voltage integration. Fig. 6(b) is the corresponding discrete-time version of the voltage-domain correlator. The incoming signal $V_s[i]$ and the template $V_t[i]$ of the correlator are still represented by voltages, and therefore the voltage multiplier and the voltage adder still consume a lot of power. Fig. 6(c) shows the proposed discrete-time charge-domain correlator. In the charge domain, the incoming signal and the template are represented by the voltage $V_s[i]$ and the capacitance $C_t[i]$, respectively. The correlation of the incoming signal $V_s[i]$ and the template $C_t[i]$ is performed by charge combination instead of voltage multiplication and voltage integration. In this work, charge combination is implemented by capacitance addition and subtraction, which consumes only dynamic power, thereby achieving lower power consumption than the conventional voltage-domain correlator, which consumes both static and dynamic power [8]. Another advantage of the proposed charge-domain correlator is the reconfigurability of the discrete template. The derivative operation in the cross-correlator can be implemented by simply varying the values of the template capacitances and reconfiguring the addition and subtraction circuits.

Fig. 7. Gaussian first-order derivative pulse and discrete templates for auto- and cross-correlation.

The incoming binary phase shift keying (BPSK) signal and the discrete templates for auto- and cross-correlation are shown in Fig. 7. The received 100 Mb/s, DC-960 MHz band Gaussian first-order derivative pulse is sampled at 2 GSa/s and correlated with the discrete templates of the two correlators. Since the template of the auto-correlator is odd-symmetric and the pulse width is 4 ns, only six sampling points are non-zero, and six capacitance values $C_{11}, C_{12}, C_{13}, -C_{13}, -C_{12}$ and $-C_{11}$ can be used to represent the discrete template for the auto-correlation. The template of the cross-correlator is even-symmetric, and seven capacitance values $C_{21}, C_{22}, -C_{23}, -C_{24}, -C_{23}, C_{22}$ and $C_{21}$ are used to represent the discrete template for the cross-correlation.

Fig. 8. Discrete-time charge-domain auto-correlator. (a) Sampling mode. (b) Summing and averaging mode.

Figs. 8 and 9 show the circuit schematics of the auto- and cross-correlators, respectively, each in sampling mode and in summing and averaging mode. An addition-based charge-domain technique has previously been introduced in the design of discrete-time mixers and filters [9]–[11]. In this work, a charge-domain technique based on both addition and subtraction is applied to the design of a discrete-time correlator. As shown in Figs. 8(a) and 9(a), the sampling operation is implemented by turning on the switches controlled by clocks $\phi_1, \phi_2 \dots \phi_7$ sequentially, and the sampled results are stored on the capacitors as $V_{11}, V_{12} \dots V_{17}$ and $V_{21}, V_{22} \dots V_{27}$, respectively. In Figs. 8(b) and 9(b), the stored voltages are weighted, summed and averaged by turning on all the $\phi_r$ controlled switches. Subtraction is implemented by exchanging the two plate connections of the capacitors. After summing and averaging, the correlator output is reset to $V_{\rm DD}/2$ by clock $\phi_{\rm r}$. The proposed architecture thereby achieves the following correlation operations:

$$V_{\rm cor1} = \frac{V_{\rm DD}}{2} + \frac{\sum_{i=1}^{3} C_{1i} V_{1i} - \sum_{i=5}^{7} C_{1(8-i)} V_{1i}}{\sum_{i=1}^{3} 2C_{1i}} \tag{1}$$

$$V_{\rm cor2} = \frac{V_{\rm DD}}{2} + \frac{\sum_{i=1}^{2} C_{2i}V_{2i} - \sum_{i=3}^{4} C_{2i}V_{2i} - C_{23}V_{25} + \sum_{i=6}^{7} C_{2(8-i)}V_{2i}}{C_{24} + \sum_{i=1}^{3} 2C_{2i}} \tag{2}$$

where  $V_{cor1}$  is the auto-correlator output and  $V_{cor2}$  is the cross-correlator output.
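As a minimal sketch, (1) and (2) can be evaluated directly in Python. The capacitor ratios and sample voltages below are made-up illustration values, not the design values of the chip, and the function names `v_cor1` and `v_cor2` are hypothetical.

```python
VDD = 1.2  # supply voltage in volts


def v_cor1(v, c, vdd=VDD):
    """Eq. (1). v = [V11..V17] sampled voltages, c = [C11, C12, C13]."""
    plus = sum(c[i] * v[i] for i in range(3))           # samples 1..3
    minus = sum(c[6 - i] * v[i] for i in range(4, 7))   # samples 5..7 use C1(8-i)
    return vdd / 2 + (plus - minus) / (2 * sum(c))


def v_cor2(v, c, vdd=VDD):
    """Eq. (2). v = [V21..V27] sampled voltages, c = [C21, C22, C23, C24]."""
    numerator = (
        c[0] * v[0] + c[1] * v[1]      # + C21*V21 + C22*V22
        - c[2] * v[2] - c[3] * v[3]    # - C23*V23 - C24*V24
        - c[2] * v[4]                  # - C23*V25
        + c[1] * v[5] + c[0] * v[6]    # + C22*V26 + C21*V27
    )
    denominator = c[3] + 2 * (c[0] + c[1] + c[2])
    return vdd / 2 + numerator / denominator


# Hypothetical capacitor ratios and an odd-symmetric, perfectly aligned pulse.
c1 = [1.0, 3.0, 2.0]
c2 = [1.0, 2.0, 2.0, 4.0]
aligned = [0.01, 0.03, 0.02, 0.0, -0.02, -0.03, -0.01]

print(f"V_cor1 = {v_cor1(aligned, c1):.4f} V")
print(f"V_cor2 = {v_cor2(aligned, c2):.4f} V")
```

With an odd-symmetric input the even-symmetric cross-correlation template cancels term by term, so `v_cor2` sits at $V_{\rm DD}/2$, while `v_cor1` moves away from $V_{\rm DD}/2$ with a sign that follows the data.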

If the antenna effect is considered, the incoming Gaussian first-order derivative pulse becomes a second-order derivative pulse after the antenna. As shown in Fig. 10, the proposed charge-domain correlator can be adapted to the second-order derivative pulse by simply reconfiguring the addition and subtraction building blocks.

Fig. 9. Discrete-time charge-domain cross-correlator. (a) Sampling mode. (b) Summing and averaging mode.

### B. Synchronization Control Units

The operating principle of the sliding scheme for coarse timing alignment is shown in Fig. 11(a). Before synchronization is achieved, the incoming signal lags the template of the correlator by $\Delta t$. The sliding scheme reduces the timing error $\Delta t$ towards zero in steps of 0.5 ns. The coarse timing alignment can be finished within 19 slides because the period of the input signal is 10 ns and differs from the period of the sampling clock by 0.5 ns. Conventionally, the control unit for the sliding scheme is implemented with tapped delay lines [5], [6]. However, the delay lines not only consume additional power but also suffer from uncertainty due to process, voltage, and temperature variations across large dies. In the proposed discrete-time receiver, the sliding control unit can be embedded seamlessly in the charge-domain correlator by utilizing the 2 GHz clock of the sampling correlator.
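The 19-slide bound can be reproduced with a small behavioral model in Python. This is a sketch under simplifying assumptions: the comparator decision on the auto-correlation output is replaced by the condition "residual lag below one step", times are integer picoseconds, and the name `coarse_align` is hypothetical.

```python
def coarse_align(initial_lag_ps, bit_period_ps=10_000, step_ps=500):
    """Count 0.5 ns slides until the residual lag is below one step.

    Each slide models one removed 2 GHz clock cycle, i.e. one sampling
    period of 10.5 ns instead of 10 ns, which shifts the template by 0.5 ns.
    """
    lag = initial_lag_ps % bit_period_ps
    slides = 0
    while lag >= step_ps:
        lag -= step_ps
        slides += 1
    return slides, lag


# Sweep initial lags across one bit period in 25 ps increments.
worst_case = max(coarse_align(lag)[0] for lag in range(0, 10_000, 25))
print("worst-case slides:", worst_case)
print("lag of 9.9 ns ->", coarse_align(9_900))
```

The residual lag returned by the model is what the fine timing alignment loop would then have to remove.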

As shown in Fig. 11(b), the control unit is implemented by preprocessing the clock signal CLK. The gated inverter that drives the divide-by-five frequency divider is transparent to "0" but not transparent to "1". When the timing mismatch $\Delta T$ between the incoming signal and the template of the correlator is larger than 0.5 ns, the auto-correlation result $V_{\rm cor1}$ is lower than the threshold of the following comparators. In this case, neither UP nor DN is generated and the clock removal signal RM is generated. This clock removal pulse blocks the propagation of the input clock for one period. With the one-period delay, the phase of the sampling clock slides by 0.5 ns and the period of the sampling clock is increased from 10 ns to 10.5 ns. When the timing mismatch $\Delta T$ is less than 0.5 ns, the auto-correlation result $V_{\rm cor1}$ is large enough to be detected by the comparators. In this case, either UP or DN is generated and RM is not generated, which keeps the period of the sampling clock at 10 ns. The 20-phase sampling clock is generated by combining the output edges of the divide-by-5 divider and the divide-by-4 divider.

The threshold voltages chosen for correlation detection are critical for the receiver BER performance. The threshold voltage for auto-correlation should be no less than $V_{\rm TH0}$ of Fig. 4(b) in order to pull the cross-correlator into its monotonically increasing or decreasing interval. Ad hoc and sensor network applications differ in nature from mobile communication, where the communication nodes are always moving and the receiver must continuously monitor the input signal SNR and adaptively set the threshold for data decision. Sensor-based networks primarily rely on the ubiquitous placement of tiny fixed nodes to report on the physical world; thus the received signal strength and SNR of each node are almost fixed, and a fixed threshold voltage can be used for each node.

Fig. 10. (a) Gaussian second order derivative pulse and discrete templates for auto- and cross-correlation. (b) Auto-correlator. (c) Cross-correlator.

When the sliding scheme finishes the coarse timing alignment, the auto-correlator output can be used for data decision. The data randomness is removed by multiplying the decision outputs (OUT, $\overline{\rm OUT}$) with the outputs of the phase-switching comparators (SP, SN) of Fig. 12. The right-shifting control signal SR and the left-shifting control signal SL are further divided into SRE and SLE for even clock cycles and SRO and SLO for odd clock cycles to control the bidirectional shift register.





Fig. 11. Sliding scheme for coarse timing alignment. (a) Operation principle. (b) Circuit Implementation.



Fig. 12. Phase-switching control unit for fine timing alignment.

With each shifting control signal, the phase of the sampling clock varies by 62.5 ps. Fig. 13 shows the timing chart of the whole synchronization process. In the worst case the data synchronization can be finished within nineteen phase slides and four phase switches. After the synchronization is achieved, the sampling clock period returns to 10 ns.

Fig. 13. Timing chart of sliding scheme for coarse timing alignment and phase switching for fine timing alignment.

Fig. 14. Proposed circulating bidirectional shift-register. (a) Operation principle. (b) Circuit Implementation.

Fig. 14 shows the circuit schematic of the proposed circulating bidirectional shift register. In the conventional DLL [12], the delay line has a fixed start and stop, and the loop is initialized at the middle of the delay line. When the loop is shifted to one end, it cannot be shifted further and will be locked to that end if a shifting-control signal in the same direction is sent again. In the proposed phase-switching-based timing alignment loop, the sequence of the eight phases of the PLL output is relative, and there is no absolute definition of a start phase and a stop phase. To avoid deadlock when the loop comes to the end, a circulating bidirectional shift register is proposed in Fig. 14(b). The circulating shift register is initialized by setting the NAND-gate inputs to "0" and the NOR-gate inputs to "1".
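The wrap-around behavior can be illustrated with a one-hot ring model in Python. This is a behavioral sketch only: it does not reproduce the NAND/NOR gate-level circuit of Fig. 14(b), the class and method names are hypothetical, and the mapping of "right" to a later phase is an assumption.

```python
PHASE_STEP_PS = 62.5  # spacing between adjacent PLL phases (2 GHz, 8 phases)


class CirculatingShiftRegister:
    """One-hot pointer that selects one of n PLL phases and wraps at both ends."""

    def __init__(self, n_phases=8, start=0):
        self.state = [0] * n_phases
        self.state[start] = 1

    def shift_right(self):
        # The last bit re-enters at the front, so phase n-1 is followed by phase 0.
        self.state = [self.state[-1]] + self.state[:-1]

    def shift_left(self):
        # The first bit re-enters at the back, so phase 0 is followed by phase n-1.
        self.state = self.state[1:] + [self.state[0]]

    def selected_phase(self):
        return self.state.index(1)


reg = CirculatingShiftRegister(start=7)
reg.shift_right()
after_wrap_up = reg.selected_phase()
reg.shift_left()
after_wrap_down = reg.selected_phase()
print(after_wrap_up, after_wrap_down)
```

Because the pointer never saturates, a run of same-direction shift commands keeps moving the sampling phase in 62.5 ps steps instead of locking at an end of the register.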

### C. Capacitance Multiplier

Fig. 15. Dual charge-pump phase-locked loops. (a) Conventional topology with extra charge-pump current. (b) Proposed topology without extra charge-pump current.

A multi-phase PLL is a critical component for sampling clock generation in the proposed UWB receiver. The loop filter of the PLL is a barrier to fully integrating the receiver because of its large integrating capacitor. To reduce the chip area of the PLL, a capacitance multiplier was proposed in [13] to scale up the on-chip capacitor. The capacitance multiplier can be simplified to the dual charge-pump architecture of Fig. 15(a), noting that the capacitance $C_p$ often remains below $C_z$ by roughly a factor of 15, and thus the current flowing through $C_p$ can be neglected [14]. The transfer function from the PFD to the filter can be approximated as

$$\frac{v_c}{\Delta\phi}(s) = \frac{I_p R_z}{2\pi} + \frac{1}{2\pi} \cdot \left[ I_p - \left(1 - \frac{1}{k}\right) I_p \right] \times \frac{1}{s \cdot C_z/k} = \frac{I_p}{2\pi} \left( R_z + \frac{1}{s \cdot C_z} \right). \tag{3}$$

This result is the same as the transfer function of a normal PLL with loop filter capacitance $C_z$. However, the capacitance multiplier reduces the chip area at the cost of an extra $(1 - 1/k)I_p$ of charge-pump current, where $k$ is the capacitance scaling factor and $I_p$ is the normal charge-pump current. This extra charge-pump current can be eliminated by reversing the order of the resistor and the capacitor in the filter and biasing the filter to $V_{\rm DD}/2$. As shown in Fig. 15(b), the total current consumption of the two charge pumps is reduced to the normal value $I_p$ while the capacitance is still multiplied by $k$. If the current flowing through $C_p$ is neglected, the transfer function from the PFD to the filter can be expressed as

$$\frac{v_c}{\Delta\phi}(s) = \left[\frac{I_p}{k} + \left(1 - \frac{1}{k}\right)I_p\right]\frac{R_z}{2\pi} + \frac{1}{2\pi}\cdot\frac{I_p}{k}\frac{1}{s\cdot C_z/k} = \frac{I_p}{2\pi}\left(R_z + \frac{1}{s\cdot C_z}\right). \tag{4}$$



Fig. 16. (a) Simulated waveforms with multipath, antenna and LNA effects. (b) BER dependency on LNA noise figure with and without multipath effects.

The above result is also the same as the transfer function of a normal PLL with loop filter capacitance $C_z$. In this work, a 28 pF capacitance is scaled up to 224 pF ($k = 8$), and the total power consumption of the receiver is reduced by 28% with the novel dual charge-pump PLL.
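The equivalence stated by (3) and (4) can be checked numerically with Python complex arithmetic. In the sketch below, $C_z = 224$ pF and $k = 8$ come from the text, while the charge-pump current `IP`, the resistor `RZ`, the evaluation frequency and all function names are assumed values for illustration only.

```python
import math

IP = 10e-6      # assumed charge-pump current in A (not from the paper)
RZ = 10e3       # assumed loop-filter resistor in ohms (not from the paper)
CZ = 224e-12    # effective loop-filter capacitance in F
K = 8           # capacitance scaling factor, so the physical capacitor is CZ / K


def h_normal(s):
    """Right-hand side of (3) and (4): ordinary PLL with capacitance CZ."""
    return IP / (2 * math.pi) * (RZ + 1 / (s * CZ))


def h_conventional(s):
    """Left-hand side of (3): capacitance multiplier with extra current."""
    cap_current = IP - (1 - 1 / K) * IP
    return IP * RZ / (2 * math.pi) + cap_current / (2 * math.pi) / (s * CZ / K)


def h_proposed(s):
    """Left-hand side of (4): dual charge pump without extra current."""
    res_current = IP / K + (1 - 1 / K) * IP
    return res_current * RZ / (2 * math.pi) + (IP / K) / (2 * math.pi) / (s * CZ / K)


s = 2j * math.pi * 1e6  # evaluate at 1 MHz (arbitrary test frequency)

current_conventional = IP + (1 - 1 / K) * IP   # normal current plus the extra term
current_proposed = IP / K + (1 - 1 / K) * IP   # the two pumps sum to IP

print(abs(h_conventional(s) - h_normal(s)), abs(h_proposed(s) - h_normal(s)))
print(current_conventional / IP, current_proposed / IP)
```

Both topologies give the same transfer function as the ordinary filter, but with $k = 8$ the conventional multiplier draws 1.875 times the normal charge-pump current while the proposed arrangement draws the normal value.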

### D. Front-End Low-Noise Amplifier

The sensitivity, linearity and noise figure of the receiver are determined by the front-end low-noise amplifier (LNA). For the short-distance ad hoc and sensor network applications targeted in this work, the multipath signal component is well below the direct signal. Fig. 16(a) shows the simulated waveforms with multipath, antenna and LNA effects. The BER dependency on the LNA noise figure at -65 dBm input, with and without multipath effects, is shown in Fig. 16(b). When the noise figure is below 15 dB, the receiver BER is lower than $10^{-4}$.

To reduce the power consumption of the LNA, the current-reuse technique can be used to halve the current of the front-end stage by stacking NMOS and PMOS transistors as amplifying devices [15], [16]. As shown in Fig. 17(a), this current can be further reduced to 1/3 by overlapping the NMOS and PMOS pairs. Fig. 17(b) shows the complete circuit schematic of the proposed LNA with an output buffer for measurement. A shunt-feedback topology is employed in the first stage to prevent amplitude distortion and provide a 50-$\Omega$ input impedance over a wide bandwidth. The simulated IIP3 and noise figure are -10 dBm and 12 dB, respectively, from 200 MHz to 960 MHz. The shunt-feedback topology also eliminates the need for a reference voltage generator in the proposed LNA. The width of the overlapped transistors in the first self-biased stage is half that of the top and bottom transistors, so the DC voltages at the overlapped points O<sub>1</sub> and O<sub>2</sub> are $V_{\rm DD}/3$ and $2V_{\rm DD}/3$, which can be used as bias voltages for the second class-AB stage.

Fig. 17. Proposed low-power LNA. (a) Operation principle. (b) Circuit implementation.

## IV. MEASUREMENT RESULTS

The proposed UWB receiver was designed and fabricated in a 1.2 V 65 nm CMOS process. The chip micrograph and layout are shown in Fig. 18. The core area and the capacitor area for the PLL are 9000 $\mu$m<sup>2</sup> and 0.5 mm<sup>2</sup>, respectively.

Loop stability of the dual charge-pump PLL can be verified by measuring the control voltage  $V_{\rm ctrl}$  of the voltage-controlled oscillator (VCO). As shown in Fig. 19, the measured settling time of the PLL with 28 pF 8x capacitance multiplier is the same as the normal PLL with 224 pF capacitance. Fig. 20(a) shows the waveforms of the PLL eight-phase outputs. To achieve 62.5 ps fine timing alignment, the rms jitter of the PLL output should be well below 62.5 ps. Fig. 20(b) shows the measured jitter histogram where the rms jitter is 1.33 ps and peak-to-peak jitter is 15.76 ps.

To verify the coarse-tuning and the fine-tuning time steps, the coarse timing alignment loop with auto-correlation and the fine timing alignment loop with cross-correlation are measured separately. Fig. 21(a) shows the measured waveforms of the coarse timing alignment loop, where $V_{\rm in}$ is the incoming UWB pulse, REF is the reference frequency of the multi-phase PLL, $\phi_5$ is the fifth sampling clock and OUT is the output data. As indicated in Fig. 13, when the sliding scheme for coarse timing alignment is activated, the period of the sampling clock is increased to 10.5 ns. Fig. 21(b) shows the measured waveforms of the fine timing alignment loop, where SP and SN correspond to the shifting control signals of the bidirectional shift register in Fig. 12. With each left or right shifting control signal, the sampling clock period is increased or decreased by 62.5 ps from 10 ns. The measured waveforms when synchronization is achieved are shown in Fig. 21(c). Since both the sliding scheme for coarse tuning and the phase switching for fine tuning are deactivated, the period of the sampling clock returns to 10 ns.

Fig. 18. Chip micrograph and layout.

Fig. 19. Measured VCO control voltage for PLL frequency jump.

Fig. 20. Measurement results. (a) Waveforms of the PLL 8-phase outputs. (b) Jitter histogram.

Fig. 21. Measurement results. (a) Sliding scheme for coarse timing alignment. (b) Phase switching for fine timing alignment. (c) Measured waveforms in synchronization.

TABLE I. Performance Comparisons

|                   |                      | ISSCC'06<br>[7] | ISSCC'06<br>[22] | ISSCC'09<br>[23] | This<br>Work |
|-------------------|----------------------|-----------------|------------------|------------------|--------------|
| Technology (CMOS) |                      | 0.18µm          | 0.18µm           | 0.13µm           | 65nm         |
| Supply Voltage    |                      | 1.8V            | 1.8V             | 1.2V             | 1.2V         |
| Data Rate         |                      | 400 Mb/s        | 20 Mb/s          | 40 Mb/s          | 100 Mb/s     |
| Frequency Band    |                      | 3-10GHz         | 3-10GHz          | 0-960MHz         | 0-960MHz     |
| Timing<br>Step    | Coarse               | 1ns             | 1ns              | NA               | 500ps        |
|                   | Fine                 | 100ps           | 60ps             | 800ps            | 62.5ps       |
| Power             | Front-End            | 81mW            | 8.8mW            | 1.8mW            | 1.22mW       |
|                   | Signal<br>Processing |                 | 18.8mW           | 1.5mW            | 1.71mW       |
| Energy per bit    |                      | 203pJ/bit       | 940pJ/bit        | 82.5pJ/bit       | 29.3pJ/bit   |

Fig. 22. Comparison with state-of-the-art correlation-based UWB receivers.

The measured power consumption and voltage gain of the front-end LNA are 1.22 mW and 19 dB at sub-1 GHz. The trade-off between power consumption and synchronization accuracy is shown in Table I. Fig. 22 shows the power and energy comparison with the state-of-the-art correlation-based IR-UWB receivers. The proposed UWB receiver with the area- and power-efficient PLL achieves the lowest energy consumption of 29.3 pJ/bit with the 62.5-ps timing step for data synchronization.
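The energy-per-bit figure is the total power divided by the data rate. A short Python check, using the front-end and signal-processing power entries of Table I for this work and for [23], is sketched below; the function name `energy_per_bit_pj` is hypothetical.

```python
def energy_per_bit_pj(power_mw, data_rate_mbps):
    """Power in mW divided by rate in Mb/s gives nJ/bit; scale to pJ/bit."""
    return power_mw / data_rate_mbps * 1000.0


# Front-end power + signal-processing power, and data rate, from Table I.
this_work = energy_per_bit_pj(1.22 + 1.71, 100)   # 2.93 mW at 100 Mb/s
isscc09 = energy_per_bit_pj(1.8 + 1.5, 40)        # reference [23] at 40 Mb/s

print(f"This work: {this_work:.1f} pJ/bit")
print(f"[23]     : {isscc09:.1f} pJ/bit")
```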

## V. CONCLUSION

A 100 Mb/s, 2.93 mW DC-960 MHz band IR-UWB receiver is developed in 1.2 V 65 nm CMOS. The developed UWB receiver with the discrete-time charge-domain auto- and cross-correlators for data synchronization achieves timing steps of 0.5 ns and 62.5 ps for coarse tuning and fine tuning, respectively. The proposed dual charge-pump PLL reduces the area of the loop filter without extra charge-pump current, and the total power consumption of the receiver is thereby reduced by 28%. The proposed UWB receiver achieves the lowest energy consumption, 29.3 pJ/bit, among state-of-the-art correlation-based IR-UWB receivers.

## REFERENCES

- [1] L. Liu, Y. Miyamoto, Z. Zhou, K. Sakaida, J. Ryu, K. Ishida, M. Takamiya, and T. Sakurai, "A 100 Mbps, 0.41 mW, DC-960 MHz band impulse UWB transceiver in 90 nm CMOS," in *IEEE Symp. VLSI Circuits Dig.*, Honolulu, HI, Jun. 2008, pp. 118–119.
- [2] A. Tamtrakarn, H. Ishikuro, K. Ishida, M. Takamiya, and T. Sakurai, "A 1-V 299 µW flashing UWB transceiver based on double thresholding scheme," in *IEEE Symp. VLSI Circuits Dig.*, Honolulu, HI, Jun. 2006, pp. 202–203.
- [3] L. Liu, T. Sakurai, and M. Takamiya, "A 1.28 mW 100 Mb/s impulse UWB receiver with charge-domain correlator and embedded sliding scheme for data synchronization," in *IEEE Symp. VLSI Circuits Dig.*, Kyoto, Japan, Jun. 2009, pp. 146–147.
- [4] L. Liu, T. Sakurai, and M. Takamiya, "A charge-domain auto- and cross-correlation based IR-UWB receiver with power- and area-efficient PLL for 62.5 ps step data synchronization in 65 nm CMOS," in *IEEE Symp. VLSI Circuits Dig.*, Honolulu, HI, Jun. 2010, pp. 27–28.
- [5] T. Terada, S. Yoshizumi, M. Muqsith, Y. Sanada, and T. Kuroda, "A CMOS ultra-wideband impulse radio transceiver for 1-Mb/s data communications and 2.5-cm range finding," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 891–898, Apr. 2006.
- [6] T. Terada, S. Yoshizumi, Y. Sanada, and T. Kuroda, "A CMOS impulse radio ultra-wideband transceiver for 1 Mb/s data communications and 2.5 cm range findings," in *IEEE Symp. VLSI Circuits Dig.*, Kyoto, Japan, Jun. 2005, pp. 30–33.
- [7] Y. Zheng, Y. Tong, C. W. Ang, Y.-P. Xu, W. G. Yeoh, F. Lin, and R. Singh, "A CMOS carrier-less UWB transceiver for WPAN applications," in 2006 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2006, pp. 378–387.
- [8] J. Ryckaert, M. Verhelst, M. Badaroglu, S. D'Amico, V. De Heyn, C. Desset, P. Nuzzo, B. Van Poucke, P. Wambacq, A. Baschirotto, W. Dehaene, and G. Van der Plas, "A CMOS ultra-wideband receiver for low data-rate communication," *IEEE J. Solid-State Circuits*, vol. 42, no. 11, pp. 2515–2527, Nov. 2007.
- [9] K. Muhammad, D. Leipold, B. Staszewski, Y. C. Ho, C. M. Hung, K. Maggio, C. Fernando, T. Jung, J. Wallberg, J. S. Koh, S. John, I. Deng, O. Moreira, R. Staszewski, R. Katz, and O. Friedman, "A discrete-time Bluetooth receiver in a 0.13 μm digital CMOS process," in 2004 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2004, pp. 268–269.
- [10] R. B. Staszewski, K. Muhammad, D. Leipold, H. Chih-Ming, H. Yo-Chuol, J. L. Wallberg, C. Fernando, K. Maggio, R. Staszewski, T. Jung, K. Jinseok, S. John, D. Irene Yuanying, V. Sarda, O. Moreira-Tamayo, V. Mayega, R. Katz, O. Friedman, O. E. Eliezer, E. de-Obaldia, and P. T. Balsara, "All-digital TX frequency synthesizer and discrete-time receiver for Bluetooth radio in 130-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 39, no. 12, pp. 2278–2291, Dec. 2004.
- [11] A. Yoshizawa and S. Iida, "A gain-boosted discrete-time charge-domain FIR LPF with double-complementary MOS parametric amplifiers," in 2008 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2008, pp. 68–69.
- [12] Y. Okajima, M. Taguchi, M. Yanagawa, K. Nishimura, and O. Hamada, "Digital delay locked loop and design technique for high-speed synchronous interface," *IEICE Trans. Electron.*, vol. E79C, no. 6, pp. 798–807, 1996.
- [13] K. Shu, E. Sanchez-Sinencio, J. Silva-Martinez, and S. H. K. Embabi, "A 2.4-GHz monolithic fractional-N frequency synthesizer with robust phase-switching prescaler and loop capacitance multiplier," *IEEE J. Solid-State Circuits*, vol. 38, no. 6, pp. 866–874, Jun. 2003.
- [14] L. Liu and B. Li, "Fast locking scheme for PLL frequency synthesiser," *Electron. Lett.*, vol. 40, no. 15, pp. 918–920, 2004.
- [15] F. Gatta, E. Sacchi, F. Svelto, P. Vilmercati, and R. Castello, "A 2-dB noise figure 900-MHz differential CMOS LNA," *IEEE J. Solid-State Circuits*, vol. 36, no. 10, pp. 1444–1452, Oct. 2001.
- [16] A. N. Karanicolas, "A 2.7-V 900-MHz CMOS LNA and mixer," *IEEE J. Solid-State Circuits*, vol. 31, no. 12, pp. 1939–1944, Dec. 1996.
- [17] J. R. Bergervoet, K. S. Harish, S. Lee, D. Leenaerts, R. van de Beek, G. van der Weide, and R. Roovers, "A WiMedia-compliant UWB transceiver in 65 nm CMOS," in 2007 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2007, pp. 112–113.
- [18] M. Crepaldi, L. Chen, K. Dronson, J. Fernandes, and P. Kinget, "An ultra-low-power interference-robust IR-UWB transceiver chipset using self-synchronizing OOK modulation," in 2010 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2010, pp. 226–227.
- [19] S. Joo, W.-H. Chen, T.-Y. Choi, M.-K. Oh, J.-H. Park, J.-Y. Kim, and B. Jung, "A fully integrated 802.15.4a IR-UWB transceiver in 0.13 μm CMOS with digital RRC synthesis," in 2010 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2010, pp. 228–229.
- [20] F. S. Lee and A. P. Chandrakasan, "A 2.5 nJ/bit 0.65 V pulsed UWB receiver in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 12, pp. 2851–2859, Dec. 2007.
- [21] I. D. O'Donnell and R. W. Brodersen, "A 2.3 mW baseband impulse-UWB transceiver front-end in CMOS," in *IEEE Symp. VLSI Circuits Dig.*, Honolulu, HI, Jun. 2006, pp. 200–201.
- [22] J. Ryckaert, M. Badaroglu, V. De Heyn, G. Van der Plas, P. Nuzzo, A. Baschirotto, S. D'Amico, C. Desset, H. Suys, M. Libois, B. Van Poucke, P. Wambacq, and B. Gyselinckx, "A 16 mA UWB 3-to-5 GHz 20 Mpulses/s quadrature analog correlation receiver in 0.18 μm CMOS," in 2006 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2006, pp. 368–377.
- [23] M. Verhelst, N. Van Helleputte, G. Gielen, and W. Dehaene, "A reconfigurable, 0.13 μm CMOS 110 pJ/pulse, fully integrated IR-UWB receiver for communication and sub-cm ranging," in 2009 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2009, pp. 250–251.
- [24] Y. Zheng, A. A. M., K.-W. Wong, Y. J. The, S. A. P. H., D. D. Tran, W. G. Yeoh, and D.-L. Kwong, "A 0.18 μm CMOS 802.15.4a UWB transceiver for communication and localization," in 2008 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2008, pp. 118–600.
- [25] Y. J. Zheng, S. X. Diao, C. W. Ang, Y. Gao, F. C. Choong, Z. Chen, X. Liu, Y. S. Wang, X. J. Yuan, and C. H. Heng, "A 0.92/5.3 nJ/b UWB impulse radio SoC for communication and localization," in 2010 IEEE ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2010, pp. 230–231.



**Lechang Liu** received the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, China, in 2006.

From 2007 to 2009, he held a postdoctoral research position in the VLSI Design and Education Center at the University of Tokyo, Japan. Since 2010, he has been a research associate with the Institute of Industrial Science at the University of Tokyo, Japan. He has published over 20 conference and journal papers on wireless communications. His research on proximity communication and ultra-wideband communication has been selected as technical highlights at the 2008 IEEE International Solid-State Circuits Conference (ISSCC) and 2009 IEEE Symposium on VLSI Circuits, respectively. His current research interests include CMOS analog/RF IC design and signal processing for wireless communications.

**Takayasu Sakurai** (S'77–M'78–SM'01–F'03) received the Ph.D. degree in electrical engineering from the University of Tokyo, Japan, in 1981.

In 1981, he joined Toshiba Corporation, where he designed CMOS DRAM, SRAM, RISC processors, DSPs, and SoC solutions. He has worked extensively on interconnect delay and capacitance modeling, known as the Sakurai model, and on the alpha power-law MOS model. From 1988 through 1990, he was a visiting researcher at the University of California, Berkeley, where he conducted research in the field of VLSI CAD. Since 1996, he has been a Professor at the University of Tokyo, working on low-power high-speed VLSI, memory design, interconnects, ubiquitous electronics, organic ICs, and large-area electronics. He has published more than 400 technical publications, including 100 invited presentations and several books, and has filed more than 200 patents.

Prof. Sakurai served as a conference chair for the Symposium on VLSI Circuits and ICICDT, a vice chair for ASPDAC, a TPC chair for the first A-SSCC and the VLSI Symposium, and a program committee member for ISSCC, CICC, A-SSCC, DAC, ESSCIRC, ICCAD, ISLPED, and other international conferences. He has been an executive committee chair for the VLSI Symposia and a steering committee chair for A-SSCC since 2010. He is a recipient of the 2010 IEEE Donald O. Pederson Award in Solid-State Circuits, the 2009 Achievement Award of IEICE, the 2005 IEEE ICICDT award, the 2004 IEEE Takuo Sugano award, the 2005 P&I patent of the year award, and four product awards. He has given keynote speeches at more than 50 conferences, including ISSCC, ESSCIRC, and ISLPED. He was an elected AdCom member for the IEEE Solid-State Circuits Society and an IEEE CAS and SSCS distinguished lecturer. He is a STARC Fellow, IEICE Fellow, and IEEE Fellow.



**Makoto Takamiya** (S'98–M'00) received the B.S., M.S., and Ph.D. degrees in electronic engineering from the University of Tokyo, Japan, in 1995, 1997, and 2000, respectively.

In 2000, he joined NEC Corporation, Japan, where he was engaged in the circuit design of high-speed digital LSIs. In 2005, he joined the University of Tokyo, Japan, where he is an Associate Professor with the VLSI Design and Education Center. His research interests include circuit design of low-power RF circuits, ultra-low-voltage digital circuits, and large-area electronics with organic transistors.

Prof. Takamiya is a member of the technical program committee for the IEEE Symposium on VLSI Circuits and the IEEE Custom Integrated Circuits Conference (CICC).