ISSN 0352-9045



Journal of Microelectronics, Electronic Components and Materials **Vol. 51, No. 2(2021), June 2021** 

Revija za mikroelektroniko, elektronske sestavne dele in materiale **letnik 51, številka 2(2021), Junij 2021** 



# Informacije MIDEM 2-2021 Journal of Microelectronics, Electronic Components and Materials

#### VOLUME 51, NO. 2(178), LJUBLJANA, JUNE 2021 | LETNIK 51, NO. 2(178), LJUBLJANA, JUNIJ 2021

Published quarterly (March, June, September, December) by Society for Microelectronics, Electronic Components and Materials - MIDEM. Copyright © 2020. All rights reserved. | Revija izhaja trimesečno (marec, junij, september, december). Izdaja Strokovno društvo za mikroelektroniko, elektronske sestavne dele in materiale – Društvo MIDEM. Copyright © 2020. Vse pravice pridržane.

#### Editor in Chief | Glavni in odgovorni urednik

Marko Topič, University of Ljubljana (UL), Faculty of Electrical Engineering, Slovenia

#### Editor of Electronic Edition | Urednik elektronske izdaje

Kristijan Brecl, UL, Faculty of Electrical Engineering, Slovenia

#### Associate Editors | Odgovorni področni uredniki

Vanja Ambrožič, UL, Faculty of Electrical Engineering, Slovenia Arpad Bürmen, UL, Faculty of Electrical Engineering, Slovenia Danjela Kuščer Hrovatín, Jožef Stefan Institute, Slovenia Matija Pirc, UL, Faculty of Electrical Engineering, Slovenia Franc Smole, UL, Faculty of Electrical Engineering, Slovenia Matjaž Vidmar, UL, Faculty of Electrical Engineering, Slovenia

#### Editorial Board | Uredniški odbor

Mohamed Akil, ESIEE PARIS, France Giuseppe Buia, University of Padova, Italy Gian-Franco Dalla Betta, University of Trento, Italy Martyn Fice, University College London, United Kingdom Ciprian Iliescu, Institute of Bioengineering and Nanotechnology, A\*STAR, Singapore Marc Lethiecq, University of Tours, France Teresa Orlowska-Kowalska, Wroclaw University of Technology, Poland Luca Palmieri, University of Padova, Italy Goran Stojanović, University of Novi Sad, Serbia

#### International Advisory Board | Časopisni svet

Janez Tronteli, UL, Faculty of Electrical Engineering, Slovenia - Chairman Cor Claeys, IMEC, Leuven, Belgium Denis Đonlagić, University of Maribor, Faculty of Elec. Eng. and Computer Science, Slovenia Zvonko Fazarinc, CIS, Stanford University, Stanford, USA Leszek J. Golonka, Technical University Wroclaw, Wroclaw, Poland Jean-Marie Haussonne, EIC-LUŚAC, Octeville, France Barbara Malič, Jožef Stefan Institute, Slovenia Miran Mozetič, Jožef Stefan Institute, Slovenia Stane Pejovnik, UL, Faculty of Chemistry and Chemical Technology, Slovenia Giorgio Pignatel, University of Perugia, Italy Giovanni Soncini, University of Trento, Trento, Italy Iztok Šorli, MIKROIKS d.o.o., Ljubljana, Slovenia Hong Wang, Xi'an Jiaotong University, China

#### Headquarters | Naslov uredništva

Uredništvo Informacije MIDEM MIDEM pri MIKROIKS Stegne 11, 1521 Ljubljana, Slovenia T. +386 (0)1 513 37 68 F. + 386 (0)1 513 37 71 E. info@midem-drustvo.si www.midem-drustvo.si

Annual subscription rate is 160 EUR, separate issue is 40 EUR. MIDEM members and Society sponsors receive current issues for free. Scientific Council for Technical Sciences of Slovenian Research Agency has recognized Informacije MIDEM as scientific Journal for microelectronics, electronic components and materials. Publishing of the Journal is cofinanced by Slovenian Research Agency and by Society sponsors. Scientific and professional papers published in the journal are indexed and abstracted in COBISS and INSPEC databases. The Journal is indexed by ISI® for Sci Search®, Research Alert® and Materials Science Citation Index™. | Letna naročnina je 160 EUR, cena posamezne številke pa 40 EUR. Člani in sponzorji MIDEM prejemajo posamezne številke brezplačno. Znanstveni svet za tehnične vede je podal pozitivno mnenje o reviji kot znanstveno-strokovni reviji za mikroelektroniko, elektronske sestavne dele in materiale. Izdajo revije sofinancirajo ARRS in sponzorji društva. Znanstveno-strokovne prispevke objavljene v Informacijah MIDEM zajemamo v podatkovne baze COBISS in INSPEC. Prispevke iz revije zajema ISI® v naslednje svoje produkte: Sci Search®, Research Alert® in Materials Science Citation Index™

Design | Oblikovanje: Snežana Madić Lešnik; Printed by | tisk: Biro M, Ljubljana; Circulation | Naklada: 1000 issues | izvodov; Slovenia Taxe Percue | Poštnina plačana pri pošti 1102 Ljubljana


# Content | Vsebina

| Original scientific papers                                                                                                                                                                                                                                                        |     | Izvirni znanstveni članki                                                                                                                                                                                                                             |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| M. Konal, F. Kacar:<br>Extended Bandwidth Method on Symmetrical<br>Operational Transconductance<br>Amplifier and Filter Application                                                                                                                                               | 95  | M. Konal, F. Kacar:<br>Metoda razširjene pasovne širine na simetričnem<br>operacijskem ojačevalniku in filtru                                                                                                                                         |
| P. Bhattacharjee, B. K. Bhattacharyya, A. Majumder:<br>Vector Controlled Delay Cell with Nearly Identical<br>Rise/Fall Time for Processor Clock Application                                                                                                                       | 101 | P. Bhattacharjee, B. K. Bhattacharyya, A. Majumder:<br>Vektorsko nadzorovana zakasnilna celica s<br>skoraj enakim časom vzpona/ padca za uporabo<br>procesorske ure                                                                                   |
| K. Orman, Y. Babacan:<br>The Implementation of Logic Gates Using<br>Only Memristor Based Neuristor                                                                                                                                                                                | 113 | K. Orman, Y. Babacan:<br>Uporaba logičnih vrat le z uporabo nevristorja na<br>osnovi memristorja                                                                                                                                                      |
| G. Nanjareddy, V. Mysuru Boregowda,<br>C. Prasanna Raj:<br>Low Power Area Optimum Configurable 160 to<br>2560 Subcarrier Orthogonal Frequency Division<br>Multiplexing Modulator-Demodulator Architecture<br>based on Systolic Array and Distributive<br>Arithmetic Look-Up Table | 119 | G. Nanjareddy, V. Mysuru Boregowda,<br>C. Prasanna Raj:<br>Nastavljiv modulator-demodulator s frekvenčnim<br>multipleksiranjem s 160 do 2560 ortogonalnimi<br>podnosilci, zasnovan na arhitekturi sistoličnih polj s<br>porazdeljeno vpogledno tabelo |
| M. Vidmar:<br>Extending Leeson's Equation                                                                                                                                                                                                                                         | 135 | M. Vidmar:<br>Razširitev Leesonove Enačbe                                                                                                                                                                                                             |
| Announcement and Call for Papers:<br>56 <sup>th</sup> International Conference on Microelectronics,<br>Devices and Materials<br>with the Workshop on Personal Sensor for Remote<br>Health Care Monitoring                                                                         | 147 | Napoved in vabilo k udeležbi:<br>56. Mednarodna konferenca o mikroelektroniki,<br>z delavnico o osebnih senzorjih za oddaljeno<br>spremljanje zdravstvenega stanja                                                                                    |
| Front page:<br>Simulated OEO phase noise. (M. Vidmar)                                                                                                                                                                                                                             |     | Naslovnica:<br>Simuliran OEO fazni šum. (M. Vidmar)                                                                                                                                                                                                   |

https://doi.org/10.33180/InfMIDEM2021.201



Journal of Microelectronics, Electronic Components and Materials Vol. 51, No. 2(2021), 95 – 100

# Extended Bandwidth Method on Symmetrical Operational Transconductance Amplifier and Filter Application

Mustafa Konal<sup>1</sup>, Firat Kacar<sup>2</sup>

<sup>1</sup>Tekirdag Namik Kemal University, Electronics and Telecommunication Engineering Department, Tekirdag, Turkey <sup>2</sup>Istanbul University-Cerrahpasa, Electrical and Electronics Engineering Department, Istanbul, Turkey

**Abstract:** In this paper, a method for extending the bandwidth of a symmetrical operational transconductance amplifier (OTA) circuit is proposed. Resistive compensation technique is applied to all current mirrors of the symmetrical OTA circuit. A passive resistor is connected between the gate and the drain of each primary transistor of the current mirrors in the symmetrical OTA structure. The performance of the proposed OTA with extended transconductance bandwidth is analyzed by implementing filter structures. The advantage of using the resistive compensation technique is demonstrated. The proposed symmetrical OTA and the filters are simulated with LTSPICE by using TSMC 0.18 µm CMOS process parameters.

Keywords: Symmetrical OTA, Extended bandwidth, Filter, Resistive compensation technique

# Metoda razširjene pasovne širine na simetričnem operacijskem ojačevalniku in filtru

**Izvleček:** V članku je predlagana metoda za razširitev pasovne širine vezja simetričnega ojačevalnika (OTA). Tehnika uporovne kompenzacije je uporabljena na vseh zrcalih simetričnega vezja OTA. Pasivni upor je vezan med vrati in ponorom vsakega primarnega tranzistorja trenutnih zrcal v simetrični OTA strukturi. Učinkovitost predlaganega OTA z razširjeno pasovno širino transkonduktance je analizirana z uporabo filtrirnih struktur. Dokazana je prednost uporabe uporovne kompenzacijske tehnike. Predlagani simetrični OTA in filtri so simulirani z LTSPICE v TSMC 0,18 µm CMOS tehnologiji.

Ključne besede: Simetrični OTA, razširjena pasovna širina, filter, tehnika uporovne kompenzacije

\* Corresponding Author's e-mail: mkonal@nku.edu.tr

# 1 Introduction

Operational transconductance amplifiers (OTAs) are significant active elements for continuous-time signal processing applications. In the literature, many OTA-based circuit blocks such as filters [1-6], oscillators [7-9], mem-elements [10-11] and inductance simulators [12-14] have been reported. Several applications operating at high frequencies require integrated circuits with wide bandwidth, and a resistive compensation technique can be used to widen the bandwidth of such circuits [15-17]. A wide bandwidth second-generation current conveyor based four-quadrant mixed mode analogue multiplier is presented in [15]. A conventional low voltage cascode current mirror is analyzed in [16].

In this paper, a symmetrical operational transconductance amplifier with extended transconductance bandwidth is proposed. Resistive compensation technique is applied to the current mirrors of the OTA in order to increase the transconductance bandwidth. The transconductance of the OTA can be adjusted electronically by changing the biasing current and resistive compensation technique can be applied to the circuit with different resistor values. Temperature performance of the OTA is analyzed for different temperatures. In addition, in order to demonstrate the performance of the proposed OTA, it is used in a second order low-pass filter structure. Both symmetrical OTA and filter circuits are analyzed with LTSPICE using 0.18 µm TSMC CMOS process parameters.

# 2 Extended bandwidth symmetrical OTA

Resistive compensation technique is applied to the symmetrical OTA in order to extend its bandwidth. A passive resistor is connected between the gate and drain of each main transistor of the current mirrors in the presented symmetrical OTA. The current mirror circuits without and with resistive compensation are given in Fig. 1a and Fig. 1b.



**Figure 1:** Simple current mirror (a) without resistive compensation (b) with resistive compensation.

Small signal models of the simple current mirrors without and with resistive compensation are given in Fig. 2a and Fig. 2b, respectively.



**Figure 2:** Small-signal model of the current mirrors (a) without resistive compensation (b) with resistive compensation.

Assuming that transistors $M_1$ and $M_2$ are identical ($C_{gs1} = C_{gs2} = C_{gs}$, $r_{o1} = r_{o2} = r_o$, $g_{m1} = g_{m2} = g_m$), the relationship between the input and the output currents of the simple current mirror, whose equivalent circuit is given in Fig. 2a, is given by Eq. (1) [15]. The bandwidth of the circuit depends on the gate-source capacitance $C_{gs}$ and the transconductance $g_m$, as given in Eq. (2) [15].

$$\frac{I_{out}}{I_{in}} = \frac{1}{s \frac{2}{g_m} C_{gs} + 1} \tag{1}$$

$$\omega_0 = \frac{g_m}{2C_{gs}}, \quad f_0 = \frac{g_m}{4\pi C_{gs}} \tag{2}$$

After the resistive compensation technique is applied to the simple current mirror (equivalent circuit given in Fig. 2b), the relationship between the input and the output currents is given by Eq. (3) [15]. By choosing the resistor value as $R = 1/g_m$, the expression for the frequency given in Eq. (4) is obtained. It can be seen from Eq. (4) that the bandwidth of the current mirror with the compensation technique is increased by a factor of two compared to the simple current mirror [15].

$$\frac{I_{out}}{I_{in}} = \frac{sRC_{gs} + 1}{s^2 \frac{R}{g_m} C_{gs}^2 + s \frac{(2r_o + R)}{r_o g_m} C_{gs} + 1} \tag{3}$$

$$\omega_0 = \frac{g_m}{C_{gs}}, \quad f_0 = \frac{g_m}{2\pi C_{gs}} \tag{4}$$
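The factor of two can be checked numerically. With $R = 1/g_m$ and $r_o \gg R$, the denominator of Eq. (3) approaches $(sC_{gs}/g_m + 1)^2$, so the zero cancels one of the two poles and a single pole at $g_m/C_{gs}$ remains. The following Python sketch evaluates Eq. (1) and Eq. (3) directly and compares their -3 dB frequencies. The values of $C_{gs}$ and $r_o$ are illustrative assumptions and are not taken from the paper.

```python
import math

def h_simple(w, gm, cgs):
    """Magnitude of Eq. (1): current gain of the uncompensated mirror."""
    return abs(1 / complex(1, w * 2 * cgs / gm))

def h_comp(w, gm, cgs, ro, r):
    """Magnitude of Eq. (3): current gain of the resistively compensated mirror."""
    s = complex(0, w)
    num = s * r * cgs + 1
    den = s**2 * (r / gm) * cgs**2 + s * (2 * ro + r) / (ro * gm) * cgs + 1
    return abs(num / den)

def bandwidth(h, w_lo=1e3, w_hi=1e15):
    """-3 dB angular frequency by log-scale bisection (assumes |H| falls monotonically)."""
    target = 1 / math.sqrt(2)
    for _ in range(200):
        w_mid = math.sqrt(w_lo * w_hi)
        if h(w_mid) > target:
            w_lo = w_mid
        else:
            w_hi = w_mid
    return math.sqrt(w_lo * w_hi)

def bandwidth_ratio(gm, cgs, ro):
    """Compensated over uncompensated bandwidth for the choice R = 1/gm."""
    r = 1 / gm
    w_simple = bandwidth(lambda w: h_simple(w, gm, cgs))
    w_comp = bandwidth(lambda w: h_comp(w, gm, cgs, ro, r))
    return w_comp / w_simple

# gm is the value reported in the text; cgs and ro are assumed for illustration only.
ratio = bandwidth_ratio(gm=1.02e-3, cgs=10e-15, ro=100e3)
print(f"bandwidth ratio = {ratio:.3f}")
```

With a finite $r_o$ the ratio comes out slightly below 2, and it approaches 2 as $r_o$ grows relative to $R$.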

The symbol and the CMOS realization of a symmetrical OTA with extended transconductance bandwidth are shown in Fig. 3a and Fig. 3b, respectively. The terminal relation of the symmetrical OTA is given by Eq. (5):

$$I_o = g_m (V_i^+ - V_i^-) \tag{5}$$

The supply voltages and biasing current are chosen as $V_{DD} = -V_{SS} = 1.5$ V and $I_B = 400\ \mu$A, respectively. The transconductance of the OTA is calculated as 1.02 mA/V and the passive resistors are taken as $R_{P1} = R_{P2} = R_N = 1/g_m = 980\ \Omega$. The aspect ratios of the transistors are given in Table 1. The frequency dependence of the transconductance of the symmetrical OTA without and with resistive compensation is given in Fig. 4a and the zoomed-in version is shown in Fig. 4b. It can be seen that the bandwidth of $g_m$ is extended by approximately .



**Figure 3:** a) Circuit symbol b) CMOS realization of a symmetrical OTA with extended transconductance bandwidth.



**Figure 4:** (a) Transconductances of the symmetrical OTA without and with resistive compensation (b) zoomed-in version.

The transconductance of the OTA can be adjusted by changing the biasing current $I_B$. For biasing currents of 50 $\mu$A, 100 $\mu$A, 200 $\mu$A and 400 $\mu$A, the transconductance of the symmetrical OTA is 796.9 $\mu$A/V, 925.4 $\mu$A/V, 986.5 $\mu$A/V and 1.02 mA/V, respectively. The dependence of transconductance on the value of compensation resistor is depicted in Fig. 5. In order to make the circuit suitable for analog integrated circuit implementations, passive resistors can be replaced by MOS resistors.
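Since the compensation resistor is chosen as $R = 1/g_m$, each bias setting implies a different matched resistor value. The short Python sketch below tabulates that value for the four simulated bias points quoted above. It assumes the $R = 1/g_m$ rule is applied unchanged at every bias point, which the paper states only for the 400 $\mu$A case.

```python
# Bias currents (A) mapped to the simulated transconductances (S) quoted in the text.
GM_BY_BIAS = {
    50e-6: 796.9e-6,
    100e-6: 925.4e-6,
    200e-6: 986.5e-6,
    400e-6: 1.02e-3,
}

def compensation_resistor(gm):
    """Matched compensation resistor in ohms, assuming R = 1/gm."""
    return 1.0 / gm

resistors = {ib: compensation_resistor(gm) for ib, gm in GM_BY_BIAS.items()}

for ib, r in resistors.items():
    print(f"I_B = {ib * 1e6:5.0f} uA -> R = {r:7.1f} ohm")
```

Under this assumption the 400 $\mu$A bias gives about 980 $\Omega$, the value listed in Table 1, and the 50 $\mu$A bias gives about 1.25 k$\Omega$.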

**Table 1:** Transistor aspect ratios.

| Transistors                                                       | W (µm) | L (µm) |
|-------------------------------------------------------------------|--------|--------|
| M<sub>1</sub>, M<sub>2</sub>                                      | 0.72   | 0.18   |
| M<sub>3</sub>, M<sub>4</sub>, M<sub>5</sub>, M<sub>6</sub>        | 1.8    | 0.18   |
| M<sub>7</sub>, M<sub>8</sub>                                      | 5.4    | 0.18   |

$V_{DD} = -V_{SS} = 1.5$ V, $I_B = 400\ \mu$A, $R_{P1} = R_{P2} = R_N = 980\ \Omega$



**Figure 5:** Transconductance gains of the symmetrical OTA without and with resistive compensation for different biasing currents.

The temperature performance of the symmetrical OTA with resistive compensation is simulated for the resistor values of 1.25 k $\Omega$  and 980  $\Omega$  for various temperatures from 0°C to 100°C as shown in Fig. 6. It can be seen from the figure that the transconductance of the OTA decreases with increasing temperature.



**Figure 6:** Temperature performance of the symmetrical OTA with resistive compensation.

# 3 Filter application of the symmetrical OTA

A filter structure is proposed to demonstrate the advantage of using the resistive compensation technique. The proposed filter based on symmetrical OTA is given in Fig. 7.



**Figure 7:** Filter structure based on symmetrical OTA.

The proposed circuit can simultaneously realize low-pass, high-pass, and band-pass filter functions. Depending on the voltage status of $V_{in1}$, $V_{in2}$, and $V_{in3}$, one of the following three filter functions is realized:

i) $V_{in1} = V_{in}$ and $V_{in2} = V_{in3} = 0$: second order low-pass filter.

ii) $V_{in2} = V_{in}$ and $V_{in1} = V_{in3} = 0$: second order band-pass filter.

iii) $V_{in3} = V_{in}$ and $V_{in1} = V_{in2} = 0$: second order high-pass filter.

Fig. 8 shows the gain-frequency responses of all filter structures designed for a cutoff frequency of 3.25 MHz. The passive capacitor values are chosen as $C_1 = C_2 = 50$ pF. The symmetrical OTAs with resistive compensation given in Fig. 3b are used as active elements.
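The quoted cutoff frequency is consistent with these values if the filter behaves as a standard two-integrator $g_m$-C biquad with equal transconductances, for which $f_0 = g_m / (2\pi\sqrt{C_1 C_2})$. This relation is an assumption made here for illustration; the transfer function of the filter in Fig. 7 is not given in the text. A minimal Python sketch of the calculation:

```python
import math

def gm_c_cutoff(gm, c1, c2):
    """Natural frequency in Hz of an assumed two-integrator gm-C biquad with equal gm."""
    return gm / (2 * math.pi * math.sqrt(c1 * c2))

# gm = 1.02 mA/V and C1 = C2 = 50 pF are the values stated in the text.
f0 = gm_c_cutoff(gm=1.02e-3, c1=50e-12, c2=50e-12)
print(f"f0 = {f0 / 1e6:.2f} MHz")  # about 3.25 MHz
```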



**Figure 8:** Gain-frequency responses of the symmetrical OTA based filter structures.

The gain-frequency responses of the low-pass filter based on the symmetrical OTA with resistive compensation for resistor values of 980 $\Omega$ and 1.25 k$\Omega$ are shown in Fig. 9. As shown in Fig. 9b, the bandwidth of the filter is extended from 3.2 MHz to 3.9 MHz, an improvement of about 18%.



**Figure 9:** (a) Gain-frequency responses of the low-pass filters based on symmetrical OTA with resistive compensation (b) zoomed-in version.

# 4 Conclusion

In this study, a symmetrical OTA structure with extended transconductance bandwidth is proposed. Resistive compensation technique is applied to the current mirrors of the OTA structure. Due to the use of resistive compensation, the bandwidth of the symmetrical OTA is improved. The bandwidth can be increased by matching the resistors to the $g_m$ value, which is adjusted through the biasing current of the OTA. Furthermore, the transconductance value of the OTA with resistive compensation is simulated for various temperatures and the improved performance is demonstrated. Passive resistors can be replaced by MOS resistors in order to make the circuit suitable for analog integration. Additionally, a low-pass filter circuit is realized with the proposed OTA and the gain-frequency response of the filter is analyzed. The bandwidth of the filter is extended by 18%. The simulations of the symmetrical OTA and filter structures are performed with LTSPICE using 0.18 µm TSMC CMOS technology.

# 5 Conflict of interest

We have no conflict of interest to declare.

# 6 References

1. Psychalinos, C., Kasimis, C., & Khateb, F. (2018). Multiple-input single-output universal biquad filter using single output operational transconductance amplifiers. AEU-International Journal of Electronics and Communications, 93, 360-367. https://doi.org/10.1016/j.aeue.2018.06.037
2. Bano, S., Narejo, G. B., & Shah, S. U. A. (2019). Low Voltage Low Power Single Ended Operational Transconductance Amplifier for Low Frequency Applications. Wireless Personal Communications, 106(4), 1875-1884. https://doi.org/10.1007/s11277-018-5726-1
3. Ali, H. K., & Abdaljabar, J. S. (2017). Analysis and Simulation of Active Filters Using Operational Transconductance Amplifier (OTA). European Scientific Journal, 13(15), 170-184. https://doi.org/10.19044/esj.2017.v13n15p170
4. Mathad, R. S. (2014). Low frequency filter design using operational transconductance amplifier. IOSR Journal of Engineering (IOSRJEN), 4(4), 21-28. https://doi.org/10.9790/3021-04462128
5. Rezaei, F., & Azhari, S. J. (2011). Ultra low voltage, high performance operational transconductance amplifier and its application in a tunable Gm-C filter. Microelectronics Journal, 42(6), 827-836. https://doi.org/10.1016/j.mejo.2011.04.012
6. Abuelma'Atti, M. T., & Quddus, A. (1996). Programmable voltage-mode multifunction filter using two current conveyors and one operational transconductance amplifier. Active and passive electronic components, 19(3), 133-138. https://doi.org/10.1155/1996/29750
7. Prommee, P., & Dejhan, K. (2002). An integrable electronic-controlled quadrature sinusoidal oscillator using CMOS operational transconductance amplifier. International Journal of Electronics, 89(5), 365-379. https://doi.org/10.1080/713810385
8. Abuelma'Atti, M. T., & Khan, M. H. (1996). Grounded capacitor oscillators using a single operational transconductance amplifier. Active and passive electronic components, 19, 91-98. https://doi.org/10.1155/1996/17943
9. Senani, R., & Kumar, B. A. (1989). Linearly tunable Wien bridge oscillator realised with operational transconductance amplifiers. Electronics Letters, 25(1), 19-21. https://doi.org/10.1049/el:19890014
10. Babacan, Y. (2018). An Operational Transconductance Amplifier-based Memcapacitor and Meminductor. Electrica, 18(1), 36-38. https://doi.org/10.5152/iujeee.2018.1806
11. Taskiran, Z. G. C., Ayten, U. E., & Sedef, H. (2019). Dual-output operational transconductance amplifier-based electronically controllable memristance simulator circuit. Circuits, Systems, and Signal Processing, 38(1), 26-40. https://doi.org/10.1007/s00034-018-0856-y
12. Koomgaew, C., Petchmaneelumka, W., & Riewruja, V. (2009, August). OTA-based floating inductance simulator. In 2009 ICCAS-SICE (pp. 857-860). IEEE.
13. Jaikla, W., & Siripruchyanan, M. (2006, October). Floating positive and negative inductance simulators based on OTAs. In 2006 International Symposium on Communications and Information Technologies (pp. 344-347). IEEE.
14. Singh, V. (2003). Floating operational transconductance amplifier based grounded impedance. IEE Proceedings-Circuits, Devices and Systems, 150(1), 27-30. https://doi.org/10.1049/ip-cds:20030367
15. Ettaghzouti, T., Hassen, N., Garradhi, K., & Besbes, K. (2018). Wide bandwidth CMOS four-quadrant mixed mode analogue multiplier using a second generation current conveyor circuit. Turkish Journal of Electrical Engineering & Computer Sciences, 26(2), 882-894. https://doi.org/10.3906/elk-1708-179
16. Gupta, M., Singh, U., & Srivastava, R. (2014). Bandwidth extension of high compliance current mirror by using compensation methods. Active and passive electronic components, 2014. https://doi.org/10.1155/2014/274795
17. Voo, T., & Toumazou, C. (1995). High-speed current mirror resistive compensation technique. Electronics Letters, 31(4), 248-250. https://doi.org/10.1049/el:19950207



Copyright © 2021 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 28. 11. 2020 Accepted: 07. 04. 2021

https://doi.org/10.33180/InfMIDEM2021.202



Journal of Microelectronics, Electronic Components and Materials Vol. 51, No. 2(2021), 101 – 112

# Vector Controlled Delay Cell with Nearly Identical Rise/Fall Time for Processor Clock Application

Pritam Bhattacharjee<sup>1</sup>, Bidyut K. Bhattacharyya<sup>2</sup>, Alak Majumder<sup>3</sup>

<sup>1</sup>Department of Computer Science & Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India.

<sup>2</sup>Packaging Research Center at Georgia Institute of Technology, Atlanta, USA. <sup>3</sup>National Institute of Technology (NIT), Department of Electronics & Communication Engineering, Integrated Circuit And System (i-CAS) Laboratory, Arunachal Pradesh, India.

**Abstract:** In the design of modern processor chips, proper clock distribution is a very important aspect which impacts the chip performance. Delay circuits and variable delay cells are the active cells most involved in clock distribution, and they thereby decide the timing slack of all functionalities inside the chip. They support proper input-to-output signal transmission by adjusting variable timing delays and monitor the output signal so that it has equal rise and fall times, which most of the existing delay elements fail to deliver. Therefore, in this article, we propose an input vector based design of a variable delay cell with balanced rise time and fall time for the output signal. We also estimate the delay and output voltage with a mathematical model. The new configuration is implemented on the commercial Cadence Virtuoso<sup>®</sup> platform using a 90 nm technology node, driven by a 1 GHz input signal with a power supply of 1.1 V. The results confirm the desired features of the proposed design under typical conditions and across process corner variations.

Keywords: vector-controlled circuit design; variable delay cell; rise/fall time; processor clock; CMOS process technology

# Vektorsko nadzorovana zakasnilna celica s skoraj enakim časom vzpona/ padca za uporabo procesorske ure

**Izvleček:** Pri zasnovi sodobnih procesorskih čipov je ustrezna razporeditev ure zelo pomemben vidik, ki vpliva na delovanje čipa. Aktivna celica zakasnilnih vezij in celic s spremenljivo zakasnitvijo ima glavno vlogo pri porazdelitvi ure in tako odloča o časovnih zakasnitvah vseh funkcij znotraj čipa. Pomaga pri pravilnem prenosu vhodnega signala s prilagoditvijo spremenljivih časovnih zamikov in nadzoruje izhodni signal, da ima enak čas vzpona / padca, kar večina obstoječih elementov zakasnitve ne dosega. V članku predlagamo zasnovo spremenljive zakasnitve na osnovi vhodnega vektorja z uravnoteženim časom vzpona in padca izhodnega signala. Zakasnitev in izhodno napetost smo ocenili z matematičnim modelom. Nova konfiguracija se izvaja na komercialni platformi Cadence Virtuoso<sup>®</sup> z uporabo 90nm tehnologije s 1 GHz krmilnim signalom in napajanjem 1.1 V. Rezultat izvedbe potrjuje želene značilnosti našega predlaganega načrta v tipičnih in netipičnih pogojih.

Ključne besede: zasnova vektorsko krmiljenega vezja; celica s spremenljivo zakasnitvijo; čas vzpona / padca; procesorska ura; tehnologija CMOS

\* Corresponding Author's e-mail: pritambhattacharjee@am.amrita.edu

# 1 Introduction

Over the past few decades, we have witnessed considerable advancement in consumer electronics such as computers, computer accessories and mobile phones, as well as in their inner components, for example the central processing unit (CPU) and the graphics processing unit (GPU). Semiconductor giants like Intel, AMD and QUALCOMM have successfully brought up the discrete-level integration of CPU and GPU on a single platform [1, 2]. However, as a result of this integration, clock signaling and its efficient routing have become very important in order to maintain proper functioning and performance of each CPU and GPU. The efficacy of clock signaling and transmission to the CPU/GPU is determined by the components involved in clock distribution, as seen in Fig. 1(a). Basically, the clock signal traverses multiple buffer stages (in a tree-like structure, viz. the clock tree) before reaching the dedicated CPU, graphics or PCIe sockets. All these units operate at different frequencies, but they are supposed to function in parallel. Therefore, the timing parameters involved in the signal transmission are always a matter of concern in extracting the best performance out of the end product [3].



**Figure 1:** (a) Typical style of clock distribution inside a processor chip (b) clock tree design.

In fact, it is these buffers which play a crucial role in forming the clock tree for the clock distribution network (CDN) as shown in Fig. 1(b), wherein the CDN is intended to output a synchronizing signal to coordinate the functioning of each circuit block inside the processor chip. The buffers of the clock tree set up the delay for signal transmission along the branches of the tree so that the timing of signals can be balanced at each and every node (or leaf, as indicated in Fig. 1(b)) connected to units like the CPU, graphics or PCIe socket. However, the delay introduced by these buffer cells is of constant value, and most often it is required to use different sized buffers (i.e., the sizes of Buffer\_1 $\neq$ Buffer\_2 $\neq$ Buffer\_3 and so on) such that the clock arrival time across all sequential elements inside the CPU or GPU chip remains synchronized. Nowadays, however, CDN designers are more interested in and dependent on variable delay cells (with proper control to adjust the delay variability), so that the clock trees inside the CDN are more versatile in their functioning. In fact, the use of variable delay cells is quite popular in other CDN components like locked loops (both DLL and PLL), oscillators, frequency multipliers and dividers, and in many other Systems-on-Chip (SoCs). The use of these delay elements in the form of a cluster (i.e., a delay line) is also popular for the construction of SoC time measurement circuits (TMCs) that are installed to measure internal timing parameters of the chip [4-6]. Hence, the design credibility of delay circuits offering fine-tuned delay values indirectly supports the working performance of TMCs. Nevertheless, the circuit design of such delay elements for modern SoCs is difficult because of the trade-offs in their design specifications, and the concern is even higher considering their involvement in the computational aspects of embedded systems [7]. Therefore, many researchers and circuit designers have invested themselves in the development of different delay circuits.

#### 1.1 Background of delay cell design

Although research on delay circuit design has been under way for quite a long time and the literature is extensive, we have focused on basic design structures such as the transmission gate-controlled delay element (Trans-DE) [8,9], the concatenated inverter-controlled delay element (CI-DE) [8, 10] and the current-starved controlled delay element (CS-DE) [8, 10]. Up to now, circuit modifications of delay circuits have revolved around these delay cell primitives, and all of them produce delay by changing the physical dimensions of the devices used in the architecture. Nowadays, however, substantial research is invested in delay cell architectures with fixed dimensions that are capable of generating variable delay values at the output. Such designs were pioneered with the advent of the Vernier Delay Line (VDL) [12, 13], presented in Fig. 2. It has many buffers connected along customised rows and columns. The delay introduced by a buffer is equal to one of two values, $\S_1$ and $\S_2$ ($\S_1 \neq \S_2$). The delay value obtained at a circuit node is given by $\S_{a,b} = \left[(a \times \S_1) + (b \times \S_2)\right] \cdot t$, depending on the input cycle time 't'. The magnitude of the difference between $\S_1$ and $\S_2$ (i.e., $|\S_1-\S_2|$) represents the adjustability of the delay in this design. As the buffers are typically designed using complementary metal-oxide-semiconductor (CMOS) technology, the input gate of every MOS device along the customised rows and columns serves as the knob to tune the delay value, which is not convenient, and the architecture is unnecessarily crowded.
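The node-delay expression of the VDL can be illustrated with a minimal sketch. The function names and the numeric values below are hypothetical and chosen only for illustration; the two buffer delays are treated as fractions of the input cycle time 't', which is one possible reading of the expression above.

```python
def vdl_node_delay(a: int, b: int, d1: float, d2: float, t: float) -> float:
    """Delay at a VDL node reached through `a` buffers of delay d1 and
    `b` buffers of delay d2, scaled by the input cycle time t."""
    return (a * d1 + b * d2) * t


def vdl_adjustability(d1: float, d2: float) -> float:
    """Magnitude of the difference between the two buffer delay values."""
    return abs(d1 - d2)


if __name__ == "__main__":
    # Hypothetical values: d1 and d2 as fractions of a 1000 ps cycle time.
    d1, d2, t_ps = 0.10, 0.12, 1000.0
    for a, b in [(1, 0), (0, 1), (3, 2)]:
        print(f"a={a}, b={b}: {vdl_node_delay(a, b, d1, d2, t_ps):.1f} ps")
    print(f"adjustability: {vdl_adjustability(d1, d2) * t_ps:.1f} ps")
```

Every distinct delay requires a different path through the buffer array, which is the crowding issue noted above.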



**Figure 2:** Design style of Vernier delay line [12].

In [14], the concept of the voltage-controlled delay element (VC-DE) was presented. It has also been the foundation for designing digitally controlled and digitally programmable delay elements (DC-DE/DP-DE). From a design perspective, a DC-DE is not much different from a DP-DE, and both are treated as a sub-class of vector-controlled delay elements. The delay value of a DC-DE or DP-DE changes with the combination of input vectors [15-17]. In a VC-DE, different bias/control voltages are typically employed to obtain variable delay values. However, the design of DC-DE, VC-DE and DP-DE alike centres on controlling the terminal voltages/currents of the MOS devices in the fundamental designs, viz. Trans-DE, CI-DE and sometimes CS-DE. The channel resistance ($R_{ON}$) when the device is ON and the logical gate capacitance ($C_{G'}$), as stated in equations (1) and (2), directly impact the propagation delay ($\tau = R_{ON} \times C_{G'}$) of the delay circuit [18, 19].

$$R_{ON} = \frac{1}{k(V_{GS} - V_{th})} \tag{1}$$

$$C_{G'} = \frac{\Delta Q_G}{V_{dd}} \tag{2}$$

Parameter 'k' comprises device-related terms, V<sub>GS</sub> is the gate-to-source voltage, V<sub>th</sub> is the threshold voltage of the MOS device, V<sub>dd</sub> is the power supply voltage, and $\Delta Q_G$ is the gate charge, which depends on V<sub>GS</sub> [20].
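To make the dependence of the delay on the gate voltage concrete, equations (1) and (2) can be evaluated numerically. This is a minimal sketch: the function names are illustrative, and the values of k, V<sub>th</sub> and the gate charge are assumptions, not parameters taken from the 90nm PDK.

```python
def r_on(k: float, v_gs: float, v_th: float) -> float:
    """Equation (1): ON resistance of the MOS channel in ohms."""
    if v_gs <= v_th:
        raise ValueError("device is OFF: V_GS must exceed V_th")
    return 1.0 / (k * (v_gs - v_th))


def gate_capacitance(delta_q_g: float, v_dd: float) -> float:
    """Equation (2): gate capacitance in farads."""
    return delta_q_g / v_dd


def propagation_delay(k, v_gs, v_th, delta_q_g, v_dd) -> float:
    """tau = R_ON * C_G' in seconds."""
    return r_on(k, v_gs, v_th) * gate_capacitance(delta_q_g, v_dd)


if __name__ == "__main__":
    # Assumed values for illustration only.
    K = 1e-3          # A/V^2, lumped device-related term
    V_TH = 0.3        # V
    DELTA_Q = 2.2e-15 # C, gate charge
    V_DD = 1.1        # V
    for v_gs in (0.78, 0.90, 1.00):
        tau = propagation_delay(K, v_gs, V_TH, DELTA_Q, V_DD)
        print(f"V_GS = {v_gs:.2f} V -> tau = {tau * 1e12:.2f} ps")
```

Under these assumptions a higher gate voltage lowers the ON resistance and therefore shortens the delay, which is the tuning mechanism that voltage-controlled and vector-controlled delay elements exploit.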

The purpose of associating DC/DP techniques with delay circuits is to make the delay cell design robust and stable. How well these techniques can tune the delay determines how they generate variable delay at the output. It is therefore important to understand how well they suit the fundamental delay elements.

#### 1.2 Consequences in the design of delay cell structures

During the literature survey, we concluded that DC/DP-DE implementation is more compatible with the CI-DE and CS-DE delay elements than with the Trans-DE. The reason can be seen in Fig. 3(a), where the n-channel MOS (nMOS) $M_2$ and the p-channel MOS (pMOS) $M_1$ of the transmission gate ($T_G$) are ON most of the time to maintain proper signal integrity from the input ($V_{in}$) to the output (here, node 'P'), which results in significant power dissipation from $V_{dd}$. This calls the appropriateness of the Trans-DE cell design into question.



**Figure 3:** (a) CMOS based Schmitt trigger attached to TG (b) Design style of CI-DE.

The CI-DE, one of the primitive delay element architectures, comprises two back-to-back CMOS inverters, as depicted in Fig. 3(b). Its physical time delay is given by equation (3), where $C_A$ is the capacitance at node 'A' and $V_{out}$ is the voltage change at the output.

$$\tau = \frac{C_A V_{out}(t)}{I} \tag{3}$$

**Table 1:** Design analysis of kinds of circuital attempts in variable delay.

| Circuital Schemes | VDL [12, 13] | Trans-DE [8, 10] | CI-DE [8, 10] | CS-DE [8, 9] |
|-------------------|--------------|------------------|---------------|--------------|
| Tuning approach | | Voltage-controlled | Vector-controlled | Vector-controlled |
| Pros | - Constructed with a series of stable buffer cells.<br>- Delivers different delay at all output taps. | - No issue in the output signal strength.<br>- Small circuit.<br>- Voltage level of 'S' and '$\overline{S}$' helps to generate variable delay. | - Simple CMOS-based design.<br>- Symmetric architecture.<br>- DC/DP technique to generate variable delay. | - Good adjustment of I<sub>ch</sub> and I<sub>dis</sub> (I<sub>ch</sub>≈I<sub>dis</sub>).<br>- Implementation of DC/DP technique. |
| Cons | - Design is crowded with a lot of redundant elements.<br>- The delay value cannot be tuned. | - Adjusting the transistor sizes is difficult due to the impact of the device body effect.<br>- Dependency on the proper generation of 'S' and '$\overline{S}$'. | - Transistor sizes are difficult to adjust to maintain symmetry; the problem is mostly caused by $I_{ch} \neq I_{dis}$. | - Presence of current mirror and bias circuits. |

In this case, 'I' denotes the charging current or the discharging current ($I_{ch}$ and $I_{dis}$, respectively), depending on the input steady-state condition. When these delay elements (specifically the CI-DE) are used in on-chip sections such as the CDN, it is important that the output rise and fall times ($t_{rise}$ and $t_{fall}$, also referred to as rise/fall delay) of the delay element are almost equal. Without near-symmetric rise/fall times, negative consequences appear in the chip signaling, such as unequal clock pulse widths, which cause variation of the ON-OFF time as shown in Fig. 4.



**Figure 4:** Output signal depicted across the operation of CI-DE.

If $I_{ch} = I_{dis}$, t<sub>rise</sub> is equal to t<sub>fall</sub>, which results in equal ON-OFF time for a clock signal. In a CI-DE it is not possible to guarantee $I_{ch} = I_{dis}$, since it is a CMOS inverter-based design. This kind of design has a pull-up section made of pMOS transistors that charges the output load and a pull-down section that discharges it through nMOS transistors. Because nMOS transistors have higher charge-carrier mobility than pMOS transistors, the device dimensions of the nMOS and pMOS must differ in a CI-DE; to compensate for this difference, the pMOS transistors must have a greater channel width.

Because of this, the CI-DE is not a symmetric architecture that can deliver nearly balanced output timing components (viz. rise time and fall time). Even if the input signal has $t_{rise} = t_{fall}$, the CI-DE output fails to replicate this, and the effect is amplified by a long buffer chain. Since our concern is the delay elements of the CDN, it can be inferred that the output of a CI-DE (if used inside the CDN) will tend to drive the on-chip sequential circuits incorrectly, especially those that are level-trigger sensitive.
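The link between mismatched currents and unequal rise/fall times follows directly from equation (3). The sketch below applies that equation once with the charging current and once with the discharging current; the capacitance, voltage swing and currents are hypothetical values, not measurements from the paper.

```python
def transition_time(c_a: float, delta_v: float, current: float) -> float:
    """Equation (3): time to move the node voltage by delta_v with a constant current."""
    if current <= 0:
        raise ValueError("current must be positive")
    return c_a * delta_v / current


def rise_fall_times(c_a: float, delta_v: float, i_ch: float, i_dis: float):
    """Return (t_rise, t_fall, t_rise - t_fall) for the given charge/discharge currents."""
    t_rise = transition_time(c_a, delta_v, i_ch)   # pull-up charges the node
    t_fall = transition_time(c_a, delta_v, i_dis)  # pull-down discharges it
    return t_rise, t_fall, t_rise - t_fall


if __name__ == "__main__":
    C_A = 10e-15   # F, assumed node capacitance
    SWING = 1.1    # V, assumed full-rail output swing
    cases = {"matched": (100e-6, 100e-6), "mismatched": (50e-6, 100e-6)}
    for label, (i_ch, i_dis) in cases.items():
        t_r, t_f, diff = rise_fall_times(C_A, SWING, i_ch, i_dis)
        print(f"{label}: t_rise={t_r*1e12:.0f} ps, t_fall={t_f*1e12:.0f} ps, "
              f"difference={diff*1e12:.0f} ps")
```

With equal currents the difference vanishes; any mismatch appears directly as a rise/fall skew, which accumulates when several stages are chained.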

It is therefore important that $I_{ch}$ and $I_{dis}$ are matched. To this end, extra transistors are added to the CI-DE (viz., P3 and N3, as shown in Fig. 5(a)); the resulting design is commonly referred to as the CS-DE. P3 and N3 provide a source of current flowing from $V_{dd}$ so that the values of $I_{ch}$ and $I_{dis}$ can be matched. However, the CS-DE also contains P4 and N4, which have current-limiting features and obstruct the supply voltage level reaching the inverter (constituted by P2 and N2). This can even induce power supply noise in the CS-DE output, impacting the output signal integrity. The CS-DE structure is often modified as shown in Fig. 5(b) so that this problem can be avoided.

**Figure 5:** Structure of (a) conventional CS-DE (b) redesigned CS-DE [17].

Initially, most of the nodes in this CS-DE (viz., M and N) are stuck at logic '0', which allows P1 to be ON, and therefore the output node 'out' is high. This enables P3 and N3 to be OFF at an early stage. Though the logic state of 'N4' depends on the input 'in', it does not impact the real-time signal transmission from 'in' to 'out'. This stability in the transmission is due to a CMOS inverter in addition to an nMOS 'N1' at the output. Interestingly, this version of the CS-DE provides matching rise/fall delay by tweaking the charging and discharging capacitances (C<sub>o1</sub> and C<sub>o2</sub>, respectively) across the output. Despite this, the problem $I_{ch} \neq I_{dis}$ still prevails (since the paths of C<sub>o1</sub> and C<sub>o2</sub> are different), and as a whole that affects the magnitude of $t_{rise}$ and $t_{fall}$.

Above all, the prevalent DC/DP techniques [15-17] used to obtain different delay values may increase the mismatch between $I_{ch}$ and $I_{dis}$, which should ideally be equal. In fact, the problem exists in almost all the kinds of delay circuits reviewed in Table 1. Very few circuit designers have looked into this aspect and tried to balance the rise delay of the output with its fall delay. Hence, our motivation is to design a new delay element delivering almost equal values of I<sub>ch</sub> and I<sub>dis</sub> so that it generates near-symmetric output t<sub>rise</sub> and t<sub>fall</sub>. In addition, we have concentrated on a DC/DP-based technique that helps to generate variable delay using the proposed delay element. This technique can be thought of as a simple all-digital approach to producing variable delay at the output with near-symmetric t<sub>rise</sub>/t<sub>fall</sub>.

#### 1.3 Organization of this article

This article is structured as follows: In section 2, we provide justification for our proposed circuit design. In section 3, we introduce the new design of delay element and demonstrate a simple mathematical model. We also introduce our alternative approach to DC/DP technique in the same section. The performance analysis of the whole circuit setup is described in section 4. In the last section 5, we conclude our work by stating once again the relevancy of our proposed delay circuit design in modern processor systems.

# 2 Major Highlights

An efficient delay circuit is only possible if its outputs exhibit almost equal $t_{rise}/t_{fall}$. In this article, we therefore focus on a delay cell structure that efficiently provides varied input-to-output time delay, controlled by the proposed alternative to the DC/DP technique, while the output signal features $t_{rise} \approx t_{fall}$.

The contents of this article are as follows:

- Need for variable delay cells in modern processors.
- Development of a new delay cell with nearly balanced output timing components (viz. rise time and fall time).
- Construction of an alternative to the DC-DE or DP-DE methodology to control the different delay values of the proposed delay cell.
- Detailed performance analyses of the schematic-based and layout-based proposed vector-controlled variable delay cell using a 90nm process design kit (PDK) [21].

# 3 The New Design of CI-DE

As mentioned earlier, current CI-DE designs are not capable of delivering equal t<sub>rise</sub> and t<sub>fall</sub> at the output. The major issue is the non-symmetric design of the pull-up network (PUN) and pull-down network (PDN) in CMOS-based inverters. Despite this, circuit designers have relied on the CI-DE structure, in most cases improving it by adding intermediate shunt capacitors, which adjust the symmetry between the PUN and PDN [22]. However, fabricating these shunt capacitors in any deep sub-micron technology is difficult. An effective solution would be to embed MOS-based resistors and capacitors in the CI-DE design instead of using shunt capacitors. In fact, an approach of this kind was first published in [20]. It is shown in Fig. 6(a), where the resistance (R<sub>1</sub>) and the capacitance (C) are placed adjacent to the inverter output and another resistance (R<sub>2</sub>) is placed underneath the pull-down section. Nevertheless, R<sub>1</sub>, R<sub>2</sub> and C were not MOS-based cells, and using them was not efficient in terms of layout design.



**Figure 6:** (a) Inverter design from [23] (b) improvised version based on the circuit from figure 6(a) which is the basis for the new CI-DE.

The inverter design in Fig. 6(a) delivered a good amount of propagation delay, provided that $R_2=0\Omega$ (otherwise there were issues in determining the output logic level '0') and $R_1 \gg R_2$. Although the value of $R_1$ could be managed, setting $R_2$ to $0\Omega$ was technically quite difficult using MOS devices, as intrinsic parameters always affect the device ON resistance to some extent. To solve this problem, we modified the circuit in Fig. 6(a) by discarding $R_2$ and implementing $R_1$ and C as a MOS-based resistance and capacitance, respectively. Since nMOS logic is faster than pMOS logic [24, 25], we have preferred the nMOS-based implementation of the resistance and capacitance.

#### 3.1 Mathematical model of delay estimation

Based on the circuit of Fig. 6(a), a different kind of CI-DE is obtained, as shown in Fig. 6(b). It can be seen from equation (1) that the value of $R_{ON}$ varies with V<sub>GS</sub> and V<sub>th</sub>. In this design, R<sub>ON</sub> of T3 and T7 can be varied through their common V<sub>GS</sub> (denoted by 'X' in Fig. 6(b)). Considering the first modified inverter in Fig. 6(b), let us assess the magnitude of the output voltage at node 'C' and the propagation delay incurred. While node 'C' switches from high to low, the nMOS 'T2' is in saturation. Therefore, the current flowing through 'T2' is given by:

$$I_{2} = \frac{1}{2} k'_{n} \left( \frac{W}{L} \right)_{n} (V_{in} - V_{Tn})^{2} \tag{4}$$

In equation (4), $k'_n = \mu_n C_{ox}$, where $\mu_n$ is the electron mobility and $C_{ox}$ is the oxide capacitance per unit area, W/L is the aspect ratio of 'T2', and $V_{Tn}$ is the threshold voltage of the nMOS. Substituting the value of $I_2$ into equation (3), we have:

$$-C_{T4} \frac{dV_{C}(t)}{dt} = \frac{1}{2} k'_n \left(\frac{W}{L}\right)_n \left(V_{in}(t) - V_{Tn}\right)^2$$

where $C_{T4}$ is the capacitance of the MOS capacitor 'T4' and $V_C$ is the potential at node 'C'.

$$\frac{dV_{C}(t)}{dt} = \frac{k'_{n}}{2C_{T4}} \left(\frac{W}{L}\right)_{n} \left(V_{Tn}^{2} \times \left\{\frac{2 \times V_{in}(t)}{V_{Tn}} - 1\right\}\right) \tag{5}$$

In equation (5), the squared term in $V_{in}$ that arises when expanding $(V_{in}(t)-V_{Tn})^2$ has been neglected. Such a condition can be taken into account when $V_{in} \ll V_{Tn}$ and 'T2' switches to cut-off. Now, the obvious case is:

$$\frac{2 \times V_{in}(t)}{V_{Tn}} \gg 1$$

So, equation (5) can be rewritten as:

$$\frac{dV_{C}(t)}{dt} = \frac{k'_{n}V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_{n} \times V_{in}(t) \tag{6}$$

Assuming that 'T4' is initially charged to a voltage 'V<sub>0</sub>', it gradually discharges through the MOS resistance 'T3' (which has a variable ON resistance 'R<sub>var</sub>' set by the gate voltage 'X') and the fixed, finite resistance offered by 'T2' (denoted as $R_{sat}$). Therefore, equation (6) can be rewritten as:

$$\frac{d}{dt}\left(V_0 \times e^{\frac{-t}{(R_{var} + R_{sat}) \times C_{T4}}}\right) = \frac{k'_n V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_n \times V_{in}(t) \tag{7}$$

Taking the Laplace transform of both sides of equation (7) with zero initial conditions gives:

$$V_{0}\left[\frac{s}{s+\frac{1}{(R_{var}+R_{sat})\times C_{T4}}}\right] = \frac{k'_{n}V_{Tn}}{C_{T4}}\left(\frac{W}{L}\right)_{n} \times V_{in}(s)$$

Using s=j$\omega$ and taking the modulus of V<sub>0</sub>, the relation can be rewritten as:

$$\left|V_{0}\right| = \sqrt{\left(1 + \frac{1}{\omega^{2} \times (R_{var} + R_{sat})^{2} \times C_{T4}^{2}}\right)} \times \frac{k'_{n}V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_{n} \times V_{in}(\omega) \tag{8}$$

Equation (8) models the voltage at the output node 'C'. The crucial observation is that the output voltage is a function of the variable resistance of the nMOS 'T3' and of the input signal frequency. However, reconsidering equation (6) at a particular point in time, it can be interpreted as:

$$V_{C} = \frac{k'_{n} V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_{n} \times V_{in} \times \int_{0}^{\tau} dt \tag{9}$$

where $\tau$ is the propagation delay coefficient. Finally, equation (9) is simplified as shown in equation (10):

$$\tau = \frac{V_{C}}{\frac{k'_{n}V_{Tn}}{C_{T4}} \left(\frac{W}{L}\right)_{n} \times V_{in}} \tag{10}$$
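Equation (10) can be evaluated numerically to see how the delay scales with the MOS capacitor and the device parameters. The following is a minimal sketch: the function names are illustrative and all numeric values are assumptions for demonstration, not values from the 90nm PDK.

```python
def slew_rate_eq6(k_n: float, v_tn: float, c_t4: float,
                  w_over_l: float, v_in: float) -> float:
    """Right-hand side of equation (6): rate of change of V_C in V/s."""
    return (k_n * v_tn / c_t4) * w_over_l * v_in


def delay_eq10(v_c: float, k_n: float, v_tn: float, c_t4: float,
               w_over_l: float, v_in: float) -> float:
    """Equation (10): propagation delay tau in seconds."""
    return v_c / slew_rate_eq6(k_n, v_tn, c_t4, w_over_l, v_in)


if __name__ == "__main__":
    # Assumed values for illustration only.
    params = dict(k_n=200e-6,    # A/V^2, process transconductance k'_n
                  v_tn=0.3,      # V, nMOS threshold voltage
                  w_over_l=2.0,  # aspect ratio of T2
                  v_in=1.1)      # V, input level
    for c_t4 in (5e-15, 10e-15, 20e-15):
        tau = delay_eq10(v_c=0.55, c_t4=c_t4, **params)
        print(f"C_T4 = {c_t4*1e15:.0f} fF -> tau = {tau*1e12:.1f} ps")
```

In this model the delay grows linearly with C<sub>T4</sub> and with the voltage V<sub>C</sub> reached at node 'C', and shrinks as the aspect ratio of 'T2' increases.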

We consider the design in Fig. 6(b) to be symmetric, i.e., the structural components from the input node 'in' to node 'C' and from node 'C' to node 'E' are identical. All the device dimensions of the design and their intrinsic parameters are set in accordance with the details given in the 90nm PDK. The device dimensions are adjusted to ensure that $I_1=I_2$ and $I_4=I_5$. Since our improvised CI-DE is hypothetically a symmetric design, the currents at node 'C' ($I_3$) and at node 'E' ($I_6$) can be correlated in magnitude. The signal passing through node 'C' is inverted when it reaches node 'E', and its $t_{rise}=t_{fall}$. A CMOS buffer (with $t_{rise}=t_{fall}$) is attached at the end to enhance the range of delay. Therefore, the proposed CI-DE is capable of delivering an output signal with balanced rise and fall times.

#### 3.2 Proposed System Architecture

The construction of the proposed CI-DE is incomplete without an alternative to the DC/DP techniques that can generate values for the gate voltage 'X'. We propose a new circuit for setting the delay generated by our CI-DE, as shown in Fig. 7. The resources used for constructing it are taken from the 90nm PDK libraries. The proposed circuit comprises three circuit blocks:

- Potential Generator (PG).
- 8:1 Multiplexer (MUX).
- Proposed CI-DE module.

It is the PG unit which generates different voltages based on the supply voltage ' $V_{dd}$ '. These voltages are transferred to node 'X' through an 8:1 MUX controlled by select lines (S<sub>1</sub>, S<sub>2</sub> and S<sub>3</sub>).



**Figure 7:** Proposed architecture of the new CI-DE.

A significant part of the circuit in Fig. 7 is the PG unit. In the proposed circuit, eight voltage levels are generated: 780mV, 820mV, 840mV, 860mV, 900mV, 920mV, 970mV and 1V. These voltage levels are selected according to the parameters stated in Table 2 so that the proposed delay cell can generate a meaningful range of delay values.
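The vector-controlled selection of the gate voltage 'X' can be modelled behaviourally as a lookup. The sketch below assumes that S<sub>1</sub> is the most significant select bit and that the eight PG levels are wired to the MUX inputs in ascending order; neither assumption is stated in the text, so the actual assignment in Fig. 7 may differ.

```python
# PG output levels in millivolts, as listed in the text.
PG_LEVELS_MV = (780, 820, 840, 860, 900, 920, 970, 1000)


def select_gate_voltage(s1: int, s2: int, s3: int) -> float:
    """Return the voltage (in volts) routed to node 'X' by the 8:1 MUX.

    Assumption: S1 is the MSB and MUX input i carries PG_LEVELS_MV[i].
    """
    for bit in (s1, s2, s3):
        if bit not in (0, 1):
            raise ValueError("select lines must be 0 or 1")
    index = (s1 << 2) | (s2 << 1) | s3
    return PG_LEVELS_MV[index] / 1000.0


def voltage_for_vector(vector: str) -> float:
    """Convenience wrapper for a vector string such as '100' or '111'."""
    if len(vector) != 3:
        raise ValueError("expected a 3-bit vector")
    s1, s2, s3 = (int(ch) for ch in vector)
    return select_gate_voltage(s1, s2, s3)


if __name__ == "__main__":
    for vec in ("000", "100", "111"):
        print(f"vector {vec} -> X = {voltage_for_vector(vec):.2f} V")
```

Each of the eight input vectors thus corresponds to one gate voltage and, through R<sub>var</sub>, to one delay value.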

**Table 2:** Simulation setup used in this work.

| Process Tech. (nm) | Temp. (°C) | V<sub>dd</sub> (volt) | Input signal rise time (ps) | Input signal fall time (ps) | Input signal frequency (GHz) |
|--------------------|------------|-----------------------|-----------------------------|-----------------------------|------------------------------|
| Typical 90nm PDK [21] | 27 | 1.1 | 100 | 100 | 1 |

The PG unit generates the voltage levels based on the potential-divider principle. The resistors are made of polysilicon ('resnpoly'). The sheet resistance of such resistors is intrinsically high, and they are quite often used in MOS-based circuit designs [26]. The 'resnpoly' on the V<sub>dd</sub> side of the PG unit is fixed at 22$\Omega$, and the value of the resistor near the ground line of the PG is varied as given in Table 3. The physical designs of the PG, MUX and CI-DE are based on the definitions given in the 90nm PDK [21]. The layout of the proposed circuit is given in Fig. 8, and the estimated area is 1139.645µm<sup>2</sup> (where the areas of the individual portions are as follows: PG=394.856µm<sup>2</sup>, 8:1 MUX=702.159µm<sup>2</sup>, and Delayed Clock section or proposed CI-DE module=42.63µm<sup>2</sup>).
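The potential-divider principle can be sketched for an ideal, unloaded divider. The code below computes the ground-side resistance that would produce a target output voltage with a 22Ω supply-side resistor and V<sub>dd</sub>=1.1V. It is an illustration under ideal assumptions only: the resistor values actually used in the PG unit are those of Table 3 and may differ, for example because of loading by the MUX.

```python
def ground_side_resistance(v_out: float, v_dd: float = 1.1,
                           r_top: float = 22.0) -> float:
    """Ground-side resistance (ohms) of an ideal divider giving v_out.

    v_out = v_dd * r_bottom / (r_top + r_bottom), solved for r_bottom.
    """
    if not 0.0 < v_out < v_dd:
        raise ValueError("v_out must lie strictly between 0 and v_dd")
    return r_top * v_out / (v_dd - v_out)


def divider_output(r_bottom: float, v_dd: float = 1.1,
                   r_top: float = 22.0) -> float:
    """Output voltage (volts) of the ideal, unloaded divider."""
    return v_dd * r_bottom / (r_top + r_bottom)


if __name__ == "__main__":
    targets_mv = (780, 820, 840, 860, 900, 920, 970, 1000)
    for mv in targets_mv:
        r_b = ground_side_resistance(mv / 1000.0)
        print(f"{mv:4d} mV -> ideal ground-side resistance = {r_b:.1f} ohm")
```

For the 1V level, for instance, the ideal ground-side resistance is ten times the supply-side value, since the output must sit at 10/11 of V<sub>dd</sub>.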

Table 3: Resistance values used in the PG Unit.



**Figure 8:** Layout of the proposed delay cell architecture using 90nm PDK.

For pre & post-layout circuit simulation, commercial electronic design automation (EDA) tools like Cadence Virtuoso<sup>®</sup> and Mentor Graphics Calibre<sup>®</sup> were used. The results of the transient analysis are shown in Fig. 9.

The circuit exhibits greater delay in post-layout simulations. This can be considered as an added advantage based on the process technology used (i.e., 90nm PDK). However, the main concern is whether the circuit can generate equal  $t_{rise}/t_{fall}$  at its output.



**Figure 9:** Output signal obtained from pre- and post-layout simulation.

For this reason, we have plotted the rise time and fall time of the proposed delay cell output against the input vector combinations in Fig. 10(a) and 10(b). The difference between rise time and fall time in pre-layout simulations (denoted by ' $\Delta_1$ ') is much smaller than the difference obtained in post-layout simulations (denoted by ' $\Delta_2$ '). The value of $\Delta_1$ is approximately 0 when "100" is set as the input vector, whereas $\Delta_2$ is approximately 0 for the input vector "111". This is mainly because the parasitic values (obtained from Calibre® PEX Runtime [27]) differ between the pre-layout and post-layout versions of the design. However, that is not a matter of concern, since there are sophisticated layout techniques [28-31] that offer ways to avoid such design-level mismatch.

# 4 Estimation of circuit performance of the proposed delay cell

**Figure 10:** Rise delay & fall delay obtained from (a) pre-layout and (b) post-layout simulations.

In this section, the performance analysis of our proposed circuit is presented based on parameters such as rise and fall delay, their difference in value (denoted as ' $\Delta$ '), the average delay (t<sub>avg</sub>) and the power-delay-product (PDP). The input vector for simulation is "111".

#### 4.1 Circuit performance based on process variation and corner analysis

It is important to test the delay cell performance for various temperature (T) and  $V_{dd}$  values. These results are plotted in Fig. 11(a) and 11(b).

The difference between rise delay and fall delay is negligible and the average delay is low at low temperatures. The average delay increases as the temperature increases. The balanced output rise/fall time is upheld across variations of V<sub>dd</sub> within ±9.08%. The proposed delay cell delivers balanced rise and fall delay at the output, as well as an appropriate average delay, while operating at room temperature (300K) and V<sub>dd</sub>=1.1V.



**Figure 11:** $t_{rise}$, $t_{fall}$ and $t_{avg}$ as a function of (a) T (°C) and (b) $V_{dd}$.

**Table 4:** Proposed delay cell performance for 3 distinct process corners.

| Process Corners | Rise Delay (ps) | Fall Delay (ps) | Δ (ps) | Avg. Delay (ps) | PDP (fJ) |
|-----------------|-----------------|-----------------|--------|-----------------|----------|
| FF              | 157.808         | 154.815         | 2.993  | 156.31          | 3.965    |
| TT              | 201.201         | 192.843         | 8.358  | 197.02          | 4.036    |
| SS              | 284.833         | 262.114         | 22.719 | 273.47          | 4.815    |

The post-layout performance of the presented delay circuit is simulated for 3 different process corners (viz., Fast-Fast $\rightarrow$ 'FF', Typical-Typical $\rightarrow$ 'TT' and Slow-Slow $\rightarrow$ 'SS'). The results are displayed in Table 4. The observations are as follows: (a) the $\Delta$ in FF is 64.18% lower than in TT, whereas the $\Delta$ in TT is 63.21% lower than in SS; (b) the average delay measured in TT is 26.04% higher than the value in FF, and it is higher still in the SS corner; (c) as far as the power dissipation of our circuit is concerned, the PDP in the TT corner (4.036fJ) is only slightly above the FF value (3.965fJ) and well below the SS value (4.815fJ).
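The percentages quoted for the process corners can be recomputed from the entries of Table 4. The sketch below does so; the helper names are illustrative, and the last figure (the TT rise/fall difference relative to the TT average delay) is an additional ratio derived from the same table.

```python
# Table 4 entries: (rise delay ps, fall delay ps, PDP fJ).
CORNERS = {
    "FF": (157.808, 154.815, 3.965),
    "TT": (201.201, 192.843, 4.036),
    "SS": (284.833, 262.114, 4.815),
}


def delta(corner: str) -> float:
    """Difference between rise and fall delay in ps."""
    rise, fall, _ = CORNERS[corner]
    return rise - fall


def average_delay(corner: str) -> float:
    """Mean of rise and fall delay in ps."""
    rise, fall, _ = CORNERS[corner]
    return (rise + fall) / 2.0


def percent_lower(small: float, large: float) -> float:
    """How much lower `small` is than `large`, as a percentage of `large`."""
    return 100.0 * (large - small) / large


def percent_higher(large: float, small: float) -> float:
    """How much higher `large` is than `small`, as a percentage of `small`."""
    return 100.0 * (large - small) / small


if __name__ == "__main__":
    print(f"Delta FF vs TT: {percent_lower(delta('FF'), delta('TT')):.2f}% lower")
    print(f"Delta TT vs SS: {percent_lower(delta('TT'), delta('SS')):.2f}% lower")
    print(f"Avg delay TT vs FF: "
          f"{percent_higher(average_delay('TT'), average_delay('FF')):.2f}% higher")
    print(f"Delta/avg in TT: {100.0 * delta('TT') / average_delay('TT'):.2f}%")
```

The recomputed values agree with the quoted percentages to within rounding of the last digit.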

#### 4.2 Analysis of the proposed delay cell through Monte-Carlo simulation

In this section, the results are reported on carrying out the Monte-Carlo simulation of the proposed delay cell under nominal operating parameters of TT process corner. All the results are obtained from Cadence ADEXL<sup>®</sup>.



**Figure 12:** Plots depicting the results of Monte-Carlo simulation for the parameters i.e., (a) Rise Delay & (b) Fall Delay.

The histograms of the output rise/fall delay in Fig. 12(a) and 12(b) are based on data collected while all design parameters were varied randomly over 500 instances. Considering a 3 $\sigma$ process, the rise delay ranges from 172ps to 232ps with a variability of only 4.96%, whereas the fall delay records a variability of 8.01% against the statistical variations. Nevertheless, the means of both metrics are close to the values noted for the TT corner simulation, which supports the reliability of the design. The $\Delta$ delay is found to be as small as $\pm$ 6.2%, thereby justifying the worth of the proposed delay cell configuration.
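One way to read the rise-delay figures is sketched below. It assumes that the quoted range is the mean ± 3σ interval and that "variability" denotes the ratio σ/mean; the text does not state how the variability was computed, so this is an interpretation rather than the authors' procedure.

```python
def three_sigma_stats(low_ps: float, high_ps: float):
    """Return (mean, sigma, variability %) assuming [low, high] = mean +/- 3*sigma."""
    if high_ps <= low_ps:
        raise ValueError("high_ps must exceed low_ps")
    mean = (low_ps + high_ps) / 2.0
    sigma = (high_ps - low_ps) / 6.0
    return mean, sigma, 100.0 * sigma / mean


if __name__ == "__main__":
    mean, sigma, cv = three_sigma_stats(172.0, 232.0)
    print(f"mean = {mean:.1f} ps, sigma = {sigma:.1f} ps, sigma/mean = {cv:.2f}%")
```

Under these assumptions the mean is 202ps, close to the TT-corner rise delay of Table 4, and σ/mean comes out near the quoted 4.96%.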

# 5 Conclusion

The proposed delay cell is designed by reconstructing the primitive CI-DE architecture, adding resistances and capacitances at the appropriate places so that equal rise and fall times are obtained at the output. The delay at the circuit's output can be adjusted by setting the gate voltage of the MOS-based resistors. Our proposed delay circuit is tested in a 90nm PDK with an input signal of frequency 1GHz and V<sub>dd</sub>=1.1V. It is noted that the difference in rise/fall time is only 4.24% of the average delay incurred by the proposed circuit, and this value ranges from 260ps to 360ps. These values can be further increased by incorporating long buffer chains. We conclude that the proposed vector-controlled variable delay cell is fit for its purpose.

# 6 Acknowledgments

The author(s) would like to acknowledge the insights of Dr. A.J. Mondal during this work.

# 7 Conflict of Interest

The authors declare no conflict of interest in preparing this article.

# 8 References

1. Dong, T., Dobrev, V., Kolev, T., Rieben, R., Tomov, S., & Dongarra, J. (2014, May). A step towards energy efficient computing: Redesigning a hydrodynamic application on CPU-GPU. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium (pp. 972-981). IEEE. https://doi.org/10.1109/IPDPS.2014.103
2. Chen, W., & Dömer, R. (2015). Out-of-order Parallel Discrete Event Simulation for Electronic System-level Design. Springer International Publishing. https://doi.org/10.1007/978-3-319-08753-5
3. Anju, C., & Pande, K. S. (2012). Low Power GALS Interface Implementation with Stretchable Clocking Scheme. *International Journal of Computer Science Issues (IJCSI)*, 9(4), 209. https://www.ijcsi.org/papers/IJCSI-9-4-3-209-213.pdf
4. Dhirubhai, L. M., & Pande, K. S. (2019, July). Critical Path Delay Improvement in Logic Circuit Operated at Subthreshold Region. In 2019 International Conference on Communication and Electronics Systems (ICCES) (pp. 633-637). IEEE. https://doi.org/10.1109/ICCES45898.2019.9002233
5. Abas, M. A., Russell, G., & Kinniment, D. J. (2007). Embedded high-resolution delay measurement system using time amplification. IET Computers & Digital Techniques, 1(2), 77-86. https://doi.org/10.1049/iet-cdt:20060099
6. Abas, M. A., Russell, G., & Kinniment, D. J. (2007). Built-in time measurement circuits-a comparative design study. IET Computers & Digital Techniques, 1(2), 87-97. https://doi.org/10.1049/iet-cdt:20060111
7. Banerjee, A., & Das, D. K. (2016). A New Squarer design with reduced area and delay. IET Computers & Digital Techniques, 10(5), 205-214. https://doi.org/10.1049/iet-cdt.2015.0170
8. Mahapatra, N. R., Tareen, A., & Garimella, S. V. (2002). Comparison and analysis of delay elements. In Circuits and Systems, 2002. MWS-CAS-2002. The 2002 45th Midwest Symposium on (Vol. 2, pp. II-II). IEEE. https://doi.org/10.1109/MWSCAS.2002.1186901
9. Zhang, X., & Sridhar, R. (1994, September). CMOS wave pipelining using transmission-gate logic. In Proceedings Seventh Annual IEEE International ASIC Conference and Exhibit (pp. 92-95). IEEE. https://doi.org/10.1109/ASIC.1994.404602
10. Mahapatra, N. R., Garimella, S. V., & Tareen, A. (2000, April). An empirical and analytical comparison of delay elements and a new delay element design. In Proceedings IEEE Computer Society Workshop on VLSI 2000. System Design for a System-on-Chip Era (pp. 81-86). IEEE. https://doi.org/10.1109/IWV.2000.844534
11. Jovanović, G. S., & Stojčev, M. K. (2006). Current starved delay element with symmetric load. International Journal of Electronics, 93(03), 167-175. https://doi.org/10.1080/00207210600560078
12. Moyer, G. C., Clements, M., & Liu, W. (1996). Precise delay generation using the Vernier technique. Electronics Letters, 32(18), 1658-1659. https://doi.org/10.1049/el:19961149
13. Li, G. H., & Chou, H. P. (2007, November). A high resolution time-to-digital converter using two-level vernier delay line technique. In 2007 IEEE Nuclear Science Symposium Conference Record (Vol. 1, pp. 276-280). IEEE. https://doi.org/10.1109/NSSMIC.2007.4436330
14. Johnson, M. G., & Hudson, E. L. (1988). A variable delay line PLL for CPU-coprocessor synchronization. IEEE Journal of Solid-State Circuits, 23(5), 1218-1223.
15. Maymandi-Nejad, M., & Sachdev, M. (2003). A digitally programmable delay element: design and analysis. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 11(5), 871-878. https://doi.org/10.1109/TVLSI.2003.810787
16. Maymandi-Nejad, M., & Sachdev, M. (2005). A monotonic digitally controlled delay element. IEEE Journal of Solid-State Circuits, 40(11), 2212-2219. https://doi.org/10.1109/JSSC.2005.857370
17. Kobenge, S. B., & Yang, H. (2009). A power efficient digitally programmable delay element for low power VLSI applications. In Quality Electronic Design, 2009. ASQED 2009. 1st Asia Symposium on (pp. 83-87). IEEE. https://doi.org/10.1109/ASQED.2009.5206292
18. Sadhu, A., Bhattacharjee, P., & Koley, S. (2014). Performance Estimation of VLSI Design. Journal of VLSI Design Tools & Technology, 4(2), 59-66. https://doi.org/10.37591/jovdtt.v4i2.3167
- 19. Rajeswari, P., Shekar, G., Devi, S., & Purushothaman, A. (2018). Geometric Programming-Based Power Optimization and Design Automation for a Digitally Controlled Pulse Width Modulator. *Circuits, Systems, and Signal Processing, 37*(9), 4049-4064. https://doi.org/10.1007/s00034-017-0734-z

- 20. Nose, K., Chae, S. I., & Sakurai, T. (2000). Voltage dependent gate capacitance and its impact in estimating power and delay of CMOS digital circuits with low supply voltage (poster session). In Proceedings of the 2000 international symposium on Low power electronics and design (pp. 228-230). ACM. https://doi.org/10.1145/344166.344601

- 21. 90nm CMOS based Process Design Kit https://www.themosisservice.com/products/fab-processes
- Andreani, P., Bigongiari, F., Roncella, R., Saletti, R., & Terreni, P. (1999). A digitally controlled shunt capacitor CMOS delay line. Analog Integrated Circuits and Signal Processing, 18(1), 89-96. https://doi.org/10.1023/A:1008359721539
- 23. Mondal J, A., A. Majumder, B. K. Bhattacharyya & P. Chakraborty. (2017). A Process Aware Delay Circuit with Reduce Impact of Input Switching at GHz Frequencies. IEEE VLSI Circuits and Systems Letters 3(2), 6-12. https://ieeecs-media.computer.org/media/technical-ac-tivties/tcvlsi/newsletters/2017/VLSI\_Circuits\_and\_Systems\_Vol-3\_Issue-2\_June2017.pdf
- 24. Kang, S. M., & Leblebici, Y. (2003). CMOS digital integrated circuits. Tata McGraw-Hill Education. https://www.amazon.in/dp/0071243429/ref=cm\_sw\_em\_r\_mt\_dp\_wiuSFb5M7YZ5M
- 25. Xiang, Q. (2003). U.S. Patent No. 6,600,170. Washington, DC: U.S. Patent & Trademark Office. https://patentimages.storage.googleapis.com/71/e0/22/fea13947c00b4a/US6600170.pdf
- Roy, A., Ender, F., Azadmehr, M., Ta, B. Q., & Aasmundtveit, K. E. (2017, July). Design considerations of CMOS micro-heaters to directly synthesize carbon nanotubes for gas sensing applications. In 2017 IEEE 17th International Conference on Nanotechnology (IEEE-NANO) (pp. 828-833). IEEE. https://doi.org/10.1109/TNANO.2019.2961415
- 27. Quantus RC Extraction https://www.cadence.com/content/cadence-www/global/en\_US/home/tools/digital-design-and-signoff/siliconsignoff/quantus-extraction-solution.html

- 28. Saint, C., & Saint, J. (2002). IC mask design: Essential layout techniques. New York: McGraw-Hill. https://dl.acm.org/doi/abs/10.5555/1593630
- Martin-Gonthier, P., Havard, E., & Magnan, P. (2010). Custom transistor layout design techniques for random telegraph signal noise reduction in CMOS image sensors. Electronics Letters, 46(19), 1323-1324. <u>https://doi.org/10.1049/el.2010.1767</u>
- Megalingam, R. K., & Lal, L. S. (2014, April). Piezoresistive MEMS pressure sensors using Si, Ge, and SiC diaphragms: A VLSI layout optimization. In 2014 International Conference on Communication and Signal Processing (pp. 597-601). IEEE. https://doi.org/10.1109/ICCSP.2014.6949911
- 31. Geiger, R. L., Allen, P. E., & Strader, N. R. (1990). VLSI design techniques for analog and digital circuits (Vol. 90). New York: McGraw-Hill. https://cds.cern.ch/record/1544515



Copyright © 2021 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 17. 01. 2021 Accepted: 15. 04. 2021

https://doi.org/10.33180/InfMIDEM2021.203



Journal of Microelectronics, Electronic Components and Materials Vol. 51, No. 2(2021), 113 – 117

# The Implementation of Logic Gates Using Only Memristor Based Neuristor

Kamil Orman<sup>1</sup>, Yunus Babacan<sup>2</sup>

<sup>1</sup>Erzincan Binali Yildirim University, Dept. of Computer Engineering, Erzincan, Turkey <sup>2</sup>Erzincan Binali Yildirim University, Dept. of Electrical and Electronics Engineering, Erzincan, Turkey

**Abstract:** Memristor-based neuron circuits are reported in the literature as a route to more effective circuits, as they are linear, have a high density, and consume little energy. This paper presents two logic gates built from memristor-based neurons. The neuron circuit is floating and can therefore be used as a circuit element. Electronic neurons, or neuristors, generate spikes when a DC current is applied to them; likewise, the proposed logic gates generate spikes when appropriate inputs are applied to them. We simulated the proposed gates in SPICE using TSMC 0.18 µm CMOS process models.

Keywords: AND gate; OR gate; Neuron; Memristor; Neuristor

# Uporaba logičnih vrat le z uporabo nevristorja na osnovi memristorja

**Izvleček:** Nevronska vezja na osnovi memristorja so linearna, velike gostote in porabijo malo energije. Članek predstavlja dvoje vrat na osnovi nevronov, ki temeljijo na memristorjih. Nevronsko vezje lahko plava in se zato lahko uporablja kot element vezja. Elektronski nevroni ali nevristorji ustvarijo pri enosmernem toku konice; prav tako predlagani logični vhodi, pri ustreznem vhodnem signalu, generirajo konice. Predlagana vrata smo simulirali v okolju SPICE z uporabo TSMC 0,18 µm CMOS tehnologije.

Ključne besede: IN vrata; ALI vrata; Nevron; Memristor; Nevristor

\* Corresponding Author's e-mail: korman@erzincan.edu.tr

#### **1** Introduction

In 1952, Hodgkin and Huxley proposed an electrical circuit model of an axon membrane [1] built from passive circuit elements. The Hodgkin-Huxley (HH) model explains how the membrane potential is conducted from one cell to another. The HH circuit is composed of three channels: a sodium channel, a potassium channel, and a leakage channel. The sodium and potassium channels are modelled with a capacitor and parallel nonlinear resistors. To understand how the brain works, researchers have presented realistic neuron models and circuits [2]-[5].

Leon Chua defined a new circuit element, dubbed the memristor, and demonstrated the connection between charge and flux [6],[7]. Memristors attracted little interest, however, until the HP research group managed to implement them as solid-state devices [8]. The new element has nonlinear characteristics, memory, and an ultra-dense structure. Because it was not commercially available, many types of memristor emulator circuits [9]-[14] have been proposed. A growing number of researchers are now interested in modelling neurons and neural networks using memristors. Pickett and co-workers implemented a Mott memristor and a memristor-based neuron circuit, reported in Nature Materials [15]. Shin et al. presented a memristor-based neuron circuit in which the nonlinear opening and closing of the sodium and potassium ion channels are modelled with a memristor [16]. Ren et al. proposed a model for connected neurons [17]. Zhang and Liao created a memristor-based circuit of the FitzHugh-Nagumo model and investigated the dynamic behaviour of neuronal circuit networks using memristors as synapses [18]. Feali et al. modified Pickett's circuit using both memristor and memcapacitor SPICE models [19].

This study presents two logic gates using a memristor-based floating neuristor circuit. The memristor in question is based on an operational transconductance amplifier (OTA) and has a fully floating structure. The neuristor also has a floating structure and generates voltage spikes when a DC input current is applied to it. The neuristor-based logic gates behave as an AND gate or an OR gate depending on the input signal amplitude. All of the simulation results are compatible with previous studies.

#### 2 Floating memristor circuits

We used two memristor emulator circuits built with two different operational transconductance amplifier (OTA) elements (Fig. 1). The symmetric OTA used in this study was designed for the TSMC 0.18 µm process (Fig. 1c). The capacitor provides the memristor's memory behaviour; the transistors behave as nonlinear resistors when operating in the subthreshold region. The transistors are p-type. Their bulk terminals would normally be connected to the highest voltage in the circuit; we connected the bulk terminals to the drain terminals instead, which provides more nonlinearity and therefore more nonlinear memristive behaviour. The implementation of this memristor has been presented previously in [20]. The first memristor (Fig. 1a) is nonvolatile, whilst the second memristor (Fig. 1b) is volatile because of the $T_D$ transistor. The memristor emulators are fully floating thanks to their symmetric structures. The OTA provides current to the capacitor, which is connected to the gate terminals of the transistors. The charging and discharging of the capacitor provides the memory effect and sets the memristor's resistance.

**Figure 1:** The circuit schematics of the memristor emulators: a) fully floating memristor emulator circuit, b) modified fully floating memristor emulator circuit, c) OTA circuit.

### 3 Memristor based neuristor circuit

Memristor-based electronic neurons were reported by Pickett and co-workers using Mott memristors [15]. After this electronic neuron (neuristor) had been produced, Feali et al. put forth a memristor-based neuristor SPICE model [19]. The circuit used in this study (Fig. 2) is composed of three memristors [20]. Two of them have parallel capacitors and emulate channel-I and channel-II. Neurons contain many types of ion channels, such as sodium, potassium, and calcium channels. However, only the sodium and potassium channels are represented explicitly in the Hodgkin-Huxley model [1]; the other channels are less important in comparison and are represented by the leakage channel in the Hodgkin-Huxley circuit model [1]. We therefore treat the two key channels as channel-I and channel-II. The capacitors model the channel capacitance, and the memristors model the channel conductance. The conductances of the memristors change when a DC input signal is applied, thereby charging and discharging the capacitors and ultimately leading to voltage changes. If we control the voltage change, we can produce a spike train. Channel-I and channel-II behave as nonlinear resistors thanks to the $M_A$ and $M_C$ memristors. The two channels are separated by the nonlinear memristor $M_B$. Channel-II is isolated from the output terminal of the circuit by $R_{out}$ and $C_{out}$, which are located in the output stage of the circuit. Both elements also provide nonlinearity (through the charging and discharging of the capacitor) as well as an appropriate voltage drop to produce spike trains.



**Figure 2:** Memristor-based floating neuristor circuit. Here, the $M_A$ and $M_C$ memristors have an identical structure and the $M_B$ memristor has a modified structure [20].

| Transistors       | W (µm)     | L (µm) | Capacitors          | Value      |
|-------------------|------------|--------|---------------------|------------|
| T<sub>1-2A</sub>  | 3          | 1      | C<sub>A</sub>       | 10 nF      |
| T<sub>1-2B</sub>  | 400        | 1      | C<sub>B</sub>       | 1 nF       |
| T<sub>1-2C</sub>  | 1          | 1      | C<sub>C</sub>       | 10 μF      |
| T<sub>D</sub>     | 1.6        | 1      | C<sub>1</sub>       | 10 nF      |
| C<sub>out</sub>   | 10 pF      |        | C<sub>2</sub>       | 1 pF       |
| **Currents**      | **Value (µA)** |    | **Voltage sources** | **Value (V)** |
| I<sub>A</sub>     | 1          |        | V<sub>DD-A,C</sub>  | 0.9        |
| I<sub>B</sub>     | 100        |        | V<sub>SS-A,C</sub>  | -0.9       |
| I<sub>C</sub>     | 0.1        |        | V<sub>DD-B</sub>    | 2          |
| R<sub>out</sub>   | 10 MΩ      |        | V<sub>SS-B</sub>    | -2         |
Table 1: The values of the circuit elements.

The output stage of the circuit consists of one resistor and one capacitor. Spike formation depends directly on the values of the circuit elements, all of which are listed in Table 1. All simulations were carried out using TSMC 0.18 µm CMOS process models. We first show the operation of a single neuristor when a DC current is applied. Neurons can produce various spike types, e.g. fast spikes, initial bursting, and chattering. The details of these spike types can be found in [5]. Our circuit produced a regular spike type, a widely known spike commonly used in VLSI design. The applied DC current and the resulting voltage spikes are shown in Fig. 3. The neuristor produces a spike train when a DC current is applied, and there are no observable spikes when a zero input signal is applied. Here, the applied DC signal value is 250 nA; the resulting spike train amplitude ranges from -1.3 V to 0.5 V.



Figure 3: a) The response of neuristor circuit and b) applied input DC current signal.

## 4 Neuristor based logic gates

As shown in Fig. 4, two neuristors are connected in parallel and then in series with another neuristor to obtain the logic gates. These logic gates have two inputs and one output. The circuit behaves as either an AND gate or an OR gate, depending on the amplitude of the applied signal. If the input signal amplitude is 40 µA, the circuit behaves as an AND gate. If the input signal amplitude is 150 µA, the circuit behaves as an OR gate. We applied 40 µA current signals to both input terminals of the proposed circuit (Fig. 5b-c). If both input signals reach 40 µA at the same time, the output of the circuit produces a spike train. However, if either of the inputs drops to zero, the circuit does not generate any output signal. In other words, the circuit behaves as an AND gate.

To obtain OR gate behaviour from the proposed circuit, we applied 150 µA current signals to both input terminals of the proposed circuit (Fig. 6b-c). If at least one of the applied input signals reaches 150 µA, the circuit produces a spike train. However, if both of the inputs drop to zero, the circuit does not produce any output signal at all. In other words, the circuit behaves as an OR gate.
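The amplitude-dependent behaviour described above can be illustrated with a purely behavioural sketch. It assumes, as a simplification that the paper does not state, that the output neuristor fires whenever the sum of the two input currents reaches a fixed threshold. The function names and the 60 µA threshold are hypothetical; any threshold above 40 µA and at or below 80 µA reproduces the reported truth tables.

```python
# Behavioural sketch only: the summed-current threshold model and its value
# are assumptions, not parameters reported for the simulated circuit.
THRESHOLD_UA = 60.0  # hypothetical; must lie above 40 uA and at or below 80 uA


def gate_fires(in1_ua: float, in2_ua: float,
               threshold_ua: float = THRESHOLD_UA) -> bool:
    """Return True if the output is taken to produce a spike train."""
    return (in1_ua + in2_ua) >= threshold_ua


def truth_table(amplitude_ua: float) -> dict:
    """Evaluate all four input combinations for a given 'high' amplitude.

    Keys are (input1_high, input2_high); values say whether the gate fires.
    """
    levels = (0.0, amplitude_ua)
    return {(a > 0, b > 0): gate_fires(a, b) for a in levels for b in levels}


if __name__ == "__main__":
    print("40 uA inputs :", truth_table(40.0))   # AND-like behaviour
    print("150 uA inputs:", truth_table(150.0))  # OR-like behaviour
```

Under this assumption a single 40 µA input stays below the threshold while two together exceed it, and a single 150 µA input is already sufficient, which matches the AND and OR behaviour reported for Figs. 5 and 6.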



Figure 4: The circuit schematic of the neuristor based logic gates and their circuit symbols.



**Figure 5:** a) The response of neuristor based AND gate b) applied input signal-I c) input signal-II.



**Figure 6:** a) The response of neuristor based OR gate b) applied input signal-I c) input signal-II.

### **5** Conclusions

In this paper, we presented logic gates based on floating neuristor circuits. The neuristors are composed of two different OTA-based memristors that are fully floating. The memristor circuit used has a very low current consumption; therefore, the neuristor circuit consumes little power. We were able to obtain both logic behaviours from a single circuit by changing the input signal amplitude. All of the simulations were carried out using TSMC 0.18 µm CMOS process models.

### 6 Conflict of interest

The authors declare that there is no conflict of interest for this paper. No funding was received for this manuscript.

### 7 References

1. Hodgkin, A. & Huxley, A., (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiology: 500-544. http://dx.doi.org/10.1113/jphysiol.1952.sp004764
2. FitzHugh, R., (1966). Mathematical models for excitation and propagation in nerve. Biological Engineering. H.P. Schawn (Ed.), New York: McGraw-Hill.
3. Hindmarsh, J. L. & Rose, R. M., (1984). A model of neuronal bursting using three coupled first order differential equations. Proceedings of the Royal society of London. Series B. Biological science: 87-102. https://doi.org/10.1098/rspb.1984.0024
4. Gerstner, W. & Kistler, W. M., (2002). Spiking neuron models: Single neurons, populations, plasticity. Cambridge university press. http://dx.doi.org/10.1017/CBO9780511815706
5. Izhikevich, E. M., (2004). Which model to use for cortical spiking neurons?. IEEE Transactions on neural networks: 1063-1070. http://dx.doi.org/10.1109/TNN.2004.832719
6. Chua, L.O., (1971). Memristor the missing circuit element. IEEE Trans. Circuit Theory; 18: 507-519. http://dx.doi.org/10.1109/TCT.1971.1083337
7. Chua, L.O. & Kang, S.M., (1976). Memristive Devices And Systems. Proceedings of the IEEE; 209-223. http://dx.doi.org/10.1109/PROC.1976.10092
8. Strukov, D. B., Snider, G. S., Stewart, D. R. & Williams, R. S., (2008). The missing memristor found. Nature; 453(7191), 80–83. http://dx.doi.org/10.1038/nature06932
9. Kim H., Sah M.P., Yang C., Cho S. & Chua L.O., (2012). Memristor emulator for memristor circuit applications. IEEE Trans. Circuits Syst. I Regul. Papers; 59(10): 2422–2431. https://doi.org/10.1109/TCSI.2012.2188957
10. Elwakil, A. S., Fouda, M. E. & Radwan, A. G., (2013). A Simple Model of Double-Loop Hysteresis Behavior in Memristive Elements. IEEE Trans. Circuits Syst. II Express Briefs; 60(8): 487–491. https://doi.org/10.1109/TCSII.2013.2268376
11. Sanchez-Lopez, C., Mendoza-Lopez, J., Carrasco-Aguilar, M. A. & Muniz-Montero, C., (2014). A floating analog memristor emulator circuit. IEEE Trans. Circuits Syst. II Express Briefs; 61(5): 309–313. https://doi.org/10.1109/TCSII.2014.2312806
12. Sánchez-López, C., Carrasco-Aguilar, M.A. & Muñiz-Montero, C., (2015). A 16 Hz-160 kHz memristor emulator circuit. AEU - Int. J. Electron. Commun.; 69(9): 1208–1219. https://doi.org/10.1016/j.aeue.2015.05.003
13. Babacan, Y., Kaçar, F. & Gürkan, K., (2016). A spiking and bursting neuron circuit based on memristor. Neurocomputing; 203: 86–91. http://doi.org/10.1016/j.neucom.2016.03.060
14. Babacan, Y. & Kacar, F., (2017). Floating memristor emulator with subthreshold region. Analog Integrated Circuits and Signal Processing; 90(2): 471–475. http://doi.org/10.1007/s10470-016-0888-9
15. Pickett, M. D., Medeiros-Ribeiro, G. & Williams, R. S., (2012). A scalable neuristor built with Mott memristors. Nature Materials; 12: 114–117. http://dx.doi.org/10.1038/nmat3510
16. Shin, S., Sacchetto, D., Leblebici, Y., & Kang, S. M. S., (2012). Neuronal spike event generation by memristors. In 2012 13th international workshop on cellular nanoscale networks and their applications (pp. 1-4). IEEE. http://dx.doi.org/10.1109/CNNA.2012.6331427
17. Ren, G., Xu Y. & Wang C., (2017). Synchronization behavior of coupled neuron circuits composed of memristors. Nonlinear Dynamics, 88(2): 893-901. https://doi.org/10.1007/s11071-016-3283-2
18. Zhang, J., & Liao, X., (2017). Synchronization and chaos in coupled memristor-based FitzHugh-Nagumo circuits with memristor synapse. AEU-International Journal of Electronics and Communications, 75: 82-90. https://doi.org/10.1016/j.aeue.2017.03.003
19. Feali, M.S., Ahmad, A. & Hayati, M., (2018). Implementation of adaptive neuron based on memristor and memcapacitor emulators. Neurocomputing: 1-11. http://dx.doi.org/10.1016/j.neucom.2018.05.006
20. Babacan Y., (2018). Fully Floating Memristor Based Neuron Circuit Implementation, International Academic Research Congress (INES 2018), 886-890. https://doi.org/10.1109/TNN.2004.832719



Copyright © 2021 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 15. 12. 2020 Accepted: 23. 04. 2021

https://doi.org/10.33180/InfMIDEM2021.204



# Low Power Area Optimum Configurable 160 to 2560 Subcarrier Orthogonal Frequency Division Multiplexing Modulator-Demodulator Architecture based on Systolic Array and Distributive Arithmetic Look-Up Table

Girish Nanjareddy<sup>1</sup>, Veena Mysuru Boregowda<sup>2</sup>, Cyril Prasanna Raj<sup>3</sup>

 <sup>1</sup>Visvesvaraya Technological University, BMS College of Engineering, Department of Electronics, Bengaluru, Karnataka, India.
 <sup>2</sup>BMS College of Engineering, Department of Electronics, Bengaluru, Karnataka, India.
 <sup>3</sup>CIT, Department of Electronics, Bengaluru, Karnataka, India.

**Abstract:** In this work, a configurable and reusable Dual-Tree Complex Wavelet Transform (DTCWT) Orthogonal Frequency Division Multiplexing (OFDM) modulator-demodulator based on an Optimum Systolic Array (OSA) and a Modified Distributive Arithmetic (MDA) algorithm is designed for low power underwater MODEM applications. The DTCWT and Inverse Dual-Tree Complex Wavelet Transform (IDTCWT) filters are designed using integer 10-tap Q-shift filter coefficients that are quantized and rounded off to achieve symmetry among the filter coefficients. A multi-stage DTCWT structure is used to perform 2560-subcarrier modulation and demodulation using MDA and OSA modules. The OSA structure is designed with optimum placement of Processing Elements (PE), and the MDA structure is designed to compute two filter outputs per module with a Look-Up Table (LUT) of depth 8. The 2560-subcarrier modulation is carried out using a folded pipelined structure that comprises two-fold and four-fold configurable modules. The reusable pipelined folded OFDM modulator-demodulator is implemented on a Virtex-5 FPGA, operates at a maximum frequency of 248 MHz, occupies less than 15% of the FPGA resources, and consumes less than 1.33 W of power.

Keywords: DTCWT; optimum systolic array; folded pipelined structure; FPGA design; underwater MODEM; low power architecture

Nastavljiv modulator-demodulator s frekvenčnim multipleksiranjem s 160 do 2560 ortogonalnimi podnosilci, zasnovan na arhitekturi sistoličnih polj s porazdeljeno vpogledno tabelo

Izvleček: V članku je predstavljen nastavljiv modulator-demodulator s frekvenčnim multipleksiranjem z ortogonalnimi podnosilci, zasnovan na arhitekturi sistoličnih polj s porazdeljeno vpogledno tabelo (DTCWT). Namenjen je uporabi v podvodnih sistemih nizkih moči MODEM. Zasnovani so z upoštevanjem kvantiziranih in zaokroženih Q-shift koeficientov filtra za doseganje simetrije med koeficienti. Večstopenjska DTCWT struktura modulira in demodulira 2560 podnosilcev z uporabo MDA is OSA modulov. OSA struktura je zasnovana z optimalno pozicijo procesnih elementov (PE). MDA struktura računa dva izhoda filtrov z uporabo vpogledne tabele v osmih nivojih. Modulacija uporablja dva- ali štiri-kratno zloženo strukturo v nastavljivih modulih. OFDM modulator-demodulator je uporabljen v Virtex-5 FPGA okolju in deluje pri največji frekvenci 248 MHz, pri čemer zasede manj kot 15% procesorske moči FPGA in porabi manj kot 1.33 W moči.

Ključne besede: DTCWT; optimalni sistolični niz; zložena cevovodna konstrukcija; zasnova FPGA; podvodni MODEM; arhitektura z majhno porabo

<sup>\*</sup> Corresponding Author's e-mail: giribms17@gmail.com

### 1 Introduction

Inverse DTCWT performs OFDM modulation at the transmitter, and DTCWT is used to retrieve the symbols at the receiver. DTCWT decomposition is like the Discrete Wavelet Transform (DWT), but with separate decomposition structures for the real and imaginary trees. A hardware implementation of DTCWT is twice as complex as a DWT implementation. Goalie et al. [1] use DWT-based subcarrier modulation for OFDM. Shift variance in the DWT energy levels introduces additional errors in OFDM subcarrier modulation and demodulation. Replacing IDWT-DWT with IDTCWT-DTCWT improves the BER performance of OFDM. Po-Cheng Wu et al. [2] propose DWT architectures using arithmetic blocks based on the linear convolution property of the wavelet filters. Architectures for DWT implementation on FPGA platforms that optimize area, speed and power have been reported and can be used for DTCWT implementation. Direct mapped, folded, multiply-and-accumulate based programmable, flipping, recursive, lifting based, dual scan and DSP type architectures for DWT implementation are reported in [3]-[6]. Grzeszczak et al. [7] proposed a single-stage systolic array architecture that improves throughput and latency, but with a large area occupied by multipliers on the FPGA platform. Chao Cheng et al. [8] proposed a high-speed single-stage architecture based on hardware-efficient parallel finite impulse response (FIR) filter structures for the DWT calculation. These structures differ in the size of the arithmetic unit, on-chip memory, cycle period and average calculation time (ACT) [9]. Chengjun Zhang et al. [10] proposed a scheme for the design of a pipeline architecture for fast calculation of the DWT on a Xilinx FPGA, running at a maximum frequency of 200 MHz with a power dissipation of 1005 mW. Xin Tian et al. [11] proposed a line-based scanning scheme and a folded architecture for level-by-level calculation of the multilevel 2-D DWT. Yeong-Kang Lai et al. [12] used a parallel data access scheme that avoids line buffers to implement the reversal mechanism in the folded design, using both parallel and pipeline processing logic. Chih-Chi Cheng et al. [13] proposed a convolution-based recursive architecture for the 2-D DWT using 9/7 filters, in which the throughput rate is increased in a controlled manner but which requires large storage space and decomposition time. The DTCWT architecture design reported by Divakar et al. [14] is similar to the DWT architectures in the literature and requires a large storage area.

The DTCWT filter structure presented by Kingsbury requires a 10-tap filter structure with four filter banks in every stage. The systolic array algorithm for data processing is reported to achieve high throughput and reduced potential with the reusability logic presented by Divakar et al. [15]-[16]. The DTCWT architecture reported by Poornima et al. [17] is a systolic array architecture that uses a multiplexed distributive arithmetic algorithm for image decomposition. That architecture is implemented on an FPGA and operates at a maximum frequency of 300 MHz, consuming less than 10 mW of power and 12% of the FPGA resources. Most of the DTCWT architectures reported in the literature are for image processing; the DTCWT architecture for OFDM modulation presented in this work is the first of its kind implemented on an FPGA for 512-symbol subcarrier OFDM. In this paper, pipelined and optimized filter structures based on OSA logic are designed for computing both the DTCWT and the inverse DTCWT for OFDM.

## 2 DTCWT Architecture design

In this work, a generalized 2048-stage architecture is designed that can be customized to implement both the DTCWT and the IDTCWT. Figure 1 presents the 7-level IDTCWT structure. The input data $x_i^*$ (i = 0, 1, 2, ..., 15), representing complex symbols generated by the QAM or QPSK modulators, are processed by the 7-level IDTCWT structure, which performs OFDM modulation and produces the outputs X'R and X'I. The DTCWT structure at the receiver performs demodulation to regenerate the symbols $x_i^*$ from the received data X'R and X'I. The DTCWT filter coefficients are 10-tap Q-shift filters represented as {H'00a, H'01a, H'10b, H'11b} for the first stage and all subsequent stages, and {H'0a, H'1a, H'0b, H'1b} for the last stage of filtering. In every stage, there are four filters (two pairs of filters, one pair for the real part and one for the imaginary part).

Figure 1: Seven stage IDTCWT OFDM modulation for 16 symbols.

The inputs $\{x_0^*, x_1^*, x_8^*, x_9^*\}$ are processed by the first stage filters {H'00a, H'01a, H'10b, H'11b}. With 10 filter coefficients, the number of arithmetic operations per output is 10 multiplications and 9 additions. Considering 7 stages and four filters per stage, the number of arithmetic operations is 280 multiplications and 252 additions per output. Considering 2048 levels, the total numbers of multiplications and additions are 81920 and 73728, respectively. The 10-tap filters require 16 bits for representation, and the arithmetic operations would need to be carried out using floating point logic. To reduce the computational complexity, fixed point integer logic is used, and the 10-tap (N-tap) filter coefficients are scaled by 64 and rounded to integers. The integer filter coefficients for the first stage and the last stage of the DTCWT and IDTCWT are shown in Table 1. The filters Is1 and Is2 represent the integer filter coefficients for the first stage and all succeeding stages of the DTCWT filters. Ins1 is the filter coefficient set for the last stage and Ins2 is the filter coefficient set for the first stage of the IDTCWT. The DTCWT coefficients Is2 are equal to the IDTCWT coefficients Ins2, represented by {ILa, IHa, ILb, IHb}. The filter coefficients Is1 and Ins1 are related as h(n) = -h(N-1), n = 0, 1, 2, ..., 9. The filter outputs generated by {ILa, IHa, ILb, IHb} are represented by $\{y_0^0, y_0^1, y_0^2, y_0^3\}$ and are expressed mathematically as in Eq. (1), where a0 to a9 are the filter coefficients.

$$y_0^0 = \left\{ a_0 x_0 + a_1 x_1 - a_1 x_2 + a_2 x_3 + a_2 x_4 + a_1 x_5 + a_1 x_6 - a_3 x_7 + a_3 x_8 + a_0 x_9 \right\} \tag{1}$$
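Eq. (1) contains ten products and nine additions, which is the per-filter count quoted earlier in this section. The short sketch below reproduces the other operation counts from the stated parameters. The constant and function names are illustrative, and reading the 2048-level totals as 2048 times the four-filter count of a single stage is an inference from the quoted numbers (2048 × 40 = 81920 and 2048 × 36 = 73728), not something the text states explicitly.

```python
# Parameters taken from the text; names are illustrative only.
TAPS = 10               # 10-tap Q-shift filters
FILTERS_PER_STAGE = 4   # two filters for the real tree, two for the imaginary tree
STAGES = 7
LEVELS = 2048


def operation_counts() -> dict:
    """Return (multiplications, additions) for each case quoted in the text."""
    mults_per_filter = TAPS          # one product per tap
    adds_per_filter = TAPS - 1       # summing ten products needs nine additions
    mults_per_stage = FILTERS_PER_STAGE * mults_per_filter
    adds_per_stage = FILTERS_PER_STAGE * adds_per_filter
    return {
        "per_filter": (mults_per_filter, adds_per_filter),
        "seven_stages": (STAGES * mults_per_stage, STAGES * adds_per_stage),
        # Assumed reading: 2048 x the four-filter count of one stage.
        "levels_2048": (LEVELS * mults_per_stage, LEVELS * adds_per_stage),
    }


if __name__ == "__main__":
    for case, (mults, adds) in operation_counts().items():
        print(f"{case}: {mults} multiplications, {adds} additions")
```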

The filter coefficients of Ins2  $\{\pm 7, \pm 38, \pm 49\}$  are approximated to  $\{\pm 6, \pm 44, \pm 44\}$  to bring symmetry in the filter coefficients with an error of  $\{\pm 1, \pm 6, \pm 5\}$  respectively.
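The approximation can be checked against Table 1 with a small sketch. The lists below are transcribed from the first Ins2 column (exact values, before approximation) and the first Is2 column of Table 1; the function and variable names are illustrative. Applying the stated magnitude mapping to the Ins2 column reproduces the Is2 column, and the per-magnitude errors match the quoted values.

```python
# Values transcribed from Table 1 (n = 0..9); names are illustrative.
INS2_FIRST_COLUMN = [2, 0, -6, 15, 49, 38, 0, -7, 0, 0]   # exact Ins2 values
IS2_FIRST_COLUMN = [2, 0, -6, 15, 44, 44, 0, -6, 0, 0]    # Is2 IL_a values

# Magnitude mapping stated in the text: 7 -> 6, 38 -> 44, 49 -> 44.
APPROX = {7: 6, 38: 44, 49: 44}


def approximate(coeffs):
    """Replace each affected magnitude by its approximation, keeping the sign."""
    result = []
    for c in coeffs:
        magnitude = APPROX.get(abs(c), abs(c))
        result.append(magnitude if c >= 0 else -magnitude)
    return result


def magnitude_errors():
    """Absolute error introduced for each approximated magnitude."""
    return {exact: abs(approx - exact) for exact, approx in APPROX.items()}


if __name__ == "__main__":
    print(approximate(INS2_FIRST_COLUMN))
    print(magnitude_errors())
```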

Approximated filter coefficients are indicated in brackets in Table 1. Grouping the common terms and rearranging, Eq. (1) reduces to Eq. (2), which represents the first filter output $y_0^0$ with filter coefficients ILa.

$$y_0^0 = \left\{ a_0 \left( x_0 + x_9 \right) + a_1 \left( x_1 - x_2 \right) + a_2 \left( x_3 + x_4 \right) + a_1 \left( x_5 + x_6 \right) + a_3 \left( x_8 - x_7 \right) \right\} \tag{2}$$
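The saving from grouping can be checked numerically. The sketch below evaluates the ten-product sum of Eq. (1) exactly as printed, signs included, and a grouped form in which inputs that share a coefficient are combined before multiplication, so that the pairs $(x_1, x_2)$ and $(x_7, x_8)$ become differences. The function names and the sample coefficient and input values are hypothetical and are not taken from Table 1.

```python
def direct_output(a, x):
    """Ten-product form following Eq. (1) as printed: 10 multiplications."""
    a0, a1, a2, a3 = a
    return (a0 * x[0] + a1 * x[1] - a1 * x[2] + a2 * x[3] + a2 * x[4]
            + a1 * x[5] + a1 * x[6] - a3 * x[7] + a3 * x[8] + a0 * x[9])


def folded_output(a, x):
    """Grouped form: five pre-additions (or subtractions), five
    multiplications, then four additions of the partial products."""
    a0, a1, a2, a3 = a
    pre = (x[0] + x[9], x[1] - x[2], x[3] + x[4], x[5] + x[6], x[8] - x[7])
    coeffs = (a0, a1, a2, a1, a3)
    return sum(c * p for c, p in zip(coeffs, pre))


if __name__ == "__main__":
    a = (2, -6, 15, 44)        # hypothetical integer coefficients a0..a3
    x = list(range(1, 11))     # hypothetical input samples x0..x9
    print(direct_output(a, x) == folded_output(a, x))
```

Both forms give the same output for any inputs, while the grouped form needs half as many multipliers, which is the hardware saving the rearrangement is after.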

| n | Is1 $IL_a$ | Is1 $IH_a$ | Is1 $IL_b$ | Is1 $IH_b$ | Is2 $IL_a$ | Is2 $IH_a$ | Is2 $IL_b$ | Is2 $IH_b$ | Ins1 $IIL_a$ | Ins1 $IIH_a$ | Ins1 $IIL_b$ | Ins1 $IIH_b$ | Ins2 $IIL_a$ | Ins2 $IIH_a$ | Ins2 $IIL_b$ | Ins2 $IIH_b$ |
|---|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|--------|----------|--------|----------|
| 0 | 0   | 0   | 1   | 0   | 2   | 0   | 0   | -2  | 0   | 0   | 0   | -1  | 2      | 0        | 0      | -2       |
| 1 | -6  | -1  | 1   | 0   | 0   | 0   | 0   | 0   | 1   | -6  | 0   | 1   | 0      | 0        | 0      | 0        |
| 2 | 6   | 1   | -6  | -6  | -6  | -6  | -6  | 6   | 1   | -6  | -6  | 6   | -6     | -7(-6)   | -7(-6) | 6        |
| 3 | 45  | 6   | 6   | -6  | 15  | 0   | 0   | 15  | -6  | 45  | 6   | 6   | 15     | 0        | 0      | 15       |
| 4 | 45  | 6   | 45  | 45  | 44  | 44  | 44  | -44 | 6   | -45 | 45  | -45 | 49(44) | 38(44)   | 38(44) | -49(-44) |
| 5 | 6   | -45 | 45  | -45 | 44  | -44 | 44  | 44  | 45  | 6   | 45  | 45  | 38(44) | -49(-44) | 49(44) | 38(44)   |
| 6 | -6  | 45  | 6   | 6   | 0   | 15  | 15  | 0   | 45  | 6   | 6   | -6  | 0      | 15       | 15     | 0        |
| 7 | 1   | -6  | -6  | 6   | -6  | 6   | -6  | -6  | 6   | 1   | -6  | -6  | -7(-6) | 6        | -6     | -7(-6)   |
| 8 | 1   | -6  | 0   | 1   | 0   | 0   | 0   | 0   | -6  | -1  | 1   | 0   | 0      | 0        | 0      | 0        |
| 9 | 0   | 0   | 0   | -1  | 0   | -2  | 2   | 0   | 0   | 0   | 1   | 0   | 0      | -2       | 2      | 0        |

Table 1: Low pass and high pass filter coefficients (Is1, Is2: DTCWT; Ins1, Ins2: IDTCWT; approximated values in brackets).

In Eq. (2), the arithmetic operations required are five multiplications, five additions for the input terms and four additions for the partial-product terms. Rearranging the common terms thus reduces the number of multipliers from 10 to 5, and the number of adders required is 9. The total delay in computing every output sample from Eq. (2) is 3 clock cycles (1 clock for adding the input data x, 1 clock for multiplying the sums by the filter coefficients and 1 clock for adding all the multiplied terms). Similarly, the filter outputs  $y_0^1$ ,  $y_0^2$ ,  $y_0^3$  for the remaining three filters are expressed as in Eq. (3).

$$y_{0}^{1} = a_{0} (x_{0} + x_{9}) + a_{3} (x_{1} + x_{2}) + a_{4} (x_{3} + x_{4}) + a_{2} (x_{5} + x_{6}) + a_{1} (x_{7} + x_{8}) \quad 3(a)$$

$$y_{0}^{2} = a_{3} (x_{0} + x_{9}) + a_{1} (x_{1} + x_{2}) + a_{2} (x_{3} + x_{4}) + a_{1} (x_{5} + x_{6}) + a_{4} (x_{7} + x_{8}) \quad 3(b)$$

$$y_{0}^{3} = a_{0} (x_{0} + x_{9}) + a_{1} (x_{1} + x_{2}) + a_{2} (x_{3} + x_{4}) + a_{1} (x_{5} + x_{6}) + a_{3} (x_{7} + x_{8}) \quad 3(c)$$
(3)

Comparing Eq. (2) with Eq. (3a), and Eq. (3b) with Eq. (3c), shows that the accumulated terms of the input data are common. Eq. (2) and Eq. (3) are therefore rewritten as Eq. (4), where the accumulated terms are denoted bj and cj (j = 0, 1, 2, 3, 4).

$$y_{0}^{0} = a_{0}b_{0} + a_{1}b_{1} + a_{2}b_{2} + a_{1}b_{3} + a_{3}b_{4}$$

$$y_{0}^{1} = a_{0}b_{0} + a_{3}b_{1} + a_{1}b_{2} + a_{2}b_{3} + a_{1}b_{4}$$

$$y_{0}^{2} = a_{3}c_{0} + a_{1}c_{1} + a_{1}c_{2} + a_{1}c_{3} + a_{0}c_{4}$$

$$y_{0}^{3} = a_{0}c_{0} + a_{1}c_{1} + a_{2}c_{2} + a_{1}c_{3} + a_{3}c_{4}$$
(4)

From Eq. (4), each stage requires 20 multiplications and 16 additions; for 2048 stages, this amounts to 40960 multiplications and 32768 additions.
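The operation counts quoted so far follow from simple multiplication, as the sketch below shows. The function name is illustrative; the per-stage counts are those stated in the text for the direct form (four 10-tap filters) and for the grouped form of Eq. (4).

```python
def total_ops(mults_per_stage, adds_per_stage, stages):
    """Return (multiplications, additions) for a cascade of identical stages."""
    return mults_per_stage * stages, adds_per_stage * stages


STAGES = 2048

direct = total_ops(40, 36, STAGES)    # four filters, 10 mult + 9 add each
grouped = total_ops(20, 16, STAGES)   # grouped form of Eq. (4)

print(direct)   # (81920, 73728)
print(grouped)  # (40960, 32768)
```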

The terms b0, b1, b2, b3, b4 and c0, c1, c2, c3, c4 are read into the arithmetic unit twice: once for computing  $y_0^0$  and  $y_0^1$ , and a second time for computing  $y_0^2$  and  $y_0^3$ . To reduce the number of arithmetic operations and memory read operations, the OSA algorithm is used to compute the filter outputs.

#### 2.1 Optimum systolic array design

To arrive at the OSA algorithm, Eq. (4) is expressed as in Eq. (5), where each expression is computed by a processing element (PE) that performs two operations: multiplication of the terms a<sub>i</sub> and b<sub>i</sub> (or a<sub>i</sub> and c<sub>i</sub>), and accumulation of the product with the previous data. Computing the filter outputs  $y_0^0$  and  $y_0^3$  requires the filter coefficients  $a_i^0 = \{a0, a1, a2, a1, a3\}$  together with the inputs b<sub>i</sub> and c<sub>i</sub>, respectively. The filter output  $y_0^1$  is generated from the filter coefficients  $a_i^1 = \{a0, a3, a4, a2, a1\}$  and the input data b<sub>i</sub>. Similarly, the term  $y_0^2$  is generated from the filter coefficients  $a_i^2 = \{a3, a1, a2, a1, a3\}$  and the input data c<sub>i</sub>.

$$PE_{0} \rightarrow y_{i+1}^{0} = y_{i}^{0} + a_{i}^{0}b_{i}$$

$$PE_{1} \rightarrow y_{i+1}^{1} = y_{i}^{1} + a_{i}^{1}b_{i}$$

$$PE_{2} \rightarrow y_{i+1}^{2} = y_{i}^{2} + a_{i}^{2}c_{i}$$

$$PE_{3} \rightarrow y_{i+1}^{3} = y_{i}^{3} + a_{i}^{3}c_{i}$$
(5)
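The multiply-and-accumulate behaviour that the text assigns to each PE can be sketched in software as follows. This is a behavioural model only, not the authors' hardware description; the class name and the numeric coefficient and data values are illustrative assumptions.

```python
class ProcessingElement:
    """Behavioural model of one PE: multiply, then accumulate."""

    def __init__(self):
        self.acc = 0  # running sum held between clock cycles

    def step(self, coeff, data):
        """One clock cycle: add coeff * data to the running sum."""
        self.acc += coeff * data
        return self.acc


def run_pe(coeffs, data):
    """Feed one coefficient/data pair per clock and return the final sum."""
    pe = ProcessingElement()
    for a_i, d_i in zip(coeffs, data):
        pe.step(a_i, d_i)
    return pe.acc


# Illustrative values: the pattern {a0, a1, a2, a1, a3} with a0..a3 = 2, -6, 15, 44
coeffs_pe0 = [2, -6, 15, -6, 44]
b = [11, 5, 9, 13, 17]  # illustrative pre-added input terms b0..b4

print(run_pe(coeffs_pe0, b))  # 797
```

After five clock cycles the accumulator holds the complete five-term sum of the corresponding line of Eq. (4).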

Figure 2 presents the OSA structure designed to perform the filtering operation; the PEs are placed for optimum utilization of the filter coefficients and the input data. Bold lines represent the inputs, solid lines the outputs and dotted lines the control inputs. The input data b<sub>i</sub> flows in order into PE0 and then, with one clock delay, into PE1. The filter coefficients  $a_i^0$  and  $a_i^1$  likewise flow into PE0 and PE1 from left to right. The filter coefficient stream  $a_i^1$  is padded with a '0' to synchronize it with the data flow of  $b_i$ . The  $c_i$  data input is padded with a '0' and  $a_i^2$  is padded with [0 0] to synchronize with the coefficient data flow into PE3 and PE2, respectively. The internal structure of the PE is presented in Figure 3. The data inputs to the PE are  $u_{i-1}$  and  $v_{i-1}$ , which flow into the PE from the left and from the bottom. The outputs of the PE are  $u_i$ ,  $y_i$  and  $y_{i-1}$ , which represent the data inputs to the next PE and the output. The control signal  $S_0$  directs the output of the demultiplexer either to the output pin of the PE or into the accumulator logic. The Delay Register (DR) stores the data u<sub>i</sub> and  $y_i$  for one clock cycle and transfers them to the next PE. The register also stores the intermediate data for accumulation during the next clock cycle.

Figure 2: Optimum systolic array structure



Figure 3: Internal structure of PE.

Table 2 presents the data flow and the output generation of the OSA structure shown in Figure 2 for the first 12 clock cycles. The coefficients enter from the Left (L) and the data inputs enter from the Bottom (B) of each PE. The first filter generates its first output at the 5<sup>th</sup> clock cycle, the second and third filters at the 6<sup>th</sup> clock cycle and the fourth filter at the 7<sup>th</sup> clock cycle. The latency is 5 clock cycles and the throughput is 4 outputs per clock. The advantage of the OSA structure is the optimum utilization of input data and filter coefficients for output computation. The OSA structure is used to realize stages 1 to 2047 of the IDTCWT and stages 2 to 2048 of the DTCWT. For realizing the 1<sup>st</sup> stage of the DTCWT and the 2048<sup>th</sup> stage of the IDTCWT, MDA logic is used. Approximating the 2<sup>nd</sup> stage (Ins2) filter coefficients (±38 to ±44, ±49 to ±44 and ±7 to ±6) reduces the number of arithmetic operations.

**Table 2:** Data flow activity in the OSA structure.
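The first-output clocks quoted above follow from the zero padding of each stream. The sketch below models only that timing, assuming each PE needs five valid coefficient/data pairs and that the streams for PE1 and PE3 carry one padded slot and the stream for PE2 two, as in Table 2. The function and dictionary names are illustrative.

```python
def first_output_clock(padding, taps=5):
    """Clock cycle at which a PE has consumed `taps` valid input pairs."""
    stream = [None] * padding + list(range(taps))  # None marks a padded '0' slot
    consumed = 0
    for clock, item in enumerate(stream, start=1):
        if item is not None:
            consumed += 1
        if consumed == taps:
            return clock
    return None  # not reached when taps > 0


# Padding per PE, read from the leading zeros in Table 2.
padding = {"PE0": 0, "PE1": 1, "PE3": 1, "PE2": 2}
first_clocks = {pe: first_output_clock(p) for pe, p in padding.items()}

print(first_clocks)  # {'PE0': 5, 'PE1': 6, 'PE3': 6, 'PE2': 7}
```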

#### 2.2 Modified distributive arithmetic

Figure 4 presents the forward DTCWT and IDTCWT structure that performs demodulation and modulation of the OFDM signal. The corresponding filters for the DTCWT and IDTCWT are denoted Is1 and Ins1, respectively, and are listed in Table 1. The four outputs generated by the DTCWT filters are represented as { $y_i^{00}$ ,  $y_i^{01}$ ,  $y_i^{20}$ ,  $y_i^{21}$ } and expressed as in Eq. (6a)-(6d). The filter coefficients that are common are grouped together to reduce the number of multiplication operations: the corresponding input data are added prior to multiplication by the filter coefficient. From Eq. (6) it is observed that grouping the common terms reduces the number of multiplications to 4. The summed terms 'Y' are not common to all four filter expressions, and the filter coefficients { $h'_{00}$ ,  $g'_{00}$ } are zero, which further reduces the number of multiplication operations.

| PEs/Clock            | Input | 1              | 2              | 3              | 4              | 5              | 6              | 7              | 8              | 9              | 10             | 11             | 12             |
|----------------------|-------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|
| PE<sub>0</sub>       | L     | a<sub>0</sub>  | a<sub>1</sub>  | a<sub>2</sub>  | a<sub>1</sub>  | a<sub>3</sub>  | a<sub>0</sub>  | a<sub>1</sub>  | a<sub>2</sub>  | a<sub>1</sub>  | a<sub>3</sub>  | a<sub>0</sub>  | a<sub>1</sub>  |
| PE<sub>0</sub>       | B     | b<sub>0</sub>  | b<sub>1</sub>  | b<sub>2</sub>  | b<sub>3</sub>  | b<sub>4</sub>  | b<sub>0</sub>  | b<sub>1</sub>  | b<sub>2</sub>  | b<sub>3</sub>  | b<sub>4</sub>  | b<sub>0</sub>  | b<sub>1</sub>  |
| PE<sub>1</sub>       | L     | 0              | a<sub>0</sub>  | a<sub>3</sub>  | a<sub>1</sub>  | a<sub>2</sub>  | a<sub>1</sub>  | 0              | a<sub>0</sub>  | a<sub>3</sub>  | a<sub>1</sub>  | a<sub>2</sub>  | a<sub>1</sub>  |
| PE<sub>1</sub>       | B     | 0              | b<sub>0</sub>  | b<sub>1</sub>  | b<sub>2</sub>  | b<sub>3</sub>  | b<sub>4</sub>  | 0              | b<sub>0</sub>  | b<sub>1</sub>  | b<sub>2</sub>  | b<sub>3</sub>  | b<sub>4</sub>  |
| PE<sub>3</sub>       | L     | 0              | a<sub>0</sub>  | a<sub>1</sub>  | a<sub>2</sub>  | a<sub>1</sub>  | a<sub>3</sub>  | 0              | a<sub>0</sub>  | a<sub>1</sub>  | a<sub>2</sub>  | a<sub>1</sub>  | a<sub>3</sub>  |
| PE<sub>3</sub>       | B     | 0              | c<sub>0</sub>  | c<sub>1</sub>  | c<sub>2</sub>  | c<sub>3</sub>  | c<sub>4</sub>  | 0              | c<sub>0</sub>  | c<sub>1</sub>  | c<sub>2</sub>  | c<sub>3</sub>  | c<sub>4</sub>  |
| PE<sub>2</sub>       | L     | 0              | 0              | a<sub>3</sub>  | a<sub>1</sub>  | a<sub>2</sub>  | a<sub>1</sub>  | a<sub>0</sub>  | 0              | 0              | a<sub>3</sub>  | a<sub>1</sub>  | a<sub>2</sub>  |
| PE<sub>2</sub>       | B     | 0              | 0              | c<sub>0</sub>  | c<sub>1</sub>  | c<sub>2</sub>  | c<sub>3</sub>  | c<sub>4</sub>  | 0              | 0              | c<sub>0</sub>  | c<sub>1</sub>  | c<sub>2</sub>  |
| PE<sub>0</sub> (Out) |       | -              | -              | -              | -              | $y_0^0$        | $y_1^0$        | $y_2^0$        | $y_3^0$        | $y_4^0$        | $y_5^0$        | $y_6^0$        | $y_7^0$        |
| PE<sub>1</sub> (Out) |       | -              | -              | -              | -              | -              | $y_0^1$        | $y_1^1$        | $y_2^1$        | $y_3^1$        | $y_4^1$        | $y_5^1$        | $y_6^1$        |
| PE<sub>3</sub> (Out) |       | -              | -              | -              | -              | -              | $y_0^3$        | $y_1^3$        | $y_2^3$        | $y_3^3$        | $y_4^3$        | $y_5^3$        | $y_6^3$        |
| PE<sub>2</sub> (Out) |       | -              | -              | -              | -              | -              | -              | $y_0^2$        | $y_1^2$        | $y_2^2$        | $y_3^2$        | $y_4^2$        | $y_5^2$        |



Figure 4: DTCWT and IDTCWT filters for OFDM.

The input data terms are summed using a two-stage adder structure and stored in an intermediate register. The intermediate register data is used as the address (AD) to the Look-Up Table (LUT), which stores precalculated partial products (PP) based on the MDA algorithm. The two-stage adder structure is designed such that the register contents are used to compute two filter outputs simultaneously. Figure 5 presents the MDA structure for computing the two real filter outputs  $y_i^{00}$  and  $y_i^{01}$ . The input data is loaded into a register array that holds 10 data samples. The two-stage adder array adds the data, and the sums are stored in an intermediate register. Three multiplexers, controlled by the clock signal, route the corresponding data into the LUT.

$$y_{i}^{00} = (y_{0}^{0} + y_{9}^{0})h_{00}^{'} + (y_{1}^{0} + y_{2}^{0})h_{01}^{'} + (y_{3}^{0} + y_{4}^{0} + y_{7}^{0} + y_{8}^{0})h_{02}^{'} + (y_{5}^{0} + y_{6}^{0})h_{03}^{'}$$
(6a)

$$y_{i}^{01} = (y_{0}^{0} + y_{9}^{0})h_{00}^{'} + (y_{3}^{0} + y_{4}^{0})h_{03}^{'} + (y_{1}^{0} + y_{2}^{0} + y_{5}^{0} + y_{6}^{0})h_{02}^{'} + (y_{7}^{0} + y_{8}^{0})h_{01}^{'}$$
(6b)

$$y_{i}^{20} = (y_{0}^{2} + y_{1}^{2})g_{00}^{'} + (y_{8}^{2} + y_{9}^{2})g_{01}^{'} + (y_{2}^{2} + y_{3}^{2} + y_{6}^{2} + y_{7}^{2})g_{02}^{'} + (y_{4}^{2} + y_{5}^{2})g_{03}^{'}$$
(6c)



Figure 5: MDA structure for computing two real filter outputs.

$$y_{i}^{21} = (y_{0}^{2} + y_{1}^{2})g_{01}^{'} + (y_{2}^{2} + y_{3}^{2} + y_{6}^{2} + y_{7}^{2})g_{02}^{'} + (y_{4}^{2} + y_{5}^{2})g_{03}^{'} + (y_{8}^{2} + y_{9}^{2})g_{00}^{'}$$
(6d)

On the positive clock edge, the LUT locations are accessed to compute the first filter output  $y_i^{00}$ , and on the negative clock edge, the PP are accessed to compute the second filter output  $y_i^{01}$ . Figure 6 presents the MDA structure for computing the two imaginary filter outputs  $y_i^{20}$  and  $y_i^{21}$ . The MDA logic is accessed on both the positive and negative clock edges to read the PP from the LUT and compute the two filter outputs. The advantages of the MDA structure are that the LUT size is limited to a depth of 8 locations and that the LUT is used twice, for computing two filter outputs. The intermediate registers ensure a pipelined structure and improve throughput. The first filter output is generated after 13 clock cycles and the throughput is 4 outputs per clock.
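The source does not spell out how the LUT is addressed, so the sketch below falls back on the classic bit-serial distributed-arithmetic scheme as a stand-in: with three non-zero grouped coefficients, the table of precalculated partial products has 2^3 = 8 entries, which matches the LUT depth quoted above. The coefficient values, the pre-added sums and the assumption of unsigned inputs are all illustrative.

```python
def build_lut(coeffs):
    """Precalculate every partial sum of the coefficients (2**K entries)."""
    lut = []
    for addr in range(1 << len(coeffs)):
        lut.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return lut


def da_dot(coeffs, sums, bits):
    """Multiplier-less dot product of `coeffs` with unsigned `sums`.

    Each bit plane of the inputs forms a LUT address; the looked-up partial
    product is weighted by a shift and accumulated.
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        addr = 0
        for k, s in enumerate(sums):
            addr |= ((s >> b) & 1) << k
        acc += lut[addr] << b
    return acc


coeffs = [-6, 45, 6]   # illustrative grouped coefficients (zero tap omitted)
sums = [5, 9, 13]      # illustrative pre-added input sums, each below 16

print(len(build_lut(coeffs)))        # 8
print(da_dot(coeffs, sums, bits=4))  # 453
```

The result equals the direct dot product (-6·5 + 45·9 + 6·13 = 453) while using only table reads, shifts and additions.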

#### 2.3 Comparison of resources

The OSA and MDA structures designed are optimum in terms of the number of arithmetic operations

Table 3: Performance comparison.

| Parameters                              | Direct<br>Implementation | OSA                   | MDA                   |
|-----------------------------------------|--------------------------|-----------------------|-----------------------|
| Filter type -<br>DTCWT/Inverse<br>DTCWT | Real and<br>Imaginary    | Real and<br>Imaginary | Real and<br>Imaginary |
| Filter order                            | 10-tap                   | 10-tap                | 10-tap                |
| Multiplications                         | 40                       | 20                    | Multiplier<br>less    |
| Additions                               | 36                       | 16                    | 14                    |
| LUTs                                    | 4                        | -                     | 2                     |
| LUT depth                               | 1024                     | -                     | 8                     |
| Throughput<br>(Output/clock)            | 1                        | 4                     | 4                     |
| Latency (clocks)                        | 10                       | 5                     | 13                    |

and hence are suitable for performing 2048 symbol OFDM modulation and demodulation. Table 3 compares the hardware resources and performance of the designed structures with the direct implementation. Compared with the direct implementation, the MDA structure requires 50% of the LUTs and its LUT depth is reduced by about 99.2%. The OSA structure requires 50% of the multipliers and about 44% of the adders (16 instead of 36). The OSA latency is reduced by 50% owing to the reuse of the input data and filter coefficients. With the DTCWT structure designed using both the OSA and MDA algorithms, the next step is to implement OFDM for subcarrier modulation and demodulation. In this work, the design of a 2560 subcarrier OFDM is presented.

Figure 6: MDA structure for computing two imaginary filter outputs.
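The ratios implied by Table 3 can be recomputed directly from the tabulated values, as in the sketch below. The dictionary keys and helper name are illustrative; the numbers are copied from Table 3.

```python
direct = {"mult": 40, "add": 36, "luts": 4, "lut_depth": 1024, "latency": 10}
osa = {"mult": 20, "add": 16, "latency": 5}
mda = {"add": 14, "luts": 2, "lut_depth": 8, "latency": 13}


def percent_of(new, old):
    """Express `new` as a percentage of `old`."""
    return 100.0 * new / old


print(percent_of(osa["mult"], direct["mult"]))                 # 50.0
print(round(percent_of(osa["add"], direct["add"]), 1))         # 44.4
print(percent_of(osa["latency"], direct["latency"]))           # 50.0
print(percent_of(mda["luts"], direct["luts"]))                 # 50.0
print(100.0 - percent_of(mda["lut_depth"], direct["lut_depth"]))  # 99.21875
```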

## 3 DTCWT OFDM using 2560 subcarriers

According to generic underwater MODEM standards, Table 4, presented by Kochanska et al. [20], shows the basic OFDM parameters recommended for underwater acoustic channels in shallow water. The OFDM-based underwater MODEM uses rank 8 and a 30 kHz carrier frequency for synchronization, with a sampling rate of 200 kHz, a bandwidth of 5 kHz and a subcarrier spacing of 312.5 Hz. With a 5 kHz bandwidth and 4 subcarriers, the subcarrier spacing is 1250 Hz (5 kHz / 4). Increasing the number of subcarriers to 64 improves the data rate, but the spacing between the subcarriers is then limited to 78.13 Hz, leading to Inter Symbol Interference (ISI). An ideal spacing would be 312.5 Hz, with 16 subcarriers.

Considering the standard specifications and the discussion presented by Kochanska et al. [20], a 2560 subcarrier OFDM model is developed. The number of subcarriers is varied from 160 to 2560, and the subcarrier spacing correspondingly ranges from 1250 Hz down to 78.13 Hz. As per the recommendations for OFDM-based underwater acoustic communication, a configurable modulator and demodulator that supports 160 to 2560 subcarriers is required. To develop a 2560 subcarrier OFDM based on the DTCWT, 2560 levels of IDTCWT are required for modulation and 2560 levels of DTCWT for demodulation. The DTCWT structure for 2560 levels is presented in Figure 7. To allow variation in subcarrier modulation, the structure in Figure 7 is designed with output taps at levels 160, 320, 640, 1280 and 2560. The 2560-level structure can be configured to perform any of these modulations by selecting the outputs from intermediate stages. Every stage of the DTCWT or IDTCWT uses four filters that are designed using either the OSA or the MDA algorithm. By introducing intermediate registers between two stages, a pipelined structure can be designed for generating the 2560 subcarrier modulation.

The 2560 stage pipelined structure generates 2559 detail filter coefficients and 1 approximate filter coefficient. The data movement between the 2560 stages is controlled by control units that synchronize it. Each stage requires 16 adders and 20 multipliers (considering OSA) and has a latency of 5 clock cycles. For the 2560 stage OFDM, the numbers of multipliers and adders are therefore 51200 and 40960, respectively, and the latency is 12800 clocks. To design an optimum 2560 subcarrier modulator with a trade-off between latency and arithmetic operations, a folded pipelined modulator is designed.
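The figures in this section follow from two small calculations, sketched below: the subcarrier spacing is the 5 kHz bandwidth divided by the number of subcarriers in the band, and the pipeline totals are the per-stage OSA resources multiplied by the number of stages. Function and constant names are illustrative.

```python
BANDWIDTH_HZ = 5000.0  # 5 kHz band, as stated in the text


def subcarrier_spacing(n_in_band):
    """Spacing in Hz for `n_in_band` subcarriers within the 5 kHz band."""
    return BANDWIDTH_HZ / n_in_band


def pipeline_totals(stages, mults=20, adds=16, latency=5):
    """Resources and latency of an unfolded pipeline of OSA stages."""
    return {
        "multipliers": stages * mults,
        "adders": stages * adds,
        "latency_clocks": stages * latency,
    }


for n_b in (4, 8, 16, 32, 64):
    print(n_b, subcarrier_spacing(n_b))
# 4 1250.0
# 8 625.0
# 16 312.5
# 32 156.25
# 64 78.125

print(pipeline_totals(2560))
# {'multipliers': 51200, 'adders': 40960, 'latency_clocks': 12800}
```

The computed spacings agree with the rounded values listed in Table 4.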

#### 3.1 Folded Pipelined OFDM Modulator

In the folded pipelined OFDM modulator structure, each stage is reused twice or four times for subcarrier modulation, and the data flow is controlled by configurable logic. With this reusable filter bank scheme, the number of filter banks required is reduced from 2560 to 1280 or to 640, hence the name folded structure. The first stage of the DTCWT comprises four filters and generates four outputs. Stages 2 to 2560 of the DTCWT each comprise two filters that are grouped together and process the data from the preceding stage to generate two outputs from every pair or group, as shown in Figure 7. The OSA structure shown in Figure 2 is designed to process the input data and generate four outputs (real and imaginary). The reduced OSA structure is designed to generate two outputs, as shown in Figure 8 (a) for the real part and (b) for the imaginary part.

Figure 9 presents the top-level block diagram of the proposed folded pipelined unit for computing 2560 subcarrier OFDM demodulation using the forward DTCWT. The first stage is realized using the MDA algorithm and generates four outputs. One output from each pair of filters, { $y_i^{00}$ ,  $y_i^{20}$ }, is passed to the next level of processing. The folded pipelined structure consists of N processing units (N = 1280), and each processing unit has two Fold Units (FLU) and two compute units. The compute unit is the OSA shown in Figure 8. The FLU is designed to realize either two or four stages of DTCWT decomposition. By reusing the processing units twice, two-stage decomposition is carried out, and by reusing them

**Table 4:** OFDM parameters for underwater acoustic communication.

| No. of Subcarriers<br>NS | No. of Subcarriers<br>NB in B=5 KHz | Subcarrier<br>Spacing B <sub>s</sub> [Hz] | Symbol<br>Duration T <sub>OFDM</sub><br>[ms] | Symbol<br>Duration with<br>CP T <sub>s</sub> [ms] | Symbols Per<br>Frame |
|--------------------------|-------------------------------------|-------------------------------------------|----------------------------------------------|---------------------------------------------------|----------------------|
| 160                      | 4                                   | 1250.0                                    | 0.80                                         | 1.00                                              | 2500                 |
| 320                      | 8                                   | 625.0                                     | 1.60                                         | 2.00                                              | 1250                 |
| 640                      | 16                                  | 312.5                                     | 3.21                                         | 4.01                                              | 625                  |
| 1280                     | 32                                  | 156.3                                     | 6.41                                         | 8.01                                              | 312                  |
| 2560                     | 64                                  | 78.13                                     | 12.82                                        | 16.03                                             | 156                  |



Figure 7: DTCWT structure for 2560 stages.

four times, four stages of DTCWT decomposition are achieved. Stage 0 in Figure 9 is implemented using the proposed MDA logic discussed in Section 2.2, and stages 1 to N are realized using the OSA structure discussed in Section 2.1. Figure 10 presents the fold unit design, which comprises an input register of depth 18 that stores the input data from the previous stages, denoted x. The contents of each memory location of the input register are accessed and denoted R. The outputs of the input register are connected to the multiplexer array so that the data R0 to R18 are rearranged as shown at the inputs of the multiplexer array. Select lines S1 and S0 configure the fold unit to either fold by two or fold by four. If the select signal S0 alone is used, fold by two logic is achieved, and if both S0 and S1 are used, fold by four logic is achieved. The data enters the OSA structure for processing, which generates two outputs. One of the outputs,  $y_{i+1}^0$ , is de-multiplexed and stored in the output register (Q) for next-stage processing. The other output,  $y_{i+1}^{1}$ , is the demodulated subcarrier and is forwarded to the last stage of the OFDM demodulator. The de-multiplexer logic and the input-stage multiplexer array are synchronized for computing either two-level or four-level DTCWT decomposition. The de-multiplexer outputs, denoted {Q0, Q1, Q2, Q3}, are stored in the corresponding memory locations of the output register for reuse and for computing the next level of decomposition. The read and write signals are used to read the data from the input register into the multiplexer array and to write the output of the OSA into the output register. After every read and write operation, the input and output registers are shifted to load new data inputs for processing.

Figure 11 and Figure 12 present the PE configured to perform decomposition by two and by four, respectively. In Figure 11, the input stage for fold by two logic consists of two data array registers of depth four, denoted 'x' and 'y'. The input register array 'x' is loaded with new input data at the register (xi + 4). At every clock, the data is shifted up in the data register 'x'. The output of the OSA is de-multiplexed and shifted as input into register 'y', and the cycle repeats. In the first clock, the data input 'x' is processed by the OSA to generate two outputs,  $y_i^{00}$  and  $y_i^{01}$ , of which  $y_i^{00}$  is shifted back into the 'y' register for a second level of processing.

In the next clock pulse, the data from the 'y' register is multiplexed and processed by the OSA logic, which generates two outputs, one of which is de-multiplexed and sent to the next stage for processing. Similarly, Figure 12 presents the fold by four logic, which comprises four input array registers denoted {x, y, z, w}, each of depth four. The fold by four module is designed to process subcarrier modulation by reusing the DTCWT filter pair four times. The control signals S0 and S1 are set to perform fold by two or fold by four operations.

Figure 8: Optimum systolic array (a) Real part (b) Imaginary part.

Figure 9: Folded pipelined OFDM demodulator block diagram.

The data flow for fold by two logic is presented in Figure 13. There are two registers at the input, denoted 'y' and 'x'. In the first clock, the four inputs x0, x1, x2 and x3 are loaded into the 'x' register array, whereas the 'y' array is set to zero. The OSA unit generates y0 at the 2<sup>nd</sup> clock cycle, which is loaded into the fourth register of 'y'. In every clock, the input 'x' is processed to generate an output 'y', which is loaded into the 'y' array register. Once all the contents of the 'y' register are computed (clock 5), the multiplexer at the input of the OSA is enabled to process the 'y' data. The two-stage fold logic generates outputs for alternate stages, i.e. stage 1 to stage 3. Similarly, Figure 14 presents the data flow diagram for fold by four logic. In this logic, the input data x0, x1, x2 and x3 are loaded in the first clock cycle and the first output of the 1<sup>st</sup> stage is computed at the second clock pulse. The 1<sup>st</sup> output of the third stage (z0) is computed at the 11<sup>th</sup> clock and the 1<sup>st</sup> output of the 4<sup>th</sup> stage (w0) is computed at the 16<sup>th</sup> clock. The fold by four logic processes data from the 1<sup>st</sup> stage to generate data for the 5<sup>th</sup> stage. The folded pipelined architecture is configured to compute N = 2560 subcarrier OFDM symbols using only N/2 stages in the fold by two logic and N/4 stages in the fold by four logic. In the fold by two logic, the latency is 10 clock cycles for every six outputs, and in the fold by four logic, the latency is 20 clocks for every ten outputs. The number of multiplier and adder operations is reduced by 50% in fold by two and by 75% in fold by four, compared with the direct implementation. The folded pipelined architecture thus achieves a trade-off between computational complexity and latency. This structure can be configured to compute 160, 320, 640, 1280 and 2560 subcarrier modulation by tapping the outputs at the N/2, N/4, N/8 and N/16 stages. The 2560 stage DTCWT decomposition unit is modelled in Verilog HDL and verified for its functionality. The functionally correct HDL code for the DTCWT and IDTCWT is implemented on an FPGA, and the logic correctness of the OFDM module is verified in the System Generator environment.
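The folding idea, one compute unit reused several times with its first output fed back as the next input, can be sketched in software as follows. The `analysis_step` function is a simple sum/difference stand-in chosen for illustration; it is not the DTCWT filter pair of the paper, and all names here are hypothetical.

```python
def analysis_step(x):
    """Stand-in two-output stage: pairwise sum (kept) and difference (emitted)."""
    approx = [x[i] + x[i + 1] for i in range(0, len(x) - 1, 2)]
    detail = [x[i] - x[i + 1] for i in range(0, len(x) - 1, 2)]
    return approx, detail


def folded_stage(x, fold):
    """Reuse one compute unit `fold` times, feeding the first output back."""
    details = []
    for _ in range(fold):
        x, d = analysis_step(x)
        details.append(d)   # second output leaves the unit on every pass
    return x, details


def physical_stages(n_levels, fold):
    """Number of hardware stages needed when each one is reused `fold` times."""
    return n_levels // fold


print(folded_stage(list(range(8)), fold=2))
# ([6, 22], [[-1, -1, -1, -1], [-4, -4]])

print(physical_stages(2560, 2), physical_stages(2560, 4))  # 1280 640
```

With fold by two, half of the hardware stages remain (a 50% reduction); with fold by four, a quarter remain (a 75% reduction), at the cost of more clock cycles per unit.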

## 4 FPGA Implementation

Figure 15 presents the top-level block diagram of the IDTCWT-DTCWT validation model. The input signal, given by Eq. (7), is generated in the MATLAB Simulink environment. Two frequencies, 40 kHz and 100 kHz, are combined into a composite input signal that is quantized to 8-bit numbers. The parameters q1 and q2 are quantization factors that scale the input to the nearest integers. Each of the samples of the composite input



Figure 10: Fold unit logic for computing decomposition by 2 and 4.

data is encoded using 64-QAM modulation scheme in the Simulink environment.

$$\text{Input} = q_1 \cdot \text{round}\left(\frac{10\sin(2\pi f_1 t) + 10}{q_1}\right) + q_2 \cdot \text{round}\left(\frac{10\sin(2\pi f_2 t) + 10}{q_2}\right)$$
(7)
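A minimal sketch of the composite test signal, reading Eq. (7) as the sum of two quantized, offset sine tones (an interpretation of the equation, not a confirmed transcription). The sample rate of 1 MHz and the choice q1 = q2 = 1 are assumptions made for illustration only.

```python
import math


def round_half_up(v):
    """Round a non-negative value to the nearest integer, halves upward."""
    return math.floor(v + 0.5)


def quantized_tone(f_hz, t, q):
    """One offset sine tone in the range 0..20, quantized in steps of q."""
    return q * round_half_up((10 * math.sin(2 * math.pi * f_hz * t) + 10) / q)


def composite_input(t, f1=40e3, f2=100e3, q1=1, q2=1):
    """Sum of the two quantized tones at time t (seconds)."""
    return quantized_tone(f1, t, q1) + quantized_tone(f2, t, q2)


FS_HZ = 1e6  # hypothetical sample rate, not given in this section
samples = [composite_input(n / FS_HZ) for n in range(50)]

print(composite_input(0.0))  # 20
```

With these settings every sample is a non-negative integer no larger than 40, so it fits comfortably in the 8-bit range mentioned in the text.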

The QAM data from the Simulink environment is read into the System Generator model through a Gateway In port. The symbols from the Gateway In port are converted to parallel data and processed by the IDTCWT unit, which is modelled in Verilog. The four filter outputs of the IDTCWT module, which are complex data, are converted to real data and further processed by the DTCWT model. OFDM modulation and demodulation are performed by the IDTCWT-DTCWT pair. The output of the DTCWT is read into the Simulink environment through a Gateway Out module. From the results obtained in the MATLAB workspace, the demodulated data symbols are processed by the inverse QAM module to generate the output signal. Figure 16 presents the simulation results of the System Generator module for the DTCWT stage 1 demodulator. As the input from the Simulink environment is quantized to positive integers, the generated sine wave is used as an input sequence for OFDM modulation. The modulated data is processed by the DTCWT module and OFDM demodulation is carried out. The input sequence and the demodulated output sequence are compared and their numerical values verified. At the output, the numerical values obtained are equal to the input samples, but with a delay of 3 clocks.

**Figure 11:** Computing two stage decomposition using one stage DTCWT.
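The comparison described above, output equal to input apart from a fixed delay, can be expressed as a small check. This is an illustrative helper, not part of the authors' test bench; the sample values are made up.

```python
def matches_with_delay(sent, received, delay=3):
    """True if `received` reproduces `sent` shifted by `delay` clock cycles."""
    aligned = received[delay:]          # drop the samples seen before the delay
    n = min(len(sent), len(aligned))    # compare only the overlapping part
    return n > 0 and sent[:n] == aligned[:n]


sent = [3, 7, 12, 20]
received = [0, 0, 0] + sent   # hypothetical output with a 3-clock delay

print(matches_with_delay(sent, received))            # True
print(matches_with_delay(sent, [0, 0] + sent))       # False
```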

Figure 17 presents the FPGA implementation of OFDM modulation and demodulation on the Virtex-5 development kit. The input data from the Simulink environment is used as the source for the System Generator model, and the same input is provided to the FPGA device for performing the modulation and demodulation. The output of the FPGA is read back using the ChipScope debugging tool to validate the designed model. The OFDM model implemented on the FPGA is verified for its logic correctness, and the generated hardware implementation report is analyzed for area, power and timing parameters. A detailed discussion of the FPGA results is presented in the next section.



Figure 12: Computing four stage decomposition using one stage DTCWT.

| Reg. | Clk 1: Y | Clk 1: X       | Clk 2: Y       | Clk 2: X       | Clk 3: Y       | Clk 3: X       | Clk 4: Y       | Clk 4: X       | Clk 5: Y       | Clk 5: X       | Clk 6: Y       | Clk 6: X       | Clk 7: Y       | Clk 7: X       | Clk 15: Y       | Clk 15: X       |
|------|----------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|----------------|-----------------|-----------------|
| i    | 0        | X<sub>0</sub>  | 0              | X<sub>1</sub>  | 0              | X<sub>2</sub>  | 0              | X<sub>3</sub>  | Y<sub>0</sub>  | X<sub>4</sub>  | Y<sub>2</sub>  | X<sub>6</sub>  | Y<sub>2</sub>  | X<sub>6</sub>  | Y<sub>10</sub>  | X<sub>14</sub>  |
| i+1  | 0        | X<sub>1</sub>  | 0              | X<sub>2</sub>  | 0              | X<sub>3</sub>  | Y<sub>0</sub>  | X<sub>4</sub>  | Y<sub>1</sub>  | X<sub>5</sub>  | Y<sub>3</sub>  | X<sub>7</sub>  | Y<sub>3</sub>  | X<sub>7</sub>  | Y<sub>11</sub>  | X<sub>15</sub>  |
| i+2  | 0        | X<sub>2</sub>  | 0              | X<sub>3</sub>  | Y<sub>0</sub>  | X<sub>4</sub>  | Y<sub>1</sub>  | X<sub>5</sub>  | Y<sub>2</sub>  | X<sub>6</sub>  | Y<sub>4</sub>  | X<sub>8</sub>  | Y<sub>4</sub>  | X<sub>8</sub>  | Y<sub>12</sub>  | X<sub>16</sub>  |
| i+3  | 0        | X<sub>3</sub>  | Y<sub>0</sub>  | X<sub>4</sub>  | Y<sub>1</sub>  | X<sub>5</sub>  | Y<sub>2</sub>  | X<sub>6</sub>  | Y<sub>3</sub>  | X<sub>7</sub>  | Y<sub>5</sub>  | X<sub>9</sub>  | Y<sub>5</sub>  | X<sub>9</sub>  | Y<sub>13</sub>  | X<sub>17</sub>  |

Figure 13: Data flow in decomposition by two.

| Clk  | 1 |                |                |                 | 2  |                |                |                 | 3 |                |                |                 | 4  |                |                 |                 | 5  |                |                 |                 |
|------|---|----------------|----------------|-----------------|----|----------------|----------------|-----------------|---|----------------|----------------|-----------------|----|----------------|-----------------|-----------------|----|----------------|-----------------|-----------------|
| Reg. | W | Z              | Y              | X               | W  | Z              | Y              | X               | W | Z              | Y              | X               | W  | Z              | Y               | X               | W  | Z              | Y               | X               |
| i    | 0 | 0              | 0              | X<sub>0</sub>  | 0  | 0              | 0              | X<sub>1</sub>  | 0 | 0              | 0              | X<sub>2</sub>  | 0  | 0              | 0               | X<sub>3</sub>  | 0  | 0              | Y<sub>0</sub>  | X<sub>4</sub>  |
| i+1  | 0 | 0              | 0              | X<sub>1</sub>  | 0  | 0              | 0              | X<sub>2</sub>  | 0 | 0              | 0              | X<sub>3</sub>  | 0  | 0              | Y<sub>0</sub>  | X<sub>4</sub>  | 0  | 0              | Y<sub>1</sub>  | X<sub>5</sub>  |
| i+2  | 0 | 0              | 0              | X<sub>2</sub>  | 0  | 0              | 0              | X<sub>3</sub>  | 0 | 0              | Y<sub>0</sub> | X<sub>4</sub>  | 0  | 0              | Y<sub>1</sub>  | X<sub>5</sub>  | 0  | 0              | Y<sub>2</sub>  | X<sub>6</sub>  |
| i+3  | 0 | 0              | 0              | X<sub>3</sub>  | 0  | 0              | Y<sub>0</sub> | X<sub>4</sub>  | 0 | 0              | Y<sub>1</sub> | X<sub>5</sub>  | 0  | 0              | Y<sub>2</sub>  | X<sub>6</sub>  | 0  | 0              | Y<sub>3</sub>  | X<sub>7</sub>  |
| Clk  | 9 |                |                |                 | 10 |                |                |                 | … |                |                |                 | 14 |                |                 |                 | 15 |                |                 |                 |
| Reg. | W | Z              | Y              | X               | W  | Z              | Y              | X               |   |                |                |                 | W  | Z              | Y               | X               | W  | Z              | Y               | X               |
| i    | 0 | Z<sub>0</sub> | Y<sub>4</sub> | X<sub>8</sub>  | 0  | Z<sub>1</sub> | Y<sub>5</sub> | X<sub>9</sub>  |   |                |                |                 | W<sub>1</sub> | Z<sub>5</sub> | Y<sub>9</sub>  | X<sub>13</sub> | W<sub>2</sub> | Z<sub>6</sub> | Y<sub>10</sub> | X<sub>14</sub> |
| i+1  | 0 | Z<sub>1</sub> | Y<sub>5</sub> | X<sub>9</sub>  | 0  | Z<sub>2</sub> | Y<sub>6</sub> | X<sub>10</sub> |   |                |                |                 | W<sub>2</sub> | Z<sub>6</sub> | Y<sub>10</sub> | X<sub>14</sub> | W<sub>3</sub> | Z<sub>7</sub> | Y<sub>11</sub> | X<sub>15</sub> |
| i+2  | 0 | Z<sub>2</sub> | Y<sub>6</sub> | X<sub>10</sub> | 0  | Z<sub>3</sub> | Y<sub>7</sub> | X<sub>11</sub> |   |                |                |                 | W<sub>3</sub> | Z<sub>7</sub> | Y<sub>11</sub> | X<sub>15</sub> | W<sub>4</sub> | Z<sub>8</sub> | Y<sub>12</sub> | X<sub>16</sub> |
| i+3  | 0 | Z<sub>3</sub> | Y<sub>7</sub> | X<sub>11</sub> | 0  | Z<sub>4</sub> | Y<sub>8</sub> | X<sub>12</sub> |   |                |                |                 | W<sub>4</sub> | Z<sub>8</sub> | Y<sub>12</sub> | X<sub>16</sub> | W<sub>5</sub> | Z<sub>9</sub> | Y<sub>13</sub> | X<sub>17</sub> |

Figure 14: Data flow in decomposition by four.



**Figure 15:** System generator model for validation of IDTCWT-DTCWT model.



Figure 16: Validation of OFDM modulation using DTCWT.

## 5 Results and Discussion

The functionally correct HDL model is synthesized targeting the Virtex-5 FPGA family and a synthesis report is obtained. Verilog HDL code is developed to model the



Figure 17: FPGA implementation of OFDM model.

proposed DTCWT calculation unit. An FSM is designed to model the control logic that synchronizes the forward transform operation. The input stage consists of a serial-to-parallel converter realized using a de-multiplexer, and the output stage uses a multiplexer designed to work as a parallel-to-serial converter. The DTCWT module is modelled using Xilinx IP (Intellectual Property) cores and glue logic. A test bench is developed that uses known test vectors (sinusoidal signal with center frequency 14 kHz, -3 dB bandwidth of 5.6 kHz, effective bit rate 1222 b/s, with transition from 0 to 2 V and each bit represented by a signed integer of 8 bits) to verify the logic correctness of the developed Verilog HDL model. Input data symbols that are represented using 8-bit signed representation are stored in the test bench and are forced



Figure 18: Simulation results of OFDM module.

into the HDL model for DTCWT calculation. The numerical values of the DTCWT coefficients computed by the HDL model are compared with theoretical values, and this comparison verifies the logic correctness of the HDL code. In addition to processing modulation symbols, the filter is verified for its impulse response, and the output of the filter is seen to reproduce the filter coefficients. The results of the first-stage four-filter bank structure are likewise verified for impulse response, and the output is again seen to be the filter coefficients, confirming the logic correctness of the Verilog HDL and of the architecture design. The functionally verified HDL code is synthesized and the RTL schematic is obtained for the one-filter and four-filter structures. Figure 18 presents the simulation results of the OFDM modulator and demodulator captured in the ModelSim environment for an input sine wave. The input sine wave is modulated using the OFDM modulator and the results seen at one of the four filters are presented. At the receiver, the DTCWT module demodulates the OFDM signal and the data is recovered at the output. In the simulation results, the input and output waves match as required. Figure 19 presents the timing report of the OFDM modulator. For a clock period of 10 ns with 50% duty cycle, no errors are found after analyzing 350359 timing paths with 392 endpoints. There are no setup or hold time violations either, and the minimum period is seen to be 7.16 ns, which gives a maximum operating frequency of 139 MHz. Figure 20 presents the hardware results captured using the ChipScope debugging tool at the output of the IDTCWT module. The peaks are seen at regular intervals and the pattern is similar to the results seen in the ModelSim environment.

Timing constraint: TS clk 45ab3537 = PERIOD TIMEGRP "clk 45ab3537" 10 ns HIGH 50%;

350359 paths analyzed, 392 endpoints analyzed, 0 failing endpoints

0 timing errors detected. (0 setup errors, 0 hold errors)





Figure 19: Timing report of OFDM module
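As a quick arithmetic check of the timing figures quoted in the text, the maximum operating frequency is the reciprocal of the minimum clock period. The short sketch below reproduces that conversion. The variable names are illustrative, and the margin is simply taken as the 10 ns constraint minus the reported 7.16 ns minimum period, which is an assumption about how the report would be read rather than a figure from the report itself.

```python
# Values quoted in the text and in the timing report.
MIN_PERIOD_NS = 7.16    # minimum clock period found by timing analysis
CONSTRAINT_NS = 10.0    # requested clock period (PERIOD constraint)


def max_frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1e3 / period_ns


f_max_mhz = max_frequency_mhz(MIN_PERIOD_NS)
# Assumed definition of margin: constraint minus achieved minimum period.
margin_ns = CONSTRAINT_NS - MIN_PERIOD_NS

print(f"maximum frequency: {f_max_mhz:.1f} MHz")
print(f"margin against the 10 ns constraint: {margin_ns:.2f} ns")
```

The result is a little above 139 MHz, consistent with the value stated in the text.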



Figure 20: Chip scope debugging results of OFDM modulator

Table 5 summarizes the FPGA implementation report of the DTCWT filter alone. Each of the four filters is implemented on the FPGA independently and the area, timing and power reports are generated. Finally, the DTCWT structure for 2560 stages is implemented on the FPGA. The 2560-stage DTCWT filter runs at a maximum frequency of 248 MHz, dissipates less than 1.33 W and occupies 9982 LUTs and 32 DSP arithmetic resources. The results presented in this paper are the first set of information on implementing DTCWT on FPGA for OFDM applications. Table 6 compares the FPGA implementation results of the 1D-DTCWT level structure. The proposed single-stage structure operates at a maximum frequency of 302.87 MHz with a power dissipation of less than 0.82 W, using 376 slice registers and 373 slice LUTs. The first stage is designed using MDA logic, and hence the number of LUTs is much lower than in all other implementations.

Table 5: Summary of Synthesis Report.

| Parameter                         | One filter | Two filter | Four filter | 2560 stage |
|-----------------------------------|------------|------------|-------------|------------|
| Number of Slice Registers         | 87         | 168        | 376         | 9982       |
| Number of Slice LUTs              | 88         | 170        | 373         | 9982       |
| Number of fully used LUT FF pairs | 81         | 156        | 353         | 9982       |
| Number of bonded IOBs             | 24         | 37         | 68          | 436        |
| Number of BUFG/BUFGCTRLs          | 1          | 1          | 1           | 32         |
| Number of DSP48A1s                | 1          | 2          | 4           | 32         |
| Maximum Frequency (MHz)           | 489.89     | 374.5      | 302.87      | 248.23     |
| Total Supply Power (W)            | 0.37       | 0.37       | 0.82        | 1.33       |

Table 6: Comparison of hardware requirements.

| Parameter                 | Ref [17] | DTCWT [16] | Ref [15] | This work |
|---------------------------|----------|------------|----------|-----------|
| Number of Slice Registers | 1836     | 2056       | 3741     | 376       |
| Number of Slice LUTs      | 1586     | 2045       | 3612     | 373       |
| Total power (W)           | 0.7851   | 0.85       | 1.001    | 0.82      |
| Maximum Frequency (MHz)   | 291.12   | 278.89     | 246.76   | 302.87    |

## 6 Conclusion

The configurable DTCWT-based OFDM modulator-demodulator is designed and implemented on FPGA. The 2560-stage DTCWT OFDM structure is configurable to perform 160, 320, 640, 1280 and 2560 level subcarrier modulation. The optimum systolic array (OSA) unit is designed with a PE computing 4 outputs at every clock with a latency of 5 clocks. The Modified Distributive Arithmetic (MDA) unit computes two filter outputs with a throughput of four and a latency of 13 clocks, optimizing LUT size to 99.21%. The folded pipelined OFDM modulator is designed using fold unit logic to perform either two-stage or four-stage decomposition. The 2560-stage OFDM modulator is realized using a folded pipelined structure operating at a maximum frequency of 248 MHz and consuming less than 1.33 W. With low power and high processing speed, the OFDM structure is suitable for underwater communications, which require an adaptive modulation scheme.

## 7 Patent

Application no.: 202041023845. Date of filing the application: 06-06-2020. Publication date: 12-02-2021. Title of invention: Method for Performing OFDM Modulation and Demodulation Based on DTCWT with N-Subcarriers Designed using Pipelined-Folded-Reusable Systolic Array Algorithm with Reconfigurable Logic. Name of inventors: Girish N, Veena M B, Cyril Prasanna Raj.

## 8 Conflict of Interest

We hereby declare that there is no conflict of interest in publishing this paper.

## 9 References

1. A. Goalie, J. Trubuil, N. Beuzelin, "Channel coding for underwater acoustic communication system," IEEE Oceans 2006, September 18-21, Boston, MA, USA, 2006. https://doi.org/10.1109/OCEANS.2006.307093
2. Po-Cheng Wu, Liang-Gee Chen, "An efficient architecture for two-dimensional discrete wavelet transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 4, pp. 536–545, 2001. https://doi.org/10.1109/VTSA.1999.786013
3. Wim Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," Applied and Computational Harmonic Analysis, vol. 3, no. 2, pp. 186–200, 1996. https://doi.org/10.1006/acha.1996.0015
4. A. S. Lewis, G. Knowles, "VLSI architecture for 2-D Daubechies wavelet transform without multipliers," Electronics Letters, vol. 27, no. 2, pp. 171–173, 1991. https://doi.org/10.1049/el:19910110
5. C. Chakrabarti, M. Vishwanath, "Efficient realizations of the discrete and continuous wavelet transform: from single chip implementations to mapping on SIMD array computers," IEEE Transactions on Signal Processing, vol. 43, no. 3, pp. 759–771, 1995. https://doi.org/10.1109/78.370630
6. M. B. Veena, M. N. Shanmuka Swamy, "Performance analysis of DWT based OFDM over FFT based OFDM and implementing on FPGA," International Journal of VLSI design and Communication Systems (VLSICS), vol. 2, no. 3, September 2011. http://doi.org/10.5121/vlsic.2011.2310
7. A. Grzeszczak, M. K. Mandal, S. Panchanathan, "VLSI implementation of discrete wavelet transform," IEEE Transactions on VLSI Systems, vol. 4, no. 4, 1996. https://doi.org/10.1109/92.544407
8. Chao Cheng, Keshab K. Parhi, "High Speed VLSI Implementation of 2-D Discrete Wavelet Transform," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 393–403, 2008. https://doi.org/10.1109/TSP.2007.900754
9. S. Masud, J. V. McCanny, "Reusable Silicon IP Cores for Discrete Wavelet Transform Applications," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 6, pp. 1114–1124, 2004. https://doi.org/10.1109/TCSI.2004.829236
10. Chengjun Zhang, Chunyan Wang, M. Omair Ahmad, "A Pipeline VLSI Architecture for Fast Computation of the 2-D Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 59, no. 8, 2012. https://doi.org/10.1109/TCSI.2011.2180432
11. Xin Tian, Lin Wu, Yi-Hua Tan, Jin-Wen Tian, "Efficient Multi-input/Multi-output VLSI architecture for 2-D lifting-based discrete wavelet transform," IEEE Transactions on Computers, vol. 60, no. 8, pp. 1207–1211, 2011. https://doi.org/10.1109/TC.2010.178
12. Yeong-Kang Lai, Lien Fei Chen, Yui-Chih Shih, "A high-performance and memory efficient VLSI architecture with parallel scanning method for 2-D lifting based discrete wavelet transform," IEEE Transactions on Consumer Electronics, vol. 55, no. 2, pp. 400–407, 2009. https://doi.org/10.1109/TCE.2009.5174400
13. Chih-Chi Cheng, Chao-Tsung Huang, Ching-Yeh Chen, Chung-Jr Lian, Liang-Gee Chen, "On-chip memory optimization scheme for VLSI implementation of line-based two-dimensional discrete wavelet transform," IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 7, pp. 814–822, 2007. https://doi.org/10.1109/TCSVT.2007.897106
14. S. S. Divakara, Sudarshan Patilkulkarni, Cyril Prasanna Raj, "High Speed Area Optimized Hybrid DA Architecture for 2D-DTCWT," International Journal of Image and Graphics, vol. 18, no. 01, 2018. https://doi.org/10.1142/S0219467818500043
15. S. S. Divakara, Sudarshan Patilkulkarni, Cyril Prasanna Raj, "High Speed Modular Systolic array based DTCWT with Parallel Processing Architecture for 2D Image Transformation on FPGA," International Journal of Wavelets, Multiresolution and Information Processing, vol. 15, no. 5, 2017. https://doi.org/10.1142/S0219691317500473
16. G. Venkateshappa, Cyril Prasanna Raj, "Design of DTCWT-DWT Image Compressor-Decompressor with commanding Algorithm," European Journal of Advances in Image and Video Processing, vol. 5, no. 1, 2017. https://doi.org/10.14738/aivp.51.2777
17. B. Poornima, A. Sumathi, Cyril Prasanna Raj, "Memory efficient high speed systolic array architecture design with multiplexed distributive arithmetic for 2D DTCWT computation on FPGA," Journal of Microelectronics, Electronic Components and Materials, vol. 49, no. 3, 2019. https://doi.org/10.33180/InfMIDEM2019.301
18. H. T. Kung, Charles E. Leiserson, "Algorithms for VLSI processor arrays," Introduction to VLSI Systems, pp. 271–292, 1980. http://www.eecs.harvard.edu/~htk/publication/1980-introductionto-vlsi-systems-kung-leiserson.pdf
19. M. Y. Chern, T. Murata, "Efficient matrix multiplications on a concurrent data-loading array processor," IEEE ICPP, pp. 90–94, 1983. https://www.osti.gov/biblio/5364346
20. Iwona Kochańska, Jan H. Schmidt, Jacek Marszal, "Shallow Water Experiment of OFDM Underwater Acoustic Communications," Archives of Acoustics, vol. 45, no. 1, pp. 11–18, 2020. https://doi.org/10.24425/aoa.2019.129737



Copyright © 2021 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 14.12.2020 Accepted: 13.05.2021 https://doi.org/10.33180/InfMIDEM2021.205



Journal of Microelectronics, Electronic Components and Materials Vol. 51, No. 2(2021), 135 – 146

# Extending Leeson's Equation

Matjaž Vidmar

#### Univerza v Ljubljani, Fakulteta za Elektrotehniko, Ljubljana, Slovenia

**Abstract:** The oscillator phase noise is one of the key limitations in several fields of electronics. An electronic oscillator phase noise is usually described by the Leeson's equation. Since the latter is frequently misinterpreted and misused, a complete derivation of the Leeson's equation in modern form is given first. Second, effects of flicker noise and active-device bias are accounted for. Next the complete spectrum of an electronic oscillator is derived extending the result of the Leeson's equation into a Lorentzian spectral line. Finally the spectrum of more complex oscillators including delay lines is calculated, like opto-electronic oscillators.

Keywords: phase noise; Leeson's equation; oscillator bias; Lorentzian line; opto-electronic oscillator

## Razširitev Leesonove Enačbe

**Izvleček:** Fazni šum oscilatorja je ena ključnih omejitev v številnih področjih elektronike. Fazni šum elektronskega oscilatorja običajno opisuje Leesonova enačba. Ker je slednja pogosto slabo razumljena in napačno uporabljena, bo najprej opisana celotna izpeljava Leesonove enačbe. V drugem koraku je nujna obravnava učinkov šuma 1/*f* in nastavitve delovne točke aktivnega gradnika. Sledi celovita izpeljava spektra elektronskega oscilatorja, ki rezultat Leesonove enačbe razširi v Lorentzovo spektralno črto. Končno se izpelje spekter bolj kompliciranih oscilatorjev, kot so to opto-elektronski oscilatorji.

Ključne besede: fazni šum; Leesonova enačba; delovna točka oscilatorja; Lorentzova črta; opto-elektronski oscilator

\* Corresponding Author's e-mail: matjaz.vidmar@fe.uni-lj.si

## 1 Introduction

Towards the end of the 19<sup>th</sup> century, the Hertz experiments connected two areas of physics, namely electricity and optics. While radio communications started with filtered noise from spark gaps, the latter were quickly replaced by much more efficient vacuum-tube electronic oscillators, invented independently by Armstrong and Meissner around 1912.

Electronic oscillators were so successful that their spectrum was considered an infinitely narrow spectral line at relatively low radio frequencies f < 30 MHz in the first half of the 20<sup>th</sup> century. Their spectral line was only broadened by external causes like unfiltered supply, load pull, temperature drift and/or vacuum-tube aging.

On the other hand, in optics it was quickly discovered that spectral lines of different light sources were not infinitely narrow. The optical line width $\Delta \lambda_0$ or $\Delta f$ could be measured with (relatively simple) interferometers and expressed as the longitudinal coherence length $d$ in free space, where $c_0$ is the speed of light in free space and $\lambda_0$ is the free-space wavelength:

$$d \approx \frac{c_0}{\Delta f} \approx \frac{\lambda_0^2}{\Delta \lambda_0} \tag{1}$$
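As a numerical illustration of (1), the sketch below evaluates the coherence length from either form of the line width. The source parameters (a 1550 nm wavelength and a 1 MHz line width) are hypothetical values chosen only for the example, and the function names are not from the paper.

```python
C0 = 299_792_458.0  # speed of light in free space, m/s


def coherence_length_from_df(delta_f_hz: float) -> float:
    """Eq. (1), first form: d ~ c0 / delta_f."""
    return C0 / delta_f_hz


def coherence_length_from_dlambda(lambda0_m: float, delta_lambda_m: float) -> float:
    """Eq. (1), second form: d ~ lambda0^2 / delta_lambda0."""
    return lambda0_m ** 2 / delta_lambda_m


# Hypothetical source: 1550 nm wavelength with a 1 MHz line width.
lambda0 = 1550e-9
delta_f = 1e6
# Equivalent line width in wavelength, from delta_lambda ~ lambda0^2 * delta_f / c0.
delta_lambda = lambda0 ** 2 * delta_f / C0

d_from_df = coherence_length_from_df(delta_f)
d_from_dlambda = coherence_length_from_dlambda(lambda0, delta_lambda)

print(f"coherence length from delta_f:      {d_from_df:.1f} m")
print(f"coherence length from delta_lambda: {d_from_dlambda:.1f} m")
```

Both forms give the same length, roughly 300 m for these assumed values, which shows how a narrow line width translates into a long coherence length.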

Unfortunately the amplitude dynamic range of simple optical instruments was quite limited.

In the second half of the 20<sup>th</sup> century, both the frequency resolution of radio measurements as well as the amplitude dynamic range of optical measurements improved by several orders of magnitude. Both keep improving as the user requests keep increasing. Last but not least, the spectrum gap between radio and optics is shrinking as radio frequencies are increasing towards the terahertz region and optical wavelengths are increasing towards the far-infrared region.

One of the most important contributions is the derivation of the oscillator noise spectrum by David Leeson in 1966 [1]. The same derivation is applicable to (relatively low) radio-frequency electronic oscillators as well as to lasers. In electronics, high-performance oscillators are followed by buffer stages that may add their own noise. Electronic limiters may reduce the amplitude noise but they have no effect on the phase noise.

The design of a high-performance radio-frequency oscillator is complex. Besides basic radio-frequency design, knowledge of the different noise contributions and of feedback theory is required. Due to this complexity the Leeson's equation is frequently misunderstood, misused and even degraded to an "empirical" equation by some sources. The term phase noise only starts appearing in equipment specifications as well as in text books in the 21<sup>st</sup> century as it is becoming the limiting parameter for increasingly complex modulation schemes at ever increasing carrier frequencies.

## 2 Electronic oscillator

An electronic oscillator includes an amplifier with a voltage gain A and a feedback network with a voltage transfer function  $H(\omega)$ . The feedback network is usually a frequency-selective resonator to define the output spectrum of the oscillator:



Figure 1: Electronic oscillator.

For the circuit to oscillate, the Barkhausen criterion applies:

$$A \cdot H(\boldsymbol{\omega}_0) = 1 \tag{2}$$

The Barkhausen criterion is an equation with complex numbers defining both the phase and the magnitude of the feedback. The circuit can only oscillate at the frequency $\omega_0$ where the feedback phase is zero or an integer multiple of $2\pi$. The amplifier should provide enough gain to start the oscillation. During steady oscillation, saturation will eventually decrease the amplifier gain A to satisfy the Barkhausen criterion.

Some feedback networks may generate complex results. A laser may oscillate at many different modes at the same time. Some electronic circuits may satisfy the Barkhausen criterion at zero frequency. Such circuits do not oscillate but act as bi-stables. A flip-flop intentionally driven into a meta-stable state will quickly settle into one of its two stable states.

Some form of noise is always present in all circuits. In electronic circuits operating in the radio-frequency range, the main contribution is thermal noise. No matter how small, noise will always significantly affect the output spectrum of an oscillator as shown later in the derivation of the Leeson's equation.

In the case of a class A amplifier, noise actually starts the oscillation:





With some excess gain, the oscillation amplitude will initially grow exponentially out of noise. As the oscillation amplitude increases, the amplifier will be driven into saturation. The excess gain shrinks and finally reaches the Barkhausen criterion during steady oscillation.

Some oscillators use a class C amplifier. Such oscillators cannot start out of noise, but need a start pulse. Unfortunately, after reaching steady oscillation, class C amplifiers add even more noise than class A amplifiers. The gain in class C is lower, there is much less control over the device bias, and due to the heavily non-linear operation class C amplifiers efficiently up-convert low-frequency noise to the desired oscillator frequency.

## 3 Leeson's equation

The Leeson's equation [1] describes how noise propagates through the circuit of an oscillator. The derivation below refers to Fig. 1:

$$U_{Nout} = U_{Nin} + A \cdot H(\omega) \cdot U_{Nout} \tag{3}$$

can be rearranged to:

$$U_{Nout} = \frac{U_{Nin}}{1 - A \cdot H(\omega)} \tag{4}$$

A simple resonator with a lumped capacitor C and a lumped inductor L with losses R' provides the following transfer function of the feedback:

$$H(\omega) = \frac{R_{in}}{R_{in} + j\omega L + R' + \frac{1}{j\omega C} + R_{out}} \tag{5}$$

During steady oscillation, for small signals $U_{Nout} \ll U_0(\omega_0)$ compared to the carrier, the Barkhausen criterion simplifies the transfer function to:

$$A \cdot H(\omega) = \frac{\sum R}{\sum R + j\omega L + \frac{1}{j\omega C}} \tag{6}$$

where the sum of resistors denotes:

$$\sum R = R_{in} + R' + R_{out} \tag{7}$$
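The step from (5) to (6) can be checked numerically: at resonance the reactances of L and C cancel, so (5) reduces to $R_{in}/\sum R$ and the Barkhausen criterion (2) requires a steady-state gain of $\sum R / R_{in}$. The sketch below illustrates this with hypothetical component values; the resistances, inductance and capacitance are assumptions made for the example only.

```python
import math


def feedback_transfer(omega, r_in, r_loss, r_out, ind, cap):
    """Eq. (5): voltage transfer function of the series-LC feedback network."""
    reactance = 1j * omega * ind + 1.0 / (1j * omega * cap)
    return r_in / (r_in + r_loss + r_out + reactance)


# Hypothetical component values, chosen only for illustration.
R_IN, R_LOSS, R_OUT = 50.0, 2.0, 50.0   # ohms
IND, CAP = 10e-9, 1e-12                 # henry, farad

omega0 = 1.0 / math.sqrt(IND * CAP)     # resonance, as in eq. (9)
sum_r = R_IN + R_LOSS + R_OUT           # eq. (7)

h0 = feedback_transfer(omega0, R_IN, R_LOSS, R_OUT, IND, CAP)
steady_gain = sum_r / R_IN              # gain A required by eq. (2)
loop_at_resonance = steady_gain * h0

print(f"H(omega0)     = {h0:.6f}")
print(f"A             = {steady_gain:.4f}")
print(f"A * H(omega0) = {loop_at_resonance:.6f}")
```

With the gain fixed at this value, multiplying (5) by A cancels $R_{in}$ in the numerator and leaves the form given in (6).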

The transfer function can be further simplified by introducing the loaded quality  $Q_L$  of the resonator:

$$Q_L = \frac{\omega_0 L}{\sum R} \tag{8}$$

and the frequency offset from the carrier  $\omega_0$ :

$$\Delta \omega = \omega - \omega_0 = \omega - \frac{1}{\sqrt{LC}} \tag{9}$$

into:

$$A \cdot H(\omega) \approx \frac{1}{1 + j2Q_L \frac{\Delta \omega}{\omega_0}} \tag{10}$$

resulting in:

$$U_{Nout} \approx \frac{U_{Nin}}{1 - \frac{1}{1 + j2Q_L \frac{\Delta \omega}{\omega_0}}} \rightarrow U_{Nout} \approx U_{Nin} \cdot \left(1 + \frac{\omega_0}{j2Q_L \Delta \omega}\right) \tag{11}$$
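The narrow-band approximation (10) replaces $\omega/\omega_0 - \omega_0/\omega$ with $2\Delta\omega/\omega_0$. A minimal sketch of how good this is near the carrier is given below. It rewrites (6) in terms of $Q_L$ using (8) and (9); the values $Q_L = 10$ and $f_0 = 3$ GHz and the chosen relative offsets are assumptions made for illustration.

```python
import math


def loop_gain_exact(omega, omega0, q_l):
    """Eq. (6) rewritten with Q_L: 1 / (1 + j*Q_L*(w/w0 - w0/w))."""
    return 1.0 / (1.0 + 1j * q_l * (omega / omega0 - omega0 / omega))


def loop_gain_approx(omega, omega0, q_l):
    """Eq. (10): 1 / (1 + j*2*Q_L*(w - w0)/w0)."""
    return 1.0 / (1.0 + 2j * q_l * (omega - omega0) / omega0)


Q_L = 10.0                        # assumed loaded quality
OMEGA0 = 2.0 * math.pi * 3e9      # assumed carrier, rad/s
REL_OFFSETS = [1e-4, 1e-3, 1e-2]  # relative offsets (omega - omega0) / omega0

errors = []
for rel in REL_OFFSETS:
    omega = OMEGA0 * (1.0 + rel)
    err = abs(loop_gain_exact(omega, OMEGA0, Q_L) - loop_gain_approx(omega, OMEGA0, Q_L))
    errors.append(err)
    print(f"relative offset {rel:.0e}: |exact - approx| = {err:.2e}")
```

The error grows roughly with the square of the relative offset, so the approximation is tight for offsets that are small compared to the carrier frequency.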

Dealing with noise is easier with average signal powers  $P_j=\alpha |U_j|^2$  rather than voltages. The resulting propagation of noise power is:

$$P_{Nout} \approx P_{Nin} \cdot \left[ 1 + \left( \frac{\omega_0}{2Q_L \Delta \omega} \right)^2 \right] \tag{12}$$

In engineering it is also preferred to replace angular frequencies  $\omega_i = 2\pi f_i$  with ordinary frequencies:

$$P_{Nout} \approx P_{Nin} \cdot \left[ 1 + \left( \frac{f_0}{2Q_L \Delta f} \right)^2 \right] \tag{13}$$
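To get a feeling for the size of the bracketed term in (13), the sketch below evaluates the noise power gain of the oscillator loop at a few offsets. The carrier frequency, loaded quality and offsets are assumed values picked for illustration, and the function name is hypothetical.

```python
import math


def noise_power_gain(f0_hz: float, q_l: float, delta_f_hz: float) -> float:
    """Bracketed factor of eq. (13): P_Nout / P_Nin."""
    return 1.0 + (f0_hz / (2.0 * q_l * delta_f_hz)) ** 2


F0 = 3e9       # assumed carrier frequency, Hz
Q_L = 10.0     # assumed loaded quality of the resonator

gains_db = {}
for delta_f in (10e3, 1e6, 150e6, 1.5e9):
    gain = noise_power_gain(F0, Q_L, delta_f)
    gains_db[delta_f] = 10.0 * math.log10(gain)
    print(f"offset {delta_f:12.0f} Hz: gain = {gains_db[delta_f]:6.2f} dB")
```

Close to the carrier the gain rises by 20 dB per decade of decreasing offset, while far from the carrier it approaches 0 dB, meaning that the input noise passes through essentially unchanged.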

All derivations in this paper are made considering just one side-band of the symmetrical noise spectrum on both sides of the carrier  $U_0(\omega_0)$  or  $U_0(f_0)$ . If a single sideband is observed, there is no distinction between amplitude noise and phase noise.

When both upper and lower side-bands are summed, the resulting noise signal has both an in-phase component and a quadrature component with respect to the carrier. Due to the random nature of noise, both the in-phase component and the quadrature component are of equal magnitude. The in-phase component adds a random amplitude modulation to the carrier, also called amplitude noise. The quadrature component adds a random phase modulation to the carrier, also called phase noise.

The original Leeson's derivation [1] as well as many other theoretical papers include both noise side-bands, frequently denoted as  $S(\omega)$  or S(f). On the other hand, single side-band noise is required in many practical calculations. Care should be taken since both side bands have twice the power of a single side band.

The oscillator noise includes both amplitude noise and phase noise. Both have equal power:

$$P_{NA} = P_{N\phi} = \frac{P_{Nout}}{2} \approx \frac{P_{Nin}}{2} \cdot \left[1 + \left(\frac{f_0}{2Q_L \Delta f}\right)^2\right] \quad (14)$$

Since the amplitude noise  $P_{NA}$  can be removed easily with an electronic limiter, only the phase-noise power  $P_{N\phi}$  is interesting.

In electronics, noise is usually referred to the input of an amplifier although it can only be measured at its output. Therefore, for compatibility, all quantities are referred to the amplifier input. The thermal-noise spectral density $dP_{Nin}/df$ at the amplifier input is equal to the sum of the temperatures of all noise sources multiplied by the Boltzmann constant $k_B \approx 1.38 \cdot 10^{-23}$ J/K:

$$\frac{dP_{Nin}}{df} = k_B \cdot \sum T_j = k_B \cdot \left(T_R + T_A\right) \tag{15}$$

The resonator temperature  $T_R \gg T_0 = 290$  K may be much higher than the reference temperature in the case of resonators using active circuits. The noise temperature of a passive resonator is usually close to the reference (room) temperature  $T_R \approx T_0 = 290$  K. In this case the thermal-noise spectral density can be rewritten using the amplifier noise figure *F* (in linear units!):

$$\frac{dP_{Nin}}{df} \approx k_B \cdot T_0 \cdot F \tag{16}$$

Note that the amplifier noise figure F will be higher in saturation (steady oscillation) than in linear operation!
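Equation (16) is easy to evaluate numerically. The sketch below converts the noise figure from decibels to linear units, as the text requires, and expresses the resulting spectral density in dBm/Hz. The 10 dB noise figure is an assumed example value, and the function names are illustrative.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T0 = 290.0           # reference temperature, K


def noise_density_w_per_hz(noise_figure_db: float) -> float:
    """Eq. (16): k_B * T0 * F, with F converted from dB to linear units."""
    f_linear = 10.0 ** (noise_figure_db / 10.0)
    return K_B * T0 * f_linear


def to_dbm_per_hz(density_w_per_hz: float) -> float:
    """Express a power spectral density in dBm/Hz (relative to 1 mW)."""
    return 10.0 * math.log10(density_w_per_hz / 1e-3)


floor_ideal = to_dbm_per_hz(noise_density_w_per_hz(0.0))    # F = 1 (0 dB)
floor_10db = to_dbm_per_hz(noise_density_w_per_hz(10.0))    # assumed F = 10 dB

print(f"F =  0 dB: {floor_ideal:.2f} dBm/Hz")
print(f"F = 10 dB: {floor_10db:.2f} dBm/Hz")
```

For an ideal amplifier this gives the familiar thermal floor of about -174 dBm/Hz, and every decibel of noise figure raises it by the same amount.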

The phase-noise spectral density of an oscillator becomes:

$$\frac{dP_{N\phi}}{df} = \frac{1}{2} \cdot \left[ 1 + \left( \frac{f_0}{2Q_L \Delta f} \right)^2 \right] \cdot k_B T_0 F \tag{17}$$

Since the oscillator output is amplified, limited and/or attenuated, the important quantity is the phase-noise spectral density relative to the oscillator output power  $P_0$ :

$$L\left(\Delta f\right) = \frac{1}{P_0} \cdot \frac{dP_{N\phi}}{df} \tag{18}$$

The relative phase-noise spectral density is denoted by the symbol  $L(\Delta f)$  and has units  $[Hz^{-1}]$  in the Leeson's equation:

$$L(\Delta f) = \left[1 + \left(\frac{f_0}{2Q_L \Delta f}\right)^2\right] \cdot \frac{k_B T_0 F}{2P_0} \tag{19}$$

Due to the extremely wide dynamic range of  $L(\Delta f)$  it is common to use logarithmic units, namely decibels relative to the carrier per unit bandwidth or [dBc/Hz]:

$$L(\Delta f)_{[dBc/Hz]} = 10\log_{10} \left[ L(\Delta f) \cdot 1Hz \right] \tag{20}$$

Unfortunately many popular sources like [2] forget to multiply  $L(\Delta f)$  in linear units with the unit bandwidth 1 Hz, degrading the Leeson's equation to an empirical equation.

As an example, the spectrum of a typical oscillator is computed on Fig. 3 using the Leeson's equation. The carrier power is selected as  $P_0 = 0.1 \text{ mW}$  typical at the input of a small-signal RF transistor. The noise figure degradation is comparable to the gain compression due to saturation, therefore F = 10 dB is a reasonable choice. The most important parameter of an oscillator, the loaded quality of the resonator is selected  $Q_L = 10$  corresponding to a varactor-tuned microstrip resonator at  $f_0 = 3 \text{ GHz}$ :



Figure 3: Oscillator spectrum.
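The conversion from (19) to (20) can be sketched numerically. The following is a minimal sketch, not the code used to produce Fig. 3; the function names `leeson_linear` and `to_dbc_hz` are illustrative assumptions, while the parameter values are those of the example above.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]
T_0 = 290.0         # reference temperature [K]

def leeson_linear(df, f0, q_l, p0, f_lin):
    """Equation (19): relative phase-noise spectral density in 1/Hz."""
    leeson_term = (f0 / (2.0 * q_l * df)) ** 2
    return (1.0 + leeson_term) * K_B * T_0 * f_lin / (2.0 * p0)

def to_dbc_hz(l_linear):
    """Equation (20): multiply by the 1 Hz unit bandwidth, then take 10*log10."""
    unit_bandwidth_hz = 1.0
    return 10.0 * math.log10(l_linear * unit_bandwidth_hz)

# Example values from the text: f0 = 3 GHz, Q_L = 10, P0 = 0.1 mW, F = 10 dB
F0, Q_L, P_0 = 3e9, 10.0, 1e-4
F_LIN = 10.0 ** (10.0 / 10.0)  # noise figure converted from dB to linear units

for offset in (1e3, 1e6, 150e6, 1e12):
    level = to_dbc_hz(leeson_linear(offset, F0, Q_L, P_0, F_LIN))
    print(f"offset {offset:10.3g} Hz: {level:7.1f} dBc/Hz")
```

With these values the thermal floor far from the carrier evaluates to about -157 dBc/Hz, and the level at the Leeson's frequency of 150 MHz lies 3 dB above that floor.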

The propagation of noise through an oscillator increases the phase noise close to the desired carrier well above the thermal noise. Since the two noise sidebands are symmetric, it makes sense to observe a single side band in detail using a logarithmic scale for the frequency offset  $\Delta f$  from the carrier as shown on Fig. 4:



Figure 4: SSB phase-noise spectrum.

At frequency offsets  $|\Delta f| > f_0/(2Q_L)$  larger than the Leeson's frequency, the oscillator has little effect on the noise spectral density. Other circuits like buffer amplifiers, limiters and/or attenuators add their own thermal noise. If required, this thermal noise can easily be filtered away using resonators with a similar  $Q_L$  as used in the oscillator itself.

At frequency offsets  $|\Delta f| \le f_0 / (2Q_L)$  smaller than the Leeson's frequency, the predominant noise is the oscillator phase noise. Other circuits like amplifiers, limiters and/or attenuators have little effect on the phase-noise spectral density. The oscillator phase noise can NOT be filtered away using resonators with a similar  $Q_L$  as used in the oscillator itself.

Since the oscillator phase-noise is the interesting quantity, a simplified Leeson's equation neglecting thermal noise is frequently used:

$$L(\Delta f) \approx \left(\frac{f_0}{Q_L \Delta f}\right)^2 \cdot \frac{k_B T_0 F}{8P_0} \tag{21}$$

The result of the simplified Leeson's equation is shown as a dotted extension on Fig. 4. There is a significant difference from the full equation only at large offsets  $|\Delta f| > f_0/(2Q_L) \approx 150 \text{ MHz}$  in the example shown on Fig. 3 and Fig. 4.

The Leeson's equation was derived assuming that the noise amplitude  $U_{Nout} \ll U_0 (\omega_0)$  is much smaller than the desired-carrier amplitude. This assumption no longer holds at small offsets  $\Delta f$ . The Leeson's equation only holds when the relative phase-noise spectral density is much smaller than the  $L(\Delta f) \ll \Delta f^{-1}$  limit shown with a dotted line on Fig. 4. In practice, the result on Fig. 4 is only valid at offsets above  $|\Delta f| > 1$  kHz.
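The validity condition can be checked by evaluating the dimensionless product $\Delta f \cdot L(\Delta f)$, which must stay far below unity. The sketch below does this for the example oscillator; the helper name `validity_ratio` is an illustrative assumption.

```python
K_B, T_0 = 1.380649e-23, 290.0             # Boltzmann constant [J/K], reference temperature [K]
F0, Q_L, P_0, F_LIN = 3e9, 10.0, 1e-4, 10.0  # example values from the text (F = 10 dB)

def leeson_linear(df):
    """Equation (19) in linear units [1/Hz] for the example oscillator."""
    return (1.0 + (F0 / (2.0 * Q_L * df)) ** 2) * K_B * T_0 * F_LIN / (2.0 * P_0)

def validity_ratio(df):
    """L(df) divided by the 1/df limit; the Leeson's equation needs this << 1."""
    return df * leeson_linear(df)

for offset in (1.0, 10.0, 1e3, 1e6):
    print(f"offset {offset:8.0f} Hz: df * L(df) = {validity_ratio(offset):.3e}")
```

At 1 kHz the product is about $4.5 \cdot 10^{-3}$, that is more than 23 dB below the limit, while at 1 Hz it already exceeds unity.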

The relative phase-noise density at very small offsets  $\Delta f$  is usually not very important in practical electronic oscillators. It is much more important in laser oscillators. A corrected derivation of the Leeson's equation for very small offsets  $\Delta f$  will be presented later.

## 4 Effects of phase noise

Phase noise was first noted as residual frequency modulation in analog radio links. The unwanted random frequency deviation (root-mean-square value) can be calculated as:

$$\sigma_{f} = \sqrt{2 \int_{f_{MIN}}^{f_{MAX}} \Delta f^{2} L(\Delta f) d\Delta f} \tag{22}$$

The frequency limits  $f_{MIN}$  and  $f_{MAX}$  of the integral are the band limits of the analog base-band modulation signal.
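As an illustration, (22) can be evaluated in closed form when $L(\Delta f)$ is taken from the full Leeson's equation (19). The band limits of 300 Hz and 3.4 kHz below are a hypothetical telephone-quality base band, not a value from this article; the oscillator parameters are those of the example on Fig. 3.

```python
import math

K_B, T_0 = 1.380649e-23, 290.0  # Boltzmann constant [J/K], reference temperature [K]

def rms_frequency_deviation(f_min, f_max, f0, q_l, p0, f_lin):
    """Equation (22) with L(df) from equation (19), integrated analytically."""
    n0 = K_B * T_0 * f_lin / (2.0 * p0)  # thermal floor of (19) [1/Hz]
    f_leeson = f0 / (2.0 * q_l)          # Leeson's frequency [Hz]
    # integral of df^2 * L(df) = n0 * integral of (df^2 + f_leeson^2)
    integral = n0 * ((f_max ** 3 - f_min ** 3) / 3.0 + f_leeson ** 2 * (f_max - f_min))
    return math.sqrt(2.0 * integral)

# Hypothetical base band 300 Hz ... 3.4 kHz, example oscillator from the text
sigma_f = rms_frequency_deviation(300.0, 3400.0, 3e9, 10.0, 1e-4, 10.0)
print(f"residual FM: {sigma_f:.1f} Hz rms")
```

Under these assumptions the residual frequency modulation is roughly 167 Hz rms, dominated entirely by the $\Delta f^{-2}$ part of the spectrum.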

In QAM radio links, phase noise randomly rotates the constellation of the modulation. The unwanted random angle of rotation (root-mean-square value) can be calculated as:

$$\sigma_{\phi} = \sqrt{2 \int_{B_{carrier-recovery}}^{B_{modulation}} L(\Delta f) d\Delta f} \tag{23}$$

Any phase noise above  $\Delta f > B_{modulation}$  is filtered away by the channel filter in the receiver. Further it is assumed that the carrier-recovery circuit of the receiver is able to track slow frequency and/or phase changes below  $\Delta f < B_{carrier-recovery}$ .

In digital communications, phase noise manifests itself as clock jitter. The unwanted clock jitter (root-mean-square value) can be calculated as:

$$\sigma_{t} = \frac{\sigma_{\phi}}{\omega_{0}} = \frac{1}{2\pi f_{0}} \sqrt{2 \int_{B_{clock-recovery}}^{f_{MAX}} L(\Delta f) d\Delta f} \tag{24}$$

Limiting the bandwidth of the clock, the upper limit  $f_{MAX} < f_0$  is less than the clock frequency. Further it is assumed that the clock-recovery circuit of the receiver is able to track slow frequency and/or phase changes below  $\Delta f < B_{clock-recovery}$ .

Finally in all radio communications, phase noise causes interference to neighbor channels. The interference power can be calculated as:

$$P_{i} = P_{0} \cdot \int_{\Delta f_{1}}^{\Delta f_{2}} L\left(\Delta f\right) d\Delta f \tag{25}$$

The frequency limits  $\Delta f_1$  and  $\Delta f_2$  of the integral are the frequency offsets of the interfered channel from the interfering carrier  $P_0(f_0)$ .

Note that all of the above-mentioned integrals start from an offset  $\Delta f > 0$  larger than zero. Radio equipment is usually designed to work with relatively clean sources where the phase-noise power  $P_{N\phi} \ll P_0$  is much smaller than the carrier power and the Leeson's equation is valid thanks to  $L(\Delta f) \ll \Delta f^{-1}$  in the region of interest.
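The integrals (23), (24) and (25) share the same integrand, so a single helper covers all three. The sketch below integrates the full Leeson's equation (19) analytically and cross-checks the result numerically. All integration limits (10 kHz, 10 MHz, 1 GHz and the 1 MHz channel at 10 MHz offset) are hypothetical values chosen for illustration only.

```python
import math

K_B, T_0 = 1.380649e-23, 290.0
F0, Q_L, P_0, F_LIN = 3e9, 10.0, 1e-4, 10.0  # example oscillator from the text
N0 = K_B * T_0 * F_LIN / (2.0 * P_0)          # thermal floor of (19) [1/Hz]
F_LEESON = F0 / (2.0 * Q_L)                   # Leeson's frequency [Hz]

def integrate_leeson(df_low, df_high):
    """Closed-form integral of L(df) from (19) between two positive offsets."""
    return N0 * ((df_high - df_low) + F_LEESON ** 2 * (1.0 / df_low - 1.0 / df_high))

def integrate_numeric(df_low, df_high, steps=20000):
    """Trapezoidal cross-check on a logarithmic grid (substitution u = ln df)."""
    u_low, u_high = math.log(df_low), math.log(df_high)
    h = (u_high - u_low) / steps
    total = 0.0
    for i in range(steps + 1):
        df = math.exp(u_low + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * N0 * (1.0 + (F_LEESON / df) ** 2) * df
    return total * h

# (23): hypothetical carrier-recovery bandwidth 10 kHz, modulation bandwidth 10 MHz
sigma_phi = math.sqrt(2.0 * integrate_leeson(1e4, 1e7))
# (24): hypothetical clock-recovery bandwidth 10 kHz, upper limit 1 GHz < f0
sigma_t = math.sqrt(2.0 * integrate_leeson(1e4, 1e9)) / (2.0 * math.pi * F0)
# (25): hypothetical neighbor channel spanning offsets 10 MHz ... 11 MHz
p_i = P_0 * integrate_leeson(10e6, 11e6)

print(f"rms phase rotation: {math.degrees(sigma_phi):.2f} degrees")
print(f"rms clock jitter:   {sigma_t * 1e12:.2f} ps")
print(f"interference power: {10.0 * math.log10(p_i / 1e-3):.1f} dBm")
```

Since the spectrum falls as $\Delta f^{-2}$, the lower integration limit dominates the first two results; the upper limit hardly matters.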

## 5 Active-device noise

Besides thermal noise, active devices also add flicker noise to the amplified signal. Flicker noise is usually described as an increase of the radio-frequency noise figure F into a frequency-dependent noise figure F'(f):

$$F'(f) = F \cdot \left(1 + \frac{f_C}{f}\right) \tag{26}$$

The parameter describing flicker noise is the corner frequency  $f_C$. The latter depends on the device technology [3]. In general, surface devices have higher current densities and more structure defects than bulk devices. Surface semiconductor devices like a silicon MOSFET, a GaAs MESFET or a GaAlAs HEMT may have the corner frequency in the range  $f_C \approx 1...10$  MHz. Bulk semiconductor devices like a silicon JFET may have the corner frequency in the range  $f_C \approx 1...10$  kHz.

Although a HEMT may produce slightly less noise at radio frequencies than a BJT, a HEMT is significantly noisier at low frequencies than a BJT as shown on Fig. 5:



Figure 5: Active device noise figure.

In an oscillator, the active device operates in saturation while producing steady oscillations. The nonlinear effects associated with saturation up-convert the low-frequency flicker noise into noise side bands very close to the carrier radio frequency. High-performance radio-frequency (microwave) oscillators therefore use silicon bipolar transistors due to their lower flicker noise.

The additional up-converted flicker noise can be built into the Leeson's equation, describing the increase of the oscillator phase noise at small offsets  $|\Delta f| < f_{C}$ :

$$L(\Delta f) = \left[1 + \left(\frac{f_0}{2Q_L\Delta f}\right)^2\right] \cdot \frac{k_B T_0 F}{2P_0} \cdot \left(1 + \frac{f_C}{|\Delta f|}\right) \quad (27)$$

The phase noise of the same oscillator example as shown earlier, now including flicker noise, is shown on Fig. 6:



Figure 6: Phase noise including flicker noise.
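The effect of the flicker term in (27) is easiest to see in the slope of the spectrum. The sketch below evaluates (27) for the example oscillator with a hypothetical corner frequency $f_C = 1$ MHz, a value assumed here only for illustration and not necessarily the one used on Fig. 6.

```python
import math

K_B, T_0 = 1.380649e-23, 290.0

def leeson_with_flicker(df, f0, q_l, p0, f_lin, f_c):
    """Equation (27): Leeson's equation including up-converted flicker noise."""
    resonator = 1.0 + (f0 / (2.0 * q_l * df)) ** 2
    flicker = 1.0 + f_c / abs(df)
    return resonator * K_B * T_0 * f_lin / (2.0 * p0) * flicker

def slope_db_per_decade(df, **params):
    """Change of L in dB between the offsets df and 10*df."""
    upper = leeson_with_flicker(10.0 * df, **params)
    lower = leeson_with_flicker(df, **params)
    return 10.0 * math.log10(upper / lower)

# Example oscillator from the text; f_c = 1 MHz is a hypothetical choice
EXAMPLE = dict(f0=3e9, q_l=10.0, p0=1e-4, f_lin=10.0, f_c=1e6)

for offset in (1e3, 1e7):
    print(f"{offset:8.0e} Hz: {slope_db_per_decade(offset, **EXAMPLE):6.1f} dB/decade")
```

Below the corner frequency the slope approaches -30 dB/decade, compared to -20 dB/decade for thermal noise alone.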

Calculations including flicker noise may not be simple. Calculating the flicker-noise power  $P_N$  from equation (26):

$$P_N = \int_{f_{MIN}}^{f_{MAX}} k_B \cdot T_0 \cdot F \cdot \left(1 + \frac{f_C}{f}\right) df \tag{28}$$

may give an infinite result:

$$\lim_{f_{MIN}\to 0} \int_{f_{MIN}}^{f_{MAX}} k_B \cdot T_0 \cdot F \cdot \left(1 + \frac{f_C}{f}\right) df \to \infty \tag{29}$$

suggesting that further limitations apply to (26) at very low frequencies.
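The divergence becomes explicit when the integral is evaluated in closed form. As a worked step, taking the noise density $k_B T_0 F'(f)$ from (16) and (26):

$$P_N = k_B T_0 F \cdot \left[ \left(f_{MAX} - f_{MIN}\right) + f_C \cdot \ln\frac{f_{MAX}}{f_{MIN}} \right]$$

The logarithmic term grows without bound for $f_{MIN} \to 0$, although only slowly: every additional decade of bandwidth towards lower frequencies adds the same amount of power $k_B T_0 F f_C \ln 10$.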

Further it is necessary to understand that the flicker-noise corner frequency  $f_C$  in equation (26) is different from the  $f_C$  in equation (27)! Between the two quantities there is a frequency conversion that may be more or less efficient depending on parameters that are NOT described by the Leeson's equation!

The phase noise of an oscillator depends heavily on the bias and DC decoupling circuits. Since the impedance parameters  $[Z_{ij}]$  of a bipolar transistor depend mainly on the DC currents through the device, the currents through the RF amplifier transistor have to be regulated as constant as possible with a bias circuit like that on Fig. 7 [4]. Keeping the impedance parameters  $[Z_{ij}]$  constant attenuates the up-conversion of low-frequency flicker noise to the RF carrier frequency:



Figure 7: Oscillator bias circuit.

Flicker noise is not the only concern while designing the bias network of an oscillator. Reactive components like RF chokes (inductors) may introduce additional unwanted modes of the resonator  $H(\omega)$ . Therefore resistors  $R_5$  and  $R_6$  are usually used to apply the DC bias in oscillators.

Besides the RF feedback there is yet another feedback circuit built into every electronic oscillator. Gain reduction at saturation during steady oscillation is governed by this additional feedback (bottom graph on Fig. 2). A poorly-designed bias network will make this low-frequency feedback unstable, causing self quenching of the oscillator. While self quenching may simplify a super-regenerative receiver compared to the original Armstrong design [9], it has a catastrophic effect on the oscillator spectrum.

The gain-reduction feedback already has one pole due to the RF energy stored in the resonator  $H(\omega)$ , rectified by the nonlinear effects of the saturation of the active device and added to the DC bias of the latter. Additional poles are added by the RF bypass capacitors  $C_1$  and  $C_2$  and by the DC-bias decoupling capacitors  $C_3$  and  $C_4$ . Unless the component values on Fig. 7 are selected carefully, the oscillator will be self-quenching. Even if the oscillator is not self-quenching, a poor phase margin of the bias feedback may cause a significant increase of the oscillator phase noise.

If varactors are used to tune the oscillator (VCO) [6], the phase noise is degraded further. First, varactors decrease the  $Q_{\rm L}$  of the resonator due to their series resistance. Second, the tuning voltage may introduce additional noise. Even the noise voltage introduced by the resistors acting as RF chokes to tune the varactors is not insignificant.

## 6 Spectral-line width

The Leeson's equation (19) is unable to describe the frequency spectrum of an oscillator very close to its central frequency  $\omega_0$  or  $f_0$  when the condition  $L(\Delta f) \ll \Delta f^{-1}$  is no longer fulfilled. Although there are several comprehensive papers on this topic like [5], [6], a simplified derivation is given here.

Analyzing Fig. 1, the feedback gain has to be slightly less than unity during steady oscillation, since some noise is being added all of the time. Accordingly, the original Barkhausen criterion (2) has to be modified to:

$$A \cdot H(\boldsymbol{\omega}_0) = 1 - \epsilon \tag{30}$$

where the gain decrease is described by the very small, but non-zero quantity  $0 < \epsilon \ll 1$ . The feedback transfer function (10) is modified to:

$$A \cdot H(\omega) = \frac{1 - \epsilon}{1 + j2Q_L \frac{\Delta\omega}{\omega_0}} \tag{31}$$

resulting in equation (11) extended to:

$$\begin{aligned}
U_{Nout} &= \frac{U_{Nin}}{1 - A \cdot H(\omega)} \approx \frac{U_{Nin}}{1 - \frac{1 - \epsilon}{1 + j2Q_L \frac{\Delta \omega}{\omega_0}}} \rightarrow \\
\rightarrow U_{Nout} &\approx U_{Nin} \cdot \frac{1 + j2Q_L \frac{\Delta \omega}{\omega_0}}{j2Q_L \frac{\Delta \omega}{\omega_0} + \epsilon}
\end{aligned} \tag{32}$$

At frequency offsets  $|\Delta f| > f_0/(2Q_L)$  larger than the Leeson's frequency, the oscillator has little effect on the noise while other circuits add their own noise. It therefore makes sense to evaluate (32) at small offsets  $|\Delta f| < f_0/(2Q_L)$  only. Considering  $|j2Q_L\Delta\omega/\omega_0| \ll 1$ , equation (32) simplifies to:

$$U_{Nout} \approx \frac{U_{Nin}}{j2Q_L \frac{\Delta \omega}{\omega_0} + \epsilon} \tag{33}$$

Replacing noise voltages with average powers, replacing angular frequencies with ordinary frequencies and considering the phase noise only:

$$P_{N\phi} = \frac{P_{Nout}}{2} \approx \frac{P_{Nin} / 2}{\left(2Q_L \frac{\Delta f}{f_0}\right)^2 + \epsilon^2} \tag{34}$$

Introducing the thermal-noise spectral density (15) or (16) and the spectral-line half width:

$$f_{HW} = \frac{\epsilon f_0}{2Q_L} \tag{35}$$

the simplified Leeson's equation (21) evolves into a Lorentzian spectral line:

$$L\left(\Delta f\right) = \left(\frac{f_0}{Q_L}\right)^2 \cdot \frac{1}{\Delta f^2 + f_{HW}^2} \cdot \frac{k_B T_0 F}{8P_0} \tag{36}$$

The missing quantities  $f_{HW}$  or  $\epsilon$  can be calculated by summing the whole relative spectrum power considering  $\Delta f = f - f_0$ :

$$\int_{-f_0}^{\infty} L(\Delta f) d\Delta f = 1 \tag{37}$$

In all practical cases the integral start may be replaced by  $-\infty$ , the error being smaller than neglecting faraway thermal noise:

$$\begin{aligned}
\int_{-\infty}^{\infty} \left(\frac{f_0}{Q_L}\right)^2 \cdot \frac{1}{\Delta f^2 + f_{HW}^2} \cdot \frac{k_B T_0 F}{8P_0} d\Delta f &= \left(\frac{f_0}{Q_L}\right)^2 \cdot \frac{k_B T_0 F}{8P_0} \left[\frac{1}{f_{HW}} \cdot \arctan\frac{\Delta f}{f_{HW}}\right]_{\Delta f = -\infty}^{\Delta f = \infty} = \\
&= \left(\frac{f_0}{Q_L}\right)^2 \cdot \frac{k_B T_0 F}{8P_0} \cdot \frac{\pi}{f_{HW}} \approx 1
\end{aligned} \tag{38}$$

The spectral-line half width is obtained as:

$$f_{HW} \approx \pi \cdot \left(\frac{f_0}{Q_L}\right)^2 \cdot \frac{k_B T_0 F}{8P_0} \tag{39}$$

The small correction of the Barkhausen criterion is:

$$\epsilon \approx \frac{\pi f_0 k_B T_0 F}{4 Q_L P_0} \tag{40}$$
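The intermediate step is short. Solving the definition of the half width (35) for $\epsilon$ and inserting (39):

$$\epsilon = \frac{2Q_L f_{HW}}{f_0} \approx \frac{2Q_L}{f_0} \cdot \pi \cdot \left(\frac{f_0}{Q_L}\right)^2 \cdot \frac{k_B T_0 F}{8P_0} = \frac{\pi f_0 k_B T_0 F}{4 Q_L P_0}$$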

Analyzing the same oscillator example with  $f_0 = 3$  GHz,  $Q_L = 10$,  $P_0 = 0.1$  mW and F = 10 dB as on Fig. 3 and Fig. 4, a spectral-line half width of  $f_{HW} \approx 14$  Hz is obtained. The corresponding correction of the Barkhausen criterion is small indeed:  $\epsilon \approx 10^{-7}$ .
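These two numbers can be reproduced directly from (39) and (40). A minimal sketch follows; the function names are illustrative assumptions.

```python
import math

K_B, T_0 = 1.380649e-23, 290.0  # Boltzmann constant [J/K], reference temperature [K]

def half_width(f0, q_l, p0, f_lin):
    """Equation (39): spectral-line half width in Hz."""
    return math.pi * (f0 / q_l) ** 2 * K_B * T_0 * f_lin / (8.0 * p0)

def barkhausen_correction(f0, q_l, p0, f_lin):
    """Equation (40): correction epsilon of the Barkhausen criterion."""
    return math.pi * f0 * K_B * T_0 * f_lin / (4.0 * q_l * p0)

# Example oscillator: f0 = 3 GHz, Q_L = 10, P0 = 0.1 mW, F = 10 dB (10 in linear units)
f_hw = half_width(3e9, 10.0, 1e-4, 10.0)
eps = barkhausen_correction(3e9, 10.0, 1e-4, 10.0)
print(f"f_HW = {f_hw:.2f} Hz, epsilon = {eps:.2e}")
```

The two results are linked by (35), which serves as a consistency check: $\epsilon = 2 Q_L f_{HW} / f_0$.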

One side band of the calculated spectrum  $L(\Delta f)$  (solid line) is compared to the original Leeson's equation (dotted extensions) on Fig. 8 in logarithmic scale:



Figure 8: Lorentzian spectral line.

The result of the original Leeson's equation is plotted with a dotted line on the same graph as well as the  $\Delta f^{-1}$ limit. Note that at small offsets the spectrum  $L(\Delta f)$  flattens thus avoiding the  $\Delta f^{-1}$  limit.

Besides thermal noise, additional noise like flicker noise further broadens the spectral line. The calculation is more difficult since the low-frequency flicker-noise spectrum is not up-converted by a single carrier frequency but by the oscillator signal itself with non-zero spectral width.

In most cases the spectral-line half width remains much narrower than the bandwidth of the carrier-recovery or clock-recovery circuits in radio equipment,  $f_{HW} \ll B_{recovery}$ . In all these frequent cases the result of the original Leeson's equation is sufficient.

## 7 Delay-line oscillators

The most important parameter in the Leeson's equation is the loaded quality  $Q_L$  of the resonator. Unfortunately electrical resonators in the radio-frequency range do not achieve very high values of  $Q_L$ . Mechanical resonators like quartz crystals are therefore frequently used in high-performance radio oscillators. Electrical resonators may achieve very high values of  $Q_L$  in the optical-frequency range, so that lasers may produce relatively narrow spectral lines. Unfortunately dividing optical frequencies down to radio frequencies is not practical yet.

Delay lines may act as resonators in oscillator circuits. Their equivalent  $Q_{LD}$  is directly proportional to the delay  $\tau_D$  and increases linearly with frequency:

$$Q_{LD} = \pi f_0 \tau_D \tag{41}$$

Unfortunately delay lines may fulfill the Barkhausen criterion (2) at many different frequencies causing a laser to oscillate on many different modes. Lasers may use frequency-selective mirrors or gain medium to decrease the number of modes.

A similar approach may be used to design radio-frequency oscillators using either acoustic (BAW or SAW) delay lines or opto-electronic delay lines [7]. The latter look promising due to the low loss and wide bandwidth of optical fibers. The basic design of an opto-electronic oscillator is shown on Fig. 9. The desired mode of oscillation is selected by an additional electric (microwave) resonator:



Figure 9: Opto-electronic oscillator.

The Barkhausen criterion (2) can be rewritten for the circuit on Fig. 9 as:

$$A_{1} \cdot H_{R}\left(\omega_{0}\right) \cdot A_{2} \cdot H_{D}\left(\omega_{0}\right) = 1 \tag{42}$$

If the electric resonator is tuned precisely to the desired mode of the delay line, the voltage transfer function of the latter can be written as:

$$H_{D}(\omega) = a \cdot e^{-j\Delta\omega\tau_{D}} \tag{43}$$

For small signals and small offsets:

$$A_{1} \cdot H_{R}(\omega) \cdot A_{2} \cdot H_{D}(\omega) = \frac{e^{-j\Delta\omega\tau_{D}}}{1+j2Q_{LR}\frac{\Delta\omega}{\omega_{0}}} \tag{44}$$

The noise-voltage transfer function becomes:

$$U_{Nout} \approx \frac{U_{Nin}}{1 - \frac{e^{-j\Delta\omega\tau_D}}{1 + j2Q_{LR}\frac{\Delta\omega}{\omega_0}}} \tag{45}$$

The corresponding phase-noise average power is:

$$P_{N\phi} \approx \frac{P_{Nin}}{2} \cdot \left| 1 - \frac{e^{-j\Delta\omega\tau_D}}{1 + j2Q_{LR}\frac{\Delta\omega}{\omega_0}} \right|^{-2} \tag{46}$$

Finally the extended Leeson's equation for the optoelectronic oscillator shown on Fig. 9 becomes:

$$L(\Delta f) = \frac{k_B \sum T_j}{2P_0} \cdot \left| 1 - \frac{e^{-j2\pi\Delta f \tau_D}}{1 + j2Q_{LR}\frac{\Delta f}{f_0}} \right|^{-2} \tag{47}$$

The largest contribution to  $\Sigma T_j$  comes from the optoelectronic delay line that may include flicker noise:

$$\sum T_{j} \approx T_{D} \cdot \left(1 + \frac{f_{C}}{|\Delta f|}\right) \tag{48}$$

In an opto-electronic oscillator as on Fig. 9 the most vulnerable point in the circuit is the photo-diode output. Here the signal power  $P_0$  is the lowest and the relative phase-noise spectral density  $L(\Delta f)$  is calculated. Saturation will likely be achieved in  $A_2$  since optical modulators require substantial amounts of RF drive power. The output  $L(\Delta f)_{out}$  is taken after all amplification and filtering:

$$L(\Delta f)_{out} \approx \frac{L(\Delta f)}{1 + \left(2Q_{LR}\frac{\Delta f}{f_0}\right)^2} \tag{49}$$

The analytical result for  $L(\Delta f)_{out}$  is fitted to the well-documented experimental data from [8]. The latter describes a microwave  $f_0 = 3$  GHz opto-electronic oscillator with the delay line made from  $l \approx 15$  km of optical fiber, resulting in a delay of  $\tau_D \approx 75 \,\mu s$  corresponding to  $Q_{LD} \approx 7 \cdot 10^5$ . Mode selection is performed by an additional microwave dielectric resonator with  $Q_{LR} \approx 8300$ .
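The quoted delay-line figures follow from (41) and from the fiber length. In the sketch below the fiber group index of 1.468 is an assumed typical value for silica fiber, not a number from [8], and the mode spacing $1/\tau_D$ neglects the small additional group delay of the microwave resonator.

```python
import math

C_VACUUM = 299792458.0  # speed of light in vacuum [m/s]

def fiber_delay(length_m, group_index=1.468):
    """Delay of an optical fiber; the group index is an assumed typical value."""
    return length_m * group_index / C_VACUUM

def delay_line_q(f0, tau_d):
    """Equation (41): equivalent loaded quality of a delay line."""
    return math.pi * f0 * tau_d

def mode_spacing(tau_d):
    """Approximate spacing of the frequencies that satisfy the phase condition."""
    return 1.0 / tau_d

tau_estimate = fiber_delay(15e3)   # about 73 us for 15 km, close to the quoted 75 us
q_ld = delay_line_q(3e9, 75e-6)    # about 7e5, as quoted in the text
spacing = mode_spacing(75e-6)      # about 13.3 kHz between neighboring modes
print(f"tau = {tau_estimate * 1e6:.1f} us, Q_LD = {q_ld:.3g}, spacing = {spacing / 1e3:.1f} kHz")
```

A mode spacing of only about 13 kHz at a carrier of 3 GHz illustrates why an additional mode-selection resonator is required.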

The opto-electronic delay line noise temperature may be rather high due to several reasons: relative intensity noise (RIN) of the laser, optical reflections including Rayleigh scattering converting optical phase noise into amplitude noise and inefficient broadband impedance matching of the photodiode. The opto-electronic delay line noise temperature is found as expected around  $T_D \approx 2 \cdot 10^5$  K. What really matters is the ratio  $T_D/P_0$  and the latter can be measured conveniently at the output of a PIN-FET module.

Flicker noise comes at least in part from the built-in HEMT amplifiers. Due to the required relatively high electronic gain, several amplifier stages are connected in series. If broadband amplifiers are used, flicker noise may originate in the first stage, be amplified by the intermediate stage and be up-converted by the last stage. Due to the high noise contribution from the opto-electronic delay line, the overall flicker-noise corner frequency is found around  $f_C \approx 5$  kHz.
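Equations (47), (48) and (49) translate directly into a few lines of code. The sketch below uses the fitted parameters quoted above; the signal power $P_0 = 10\,\mu$W is a hypothetical placeholder, since only the ratio $T_D/P_0$ matters and its value is not stated in this section. It is not the program used to produce the figures.

```python
import cmath
import math

K_B = 1.380649e-23  # Boltzmann constant [J/K]

def oeo_phase_noise(df, f0, tau_d, q_lr, t_d, f_c, p0):
    """Equations (47) and (48): L(df) at the photo-diode output in 1/Hz."""
    loop_gain = cmath.exp(-2j * math.pi * df * tau_d) / (1.0 + 2j * q_lr * df / f0)
    noise_temperature = t_d * (1.0 + f_c / abs(df))
    return K_B * noise_temperature / (2.0 * p0) / abs(1.0 - loop_gain) ** 2

def oeo_output(df, f0, tau_d, q_lr, t_d, f_c, p0):
    """Equation (49): the same noise after the mode-selection resonator."""
    selectivity = 1.0 + (2.0 * q_lr * df / f0) ** 2
    return oeo_phase_noise(df, f0, tau_d, q_lr, t_d, f_c, p0) / selectivity

# Fitted values from the text; p0 = 1e-5 W is a hypothetical placeholder
PARAMS = dict(f0=3e9, tau_d=75e-6, q_lr=8300.0, t_d=2e5, f_c=5e3, p0=1e-5)

side_mode = 1.0 / PARAMS["tau_d"]  # first unwanted mode, about 13.3 kHz
for offset in (1e3, 0.5 * side_mode, side_mode, 1.5 * side_mode):
    level = 10.0 * math.log10(oeo_output(offset, **PARAMS))
    print(f"offset {offset:9.1f} Hz: {level:7.1f} dBc/Hz")
```

The spectrum peaks at integer multiples of $1/\tau_D$ and dips halfway between them, which is the comb of unwanted side modes visible in the fitted result.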

The fitted analytical result for  $L(\Delta f)_{out}$  on Fig. 10 shows the unwanted side modes at the correct frequencies. However, the peak magnitudes of the unwanted modes are about 15 dB stronger than the measured values. This may be due to an insufficient resolution of the phase-noise test setup:



Figure 10: Simulated OEO phase noise.

The well-documented experimental data from [8] additionally includes results with a Q-multiplier circuit. The latter increases the loaded quality of the microwave mode-selection filter to about  $Q_{LR} \approx 75000$  thus improving the rejection of unwanted modes. Since a Q multiplier is an active filter, the system noise temperature increases to about  $T_D \approx 5 \cdot 10^5$  K.

The fitted analytical result for  $L(\Delta f)_{out}$  including the Q multiplier is shown on Fig. 11. The unwanted-mode magnitudes are reduced and their line widths are broader. Both frequencies and magnitudes are very close to the measured values in [8]:



Figure 11: Simulated OEO with Q multiplier.

Finally, a parabolic approximation of the close-in response of a single microwave resonator suggests that the unwanted mode rejection is proportional to  $(Q_{LR})^4$ . For a Q-multiplication factor  $m \approx 8$  as described in [8], the unwanted-mode rejection improvement is expected as  $10\log_{10} m^4 \approx 36$  dB. The difference between Fig. 10 and Fig. 11, corrected for the change in  $T_D$, comes much closer to this value than the measured data published in [8], again suggesting an insufficient resolution of the phase-noise test setup.
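The arithmetic behind the 36 dB figure is a one-liner. The sketch below also evaluates the same expression for the ratio of the two loaded qualities quoted above (75000 and 8300), purely as a sensitivity check of how strongly the fourth power reacts to the value of $m$.

```python
import math

def rejection_improvement_db(m):
    """Expected unwanted-mode rejection improvement 10*log10(m^4) in dB."""
    return 10.0 * math.log10(m ** 4)

improvement_m8 = rejection_improvement_db(8.0)                  # m as described in [8]
improvement_ratio = rejection_improvement_db(75000.0 / 8300.0)  # ratio of quoted Q_LR values
print(f"m = 8: {improvement_m8:.1f} dB, m = Q ratio: {improvement_ratio:.1f} dB")
```

A change of $m$ from 8 to about 9 shifts the expected improvement by roughly 2 dB.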

## 8 Avoiding UV & DC catastrophes

When natural laws are extended from a few laboratory measurements up to the whole frequency spectrum, problems usually arise at both extremes: when the frequency approaches infinity  $f \rightarrow \infty$  and when the frequency approaches zero  $f \rightarrow 0$ . One of the most famous problems in physics was the ultraviolet catastrophe predicted from the Rayleigh-Jeans law for black-body thermal radiation [10], suggesting infinite radiated power. The more accurate Planck's law solved the problem a few years later.

The same problem also applies to phase noise. What happens with the relative phase noise density at both extremes  $L(\Delta f \rightarrow \infty)$  (UV catastrophe) and  $L(\Delta f \rightarrow 0)$  (DC catastrophe)? The answer is not simple since  $L(\Delta f)$  may achieve very differing shapes and magnitudes. To the best of my knowledge, the limitations of different equations for phase noise that may produce non-physical results are identified by introducing the  $\Delta f^{-1}$  limit for the first time in this article.

The Leeson's equation for electronic oscillators is usually derived from the Johnson noise. In electronics, the Johnson-noise spectral density is usually considered frequency-independent as shown in equation (15), since it is derived from the Rayleigh-Jeans law. At room temperatures, the Rayleigh-Jeans law becomes inaccurate at infrared frequencies. At cryogenic temperatures, the Rayleigh-Jeans law becomes inaccurate already at microwave-radio frequencies.

If equation (15) is rewritten to include the complete Planck's law, the resulting thermal-noise spectral density also depends on the Planck constant  $h\approx 6.626 \cdot 10^{-34}$  Js:

$$\frac{dP_{N}}{df} \left[ \frac{W}{Hz} \right] = \frac{hf}{e^{\frac{hf}{k_{B}T}} - 1} \tag{50}$$

The Johnson noise is just an approximation for low frequencies:

$$\frac{dP_N}{df} \left( hf \ll k_B T \right) \approx \lim_{f \to 0} \left( \frac{hf}{e^{\frac{hf}{k_B T}} - 1} \right) = k_B T \tag{51}$$

The complete equation should be considered at frequencies above  $f \approx k_B T/h \approx 6$  THz at a room temperature  $T \approx 290$  K. The electronic noise decays even sooner since the gain bandwidth of electronic devices is about three orders of magnitude smaller. Therefore there are at least two valid and independent reasons to avoid the UV catastrophe.
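The transition between (51) and (50) can be made concrete with a short calculation. The sketch below compares the two noise densities at room temperature; the function names are illustrative assumptions.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant [J/K]
H_PLANCK = 6.62607015e-34  # Planck constant [Js]

def planck_noise_density(f, temperature):
    """Equation (50): thermal-noise spectral density in W/Hz."""
    x = H_PLANCK * f / (K_B * temperature)
    return H_PLANCK * f / math.expm1(x)  # expm1 keeps precision for small x

def johnson_noise_density(temperature):
    """Equation (51): low-frequency approximation k_B * T in W/Hz."""
    return K_B * temperature

T_ROOM = 290.0
corner = K_B * T_ROOM / H_PLANCK  # frequency where h*f equals k_B*T
print(f"corner frequency: {corner / 1e12:.2f} THz")
for f in (1e9, 1e12, corner, 1e14):
    ratio = planck_noise_density(f, T_ROOM) / johnson_noise_density(T_ROOM)
    print(f"f = {f:9.3e} Hz: Planck / Johnson = {ratio:.4f}")
```

At 1 GHz the two expressions agree to better than 0.01 %, while at the corner frequency the complete equation gives only $1/(e-1) \approx 58$ % of the Johnson value.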

Flicker noise is usually modeled as 1/f noise in equation (26). The latter suggests an infinite amount of power (DC catastrophe) even in a simple amplifier without feedback (29). If the spectral noise density at very low frequencies or the total noise power is required, a better model than 1/f should be used for flicker noise. In order to separate different effects, flicker noise will not be considered in the following discussion.

Another DC catastrophe may originate in the simple derivation of the Leeson's equation for an oscillator (19) or (21). At small offsets  $\Delta f \rightarrow 0$ , the noise power is no longer small compared to the carrier power. A complete derivation of the spectral line (36) avoids this DC catastrophe.

In order to explain different effects, the same result from Fig. 8 is plotted on much broader scales on Fig. 12. On the latter, the frequency spans from bi-weekly  $10^{-6}$  Hz up to soft X rays  $10^{18}$  Hz. The amplitude range spans an incredible 350 dB:



Figure 12: Catastrophes explained.

The  $\Delta f^{-1}$  limit corresponds to an infinite amount of power over the whole spectrum. Considering both side-bands of a single octave  $f \leq \Delta f \leq 2 f$ , the  $\Delta f^{-1}$  limit produces a finite amount, just slightly too much relative noise power:

$$\frac{P_N}{P_0} = 2 \int_{f}^{2f} \Delta f^{-1} d\Delta f = 2\ln 2 \approx 1.386 \tag{52}$$

In order to comply with equation (37), the relative phase-noise spectral density  $L(\Delta f)$  may approach the  $\Delta f^{-1}$  limit over less than an octave and must drop to zero elsewhere.

A Lorentzian spectral line approaches the  $\Delta f^{-1}$  limit to within -8 dB at a frequency offset  $\Delta f = f_{HW}$  ( $\approx 14$  Hz) (39). At smaller offsets  $\Delta f \ll f_{HW}$  the Lorentzian spectral line is flat with frequency  $L(\Delta f) \approx \alpha$ . At larger offsets  $\Delta f \gg f_{HW}$  the Lorentzian spectral line decays as  $L(\Delta f) \approx \alpha \Delta f^{-2}$  with increasing offset. At both smaller and larger offsets  $\Delta f$ , the Lorentzian spectral line falls far below the  $\Delta f^{-1}$  limit, thus avoiding both UV and DC catastrophes.
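The margin between the Lorentzian line (36) and the $\Delta f^{-1}$ limit can be evaluated as the product $\Delta f \cdot L(\Delta f)$ expressed in decibels. The sketch below does this for the example oscillator; the helper name `margin_db` is an illustrative assumption.

```python
import math

K_B, T_0 = 1.380649e-23, 290.0
F0, Q_L, P_0, F_LIN = 3e9, 10.0, 1e-4, 10.0  # example oscillator from the text

SCALE = (F0 / Q_L) ** 2 * K_B * T_0 * F_LIN / (8.0 * P_0)  # numerator of (36) [Hz]
F_HW = math.pi * SCALE                                      # equation (39) [Hz]

def lorentzian(df):
    """Equation (36): Lorentzian spectral line in 1/Hz."""
    return SCALE / (df ** 2 + F_HW ** 2)

def margin_db(df):
    """Level of L(df) relative to the 1/df limit in dB (negative = below)."""
    return 10.0 * math.log10(df * lorentzian(df))

for factor in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"df = {factor:6.2f} * f_HW: {margin_db(factor * F_HW):6.1f} dB")
```

At $\Delta f = f_{HW}$ the product equals $1/(2\pi)$, that is -7.98 dB, independent of the oscillator parameters. This is also the closest approach, since $\Delta f / (\Delta f^2 + f_{HW}^2)$ has its maximum at $\Delta f = f_{HW}$.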

The result of the Leeson's equation (dotted line) matches the Lorentzian spectral line (solid line) over the usually interesting offset range and stays well below the  $\Delta f^{-1}$  limit. At very small offsets  $\Delta f \rightarrow 0$ , the Leeson's result grows as  $L(\Delta f) \approx \alpha \cdot \Delta f^{-2}$  with decreasing offset, eventually exceeding the  $\Delta f^{-1}$  limit and causing a DC catastrophe. At very large offsets  $\Delta f \rightarrow \infty$ , the Leeson's result is flat with frequency  $L(\Delta f) \approx \alpha$ , eventually exceeding the  $\Delta f^{-1}$  limit and causing a UV catastrophe.
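The two offsets at which the Leeson's equation (19) crosses the $\Delta f^{-1}$ limit follow from the quadratic equation $\Delta f \cdot L(\Delta f) = 1$. The sketch below solves it for the example oscillator, neglecting flicker noise and using a numerically stable form for the smaller root; the variable names are illustrative assumptions.

```python
import math

K_B, T_0 = 1.380649e-23, 290.0
F0, Q_L, P_0, F_LIN = 3e9, 10.0, 1e-4, 10.0  # example oscillator from the text

N0 = K_B * T_0 * F_LIN / (2.0 * P_0)  # thermal floor of (19) [1/Hz]
F_LEESON = F0 / (2.0 * Q_L)           # Leeson's frequency [Hz]

# df * N0 * (1 + F_LEESON^2 / df^2) = 1  ->  N0*df^2 - df + N0*F_LEESON^2 = 0
discriminant = math.sqrt(1.0 - 4.0 * N0 ** 2 * F_LEESON ** 2)
uv_crossing = (1.0 + discriminant) / (2.0 * N0)
# the product of both roots is F_LEESON^2; this avoids subtracting nearly equal numbers
dc_crossing = 2.0 * N0 * F_LEESON ** 2 / (1.0 + discriminant)

print(f"DC catastrophe below about {dc_crossing:.2f} Hz")
print(f"UV catastrophe above about {uv_crossing:.2e} Hz")
```

For this example the Leeson's result exceeds the limit below roughly 4.5 Hz and above roughly $5 \cdot 10^{15}$ Hz, leaving some fifteen decades of offset in between.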

As long as the relative phase-noise spectral density  $L(\Delta f)$  is a monotonically decreasing function, it should avoid the  $\Delta f^{-1}$  limit in a similar way to the Lorentzian spectral line. More complex spectra  $L(\Delta f)$  like that shown on Fig. 10 may even exceed the  $\Delta f^{-1}$  limit over very narrow offset ranges (much less than an octave) without causing catastrophes, although the oscillator analyzed is then likely useless.

In any case, comparing the magnitude and slope of  $L(\Delta f)$  to the  $\Delta f^{-1}$  limit quickly tells whether a certain equation for  $L(\Delta f)$  with certain parameters provides useful results or not over the desired offset range. The ratio  $L(\Delta f)/\Delta f^{-1} = \Delta f \cdot L(\Delta f)$  tells whether the phase noise at the specified offset  $\Delta f$  is much smaller or comparable to the whole signal power.

## 9 Conclusions

The Leeson's equation for relative phase-noise spectral density is frequently misunderstood and misused even in commercial simulation software. Therefore a complete derivation is made first to understand the limitations of the different forms of the same equation. While derivations produce results in linear units [Hz<sup>-1</sup>], logarithmic units [dBc/Hz] (20) are used elsewhere including the graphs in this article.

The complete Leeson's equation (19) is frequently simplified to (21), since wide-band thermal noise originates elsewhere and not just in the oscillator.

Flicker noise is usually built into the Leeson's equation as in (27), but its exact magnitude actually depends on factors not included in the Leeson's equation, like the design of active-device bias networks. Last but not least, the simple 1/f approximation of flicker noise may produce non-physical, infinite results in some cases.

The original Leeson's derivation is valid for small noise signals only. The result is only valid in the offset range when  $L(\Delta f) \ll \Delta f^{-1}$ . When  $L(\Delta f)$  approaches or even exceeds the  $\Delta f^{-1}$  limit, non-physical results are usually obtained. In the latter case a complete derivation of the oscillator spectrum has to be performed including the shape of the main spectral line of non-zero width. Flat thermal noise produces a Lorentzian spectrum (36).

Finally, the Leeson's equation is extended to delay-line oscillators and in particular to opto-electronic oscillators. The extended equation (47) is fitted to experimental data showing potential problems of the latter.

As a conclusion of all of the above findings, an electronic oscillator is just a Q multiplier amplifying and filtering its own noise. The Q-multiplication factor is very large  $m \approx \epsilon^{-1}$ , resulting in a very small, but non-zero spectral-line half width  $f_{HW} > 0$ . Apart from bandwidth differences of many orders of magnitude, an electronic oscillator produces a similar signal to the spark radio transmitter or filtered white light in optics.

## 10 Conflict of Interest

The author declares no conflict of interest.

The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

## 11 References

- 1. D. B. Leeson, "A Simple Model of Feedback Oscillator Noise Spectrum", *Proceedings of the IEEE 54 (2)*, February 1966, pp. 329–330, https://doi.org/10.1109/PROC.1966.4682
- 2. Wikipedia, "Leeson's equation", https://en.wikipedia.org/wiki/Leeson%27s\_equation [Accessed: 08-Feb-2021]
- 3. Wikipedia, "Flicker noise", https://en.wikipedia.org/wiki/Flicker\_noise [Accessed: 05-Apr-2021]
- 4. M. Vidmar, "TV Satellite Receive System, Part 2: Indoor Unit", VHF COMMUNICATIONS 1/87, pp. 35–56, ISSN 0177-7505
- 5. R. Poore, "Overview on Phase Noise and Jitter", Agilent Technologies, 2001, http://cp.literature.agilent.com/litweb/pdf/5990-3108EN.pdf [Accessed: 01-May-2013]
- 6. F. Herzel, "An Analytical Model for the Power Spectral Density of a Voltage-Controlled Oscillator and Its Analogy to the Laser Linewidth Theory", IEEE Transactions on Circuits and Systems – I: Fundamental Theory and Applications, vol. 45, pp. 904–908, Sept. 1998.
- 7. E. Rubiola, "The Leeson Effect Phase Noise in Quasilinear Oscillators", https://arxiv.org/abs/physics/0502143v1 [Accessed: 17-Feb-2021]
- 8. L. Bogataj, M. Vidmar, B. Batagelj, "Opto-Electronic Oscillator With Quality Multiplier", IEEE Transactions on Microwave Theory and Techniques, January 2016, 64(2):1-6, https://doi.org/10.1109/TMTT.2015.2511755
- 9. E. H. Armstrong, "Signaling System", US patent 1424065, July 25, 1922.
- 10. Wikipedia, "Ultraviolet catastrophe", https://en.wikipedia.org/wiki/Ultraviolet\_catastrophe [Accessed: 24-Apr-2021]



Copyright © 2021 by the Authors. This is an open access article distributed under the Creative Commons Attribution (CC BY) License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Arrived: 18. 02. 2021 Accepted: 20. 05. 2021



Journal of Microelectronics, Electronic Components and Materials Vol. 51, No. 2(2021), 147 – 147

# MIDEM 2021

## 56<sup>th</sup> INTERNATIONAL CONFERENCE ON MICROELECTRONICS, DEVICES AND MATERIALS WITH THE WORKSHOP ON PERSONAL SENSOR FOR REMOTE HEALTH CARE MONITORING

September 22<sup>nd</sup> – September 24<sup>th</sup>, 2021 Faculty of Electrical Engineering, Ljubljana, Slovenia

Announcement and Call for Papers

#### Chairs:

Prof. dr. Janez Trontelj (UL FE) Doc. Dr. Aleksander Sešek (UL FE)

#### **IMPORTANT DATES**

Abstract submission deadline: May 1, 2021

Acceptance notification: June 15, 2021

Full paper submission deadline: July 31, 2021

Invited and accepted papers will be published in the Conference Proceedings.

Detailed and updated information about the MIDEM conferences, as well as instructions for paper preparation, can be found at

http://www.midem-drustvo.si/

#### **GENERAL INFORMATION**

The 56<sup>th</sup> International Conference on Microelectronics, Devices and Materials with the Workshop on Personal Sensor for Remote Health Care Monitoring continues the successful tradition of annual international conferences organized by the MIDEM Society, the Society for Microelectronics, Electronic Components and Materials. The conference will be held at the **Faculty of Electrical Engineering, Ljubljana, Slovenia** from **September 22<sup>nd</sup> to September 24<sup>th</sup>, 2021**.

#### Topics of interest include but are not limited to:

- <u>Workshop focus</u>: Personal Sensor for Remote Health Care Monitoring,
- Novel monolithic and hybrid circuit processing techniques,
- New device and circuit design,
- Process and device modelling,
- Semiconductor physics,
- Sensors and actuators,
- Electromechanical devices, microsystems and nanosystems,
- Nanoelectronics,
- Optoelectronics,
- Photovoltaic devices,
- Electronic materials science and technology,
- New electronic materials and applications,
- Materials characterization techniques,
- Reliability and failure analysis,
- Education in microelectronics, devices and materials.

#### **ORGANIZER:**

MIDEM Society - Society for Microelectronics, Electronic Components and Materials, Slovenia

#### **CONFERENCE SPONSORS:**

UL FE; UL FS; IJS; IMAPS, Slovenia Chapter; IEEE, Slovenia Section



# Boards of MIDEM Society | Organi društva MIDEM

## MIDEM Executive Board | Izvršilni odbor MIDEM

#### President of the MIDEM Society | Predsednik društva MIDEM

Prof. Dr. Barbara Malič, Jožef Stefan Institute, Ljubljana, Slovenia

#### Vice-presidents | Podpredsednika

- Prof. Dr. Janez Krč, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
- Dr. Iztok Šorli, Mikroiks d.o.o., Ljubljana, Slovenia

#### Secretary | Tajnik

Olga Zakrajšek, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia

#### MIDEM Executive Board Members | Člani izvršilnega odbora MIDEM

- Prof. Dr. Slavko Bernik, Jožef Stefan Institute, Slovenia
- Izr. Prof. Dr. Miha Čekada, Jožef Stefan Institute, Ljubljana, Slovenia
- Prof. DDr. Denis Đonlagić, UM, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
- Prof. Dr. Leszek J. Golonka, Technical University, Wroclaw, Poland
- Prof. Dr. Vera Gradišnik, Tehnički fakultet Sveučilišta u Rijeci, Rijeka, Croatia
- Mag. Leopold Knez, Iskra TELA, d.d., Ljubljana, Slovenia
- Mag. Mitja Koprivšek, ETI Elektroelementi, Izlake, Slovenia
- Doc. Dr. Gregor Primc, Jožef Stefan Institute, Ljubljana, Slovenia
- Doc. Dr. Janez Trontelj, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
- Dr. Danilo Vrtačnik, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia

## Supervisory Board | Nadzorni odbor

- Prof. Dr. Franc Smole, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
- Prof. Dr. Drago Strle, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
- Igor Pompe, retired

## Court of honour | Častno razsodišče

- Darko Belavič, Jožef Stefan Institute, Ljubljana, Slovenia
- Dr. Miloš Komac, retired
- Dr. Hana Uršič Nemevšek, Jožef Stefan Institute, Ljubljana, Slovenia

Informacije MIDEM Journal of Microelectronics, Electronic Components and Materials ISSN 0352-9045

Publisher / Založnik: MIDEM Society / Društvo MIDEM

Society for Microelectronics, Electronic Components and Materials, Ljubljana, Slovenia

Strokovno društvo za mikroelektroniko, elektronske sestavne dele in materiale, Ljubljana, Slovenija

www.midem-drustvo.si