ISSN 0352-9045
Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), December 2015
Revija za mikroelektroniko, elektronske sestavne dele in materiale, Vol. 45, No. 4 (2015), December 2015
UDK 621.3:(53+54+621+66)(05)(497.1)=00
Informacije MIDEM 4-2015
Journal of Microelectronics, Electronic Components and Materials
VOLUME 45, NO. 4(156), LJUBLJANA, DECEMBER 2015
Published quarterly (March, June, September, December) by the Society for Microelectronics, Electronic Components and Materials - MIDEM. Copyright © 2015. All rights reserved.
Editor in Chief
Marko Topič, University of Ljubljana (UL), Faculty of Electrical Engineering, Slovenia
Editor of Electronic Edition
Kristijan Brecl, UL, Faculty of Electrical Engineering, Slovenia
Associate Editors
Vanja Ambrožič, UL, Faculty of Electrical Engineering, Slovenia; Slavko Amon, UL, Faculty of Electrical Engineering, Slovenia
Danjela Kuščer Hrovatin, Jožef Stefan Institute, Slovenia; Matjaž Vidmar, UL, Faculty of Electrical Engineering, Slovenia; Andrej Žemva, UL, Faculty of Electrical Engineering, Slovenia
Editorial Board
Mohamed Akil, ESIEE PARIS, France; Giuseppe Buja, University of Padova, Italy; Gian-Franco Dalla Betta, University of Trento, Italy; Martyn Fice, University College London, United Kingdom; Ciprian Iliescu, Institute of Bioengineering and Nanotechnology, A*STAR, Singapore; Malgorzata Jakubowska, Warsaw University of Technology, Poland; Marc Lethiecq, University of Tours, France; Teresa Orlowska-Kowalska, Wroclaw University of Technology, Poland; Luca Palmieri, University of Padova, Italy
International Advisory Board
Janez Trontelj, UL, Faculty of Electrical Engineering, Slovenia (Chairman); Cor Claeys, IMEC, Leuven, Belgium; Denis Donlagic, University of Maribor, Faculty of Elec. Eng. and Computer Science, Slovenia; Zvonko Fazarinc, CIS, Stanford University, Stanford, USA; Leszek J. Golonka, Technical University Wroclaw, Wroclaw, Poland; Jean-Marie Haussonne, EIC-LUSAC, Octeville, France; Barbara Malič, Jožef Stefan Institute, Slovenia; Miran Mozetič, Jožef Stefan Institute, Slovenia; Stane Pejovnik, UL, Faculty of Chemistry and Chemical Technology, Slovenia; Giorgio Pignatel, University of Perugia, Italy; Giovanni Soncini, University of Trento, Trento, Italy; Iztok Šorli, MIKROIKS d.o.o., Ljubljana, Slovenia; Hong Wang, Xi'an Jiaotong University, China
Headquarters
Editorial Office of Informacije MIDEM, MIDEM at MIKROIKS, Stegne 11, 1521 Ljubljana, Slovenia; T. +386 (0)1 513 37 68; F. +386 (0)1 513 37 71; E. info@midem-drustvo.si; www.midem-drustvo.si
The annual subscription rate is 100 EUR; a single issue is 25 EUR. MIDEM members and Society sponsors receive current issues free of charge. The Scientific Council for Technical Sciences of the Slovenian Research Agency has recognized Informacije MIDEM as a scientific journal for microelectronics, electronic components and materials. Publishing of the Journal is cofinanced by the Slovenian Book Agency and by Society sponsors. Scientific and professional papers published in the journal are indexed and abstracted in the COBISS and INSPEC databases. The Journal is indexed by ISI® for Sci Search®, Research Alert® and Materials Science Citation Index™.
According to the opinion of the Ministry of Information No. 23/300-92, Informacije MIDEM is classified as a product of an informative nature.
Design: Snežana Madic Lešnik; Printed by: Biro M, Ljubljana; Circulation: 1000 copies; Slovenia Taxe Perçue: postage paid at post office 1102 Ljubljana
Contents
Original scientific papers
A. R. Buzdar, L. Sun, A. Latif, A. Buzdar: Instruction Decompressor Design for a VLIW Processor, p. 225
M. Malajner, D. Gleich, P. Planinšič: Angle of Arrival Estimation Algorithms Using Received Signal Strength Indicator, p. 237
M. R. Ghaderi Karkani, M. Kamarei, A. Fotowat Ahmady: Design of Low-Power Temperature Sensor Architecture for Passive UHF RFID Tags, p. 249
H. Uršič, M. Vrabelj, L. Fulanovič, A. Bradeško, S. Drnovšek, B. Malič: Specific Heat Capacity and Thermal Conductivity of the Electrocaloric (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 Ceramics Between Room Temperature and 300 °C, p. 260
B. Milinkovic, M. Milicevic, D. Simic, G. Stojanovic, R. Duric: Low-pass Filter for UWB System With the Circuit for Compensation of Process Induced On-chip Capacitor Variation, p. 266
Y. Bao, X. Wu, X. Xia, Y. Gao: High-Efficiency Negative Charge-Pump Circuit for WLED Backlights, p. 277
Slovene Science Awards 2015, p. 284
Announcement and Call for Papers: 52nd International Conference on Microelectronics, Devices and Materials With the Workshop on Biosensors and Microfluidics, p. 285
Front page: Puh Award winners (photo: Marjan Smrke)
Editorial
Dear Reader,
This issue brings 6 original scientific papers. The focus of the last (4th) issue of the year used to be on state-of-the-art papers by invited speakers at the MIDEM Conference, which we organize in late September every year. Although the 51st MIDEM Conference, chaired by Professor Janez Trontelj, was a big success, highlighted by the Terahertz and Microwave Systems Workshop, the distinguished invited speakers could not commit themselves to writing a full paper for our journal. The world is spinning too fast and everyone faces a lack of time. Conference attendees may be happy to have had the privilege of listening to their inspiring talks.
The year 2015 is fading out, and this editorial should reveal some statistics about manuscripts. In 2015 we received more than 140 manuscripts, of which only 12 have been accepted for publication, while more than 100 were rejected for insufficient quality or for being out of scope. Despite the clearly defined title of our journal and the online instructions for authors, each year we receive a dozen manuscripts that are outside the journal's scope. In 2015 we published 27 original scientific papers and the last 4 professional articles. The success rate below 20% in 2015 reflects a desire for quality that will pave the way to long-term quality growth. I would like to sincerely thank all reviewers and Editorial Board members for their valuable contribution to the journal's quality growth.
In 2015 we worked hard to switch to online submission and review of manuscripts. Thanks to Dr. Kristijan Brecl and Dr. Matija Pirc, who mastered Open Journal Systems and prepared it for our journal, we successfully passed the testing phase in autumn and are ready to start online submission on 1 January 2016. We expect the system to enable faster review times and higher author satisfaction.
December is a time for recognition and celebration, also in the Slovenian science arena. We are happy to congratulate our Associate Editor Dr. Danjela Kuščer Hrovatin and her teammates (see the cover page photo) on being honoured with the Puh Award for innovation and for the successful transfer of cordierite ceramics with a low thermal expansion coefficient into mass production.
Let the festive days bring joy and peace to every home, office and research laboratory. It is the time to look ahead and make ambitious plans for the coming year. This brings me to the editorial wishes for 2016: as part of your success, we look forward to receiving your next manuscript(s) on our submission page (http://ojs.midem-drustvo.si/).
Merry Christmas and a Happy and Prosperous New Year!
Prof. Marko Topič, Editor-in-Chief
P.S.
All papers published in Informacije MIDEM - Journal of Microelectronics, Electronic Components and Materials (since 1986) can be accessed electronically for free at http://journal.midem-drustvo.si/. A search engine is provided so that the archive can serve as a valuable resource for referencing previously published work and for giving credit to results achieved by other groups.
Original scientific paper
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), 225 - 236
Instruction Decompressor Design for a VLIW Processor
Abdul Rehman Buzdar, Liguo Sun, Azhar Latif, Abdullah Buzdar
University of Science and Technology of China (USTC), Department of Electronic Engineering and Information Science, Hefei, People's Republic of China
Abstract: The FlexCore processor is a wide instruction word processor that allows datapath elements to be controlled at a very precise level. The FlexCore scheme offers full control over the architecture and helps to improve overall performance. Because memory is very expensive in embedded systems in terms of both power and area, the memory footprint must be used very efficiently to gain the full advantages of the FlexCore's long instruction word. To this end, instructions are stored in the FlexCore processor memory in an application-specific, compressed instruction format (AS-ISA), which an instruction decompressor converts on-the-fly to a native, decompressed instruction format (N-ISA). This paper deals with the implementation of the instruction decompressor and the analysis of the compression and decompression schemes used in the FlexCore processor. The instruction decompressor is designed and implemented in VHDL and synthesized with the Cadence RTL Compiler in three process technologies provided by STMicroelectronics: 130 nm, 90 nm and 65 nm. The synthesis results show that the design and implementation of the instruction decompressor greatly impact the performance of FlexCore in terms of power, area and timing. We show in hardware the impact of the compression-scheme parameters used for implementing the instruction decompressor, which had previously been shown in software. These parameters include the formation of lookup table (LUT) groups, the size of the LUTs and the LUT-Load instruction interval, that is, how often the LUTs need to be updated and how many LUTs are updated by a single LUT-Load instruction.
Keywords: FlexSoC; FlexCore; VLIW Processor; Instruction Decompressor; LUT; ASIC
Instruction Decoder for a VLIW Processor
Abstract: The FlexCore processor is a very long instruction word processor that enables control of the datapath elements with high precision. The FlexCore scheme enables full control over the architecture and enables improved performance. To achieve all the advantages of a long instruction word, and given the high cost of memory, the memory must be used efficiently. Instructions are stored in the memory of the FlexCore processor in an application-specific, compressed form in the AS-ISA format. Decoding into the N-ISA format takes place in the instruction decoder. The instruction decoder described in the article is implemented in three technologies (130 nm, 90 nm and 65 nm). The results show that the design and implementation have a large impact on the efficiency of the processor in terms of power, area and timing. The impacts of the parameters are shown in hardware. These parameters include the formation of lookup table (LUT) groups, their size and the required interval for updating them.
Keywords: FlexSoC; FlexCore; VLIW Processor; Instruction Decoder; LUT; ASIC
* Corresponding Author's e-mail: abdul.buzdar@alumni.chalmers.se
1 Introduction
There is an ever-increasing demand for electronic gadgets that support a wide range of applications, from multimedia to video games, and the list of demands grows day by day. To manage all these applications efficiently, electronic devices should offer the functionality of general purpose processors while also being efficient in terms of both power and area. This is a demanding task: to run compute-intensive applications, one has to use specialized hardware accelerators or dedicated application-specific processing units controlled by microprocessors [1-4], such as an ARM core [5], placed on a single chip. Memory management is also critical for embedded systems, in terms of both cost and area. To accommodate these hardware accelerators, I/O activity and memory usage have to be kept low. However, adding hardware accelerators in this way does not cater to the rapidly changing demands of users, so an architecture is needed that offers the efficiency of an ASIC and the flexibility of a programmable platform. The demand for embedded systems with higher performance and more functionality makes general purpose processors unsuitable for them. The higher functionality offered by general purpose processors comes at the cost of higher power dissipation, which results in shorter battery life and increased weight in the form of cooling parts. To achieve the required performance in embedded systems with low power and small area, heterogeneous systems-on-chip (SoCs) are one option [6-9]. Heterogeneous SoCs use special purpose hardware blocks controlled by one or more embedded microprocessors. One of the major drawbacks of heterogeneous SoCs is their high non-recurring engineering (NRE) cost.
Application-Specific Instruction-set Processors (ASIPs) [10-14] try to combine the flexibility of programmable processors with the efficiency of customized integrated circuitry. ASIPs are generally constructed by adding specialized hardware blocks to programmable processor cores. The instruction set of an ASIP consists of general instructions, to retain the advantages of general purpose processors, and application-specific instructions, to gain the efficiency of specialized hardware. This scheme makes it easy to add specialized hardware blocks to the existing datapath and subsequently add application-specific instructions. Because late design changes can be accommodated by modifying the application software running on the ASIP, flexible and high-performance SoCs become possible. This also enables a hardware-software co-design methodology in which the conventional software design flow can be adopted. The major drawback of ASIPs is that adding new instructions makes them prone to binary incompatibility between different hardware implementations.
Figure 1: Overview of FlexCore processor
The FlexCore processor [15-20], which is based on the concept of FlexSoC, is an attempt to combine the efficiency of an ASIC (or special-purpose hardware) with the flexibility or programmability of general purpose processors. FlexCore integrates all functional units in a homogeneous way to retain the advantages of traditional general purpose processors, as shown in Fig. 1. The specialized hardware blocks are added to the datapath of a general purpose processor to gain the benefits of conventional five-stage pipelined processors. The FlexCore processor does not have a standard instruction set architecture (ISA) like that of conventional general purpose processors, in which the ISA is used to control the pipeline stages of the processor at various clock cycles. Instead, FlexCore is a wide control word processor that controls the datapath at a much finer-grained level than conventional processors: a single wide control word controls the whole datapath in one cycle. The datapath units of a FlexCore processor consist of conventional five-stage processor components and some specialized hardware blocks. The wide control word contains all the signals for every datapath unit and for the interconnect structure. Using a wide control word gives the programmer/compiler full control of the underlying hardware, a level of control that the conventional ISA approach lacks, and this results in increased performance. Previous research on such datapath-level control [23-27] has shown that increased controllability improves efficiency.
2 FlexCore processor architecture
The baseline FlexCore processor [15-20], without any hardware accelerators and with its datapath units connected in their minimum configuration, acts as a single-issue five-stage pipelined processor, similar to the Hennessy-Patterson 32-bit DLX [21] and the MIPS R2000 [22]. This makes it possible to execute the application code of a general purpose processor as efficiently as on a single-issue five-stage processor. Unlike in conventional approaches, the performance benefits of the FlexCore processor come from the use of hardware accelerators and from the fine-grained control of datapath units. Depending on the application requirements, the FlexCore processor can easily be extended with special-purpose hardware accelerators [29], [30]. The FlexCore processor has a native ISA (N-ISA), which is 91 bits wide when no hardware accelerators are used. The N-ISA can control the datapath units and interconnects at a very fine-grained level. Instructions are stored in the FlexCore memory in an application-specific ISA (AS-ISA) format, which a reconfigurable instruction decompressor converts on-the-fly to the native ISA (N-ISA) format.
Figure 2: Baseline FlexCore processor
The AS-ISA can be configured for a particular class of applications that have identical processing needs. Adding a new application only requires defining a new AS-ISA; the N-ISA and the translation process remain unchanged. Defining a new AS-ISA also gives the compiler opportunities for performance optimization, e.g. by using an already available instruction sequence instead of expanding the N-ISA. Fig. 2 shows the datapath units of the baseline FlexCore processor: a register file, an arithmetic and logic unit (ALU), a load/store unit and a program counter unit. All these datapath units are fully interconnected, which means that the interconnect configuration can be changed for different application requirements during the design stage. The baseline FlexCore has many unused interconnect paths that may be removed later, which is one of the main reasons for FlexCore's enhanced performance. The output of each datapath unit is connected to a data register, which acts as a pipeline register, so that FlexCore can emulate the functionality of a general purpose processor. Since data can be routed anywhere, different datapath pipeline schemes can be created. The FlexCore processor can be extended with new hardware accelerators depending on application requirements. For example, the FlexCore processor was used to run a fast Fourier transform (FFT) benchmark application. Since this algorithm makes extensive use of multiplication, the baseline FlexCore was extended with a 32-bit multiplier unit, as shown in Fig. 3. Adding the multiplier unit also enlarged the N-ISA, because control of its two 32-bit inputs, its 64-bit output and its enable signal became part of the N-ISA. The N-ISA of the multiplier-extended baseline FlexCore processor consists of 109 control bits, 18 more than the 91-bit baseline.
The concept of the FlexCore N-ISA is very different from the conventional ISA approach, and it changes the abstraction level at which the compiler/programmer manages the datapath and interconnect. The conventional ISA of a general purpose processor contains instructions such as ADD and SUB, and the results of these instructions are stored in the register file. In a statically scheduled processor, if the input operands are not yet available, the processor has to stall and wait for them. In a dynamically scheduled processor, however, the result of a previous instruction can be rerouted if it has been computed but not yet written to the register file. This technique makes the scheduling process simpler, but it reduces performance because it puts extra load on the register file. Instead of unnecessarily writing every result back to the register file, a result can be routed directly to the instructions that need it. The FlexCore compiler [33], [34] has complete control over the datapath units and interconnects in every clock cycle. For example, for a multiplication, the FlexCore compiler sets the control signals of the multiplier unit when its input values are available at the right clock cycle and routes the result of the multiplication to the destination where it is needed. This technique improves the overall performance of the system at the cost of a more complex scheduling process. In this way, the compiler can freely route data to any destination, which minimizes register file accesses because data is routed to where it is required instead of being stored in the register file. Hence this technique saves power and improves performance.
Figure 3: FlexCore Processor extended with Multiplier (blocks shown: FlexCore Control; Register File, ALU, LS Unit, PC Unit, Multiplier Unit; FlexCore Interconnect)
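To make the register-file argument above concrete, here is a toy Python model (not part of the FlexSoC tools) that counts register-file reads and writes for a short straight-line fragment, once with every result written back and once with intermediate results routed directly to their consumer. The `Op` type, the `rf_accesses` function and the `a * b + c` example are hypothetical, and the model deliberately ignores clock-cycle timing, which in FlexCore is the compiler's job.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    dest: str    # name of the value this operation produces
    srcs: tuple  # names of the values it consumes


def rf_accesses(ops, live_out, route_directly):
    """Count register-file (RF) reads and writes for straight-line code.

    route_directly=False: every operand is read from the RF and every
    result is written back to it.
    route_directly=True: an operand produced earlier in the sequence is
    routed straight from the producing unit, and only results still needed
    afterwards (live_out) are written back. Timing is ignored: we assume
    the value can always be routed at the right clock cycle.
    """
    produced = set()
    reads = writes = 0
    for op in ops:
        for src in op.srcs:
            if not (route_directly and src in produced):
                reads += 1
        produced.add(op.dest)
        if not route_directly or op.dest in live_out:
            writes += 1
    return reads, writes


# y = a * b + c: the product t is only an intermediate value.
program = [Op("t", ("a", "b")), Op("y", ("t", "c"))]
print(rf_accesses(program, {"y"}, route_directly=False))  # (4, 2)
print(rf_accesses(program, {"y"}, route_directly=True))   # (3, 1)
```

Even in this two-operation case, routing the multiplier output straight to the adder saves one read and one write; the savings grow with the number of short-lived intermediate values.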
3 Flexible datapath interconnect
The flexible interconnect of the baseline FlexCore processor [20] consists of a matrix switch, shown in Fig. 4: a multiplexer is connected to the input of each datapath unit and can select any of the signals coming from the output ports of the other datapath units. This maximum freedom in routing data to any location improves scheduling efficiency, in contrast to a general purpose processor, where the routing options are limited. It also helps the compiler control the order of the pipeline stages and increase the efficiency of the datapath units.
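The matrix switch described above can be sketched in a few lines of Python: every datapath input gets a multiplexer over all sources, and its select value (set by the control word) decides which output drives it. The port names below are hypothetical illustrations, not the signal names of the FlexCore VHDL; the sketch also counts the select bits that a full matrix switch contributes to the control word.

```python
def select_bits(num_sources):
    """Width of a multiplexer select signal choosing among num_sources inputs."""
    return (num_sources - 1).bit_length() if num_sources > 1 else 0


def matrix_switch(sources, inputs):
    """Full matrix switch: each datapath input has a mux over every source."""
    return {inp: list(sources) for inp in inputs}


def route(switch, selects, outputs):
    """Drive each datapath input with the source picked by its select value."""
    return {inp: outputs[switch[inp][sel]] for inp, sel in selects.items()}


# Hypothetical port names, not taken from the FlexCore design.
sources = ["RF_A", "RF_B", "ALU", "MUL"]
switch = matrix_switch(sources, ["ALU_X", "ALU_Y", "MUL_X", "MUL_Y"])
outputs = {"RF_A": 3, "RF_B": 4, "ALU": 7, "MUL": 12}

# Route the multiplier result straight into the ALU, bypassing the register file.
print(route(switch, {"ALU_X": 3, "ALU_Y": 0}, outputs))  # {'ALU_X': 12, 'ALU_Y': 3}
print(sum(select_bits(len(s)) for s in switch.values()))  # 8
```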
Figure 4: Illustration of FlexCore Datapath Interconnect
The FlexCore processor is statically scheduled, which means that the compiler knows in advance which interconnect paths will be used by a particular set of applications. Power can therefore be saved and performance improved by removing, at design time, the interconnect paths that application profiling shows to be unused. To ensure that FlexCore can still emulate a general purpose processor, the interconnect paths needed for that mode are never removed. Research on the FlexCore flexible interconnect shows that performance improves when just a few paths are added beyond the GPP case, and that almost half of the interconnect paths are never used by a given set of applications. These unused paths are therefore physically removed at design time, without any impact on performance or on the number of cycles needed to execute the applications.
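The design-time pruning step can be illustrated with a small Python sketch: keep every (input, source) path that either appears in the application profile or is required for general purpose operation, and drop the rest. The port names and both path sets are hypothetical stand-ins for the profiling output of the FlexSoC simulator; the total multiplexer select width is used here only as a rough proxy for the hardware and control bits saved.

```python
def select_bits(num_sources):
    """Mux select width needed to choose among num_sources inputs."""
    return (num_sources - 1).bit_length() if num_sources > 1 else 0


def prune(switch, used, required):
    """Keep only (input, source) paths seen in profiling or needed for GPP mode."""
    keep = set(used) | set(required)
    return {inp: [src for src in srcs if (inp, src) in keep]
            for inp, srcs in switch.items()}


def total_select_bits(switch):
    return sum(select_bits(len(srcs)) for srcs in switch.values())


# Hypothetical full matrix switch over five sources.
sources = ["RF_A", "RF_B", "ALU", "LSU", "PC"]
full = {inp: list(sources) for inp in ["ALU_X", "ALU_Y", "LSU_ADDR"]}

gpp_paths = {("ALU_X", "RF_A"), ("ALU_Y", "RF_B"), ("LSU_ADDR", "ALU")}
profiled = {("ALU_X", "ALU"), ("ALU_Y", "LSU")}

pruned = prune(full, profiled, gpp_paths)
print(pruned["ALU_X"])                                      # ['RF_A', 'ALU']
print(total_select_bits(full), total_select_bits(pruned))  # 9 2
```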
4 The FlexSoC framework
Much work has been done on the FlexSoC framework since the project started. The FlexSoC framework [33], [34] consists of a compiler, a simulator and a hardware generator, shown in Fig. 5.
4.1 Compiler
The input to the compiler is MIPS assembly produced by a MIPS cross-compiler. The EEMBC [28] benchmarks have been used to produce MIPS assembly, which is then compiled with the FlexSoC compiler. The output of the compiler is instructions in Register Transfer Notation (RTN) format. These RTN instructions are statically scheduled and exploit the inherent parallelism of the FlexCore processor. They can later be used to compare the performance of FlexCore with that of a general purpose processor.
4.2 Simulator
A cycle-accurate simulator is implemented in Haskell and can simulate both FlexCore and MIPS assembly. This helps trace bugs in the compiler and measure its performance. The simulator accurately reports the simulated cycle count, profiling information and simulation trace statistics. Because the FlexCore processor is flexible in terms of both its datapath units and their interconnections, this flexibility can be emulated in the simulator, and FlexCore can be simulated in different hardware configurations. The simulator can also be configured as a single-issue five-stage processor to emulate a general purpose processor.
Figure 5: Illustration of FlexSoC Framework
4.3 Hardware Generator
The FlexSoC hardware generator can generate VHDL code for the FlexCore processor in different configurations, some of which have been implemented on FPGAs. The FlexSoC framework can also verify the generated VHDL code, and it provides synthesis and place-and-route features. It also reports area, timing and power usage.
5 Existing compression schemes
FlexCore is a wide instruction word processor; to take full advantage of the expressiveness of its wide control word without a correspondingly large memory footprint, instructions are stored in memory in a compressed format. This section takes a brief look at the compression scheme used in the FlexCore processor. The main idea behind the encoding scheme [35] is to use lookup tables (LUTs) to store bit patterns, as shown in Fig. 6.
Figure 6: Illustration of Compression Scheme Implemented
The indexes of these LUTs are then combined to form the compressed instructions. The bit patterns are generated at compile time, based on the fact that some combinations of bits in the FlexCore control word are never used in certain portions of the executed code; the full expressiveness of the wide control word is not exploited everywhere, so only the patterns that actually occur need to be stored. This technique can be implemented in hardware with simple logic, and the LUTs are reasonably small. The contents of the LUTs can be changed using special instructions (LUT-Load instructions), which carry the bit patterns to be stored in the LUTs. The processor is stalled each time the contents of the LUTs need to be changed, so the placement of the LUT-Load instructions affects overall performance. The size of the LUTs affects both the compressed instruction size and the interval between LUT-Load instructions: the indices of the LUTs are combined to form the compressed instruction, and the size of each LUT determines the number of bits needed for its index. The main goal of this compression scheme is to exploit the expressiveness of the wide control word of the FlexCore processor and to be able to store large programs while keeping runtime costs low. The compression scheme [35] is also associated with a methodology for partitioning the wide instruction stream, that is, for deciding how many LUTs a particular application needs and what the size of each LUT should be. The NISC [23-27] project also proposes using LUTs to compress and decompress long instruction words. It uses only one or two LUTs to store the entire program, which makes the LUTs very large. It is therefore better suited to FPGA than to ASIC implementation.
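The core idea of the LUT-based scheme (store each distinct bit pattern of a group once, and keep only its index in the compressed instruction) can be sketched in Python as follows. This is a simplified, static illustration under stated assumptions: the group widths are given rather than derived by the partitioning methodology, all patterns are assumed to fit in the LUTs so no LUT-Load reloading is modelled, and the immediate field is left out. The function names are hypothetical and are not the FlexSize tools.

```python
def split(word, widths):
    """Split an integer control word into group fields, first group at the MSB end."""
    fields, shift = [], sum(widths)
    for width in widths:
        shift -= width
        fields.append((word >> shift) & ((1 << width) - 1))
    return fields


def index_bits(num_entries):
    """Minimum index width that addresses num_entries LUT entries."""
    return max(1, (num_entries - 1).bit_length())


def compress(program, widths):
    """Build one LUT per group from the patterns that occur, then emit index tuples."""
    luts = [{} for _ in widths]  # pattern -> index, in first-seen order
    compressed = []
    for word in program:
        indices = [lut.setdefault(pattern, len(lut))
                   for lut, pattern in zip(luts, split(word, widths))]
        compressed.append(tuple(indices))
    tables = [list(lut) for lut in luts]  # index -> pattern
    return tables, compressed


def decompress(tables, compressed, widths):
    """Concatenate the selected entry of every LUT to rebuild each control word."""
    words = []
    for indices in compressed:
        word = 0
        for table, index, width in zip(tables, indices, widths):
            word = (word << width) | table[index]
        words.append(word)
    return words


widths = [4, 4]  # toy 8-bit control word split into two groups
program = [0x12, 0x13, 0x12, 0x42]
tables, compressed = compress(program, widths)
print(tables)      # [[1, 4], [2, 3]]
print(compressed)  # [(0, 0), (0, 1), (0, 0), (1, 0)]
print(sum(index_bits(len(t)) for t in tables))  # 2 index bits instead of 8
assert decompress(tables, compressed, widths) == program
```

Because each group has only two distinct patterns in this toy program, one index bit per group suffices; a real program with more patterns per group than the LUT depth would need LUT-Load instructions between program regions.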
6 Implementation of compression scheme
The compression scheme [35] is implemented in VHDL to study its impact on the performance of the FlexCore processor in terms of area, timing and power. The specification of the implemented instruction decoder is as follows. The 71-bit instruction stream from the caches of the FlexCore processor is the input to the instruction decoder; it consists of 39 instruction bits from the I-cache and 32 data bits from the D-cache. There are two types of instructions, shown in Fig. 7: one loads the LUTs with new content (Load instructions), and the other sends already stored LUT content out to form the full, decompressed 109-bit wide control word of FlexCore (Normal instructions). The last bit of the 71-bit compressed instruction distinguishes the two types. A single LUT-Load instruction can load one entry in each of two LUTs. The subfields of the two instruction types are shown below:
Load Instruction:
6-bit Index of LUTn, 6-bit Index of LUTm, Data of LUTn, Data of LUTm, Unused bits, 8 Ctrl bits, Load=1
Normal Instruction:
LUT1 address, LUT2 address, LUT3 address, ..., LUT8 address, 32-bit imm, Load=0
Figure 7: Illustration of Instruction Format used (Normal instruction: LUT#1 to LUT#8 indices, 32-bit Immediate, Load=0; Load instruction: Index LUTn, Index LUTm, Data LUTn, Data LUTm, Unused, 8 Ctrl bits, Load=1)
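The two instruction formats can be sketched as bit-field layouts in Python. The Normal layout follows the field order above with the index widths of Table 1 (6+5+5+6+4+4+4+4 index bits, 32 immediate bits and the type bit). Several details are assumptions, since the text gives the field order but not the exact bit positions: fields are packed MSB-first in the listed order, the "last bit" is taken to be the least significant bit, and in the Load format both data fields are assumed to be 13 bits wide (the widest LUT entry), which leaves 24 unused bits. The helper names are hypothetical.

```python
WORD_BITS = 71

# Normal instruction: eight LUT indices (widths from Table 1), 32-bit immediate, type bit.
NORMAL_FIELDS = [("ALU", 6), ("RFA", 5), ("RFB", 5), ("RFW", 6), ("LS", 4),
                 ("BUF", 4), ("PC", 4), ("Mult", 4), ("imm", 32), ("load", 1)]

# LUT-Load instruction. ASSUMPTION: both data fields are 13 bits (the widest
# LUT entry), which leaves 24 unused bits.
LOAD_FIELDS = [("index_n", 6), ("index_m", 6), ("data_n", 13), ("data_m", 13),
               ("unused", 24), ("ctrl", 8), ("load", 1)]


def unpack(word, fields):
    """Slice a 71-bit word into named fields, the first field at the MSB end."""
    assert sum(width for _, width in fields) == WORD_BITS
    values, shift = {}, WORD_BITS
    for name, width in fields:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    return values


def pack(values, fields):
    """Inverse of unpack; fields missing from values are filled with zeros."""
    word = 0
    for name, width in fields:
        word = (word << width) | (values.get(name, 0) & ((1 << width) - 1))
    return word


def decode(word):
    """ASSUMPTION: the last (least significant) bit selects the instruction type."""
    if word & 1:
        return "load", unpack(word, LOAD_FIELDS)
    return "normal", unpack(word, NORMAL_FIELDS)


normal = pack({"ALU": 5, "PC": 3, "imm": 0xDEADBEEF}, NORMAL_FIELDS)
kind, fields = decode(normal)
print(kind, fields["ALU"], fields["PC"], hex(fields["imm"]))  # normal 5 3 0xdeadbeef

load = pack({"index_n": 63, "data_n": 0x1ABC, "ctrl": 0b1000_0001, "load": 1}, LOAD_FIELDS)
kind, fields = decode(load)
print(kind, fields["index_n"], hex(fields["data_n"]), bin(fields["ctrl"]))  # load 63 0x1abc 0b10000001
```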
The index width determines the depth of each LUT: with an n-bit index, the depth of the LUT is 2^n. Fig. 8 illustrates how the instruction decoder is integrated with the FlexCore processor. The 109-bit control word of FlexCore is divided into eight groups, and each group forms one LUT. These groups are formed using the FlexSize tools, which were developed for implementing the compression scheme [35].
Table 1: Specification of LUTs Implemented
LUT Name	LUT Index Bits	LUT-Entry Data Bits
ALU Group	6	13
RFA Group	5	6
RFB Group	5	6
RFW Group	6	10
LS Group	4	13
BUF Group	4	10
PC Group	4	9
Mult Group	4	10
Table 1 shows the specification of the eight LUTs used in the implementation of the instruction decompressor. The LUT-Entry Data Bits give the width of each LUT, and the index bits are the minimum number of bits required to access all entries of each LUT. The sum of the index bits of all LUT groups, the 32 immediate bits and the one bit indicating the instruction type equals 71 bits, the total length of the compressed instruction that is the input to the instruction decompressor, i.e.:
6+5+5+6+4+4+4+4+32+1 = 71 bits
The output of the instruction decompressor is the 109-bit wide control word of the baseline FlexCore processor, which is formed by concatenating the data bits of one entry from each of the eight LUTs with the 32 immediate bits, i.e.:
13+6+6+10+13+10+9+10+32 = 109 bits
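As a quick consistency check of the two width equations, and of the rule that an n-bit index addresses 2^n entries, the Python sketch below recomputes them from Table 1. The storage figure assumes that every LUT is fully populated with 2^n entries and one flip-flop per data bit; it is derived arithmetic, not a synthesis result.

```python
# LUT parameters from Table 1: (name, index bits, entry data bits)
LUTS = [
    ("ALU", 6, 13), ("RFA", 5, 6), ("RFB", 5, 6), ("RFW", 6, 10),
    ("LS", 4, 13), ("BUF", 4, 10), ("PC", 4, 9), ("MULT", 4, 10),
]
IMMEDIATE_BITS = 32
TYPE_BITS = 1  # Load/Normal flag


def compressed_width(luts=LUTS):
    """Compressed (Normal) instruction: all LUT indices + immediate + type bit."""
    return sum(idx for _, idx, _ in luts) + IMMEDIATE_BITS + TYPE_BITS


def control_word_width(luts=LUTS):
    """Decompressed control word: one entry from every LUT + immediate."""
    return sum(data for _, _, data in luts) + IMMEDIATE_BITS


def lut_storage_bits(luts=LUTS):
    """Flip-flops needed if every LUT holds 2**index_bits entries (assumed)."""
    return sum((2 ** idx) * data for _, idx, data in luts)


print(compressed_width())    # 71
print(control_word_width())  # 109
print(lut_storage_bits())    # 2528
```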
The LUT groups above are formed with a methodology, associated with the compression scheme [35], that partitions the wide instruction word into smaller groups. The method consists of four steps. The first step identifies bits that are highly correlated and should therefore be placed in the same group. The groups formed are then evaluated using a user-defined cost function; in our case, the LUT-access time, the compression ratio and the energy efficiency form the cost function. Here, energy efficiency means reducing the power dissipated by the instruction decompressor during both LUT-Load and Normal instructions.
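The compression-ratio term of such a cost function can be illustrated with a deliberately simple model, sketched below in Python. The model and the function name `compression_ratio` are our assumptions, not the cost function of [35]: every Normal and LUT-Load instruction is charged the full 71-bit width, the I-Cache/D-Cache split is ignored, and each LUT-Load instruction is assumed to write two entries.

```python
import math

COMPRESSED_BITS = 71      # width of every compressed (Normal or LUT-Load) word
UNCOMPRESSED_BITS = 109   # width of the FlexCore control word


def compression_ratio(n_normal, n_entries_to_load, entries_per_load=2):
    """Compressed program size divided by uncompressed size (lower is better).

    Hypothetical model: each original control word becomes one Normal
    instruction, and the LUT entries the program needs are written by
    ceil(n_entries_to_load / entries_per_load) extra LUT-Load instructions.
    """
    if n_normal <= 0:
        raise ValueError("program must contain at least one instruction")
    n_load = math.ceil(n_entries_to_load / entries_per_load)
    compressed = (n_normal + n_load) * COMPRESSED_BITS
    return compressed / (n_normal * UNCOMPRESSED_BITS)
```

For example, `compression_ratio(1000, 256)` (1000 control words, 256 LUT entries to fill) gives about 0.73, compared with 71/109 ≈ 0.65 if no LUT-Load instructions were needed.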
7 Instruction decompressor
Fig. 9 shows the block diagram of the instruction decompressor. It consists of a main unit and eight LUT units, which act as simple memory units. The 71-bit compressed instruction stream at the input is divided into subfields inside the main unit, which uses them to control the eight LUT units.
Figure 8: FlexSoC scheme with Instruction decompressor
The 8-bit ctrl field of the CTN-ISA instruction indicates which LUT units to load: each bit of the 8-bit ctrl field is connected to the Load signal of one LUT unit. The address bits carried by the CTN-ISA instruction are connected to the address signal of each LUT unit, which selects the LUT entry to be loaded or, in case of Normal instructions, the stored entry to be sent out.
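How the main unit might slice a compressed word into these subfields can be sketched in Python. The bit ordering (Load/Normal flag as bit 0, fields packed most-significant first in the order of Fig. 7), the padding of each LUT-Load data field to 13 bits, and the mapping of ctrl bits to LUTs are all assumptions, since the paper does not specify them. `pack_fields` is a hypothetical helper used only to build example words.

```python
# Hypothetical field layouts, most-significant field first.
# ASSUMPTION: the Load/Normal flag ("last bit") is bit 0 of the 71-bit word.
NORMAL_FIELDS = [("ALU", 6), ("RFA", 5), ("RFB", 5), ("RFW", 6),
                 ("LS", 4), ("BUF", 4), ("PC", 4), ("MULT", 4),
                 ("imm", 32), ("load", 1)]
# ASSUMPTION: each LUT-Load data field is padded to 13 bits (widest LUT entry),
# leaving 71 - (6 + 6 + 13 + 13 + 8 + 1) = 24 unused bits.
LOAD_FIELDS = [("idx_n", 6), ("idx_m", 6), ("data_n", 13), ("data_m", 13),
               ("unused", 24), ("ctrl", 8), ("load", 1)]
LUT_ORDER = ["ALU", "RFA", "RFB", "RFW", "LS", "BUF", "PC", "MULT"]
WORD_BITS = 71
assert sum(w for _, w in NORMAL_FIELDS) == WORD_BITS
assert sum(w for _, w in LOAD_FIELDS) == WORD_BITS


def split_fields(word, layout):
    """Slice a 71-bit integer into named fields, most-significant field first."""
    fields, shift = {}, WORD_BITS
    for name, width in layout:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


def pack_fields(values, layout):
    """Inverse of split_fields; missing fields default to zero."""
    word = 0
    for name, width in layout:
        word = (word << width) | (values.get(name, 0) & ((1 << width) - 1))
    return word


def decode(word):
    """Return (kind, fields, LUTs to load) for one compressed instruction."""
    if word & 1:  # flag set: LUT-Load instruction
        f = split_fields(word, LOAD_FIELDS)
        # ASSUMPTION: ctrl bit 7 selects ALU, ..., ctrl bit 0 selects MULT
        targets = [lut for i, lut in enumerate(LUT_ORDER)
                   if (f["ctrl"] >> (7 - i)) & 1]
        return "load", f, targets
    return "normal", split_fields(word, NORMAL_FIELDS), []
```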
Figure 9: Block Diagram of Instruction decompressor
Similarly, the data bits carried by the CTN-ISA instruction are connected to the Load-Data signal of each LUT unit, which carries the data to be loaded into the specified LUT address. Fig. 10 shows the block diagram of a LUT unit. It consists of a demultiplexer (DeMux), which routes the LUT-Load data to the LUT entry selected by the LUT address. For an n-bit wide LUT, n flip-flops per LUT entry store the LUT-entry data. A multiplexer selects, based on the LUT address, which LUT-entry data is sent to the output. Fig. 11 shows the input/output pin configuration of the instruction decompressor; the pins on the left are inputs and the pins on the right are outputs. Each pin is described as follows:
-	Clk / Reset
As the names imply, the Clk pin provides the external clock of the instruction decompressor, and the Reset pin applies a global reset to it.
-	CTN_ISA
This pin is used to get the 71-bit compressed instruction stream as an input into the instruction decompressor from the Cache of the FlexCore processor.
-	Immediate
This pin is used to output the 32-bit immediate data coming from the D-Cache of the FlexCore processor.
-	ALUgroup
This pin is used to output the 13-bit wide data from one of the entries of ALU LUT and contains the signals for ALU of the FlexCore processor.
-	RFgroupA
This pin is used to output the 6-bit wide data from one of the entries of RFA LUT and contains the signals for Register File of the FlexCore processor.
-	RFgroupB
This pin is used to output the 6-bit wide data from one of the entries of RFB LUT and contains the signals for Register File of the FlexCore processor.
-	RFgroupW
This pin is used to output the 10-bit wide data from one of the entries of RFW LUT and contains the signals for Register File of the FlexCore processor.
-	LSgroup
This pin is used to output the 13-bit wide data from one of the entries of LS LUT and contains the signals for Load Store Unit of the FlexCore processor.
-	BUFgroup
This pin is used to output the 10-bit wide data from one of the entries of BUF LUT and contains the signals for Interconnect and Buffer of the FlexCore processor.
-	PCgroup
This pin is used to output the 9-bit wide data from one of the entries of PC LUT and contains the signals for PC unit of the FlexCore processor.
-	MULTgroup
This pin is used to output the 10-bit wide data from one of the entries of Mult LUT and contains the signals for Multiplier unit of the FlexCore processor.
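The LUT unit of Fig. 10 (a demultiplexer writing into an array of flip-flop registers and a multiplexer reading from it) can be modeled behaviorally in a few lines of Python. This is a sketch, not the authors' VHDL: the class name `LutUnit`, the one-operation-per-clock interface and the masking of unused upper index bits (a 6-bit LUT-Load index applied to a LUT with a 4-bit index) are assumptions.

```python
class LutUnit:
    """Behavioural sketch of one LUT unit (Fig. 10): a register array written
    through a demultiplexer and read through a multiplexer. Not the authors'
    VHDL; the interface below is an assumption."""

    def __init__(self, index_bits, data_bits):
        self.depth = 2 ** index_bits          # n-bit index -> 2**n entries
        self.mask = (1 << data_bits) - 1      # data_bits flip-flops per entry
        self.entries = [0] * self.depth

    def reset(self):
        """Global reset: clear all entries (assumed reset value 0)."""
        self.entries = [0] * self.depth

    def clock(self, load, address, load_data=0):
        """Model one clock edge.

        load=1: the demux writes load_data into the addressed entry
                (no output is modelled in this case, so None is returned).
        load=0: the mux returns the content of the addressed entry.
        ASSUMPTION: index bits above this LUT's width are ignored.
        """
        address &= self.depth - 1
        if load:
            self.entries[address] = load_data & self.mask
            return None
        return self.entries[address]
```

For instance, `LutUnit(6, 13)` models the ALU LUT of Table 1 (64 entries of 13 bits); writing with `clock(1, 5, x)` and reading back with `clock(0, 5)` mimics one LUT-Load followed by one Normal instruction.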
Figure 10: Block Diagram of a LUT Unit
The timing diagram in Fig. 12 shows what happens during a LUT-Load operation. When the load signal goes high, the data coming from the I-Cache is loaded into the specified entry of that LUT. One entry each of two LUTs can be loaded through one LUT-Load instruction. The LUT-Load instruction takes one cycle to load the data into the specified LUT entry.
Similarly, the timing diagram in Fig. 13 shows what happens during a Normal instruction. When the load signal goes low, the address (index) for each of the eight LUTs is sent to the corresponding LUT, and the data stored at each address is sent out on the eight output pins. This operation takes a single cycle.
8 Implementation of instruction decompressor
After the VHDL implementation of the instruction decompressor, the next task was to synthesize the VHDL description to specific process technologies using Cadence RTL Compiler [31]. Three process technologies provided by STMicroelectronics [32], 130-nm, 90-nm and 65-nm, were used for synthesis, but only the synthesis results for the 65-nm technology are presented here. The aim of synthesizing the VHDL description of the instruction decompressor is to study the impact of including it in the FlexCore processor in terms of timing, area and power requirements. This matters because the instruction decompressor, whose purpose is to manage the memory footprint efficiently, can strongly affect the overall performance of the FlexCore processor. This section focuses on the power used by the lookup tables (LUTs) that implement the instruction decompressor, and on the effect of the LUT-Load instruction interval, meaning how often the LUTs need to be updated and how many LUTs are updated by a single LUT-Load instruction. After starting RTL Compiler, some basic steps were performed, such as setting up the library paths for the 130-nm, 90-nm and 65-nm process technologies and reading in the VHDL files required for synthesis. RTL Compiler was then instructed, using the elaboration command, to assemble the VHDL files into an internal representation, i.e., a network of virtual (generic) gates. The VHDL code of the instruction decompressor was synthesizable with no errors. The next step in the synthesis process was to map the network of virtual gates onto real hardware, i.e., the standard cells provided by STMicroelectronics. Initially, no timing constraint was set, and a low computational effort was used to get an idea of the intrinsic timing behavior of the implementation via Static Timing Analysis (STA). The worst-case delay and the area of the implementation were recorded. The worst-case signal propagation path was found to pass through the RFgroupW LUT, because this LUT is bigger than most of the other LUTs of the instruction decompressor.
Figure 11: Instruction Decompressor Pinout (inputs: Clk, Reset, CTN_ISA (70:0); outputs: Immediate (31:0), ALUgroup (12:0), RFgroupA (5:0), RFgroupB (5:0), RFgroupW (9:0), LSgroup (12:0), BUFgroup (9:0), PCgroup (8:0), MULTgroup (9:0))
Figure 12: Timing Diagram of LUT-Load operation
Figure 13: Timing Diagram of Normal Instructions
The clock frequency of the FlexCore processor was set to 400 MHz, so the design was re-synthesized with the corresponding timing constraint of 2.5 ns using medium effort. The worst-case delay and the area of the implementation were recorded again for these settings. This time, the worst-case signal propagation path was found to pass through the ALUgroup LUT, since it is the only LUT that is both the widest (13 bits) and the deepest (64 entries) of the LUTs in the instruction decompressor. Table 2 shows the timing and area results for the instruction decompressor. The worst-case delay of 1240 ps is less than half of the 2500-ps constraint, which shows that the instruction decompressor can be synthesized with a stricter timing constraint.
Table 2: Timing and Area results
Timing Constraint (ps)	Synthesis Effort	Worst-case Delay (ps)	Estimated Area (µm²)
no	low	1053	44840
2500	medium	1240	44551
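The conclusion that a stricter constraint is possible follows from simple arithmetic on Table 2, sketched below. The maximum-frequency figure is only a rough upper bound: it treats the reported worst-case delay as the whole cycle budget and ignores setup time, clock skew and the re-synthesis that a tighter constraint would trigger.

```python
def timing_summary(clock_mhz, worst_delay_ps):
    """Clock period, slack and a rough maximum frequency from STA numbers."""
    period_ps = 1e6 / clock_mhz        # 1 / (f in MHz) expressed in ps
    slack_ps = period_ps - worst_delay_ps
    fmax_mhz = 1e6 / worst_delay_ps    # optimistic: ignores setup time and skew
    return period_ps, slack_ps, fmax_mhz


period, slack, fmax = timing_summary(400, 1240)   # values from Table 2
print(period, slack, round(fmax))                 # 2500.0 1260.0 806
```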
The power analysis of the instruction decompressor was initially performed by assigning switching probabilities to the primary data inputs, using medium effort. Table 3 shows the power results with a probability of a high logic state of 0.5 on CTN_ISA and 0.0 on Reset, and a toggle rate (toggles/ns) of 0.02 on CTN_ISA and 0.0 on Reset.
Table 3: Initial power results
Leakage Power (mW)	Dynamic Power (mW)	Total Power (mW)	Clk Net Power (mW)
1.694	11.086	12.781	1.511
Next, different test vectors were generated by varying the LUT-Load interval and the number of LUTs loaded by a single Load instruction. Three sets of test vectors were generated with LUT-Load instruction intervals of 60, 100 and 300, each set containing 20000 test vectors. Two variants of each of these three sets were also generated: in the first, one entry of a single LUT is loaded by each LUT-Load instruction; in the second, one entry in each of two LUTs is loaded by a single Load instruction.
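A generator for test vectors of this kind could look like the Python sketch below. It is hypothetical: the paper does not describe its generator, and the layout used here (Load/Normal flag in bit 0, 8-bit ctrl field in bits 8..1) is an assumption. Its payload bits are fully random, so it resembles the "more randomness" case described next rather than the "less randomness" one.

```python
import random

WORD_BITS = 71


def make_vectors(n_vectors, load_interval, luts_per_load, seed=0):
    """Hypothetical test-vector generator.

    Every load_interval-th word is a LUT-Load instruction (bit 0 = 1) whose
    8-bit ctrl field (ASSUMED to be bits 8..1) has luts_per_load bits set;
    all other words are Normal instructions (bit 0 = 0). Remaining bits
    are random.
    """
    if luts_per_load not in (1, 2):
        raise ValueError("one or two LUTs per LUT-Load instruction")
    rng = random.Random(seed)                 # reproducible sequence
    vectors = []
    for i in range(n_vectors):
        word = rng.getrandbits(WORD_BITS - 1) << 1   # random payload, flag = 0
        if i % load_interval == 0:
            ctrl = 0
            for lut in rng.sample(range(8), luts_per_load):
                ctrl |= 1 << lut                     # select distinct LUTs
            word &= ~(0xFF << 1)                     # clear ctrl field
            word |= (ctrl << 1) | 1                  # insert ctrl, set flag
        vectors.append(word)
    return vectors
```

With `make_vectors(20000, 60, 2)`, 334 of the 20000 words are LUT-Load instructions (indices 0, 60, ..., 19980), each setting two ctrl bits.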
Table 4: Signal Statistics for test vectors from TCF files
Test Vector Type	CTN-ISA Toggle Rate (toggles/ns)	NISA Toggle Rate (toggles/ns)
Less Random	0.0858	0.0358
More Random	0.1808	0.1632
Another set of test vectors was then generated with the same specifications, but this time the LUT-Load data fields and the immediate field were generated randomly, whereas in the previous version the LUT-Load data and immediate fields varied little among the test vectors. These two sets of test vectors were generated to get a better idea of the power consumption of the instruction decompressor. In this paper, the first version is referred to as test vectors with less randomness and the second as test vectors with more randomness. Table 4 shows the signal statistics of the test vectors, obtained from the Toggle Count Format (TCF) files.
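A TCF-style average toggle rate can be approximated from a vector sequence as in the Python sketch below. Applying one word per 2.5-ns clock period and averaging over all nets of the bus are assumptions; real TCF files record per-net toggle counts from simulation. As a sanity check, fully random words applied every 2.5 ns toggle each bit with probability 0.5 per cycle, i.e. about 0.2 toggles/ns, which is of the same order as the "More Random" CTN-ISA rate in Table 4.

```python
def toggle_rate(words, width, period_ns):
    """Average toggles per net per ns for a bus of `width` nets that is
    driven with one word per clock period (TCF-style averaging, assumed)."""
    if len(words) < 2:
        return 0.0
    # Hamming distance between consecutive words = number of toggling nets
    toggles = sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))
    duration_ns = (len(words) - 1) * period_ns
    return toggles / (width * duration_ns)
```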
Table 5: Power results with test vectors having less randomness
No. of LUTs Loaded	LUT-Load Instruction Interval	Leakage Power (mW)	Dynamic Power (mW)	Total Power (mW)
two	60	1.816	2.720	4.536
one	60	1.804	2.740	4.544
two	100	1.815	2.717	4.533
one	100	1.804	2.734	4.539
two	300	1.809	2.703	4.512
one	300	1.814	2.711	4.525
Tables 5 and 6 show the power results of the instruction decompressor using 20000 test vectors with less and more randomness, respectively.
Table 6: Power results with test vectors having more randomness
No. of LUTs Loaded	LUT-Load Instruction Interval	Leakage Power (mW)	Dynamic Power (mW)	Total Power (mW)
two	60	1.713	10.747	12.460
one	60	1.721	10.758	12.480
two	100	1.713	10.721	12.435
one	100	1.721	10.741	12.462
two	300	1.713	10.712	12.425
one	300	1.721	10.728	12.449
To compare the power dissipation of Normal and LUT-Load instructions more precisely, two sets of 1000 test vectors each were generated: one set contained only Normal instructions and the other only LUT-Load instructions. Power analysis was then performed using these two sets. Table 7 shows the power comparison of Normal and LUT-Load instructions for the instruction decompressor.
Table 7: Power comparison of Normal and LUT-Load Instructions
Instruction Type	Leakage Power(mW)	Dynamic Power(mW)	Total Power(mW)
Normal	1.616	10.239	11.855
LUT-Load	1.702	10.435	12.138
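The size of the differences in Tables 5 to 7 can be quantified with a one-line helper; the Python below only restates the published total-power numbers as relative changes.

```python
def relative_increase(base, value):
    """Percentage by which `value` exceeds `base`."""
    return 100.0 * (value - base) / base


# Total power in mW, copied from Tables 5, 6 and 7
print(round(relative_increase(11.855, 12.138), 1))  # Normal vs LUT-Load (Table 7)
print(round(relative_increase(4.536, 12.460), 1))   # less vs more randomness (interval 60, two LUTs)
print(round(relative_increase(4.512, 4.536), 1))    # interval 300 vs 60 (two LUTs, Table 5)
```

The three lines print about 2.4 %, 175 % and 0.5 %: data randomness dominates, while the instruction type and the LUT-Load interval change the total power only slightly.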
Table 8 shows the synthesis results for the FlexCore processor with the full interconnect configuration, synthesized with medium effort and a timing constraint of 3 ns.
Table 8: Synthesis results of the FlexCore processor
Benchmark (EEMBC-Telecom)	No. of Instructions	Cycle Count	Total Power (mW)	Estimated Area (µm²)
autcor	1399	16110	7.30	49527
fft	1730	136596	8.91	
viterb	1639	265291	7.80	
conven	1457	262039	7.45	
9 Discussion on synthesis results
The results of the power analysis show that the power consumption of the instruction decompressor decreases slightly as the LUT-Load instruction interval increases, which is expected because less switching takes place. This means that applications requiring fewer LUT reloads consume less power, although not by much. Another observation is that the power consumption when updating one entry in each of two LUTs with a single LUT-Load instruction is lower than when updating one entry of a single LUT per instruction. This is because fewer LUT-Load instructions are required to load all the entries of the eight LUTs when each instruction writes two entries. Fig. 14 shows the power results with 20000 test vectors having less randomness for the instruction decompressor, synthesized with medium effort and a timing constraint of 2.5 ns using three different process technologies.
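The argument about the number of LUT-Load instructions can be made concrete with Table 1. The Python sketch below (function name hypothetical) counts the LUT-Load instructions needed to write every entry of all eight LUTs once, assuming each LUT is fully populated and that the two entries written by one instruction must belong to different LUTs.

```python
import math

# Index bits per LUT group (Table 1); an n-bit index addresses 2**n entries
INDEX_BITS = {"ALU": 6, "RFA": 5, "RFB": 5, "RFW": 6,
              "LS": 4, "BUF": 4, "PC": 4, "MULT": 4}


def loads_to_fill(index_bits, entries_per_load):
    """Minimum LUT-Load instructions needed to write every entry once."""
    depths = [2 ** b for b in index_bits.values()]
    total = sum(depths)
    if entries_per_load == 1:
        return total
    if entries_per_load == 2:
        # Both entries of one instruction must be in different LUTs, so a
        # LUT holding more than half of all entries would need extra loads.
        return max(math.ceil(total / 2), max(depths))
    raise ValueError("the instruction format loads one or two entries")


print(loads_to_fill(INDEX_BITS, 1), loads_to_fill(INDEX_BITS, 2))  # 256 128
```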
Figure 14: Power comparison of Instruction Decompressor (legend: 130nm, 90nm)
Comparing the power consumption of the instruction decompressor across the three technologies, the power consumption is higher for 90 nm than for 130 nm, although the worst-case delay and area are smaller for 90 nm than for 130 nm. As the timing constraint is the same for both technologies, the higher worst-case delay and area for 130 nm suggest that it should have higher power consumption than 90 nm, since the tool has to put more effort into meeting the constraint, which results in higher worst-case delay and area. The technology files used for the 90-nm technology may be a reason for these unexpected results. The LUT-Load instruction interval does not affect the power consumption of the instruction decompressor to a great extent; this was shown previously in software [35], and this implementation confirms it in hardware. The major drawback of having more LUT-Load instructions is that the processor needs to be stalled each time the contents of a LUT are updated, so the number of LUT-Load instructions must be kept low (i.e., the LUT-Load instruction interval kept long) for optimum performance. Inspecting the wide control word of the FlexCore processor shows that some combinations of control bits, e.g. (MULTA, MULTB, READ ADDR1 REG, READ ADDR2 REG), are zero most of the time, and the compression scheme takes advantage of this fact. Likewise, in the compressed instructions produced by the compression algorithm, most bits remain zero repeatedly, which can help reduce power consumption because less switching takes place. Looking at the power consumption of individual LUT groups, the larger LUT groups consume more power, as expected. It would be worthwhile to reduce the sizes of the larger LUT groups and study the effect on the power consumption of the instruction decompressor. The synthesis results for the instruction decompressor were obtained with a timing constraint of 2.5 ns, whereas the results for the FlexCore processor, presented here as a reference, were obtained with a timing constraint of 3 ns; this difference in timing constraints affects the comparison of area and power between the two designs.
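To put the stall penalty in numbers, the Python sketch below assumes that one LUT-Load instruction is issued after every `load_interval` Normal instructions and that each one stalls the processor for a fixed number of cycles (one by default). Both are assumptions; the paper gives no stall count.

```python
def stall_overhead(load_interval, stall_cycles_per_load=1):
    """Fraction of cycles spent stalled when one LUT-Load instruction follows
    every `load_interval` Normal instructions (ASSUMED model; the stall
    length per LUT-Load is a parameter, not a figure from the paper)."""
    if load_interval <= 0 or stall_cycles_per_load < 0:
        raise ValueError("interval must be positive, stall cycles non-negative")
    return stall_cycles_per_load / (load_interval + stall_cycles_per_load)


for interval in (60, 100, 300):
    print(interval, f"{100 * stall_overhead(interval):.2f} %")
```

Under these assumptions the overhead is about 1.6 %, 1.0 % and 0.3 % for intervals of 60, 100 and 300, which is why a long interval matters more for performance than for power.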
10 Conclusion
The aim of this research was to design an instruction decompressor for a very long instruction word (VLIW) processor that reduces the memory footprint based on an optimal compression scheme. The instruction decompressor was designed and implemented in VHDL and synthesized using Cadence RTL Compiler into three process technologies provided by STMicroelectronics: 130-nm, 90-nm and 65-nm. We have shown that various parameters of the instruction decompressor greatly impact the overall performance of FlexCore in hardware in terms of power, area and timing. These parameters include the formation of the LUT groups, the size of the LUTs and the LUT-Load instruction interval, meaning how often the LUTs need to be updated and how many LUTs are updated by a single LUT-Load instruction. It would be interesting to compare the average toggle rate on NISA for the test vectors used to compute the power results of the instruction decompressor with the average toggle rate on NISA for the benchmark applications used to compute the power results of the FlexCore, as this could give a better idea of the power consumption of the instruction decompressor. The implemented instruction decompressor still needs to be verified; for this, real traces of compressed instructions produced by the compression algorithm for various benchmark applications are needed. With these real traces, an accurate power analysis of the instruction decompressor would be possible. Finally, it would be interesting to integrate the instruction decompressor into the FlexCore processor and verify the whole design using benchmark applications.
11 Acknowledgment
This work is partially supported by the Chinese Academy of Sciences (CAS) and The World Academy of Sciences (TWAS).
12 References:
1.	V. Sklyarov, I. Skliarova, A. Rjabov, A. Sudnitson, "Zynq-based System for Extracting Sorted Subsets from Large Data Sets," Informacije MIDEM - Journal of Microelectronics, Electronic Components and Materials, Vol. 45, No. 2 (2015), pp. 142-152.
2.	Abdul Rehman Buzdar, Liguo Sun, Azhar Latif and Abdullah Buzdar, "Distance and Speed Measurements using FPGA and ASIC on a high data rate system," International Journal of Advanced Computer Science and Applications (IJACSA), 6(10), 2015, pp. 273-282.
3.	J. Noguera, R.M. Badia, "HW/SW codesign techniques for dynamically reconfigurable architectures," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 10, no. 4, pp. 399-415, Aug. 2002.
4.	M. D. Edwards, et al., "Acceleration of software algorithms using hardware/software co-design techniques," J. Syst. Architecture, vol. 42, no. 9/10, 1997.
5.	ARM Inc., [Online]. Available: http://www.arm.com/.
6.	Yun Wu, J. Nunez-Yanez, R. Woods, D.S. Nikolopoulos, "Power modelling and capping for heterogeneous ARM/FPGA SoCs," IEEE International Conference on Field-Programmable Technology (FPT), Dec. 2014, pp. 231-234.
7.	D. Gebhardt, Junbok You, K.S. Stevens, "Design of an Energy-Efficient Asynchronous NoC and Its Optimization Tools for Heterogeneous SoCs," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 9, pp. 1387-1399, Sept. 2011.
8.	M.D. Grammatikakis, A. Papagrigoriou, P. Petrakis, G. Kornaros, "Monitoring-Aware Virtual Platform Prototype of Heterogeneous NoC-Based Multi-core SoCs," IEEE International Conference on Digital System Design (DSD), Sept. 2013, pp. 497-504.
9.	B. Ristau, T. Limberg, G. Fettweis, "A Mapping Framework for Guided Design Space Exploration of Heterogeneous MP-SoCs," IEEE International Conference on Design, Automation and Test in Europe (DATE), March 2008, pp. 780-783.
10.	Wu-An Kuo, TingTing Hwang, A.C.-H. Wu, "A power-driven multiplication instruction-set design method for ASIPs," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 14, no. 1, pp. 81-85, Jan. 2006.
11.	Yosi Ben Asher, Irina Lipov, Vladislav Tartakovsky, Dror Tiv, "Using Multi-op Instructions as a Way to Generate ASIPs with Optimized Pipeline Structure," IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2014, pp. 29.
12.	Hong Chinh Doan, H. Javaid, S. Parameswaran, "Using Multi-op Instructions as a Way to Generate ASIPs with Optimized Pipeline Structure," IEEE International Conference on Design, Automation and Test in Europe (DATE), March 2014, pp. 1-6.
13.	M. Jacome and G. de Veciana, "Lower bound on latency for VLIW ASIPs," Proc. of ACM/IEEE International Conference on Computer Aided Design (ICCAD), 1999.
14.	K. Keutzer, S. Malik, and A. R. Newton, "From ASIC to ASIP: The next design discontinuity," in Proc. Int. Conf. on Computer Design, 2002, pp. 84-90.
15.	M. Thuresson, M. Sjalander, M. Bjork, L. Svensson, P. Larsson-Edefors, and P. Stenstrom, "FlexCore: Utilizing exposed datapath control for efficient computing," Springer J. of Signal Processing Systems, vol. 57, no. 1, pp. 5-19, Oct. 2009.
16.	T. Schilling, M. Sjalander, and P. Larsson-Edefors, "Scheduling for an embedded architecture with a flexible datapath," in Proc. IEEE Computer Society Annual Symp. on VLSI, 2009, pp. 151-156.
17.	J. Hughes, K. Jeppson, P. Larsson-Edefors, M. Sheeran, P. Stenstrom, and L.J. Svensson, "FlexSoC: Combining flexibility and efficiency in SoC designs," in Proc. IEEE NorChip Conf., 2003.
18.	M. Sjalander, P. Larsson-Edefors, and M. Bjork, "A flexible datapath interconnect for embedded applications," in Proc. IEEE Computer Society Annual Symp. on VLSI, 2007, pp. 15-20.
19.	U. Jalmbrant and E. der Hagopian, "Improved configurability with FlexSoC," M.Sc. thesis, Chalmers University of Technology, Mar. 2009.
20.	Tung Thanh Hoang, Ulf Jalmbrant, Erik der Hagopian, Kasyab P. Subramaniyan, Magnus Sjalander, and Per Larsson-Edefors, "Design Space Exploration for an Embedded Processor with Flexible Datapath Interconnect," in Proc. of IEEE Int. Conf. on Application-specific Systems, Architectures and Processors, 2010, pp. 55-62.
21.	J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Elsevier Publisher Inc., 2007.
22.	MIPS Technologies Inc., [Online]. Available: http://www.mips.com/.
23.	Bita Gorjiara and Daniel Gajski, "Custom Processor Design Using NISC: a Case-Study on DCT Algorithm," in Workshop on Embedded Systems for Real-Time Multimedia, September 2005.
24.	Bita Gorjiara, Mehrdad Reshadi, and Daniel Gajski, "Designing a Custom Architecture for DCT Using NISC Design Flow," in Asia and South Pacific Conference on Design Automation, 2006.
25.	B. Gorjiara, D. Gajski, "FPGA-friendly code compression for horizontal microcoded custom IPs," Proceedings of the 2007 ACM/SIGDA 15th international symposium on Field programmable gate arrays (ISFPGA), ACM Press, 2007.
26.	M. Reshadi and D. Gajski, "A cycle-accurate compilation algorithm for custom pipelined datapaths," in Proc. 3rd IEEE/ACM/IFIP Int. Conf. on Hardware/Software Codesign and System Synthesis, 2005, pp. 21-26.
27.	B. Gorjiara, M. Reshadi, and D. Gajski, "Merged dictionary code compression for FPGA implementation of custom microcoded PEs," ACM Trans. Reconfigurable Technol. Syst., vol. 1, pp. 11:1-11:21, Jun. 2008.
28.	Embedded Microprocessor Benchmark Consortium (EEMBC), [Online]. Available: http://www.eembc.org.
29.	Muhammad Waqar Azhar, Tung Thanh Hoang, and Per Larsson-Edefors, "Cyclic Redundancy Checking (CRC) Accelerator for the FlexCore Processor," in Proc. of EUROMICRO Conf. on Digital System Design, 2010, pp. 675-680.
30.	Muhammad Waqar Azhar, Magnus Sjalander, Hasan Ali, Akshay Vijayashekar, Tung Thanh Hoang, K. K. Ansari, and Per Larsson-Edefors, "Viterbi Accelerator for Embedded Processor Datapaths," in Proc. of IEEE Int. Conf. on Application-specific Systems, Architectures and Processors, 2012.
31.	Cadence EDA Tools. [Online]. Available: www.cadence.com/en/default.aspx
32.	STMicroelectronics. [Online]. Available: www.st.com/web/en/home.html
33.	Kasyab P. Subramaniyan, Erik Ryman, Magnus Sjalander, Tung Thanh Hoang, Mafijul Md Islam, and Per Larsson-Edefors, "FlexDEF: A Toolchain Framework for Processor Development," in Proc. of IEEE Conf. on Ph.D. Research in Microelectronics and Electronics, 2011, pp. 37-40.
34.	Erik Ryman, Kasyab P. Subramaniyan, Tung Thanh Hoang, Mafijul Md Islam, Magnus Sjalander, and Per Larsson-Edefors, "FlexTools: Design Space Exploration Tool Chain from C to Physical Implementation," in Proc. of the Fifth Annual Cadence User Conf., 2010.
35.	M. Thuresson, M. Sjalander, L. Svensson, and P. Stenstrom, "A flexible code compression scheme using partitioned look-up tables," in Proc. Int. Conf. on High Performance Embedded Architectures and Compilers, 2009, pp. 95-109.
Arrived: 26. 05. 2015 Accepted: 24. 12. 2015
Original scientific paper
Informacije
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), 237 - 248
Angle of Arrival estimation algorithms using Received Signal Strength Indicator
Marko Malajner, Dušan Gleich, Peter Planinšič
University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Abstract: Angle of Arrival (AoA) is one of the few techniques used for localization in wireless sensor networks. With two measured angles and a known distance between two anchors, the position of an unknown node can be obtained. This paper presents our approach to AoA measurement using a combination of multiple omnidirectional antennas on low-cost ZigBee modules, which enable measurement of the Received Signal Strength Indicator (RSSI). The omnidirectional microstrip antenna used has an almost symmetrical radiation pattern with sharp minima along the antenna's x axis. Therefore, the proposed algorithms take the angle of arrival as the direction in which the measured RSSI is minimal. This paper presents the proposed methods and algorithms and compares them.
Keywords: Angle of Arrival, Localization, Wireless Sensor Networks, RSSI
Algorithms for estimating the angle of arrival of an RF signal
Abstract: The angle of arrival of a signal is one of the localization techniques used in wireless sensor networks. With two measured angles and a known distance between two beacons, the position of an unknown node can be determined. In this paper we describe our principle of angle-of-arrival estimation using a combination of several omnidirectional antennas. We used low-cost ZigBee modules that enable measurement of the received signal power, and extracted the angle of arrival from it. The antenna built into the modules is practically omnidirectional, with a fairly symmetrical radiation pattern that has two sharp edges along the x axis of the antenna. The angle of arrival of the signal lies in the direction in which the minimal values of received power were measured. In this paper we describe several different methods and the corresponding algorithms that we developed, and compare them.
Keywords: Angle of arrival of RF signal, Localization, Wireless sensor networks, RSSI
* Corresponding Author's e-mail: marko.malajner@um.si
1 Introduction
Localization is a crucial mechanism for obtaining the locations of data sources or sinks within wireless sensor networks (WSN). A WSN consists of small wireless sensor nodes (hereafter simply called sensors or nodes) equipped with environmental sensing devices, power sources, radio and processor units. Sensors can communicate with each other or with the base station. In many cases the sensors are randomly deployed in fields where location information is not available in advance, so a localization system must be run on the sensor field to locate each sensor.
Two physical values are primarily measured, separately or in combination, when positioning wireless sensor nodes: (i) the time of flight (ToF) of the radio frequency (RF) signal and (ii) the power of the received RF signal. The distances or angles between the nodes are then calculated from these primary values (via trilateration or triangulation) [20] to determine the location of each node. Three main localization techniques are known in WSN: (i) ToF - Time of Flight, (ii) RSSI - Received Signal Strength Indicator and (iii) AoA - Angle of Arrival [1, 22]. Measuring ToF is costly because RF signals propagate at the speed of light, so a timing error of only 1 ns already corresponds to a ranging error of about 30 cm; the nodes must therefore share a common, accurate clock or exchange timing information using a protocol such as two-way ranging [2, 3]. Most IEEE 802.11 and 802.15.4 radio modules support RSSI measurement, which gives the received power of each received packet [6]. The power or energy distribution of an RF signal traveling between two nodes is
© MIDEM Society
B. Bertoncelj et al; Informacije Midem, Vol. 45, No. 3 (2015), 216 - 221
a signal parameter that can be used for distance estimation, depending on path-loss and shadowing effects. RSSI measurements are very uncertain because of disturbances such as multiple delayed multipath signals arriving at the receiver [4, 5]. AoA, i.e., the directions to neighboring sensors, can be estimated from ToF and/or RSSI. There are two common ways for sensors to measure AoA: (i) multiple static smart antennas or arrays and (ii) rotatable antennas.
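How RSSI maps to distance, and why its uncertainty matters, can be illustrated with the widely used log-distance path-loss model. The paper does not state which propagation model it assumes; the reference power P0 = -40 dBm at d0 = 1 m and the path-loss exponent n = 2 below are illustrative values only, and shadowing is not modeled.

```python
import math


def rssi_at(d_m, p0_dbm=-40.0, n=2.0, d0_m=1.0):
    """Log-distance path-loss model: RSSI(d) = P0 - 10 n log10(d / d0)."""
    return p0_dbm - 10.0 * n * math.log10(d_m / d0_m)


def distance_from_rssi(rssi_dbm, p0_dbm=-40.0, n=2.0, d0_m=1.0):
    """Invert the model (no shadowing or multipath term)."""
    return d0_m * 10 ** ((p0_dbm - rssi_dbm) / (10.0 * n))


print(rssi_at(10.0))                        # -60.0
print(round(distance_from_rssi(-63.0), 1))  # 14.1
```

With these parameters a node at 10 m reads -60 dBm, and a 3 dB error in that reading (for example from multipath) moves the distance estimate to about 14 m.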
Localization systems that use multiple static antennas or arrays for AoA measurements are usually called smart antennas, because they use signal processing units for AoA measurement. The most common method is to use an antenna array with known geometry and to measure the differences in signal arrival times at the individual antennas [7]. The authors of [8] estimate AoA at each receiving node via frequency measurement of the local RSSI. Another method uses antenna arrays with RSSI measurements on each antenna, and the AoA is estimated from the RSSI ratios [9, 10, 11]. AoA systems can also use beamforming [12, 13]. Beamforming on a static antenna system can shape the radiation pattern without physically shaping the antennas, for example by switching individual antennas "on" or "off" according to various algorithms. Adaptive smart antennas are the second category, in which an adaptive beam is formed based on the directions of the desired and interfering sources [13]. In [14] the authors proposed AoA estimation based on subarray beamforming, where the target AoA is estimated from the phase shift introduced into the target signal by subarray beamforming. Many algorithms for estimating AoA from antenna arrays or smart antennas can be found in the literature, such as MUSIC [15], root-MUSIC [16], the maximum likelihood estimator [17] and ESPRIT [18].
Many AoA techniques use rotatable antennas whose radiation pattern is shaped in one direction, and search for the direction of maximal received power as the antenna rotates about its axis [19, 21].
This paper presents our proposed methods of AoA estimation using a system of multiple omnidirectional monopole antennas, static or rotatable, placed on the circumference of a circular plate. The radiation pattern of the monopole antenna used is not ideally isotropic and has a narrow minimum in the direction of the antenna axis. The idea presented in this paper is to find the direction from which the minimal RSSI is measured. Searching for the minimum was chosen because it gives better selectivity, owing to the narrow, sharp minimum in the radiation pattern of the antenna system. Rotatable antennas have the advantage of better resolution, whereas a static plate has the advantage that all movable parts are eliminated. We also show that the static system has a slightly lower resolution, which can be improved by interpolation or approximation.
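The minimum-search idea, together with one possible interpolation step, can be sketched in Python as follows. It assumes RSSI samples taken at N uniformly spaced directions covering the full circle, and refines the coarse minimum by fitting a parabola through the minimum sample and its two neighbors. The parabolic refinement is a generic technique chosen for illustration, not necessarily the interpolation or approximation used by the authors.

```python
def estimate_aoa(angles_deg, rssi_dbm):
    """Angle (degrees) of minimal RSSI, refined by parabolic interpolation.

    ASSUMES angles_deg are uniformly spaced, ascending, and cover the full
    circle (e.g. 0, 10, ..., 350), so neighbours wrap around at 0/360.
    """
    n = len(angles_deg)
    if n < 3 or n != len(rssi_dbm):
        raise ValueError("need >= 3 matching angle/RSSI samples")
    step = 360.0 / n
    i = min(range(n), key=lambda k: rssi_dbm[k])       # coarse minimum
    y_prev = rssi_dbm[(i - 1) % n]
    y_min = rssi_dbm[i]
    y_next = rssi_dbm[(i + 1) % n]
    curvature = y_prev - 2.0 * y_min + y_next
    # Vertex of the parabola through the three samples, in units of `step`
    offset = 0.0 if curvature == 0 else 0.5 * (y_prev - y_next) / curvature
    return (angles_deg[i] + offset * step) % 360.0
```

On a synthetic quadratic dip centered at 37° and sampled every 10°, the coarse minimum is 40° and the refined estimate is 37°; the modulo indexing also handles dips that straddle 0°/360°.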
The rest of the paper is organized as follows: Section 2 provides the theoretical background, Section 3 presents the hardware used, Section 4 presents the experimental verification of the proposed method with rotatable antennas, Section 5 presents the experimental verification of the proposed method with static antennas, Section 6 compares the methods, and Section 7 concludes the paper.
2 Theoretical background of our approach
The dipole antenna is considered an omnidirectional antenna. An ideal isotropic radiator would have a spherical radiation pattern, whereas a real dipole has some directivity and radiates more weakly in the direction of the antenna axis. The performance of a linearly polarized antenna, which only approximates an ideal antenna, is described by its E- and H-plane patterns [23]. The E-plane (E in V/m) and the H-plane (H in A/m) are perpendicular to each other. Half the real part of the cross-product of E and H* (where * denotes complex conjugation) gives the time-average Poynting vector of the radiated field.
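For the far field of the dipole, this time-average Poynting vector can be written out explicitly. The short derivation below assumes far-field conditions, where $\mathbf{E} = E_\theta\hat{\boldsymbol{\theta}}$ and $\mathbf{H} = H_\phi\hat{\boldsymbol{\phi}}$ with $E_\theta = Z_0 H_\phi$ and $Z_0 = \sqrt{\mu_0/\varepsilon_0}$ the free-space impedance. These are standard plane-wave relations and are not stated in the original text.

$$\mathbf{S}_{\mathrm{avg}} = \tfrac{1}{2}\,\mathrm{Re}\{\mathbf{E}\times\mathbf{H}^{*}\} = \tfrac{1}{2}\,\mathrm{Re}\{E_\theta H_\phi^{*}\}\,\hat{\mathbf{r}} = \frac{|E_\theta|^{2}}{2Z_0}\,\hat{\mathbf{r}}$$

Because $|E_\theta| \propto \sin\theta$ for a short dipole, the radiated power density is proportional to $\sin^2\theta$. It therefore vanishes along the antenna axis, which is the property the proposed method exploits.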
The orientation of the short dipole antenna used is shown in Figure 1. Using the approximation of a triangular current distribution, the radiation pattern of the antenna can be expressed in spherical coordinates as:
Figure 1: Orientation of antenna in coordinate system
$$E_\theta \approx j Z_0 \frac{k I d}{8\pi r}\, e^{-jkr} \sin\theta \tag{1}$$

$$H_\phi \approx j \frac{k I d}{8\pi r}\, e^{-jkr} \sin\theta \tag{2}$$
where $k = \omega\sqrt{\mu_0\varepsilon_0} = 2\pi/\lambda$, $Z_0 = \sqrt{\mu_0/\varepsilon_0}$, $r$ is the length of the spherical vector $\mathbf{r} = (r, \theta, \phi)$, $\theta$ is the azimuth angle and $\phi$ is the latitude angle. $I$ is the source current and $d$ is the length of the antenna.
M. Malajner et al; Informacije Midem, Vol. 45, No. 4 (2015), 237 - 248
Figure 2 shows the 2-D omnidirectional radiation pattern of the antenna used in our experiments, simulated with Ansoft HFSS [25].
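Equations (1) and (2) can be evaluated numerically to check two properties used throughout the paper: the ratio $E_\theta/H_\phi$ equals $Z_0$, and the pattern has nulls along the antenna axis (θ = 0° and 180°). The sketch below is a minimal one. The current, antenna length, distance and frequency values are arbitrary illustrative assumptions, and the function name is hypothetical.

```python
import numpy as np

MU0 = 4e-7 * np.pi          # vacuum permeability (H/m)
EPS0 = 8.8541878128e-12     # vacuum permittivity (F/m)
Z0 = np.sqrt(MU0 / EPS0)    # free-space impedance, about 376.7 ohm


def short_dipole_fields(theta, r, current, length, freq):
    """Far-field E_theta (V/m) and H_phi (A/m) of a short dipole with a
    triangular current distribution, as in equations (1) and (2)."""
    omega = 2 * np.pi * freq
    k = omega * np.sqrt(MU0 * EPS0)  # wavenumber, equal to 2*pi/lambda
    common = 1j * k * current * length * np.exp(-1j * k * r) / (8 * np.pi * r)
    e_theta = Z0 * common * np.sin(theta)
    h_phi = common * np.sin(theta)
    return e_theta, h_phi


theta = np.radians(np.arange(0.0, 360.0, 1.0))  # 1-degree steps, index = degrees
e, h = short_dipole_fields(theta, r=2.0, current=0.01, length=0.01, freq=2.4e9)
# Normalised pattern in dB; the small offset avoids log10(0) on the antenna axis.
pattern_db = 20 * np.log10(np.abs(e) / np.abs(e).max() + 1e-12)
print(np.argsort(pattern_db)[:2])  # indices (= degrees) of the two deepest points
```

In this idealized model, both nulls are equally deep. Only the small asymmetry of a real antenna, discussed below, makes one of them the global minimum.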
Figure 2: E-plane radiation of used antenna.
The radiation pattern in Figure 2 shows all the characteristics of a short dipole. The right side of Figure 2 shows the orientation of the antenna with respect to the radiation pattern. The pattern has two zones of slightly decreased intensity, at 0° and 180°. An RF signal arriving from a direction in which the antenna is less sensitive produces a lower RSSI. To find analytically the angle θ of minimum radiation, we solve the following equation:
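$$F(\theta, \phi) = 0 \tag{3}$$
where $F(\theta, \phi)$ denotes the normalized radiation pattern, which for the short dipole of (1) is proportional to $|\sin\theta|$.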
For the simplified dipole pattern, the analytical minima therefore appear every 180°, i.e. at $\theta = k\pi$, $k \in \mathbb{Z}$. A real dipole shows a small difference between the two minima, as can be seen in Figure 2 and Figure 3. This difference is a few dB and can be detected by the algorithm. Therefore, the global minimum of a real dipole appears only at integer multiples of $2\pi$, i.e. $\theta = k \cdot 2\pi$, $k \in \mathbb{Z}$.
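To see why this asymmetry matters for a minimum search, the following sketch uses a hypothetical "real" dipole model, not the HFSS-simulated pattern. The model takes the ideal $|\sin\theta|$ and adds a small residual floor of 0.10 at 0° and 0.07 at 180°, so that the two nulls are about 3 dB apart. With equal nulls, an arg-min search is ambiguous. With the asymmetry, it returns a single global minimum per revolution.

```python
import numpy as np


def real_dipole_model(theta_deg):
    """HYPOTHETICAL 'real' dipole pattern: ideal |sin(theta)| plus a residual
    floor of 0.10 at 0 deg and 0.07 at 180 deg (nulls about 3 dB apart)."""
    theta = np.radians(theta_deg)
    floor = 0.085 + 0.015 * np.cos(theta)
    return np.sqrt(np.sin(theta) ** 2 + floor ** 2)


angles = np.arange(0.0, 360.0, 0.5)
pattern = real_dipole_model(angles)
global_min = angles[np.argmin(pattern)]           # single global minimum
null_gap_db = 20 * np.log10(real_dipole_model(0.0) / real_dipole_model(180.0))
print(global_min, round(null_gap_db, 1))          # 180.0 3.1
```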
In this paper, two methods based on the proposed approach are presented and compared: i) a system with 4 rotatable antennas, partly presented in [4], and ii) a system with 12 static antennas that uses approximations and interpolations to obtain the AoA, partly presented in [26].
Figure 3: Radiation pattern of a single antenna (amplitude of the electric field versus angle θ in degrees).
3 Hardware set-up
In our case, the device for the proposed AoA measurement consists of transceivers (used as receivers) placed on a circular plate. The plate with the MRF24J40MA transceivers is attached to our SPaRCMosquito WSN board, which has an NXP Cortex-M3 microcontroller and a battery power source. The transceivers use a microstrip antenna operating in the 2.4 GHz ISM band. The monopole antenna uses the ground plane of the MRF24J40MA module and the ground plane of the circular plate as a counterpoise. The additional ground plane on our circular plate enhances the performance of the module [27] and does not significantly reshape the radiation pattern.
This section presents two versions of the proposed hardware. The first is rotatable and has 4 antennas. The second is static (without mechanical moving parts) and has 12 antennas.
3.1 Rotatable antennas
The first version of the receiving device for AoA measurement consists of four transceivers placed on a circular plate (Fig. 4a), a stepper motor (Fig. 4c), a stepper-motor driver circuit (Fig. 4d), WSN board-1 for controlling the receiver plate (Fig. 4b), and WSN board-2 for communication with the computer (Fig. 4e). A 200-steps-per-revolution motor rotates the plate with the transceivers. The stepper-motor driver is the well-known L298 integrated circuit. The plate with the four transceivers is attached to WSN board-1. The entire AoA measuring device, including its power source, is mounted on the axis of the stepper motor, which eliminates the need for slip rings for communication and power supply. WSN board-2 also controls the stepper motor.
During the experiments, the stepper motor turns the plate with the receivers in steps of 3.6°. The transmitter (Fig. 4f) continually transmits an RF signal. At each position of the plate, the receivers measure the RSSI, and the data are sent to WSN board-1, which communicates wirelessly with WSN board-2. The transmitter sends about 40 packets at each position of the stepper motor. WSN board-2 is connected via USB to a laptop, where the data are collected and later processed to estimate the AoA.
Figure 4: The rotatable AoA measuring experimental set-up.
3.2 Static antennas
We built the simplest static version of the hardware in order to eliminate mechanical moving parts, which are impractical in real use of the device. Instead of using the stepper motor, we added more antennas to the circular plate to achieve the same effect as with four rotatable antennas. The static measurement device is shown in Figure 5.
The measurement procedure is as follows. The transmitter (Figure 5c) continually transmits RF signals during the experiments. The SPaRCMosquito boards (Figure 5b) collect the RSSI values from each receiver on the circular plate (Figure 5a). Approximately 40 packets of RSSI data are collected and averaged. Each transmitted packet is received by all receivers simultaneously. The averaged RSSI data are then sent to the laptop for further processing.
4 Description and experimental verification of the proposed method with rotatable antennas
The presence of moving objects or people in the vicinity of the transceivers causes large fluctuations in the RSSI measurements. Therefore, all experiments were carried out in environments without obstacles or moving objects, with line of sight (LOS) between the transceivers. The experiments were conducted both indoors and outdoors for different distances between the transceivers.
Figure 5: Static AoA measuring system. a) Plate with transceivers. b) Main measurement control device (SPaRCMosquito). c) SPaRCMosquito as a transmitter.
4.1 Algorithm for AoA estimation method with rotatable multiple antennas
The goal of the algorithm is to find the angle at which the RSSI minimum appears. The RSSI is gathered from each receiver as follows. First, the transmitter sends one packet with dummy data. All transceivers on the circular plate are in receiving mode, waiting for an incoming packet. When the receivers receive a packet, they calculate the RSSI from the voltage of the incoming RF signal. One of the receivers is connected to a microcontroller interrupt line. When this receiver sends an interrupt to the microcontroller, the reading phase starts, and the microcontroller reads the RSSI data from each receiver in sequence via SPI. When the reading is done, the command for the next motor step is sent. This procedure is repeated until the motor completes a full revolution. All readings are sent to the PC for further processing. There, purpose-built software stores them in files that contain the RSSI readings of each antenna together with the motor position. These files serve as input to the Matlab algorithm, which estimates the AoA by analyzing the RSSI data. The algorithm first shifts the RSSI curves of the antennas, then averages them and searches for the minimum of the averaged curve. It reads the angle at which the global minimum appears and adds ±180° to obtain the estimated AoA. Figure 6 shows the principle of the proposed method.
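The shift-and-average step can be sketched as follows. Several assumptions are not specified in the text. Antenna $i$ of $N$ is assumed to be mounted at $i \cdot 360°/N$ on the plate, and every curve is sampled at the same 100 plate positions (3.6° steps). The synthetic test data come from a hypothetical dipole model whose two nulls differ by about 3 dB. The function names are illustrative only, and the authors' Matlab implementation is not reproduced here.

```python
import numpy as np

STEP_DEG = 3.6                        # plate rotation step used in the paper
N_POS = 100                           # 360 / 3.6 plate positions per revolution
ANGLES = np.arange(N_POS) * STEP_DEG  # plate angle of each position (deg)


def estimate_aoa_rotating(rssi, step_deg=STEP_DEG):
    """rssi: (n_antennas, n_positions) averaged RSSI in dBm per plate position.
    ASSUMPTION: antenna i is mounted at i*360/n_antennas degrees on the plate.
    Returns the estimated AoA in degrees: angle of the global minimum + 180."""
    n_ant = rssi.shape[0]
    aligned = np.empty_like(rssi)
    for i in range(n_ant):
        offset_steps = int(round(i * 360.0 / n_ant / step_deg))
        # re-express antenna i's curve in the angular frame of antenna 0
        aligned[i] = np.roll(rssi[i], offset_steps)
    mean_curve = aligned.mean(axis=0)  # average the shifted curves
    angle_of_min = np.argmin(mean_curve) * step_deg
    return (angle_of_min + 180.0) % 360.0


def synthetic_rssi(true_aoa, n_ant=4, rng=None):
    """HYPOTHETICAL data: each antenna's RSSI is lowest when its axis points at
    true_aoa + 180 deg (the paper's convention) and ~3 dB higher at the other null."""
    rows = []
    for i in range(n_ant):
        axis = ANGLES + i * 360.0 / n_ant
        psi = np.radians(axis - (true_aoa + 180.0))
        gain = np.sqrt(np.sin(psi) ** 2 + (0.085 - 0.015 * np.cos(psi)) ** 2)
        rows.append(-50.0 + 20.0 * np.log10(gain))
    rssi = np.array(rows)
    if rng is not None:
        rssi = rssi + rng.normal(0.0, 0.5, rssi.shape)  # measurement noise in dB
    return rssi


print(round(float(estimate_aoa_rotating(synthetic_rssi(54.0))), 3))  # 54.0
```

Averaging the aligned curves suppresses uncorrelated noise on the individual antennas. The depth difference between the two nulls is what keeps the global minimum unique.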
Figure 6: Top view of the measuring device and the coordinate system for measuring the AoA. a) Initial set-up with unknown position of the transmitter (Tx); bar graphs show the measured RSSI values. b) The scenario in which the true AoA is obtained; the minima of the measured RSSI values on Rx1 and Rx3 can be observed in the bar graphs.
4.1.1 Outdoor experiments
Outdoor experiments were carried out on an asphalt surface with line of sight between the transceivers. The transceivers were placed on a rack 1 m above the ground. The true AoA was set at 54°. Figure 7 shows the measured RSSI versus rotation angle at a distance of 2 m between the transmitter and the receiver.
Figure 7: Shifted and averaged RSSI vs. angle at distance 2m outdoor.
In some cases, the measurements were corrupted by reflections from the ground and other obstacles.
4.1.2 Indoor experiments
Indoor experiments were carried out in a 6 m × 8 m room with LOS between the transceivers, and the true angle was set at 54°. The transceivers were placed at a height of 1 m. The experiments were carried out in the same way as the outdoor experiments. Figure 8 shows the RSSI received by all antennas versus angle. The indoor measurements were influenced by signals reflected from the walls, floor, ceiling, furniture and other objects. Therefore, the RSSI curves were less smooth, with more local minima and maxima, and it was more difficult to estimate the AoA accurately than in the outdoor measurements. These effects are clearly visible in Figure 9, where many minima can be observed and the algorithm therefore fails to obtain the correct AoA.
Figure 8: Shifted and averaged RSSI vs. angle at distance 3m indoor.
Figure 9: Shifted and averaged RSSI vs. angle at distance 5m indoor.
4.2 Outdoor and indoor accuracies
The transmitter was placed at distances from 1 m to 6 m in steps of 1 m. All measurements were repeated three times at the same positions of the transceivers. Figure 10 shows the averaged errors between the estimated and true AoA for three complete estimation procedures at different distances in the outdoor environment. The maximum absolute mean error of the estimated AoA was about 4°.
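When errors like those in Figures 10 and 11 are computed, angular differences should be wrapped. For example, an estimate of 2° for a true AoA of 358° is then a 4° error rather than 356°. This wrapping is our assumption, because the text does not say how errors were computed. The sketch also converts an error into the percentage of 360° that Section 6 uses. All function names are hypothetical.

```python
def angular_error(estimated_deg, true_deg):
    """Signed error wrapped to [-180, 180): 2 deg vs 358 deg gives +4, not -356."""
    return (estimated_deg - true_deg + 180.0) % 360.0 - 180.0


def mean_abs_error(estimates_deg, true_deg):
    """Mean absolute wrapped error over repeated estimation runs."""
    errors = [abs(angular_error(e, true_deg)) for e in estimates_deg]
    return sum(errors) / len(errors)


def relative_error_percent(abs_error_deg):
    """Error expressed as a percentage of a full 360-degree revolution."""
    return 100.0 * abs_error_deg / 360.0


# HYPOTHETICAL example: three runs for a true AoA of 54 deg.
err = mean_abs_error([50.0, 58.0, 55.0], 54.0)
print(err, relative_error_percent(err))  # 3.0 0.8333333333333334
```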
Figure 10: The error of the estimated AoA from all outdoor tests as a function of the distance between transmitter and receiver.
Fig. 11 shows the errors between the estimated and true AoA at different distances in the indoor environment. The angle was estimated three times using the complete proposed algorithm. The maximum absolute mean error was about 8°.
Figure 11: The errors of the estimated AoA from all indoor tests as a function of the distance between transmitter and receiver.
5 Description and experimental verification of the proposed method with static antennas
5.1 Simple algorithm for AoA estimation with multiple static antennas
The radiation pattern of a monopole is well known and can be calculated from equations (1) and (2). With the proposed antenna arrangement, one can imagine twelve static dipole patterns placed around the circle. For clarity, Figure 12 shows only 4 of the 12 antenna patterns (E-planes).
Figure 12: Principle of AoA measuring system with multiple static antennas.
The RSSI values are gathered from each receiver as follows. First, the transmitter sends one packet with dummy data. All transceivers on the circular plate are in receiving mode, waiting for an incoming packet. When the receivers receive a packet, they calculate the RSSI from the voltage of the incoming RF signal. One of the receivers is connected to a microcontroller interrupt line. When this receiver sends an interrupt to the microcontroller, the reading phase starts, and the microcontroller reads the RSSI data from each receiver in sequence via SPI. When the reading is done, the transmitter sends a new data packet. The RSSI measurement is repeated until 40 packets have been processed and averaged, which completes the on-line phase of the algorithm.
The off-line part of the algorithm is performed on a laptop. Dedicated software was developed to transfer the data from the measuring device to the laptop via USB. The software writes the averaged RSSI of each receiver, together with the receiver's number, to a file. This file serves as input to the Matlab algorithm, which estimates the AoA by analyzing the RSSI data. The simple algorithm searches for the receiver at which the minimum RSSI is measured, and the AoA lies in the opposite direction (±180°). Because the antennas are equally spaced around the circle, the resolution of this method is 360° divided by the number of receivers, in our case 30°.
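The on-line averaging and the off-line minimum search for the static plate can be sketched as follows. The sketch assumes, as in the examples of Section 5.2, that receiver $k$ points at $k \cdot 30°$. The function name and the synthetic packets are hypothetical.

```python
import numpy as np


def estimate_aoa_static(packets_rssi):
    """packets_rssi: (n_packets, n_receivers) RSSI readings in dBm.
    ASSUMPTION: receiver k points at k * 360 / n_receivers degrees.
    Returns (estimated AoA, resolution), both in degrees."""
    packets_rssi = np.asarray(packets_rssi, dtype=float)
    mean_rssi = packets_rssi.mean(axis=0)       # on-line phase: average the packets
    n_rx = mean_rssi.size
    resolution = 360.0 / n_rx                   # 30 deg for 12 receivers
    k_min = int(np.argmin(mean_rssi))           # receiver with the lowest RSSI
    aoa = (k_min * resolution + 180.0) % 360.0  # AoA lies opposite the minimum
    return aoa, resolution


# HYPOTHETICAL packets: 40 packets, 12 receivers, receiver 9 (270 deg) weakest.
rng = np.random.default_rng(0)
levels = np.full(12, -60.0)
levels[9] = -66.0
packets = levels + rng.normal(0.0, 1.0, size=(40, 12))
print(estimate_aoa_static(packets))  # (90.0, 30.0)
```

This reproduces the first outdoor case below: a minimum at the 9th antenna (270°) gives 270° − 180° = 90°. Any true AoA between two receivers is rounded to a receiver direction, hence the 30° resolution.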
5.2 Experimental estimations of AoA using simple algorithm
All experiments were carried out in real environments, both outdoors and indoors. Measurements were limited to line of sight (LOS) between the transmitter and the receivers. Moving and static obstacles caused significant fluctuations in the measured received power. Each experimental set-up was repeated three times and the results were averaged.
5.2.1 Outdoor experiments
Outdoor experiments were carried out on an asphalt surface with line of sight between the transceivers. The transceivers were placed on a rack 1 m above the ground. The transmitter was placed at distances from 1 m to 30 m in steps of 1 m. Figure 13 shows the measured RSSI points versus angle at a distance of 1 m between the transmitter and the receiver. In the first case, the true azimuth AoA was set at 90°. The algorithm searches for the angle with the minimal measured RSSI, which in this case was 270°, i.e. at the 9th antenna. The estimated AoA was 270° − 180° = 90°, which exactly matched the true AoA set in advance.
In the second case, the true AoA was set at 105°, between antennas 3 and 4. The RSSI measurements are shown in Figure 14. The AoA estimated by the algorithm was again 90°, so the estimation error was about 15°. This error is a consequence of the fixed resolution of the proposed method, in our case 30°, as mentioned in the previous section.
Figure 13: Outdoor measurements of RSSI versus angle at a distance of 1 m.
5.2.2 Indoor measurements
Indoor measurements were carried out in a 6 m × 8 m room and in an underground garage. The transceivers were placed at a height of 1 m. The experiments were carried out in the same way as the outdoor experiments.
Figure 14: Outdoor measurements of RSSI versus angle at a distance of 3 m and true AoA at 105°.
Figure 15 shows the measured RSSI versus azimuth angle at a distance of 1 m and a true AoA of 90°. The indoor measurements were influenced by signals reflected from the obstacles in the room. Because of these reflections, the RSSI measurements contained more local minima and maxima, and the differences between the local minima were smaller. As a result, it was more difficult to estimate the AoA accurately than in the outdoor measurements. Nevertheless, despite these small differences, the AoA estimate was very close to the true value. In the case depicted in Figure 15, the algorithm found the smallest RSSI at the 9th antenna (270°), which gives an estimate of 270° − 180° = 90°, equal to the true AoA.
Figure 15: Indoor measurements of RSSI versus angle at 1m
Figure 16 shows the case in which the true AoA was set at 105° and the distance between the transceivers was 3 m. The smallest RSSI was measured at 300°, so the algorithm estimated the AoA at 300° − 180° = 120°. The larger errors of indoor estimation were caused by signals reflected from the obstacles, which the method does not take into account. Because of these disturbing reflections, the estimation errors in the indoor environment were, in general, greater than the resolution of 30°.

Figure 16: Indoor measurements of RSSI versus angle at 5 m. True AoA was at 105°
5.3 Improved AoA estimation algorithm
To improve the resolution and accuracy of the simple algorithm, we interpolated or approximated the RSSI values between the measured points. We then searched for the angle at which the resulting curve has its minimum. Different interpolations and approximations were used and compared. All experiments in this subsection were performed outdoors at a distance of 2 m.
5.3.1 Linear and spline interpolations
The simplest interpolation connects the points with straight lines. However, this interpolation does not improve the accuracy of AoA estimation by minimum search, because the minimum of a piecewise-linear curve always lies at one of the measured points. The next interpolation is the cubic spline. Spline interpolation uses low-degree polynomials on each interval and chooses the polynomial pieces so that they fit together smoothly. Figure 17 shows the linear and spline interpolations for a true AoA of 90°, and Figure 18 shows them for a true AoA of 105°. When the true AoA was set in the direction of antenna 3, both the linear and the spline interpolation gave a minimum corresponding to about 90°. Figure 18 depicts a case in which the true AoA lay between antennas 3 and 4 (105°). With linear interpolation, the algorithm returned an AoA of 90°, an error of about 15°. With spline interpolation, the estimated AoA was 99° and the error was much lower, about 6°.
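The effect described above can be reproduced with a short sketch. A periodic linear interpolation always has its minimum at a measured point, whereas a periodic cubic spline can place the minimum between receivers. The sketch assumes that receiver $k$ points at $k \cdot 30°$, that the search grid step is 0.1°, and that the curve is periodic over 360°. The RSSI values are synthetic: a hypothetical Gaussian-shaped dip at 285°, i.e. a true AoA of 105°. NumPy's `np.interp` (with `period`) and SciPy's `CubicSpline` (with `bc_type="periodic"`) stand in for the Matlab functions used by the authors.

```python
import numpy as np
from scipy.interpolate import CubicSpline

ANT_ANGLES = np.arange(12) * 30.0                     # receiver k points at k*30 deg (assumed)
GRID = np.linspace(0.0, 360.0, 3600, endpoint=False)  # 0.1-degree search grid


def aoa_linear(rssi):
    """Minimum of the periodic piecewise-linear interpolation, plus 180 deg."""
    curve = np.interp(GRID, ANT_ANGLES, rssi, period=360.0)
    return (GRID[np.argmin(curve)] + 180.0) % 360.0


def aoa_spline(rssi):
    """Minimum of a periodic cubic spline through the twelve points, plus 180 deg."""
    x = np.append(ANT_ANGLES, 360.0)
    y = np.append(rssi, rssi[0])  # a periodic spline needs y[0] == y[-1]
    curve = CubicSpline(x, y, bc_type="periodic")(GRID)
    return (GRID[np.argmin(curve)] + 180.0) % 360.0


# HYPOTHETICAL measurement: Gaussian-shaped RSSI dip centred at 285 deg
# (true AoA 105 deg), lying between receivers 9 (270 deg) and 10 (300 deg).
offset = (ANT_ANGLES - 285.0 + 180.0) % 360.0 - 180.0  # wrapped distance to the dip
rssi = -60.0 - 8.0 * np.exp(-(offset / 40.0) ** 2)
print(aoa_linear(rssi), aoa_spline(rssi))
```

With this symmetric synthetic dip, the linear version lands on a receiver direction (90°), while the spline version lands close to 105°. Real measurements are asymmetric and noisy, so the gain will generally be smaller, as the 99° result above shows.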
5.3.2 Polynomial and Gaussian approximations
In this case, polynomial and Gaussian approximations were included in the algorithm. When the number of polynomial coefficients (the degree plus one) equals the number of data points (in our case twelve), the approximation becomes an interpolation, and the polynomial passes through all the data points. Gaussian processes are a powerful non-linear interpolation tool. They can be used not only to fit an interpolant that passes exactly through the given data points but also for regression, i.e. for fitting a curve through noisy data [28].
Figure 17: Outdoor measurement using linear and spline interpolation (distance 2 m).
Figure 18: Outdoor measurement using linear and spline interpolation at a true AoA of 105°.
The polynomial coefficients were calculated with Matlab's Curve Fitting Toolbox [29]. Fig. 19 shows the data approximated with a 9th-degree polynomial:
$$y(x) = p_1 x^9 + p_2 x^8 + \dots + p_8 x^2 + p_9 x + p_{10} \tag{4}$$
With polynomial approximation, the algorithm returned 91°. This is slightly closer to the true AoA than the result of linear interpolation, but the error was still about 14°.
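The 9th-degree fit to the twelve RSSI points can be illustrated with NumPy's `Polynomial.fit`, which rescales the angle axis internally to keep a high-degree fit well conditioned. The same sketch shows the interpolation case, where the number of coefficients equals the number of points. This is a sketch with synthetic data, not the Matlab Curve Fitting Toolbox workflow used in the paper. The minimum search is restricted to the measured range from 0° to 330°, because a polynomial, unlike the RSSI pattern, is not periodic.

```python
import numpy as np
from numpy.polynomial import Polynomial

ANT_ANGLES = np.arange(12) * 30.0  # receiver k points at k*30 deg (assumed)


def fit_poly(rssi, degree=9):
    """Least-squares polynomial fit; Polynomial.fit maps 0..330 deg onto [-1, 1]."""
    return Polynomial.fit(ANT_ANGLES, rssi, deg=degree)


def aoa_polynomial(rssi, degree=9, step=0.1):
    """Minimum of the fitted polynomial inside the measured range, plus 180 deg."""
    poly = fit_poly(rssi, degree)
    grid = np.arange(0.0, 330.0 + step / 2, step)  # no extrapolation beyond 330 deg
    return (grid[np.argmin(poly(grid))] + 180.0) % 360.0


# HYPOTHETICAL data: Gaussian-shaped RSSI dip centred at 285 deg.
offset = (ANT_ANGLES - 285.0 + 180.0) % 360.0 - 180.0
rssi = -60.0 - 8.0 * np.exp(-(offset / 40.0) ** 2)

p9 = fit_poly(rssi, 9)    # 10 coefficients for 12 points: an approximation
p11 = fit_poly(rssi, 11)  # 12 coefficients for 12 points: an interpolation
print(np.max(np.abs(p9(ANT_ANGLES) - rssi)), np.max(np.abs(p11(ANT_ANGLES) - rssi)))
print(aoa_polynomial(rssi))
```

The first printed residual is clearly non-zero, and the second is at rounding-error level. High-degree polynomials can also oscillate between the points, which affects where the minimum is found.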
The Gaussian approximation used the general Gauss2 model, given by the equation:
$$y(x) = a_1 e^{-\left((x-b_1)/c_1\right)^2} + a_2 e^{-\left((x-b_2)/c_2\right)^2} \tag{5}$$
The result of the Gaussian approximation is shown in Fig. 20. The estimated AoA was practically exactly 105°, i.e. the error was about 0°. In this case, the Gaussian approximation was the best choice.
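Fitting the Gauss2 model of equation (5) and searching for its minimum can be sketched with SciPy's `curve_fit`. The starting values in `p0` are a heuristic of ours: a broad, nearly flat term for the background level and a narrow negative term at the lowest measured point. The test data are generated from the model itself, so the sketch only demonstrates the mechanics of the fit, not the accuracy reported above. Function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

ANT_ANGLES = np.arange(12) * 30.0  # receiver k points at k*30 deg (assumed)


def gauss2(x, a1, b1, c1, a2, b2, c2):
    """Two-term Gaussian model of equation (5)."""
    return (a1 * np.exp(-((x - b1) / c1) ** 2)
            + a2 * np.exp(-((x - b2) / c2) ** 2))


def aoa_gauss2(rssi, step=0.1):
    """Fit equation (5) to the RSSI points and return (AoA estimate, parameters)."""
    rssi = np.asarray(rssi, dtype=float)
    level = np.median(rssi)
    k_min = int(np.argmin(rssi))
    # heuristic start: broad, nearly flat background term + narrow dip at the lowest point
    p0 = [level, 180.0, 1000.0, rssi[k_min] - level, ANT_ANGLES[k_min], 30.0]
    params, _ = curve_fit(gauss2, ANT_ANGLES, rssi, p0=p0, maxfev=20000)
    grid = np.arange(0.0, 330.0 + step / 2, step)  # measured range only
    curve = gauss2(grid, *params)
    return (grid[np.argmin(curve)] + 180.0) % 360.0, params


# HYPOTHETICAL data generated from equation (5) itself (no noise).
true_params = (-60.0, 180.0, 1000.0, -8.0, 285.0, 40.0)
rssi = gauss2(ANT_ANGLES, *true_params)
estimate, fitted = aoa_gauss2(rssi)
print(round(float(estimate), 1))
```

With six parameters and only twelve points, the fit depends on a sensible starting point. On noisy data, bounds or several starting guesses may be needed.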
Figure 19: Outdoor measurement using a 9th-degree polynomial approximation (distance 2 m).
Figure 20: Outdoor measurement using Gaussian approximation at true AoA at 105°.
5.3.3 MUSIC estimator
For comparison with the interpolation techniques, we also implemented the MUSIC (Multiple Signal Classification) estimator [13] for obtaining the AoA. MUSIC is a well-known AoA estimation algorithm. Its inputs are usually the amplitude and phase of the received signal, but the MRF24J40 radio modules return only the signal amplitude (RSSI). The authors of [13] (Chapter 10, p. 343) describe a MUSIC estimator that neglects the signal phase. This RSSI estimator is defined as follows.
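The RSSI value on the i-th antenna can be written as:
$$s_i[k] = G_i(\theta)\, x[k] + n_i[k] \tag{6}$$
where $G_i(\theta)$ is the gain of the i-th antenna in direction θ, $n_i[k]$ is noise, and $x[k]$ is the signal, assumed to be uncorrelated from snapshot to snapshot $k$. Collecting the values of the N antennas into the vector $\mathbf{s}[k] = [s_1[k], \dots, s_N[k]]^T$, an estimate of the correlation matrix R of the received signal over K snapshots is: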
$$R \approx \frac{1}{K}\sum_{k=1}^{K} \mathbf{s}[k]\,\mathbf{s}^{H}[k] \tag{7}$$
By applying the singular value decomposition (SVD), R is factored into three matrices:
$$R = U S U^{H} \tag{8}$$
The space spanned by the signal is partitioned as $U = [U_s, U_n]$. The matrix $U_s$ contains the singular vectors corresponding to the largest singular values, and the matrix $U_n$ contains the singular vectors corresponding to the smallest singular values. $U_s$ spans the signal subspace and $U_n$ spans the noise subspace, which is the orthogonal complement of the signal subspace. Because U is a unitary matrix, the signal and noise subspaces are orthogonal, $U_s^{H} U_n = 0$. This orthogonality defines the pseudo-spectrum of the MUSIC algorithm:
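$$P_{\mathrm{MUSIC}}(\theta) = \frac{1}{\mathbf{G}^{H}(\theta)\, U_n U_n^{H}\, \mathbf{G}(\theta)} \tag{9}$$
where $\mathbf{G}(\theta) = [G_1(\theta), \dots, G_N(\theta)]^T$. The pseudo-spectrum exhibits a peak at an angle θ close to the actual angle $\theta_0$.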
The RSSI MUSIC estimator searches for regions where the signal has maxima. In our case, we need to search for the minima of the received signal, so we rearranged the input signal of the MUSIC estimator. The raw measured signal, depicted with dots in Fig. 21, is mirrored before it enters the MUSIC estimator. The mirrored signal exhibits peaks instead of minima, which is crucial for correct MUSIC estimation. Because of this modification, the parameters of the MUSIC estimator must be set appropriately. Instead of the antenna beam width, we used the width of the so-called 'cone of silence', the counterpart of the beam width, which in our case is 20°. Fig. 21 shows the mirrored measured signal and the pseudo-spectrum output by the MUSIC estimator. The amplitudes of both signals are scaled for better presentation.
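The RSSI-only MUSIC estimator of equations (6) to (9), applied to mirrored RSSI, can be sketched in NumPy. Several details are assumptions because the text does not specify them. The gain model $G_i(\theta)$ is a hypothetical Gaussian lobe around each receiver direction, with the 20° "cone of silence" width, normalized to unit length. The mirroring reference is the median of the per-antenna mean RSSI. A single source is assumed, and the snapshots are synthetic.

```python
import numpy as np

ANT_ANGLES = np.arange(12) * 30.0  # receiver k points at k*30 deg (assumed)
CONE_WIDTH = 20.0                  # "cone of silence" width (deg), used as lobe width


def gain_vector(theta_deg, width=CONE_WIDTH):
    """HYPOTHETICAL G(theta): Gaussian lobe around each receiver direction,
    normalised to unit length so that no direction is favoured by its norm."""
    d = (theta_deg - ANT_ANGLES + 180.0) % 360.0 - 180.0
    g = np.exp(-(d / width) ** 2)
    return g / np.linalg.norm(g)


def music_spectrum(snapshots, grid, n_sources=1):
    """snapshots: (n_antennas, K) array; implements equations (7)-(9)."""
    K = snapshots.shape[1]
    R = snapshots @ snapshots.conj().T / K  # sample correlation matrix, eq. (7)
    U, _, _ = np.linalg.svd(R)              # eq. (8); singular values descending
    Un = U[:, n_sources:]                   # noise subspace
    spectrum = np.empty(grid.size)
    for j, theta in enumerate(grid):
        proj = Un.conj().T @ gain_vector(theta)
        spectrum[j] = 1.0 / np.real(np.vdot(proj, proj))  # eq. (9)
    return spectrum


def aoa_music(rssi_snapshots):
    """rssi_snapshots: (n_antennas, K) raw RSSI in dBm, with a dip at the minimum."""
    reference = np.median(rssi_snapshots.mean(axis=1))  # ASSUMED mirroring level
    mirrored = reference - rssi_snapshots               # dips become peaks
    grid = np.arange(0.0, 360.0, 0.1)
    spectrum = music_spectrum(mirrored, grid)
    return (grid[np.argmax(spectrum)] + 180.0) % 360.0


# HYPOTHETICAL snapshots: dip at 285 deg (true AoA 105 deg), 40 packets.
rng = np.random.default_rng(1)
x = 1.0 + 0.3 * rng.standard_normal(40)  # uncorrelated snapshot amplitudes
d = (285.0 - ANT_ANGLES + 180.0) % 360.0 - 180.0
g_true = np.exp(-(d / CONE_WIDTH) ** 2)
rssi = -60.0 - 8.0 * np.outer(g_true, x) + 0.3 * rng.standard_normal((12, 40))
print(round(float(aoa_music(rssi)), 1))
```

Here the data are generated with the same lobe model that the estimator assumes. On real measurements, a mismatch between the assumed $G(\theta)$ and the true antenna pattern directly biases the peak.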
Figure 21: MUSIC estimator used on mirrored RSSI measurements.
5.4 Comparison of the accuracies of the improved algorithms with different interpolations/approximations
Interpolation improved the accuracy of AoA estimation with, in our case, multiple static omnidirectional antennas. With the simple algorithm, without interpolations or approximations, the resolution of the estimation was 30°. This section compares the results of the algorithms that use the different interpolations/approximations.
Figure 23: Indoor deviations at distance of 1 m.
The indoor measurements were more problematic because of signals reflected from the obstacles. Even though there was LOS between the transceivers, signals reflected from the floor, ceiling, walls, furniture, etc. caused large fluctuations. These reflections cause "fake" minima, which our improved algorithm cannot reliably eliminate. Therefore, the AoA estimation errors were larger in the indoor environment. Again, Fig. 23 shows that the best results were obtained with Gaussian approximation at all AoA measurement points.
Fig. 22 shows AoA estimation using the five methods (linear, spline, polynomial, Gaussian and MUSIC) at several pre-set AoAs. Measurements were carried out outdoors with LOS between the transceivers. Linear interpolation gave an average error of about 15°. Polynomial approximation was slightly better, with an error of about 10°. On average, the MUSIC estimator returned results similar to those of polynomial approximation. Spline interpolation and Gaussian approximation were much more accurate, with errors of about 6° and 4°, respectively.
Figure 22: Outdoor deviations at different pre-set AoA's (distance 2 m).
6 Comparison of accuracies using rotatable and static antennas
Fig. 24 gives an overall comparison of all the interpolation/approximation methods with both hardware versions and in both environments. As expected, the errors were larger in the indoor environment. The errors from all measurements at all distances were averaged and are presented as a percentage of 360° for each method. Rotatable antennas gave the best AoA estimation because of their higher resolution compared with static antennas, with relative errors of about 5% outdoors and 14% indoors. Among the static-antenna methods, Gaussian approximation gave the best results, with errors of about 3% outdoors and 18% indoors. Spline interpolation, linear interpolation and the MUSIC estimator returned similar results. Polynomial approximation gave the largest errors because of its oscillations. The results show that Gaussian approximation improved the accuracy of AoA estimation with static antennas.
Figure 24: Comparison of all proposed methods.
7 Conclusions
In this paper, we compared the accuracies of two AoA estimation methods: the first is based on rotatable multiple antennas and the second on multiple static antennas. The rotatable antennas have a resolution of 3.6°, and the static antennas have a resolution of 30°. We showed that the accuracy of static antennas can be improved with interpolations/approximations. However, the rotatable antennas gave better results than the static antennas, even with the improved algorithm. The static antennas, on the other hand, have the advantage that all moving parts are eliminated. The overall comparison shows that the rotatable antennas gave the best results (about 4% outdoor and 14% indoor relative error), followed closely by the static antennas with Gaussian approximation (about 5% outdoor and 18% indoor error).
8 References
1.	L. Bras, N.B. Carvalho, and P. Pinho. Pentagonal patch-excited sectorized antenna for localization systems. Antennas and Propagation, IEEE Transactions on, 60(3):1634-1638, 2012. doi:10.1109/TAP.2011.2180339.
2.	N. Patwari, A.O. Hero, M. Perkins, N.S. Correal, and R.J. O'Dea. Relative location estimation in wireless sensor networks. Signal Processing, IEEE Transactions on, 51(8):2137-2148, 2003. doi:10.1109/TSP.2003.814469.
3.	S. M. Lanzisera. RF Ranging for Location Awareness. PhD thesis, UC Berkeley, 2009.
4.	M. Malajner, K. Benkic, P. Planinsic, and Z. Cucej. The accuracy of propagation models for distance measurement between WSN nodes. In Systems, Signals and Image Processing, 2009. IWSSIP 2009. 16th International Conference on, pages 1-4, 2009. doi:10.1109/IWSSIP.2009.5367782.
5.	N. Patwari, J.N. Ash, S. Kyperountas, A.O. Hero, R.L. Moses, and N.S. Correal. Locating the nodes: cooperative localization in wireless sensor networks. Signal Processing Magazine, IEEE, 22(4):54-69, 2005. doi:10.1109/MSP.2005.1458287.
6.	M. Botta and M. Simek. Adaptive distance estimation based on RSSI in 802.15.4 network. Radioengineering, 22(4), December 2013.
7.	S. Maddio, A. Cidronali, and G. Manes. An azimuth of arrival detector based on a compact complementary antenna system. In Microwave Conference (EuMC), 2010 European, pages 1726-1729, 2010.
8.	Weile Zhang, Qinye Yin, Hongyang Chen, Feifei Gao, and N. Ansari. Distributed angle estimation for localization in wireless sensor networks. Wireless Communications, IEEE Transactions on, 12(2):527-537, 2013. doi:10.1109/TWC.2012.121412.111346.
9.	M. Abusultan, S. Harkness, B.J. LaMeres, and Yikun Huang. FPGA implementation of a Bartlett direction of arrival algorithm for a 5.8 GHz circular antenna array. In Aerospace Conference, 2010 IEEE, pages 1-10, 2010.
10.	M.R. Kamarudin, Y.I. Nechayev, and P.S. Hall. On-body diversity and angle-of-arrival measurement using a pattern switching antenna. Antennas and Propagation, IEEE Transactions on, 57(4):964-971, 2009. doi:10.1109/TAP.2009.2014597.
11.	Yuan Shen and M.Z. Win. On the accuracy of localization systems using wideband antenna arrays. Communications, IEEE Transactions on, 58(1):270-280, 2010. doi:10.1109/TCOMM.2010.01.080141.
12.	K.A. Gotsis, K. Siakavara, and J.N. Sahalos. On the direction of arrival (DOA) estimation for a switched-beam antenna system using neural networks. Antennas and Propagation, IEEE Transactions on, 57(5):1399-1411, May 2009. doi:10.1109/TAP.2009.2016721.
13.	S. A. (Reza) Zekavat and R. M. Buehrer, editors. Handbook of Position Location: Theory, Practice, and Advances. Wiley, 1st edition, 2012.
14.	Nanyan Wang, P. Agathoklis, and A. Antoniou. A new DOA estimation technique based on sub-array beamforming. Signal Processing, IEEE Transactions on, 54(9):3279-3290, Sept 2006. doi:10.1109/TSP.2006.877653.
15.	Qiaowei Yuan, Qiang Chen, and K. Sawaya. Accurate DOA estimation using array antenna with arbitrary geometry. Antennas and Propagation, IEEE Transactions on, 53(4):1352-1357, April 2005. doi:10.1109/TAP.2005.844409.
16.	Y. Takahashi, H. Yamada, and Y. Yamaguchi. Array calibration techniques for DOA estimation with arbitrary array using root-MUSIC algorithm. In Microwave Workshop Series on Innovative Wireless Power Transmission: Technologies, Systems, and Applications (IMWS), 2011 IEEE MTT-S International, pages 235-238, May 2011. doi:10.1109/IMWS.2011.5877119.
17.	Xin Chen, Yu. Morton, and F. Dovis. A computationally efficient iterative MLE for GPS AOA estimation. Aerospace and Electronic Systems, IEEE Transactions on, 49(4):2707-2716, October 2013. doi:10.1109/TAES.2013.6621847.
18.	R.L. Johnson and G.E. Miner. An operational system implementation of the ESPRIT DF algorithm. Aerospace and Electronic Systems, IEEE Transactions on, 27(1):159-166, Jan 1991. doi:10.1109/7.68159.
19.	B.N. Hood and P. Barooah. Estimating DoA from radio-frequency RSSI measurements using an actuated reflector. Sensors Journal, IEEE, 11(2):413-417, 2011. doi:10.1109/JSEN.2010.2070872.
20.	Guangjie Han, Deokjai Choi, and Wontaek Lim. Reference node placement and selection algorithm based on trilateration for indoor sensor networks. In Wireless Communications and Mobile Computing, Wirel. Commun. Mob. Comput. 2009; 9, pages 1017-1027, 2008. doi:10.1002/wcm.651.
21.	J. Graefenstein, A. Albert, P. Biber, and A. Schilling. Wireless node localization based on RSSI using a rotating antenna on a mobile robot. In Positioning, Navigation and Communication, 2009. WPNC 2009. 6th Workshop on, pages 253-259, 2009. doi:10.1109/WPNC.2009.4907835.
22.	Abderrahim Benslimane, Clement Saad, Jean-Claude Konig, and Mohammed Boulmalf. Cooperative localization techniques for wireless sensor networks: free, signal and angle-based techniques. In Wireless Communications and Mobile Computing, Wirel. Commun. Mob. Comput. 2014; 14, pages 1627-1646, 2012. doi:10.1002/wcm.2303.
23.	R. Dean Straw. The ARRL Antenna Book. ARRL, the national association for amateur radio, 21st edition, 2007.
24.	C. A. Balanis. Antenna Theory: Analysis and Design. Wiley, 3rd edition, 2005.
25.	ANSYS HFSS. http://www.ansys.com/
26.	M. Malajner, D. Gleich, and P. Planinsic. Angle of arrival measurement using multiple static monopole antennas. Sensors Journal, IEEE, 2015. doi:10.1109/JSEN.2014.2386537.
27.	MRF24J40MA RF transceiver module. http://www.microchip.com/
28.	Interpolation - wikipedia, the free encyclopedia.
29.	Curve Fitting Toolbox - MATLAB. http://www.mathworks.com/products/curvefitting/
Arrived: 09. 07. 2015 Accepted: 31. 12. 2015
Original scientific paper
Informacije
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), 249 - 259
Design of Low-Power Temperature Sensor Architecture for Passive UHF RFID Tags
Mohammad Reza Ghaderi Karkani1, Mahmud Kamarei1, Ali Fotowat Ahmady2
1University of Tehran, School of Electrical and Computer Engineering, Tehran, Iran
2Sharif University of Technology, Department of Electrical Engineering, Tehran, Iran
Abstract: A low-power, wide-range CMOS temperature sensor architecture is proposed, based on temperature-to-frequency conversion using a supply-voltage-controlled sub-threshold ring oscillator. The principles of operation are investigated and verified with analytical and simulation results. Most errors are canceled out by the ratio-metric design. After a novel in-field digital two-point calibration, the inaccuracy is -0.84 °C to +0.34 °C over a range of -40 °C to 80 °C. The entire sensor consumes 93 nW to 305 nW over the temperature range and can be digitally reconfigured to trade off sample rate against resolution.
Keywords: CMOS temperature sensor; temperature-to-frequency; low-cost calibration; wireless sensing; RFID tags
Design of a low-power temperature sensor architecture for passive UHF RFID tags
Abstract: A low-power CMOS temperature sensor architecture is proposed, based on temperature-to-frequency conversion using a supply-voltage-controlled sub-threshold oscillator. The principles of operation are investigated and substantiated analytically and with simulation results. Most errors are eliminated by the ratio-metric design. Using a new two-point digital calibration, the inaccuracy is -0.84 °C to +0.34 °C over a wide range of -40 °C to 80 °C. The sensor consumes 93 nW to 305 nW and can be digitally reconfigured to set the optimal sampling rate and resolution.
Keywords: CMOS temperature sensor; temperature-to-frequency; low-cost calibration; wireless sensing; RFID tags
* Corresponding Author's e-mail: mrghaderi@ut.ac.ir
1 Introduction
Integrating Radio-Frequency Identification (RFID) tags with sensors has become the mainstream approach to realizing sensor nets [1]. Integrating passive RFID tags with an external temperature sensor is reported in [2]. While external sensors require separate readout circuitry, smart sensors combine a sensor and its interface electronics in a single chip. Most smart temperature sensors in CMOS technologies exploit the temperature-dependent characteristics of substrate PNP transistors. These sensors can reach high accuracy over a wide temperature range [3], but they usually consume power on the order of tens of microwatts, and their performance deteriorates once the supply voltage falls below 0.6 V.
Moreover, once analog-to-digital converters and the associated digital signal processing electronics are added, the power consumption of these sensors still exceeds the power budget of passive RFID tags, which is a few hundred nanowatts.
Exploiting the temperature variation of the threshold voltage and the thermal voltage of sub-threshold MOS transistors, small-area voltage-output temperature sensors [4] or front-end thermal sensing elements [5] can be implemented.
By using time-domain readout techniques and eliminating power-hungry analog-to-digital converters, sub-microwatt smart sensors can be implemented as temperature-to-frequency or temperature-to-time (pulse-width) converters, at the cost of sensor-gain linearity and operating range. Temperature-to-frequency architectures have been reported using ring oscillators with a temperature-dependent bias current [6, 7, 8] and a temperature-dependent voltage-controlled LC oscillator [9]. The main temperature-to-time architectures convert a temperature-dependent current to a pulse width [10, 11], a temperature-dependent voltage to a current and then to a pulse width [12, 13], a leakage current to a logarithmic pulse width [14], temperature to a delay time using a delay line [15, 16], or temperature to the pulse width of a variable ring oscillator used instead of a delay line [17].
© MIDEM Society
M. R. Ghaderi Karkani et al; Informacije Midem, Vol. 45, No. 4 (2015), 249 - 259
Sub-threshold ring oscillators consume little power but are highly sensitive to the supply voltage. Wide-range, low-power temperature sensors can be implemented using such an oscillator as a PTAT frequency generator. When its output is compared with that of a similar oscillator used as a reference, the common sources of error cancel in a ratio-metric design and the linearity improves. Based on this concept, a new wide-range, reconfigurable, nanowatt smart sensor architecture is proposed in this paper. Because RFID tag applications need a low-cost calibration technique, a novel low-cost in-field digital group calibration technique is also presented, which avoids the costly conventional two-point calibration process.
The rest of this paper is organized as follows. Section 2 introduces the temperature sensor architecture and its measurement principles. The building blocks of the sensor architecture are analyzed theoretically and described at circuit level in Sections 3 and 4. The digital sensor gain, temperature calculation and calibration mechanisms are described in Section 5. Section 6 presents the simulation results, and Section 7 concludes the paper.
2 Temperature sensor architecture and operation principle
2.1 Low Power Temperature Sensor Architecture
The architecture of the proposed temperature sensor is shown in Fig. 1. The sensor consists of two frequency paths: a reference frequency oscillator whose frequency is constant with temperature, and a proportional-to-absolute-temperature (PTAT) frequency oscillator. As soon as Reset is asserted, two identical counters (Counter R and Counter P) start counting the output pulses of their respective oscillators. $N_{count}$ is the number of reference-oscillator pulses that defines the comparison period, and $N_{times}$ is the number of times the comparison is repeated. The digital bit En enables the bias current of both oscillators.
2.2 Compensated temperature measurement
A digital temperature readout can be produced by comparing the temperature-dependent frequency of the PTAT oscillator with the reference frequency. Defining the frequency change across the temperature range as
$$\Delta f_{sen} = f_{sen}(T_{max}) - f_{sen}(T_{min}) \qquad (1)$$
the sensor gain $S_G$ can be defined as:
$$S_G = \Delta f_{sen} / (T_{max} - T_{min}) \qquad (2)$$
Dividing the frequency change by the reference frequency $f_{ref}$, the sensor digital output can be defined as:
$$\Delta D_{sen} = \Delta f_{sen} / f_{ref} \qquad (3)$$
Finally, the digitized sensor gain $D_{SG}$ can be expressed as:
$$D_{SG} = \Delta D_{sen} / (T_{max} - T_{min}) = S_G / f_{ref} \qquad (4)$$
which is the ratio of the sensor gain to the reference frequency. In this ratio-metric form, error sources and nonlinearities common to both oscillators largely cancel.
Figure 1: The architecture of the proposed temperature sensor.
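As a minimal sketch of eqs. (1) to (4), the Python snippet below computes $S_G$ and $D_{SG}$ from the PTAT frequencies at the two ends of the range. The function names and the two PTAT frequencies are hypothetical illustration values; only the 310 kHz reference frequency is taken from Section 4. The last lines illustrate, under the idealized assumption of a purely multiplicative error shared by both oscillators, why the ratio-metric $D_{SG}$ is insensitive to such a common error while $S_G$ is not.

```python
def sensor_gain(f_sen_tmin: float, f_sen_tmax: float, t_min: float, t_max: float) -> float:
    """Eqs. (1)-(2): S_G = delta f_sen / (T_max - T_min), in Hz per °C."""
    return (f_sen_tmax - f_sen_tmin) / (t_max - t_min)


def digitized_sensor_gain(f_sen_tmin: float, f_sen_tmax: float, f_ref: float,
                          t_min: float, t_max: float) -> float:
    """Eqs. (3)-(4): D_SG = delta D_sen / (T_max - T_min), delta D_sen = delta f_sen / f_ref."""
    delta_d_sen = (f_sen_tmax - f_sen_tmin) / f_ref
    return delta_d_sen / (t_max - t_min)


if __name__ == "__main__":
    f_ref = 310e3                  # Hz, reference frequency (Section 4)
    f_lo, f_hi = 62.5e3, 140.0e3   # Hz, hypothetical PTAT frequencies at -40 °C and 80 °C
    s_g = sensor_gain(f_lo, f_hi, -40.0, 80.0)
    d_sg = digitized_sensor_gain(f_lo, f_hi, f_ref, -40.0, 80.0)
    print(f"S_G = {s_g:.2f} Hz/°C, D_SG = {d_sg:.7f} per °C")

    # A multiplicative error common to both oscillators (e.g. a shared shift of
    # C_L, which scales every delay) changes S_G but cancels in D_SG.
    k = 0.8
    print(sensor_gain(k * f_lo, k * f_hi, -40.0, 80.0),
          digitized_sensor_gain(k * f_lo, k * f_hi, k * f_ref, -40.0, 80.0))
```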
3 Temperature and process variation compensated oscillator
For the low-power, low-cost oscillator needed in RFID tag applications, a ring oscillator appears to be the best candidate. The frequency of a ring oscillator can be controlled robustly via the bias current of the inverters in the chain. Another technique to control the frequency of ring oscillators is supply voltage control [18]. The frequency of a ring oscillator is highly sensitive to the supply voltage, temperature and process variations, and this sensitivity increases further in the sub-threshold regime. A compensation technique is therefore proposed to control the frequency of a sub-threshold ring oscillator using an adaptive supply voltage.
Fig. 2 (a) shows the architecture of the reference frequency generator. A series voltage regulator generates the adaptive sub-threshold supply voltage $V_a$ of the ring oscillator from the supply voltage $V_{DD}$, using an adaptive voltage reference $V_b$. $V_b$ is generated by biasing a diode-connected PMOS transistor $M_{ref}$ in the sub-threshold region with a digitally enabled current mirror. The bit En enables the $M_{ref}$ bias, which sets the level of the reference voltage $V_b$ and thus the sub-threshold supply voltage $V_a$.
The circuit schematic of the adaptive voltage regulator is shown in Fig. 2 (b). The digitally enabled current mirror generates the reference current $I_{ref}$. The series voltage regulator compares a sample of $V_a$ with the reference $V_b$ and controls the output transistor $M_p$, which makes $V_a$ a fixed ratio of $V_b$: $V_a = \alpha V_b$.
Figure 2: (a) The architecture of the reference frequency generator; (b) Complete circuit schematic of the adaptive voltage regulator and the reference frequency generator.
With $M_{ref}$ biased in the sub-threshold regime by the fixed current $I_{ref}$, $V_a$ and $V_b$ decrease with temperature, i.e. they show a complementary-to-absolute-temperature (CTAT) behavior. The oscillation frequency of the ring oscillator also decreases as the sub-threshold supply voltage $V_a$ is reduced. Therefore, when $V_a$ is used as the supply voltage, the oscillation frequency decreases with temperature, a CTAT behavior. On the other hand, with a fixed sub-threshold supply voltage, the oscillation frequency increases with temperature because the sub-threshold current of the transistors increases, a PTAT behavior. It is therefore sufficient to adjust $\alpha$ so that $V_a$ lies in a range where the CTAT behavior of the oscillation frequency compensates the PTAT one, making the reference frequency constant with temperature.
We now calculate the propagation delay. The propagation delay of a CMOS inverter can be calculated as:
$$t_p = \ln(2)\,\frac{R_{eqn} + R_{eqp}}{2}\,C_L \qquad (5)$$
where $R_{eqn}$ and $R_{eqp}$ are the equivalent resistances of the pull-down and pull-up transistors of the inverter and $C_L$ is the total output capacitance. The drain current in the sub-threshold region can be expressed as [19]:
$$I_{DS} = I_{S0}\left[1 - \exp\left(-\frac{V_{ds}}{v_t}\right)\right]\exp\left(\frac{V_{gs} - V_{th} - V_{off}}{n v_t}\right) \qquad (6)$$
where $v_t = k_B T/q$ is the thermal voltage and $V_{th}$ is the threshold voltage of the transistor. $V_{off}$ is the offset voltage, which determines the drain current at $V_{gs} = 0$. The parameter $n$ is the sub-threshold swing parameter (or slope factor), which is a function of the channel length and the interface state density [19]; it can be extracted from the slope of the logarithmic drain current versus gate voltage plot at fixed drain, source and bulk voltages in the sub-threshold regime. $I_{S0}$ is a temperature- and process-dependent parameter whose temperature dependence can be expressed as:
$$I_{S0} = k_{S0}\,T^{\beta} \qquad (7)$$
where the constant $k_{S0}$ and the power factor $\beta$ can be calculated from technological parameters. For $V_{ds} \gg v_t$, the factor $1 - \exp(-V_{ds}/v_t)$ approaches unity and, with channel-length modulation neglected, the sub-threshold drain current simplifies to:
$$I_{DS} \approx I_{S0}\exp\left(\frac{V_{gs} - V_{th} - V_{off}}{n v_t}\right) \qquad (8)$$
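For an inverter operating in the sub-threshold region, $R_{eqn}$ and $R_{eqp}$ can be calculated as follows (each with the parameters of its own transistor, whose gate is driven to $V_a$):
$$R_{eq} = \frac{1}{V_a/2}\int_{V_a/2}^{V_a}\frac{V}{I_{S0}\exp\left(\frac{V_a - V_{th} - V_{off}}{n v_t}\right)}\,dV = \frac{3V_a}{4\,I_{S0}\exp\left(\frac{V_a - V_{th} - V_{off}}{n v_t}\right)} \qquad (9)$$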
where $V_a$ is the supply voltage (see Fig. 2). Substituting (9) into (5), and noting that $\ln(2) \cdot 3/4 \approx 0.52$, the propagation delay of each inverter can be written as:
$$t_p \approx \frac{0.5\,V_a C_L}{I_{S0}\exp\left(\frac{V_a - V_{th} - V_{off}}{n v_t}\right)} \qquad (10)$$
Equation (10) agrees with the simulation results in Fig. 3 (b), which show the inverse relation between the propagation delay and the supply voltage $V_a$ at 25 °C. $V_a$, in turn, is proportional to the reference voltage $V_b$:
$$V_a = (1 + R_1/R_2)\,V_b = \alpha V_b \qquad (11)$$
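The numerical sketch below, in Python, checks the step from (5) and (9) to (10): with the drain-independent current of (8), the integral in (9) gives $R_{eq} = 3V_a/(4I)$, so $t_p = \ln(2) \cdot 0.75\,V_a C_L/I \approx 0.52\,V_a C_L/I$. It also evaluates (10) for a few supply voltages and computes $\alpha$ from (11). All device values and helper names are hypothetical, chosen only to exercise the formulas.

```python
import math


def r_eq_numeric(v_a: float, i_on: float, steps: int = 10_000) -> float:
    """Eq. (9) by the midpoint rule, with a V_ds-independent current i_on (eq. (8))."""
    lo, hi = v_a / 2.0, v_a
    h = (hi - lo) / steps
    integral = sum((lo + (k + 0.5) * h) / i_on for k in range(steps)) * h
    return integral / (v_a / 2.0)


def t_p_from_r_eq(r_eqn: float, r_eqp: float, c_l: float) -> float:
    """Eq. (5): t_p = ln(2) * (R_eqn + R_eqp) / 2 * C_L."""
    return math.log(2.0) * (r_eqn + r_eqp) / 2.0 * c_l


def t_p_subthreshold(v_a, c_l, i_s0, v_th, v_off, n, v_t):
    """Eq. (10): t_p ~ 0.5 V_a C_L / (I_S0 exp((V_a - V_th - V_off) / (n v_t)))."""
    return 0.5 * v_a * c_l / (i_s0 * math.exp((v_a - v_th - v_off) / (n * v_t)))


def alpha_from_resistors(r1: float, r2: float) -> float:
    """Eq. (11): alpha = 1 + R1 / R2."""
    return 1.0 + r1 / r2


if __name__ == "__main__":
    v_a, i_on, c_l = 0.3, 1e-9, 10e-15        # hypothetical: 0.3 V, 1 nA, 10 fF
    r = r_eq_numeric(v_a, i_on)
    print(r, 0.75 * v_a / i_on)               # numeric vs closed form 3 V_a / (4 I)
    print(t_p_from_r_eq(r, r, c_l) / (v_a * c_l / i_on))  # coefficient ln(2) * 0.75
    for v in (0.2, 0.3, 0.4):                 # delay falls as V_a rises
        print(v, t_p_subthreshold(v, c_l, 1e-7, 0.45, -0.08, 1.5, 0.0259))
    print(alpha_from_resistors(0.15, 1.0))    # R1/R2 = 0.15 gives alpha = 1.15
```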
The adaptive reference voltage can be calculated as:
$$V_b = n v_t \ln\left(I_{ref}/I_{S0,ref}\right) + V_{th,ref} \qquad (12)$$
based on the bias current $I_{ref}$ and on the threshold voltage $V_{th,ref}$ and the technological parameters $n$ and $I_{S0,ref}$ of the reference diode-connected PMOS transistor $M_{ref}$.
The threshold voltage of a transistor whose substrate junction is connected to its source [19] can be expressed as:
$$V_{th} = V_{th}(T_0) + k_t\,(T/T_0 - 1) \qquad (13)$$
which is a linear function of temperature, determined by the threshold voltage $V_{th}(T_0)$ at the reference temperature $T_0$ and the constant temperature coefficient $k_t$. Based on (7), (12) and (13), $V_b$ can be rewritten as:
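$$V_b = V_{th}(T_0) - k_t + \left[\frac{k_t}{T_0} + \frac{n k_B}{q}\ln\frac{I_{ref}}{k_{S0,ref}}\right]T - \beta_{ref}\,\frac{n k_B}{q}\,T\ln T = k_1 + k_2 T + k_3 T\ln T \approx k_1 + k_2 T \qquad (14)$$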
where $k_1$, $k_2$ and $k_3$ are the coefficients of the corresponding terms. The simulation results in Fig. 3 (a) show that $V_b$ is an approximately linear function of temperature, with an R-squared of 0.9997 from -40 °C to 80 °C; therefore the $k_3 T\ln T$ term can be neglected in this step. Substituting $V_a$ from (11) and $V_b$ from (12) into (10), together with (7), $t_p$ can be expressed as:
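$$t_p = \frac{0.5\,\alpha V_b C_L}{\dfrac{k_{S0}\,I_{ref}^{\alpha}}{k_{S0,ref}^{\alpha}}\,T^{(\beta - \alpha\beta_{ref})}\exp\left(\dfrac{\alpha V_{th,ref} - V_{th} - V_{off}}{n v_t}\right)} \qquad (15)$$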
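Substituting $V_b$ from (14) and $V_{th}$ from (13), and taking the same threshold voltage (13) for the reference and ring-oscillator transistors, (15) can be rewritten as: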
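$$t_p(T) \approx t_{p0}\,(k_1 + k_2 T)\,T^{\gamma}\exp\left(k_{tp0}/T\right) \qquad (16)$$
where
$$t_{p0} = 0.5\,\alpha C_L\,\frac{k_{S0,ref}^{\alpha}}{k_{S0}\,I_{ref}^{\alpha}}\exp\left(-\frac{(\alpha - 1)k_t q}{n k_B T_0}\right), \quad k_{tp0} = \frac{\left[V_{off} - (\alpha - 1)\left(V_{th}(T_0) - k_t\right)\right]q}{n k_B}, \quad \gamma = \alpha\beta_{ref} - \beta,$$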
and $\beta_{ref}$ and $\beta$ are the power factors of the reference and ring-oscillator transistors, as defined in (7). $\alpha$ is the constant ratio of the supply voltage to the adaptive reference voltage and can be set by fine-tuning the ratio of $R_1$ and $R_2$ in the adaptive voltage regulator, as in (11). The offset voltage of the amplifier, which adds directly to the supply voltage, is also tuned out in the calibration process.
Setting the first derivative of (16) with respect to $T$ to zero makes $t_p$ constant with temperature. Taking this derivative and neglecting the lower-order terms in $T$ gives:
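$$t_p'(T) = \frac{dt_p}{dT} = t_{p0}\left[k_2(\gamma + 1) + (k_1\gamma - k_2 k_{tp0})\,T^{-1} - k_1 k_{tp0}\,T^{-2}\right]T^{\gamma}\exp(k_{tp0}/T) \approx t_{p0}\,k_2(\gamma + 1)\,T^{\gamma}\exp(k_{tp0}/T) \qquad (17)$$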
Setting $t_p'(T) = 0$ gives $\gamma = -1$. Since $\gamma = \alpha\beta_{ref} - \beta$, $\alpha$ can be set to the value that satisfies $\gamma = -1$, i.e. $\alpha = (\beta - 1)/\beta_{ref}$. In simulation, this condition was reached by sweeping the ratio of $R_1$ and $R_2$, which resulted in $\alpha = 1.15$.
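The condition above can be explored with the Python sketch below. It solves $\gamma = -1$ for $\alpha$, evaluates the delay model (16), and implements the full derivative (17) before the lower-order terms are dropped, so that the size of the neglected terms can be inspected for any assumed parameter set. The values of $\beta$, $\beta_{ref}$, $k_1$, $k_2$, $k_{tp0}$ and $t_{p0}$ used below are hypothetical; the paper obtains $\alpha = 1.15$ from a simulation sweep, not from known $\beta$ values.

```python
import math


def gamma(alpha: float, beta: float, beta_ref: float) -> float:
    """Exponent of T in eq. (16): gamma = alpha * beta_ref - beta."""
    return alpha * beta_ref - beta


def alpha_for_flat_delay(beta: float, beta_ref: float) -> float:
    """Solve gamma = -1 for alpha: alpha = (beta - 1) / beta_ref."""
    return (beta - 1.0) / beta_ref


def t_p_model(t, t_p0, k1, k2, g, k_tp0):
    """Eq. (16): t_p(T) ~ t_p0 * (k1 + k2*T) * T**gamma * exp(k_tp0 / T)."""
    return t_p0 * (k1 + k2 * t) * t ** g * math.exp(k_tp0 / t)


def t_p_derivative(t, t_p0, k1, k2, g, k_tp0):
    """Eq. (17) before approximation; returns (full derivative, dominant term)."""
    common = t_p0 * t ** g * math.exp(k_tp0 / t)
    dominant = k2 * (g + 1.0)
    full = dominant + (k1 * g - k2 * k_tp0) / t - k1 * k_tp0 / t ** 2
    return common * full, common * dominant


if __name__ == "__main__":
    beta, beta_ref = 3.3, 2.0                  # hypothetical power factors
    a = alpha_for_flat_delay(beta, beta_ref)   # 1.15 for these values
    g = gamma(a, beta, beta_ref)               # -1 up to rounding
    params = dict(t_p0=1e-6, k1=0.5, k2=-6e-4, g=g, k_tp0=50.0)
    for t in (233.15, 293.15, 353.15):         # -40 °C, 20 °C, 80 °C in kelvin
        full, dominant = t_p_derivative(t, **params)
        print(f"T={t:.2f} K  t_p={t_p_model(t, **params):.3e}  "
              f"dt_p/dT={full:.3e}  dominant term={dominant:.3e}")
```

With $\gamma = -1$ the dominant term vanishes by construction, and whatever remains in the full derivative comes from the terms that (17) neglects.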
Fig. 3 (a) shows $V_a$ and $V_b$ versus temperature. Over the temperature variation $\Delta T$ from -40 °C to 80 °C, both the supply and reference voltages decrease, by $\Delta V_a$ and $\Delta V_b$ respectively, which confirms their CTAT behavior.
Fig. 3 (b) shows the propagation delay and the oscillation frequency of the ring oscillator versus supply voltage at a fixed temperature of 25 °C. The oscillation frequency decreases as the supply voltage decreases, so it follows the CTAT behavior of the supply voltage. By adjusting $\alpha$, and therefore $\Delta V_a$, to a suitable range, the desired frequency variation $\Delta f_a$ is obtained for the voltage variation $\Delta V_a$. In Fig. 3 (c), a frequency increase of $\Delta f_T$ is observed over the full temperature variation $\Delta T$ from -40 °C to 80 °C, which shows the PTAT behavior of the oscillation frequency of the sub-threshold ring oscillator at a fixed 0.4 V supply voltage. To make the oscillation frequency stable with temperature, $\alpha$ is adjusted so that $\Delta f_a$ equals $\Delta f_T$, which balances the CTAT and PTAT behaviors of the oscillation frequency.
The propagation delay $t_p$ of the reference oscillator versus temperature from 500 Monte Carlo simulation runs is shown in Fig. 3 (d). As expected from (16), the propagation delay is nearly constant across wide ranges of process and temperature variation: a delay-to-temperature ratio of $\Delta t_p/\Delta T$ = 0.002 ppm is obtained from -40 °C to 80 °C. The total ratio of oscillation frequency variation to temperature across process corners and the -40 °C to 80 °C range is $\Delta f_{ref}/\Delta T$ = 980 ppm, and the phase noise at a center frequency of 310 kHz is -48 dBc/Hz at 1 kHz offset and 20 °C.
[Figure 3, panels (a) to (d); panel (d) plots tp and tpsen versus temperature (°C) with fitted linear trend lines.]
Figure 3: (a) Variation of reference voltage $V_b$ and supply voltage $V_a$ with temperature, with $\alpha$ = 1.15; (b) Propagation delay $t_p$ and oscillation frequency $f$ of the ring oscillator vs. supply voltage at 25 °C; (c) Propagation delay and oscillation frequency $f$ of the ring oscillator vs. temperature with a fixed 0.4 V supply voltage; (d) Average and ±3σ boundaries of the propagation delay of the reference clock ($t_p$) and the PTAT frequency generator ($t_{psen}$) versus temperature, based on Monte Carlo simulations.
4 Temperature dependent process variation compensated oscillator
The previous section showed how the reference frequency is stabilized against temperature by adjusting $\alpha$ so that the sub-threshold supply voltage $V_a$ lies in a range where the CTAT and PTAT behaviors of the oscillation frequency cancel each other. Setting $V_a$ above this range makes the oscillation frequency CTAT, whereas setting $V_a$ below it makes the frequency PTAT.
The PTAT frequency generator circuit is similar to the circuit in Fig. 2 (b) except that $R_1 = 0$, which gives $\alpha_{sen} = 1 + R_1/R_2 = 1$; hence the regulated sub-threshold supply voltage equals the adaptive bias voltage $V_b$. The circuit uses the same adaptive voltage reference $V_b$ and a separate ring oscillator with the same transistor sizes and number of stages.
Fig. 3 (a) shows the variation of $V_b$ versus temperature, which is also the supply voltage of the PTAT oscillator. Fig. 3 (b) shows the variation $\Delta f_b$ of the oscillation frequency caused by the supply-voltage variation $\Delta V_b$ at a fixed 25 °C. With $\alpha_{sen} = 1$, the supply voltage and its variation $\Delta V_b$ are smaller than in the reference oscillator, and the resulting frequency variation $\Delta f_b$ is smaller than the PTAT frequency variation $\Delta f_T$ in Fig. 3 (c). Therefore the PTAT behavior dominates and makes the oscillator frequency temperature dependent.
With $\alpha_{sen} = 1 + R_1/R_2 = 1$, and similarly to (16), the propagation delay of the PTAT frequency generator circuit can be derived as:
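$$t_{psen}(T) \approx t_{p0,sen}\,(k_1 + k_2 T)\,T^{\gamma_{sen}}\exp\left(k_{tp0,sen}/T\right), \quad \gamma_{sen} = \alpha_{sen}\beta_{ref} - \beta \qquad (18)$$
where $t_{p0,sen}$ and $k_{tp0,sen}$ follow the expressions given with (16), with $\alpha$ replaced by $\alpha_{sen}$.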
Because $R_1$, and therefore $\alpha_{sen}$, is chosen to make the digital sensor output linear (as described in Section 5), the propagation delay of the PTAT frequency generator itself remains slightly nonlinear.
The propagation delay $t_{psen}$ of the PTAT oscillator versus temperature at different process corners is shown in Fig. 3 (d). Unlike the reference oscillator delay, which is nearly constant, the PTAT oscillator delay varies inversely with absolute temperature. At room temperature (20 °C), the propagation delay of the reference clock is $t_p$ = 3.21 ms and that of the PTAT frequency generator is $t_{psen}$ = 0.10 ms, which give $f_{ref} = 1/t_p$ = 310 kHz and $f_{sen} = 1/t_{psen}$ = 94 kHz.
5 Digital sensor gain and temperature calculator
Although the frequency of the PTAT generator is a slightly nonlinear function of temperature, the digital sensor output, i.e. the ratio of the PTAT frequency to the reference frequency, is an approximately linear function of temperature. To see this, write the digital output using (10) and (11) as:
$$D_{sen} = f_{sen}/f_{ref} = t_p/t_{psen} = \frac{\alpha}{\alpha_{sen}}\exp\left(\frac{(\alpha_{sen} - \alpha)V_b}{n v_t}\right) \qquad (19)$$
Substituting the exact expression for $V_b$ from (14) into (19), the digital sensor output can be expressed as:
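$$D_{sen} = \frac{\alpha}{\alpha_{sen}}\exp\left(\frac{(\alpha_{sen} - \alpha)k_3 q\ln T}{n k_B}\right)\exp\left(\frac{(\alpha_{sen} - \alpha)k_2 q}{n k_B}\right)\exp\left(\frac{(\alpha_{sen} - \alpha)k_1 q}{n k_B T}\right) = D_0\,T^{\delta}\exp\left(k_{D0}/T\right) \qquad (20)$$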
where $\delta = (\alpha - \alpha_{sen})\beta_{ref}$ (using $k_3 = -\beta_{ref} n k_B/q$ from (14)), and $D_0$ and $k_{D0}$ are constant coefficients. For the digital sensor output to be a linear function of temperature, the second derivative of $D_{sen}$ with respect to $T$ should equal zero. The first and second derivatives of (20) can be written as:
$$D_{sen}'(T) = \frac{dD_{sen}}{dT} = D_0\left[\delta T^{\delta - 1} - k_{D0}T^{\delta - 2}\right]\exp(k_{D0}/T) \qquad (21)$$
$$D_{sen}''(T) = D_0\left[\delta(\delta - 1)T^{\delta - 2} + (2k_{D0} - 2\delta k_{D0})T^{\delta - 3} + k_{D0}^2 T^{\delta - 4}\right]\exp(k_{D0}/T) \approx D_0\,\delta(\delta - 1)\,T^{\delta - 2}\exp(k_{D0}/T) \qquad (22)$$
Neglecting the lower-order terms in $T$, setting $\delta = 1$ makes $D_{sen}''(T) \approx 0$; that is, $(\alpha - \alpha_{sen})\beta_{ref} = 1$. A parameter sweep of the ratio of $R_1$ and $R_2$ in simulation reveals that, despite the approximations, $\alpha_{sen} = 1$ satisfies this condition and gives the best linearity of the digital sensor output.
In practice, the second derivative of $D_{sen}$ remains slightly nonzero, so for a high-precision digital output $D_{sen}$ is represented by its second-order expansion about $T_0$ (neglecting higher-order terms):
$$D_{sen}(T) \approx D_{sen}(T_0) + (T - T_0)\,D_{sen}'(T_0) + \frac{(T - T_0)^2}{2}\,D_{sen}''(T_0) = a_0 + b_0 T + c_0 T^2 \qquad (23)$$
which is a second-order polynomial function of temperature.
Fig. 4 shows the digital sensor output vs. temperature from 500 Monte Carlo simulation runs. The curves fit second-order polynomial trend functions with an R-squared of 1 from -40 °C to 80 °C. From the fit of the average $D_{sen}$, with $T$ in °C, equation (23) becomes:
$$D_{sen}(T) = 0.2672 + 0.0018\,T + 0.000004\,T^2 \qquad (24)$$
Figure 4: Average and ±3σ boundaries of the digital sensor output versus temperature, based on Monte Carlo simulations.
5.1 Temperature Calculator
In Section 2.2, we showed the principle by which the digital readout circuit measures the temperature. Here, we show how the temperature calculator computes the temperature from the $N_{count}$ and $N_{times}$ signals. The time period of each comparison is defined by:
$$T_P = N_{count} / f_{ref} \qquad (25)$$
where $N_{count}$ is the number of reference-oscillator pulses at frequency $f_{ref}$. During this period, the number of PTAT oscillator pulses at frequency $f_{sen}$ can be calculated as:
$$N_{sen} = T_P f_{sen} = N_{count} f_{sen} / f_{ref} \qquad (26)$$
This results in:
$$D_{sen} = f_{sen} / f_{ref} = N_{sen} / N_{count} \qquad (27)$$
so the sensor digital output can be obtained directly from the counter readings:
$$\Delta D_{sen} = \Delta f_{sen} / f_{ref} = (N_{sen2} - N_{sen1}) / N_{count} \qquad (28)$$
Therefore the sample rate can be calculated as:
$$\text{Sample Rate} = 1/T_P = f_{ref} / N_{count} \qquad (29)$$
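A minimal Python model of the counting scheme in (25) to (29) is sketched below. It assumes ideal square waves and a comparison window of exactly $N_{count}$ reference periods, and it adds a hypothetical start-phase parameter for the PTAT counter; the helper names are illustrative. The 310 kHz and 94 kHz frequencies are the room-temperature values quoted in Section 4, and $N_{count}$ = 4600 is the setting used in Section 6.

```python
import math


def count_edges(freq_hz: float, window_s: float, phase: float = 0.0) -> int:
    """Rising edges of an ideal square wave inside a window of length window_s.

    `phase` (0 <= phase < 1) is a hypothetical parameter: the fraction of a
    period already elapsed when Reset is released.
    """
    return math.floor(freq_hz * window_s + phase)


def measure(f_sen: float, f_ref: float, n_count: int, phase: float = 0.0):
    window = n_count / f_ref                    # eq. (25): T_P = N_count / f_ref
    n_sen = count_edges(f_sen, window, phase)   # eq. (26): N_sen ~ N_count f_sen / f_ref
    d_sen = n_sen / n_count                     # eq. (27): D_sen = N_sen / N_count
    sample_rate = f_ref / n_count               # eq. (29)
    return n_sen, d_sen, sample_rate


if __name__ == "__main__":
    n_sen, d_sen, rate = measure(94e3, 310e3, 4600)
    print(n_sen, d_sen, 94e3 / 310e3, rate)
```

Because the PTAT count is an integer, the estimate of $f_{sen}/f_{ref}$ is quantized to steps of $1/N_{count}$, which is the starting point of the resolution analysis that follows.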
To calculate the temperature measurement resolution, i.e. the minimum measurable change in temperature $\Delta T_{min}$, note that:
$$\Delta D_{sen,min} = D_{SG}\,\Delta T_{min} \qquad (30)$$
Considering (28), the minimum measurable change in the digital output, corresponding to a difference of one count, is:
$$\Delta D_{sen,min} = (N_{sen2} - N_{sen1}) / N_{count} = 1 / N_{count} \qquad (31)$$
Finally, combining (30) and (31), the resolution of the digital temperature readout is:
$$\text{Resolution} = \Delta T_{min} = 1 / (N_{count}\,D_{SG}) \qquad (32)$$
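The trade-off between (29) and (32) can be tabulated with the short Python sketch below. As an assumption for illustration, $D_{SG}$ is taken as the local slope of the fitted curve (24), $dD_{sen}/dT = 0.0018 + 2 \cdot 0.000004\,T$ per °C, rather than the range average of (4). The sketch ignores any additional quantization in the real counters, so it need not reproduce Fig. 10 exactly.

```python
F_REF_HZ = 310e3  # reference frequency at 20 °C (Section 4)


def d_sg_local(t_c: float) -> float:
    """Local digitized sensor gain, d/dT of eq. (24), per °C (assumption)."""
    return 0.0018 + 2 * 0.000004 * t_c


def sample_rate(n_count: int, f_ref: float = F_REF_HZ) -> float:
    """Eq. (29): samples per second."""
    return f_ref / n_count


def resolution(n_count: int, t_c: float) -> float:
    """Eq. (32): minimum resolvable temperature step in °C."""
    return 1.0 / (n_count * d_sg_local(t_c))


if __name__ == "__main__":
    for n_count in (1150, 2300, 4600, 9200):
        worst = max(resolution(n_count, t) for t in range(-40, 81, 10))
        print(f"N_count={n_count:5d}  rate={sample_rate(n_count):7.2f} S/s  "
              f"worst-case resolution={worst:.3f} °C")
```

Doubling $N_{count}$ halves both the sample rate and $\Delta T_{min}$, and the coarsest resolution occurs at -40 °C, where the slope of (24) is smallest.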
5.2 Sensing Errors and Calibration
The spread of process parameters, supply noise, and the nonlinearity of the frequency variation with temperature are the dominant sources of error. Because the two oscillators have nearly identical architectures, differing only in the value of $R_1$, and a highly symmetric layout, most of these errors are expected to cancel in this ratio-metric design.
Supply and device noise in the ring oscillators translates directly into jitter of the output waveforms, which is largely averaged out by the digital counters.
As Fig. 4 shows, the nonlinearity of the frequency variation with temperature leaves little error once $D_{sen}$ is fitted to a second-order polynomial function of temperature. This remaining nonlinearity can be compensated digitally by implementing the inverse of (24) in the temperature calculator, which computes the measured temperature from the resulting $D_{sen}$ as:
$$T = 500\left[\sqrt{D_{sen} - 0.0647} - 0.45\right] \qquad (33)$$
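Equation (33) is the exact inverse of the quadratic (24): solving $c_0T^2 + b_0T + (a_0 - D_{sen}) = 0$ for its positive root gives $T = -b_0/(2c_0) + \sqrt{D_{sen} - (a_0 - b_0^2/(4c_0))}/\sqrt{c_0}$, with $b_0/(2c_0) = 225 = 500 \cdot 0.45$, $1/\sqrt{c_0} = 500$ and $a_0 - b_0^2/(4c_0) = 0.0647$. The Python sketch below (function names are illustrative) implements both directions so the round trip can be checked over the -40 °C to 80 °C range.

```python
import math

A0, B0, C0 = 0.2672, 0.0018, 0.000004   # coefficients of eq. (24), T in °C


def d_sen_from_temperature(t_c: float) -> float:
    """Eq. (24): average fitted digital output versus temperature."""
    return A0 + B0 * t_c + C0 * t_c ** 2


def temperature_from_d_sen(d_sen: float) -> float:
    """Eq. (33): positive root of C0*T^2 + B0*T + (A0 - D_sen) = 0."""
    radicand = d_sen - 0.0647            # 0.0647 = A0 - B0**2 / (4*C0)
    if radicand < 0:
        raise ValueError("D_sen lies below the vertex of the fitted parabola")
    # 500 = 1/sqrt(C0) and 0.45 * 500 = 225 = B0 / (2*C0)
    return 500.0 * (math.sqrt(radicand) - 0.45)


if __name__ == "__main__":
    for t in (-40, 0, 20, 80):
        d = d_sen_from_temperature(t)
        print(t, round(d, 6), round(temperature_from_d_sen(d), 9))
```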
The spread of process parameters causes offsets in both the y-intercept and the slope of the digital sensor output curve in Fig. 4. A two-point calibration is therefore required to trim the sensor over the 120 °C temperature range.
5.3 Digital Group Calibration
Several low-cost post-packaging calibration techniques exist: using an extra on-chip calibration transistor to measure the die temperature and calibrate the sensor [20], or batch-mode calibration, in which a limited number of samples from a production batch are calibrated, the average error is estimated, and the entire batch is trimmed using this estimate [21]. Because group communication is easy in RFID applications, an in-field group-mode calibration at two different temperatures is proposed here to digitally trim the sensor tags after packaging.
Fig. 5 shows the error of the digital sensor output vs. temperature from 500 Monte Carlo simulation runs. For the first calibration point, all sensor tags are placed at the minimum operating temperature, e.g. $T_1$ = -40 °C, and an interrogator announces the field temperature. Each sensor calculates the ideal digital output $D_{1ref}$ corresponding to this temperature using the digitally implemented equation (24) and measures its actual digital output $D_1$. The y-intercept offset of the digital sensor output curve is then calculated as:
$$\Delta D_1 = D_{1ref} - D_1 \qquad (34)$$
$\Delta D_1$ for a sample on the +3σ boundary is shown in Fig. 5. From then on, the sensor adds the offset $\Delta D_1$ to every measured output, which constitutes a one-point calibration. Fig. 6 shows the error of the digital sensor output vs. temperature after the one-point calibration.
Figure 5: Error of the average and ±3σ boundaries of the digital sensor output versus temperature, based on Monte Carlo simulations.
Next, all sensor tags are placed at the maximum operating temperature, e.g. $T_2$ = 80 °C, and an interrogator announces the field temperature. Each sensor calculates the ideal digital output $D_{2ref}$ corresponding to this temperature and measures its actual digital output $D_2$. The slope offset of the digital sensor output curve is calculated as:
$$\Delta D' = \left((D_{2ref} - D_2) - \Delta D_1\right)/(T_2 - T_1) \qquad (35)$$
Figure 6: Error of the average and ±3σ boundaries of the digital sensor output versus temperature after one-point calibration.
At any working temperature, the sensor calculates the offset of the digital sensor output using the latest measured temperature T' as follows:
$$\Delta D_2 = \Delta D'\,(T' - T_1) \qquad (36)$$
$\Delta D'$ and $\Delta D_2$ at $T'$ for a sample on the +3σ boundary are shown in Fig. 6. The sensor adds the offset $\Delta D_2$ to the latest measured digital output, calculates a new temperature $T$, replaces $T'$ with $T$, and then recalculates $\Delta D_2$; iterating in this way gives a progressively more accurate temperature value. The digital parameter $N_{times}$ determines the number of iterations of the temperature calculation. Fig. 7 shows the error of the digital sensor output vs. temperature after the two-point calibration. Compared with Fig. 5, the offsets in both the y-intercept ($\Delta D_1$) and the slope ($\Delta D'$) of the digital sensor output curves are cancelled for the average and the ±3σ boundaries after the proposed two-point calibration.
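The calibration flow of (34) to (36) can be sketched in Python as follows. The class, method and helper names are hypothetical, and the process error is modelled, as an assumption, as a pure intercept-plus-slope deviation from (24), which is exactly the kind of error a two-point calibration can remove. The first temperature estimate uses only the one-point correction; each further iteration recomputes $\Delta D_2$ from the latest estimate, with `n_times` playing the role of $N_{times}$.

```python
import math


def d_ideal(t_c: float) -> float:
    """Eq. (24): ideal digital output, implemented on the tag."""
    return 0.2672 + 0.0018 * t_c + 0.000004 * t_c ** 2


def t_from_d(d: float) -> float:
    """Eq. (33): inverse of eq. (24)."""
    return 500.0 * (math.sqrt(d - 0.0647) - 0.45)


class GroupCalibratedTag:
    """Hypothetical tag-side state for the in-field two-point calibration."""

    def __init__(self, raw_output):
        self.raw_output = raw_output  # callable: field temperature (°C) -> raw D_sen
        self.t1 = 0.0
        self.delta_d1 = 0.0           # y-intercept offset, eq. (34)
        self.delta_d_slope = 0.0      # slope offset, eq. (35)

    def calibrate_point1(self, t1: float) -> None:
        """The interrogator announces T1; the tag stores delta D1 = D1ref - D1."""
        self.t1 = t1
        self.delta_d1 = d_ideal(t1) - self.raw_output(t1)

    def calibrate_point2(self, t2: float) -> None:
        """The interrogator announces T2; the tag stores the slope offset."""
        d2 = self.raw_output(t2)
        self.delta_d_slope = ((d_ideal(t2) - d2) - self.delta_d1) / (t2 - self.t1)

    def read_temperature(self, t_field: float, n_times: int = 8) -> float:
        d_raw = self.raw_output(t_field)
        t_est = t_from_d(d_raw + self.delta_d1)   # one-point-corrected first guess T'
        for _ in range(n_times):
            delta_d2 = self.delta_d_slope * (t_est - self.t1)   # eq. (36)
            t_est = t_from_d(d_raw + self.delta_d1 + delta_d2)
        return t_est


def make_raw_output(offset: float, slope: float):
    """Hypothetical process error: D_raw(T) = D_ideal(T) - offset - slope * T."""
    return lambda t_c: d_ideal(t_c) - offset - slope * t_c


if __name__ == "__main__":
    raw = make_raw_output(offset=0.004, slope=0.00005)
    tag = GroupCalibratedTag(raw)
    tag.calibrate_point1(-40.0)
    tag.calibrate_point2(80.0)
    for t in (-40.0, 0.0, 25.0, 80.0):
        print(t, round(t_from_d(raw(t)), 3), round(tag.read_temperature(t), 6))
```

The iteration converges because each pass multiplies the remaining error by roughly $\Delta D'/D_{SG}$, which is small whenever the slope error is much smaller than the sensor gain.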
The control signal $N_{count}$ sets both the resolution and the sample rate, so the temperature sensor can be reconfigured digitally. According to (29) and (32), increasing $N_{count}$ lowers the sample rate but refines the resolution (smaller $\Delta T_{min}$), whereas decreasing $N_{count}$ raises the sample rate at the cost of a coarser resolution.
6 Simulation results
The layout of the sensor core circuit, designed in an industrial 0.18 µm technology, is shown in Fig. 8. The core sensor layout measures 52.6 µm × 51 µm. The netlist of the sensor circuit was extracted for post-layout simulation, and 500 Monte Carlo simulations were run. The functionality of the counters and the temperature calculator was evaluated in software on a PC. In practice, the digital modules can be implemented with sub-threshold static CMOS logic gates, alongside the other digital parts of the RFID tag IC, using the same supply voltage as the reference oscillator ($V_a$).
Figure 7: Error of the average and ±3σ boundaries of the digital sensor output versus temperature after two-point calibration.
Figure 8: Layout of the sensor core.
Fig. 9 shows the sensor error vs. temperature after the two-point calibration. The error ranges from -0.84 °C to +0.34 °C over -40 °C to 80 °C, which is less than 1% of the measurement range.
$N_{count}$ is set to 4600 to achieve a resolution better than 0.3 °C, and the sample rate is calculated from (29). Fig. 10 shows the resolution of the sensor vs. temperature. This $N_{count}$ value gives a sample rate of more than 66 samples per second; the sample rate vs. temperature is shown in Fig. 11. The power consumption of the core sensor at different temperatures was calculated from simulation and is shown in Fig. 12. The total power consumption varies from 93 nW to 305 nW over the full temperature range. The dynamic (AC) power consumption depends on the oscillation frequency and the supply voltage of the reference and PTAT oscillators. The static (DC) power consumption increases with temperature because of the increase in the sub-threshold currents, even though the supply voltage $V_a$ decreases.
Figure 9: Temperature measurement error after two-point calibration.
Figure 10: Sensor resolution vs. temperature with $N_{count}$ = 4600.
The results are summarized in Table 1. Compared with the references, the proposed sensor architecture achieves higher accuracy than most of the referenced designs over a wide temperature range of 120 °C, while consuming nanowatt-level power with comparable resolution and sample rate (see Table 1).
Figure 11: Sensor sample rate vs. temperature with $N_{count}$ = 4600.
Figure 12: Power consumption of the sensor core (dynamic, static and total) vs. temperature.
7 Conclusions
Using supply-voltage-controlled sub-threshold ring oscillators, a wide-range, accurate and low-power temperature sensor architecture has been demonstrated that can be dynamically reconfigured to set the resolution and sample rate. The architecture uses a ratio-metric design that cancels out most of the common sources of error. The reference and PTAT oscillators differ only in the value of one resistor, which ensures close tracking over process and temperature variations. The temperature inaccuracy is -0.84 °C to +0.34 °C over a wide range of -40 °C to 80 °C, while the sensor consumes 93 nW to 305 nW over the measurement range, excluding the digital circuits. Whereas most low-power temperature sensors have limited accuracy or temperature range, the proposed sensor works accurately over a wide range of 120 °C. The proposed in-field digital calibration provides a suitable low-cost method for sensor calibration. The sensor is suitable for embedding in passive RFID tags and in other low-power wireless sensing applications.
Table 1: Simulation results and comparison with references.
Parameter	[6]	[7]	[8]	[10]	[11]	[12]	[13]	[14]	[15]	[16]	[17]	This Work
Resolution (°C)	0.035	0.5	0.3	0.35	0.3	0.3	0.21	0.28	0.5	0.2	0.595	0.294
Error (°C)	±0.1	±1.5	-1.4/+1.5	±0.8	±1.5	-0.8/+1	-0.8/+1	±1.97	±1	-0.8/-1	-0.63/+1.04	-0.84/+0.34
Temp. Range (°C)	35~45	-40~85	0~100	-20~30	-30~60	0~100	-10~30	20~100	0~75	0~100	20~120	-40~80
Power Consumption (nW) @ Sample Rate (S/s)	110 @10	600 @-	71 @33	2400 @25	350 @68	405 @5	119 @333	1.05~65.5 @5	9000 @20	1500 @10	288100 @430K	93~305 @66
Energy per Conversion (nJ)	11	-	2.2	96	5.2	81	0.35	0.2~13	450	150	0.67	1.4~4.6
Sampling Rate (Samples/s)	10	-	33	25	68	1K	333	12	1M	10	430K	66
Calibration	2-point	1-point	2-point	1-point	1-point	2-point	2-point	Without Cal.	2-point	2-point	1-point	2-point
Area (mm²)	0.084	0.005	0.09	-	0.14	0.0324	0.0416	0.000843	0.4	0.025	0.031	0.00268
CMOS Technology	0.35 µm	0.18 µm	0.18 µm	0.18 µm	0.18 µm	0.18 µm	0.18 µm	0.35 µm	0.35 µm	0.35 µm	0.13 µm	0.18 µm
8 References
1.	S. Roy, V. Jandhyala, J. R. Smith, D. J. Wetherall, B. P. Otis, R. Chakraborty, M. Buettner, D. J. Yeager, Y. C. Ko, and A. P. Sample, "RFID: from supply chains to sensor nets," Proceedings of the IEEE, vol. 98, no. 9, pp. 1583-1592, Sep. 2010.
2.	D. Yeager, F. Zhang, A. Zarrasvand, and B. P. Otis, "A 9 µA, addressable Gen2 sensor tag for biosignal acquisition," IEEE J. of Solid-State Circuits, vol. 45, no. 10, pp. 2198-2209, Oct. 2010.
3.	D. De Venuto and E. Stikvoort, "Low power high-resolution smart temperature sensor for autonomous multi-sensor system," IEEE Sensors J., vol. 12, no. 12, pp. 3384-3391, Dec. 2012.
4.	M. Sasaki, M. Ikeda, and K. Asada, "A temperature sensor with inaccuracy of -1/+0.8 °C using 90-nm 1-V CMOS for online thermal monitoring of VLSI circuits," IEEE Trans. on Semiconductor Manufacturing, vol. 21, no. 2, pp. 201-208, May 2008.
5.	P. C. Crepaldi, R. L. Moreno, and T. C. Pimenta, "Low-voltage, low-power, high linearity front-end thermal sensing element," Electronics Letters, vol. 46, no. 18, pp. 1271-1272, Sep. 2010.
6.	A. Vaz, A. Ubarretxena, I. Zalbide, D. Pardo, H. Solar, A. Garcia-Alonso, and R. Berenguer, "Full passive UHF tag with a temperature sensor suitable for human body temperature monitoring," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 57, no. 2, pp. 95-99, Feb. 2010.
7.	Z. Qi, Y. Zhuang, X. Li, W. Liu, Y. Du, and B. Wang, "Full passive UHF RFID Tag with an ultra-low power, small area, high resolution temperature sensor suitable for environment monitoring," Microelectronics J., vol. 45, pp. 126-131, Oct. 2013.
8.	S. Jeong, Z. Foo, Y. Lee, J. Y. Sim, D. Blaauw, and D. Sylvester, "A fully-integrated 71 nW CMOS temperature sensor for low power wireless sensor nodes," IEEE J. of Solid-State Circuits, vol. 49, no. 8, pp. 1682-1693, Aug. 2014.
9.	F. Kocer and M. P. Flynn, "An RF-powered, wireless CMOS temperature sensor," IEEE Sensors J., vol. 6, no. 3, pp. 557-564, Jun. 2006.
10.	J. Yin, J. Yi, M. K. Law, Y. Ling, M. C. Lee, K. P. Ng, H. C. Luong, A. Bermak, M. Chan, W. H. Ki, C. Y. Tsui, and M. Yuen, "A system-on-chip EPC Gen-2 passive UHF RFID tag with embedded temperature sensor," IEEE J. of Solid-State Circuits, vol. 45, no. 11, pp. 2404-2420, Nov. 2010.
11.	B. Wang, M. K. Law, A. Bermak, and H. C. Luong, "A passive RFID tag embedded temperature sensor with improved process spreads immunity for a -30 °C to 60 °C sensing range," IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 61, no. 2, pp. 337-346, Feb. 2014.
12.	M. K. Law and A. Bermak, "A 405-nW CMOS temperature sensor based on linear MOS operation," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 56, no. 12, pp. 891-895, Dec. 2009.
13.	M. K. Law, A. Bermak, and C. Howard, "A sub-µW embedded CMOS temperature sensor for RFID food monitoring application," IEEE J. of Solid-State Circuits, vol. 45, no. 6, pp. 1246-1255, Jun. 2010.
14.	P. Ituero, J. Ayala, and M. Lopez-Vallejo, "A nanowatt smart temperature sensor for dynamic thermal management," IEEE Sensors J., vol. 8, no. 12, pp. 2036-2043, Dec. 2008.
15.	P. Chen, T. K. Chen, Y. S. Wang, and C. C. Chen, "A time-domain sub-micro watt temperature sensor with digital set-point programming," IEEE Sensors J., vol. 9, no. 12, pp. 1639-1646, Dec. 2009.
16.	Ch. Ch. Chen and H. W. Chen, "A low-cost CMOS smart temperature sensor using a thermal-sensing and pulse-shrinking delay line," IEEE Sensors J., vol. 14, no. 1, pp. 278-284, Jan. 2014.
17.	Y. J. An, K. Ryu, D. H. Jung, S. H. Woo, and S. Jung, "An energy efficient time-domain temperature sensor for low-power on-chip thermal management," IEEE Sensors J., vol. 14, no. 1, pp. 104-110, Jan. 2014.
18.	K. Sundaresan, P. H. Allen, and F. Ayazi, "Process and temperature compensation in a 7-MHz CMOS clock oscillator," IEEE J. of Solid-State Circuits, vol. 41, no. 2, pp. 433-442, Feb. 2006.
19.	BSIM3v3.2.2 MOSFET model users' manual, University of California, Berkeley, USA, 1998, pp. 2-30.
20.	M. Pertijs, K. A. A. Makinwa, and J. H. Huijsing, "A CMOS smart temperature sensor with a 3σ inaccuracy of ±0.1 °C from -55 °C to 125 °C," IEEE J. of Solid-State Circuits, vol. 40, no. 12, pp. 2805-2815, Dec. 2005.
21.	A. L. Aita, M. A. P. Pertijs, K. A. A. Makinwa, J. H. Huijsing, and G. C. M. Meijer, "Low-power CMOS smart temperature sensor with a batch-calibrated inaccuracy of ±0.25 °C (±3σ) from -70 °C to 130 °C," IEEE Sensors J., vol. 13, no. 5, pp. 1840-1848, May 2013.
Arrived: 30. 07. 2015 Accepted: 10. 11. 2015
Original scientific paper
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), 260 - 265
Specific heat capacity and thermal conductivity of the electrocaloric (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 ceramics between room temperature and 300 °C
Hana Uršič1,2, Marko Vrabelj1,2, Lovro Fulanovič1,2, Andraž Bradeško1,2, Silvo Drnovšek1, Barbara Malič1,2
1Jozef Stefan Institute, Electronic Ceramics Department, Ljubljana, Slovenia
2Jozef Stefan Postgraduate School, Ljubljana, Slovenia
Abstract: We report the specific heat capacity and thermal conductivity of electrocaloric (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 (x = 0, 0.1, 0.3 and 0.35) ceramics between room temperature and 300 °C. At 35 °C, the specific heat capacity of all ceramic samples is between 0.323 and 0.326 J/gK. For the samples with a high PbTiO3 content (x = 0.3 and 0.35), a pronounced anomaly in the specific heat capacity versus temperature is observed at 130 °C and 153 °C, respectively, indicating the phase transition from the polar to a non-polar phase. The thermal conductivity in this system depends significantly on the PbTiO3 content. The lowest thermal conductivity is obtained for Pb(Mg1/3Nb2/3)O3, and it increases with increasing PbTiO3 content over the whole temperature range. For example, at 23 °C the thermal conductivities of Pb(Mg1/3Nb2/3)O3 and 0.65Pb(Mg1/3Nb2/3)O3-0.35PbTiO3 are 1.25 W/mK and 1.43 W/mK, respectively.
Keywords: PMN-PT; thermal conductivity; specific heat capacity; relaxor-ferroelectric; electrocaloric
Specific heat capacity and thermal conductivity of electrocaloric (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 ceramics in the temperature range from room temperature to 300 °C
Abstract: In this paper we report on the specific heat capacity and thermal conductivity of electrocaloric (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 ceramics (x = 0, 0.1, 0.3 and 0.35) in the temperature range from room temperature to 300 °C. At 35 °C, the specific heat capacity of all measured samples lies between 0.323 and 0.326 J/gK. On heating, the samples with a higher PbTiO3 content (x = 0.3 and 0.35) show an anomaly in the heat capacity that is characteristic of the transition from the polar to the non-polar phase. The thermal conductivity of the solid solution depends strongly on the PbTiO3 content. Pb(Mg1/3Nb2/3)O3 exhibits a lower thermal conductivity than 0.65Pb(Mg1/3Nb2/3)O3-0.35PbTiO3 over the entire temperature range. For example, at 23 °C the thermal conductivity of Pb(Mg1/3Nb2/3)O3 is 1.25 W/mK, while that of 0.65Pb(Mg1/3Nb2/3)O3-0.35PbTiO3 is 13 % higher.
Keywords: PMN-PT; thermal conductivity; specific heat capacity; relaxor-ferroelectric; electrocaloric
* Corresponding Author's e-mail: hana.ursic@ijs.si
1 Introduction
The relaxor-ferroelectric (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 (PMN-100xPT) ceramics exhibit high dielectric permittivity and polarization as well as large electromechanical [1-4] and electrocaloric (EC) effects [5, 6], and can be used in various applications, such as piezoelectric sensors, actuators and transducers [7-10] and new-generation cooling devices [11, 12].
In our previous work we have shown that PMN-30PT [13] and PMN-10PT [14] bulk ceramics exhibit EC temperature changes (ΔTEC) as high as 2.7 °C and 3.5 °C, respectively. These values are the highest reported for lead-based ceramics [5, 6]. By using such PMN-PT ceramic elements in a cooling device with an efficient heat regeneration system, the
© MIDEM Society
temperature span between the hot and the cold sides of the regenerator can be several times larger than the temperature change of a single PMN-PT ceramic plate [11]. However, when designing an EC device, not only the EC properties but also the thermal properties of the EC material can have a great impact on its efficiency. Heat diffuses out of an EC layer of thickness d in a time tdiff ~ d²Cp/λ, where Cp is the specific heat capacity and λ is the thermal conductivity [15, 16]. Therefore, for an efficient cooling device the EC material should possess a low Cp/λ ratio.
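To put numbers on the diffusion-time scaling, the sketch below evaluates tdiff for a few layer thicknesses. Because the tabulated Cp is a mass-specific quantity (J/gK), a dimensionally consistent estimate needs the density ρ as well, tdiff ~ d²ρCp/λ = d²/α with thermal diffusivity α = λ/(ρCp); this density factor is our addition for units, not part of the expression above. The inputs are PMN values at 50 °C from Tables 1 and 2 and the theoretical PMN density of 8.13 g/cm³ quoted in Section 2; the thicknesses are purely illustrative.

```python
def diffusion_time_s(thickness_m: float, density_kg_m3: float,
                     cp_j_per_kg_k: float, conductivity_w_per_m_k: float) -> float:
    """Order-of-magnitude heat-diffusion time t ~ d^2 * rho * cp / lambda (= d^2 / alpha)."""
    alpha = conductivity_w_per_m_k / (density_kg_m3 * cp_j_per_kg_k)  # diffusivity, m^2/s
    return thickness_m ** 2 / alpha

RHO_PMN = 8130.0  # kg/m^3, theoretical PMN density (8.13 g/cm^3)
CP_PMN = 330.0    # J/(kg K), i.e. 0.330 J/gK for PMN at 50 °C (Table 1)
LAM_PMN = 1.30    # W/(m K), PMN at 50 °C (Table 2)

for d_mm in (0.1, 0.5, 1.0):  # illustrative layer thicknesses (assumed)
    t = diffusion_time_s(d_mm * 1e-3, RHO_PMN, CP_PMN, LAM_PMN)
    print(f"d = {d_mm} mm -> t_diff ~ {t:.3f} s")
```

The quadratic dependence on d dominates: halving the layer thickness shortens the diffusion time fourfold, while the material enters only through Cp/λ (at fixed density).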
In the present study we report the specific heat capacity and thermal conductivity of PMN-100xPT ceramics of different compositions, i.e., x = 0, 0.1, 0.3 and 0.35. These compositions were deliberately selected for their specific properties. The PMN-rich compositions are of interest because of their large room-temperature EC effects [13, 17, 18]. Furthermore, it has recently been shown that for a highly efficient device, PMN and PMN-10PT ceramic elements are preferable to elements of PT-rich compositions, owing to their slim polarization versus electric field hysteresis loops and consequently lower losses [19]. On the other hand, PMN-35PT is the morphotropic phase boundary (MPB) composition and possesses high piezoelectric [1-3] as well as EC properties [14, 19], which could be an added value in the development of multifunctional devices.
The Cp was mainly studied in PMN and PMN-PT single crystals [20-25]; among ceramics, only the Cp of PMN can be found in the literature [26, 27]. The λ of PMN and PMN-PT materials was previously studied mainly in single-crystal form [20, 28-31], in the range between -271 °C and 117 °C. A few studies report the λ of PMN ceramics, but only at very low temperatures (i.e., from -271 °C to -173 °C) [26, 32]. The temperature dependence of ΔTEC for PMN-PT exhibits its maximum at the relaxor-to-ferroelectric phase transition temperature [13, 14], which typically lies in the middle of the low-temperature slope of the dielectric permittivity peak, i.e., below room temperature for PMN and at ~160 °C for PMN-35PT. For designing cooling devices based on EC materials, knowledge of the thermal properties of PMN-PT ceramics is needed, especially at elevated temperatures.
2 Materials and methods
For the synthesis of the stoichiometric PMN-100xPT (x = 0, 0.1, 0.3 and 0.35) ceramic powder, PbO (99.9 %, Aldrich), MgO (99.9 %, Alfa Aesar), Nb2O5 (99.9 %, Aldrich) and TiO2 (99.8 %, Alfa Aesar) were used. The homogenized powder mixture was high-energy milled in a Retsch PM 400 planetary mill at 300 rpm for up to 140 h and additionally milled in a Netzsch PE 075 / PR 01 attritor mill at 800 rpm for 4 h in isopropanol. The powder compacts were pressed uniaxially at 50 MPa and then isostatically at 300 MPa. The compacts were sintered in double alumina crucibles in the presence of packing powder of the same composition at 1200 °C for 2 h, with heating and cooling rates of 2 °C/min. For the synthesis procedure of PMN-100xPT, see also [2, 13, 14, 17].
The densities of the sintered pellets were measured with a gas-displacement density analyser (Micromeritics, AccuPyc III 1340 Pycnometer). The relative densities (RD) of the samples were 97.08 %, 95.52 %, 96.31 % and 97.59 % for the compositions with increasing PT content. For these calculations the theoretical density of PMN, 8.13 g/cm³ (JCPDS 81-0861), was used. For microstructural analysis with a field-emission scanning electron microscope (FE-SEM, JEOL FEG-SEM JSM-7600), the samples were ground, polished and thermally etched. The FE-SEM images of the etched surfaces reveal homogeneous and uniform microstructures (Figure 1). For the stereological analyses, more than 250 grains per sample were measured. The grain size (GS) is expressed as the Feret's diameter (Figure 1).
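The text does not state which Feret convention (maximum, minimum or mean caliper width) was used. The sketch below assumes the maximum Feret diameter, the largest distance between two points of a grain outline, which for a polygonal outline is attained between two vertices. The traced outlines are hypothetical and serve only to show the computation; the actual analysis covered more than 250 grains per sample.

```python
import math
from itertools import combinations
from statistics import mean

def max_feret_diameter(outline):
    """Maximum Feret (caliper) diameter of a polygonal grain outline of (x, y) vertices."""
    if len(outline) < 2:
        raise ValueError("need at least two vertices")
    return max(math.dist(p, q) for p, q in combinations(outline, 2))

# Hypothetical grain outlines traced from a micrograph, coordinates in micrometres.
grains = [
    [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)],  # 3 x 4 rectangle
    [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],  # 2 x 2 square
]
sizes = [max_feret_diameter(g) for g in grains]
print(f"mean grain size (max Feret): {mean(sizes):.2f} um")
```

The pairwise search is O(n²) in the number of vertices, which is negligible for traced grain outlines.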
Figure 1: FE-SEM micrographs of PMN-100xPT (x = 0, 0.1, 0.3 and 0.35) ceramics.
The specific heat capacity versus temperature, Cp(T), of the ceramic samples was determined from differential scanning calorimetry curves measured with a differential scanning calorimeter (DSC, Netzsch DSC 204 F1). Samples ~5 mm in diameter and ~1 mm thick were placed in Pt crucibles with lids and heated in the calorimeter at 2 °C/min from 35 °C to 300 °C. Each measurement was repeated twice. To determine the Cp of the samples, sapphire (Netzsch, diameter of 5.2 mm, thickness of 1 mm) was used as the standard material.
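The evaluation of Cp against a sapphire standard is commonly done with the three-run ratio method (empty-crucible baseline, standard, sample at the same heating rate), as described for example in ASTM E1269; we assume this scheme here, since the text does not detail the evaluation. The heat-flow signals, masses and the sapphire Cp below are placeholders, not measured data.

```python
def cp_ratio_method(hf_sample: float, hf_std: float, hf_baseline: float,
                    m_sample_mg: float, m_std_mg: float, cp_std: float) -> float:
    """Specific heat of a sample from three DSC runs at the same heating rate.

    hf_* are heat-flow signals at one temperature (any consistent unit, e.g. mW)
    for the sample run, the sapphire-standard run and the empty-crucible baseline.
    cp_std is the standard's specific heat at that temperature, in J/(g K).
    """
    net_sample = hf_sample - hf_baseline
    net_std = hf_std - hf_baseline
    if net_std == 0:
        raise ValueError("standard signal equals baseline")
    return cp_std * (net_sample / net_std) * (m_std_mg / m_sample_mg)

# Placeholder values at a single temperature (mW and mg); cp of sapphire is assumed.
cp_sapphire = 0.80  # J/(g K), placeholder; use tabulated sapphire data in practice
cp_pmn = cp_ratio_method(hf_sample=1.61, hf_std=2.16, hf_baseline=-0.10,
                         m_sample_mg=155.0, m_std_mg=84.5, cp_std=cp_sapphire)
print(f"Cp = {cp_pmn:.3f} J/gK")
```

Heating rate and signal calibration cancel in the ratio, which is why the sample and standard runs must use the same heating programme.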
For the measurements of the thermal transport properties versus temperature, pellets 8 mm thick and 18 mm in diameter were prepared. The measurements were performed by the transient plane source technique [33] using HotDisk TPS 2500S equipment (Hot Disk AB, Gothenburg, Sweden). The HotDisk sensor (Mica, 3.2 mm diameter, C5465) was placed between two ceramic pellets. The basic principle of the system is to supply constant power to the sample via the Hot Disk sensor, which acts both as the heat source and as the temperature monitor. The material was heated at 100 mW for 10 s. The heating pulse was chosen short enough that the sensor could be considered in contact with an infinite solid throughout the recording. In this way, the thermal properties of the studied material could be determined by measuring the temperature increase of the sensor over a short period of time [34, 35]. The measurements at elevated temperatures (from 50 °C to 300 °C) were performed in a tube furnace (Entech) in a nitrogen atmosphere to prevent oxidation of the sensor. The temperature step and the stabilization time were 10 °C and 15 min, respectively. A step of 2 °C was used in the range close to the polar-to-non-polar phase transition temperature of the PMN-30PT and PMN-35PT samples. At each temperature, 5 measurements were performed, with a waiting time between measurements (5 min) long enough for the sample temperature to reach equilibrium with the furnace temperature. The room-temperature measurements (i.e., 23 °C) were performed in air. To compare the thermal conductivities of ceramic samples with different chemical compositions, i.e., PMN-100xPT (x = 0, 0.1, 0.3 and 0.35), the thermal conductivities were normalized to the sample density using the equation:
$$\lambda = \frac{\lambda_m - \lambda_{air}(1 - p)}{p} \qquad (1)$$
where λm is the measured thermal conductivity of the sample, λair is the thermal conductivity of air, equal to 0.026 W/mK [36], and p is the RD of the sample, expressed as a fraction.
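Eq. (1) amounts to inverting a linear (parallel) mixing rule, λm = pλ + (1 - p)λair, in which the pore volume fraction 1 - p is filled with air. The sketch below implements Eq. (1) and checks it by round-tripping a value through the forward rule; the function name is ours, and the 1.25 W/mK and 97.08 % inputs are simply taken from the text as example numbers.

```python
LAMBDA_AIR = 0.026  # W/(m K), thermal conductivity of air used in Eq. (1)

def normalize_conductivity(lambda_measured: float, relative_density: float,
                           lambda_air: float = LAMBDA_AIR) -> float:
    """Eq. (1): lambda = (lambda_m - lambda_air * (1 - p)) / p, with p a fraction in (0, 1]."""
    if not 0.0 < relative_density <= 1.0:
        raise ValueError("relative density must be a fraction in (0, 1]")
    return (lambda_measured - lambda_air * (1.0 - relative_density)) / relative_density

# Round trip: the measured value that Eq. (1) maps back to 1.25 W/mK at RD = 97.08 %.
p = 0.9708
lam_m = p * 1.25 + (1.0 - p) * LAMBDA_AIR
print(round(normalize_conductivity(lam_m, p), 4))  # recovers 1.25
```

For a fully dense sample (p = 1) the correction vanishes, and for p < 1 the normalized value is always above the measured one whenever the solid conducts better than air.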
3 Results and discussion
The specific heat capacity, Cp, versus temperature of PMN-100xPT (x = 0, 0.1, 0.3 and 0.35) ceramic samples is shown in Figure 2, and the values of Cp at selected temperatures are collected in Table 1. At 35 °C the Cp is between 0.323 and 0.326 J/gK for all samples. These values are in good agreement with previously published data for PMN-29PT single crystals [24]. The Cp of PMN and PMN-10PT increases continuously with increasing temperature. In PMN this slow increase of Cp is observed up to ~220 °C; at higher temperatures it remains almost constant. In PMN-10PT the plateau is reached at a lower temperature, ~120 °C (Table 1).
[Figure 2 plot: Cp (J/gK, about 0.32 to 0.42) versus T (°C, 50 to 300) for PMN, PMN-10PT, PMN-30PT and PMN-35PT]
Figure 2: The Cp(T) of PMN-100xPT (x = 0, 0.1, 0.3 and 0.35) ceramics.
Table 1: Selected Cp values (J/gK) of PMN-100xPT samples (from Figure 2).
T	PMN	PMN-10PT	PMN-30PT	PMN-35PT
35 °C	0.326	0.325	0.326	0.323
50 °C	0.330	0.335	0.336	0.332
100 °C	0.340	0.347	0.355	0.350
200 °C	0.345	0.349	0.363	0.362
300 °C	0.346	0.350	0.357	0.353
In PT-rich compositions the Cp(T) behaviour is different; clear anomalies in Cp are observed at 130 °C and 153 °C, related to the phase transitions from the monoclinic and tetragonal phases to the high-temperature cubic phase for PMN-30PT and PMN-35PT, respectively. These temperatures are in good agreement with the published temperatures of the dielectric permittivity anomalies: ~130 °C for PMN-30PT (at 0.4 kHz) [14] and ~160 °C for PMN-35PT (at 1 kHz) [1]. The phase transition anomaly observed in PMN-35PT is much more pronounced (ΔCp = 0.062 J/gK) than the one observed in PMN-30PT (ΔCp = 0.008 J/gK). No anomalies were detected in the PMN and PMN-10PT samples within our measurement temperature range. This is because the dielectric anomalies of PMN and PMN-10PT ceramics were reported at ~-15 °C [37] and ~40 °C [13, 37] (at 1 kHz), respectively, which is below or close to the lower limit of our measurement range.
The temperature dependence of the thermal conductivity, λ, of PMN-100xPT ceramic samples is shown in Figure 3, and the λ values at selected temperatures are given in Table 2. At 23 °C the λ values of Pb(Mg1/3Nb2/3)O3 and 0.65Pb(Mg1/3Nb2/3)O3-0.35PbTiO3 are 1.25 W/mK and 1.43 W/mK, respectively. For all samples λ increases with increasing temperature up to approximately 250 °C, and above this temperature it is approximately constant. Such non-linear behaviour of λ(T) has also been observed in PMN-PT single crystals [20]. An interesting observation from Figure 3 is that λ increases significantly with increasing PT content over the whole temperature range. This agrees with [31], where PMN single crystals exhibit lower λ than PT single crystals over the whole measurement range, i.e., from -271 °C to 117 °C. Thermal conductivity measurements of BaTiO3 and KNbO3 single crystals revealed anomalies in λ(T) at the phase transition temperatures [31]. In our study, no such anomalies were observed for PMN-30PT and PMN-35PT under our measurement conditions, i.e., with a step of 2 °C in the temperature intervals close to the respective phase transition temperature from the ferroelectric to the high-temperature cubic phase (see the insets in Figure 3).
Figure 3: The λ(T) of PMN-100xPT (x = 0, 0.1, 0.3 and 0.35) ceramics. Insets: The λ(T) of PMN-30PT (below) and PMN-35PT (above) measured in the range close to the phase transition temperature from the ferroelectric to the high-temperature cubic phase.
Table 2: Selected values of λ (W/mK) from Figure 3.
T	PMN	PMN-10PT	PMN-30PT	PMN-35PT
23 °C	1.25	1.27	1.38	1.43
50 °C	1.30	1.32	1.41	1.45
100 °C	1.36	1.39	1.47	1.53
200 °C	1.43	1.46	1.59	1.64
300 °C	1.47	1.51	1.66	1.71
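The trend described above can be checked directly against the rounded entries of Table 2. The short script below verifies that λ rises monotonically with PT content at every listed temperature and quantifies how far PMN-35PT lies above PMN; it uses only the tabulated values, so small deviations from percentages quoted elsewhere in the text reflect rounding of the table entries.

```python
TABLE2 = {  # thermal conductivity in W/mK, Table 2
    23:  {"PMN": 1.25, "PMN-10PT": 1.27, "PMN-30PT": 1.38, "PMN-35PT": 1.43},
    50:  {"PMN": 1.30, "PMN-10PT": 1.32, "PMN-30PT": 1.41, "PMN-35PT": 1.45},
    100: {"PMN": 1.36, "PMN-10PT": 1.39, "PMN-30PT": 1.47, "PMN-35PT": 1.53},
    200: {"PMN": 1.43, "PMN-10PT": 1.46, "PMN-30PT": 1.59, "PMN-35PT": 1.64},
    300: {"PMN": 1.47, "PMN-10PT": 1.51, "PMN-30PT": 1.66, "PMN-35PT": 1.71},
}
ORDER = ["PMN", "PMN-10PT", "PMN-30PT", "PMN-35PT"]  # increasing PT content

def increases_with_pt(row: dict) -> bool:
    """True if lambda strictly increases along ORDER."""
    values = [row[name] for name in ORDER]
    return all(a < b for a, b in zip(values, values[1:]))

def relative_increase_pct(row: dict, low: str = "PMN", high: str = "PMN-35PT") -> float:
    return 100.0 * (row[high] / row[low] - 1.0)

for t, row in TABLE2.items():
    print(f"{t:>3} °C: increases with PT = {increases_with_pt(row)}, "
          f"PMN-35PT vs PMN = +{relative_increase_pct(row):.1f} %")
```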
As mentioned previously, for an efficient cooling device the Cp/λ ratio of the EC material should be as low as possible. The Cp/λ ratios of PMN-100xPT ceramic samples (x = 0, 0.1, 0.3 and 0.35) are shown in Figure 4. Over the whole measurement range the lowest Cp/λ ratio is obtained for PMN-35PT ceramics and the highest for PMN and PMN-10PT ceramics. For example, at 50 °C the Cp/λ ratio of PMN-35PT is 10 % lower than that of the PMN-10PT and PMN samples. Note also that the anomalies in the Cp/λ(T) of PMN-30PT and PMN-35PT at 130 °C and 153 °C, respectively, result from the pronounced anomalies in the Cp(T) measurements (see Figure 2).
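The Cp/λ comparison can be reproduced from the shared temperatures of Tables 1 and 2 (50, 100, 200 and 300 °C). The sketch below computes the ratio for each composition, identifies the lowest one at each temperature, and evaluates the 50 °C reduction of PMN-35PT relative to PMN-10PT. With Cp in J/gK and λ in W/mK the ratio carries units of s·m/g; only relative values matter for the comparison.

```python
CP = {  # J/gK, Table 1
    50:  {"PMN": 0.330, "PMN-10PT": 0.335, "PMN-30PT": 0.336, "PMN-35PT": 0.332},
    100: {"PMN": 0.340, "PMN-10PT": 0.347, "PMN-30PT": 0.355, "PMN-35PT": 0.350},
    200: {"PMN": 0.345, "PMN-10PT": 0.349, "PMN-30PT": 0.363, "PMN-35PT": 0.362},
    300: {"PMN": 0.346, "PMN-10PT": 0.350, "PMN-30PT": 0.357, "PMN-35PT": 0.353},
}
LAMBDA = {  # W/mK, Table 2
    50:  {"PMN": 1.30, "PMN-10PT": 1.32, "PMN-30PT": 1.41, "PMN-35PT": 1.45},
    100: {"PMN": 1.36, "PMN-10PT": 1.39, "PMN-30PT": 1.47, "PMN-35PT": 1.53},
    200: {"PMN": 1.43, "PMN-10PT": 1.46, "PMN-30PT": 1.59, "PMN-35PT": 1.64},
    300: {"PMN": 1.47, "PMN-10PT": 1.51, "PMN-30PT": 1.66, "PMN-35PT": 1.71},
}

def cp_over_lambda(t: int) -> dict:
    """Cp/lambda for every composition at temperature t (°C)."""
    return {name: CP[t][name] / LAMBDA[t][name] for name in CP[t]}

for t in CP:
    ratios = cp_over_lambda(t)
    lowest = min(ratios, key=ratios.get)
    print(f"{t:>3} °C: lowest Cp/lambda = {lowest} ({ratios[lowest]:.4f})")

r50 = cp_over_lambda(50)
reduction = 1.0 - r50["PMN-35PT"] / r50["PMN-10PT"]
print(f"50 °C: PMN-35PT is {100 * reduction:.1f} % below PMN-10PT")
```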
Figure 4: The Cp/λ(T) of PMN-100xPT (x = 0, 0.1, 0.3 and 0.35) ceramics.
4 Summary and conclusions
In the present work, we examined the thermal properties of PMN-PT, which is one of the most promising EC material compositions. We studied four different compositions, from relaxor PMN to the MPB composition PMN-35PT. The measured specific heat capacity versus temperature, Cp(T), differs between PMN-PT compositions. At 35 °C the Cp is between 0.323 and 0.326 J/gK. In PMN and PMN-10PT, the Cp increases continuously with increasing temperature until it reaches a plateau, and no anomalies are observed in the whole measured temperature range. In PT-rich compositions the Cp(T) dependence is different; pronounced anomalies are observed at 130 °C and 153 °C for PMN-30PT and PMN-35PT, respectively. The temperatures at which the anomalies occur correspond to the previously published phase transition temperatures from polar to high-temperature non-polar phases obtained from dielectric spectroscopy data. The heat capacity anomaly observed in PMN-35PT is much more pronounced (ΔCp = 0.062 J/gK) than the one observed in PMN-30PT (ΔCp = 0.008 J/gK).
The thermal conductivity λ of PMN at 23 °C is 1.25 W/mK, while that of PMN-35PT is about 13 % larger. This increase of λ with increasing PT content in the PMN-100xPT materials persists over the whole measurement temperature range.
To conclude, for an efficient cooling element the EC material should possess a low Cp/λ ratio. PMN-35PT ceramics possess the lowest Cp/λ ratio over the whole measurement range in spite of the anomaly at ~153 °C. On the other hand, the PMN and PMN-10PT materials possess a higher Cp/λ ratio, but no pronounced anomaly in Cp/λ is detected in this measurement range. The question "Which PMN-PT composition is more appropriate for use in efficient new-generation EC cooling devices?" therefore has no trivial answer. From the thermal point of view, the more appropriate compositions are those with a higher PT content (i.e., PMN-30PT and PMN-35PT), but these compositions also possess high piezoelectric coefficients and well-defined ferroelectric hysteresis loops, which could be a drawback for some specific electrocaloric applications.
5 Acknowledgements
The authors thank the Slovenian Research Agency for financial support within project L2-6768, doctoral projects PR-06166, PR-06804 and PR-05025, and program P2-0105. The Centre of Excellence NAMASTE is acknowledged for access to the HotDisk TPS 2500S equipment.
6 References
1.	J. Kelly, M. Leonard, C. Tantigate, A. Safari: Effect of composition on the electromechanical properties of (1-x)Pb(Mg1/3 Nb2/3)O3- xPbTiO3 ceramics, J. Am. Ceram. Soc. 1997, 80, 957.
2.	H. Uršič, J. Tellier, M. Hrovat, J. Holc, S. Drnovšek, V. Bobnar, M. Alguero, M. Kosec: The Effect of Poling on the Properties of 0.65Pb(Mg1/3Nb2/3)O3-0.35PbTiO3 Ceramics, Jpn. J. Appl. Phys. 2011, 50, 035801.
3.	T. Y. Koo, S. W. Cheong: Dielectric and piezoelectric enhancement due to 90° domain rotation in the tetragonal phase of Pb(Mg1/3Nb2/3)O3-PbTiO3, Appl. Phys. Lett. 2002, 80, 4205.
4.	H. Uršič, M. Škarabot, M. Hrovat, J. Holc, M. Skalar, V. Bobnar, M. Kosec, I. Muševič: The electrostrictive effect in ferroelectric 0.65Pb(Mg1/3Nb2/3)O3-0.35PbTiO3 thick films, J. Appl. Phys., 2008, 103, 124101.
5.	M. Valant: Electrocaloric materials for future solidstate refrigeration technologies, Prog. Mater. Sci., 2012, 57, 980.
6.	X. Moya, S. Kar-Narayan, N. D. Mathur: Caloric materials near ferroic phase transitions, Nat. Mater. 2014, 13, 439.
7.	H. Uršič, F. Levassort, J. Holc, M. Lethiecq, M. Kosec: 0.65Pb(Mg1/3Nb2/3)O3-0.35PbTiO3 Thick Films for High-Frequency Piezoelectric Transducer Applications, Jpn. J. Appl. Phys. 2013, 52, 055502.
8.	F. Levassort, A. C. Hladky-Hennion, H. Le Khanh, P. Tran-Huu-Hue, M. Lethiecq, M. Pham Thi: 0-3 and 1-3 piezocomposites based on single crystal PMN-PT for transducer applications, Advances in Applied Ceramics, 2010, 109, 162.
9.	J. F. Tressler, S. Alkoy, R. E. Newnham: Piezoelectric sensors and sensor materials, J. Electroceram., 1998, 2, 257.
10.	H. Uršič, M. Hrovat, J. Holc, S. M. Zarnik, S. Drnovšek, S. Maček: Sensors Actuat. B, 2008, 133, 699.
11.	U. Plaznik, A. Kitanovski, B. Rožič, B. Malič, H. Uršič, S. Drnovšek, J. Cilenšek, M. Vrabelj, A. Poredoš, Z. Kutnjak: Bulk relaxor ferroelectric ceramics as a working body for an electrocaloric cooling device, Appl. Phys. Lett. 2015, 106, 043903.
12.	X. Moya, E. Defay, V. Heine, N. D. Mathur: Too cool to work, Nat. Phys., 2015, 11, 202.
13.	M. Vrabelj, H. Uršič, Z. Kutnjak, B. Rožič, S. Drnovšek, A. Benčan, V. Bobnar, L. Fulanovic, B. Malič: Large electrocaloric effect in grain-size-engineered 0.9Pb(Mg1/3Nb2/3)O3-0.1PbTiO3, J. Eur. Ceram. Soc., 2016, 36, 75.
14.	B. Rožič, M. Kosec, H. Uršič, J. Holc, B. Malič, Q.M. Zhang, R. Blinc, R. Pirc, Z. Kutnjak: Influence of the critical point on the electrocaloric response of relaxor ferroelectrics, J. Appl. Phys. 2011, 110, 064118.
15.	R. I. Epstein and K. J. Malloy: Electrocaloric devices based on thin-film heat switches, J. Appl. Phys. 2009, 106, 064509.
16.	S. Karmanenko, A. Semenov, A. Dedyk, A. Eskov, A. Ivanov, P. Beliavskiy, Y. Pavlova, A. Nikitin, I. Starkov, A. Starkov, O. Pakhomov: New Approaches to Electrocaloric-Based Multilayer Cooling (Eds: T. Correia, Q. Zhang), Springer-Verlag, Berlin Heidelberg, Germany, 2014, pp. 186.
17.	B. Rožič, B. Malič, H. Uršič, J. Holc, M. Kosec, Z. Kutnjak: Direct Measurements of the Electrocaloric Effect in Bulk PbMg1/3Nb2/3O3 (PMN) Ceramics, Ferroelectrics, 2011, 421, 103.
18.	J. Perantie, H. N. Tailor, J. Hagberg, H. Jantunen, Z.-G. Ye: Electrocaloric properties in relaxor ferroelectric (1-x)Pb(Mg1/3Nb2/3)O3-xPbTiO3 system, J. Appl. Phys. 2013, 114, 174105.
19.	U. Plaznik, M. Vrabelj, Z. Kutnjak, B. Malic, A. Poredos, A. Kitanovski: Electrocaloric cooling: The importance of electric-energy recovery and heat regeneration, Europhysics Letters, 2015, 111, 57009.
20.	M. Tachibana, E. Takayama-Muromachi: Thermal conductivity and heat capacity of the relaxor ferroelectric [PbMg1/3Nb2/3O3]1-x[PbTiO3]x, Phys. Rev. B 2009, 79, 100104.
21.	Z. Kutnjak: Heat capacity response of relaxor ferroelectrics near the morphotropic phase boundary, Ferroelectrics, 2008, 369, 198.
22.	N. Novak, G. Cordoyiannis, Z. Kutnjak: Dielectric and heat capacity study of (PbMg1/3Nb2/3O3)0.74-(PbTiO3)0.26 ferroelectric relaxor near the cubic-tetragonal-rhombohedral triple point, Ferroelectrics, 2012, 428, 43.
23.	Z. Kutnjak, J. Petzelt, R. Blinc: The giant electromechanical response in ferroelectric relaxors as a critical phenomenon, Nature, 2006, 441, 956.
24.	Y. Tang, X. Zhao, X. Feng, W. Jin, H. Luo: Pyroelectric properties of [111]-oriented Pb(Mg1/3Nb2/3)O3-PbTiO3 crystals, Appl. Phys. Lett., 2005, 86, 0852901.
25.	A. Fouskova, V. Kohl, N. N. Krainik, I. E. Mylnikova: Specific heat of PbMg1/3Nb2/3O3, Ferroelectrics, 1981, 34, 119.
26.	D. A. Ackerman, D. Moy, R. C. Potter, A. C. Anderson, W. N. Lawless: Glassy behavior of crystalline solids at low temperatures, Phys. Rev. B 1981, 23, 3886.
27.	M. V. Gorev, I. N. Flerov, V. S. Bondarev, Ph. Sciau: Heat Capacity Study of Relaxor PbMg1/3Nb2/3O3 in a Wide Temperature Range, Journal of Experimental and Theoretical Physics, 2003, 96, 531.
28.	W. Hassled, E. Hegenbarth: Glasslike behaviour of thermal conductivity at ferroelectric single crystals of relaxor type, Ferroelectrics Letters, 1985, 4, 117-121.
29.	D. M. Zhu, P. D. Han: Thermal conductivity and electromechanical property of single-crystal lead magnesium niobate titanate, Appl. Phys. Lett., 1999, 75, 3868.
30.	J. J. De Yoreo, R. O. Pohl, G. Burns: Low-temperature thermal properties of ferroelectrics, Phys. Rev. B, 1985, 32, 5780.
31.	M. Tachibana, T. Kolodiazhnyi, E. Takayama-Muromachi: Thermal conductivity of perovskite ferroelectrics, Appl. Phys. Lett. 2008, 93, 092902.
32.	M. Fahland, E. Hegenbarth: Thermal conductivity of Pb(Mg1/3Nb2/3)O3 under the influence of high electric field, Ferroelectric Letters, 1993, 15, 89.
33.	S. E. Gustafsson: Transient plane source technique for thermal conductivity and thermal diffusivity measurements of solid materials, Rev. Sci. Instrum. 1991, 62, 797.
34.	Y. He: Rapid thermal conductivity measurement with a hot disk sensor, Part 1 and 2, Thermochimica Acta, 2005, 436, 122.
35.	B. Bertoncelj, K. Vojisavljevic, M. Vrabelj, B. Malic: Thermal properties of polymer-matrix composites reinforced with E-glass fibers, Informacije MIDEM, Journal of Microelectronics, Electronic Components and Materials, 2015, 45, 216.
36.	D. R. Lide: Handbook of Chemistry and Physics, CRC Press LLC, 84th edition, 2003-2004, pp. 1161.
37.	S. L. Swartz, T. R. Shrout, W. A. Schulze, L. E. Cross: Dielectric properties of lead-magnesium niobate ceramics, J. Am. Ceram. Soc., 1984, 87, 311.
Arrived: 03. 12. 2015 Accepted: 02. 01. 2016
Original scientific paper
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), 266 - 276
Low-pass filter for UWB system with the circuit for compensation of process induced on-chip capacitor variation
Branislava Milinkovic1,2, Milenko Milicevic1,3, Dorde Simic1, Goran Stojanovic2, Radivoje Duric3
1TES Electronic Solutions GmbH, Stuttgart, Germany
2University of Novi Sad, Faculty of Technical Sciences, Department for Power, Electronic and Communication Engineering, Novi Sad, Serbia
3University of Belgrade, School of Electrical Engineering, Department of Electronics, Belgrade, Serbia
Abstract: This paper describes the design and optimization of a fifth-order Chebyshev low-pass filter that includes a circuit for automatic process calibration and compensation. The filter is realized using lumped elements in a 130 nm radio frequency (RF) CMOS process and is intended to cover the lower sub-band (3.4 GHz - 4.8 GHz) of an ultra-wideband (UWB) system. The proposed fully on-chip calibration concept estimates the process-induced variation of the MIM (Metal-Insulator-Metal) capacitance against a more stable on-chip MOS capacitor used as a reference. To estimate the capacitance value, a low-frequency oscillator is designed that uses both types of capacitors, one after another, to generate the oscillations. The MIM capacitor value is determined in the digital domain from the ratio of the two oscillation frequencies, and the required capacitance is then restored using a compensation capacitor bank. A detailed mathematical optimization of the calibration method is presented.
All RF, analog and digital circuits have been integrated on a test chip fabricated in a 130 nm RF CMOS process. The fabricated ICs have been measured on-wafer and compared with simulation results. According to the measured results, the proposed calibration concept reduces the process-induced variation of the filter transfer characteristic at the critical frequency from approximately 5 dB to 0.6 dB. The calibration needs to be applied only once, at the beginning of circuit operation. The total area of the implemented calibration circuit is less than 0.1 mm². The same method and compensation circuit can be employed to calibrate any on-chip circuit whose performance is affected by MIM capacitance process variation.
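The estimation step can be illustrated under a simplifying assumption: an oscillator whose frequency is inversely proportional to its timing capacitance, f = k/C, with the same k whichever capacitor is connected. The MIM value then follows from the reference MOS value and the frequency ratio, and a bank code can be chosen to restore the designed total capacitance. The bank model below (a fixed MIM part plus n switched MIM unit cells, all scaling by the same factor 1 + δ), the function names and all numbers are hypothetical and are not taken from the paper's circuit.

```python
def estimate_mim_deviation(f_mos_hz: float, f_mim_hz: float,
                           c_mos_ref_f: float, c_mim_nominal_f: float) -> float:
    """Relative MIM deviation delta, assuming f = k / C with identical k for both capacitors."""
    c_mim_est = c_mos_ref_f * f_mos_hz / f_mim_hz
    return c_mim_est / c_mim_nominal_f - 1.0

def bank_code(delta: float, c_fixed_f: float, c_unit_f: float,
              n_nominal: int, n_max: int) -> int:
    """Code n restoring C_fixed + n_nominal * C_unit when all MIM caps scale by (1 + delta)."""
    target = c_fixed_f + n_nominal * c_unit_f               # designed total capacitance
    n_ideal = (target / (1.0 + delta) - c_fixed_f) / c_unit_f
    return max(0, min(n_max, round(n_ideal)))                 # clamp to the bank range

# Hypothetical example: the MIM oscillation runs 10 % faster than the MOS-reference run.
delta = estimate_mim_deviation(f_mos_hz=1.00e6, f_mim_hz=1.10e6,
                               c_mos_ref_f=1.0e-12, c_mim_nominal_f=1.0e-12)
code = bank_code(delta, c_fixed_f=900e-15, c_unit_f=10e-15, n_nominal=10, n_max=31)
print(f"estimated MIM deviation: {delta:+.1%}, bank code: {code}")
```

Working with a frequency ratio means that the oscillator constant k cancels, which is what makes a single low-frequency oscillator usable for both measurements.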
Keywords: process variation; compensation; MIM capacitor; low-pass filter; UWB
Abstract: The paper describes the optimization of a fifth-order Chebyshev low-pass filter that includes a circuit for automatic calibration and compensation. The filter is realized in a 130 nm CMOS process and is intended for the (3.4 GHz - 4.8 GHz) sub-band of a UWB system. The proposed fully integrated calibration concept estimates changes in the MIM capacitor capacitance relative to a stable reference MOS capacitor. A low-frequency oscillator is used to estimate the capacitance. A detailed mathematical optimization of the calibration method was carried out.
The RF, analog and digital circuits are integrated on a test chip in 130 nm RF CMOS technology. The chips were measured on-wafer. According to the results, the proposed calibration reduces the process-induced variation of the transfer characteristic at the critical frequency from 5 dB to 0.6 dB. The same methodology can be used for all circuits that are affected by MIM capacitance variations.
Keywords: process variations; compensation; MIM capacitor; low-pass filter; UWB
* Corresponding Author's e-mail: branislava.milinkovic@tes-dst.com
Low-pass filter for a UWB system with a circuit for compensating process-induced variation of an on-chip capacitor
© MIDEM Society
B. Milinkovič et al; Informacije Midem, Vol. 45, No. 4 (2015), 266 - 276
1 Introduction
Continuous scaling of IC manufacturing technology allows devices to be integrated in ever smaller areas. As an adverse effect, the size reduction degrades the intrinsic precision of the manufactured components [1]. To satisfy extreme design constraints on analog/RF circuits with the given component tolerances, some method of digital calibration must be applied [1]. For mass-market products, it is very important that the calibration circuits occupy as little silicon area as possible. Since technology scaling reduces the size of digital devices, complex digital calibration circuits can be implemented in a very small on-chip area. Moreover, external references are not applicable in consumer products, since the size of an external component is almost comparable to the chip size [2].
In this paper, the design of a passive LC low-pass filter is described. The DC inductance of on-chip inductors is mostly insensitive to process variations [2], but on-chip capacitors change their values notably due to finite manufacturing accuracy. Table 1 presents the capacitance variations for the three capacitor types available in the technology used.
MIM-capacitors are the most suitable for RF applications, since they are the most linear and have the highest Q-factor of all available on-chip capacitor types, so this capacitor type is used in the design. For MIM-capacitors, the shift in capacitance occurs mostly due to oxide thickness variation rather than temperature. Unfortunately, this shift degrades the final circuit performance beyond the allowed limits, so an adequate calibration and compensation method must be applied. The presented calibration concept compensates MIM-capacitor variation and can be applied in any circuit whose characteristics are deteriorated by MIM-capacitor process-induced variations.
Table 1: Tolerances of the available capacitors in the used technology
Capacitor type	Process and temperature (-40 °C to 125 °C) tolerance
MIM	±15%
MOM	±15%
MOS	±4%
Paper [3] demonstrated a way to estimate and compensate capacitor values using an external reference. In [2], a calibration concept with an internal on-chip reference was proposed, which makes calibration suitable for small-form-factor solutions. The price is the limited accuracy of the reference, but on the other hand, the approach is insensitive to parasitic and systematic errors introduced by the calibration circuit. This paper combines these two calibration approaches with additional optimizations, offering a unique calibration solution applicable to mass production. The proposed solution is applied to low-pass filter calibration, and the concept is verified through measurements.
The second section describes the low-pass filter implementation. The calibration concept is reviewed in detail in the third section. The experimental results are presented in the fourth section, followed by the conclusion in the fifth section.
2 Low-pass filter
2.1 Description
Ultra-wideband (UWB) systems are well suited for low-cost, low-power or high-data-rate short-range communication. Because they use a large bandwidth, they are immune to narrowband interference and multipath fading. These systems are preferable in applications that demand a high security level, since the transmitted signal is noise-like and hence hard to intercept.
Filters are one of the key components in UWB systems. In the transmitter, they control out-of-band radiation and suppress higher harmonics. In the receiver, they suppress unwanted signals and interferers. The proposed filter is designed for the lower band of the UWB system according to the 802.15.4a standard [4] and can be applied in both the transmitter and the receiver.
The specifications of the proposed fifth-order Chebyshev low-pass filter are listed in Table 2. They are chosen based on the transmitter output power level and linearity, and on the estimated levels of unwanted signals and interferers in the image band on the receiver side. The Chebyshev filter offers the best compromise between pass-band ripple, which degrades the Error Vector Magnitude (EVM), and selectivity, which limits out-of-band emission and reception.
Table 2: Filter specifications
Item	Value	Description
Zin	50 Ω	
Zout	50 Ω	
S11	< -10 dB	
S22	< -10 dB	
IL	< 1 dB	Goal
Order	5	Chebyshev
Ripple	±0.5 dB	
Fc	4.8 GHz	3 dB point
Selectivity	-15 dB	@ 6.4 GHz
	-30 dB	@ 8.53 GHz
	-40 dB	@ 10.67 GHz
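As a rough cross-check of the selectivity targets in Table 2, the ideal (lossless) fifth-order Chebyshev attenuation can be evaluated from the textbook response |H|² = 1/(1 + ε²T₅²(f/f_p)). This sketch is not the authors' synthesis flow: it assumes that "±0.5 dB" means a 0.5 dB pass-band ripple and places the ripple-band edge f_p so that the 3 dB point falls at 4.8 GHz. Real on-chip components with finite Q and process spread will deviate from this ideal response.

```python
import math

def chebyshev_poly(n: int, x: float) -> float:
    """Chebyshev polynomial of the first kind T_n(x), valid for any real x."""
    if abs(x) <= 1.0:
        return math.cos(n * math.acos(x))
    value = math.cosh(n * math.acosh(abs(x)))
    # T_n(-x) = (-1)^n T_n(x)
    return value if x > 0 or n % 2 == 0 else -value

def attenuation_db(f_ghz: float, f_ripple_ghz: float, ripple_db: float, order: int) -> float:
    """Ideal lossless Chebyshev low-pass attenuation in dB."""
    eps2 = 10 ** (ripple_db / 10) - 1
    t = chebyshev_poly(order, f_ghz / f_ripple_ghz)
    return 10 * math.log10(1 + eps2 * t * t)

ORDER = 5
RIPPLE_DB = 0.5   # assumption: "±0.5 dB" read as 0.5 dB pass-band ripple
F_3DB_GHZ = 4.8   # Fc from Table 2 (3 dB point)

# Ripple-band edge that places the 3 dB point exactly at F_3DB_GHZ
eps = math.sqrt(10 ** (RIPPLE_DB / 10) - 1)
f_ripple = F_3DB_GHZ / math.cosh(math.acosh(1 / eps) / ORDER)

for f in (4.8, 6.4, 8.53, 10.67):
    print(f"{f:5.2f} GHz: {attenuation_db(f, f_ripple, RIPPLE_DB, ORDER):5.1f} dB")
```

Under these assumptions the ideal response gives roughly 23 dB, 39 dB and 50 dB of attenuation at 6.4 GHz, 8.53 GHz and 10.67 GHz, beyond the 15/30/40 dB limits of Table 2.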
The first filter implementation has been synthesized using ideal component values from [5]. Because of the low Q-factor of on-chip passive components, the component values have been optimized under nominal conditions. The filter schematic and the S-parameter simulation results obtained in all three process corners are presented in Figure 1 and Figure 2, respectively. The results are obtained at the schematic level; the parasitic effects extracted after circuit layout will introduce additional losses.
Figure 1: LPF schematic
Figure 2: Filter transfer characteristic in slow (blue), typical (green) and fast (red) corner
As can be seen from Table 2 and Figure 2, the specifications are not fulfilled under all process variations. One way to overcome this problem is to increase the filter order, which leads to overdesign at the price of a larger chip area. Another solution is to apply calibration, and this is the preferred solution.
3 Calibration
In this section, the calibration concept using an internal reference is described in detail.
The most suitable internal reference for on-chip capacitor calibration can be obtained using MOS capacitors. As Table 1 shows, their variation with process and temperature is an acceptable ±4%. However, MOS capacitors are very nonlinear and cannot be used in circuits without appropriate biasing. For the purpose of calibration, the reference MOS capacitor is biased in the region where its nonlinear behavior is negligible.
3.1 Concept
Each of the three capacitors from Figure 1 is replaced with a bank of one base capacitor and several tuning capacitors used for compensation. Depending on how process variation affects the capacitance value, the corresponding compensation capacitors are included in or excluded from circuit operation using RF switches (Figure 3). Thus, the effective capacitance is adjusted to the nominal value under all process variations. The control signals for the switches are generated by the circuits that estimate the capacitor process variation.
Figure 3: Capacitor bank
Figure 4 illustrates the concept of MIM-capacitor value estimation. The oscillator core generates oscillations on the charge-pump principle, using first the MIM and then the MOS capacitance. The MIM-capacitor value is calculated in the digital domain by determining the ratio of the oscillation frequencies, which corresponds to the inverse ratio of the MIM and MOS capacitor values.
Figure 4: Calibration concept
The real advantage of the proposed approach is the cancellation of PVT (Process, Voltage and Temperature) variations for all components used for frequency-ratio determination, since the same oscillator core generates the oscillations in both cases (with the MIM and with the MOS capacitor).
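The cancellation argument can be illustrated with the charge-pump relation f = I_B / (2C(V2 - V1)), whose period form appears as (17) in Section 3.4.1. In this sketch the bias current and comparator thresholds are drawn at random to mimic PVT spread, but each draw is shared by the MIM and MOS runs, as with the common oscillator core. The exact MOS reference, the 30 pF value and the spread ranges are assumptions for illustration. An "absolute" estimate that trusts nominal I_B, V1 and V2 is printed for contrast.

```python
import random

def osc_frequency(cap: float, i_bias: float, v1: float, v2: float) -> float:
    """Charge-pump oscillator frequency: cap is charged and discharged between v1 and v2."""
    return i_bias / (2.0 * cap * (v2 - v1))

C_REF = 30e-12                  # MOS reference capacitance, assumed exact here
rng = random.Random(2015)
results = []
for _ in range(5):
    k = rng.uniform(0.85, 1.15)             # hypothetical MIM process value
    c_mim = k * C_REF
    # Shared-core non-idealities: the same (off-nominal) I_B, V1, V2 in both runs
    i_b = 100e-6 * rng.uniform(0.7, 1.3)
    v1 = 0.6 + rng.uniform(-0.05, 0.05)
    v2 = v1 + 0.2 * rng.uniform(0.9, 1.1)
    f_mim = osc_frequency(c_mim, i_b, v1, v2)
    f_mos = osc_frequency(C_REF, i_b, v1, v2)
    k_abs = osc_frequency(C_REF, 100e-6, 0.6, 0.8) / f_mim   # trusts nominal values
    k_ratio = f_mos / f_mim                                   # ratio: I_B and dV cancel
    results.append((k, k_abs, k_ratio))
    print(f"k={k:.4f}  absolute={k_abs:.4f}  ratio={k_ratio:.4f}")
```

The ratio estimate reproduces k to floating-point precision, while the absolute estimate inherits the full I_B and threshold spread.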
The values of the capacitors in the cap bank, and the process values at which they are included in the circuit operation, are chosen according to the calculation derived in [3].
The new value of the nominal capacitor is:
$$C'_{nom} = C_{nom}\,\frac{1+\varepsilon_1}{k_{max}} \qquad (1)$$
where $C_{nom}$ is the capacitor nominal value, $\varepsilon_1$ is the maximal acceptable error caused by the discrete nature of the compensation, and $k_{max}$ is the maximal process value, $k_{max} = 1 + 3\sigma$, where $\sigma$ is the normalized standard process deviation for MIM-capacitors.
The new, n-th compensation capacitor (2) is included in the circuit operation at the process value given by (3). It is assumed that $C_{nom}$ is 1. Note that at process value $k_n$, only $C_n$ is included in the circuit operation.
$$C_n = \frac{\varepsilon_1\,(1+\varepsilon_1)\,2^n}{(1+3\sigma)\,(1-\varepsilon_1)^n} \qquad (2)$$
$$k_n = \frac{k_{max}\,(1-\varepsilon_1)}{1+\varepsilon_1+k_{max}\sum_{i=1}^{n-1} C_i} \qquad (3)$$
Satisfactory accuracy of up to $\varepsilon_1 = 2\%$ can be reached using three compensation capacitors. The normalized values of the compensation capacitors are presented in Table 3; all capacitor values are normalized to $C_{nom}$.
Table 3: Normalized values of the compensation capacitors in the cap bank presented in Figure 3.
Capacitor name	Capacitor value
C'nom	0.8872
CC1	0.0368
CC2	0.0751
CC3	0.1534
3.2 Switch design
Compensation capacitances are included in the circuit operation via RF switches, as presented in Figure 3. The switches are optimized to obtain the best compromise between insertion loss (when the switches are "on") and isolation (when the switches are "off") for the given application.
3.2.1 "On" state
When the switch is "on", the gate voltage corresponds to $V_{DD}$, while $V_D = V_S = 0$. The impedance seen between the drain and source terminals of the transistor is dominated by the $r_{ds}$ resistance:
$$r_{ds} \approx \frac{L}{W\,\mu\,C'_{ox}\,(V_{DD}-V_{th})} \qquad (4)$$
For the minimal transistor length and fixed biasing, we can assume that the switch "on" resistance is approximately $R_{on} = K_R/W$, where $K_R = L/(\mu C'_{ox}(V_{DD}-V_{th}))$ is constant. This approximation is good enough in the observed case.
To calculate the contribution of the switch "on" resistance to the filter insertion loss, we need to transform the impedances (Figure 5):
$$R_P = R_S\,(Q^2+1), \qquad C_P = C_S\,\frac{Q^2}{Q^2+1}, \qquad Q = \frac{1}{\omega R_S C_S} \qquad (5)$$
Figure 5: A series to parallel impedance transformation
For Q > 10 we get:
$$R_P \approx R_S\,Q^2, \qquad C_P \approx C_S \qquad (6)$$
Since there are three switches in the circuit, the total contribution of the switch "on" resistance to the node impedance can be expressed via an equivalent parallel resistance, (7). The influence of the switch "on" resistance on filter performance is measured by means of the Q-factor. The equivalent Q-factor of the observed node has its lowest value when all three switches are "on". This case corresponds to the "fast" corner ($k = k_{min}$).
$$R_{eq} = R_1 Q_1^2 \,\|\, R_2 Q_2^2 \,\|\, R_3 Q_3^2 \qquad (7)$$
In that case, the equivalent Q-factor of each capacitor bank can be expressed by (8):
$$Q = \frac{r_0+r_1+r_2+r_3}{\omega\,k_{min}\,C_{nom}\,K_R\left(\dfrac{r_1^2}{W_1}+\dfrac{r_2^2}{W_2}+\dfrac{r_3^2}{W_3}\right)} \qquad (8)$$
where $r_i$ is the $C_{Ci}/C_{nom}$ ratio for i = 1, 2, 3 and $r_0 = C'_{nom}/C_{nom}$ (these values are listed in Table 3), $\omega$ is the angular frequency, $k_{min}$ is the minimal process value, $C_{nom}$ is the nominal capacitor value, $K_R$ is the switch constant defined above, and $W_i$ is the width of the i-th switch, which includes the compensation capacitor $C_{Ci}$ in the circuit operation.
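The cap-bank design of equations (1) to (3) and Table 3 can be reproduced numerically. This is a sketch under stated assumptions: σ = 0.05 is assumed so that k_max = 1 + 3σ = 1.15 matches the ±15% MIM tolerance of Table 1, and select_code is a hypothetical "closest to nominal" rule used only to check that every process value in range is corrected to within ε1; it is not the authors' control logic.

```python
EPS1 = 0.02            # maximal discretization error (2 % in the text)
SIGMA = 0.05           # assumption: 3*sigma = 15 % (Table 1 MIM tolerance)
K_MAX = 1 + 3 * SIGMA  # maximal process value
N_CAPS = 3

# Eq. (1) with C_nom = 1
c_nom_prime = (1 + EPS1) / K_MAX
# Eq. (2): binary-weighted compensation capacitors C_1..C_3
c_comp = [EPS1 * (1 + EPS1) * 2**n / ((1 + 3 * SIGMA) * (1 - EPS1) ** n)
          for n in range(1, N_CAPS + 1)]
# Eq. (3): process value at which C_n is switched in
k_switch = [K_MAX * (1 - EPS1) / (1 + EPS1 + K_MAX * sum(c_comp[:n - 1]))
            for n in range(1, N_CAPS + 1)]

def bank_value(code: int) -> float:
    """Normalized bank capacitance; bit (n-1) of code enables C_n."""
    return c_nom_prime + sum(c for i, c in enumerate(c_comp) if (code >> i) & 1)

def select_code(k: float) -> int:
    """Hypothetical rule: code whose effective value k * C is closest to 1."""
    return min(range(2 ** N_CAPS), key=lambda code: abs(k * bank_value(code) - 1))

print("C'_nom =", round(c_nom_prime, 4))
print("C_n    =", [round(c, 4) for c in c_comp])
print("k_n    =", [round(k, 4) for k in k_switch])
```

Evaluated this way, (1) and (2) give C'_nom ≈ 0.887 and C_1..C_3 ≈ 0.036, 0.074 and 0.151, within about 2% of Table 3; the small residual difference may come from the rounding of σ or ε1 in the original calculation.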
3.2.2 "Off" state
Figure 6 presents the switch parasitic capacitances in the "off" state. $C_{gd}$ and $C_{gs}$ originate from the overlap of the gate poly and the drain/source areas and can be approximated as $C_{gd} = C_{gs} = C_{ov} = W L_{ov} C'_{ox}$. $C_{db}$ and $C_{sb}$ are the junction capacitances between the drain/source terminal and the substrate. Each is usually decomposed into a bottom-plate capacitance $C_j$, associated with the bottom of the junction, and a sidewall capacitance $C_{jsw}$, due to the perimeter of the junction. $C_j$ and $C_{jsw}$ are capacitances per unit area and per unit length, respectively, and both can be expressed as $C = C_{j0}/(1+V_R/\phi_B)^m$, where $V_R$ is the reverse voltage across the junction, $\phi_B$ is the junction built-in potential, and m is typically in the range of 0.3 to 0.4 [6]. To make these capacitances as low as possible, a multi-finger structure is adopted and the drain is connected to the supply voltage in the switch "off" state. The switch biasing, as presented in Figure 3, is done via an inverter and a high-value resistor. The resistor increases the inverter output impedance, since it appears in parallel with the switch "off" impedance.
Figure 6: Switch in "off" state
$R_{sub}$ models the substrate resistance from the junction to the substrate ground; in the given technology, it depends on the size and distance of the substrate contacts, the transistor size, the number of gate fingers, and even on nearby circuit elements [7].
With the given biasing and multi-finger structure, and neglecting $R_{sub}$, the impedance seen from the drain terminal is mainly capacitive, with the capacitance given by:
$$C_{drain} \approx C_{db} + \frac{C_{dg}\,C_{gs}}{C_{dg}+C_{gs}} = C_{db} + \frac{W L_{ov} C'_{ox}}{2} \qquad (9)$$
where W is the transistor width, E is the width of the diffusion at the drain terminal, $L_{ov}$ is determined by the technology and represents the length of the overlap area between the gate poly and the drain diffusion, and $C'_{ox}$ is the oxide capacitance per unit area.
With the adopted biasing, we can conclude that the drain capacitance is approximately determined by the technology parameters and the transistor width, $C_{drain} \approx K_C W$. Note that in the frequency range of interest (up to 10 GHz), and with a good layout, $R_{sub}$ can be neglected in the given technology. Also, W/2 >> E is assumed.
The capacitor error ($\varepsilon_p$) due to the switch parasitic capacitance is largest when all switches are "off", which occurs in the slow process corner, $k = k_{max}$:
$$\varepsilon_p = k_{max}\left(\frac{r_1}{1+\dfrac{k_{max} r_1 C_{nom}}{K_C W_1}}+\frac{r_2}{1+\dfrac{k_{max} r_2 C_{nom}}{K_C W_2}}+\frac{r_3}{1+\dfrac{k_{max} r_3 C_{nom}}{K_C W_3}}\right) \qquad (10)$$
3.2.3 Switch optimization
Without compensation, the IL (Insertion Loss) is determined by the Q-factor of the inductors. With compensation, the switches can significantly degrade the IL. To prevent this, the equivalent capacitor Q-factor has to be high enough in the frequency range of interest.
According to (8), the equivalent capacitor Q-factor decreases with frequency. Thus, insertion loss is most degraded at the highest frequency where it matters, the cutoff frequency ($f = f_c = 4.8$ GHz). Simulations of the IL degradation caused by the reduced equivalent capacitor Q-factor show that for capacitors with a Q-factor above 40 at $f_c$, the degradation is lower than 0.5 dB.
From (8), the transistor width should be maximal to obtain a high Q-factor. On the other hand, according to (10), the width should be minimal to minimize the error, so an optimal trade-off between insertion loss and capacitor error has to be made. The calculation below gives the optimum ratio of the switch widths for a given Q-factor.
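The goal is to minimize $\varepsilon_p$ for a given Q. The derivation uses Jensen's inequality:
$$t_1 f(x_1) + t_2 f(x_2) + t_3 f(x_3) \ge f(t_1 x_1 + t_2 x_2 + t_3 x_3)$$
where f is a convex function, $x_1$, $x_2$ and $x_3$ are in its domain, and $t_1$, $t_2$ and $t_3$ are positive weights with $t_1 + t_2 + t_3 = 1$. Equality holds if and only if $x_1 = x_2 = x_3$ or f is linear. For f(x) = 1/x, which is convex for x > 0, with weights $t_i = r_i/t$ and arguments $x_i = 1 + k_{max} r_i C_{nom}/(K_C W_i)$, we can write:
$$\frac{r_1}{t}\,f\!\left(1+\frac{k_{max} r_1 C_{nom}}{K_C W_1}\right)+\frac{r_2}{t}\,f\!\left(1+\frac{k_{max} r_2 C_{nom}}{K_C W_2}\right)+\frac{r_3}{t}\,f\!\left(1+\frac{k_{max} r_3 C_{nom}}{K_C W_3}\right) \ge f\!\left(\sum_{i=1}^{3}\frac{r_i}{t}\left(1+\frac{k_{max} r_i C_{nom}}{K_C W_i}\right)\right) \qquad (11)$$
$$\frac{r_1}{t}+\frac{r_2}{t}+\frac{r_3}{t} = 1, \qquad t = r_1 + r_2 + r_3 \qquad (12)$$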
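From (10), (11) and (12) we can obtain:
$$\varepsilon_p \ge \frac{t\,k_{max}}{1+\dfrac{k_{max} C_{nom}}{t\,K_C}\left(\dfrac{r_1^2}{W_1}+\dfrac{r_2^2}{W_2}+\dfrac{r_3^2}{W_3}\right)} \qquad (13)$$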
Using (8) we can rewrite (13):
$$\varepsilon_p \ge \frac{t\,k_{max}}{1+\dfrac{k_{max}\,(r_0+r_1+r_2+r_3)}{t\,K_C\,\omega\,k_{min}\,K_R\,Q}} \qquad (14)$$
For a constant Q at $f_c$, the right-hand side of (14) is constant. Note that it does not vary with frequency: by (8), Q is inversely proportional to $\omega$, so the product $\omega Q$ is frequency-independent. $C_{nom}$ cancels out as well.
The minimal error is obtained when the left and right sides of (14) are equal, which, by the equality condition of Jensen's inequality, is the case if and only if:
$$1+\frac{k_{max} r_1 C_{nom}}{K_C W_1} = 1+\frac{k_{max} r_2 C_{nom}}{K_C W_2} = 1+\frac{k_{max} r_3 C_{nom}}{K_C W_3} \qquad (15)$$
We can then rewrite (15) as the condition:
$$\frac{r_1}{W_1} = \frac{r_2}{W_2} = \frac{r_3}{W_3} \qquad (16)$$
With the specified Q-factor at $f_c$, (16) and (8), we obtain the widths of the switches for all three capacitors.
Note that by choosing the Q-factor at $f_c$, we also determine the capacitor error. So if the error for the chosen Q-factor is not satisfactory, it can be decreased at the cost of a higher IL. For $Q_C = 40$ we obtain a maximal error of $\varepsilon_p = 2\%$, which is acceptable, and the switches for capacitor C2 are sized for these values. For C1 and C3, we are restricted by the minimal switch size in the technology used; in this case, $Q_C = 35$ for $\varepsilon_p = 2\%$. Note that the worst IL degradation and the maximum error arise in different corners. For the selected switch widths, the IL degradation at $f_c$ is below 0.6 dB.
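The sizing procedure of (8), (10), (14) and (16) can be sketched numerically. The ratios r_i and r_0 come from Table 3 and k_min, k_max from the ±15% tolerance; C_NOM, K_R and K_C are hypothetical placeholders, since the paper does not give the technology constants, so the absolute widths are illustrative only. The sketch checks that proportional sizing (16) meets the target Q and reaches the bound (14), and that any other width split with the same Q gives a larger error.

```python
import math

# Values from the text (Tables 1 to 3)
R = [0.0368, 0.0751, 0.1534]   # r_1..r_3 = C_Ci / C_nom
R0 = 0.8872                    # r_0 = C'_nom / C_nom
K_MIN, K_MAX = 0.85, 1.15      # process extremes for a +/-15 % MIM tolerance
OMEGA_C = 2 * math.pi * 4.8e9  # angular cutoff frequency [rad/s]

# Hypothetical placeholders, NOT taken from the paper
C_NOM = 1.0e-12                # nominal filter capacitor [F]
K_R = 400.0                    # R_on = K_R / W  [ohm * um]
K_C = 0.85e-15                 # C_drain = K_C * W  [F / um]

def q_factor(widths):
    """Equivalent Q with all switches on (fast corner), eq. (8)."""
    s = sum(r * r / w for r, w in zip(R, widths))
    return (R0 + sum(R)) / (OMEGA_C * K_MIN * C_NOM * K_R * s)

def parasitic_error(widths):
    """Capacitor error with all switches off (slow corner), eq. (10)."""
    return K_MAX * sum(r / (1 + K_MAX * r * C_NOM / (K_C * w)) for r, w in zip(R, widths))

def optimal_widths(q_target):
    """Widths [um] with equal r_i / W_i, eq. (16), scaled to give q_target in (8)."""
    t = sum(R)
    lam = (R0 + t) / (OMEGA_C * K_MIN * C_NOM * K_R * q_target * t)  # common r_i / W_i
    return [r / lam for r in R]

def error_bound(q_target):
    """Right-hand side of eq. (14): smallest achievable eps_p for a given Q."""
    t = sum(R)
    return t * K_MAX / (1 + K_MAX * (R0 + t) / (t * K_C * OMEGA_C * K_MIN * K_R * q_target))

for q in (30, 35, 40, 50):
    w = optimal_widths(q)
    print(f"Q={q}: W = {[round(x, 2) for x in w]} um, "
          f"eps_p = {100 * parasitic_error(w):.2f} % (bound {100 * error_bound(q):.2f} %)")
```

Raising the target Q widens all switches and increases ε_p, which is the trade-off described in the text; because Q scales linearly with a common width factor, any skewed split can be rescaled to the same Q and compared against the bound.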
3.3 Compensated filter
The compensated filter is simulated at the extracted level in the "fast", "typical" and "slow" corners, and the obtained S-parameter results are presented in Figure 7.
Figure 7: S-parameters of the compensated filter in slow (blue), typical (green) and fast (red) corner: a) S21, b) S11
Comparing these results with the non-compensated case (Figure 2), the filter transfer characteristic variation of 5 dB at the critical frequency of 6.4 GHz is lowered to only 0.6 dB, and the specifications are met under all process variations.
3.4 Capacitor value estimation
This section describes in detail the circuit that generates the control bits for the designed switches.
3.4.1 Oscillator
The oscillator concept is presented in Figure 8 and described in detail in [3]. The single-ended oscillator circuit generates oscillations on the charge-pump principle. Although the topology is more or less the same as in [3], the design optimization differs considerably. As noted, the design procedure in this work benefits from the internal reference, since the accuracy of the capacitor value estimation is insensitive to temperature, power supply and process variations and to the parasitic influence of the connection lines.
Figure 8: Oscillator concept
Digital logic coordinates the oscillator. The digital signal osc_enb sets the oscillator in its initial state and enables it to run. The MIM / MOS signal determines whether the oscillations are generated with the MIM or the MOS capacitance. The signal SEL in the oscillator has a rectangular shape; its frequency corresponds to the oscillation frequency and is measured in the digital domain.
The oscillation period is proportional to the value of the measured capacitance:
$$T_{MIM/MOS} = \frac{2\,C_{MIM/MOS}\,\Delta V}{I_B} = \frac{2\,C_{MIM/MOS}\,(V_2 - V_1)}{I_B} \qquad (17)$$
In the digital domain, the oscillations obtained with each capacitor are counted within a predefined measure time, $T_{measure}$:
$$COUNT_{MIM/MOS} = \frac{T_{measure}}{T_{MIM/MOS}} \qquad (18)$$
$$\frac{COUNT_{MOS}}{COUNT_{MIM}} = \frac{T_{MIM,ideal}\,(1+a_{MIM})}{T_{MOS,ideal}\,(1+a_{MOS})} = \frac{C_{MIM}}{C_{MOS}}\cdot\frac{1+a_{MIM}}{1+a_{MOS}} \qquad (19)$$
In (19), $a_{MIM}$ and $a_{MOS}$ model the oscillation period deviations from the nominal values caused by non-idealities, namely the inaccurate on-chip current source, non-ideal current mirroring, the offset of the operational amplifiers, V1 and V2 variations, and the parasitic capacitance and resistance of the connection lines. Since the non-idealities are almost the same in both oscillation cases, owing to the shared oscillator core, it follows that $a_{MIM} = a_{MOS}$.
Voltages V1 and V2 have to be high enough that the nonlinear behavior of the MOS capacitor does not affect the calibration accuracy. On the other hand, these voltages have to be low enough that the "P side" of the current mirrors has a high output impedance.
The following calculation shows the influence of the V1 and V2 voltages on the estimation error caused by MOS capacitor nonlinearity.
According to the ACM (Advanced Compact MOSFET) model, the gate capacitance for $V_{DS} = 0$ can be expressed by means of (21) [9]:
$$C_{gate} = C_{gs} + C_{gd} + C_{gb} \qquad (20)$$
$$C_{gate} = \frac{n-1}{n}\,C_{ox} + \frac{1}{n}\,C_{ox}\,\frac{\sqrt{1+i_f}-1}{\sqrt{1+i_f}} \qquad (21)$$
$$\frac{V_G - V_{T0}}{n} - V_{S,D} = \phi_t\left[\sqrt{1+i_f} - 2 + \ln\!\left(\sqrt{1+i_f} - 1\right)\right] \qquad (22)$$
$$n = n(V_G), \qquad C_{ox} = W L\,C'_{ox}, \qquad \phi_t \approx 26\ \mathrm{mV}\ (T = 27\,^{\circ}\mathrm{C}) \qquad (23)$$
In (21), $C_{ox}$ is the gate oxide capacitance, n is the so-called slope factor, which is a function of the gate voltage, and $i_f$ is the inversion factor, which can be calculated using (22). In (22), $V_{T0}$ is the threshold voltage, $\phi_t$ is the thermal voltage, and $V_G$, $V_S$ and $V_D$ are the transistor gate, source and drain voltages.
For the chosen MOS capacitor value and a high enough $V_G$, $C_{ox}$ is determined. Using the procedure described in [9], the parameters $V_{T0}$ and $n(V_G)$ can be extracted. For $V_D = V_S = 0$, $i_f$ can be expressed from (22) and substituted into (21), which gives the gate capacitance as a function of the gate voltage, $C_{gate} = C_{gate}(V_G)$. With this expression, we can calculate the deviation of the $T_{MIM}/T_{MOS}$ ratio under nominal conditions from the ideal value ($T_{MIM}/T_{MOS} = 1$) as a function of V1 and V2.
Using (24), the voltage across the MOS capacitor (the gate voltage) can be expressed as a function of time, assuming that the capacitor charges from voltage V1 with a constant bias current $I_B$:
$$I_B = C_{gate}(v_C)\,\frac{dv_C}{dt}, \qquad v_C(0) = V_1 \qquad (24)$$
From (25) we can find the time needed to charge the capacitor from V1 to V2, namely $T_{MOS}(V_1, V_2)$; separating variables in (24) gives it explicitly as an integral:
$$v_C\big(T_{MOS}(V_1,V_2)\big) = V_2 \quad\Rightarrow\quad T_{MOS}(V_1,V_2) = \frac{1}{I_B}\int_{V_1}^{V_2} C_{gate}(v)\,dv \qquad (25)$$
To obtain the ideal ratio $T_{MIM}/T_{MOS} = 1$, we choose:
$$C_{MIM} = \frac{C_{gate}(V_1) + C_{gate}(V_2)}{2} \qquad (26)$$
Combining (17) and (26), we can express $T_{MIM} = T_{MIM}(V_1, V_2)$. With this and (25), the error $(T_{MIM}/T_{MOS} - 1)\cdot 100\%$ can be expressed in terms of V1 and V2. The absolute error is depicted in Figure 9 as a function of V1 for V2 = V1 + 0.1 V, V2 = V1 + 0.2 V and V2 = V1 + 0.3 V.
As can be seen from Figure 9, the error caused by MOS capacitor nonlinearity is negligible for V1 > 0.6 V when V2 - V1 < 0.2 V.
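The trend of Figure 9 can be approximated by solving (22) for the inversion factor, evaluating (21), integrating (24) as in (25), and choosing C_MIM by (26). In this sketch V_T0 = 0.35 V and a constant slope factor n = 1.3 are hypothetical (the paper extracts V_T0 and n(V_G) for the actual technology following [9]), so the numbers will not match Figure 9 exactly; I_B and C_ox cancel in the ratio and are set to 1.

```python
import math

PHI_T = 0.026     # thermal voltage at 27 degC, eq. (23)
V_T0 = 0.35       # hypothetical threshold voltage [V]
N_SLOPE = 1.3     # hypothetical slope factor, held constant (the paper uses n(V_G))

def solve_x(y: float) -> float:
    """Solve x - 1 + ln(x) = y for x = sqrt(1 + i_f) - 1 > 0 by bisection (eq. 22, V_S = V_D = 0)."""
    lo, hi = 1e-300, max(y, 0.0) + 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid - 1.0 + math.log(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def c_gate(v_g: float) -> float:
    """Normalized gate capacitance C_gate / C_ox from eq. (21)."""
    x = solve_x((v_g - V_T0) / (N_SLOPE * PHI_T))
    root = 1.0 + x                                   # sqrt(1 + i_f)
    return (N_SLOPE - 1) / N_SLOPE + (root - 1) / (N_SLOPE * root)

def t_mos(v1: float, v2: float, steps: int = 200) -> float:
    """Charging time from v1 to v2, eqs. (24)-(25), with I_B = C_ox = 1 (Simpson's rule)."""
    h = (v2 - v1) / steps
    total = c_gate(v1) + c_gate(v2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * c_gate(v1 + i * h)
    return total * h / 3

def ratio_error_percent(v1: float, v2: float) -> float:
    """(T_MIM / T_MOS - 1) * 100 % with C_MIM chosen by eq. (26)."""
    c_mim = 0.5 * (c_gate(v1) + c_gate(v2))
    return (c_mim * (v2 - v1) / t_mos(v1, v2) - 1) * 100

for v1 in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"V1 = {v1:.1f} V: error = {ratio_error_percent(v1, v1 + 0.2):+.3f} %")
```

With these placeholder parameters the error drops sharply once V1 is well above V_T0, where C_gate flattens towards C_ox, which is the qualitative behavior shown in Figure 9.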
To obtain constant-current capacitor (dis)charging, which is assumed in all calculations, the current mirrors should have a high output resistance. Furthermore, $T_{osc}/2$ should be longer than the digital logic clock period under all PVT variations, so that the oscillations can be synchronized and sensed by the digital network. The nominal values $C_{MIM/MOS} = 30$ pF, $I_B = 100$ µA, $V_1 = 0.6$ V and $V_2 = 0.8$ V allow these conditions to be met with an acceptable oscillator area.
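A quick arithmetic check of these nominal values with (17) follows. Only numbers quoted in this section and in Section 3.4.2 are used; taking the fastest simulated frequency (15.41 MHz) as the worst case for the T_osc/2 condition is our reading of the text.

```python
C_OSC = 30e-12        # nominal C_MIM/MOS [F]
I_B = 100e-6          # bias current [A]
V1, V2 = 0.6, 0.8     # comparator thresholds [V]
F_CLK = 32e6          # digital clock [Hz], Section 3.4.2
F_OSC_MAX = 15.41e6   # fastest simulated oscillation over 81 PVT cases [Hz]

t_osc = 2 * C_OSC * (V2 - V1) / I_B      # eq. (17)
f_osc = 1 / t_osc
print(f"nominal T_osc = {t_osc * 1e9:.1f} ns, f_osc = {f_osc / 1e6:.2f} MHz")

half_period_min = 0.5 / F_OSC_MAX        # shortest T_osc / 2
t_clk = 1 / F_CLK
print(f"shortest T_osc/2 = {half_period_min * 1e9:.2f} ns, T_clk = {t_clk * 1e9:.2f} ns")
```

The nominal 120 ns period (about 8.3 MHz) lies inside the simulated 3.37 MHz to 15.41 MHz range, and even in the fastest case T_osc/2 (about 32.4 ns) stays just above the 31.25 ns clock period.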
The oscillator with its bias sources is fully implemented on-chip. Two current sources are designed: one for biasing the comparators, and another for charging and discharging the capacitors through the current mirror. The sources are self-biased and operate using positive feedback. For each source, a Schmitt trigger is designed to ensure reliable start-up under all PVT variations.
Figure 9: Error in MIM cap value estimation due to the finite MOS cap linearity versus V1, for V2=V1+0.1 V (green), V2=V1+0.2 V (purple) and V2=V1+0.3 V (blue)
The oscillation frequency for both the MOS and MIM capacitors, simulated over 81 different PVT combinations, varies widely because of the fully on-chip implementation: the obtained frequencies range from 3.37 MHz to 15.41 MHz. Figure 10 presents time waveforms of the slowest, nominal and fastest oscillations that occur with MIM-capacitors.
Figure 10: Oscillation waveforms in the slowest, nominal and fastest case
Table 4: CMIM/CMOS ratio for different PVT values on the extracted level
Item	Vdd=1.14V			Vdd=1.2V			Vdd=1.26V		
T [°C]	-40	27	90	-40	27	90	-40	27	90
Core corner	MIM cap in slow corner (ideal = 1.15)
slow	1.17	1.17	1.18	1.16	1.17	1.17	1.16	1.16	1.17
typical	1.13	1.14	1.14	1.13	1.14	1.14	1.12	1.13	1.14
fast	1.10	1.11	1.11	1.09	1.10	1.11	1.09	1.10	1.10
	MIM cap in typical corner (ideal = 1)
slow	1.03	1.04	1.04	1.02	1.03	1.04	1.00	1.03	1.04
typical	1.00	1.01	1.01	0.99	1.00	1.01	0.99	1.00	1.01
fast	0.97	0.98	0.99	0.97	0.98	0.98	0.96	0.97	0.98
	MIM cap in fast corner (ideal = 0.85)
slow	0.89	0.90	0.91	0.89	0.90	0.90	0.88	0.89	0.90
typical	0.87	0.88	0.89	0.86	0.87	0.88	0.86	0.87	0.88
fast	0.84	0.85	0.86	0.84	0.85	0.86	0.83	0.84	0.86
Table 4 summarizes the estimated MIM-capacitor values across different corners, supply voltages and temperatures. Nine combinations of temperature and supply voltage are considered for each case in which all components except $C_{MIM}$ are in one corner (the core corner) and $C_{MIM}$ is in another, non-correlated corner. The estimation error, taken relative to the nominal capacitance, is always smaller than or equal to 6%, and in 96.3% of the cases it does not exceed 5%.
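The error statistics above can be recomputed directly from Table 4: 9 corner combinations times 9 supply/temperature points give 81 cases. The sketch stores the table in hundredths so that all comparisons are exact integers, and measures the error as the absolute difference between the estimated and ideal ratio in percent of the nominal capacitance; that error definition is our reading of the text.

```python
# Table 4 in hundredths: {ideal ratio: rows for core corner slow, typical, fast}.
# Column order: Vdd = 1.14 V, 1.2 V, 1.26 V, each at T = -40, 27, 90 degC.
TABLE4 = {
    115: [[117, 117, 118, 116, 117, 117, 116, 116, 117],
          [113, 114, 114, 113, 114, 114, 112, 113, 114],
          [110, 111, 111, 109, 110, 111, 109, 110, 110]],
    100: [[103, 104, 104, 102, 103, 104, 100, 103, 104],
          [100, 101, 101, 99, 100, 101, 99, 100, 101],
          [97, 98, 99, 97, 98, 98, 96, 97, 98]],
    85: [[89, 90, 91, 89, 90, 90, 88, 89, 90],
         [87, 88, 89, 86, 87, 88, 86, 87, 88],
         [84, 85, 86, 84, 85, 86, 83, 84, 86]],
}

# Absolute error in percent of the nominal capacitance
errors = [abs(value - ideal)
          for ideal, rows in TABLE4.items()
          for row in rows
          for value in row]

within_5 = sum(1 for e in errors if e <= 5)
print(f"cases: {len(errors)}, max error: {max(errors)} %")
print(f"error <= 5 %: {within_5} of {len(errors)} ({100 * within_5 / len(errors):.1f} %)")
```

The three 6% cases all involve uncorrelated corners (fast core with slow MIM, or slow core with fast MIM), where the two oscillations are least alike.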
3.4.2 Digital logic
Since the calibration is performed only once, after the power supply is applied, speed and low power consumption of the digital logic are not critical requirements. The area should be restricted, which is not a problem owing to the low complexity and high level of integration of digital logic.
Block diagram of the digital logic is presented in Figure 11. The logic is synchronized with an external clock of 32 MHz.
The digital logic coordinates the oscillator operation, determines the ratio of the oscillation frequencies and, according to the ratio value, sets the control bits for the filter capacitor bank.
Figure 12: Control block- FSM (Finite State Machine)
Figure 11: Digital network for generating filter control bits
Figure 13: Counter of the oscillations and time counter
The listed digital blocks are described in Verilog and implemented in silicon. The whole logic is implemented using 450 digital gates and occupies an area of 114 µm × 110 µm. After synthesis and place-and-route, timing and functional checks were performed.
The external signal reset_n sets the initial state of the logic. All external signals are synchronized with the clock so as not to violate the setup and hold times of the flip-flops used and to prevent the flip-flops from entering a metastable state. The oscillations to be measured are presented at the input port osc of the digital network. The external signal cal restarts the calibration process. The output signals flt_ctrl_b[2:0] control the switches in the adjustable capacitor bank of the filter.
The digital part of the design consists of four main blocks described in Verilog:
(1)	CONTROL BLOCK, which is the core of the digital logic realized as the finite state machine, Figure 12;
(2)	OSC_COUNTER, which counts oscillations within the predefined time period $T_{measure}$, equation (18);
(3)	DIVIDER, which divides $COUNT_{MOS}$ by $COUNT_{MIM}$, equation (19);
(4)	FLT_CTRL block, that generates filter control bits according to the division result.
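A behavioral Python model of the four blocks may help clarify the data flow. It is not the authors' Verilog: N_MEASURE, FRAC_BITS and the threshold-table approach inside FLT_CTRL are hypothetical choices, the FSM sequencing is reduced to straight-line function calls, and the output is a plain 3-bit code (bit n-1 enables C_n) that does not model the polarity of flt_ctrl_b[2:0]. The thresholds are derived from (1) and (2) with ε1 = 2% and k_max = 1.15.

```python
import math

F_CLK = 32e6          # external clock (text)
N_MEASURE = 32768     # hypothetical T_measure, in clock cycles
FRAC_BITS = 12        # hypothetical fixed-point precision of DIVIDER

EPS1, K_MAX = 0.02, 1.15                      # K_MAX plays the role of 1 + 3*sigma
C_NOM_P = (1 + EPS1) / K_MAX                  # eq. (1)
C_COMP = [EPS1 * (1 + EPS1) * 2**n / (K_MAX * (1 - EPS1) ** n) for n in (1, 2, 3)]  # eq. (2)

def bank(code: int) -> float:
    """Normalized bank capacitance for a 3-bit code (bit n-1 enables C_n)."""
    return C_NOM_P + sum(c for i, c in enumerate(C_COMP) if (code >> i) & 1)

# Lowest process value each code can serve while keeping k * bank >= 1 - EPS1
THRESHOLDS = [round((1 - EPS1) / bank(code) * 2**FRAC_BITS) for code in range(8)]

def osc_counter(period_s: float) -> int:
    """OSC_COUNTER: whole oscillation periods within T_measure = N_MEASURE / F_CLK."""
    return math.floor(N_MEASURE / F_CLK / period_s)

def divider(count_mos: int, count_mim: int) -> int:
    """DIVIDER: COUNT_MOS / COUNT_MIM as an unsigned fixed-point integer."""
    return (count_mos << FRAC_BITS) // count_mim

def flt_ctrl(ratio_fx: int) -> int:
    """FLT_CTRL: first (smallest) code whose threshold the measured ratio reaches."""
    for code, threshold in enumerate(THRESHOLDS):
        if ratio_fx >= threshold:
            return code
    return 7  # below the covered range: all compensation capacitors on

def calibrate(t_mim: float, t_mos: float) -> int:
    """Simplified CONTROL BLOCK sequence: count both oscillations, divide, set bits."""
    return flt_ctrl(divider(osc_counter(t_mos), osc_counter(t_mim)))

for k in (0.86, 1.0, 1.15):
    print(f"C_MIM/C_MOS = {k:.2f} -> code {calibrate(k * 120e-9, 120e-9):03b}")
```

Because the bank values grow monotonically with the code, a single descending threshold table is enough; the count and divider quantization only adds a small margin on top of ε1.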
4 Experimental results - Measurements
The filter with its compensation capacitors, the oscillator and the digital network are designed and fully integrated. The layout of the whole design is presented in Figure 14. As can be observed from the figure, the compensation capacitors are realized with multiple capacitors connected in series, because of the high minimal MIM capacitor value in the technology used. The effective area of the design is significantly smaller than the size of the entire chip; the chip size and the layout aspect ratio were adjusted to the available on-wafer measurement equipment and to the available area on the multi-project wafer. It should be emphasized that an additional circuit is added to allow external calibration. It is composed of three multiplexers controlled by the signal reset_n, which determines whether the calibration is internal or external. In addition, a shift register is implemented for writing the three control bits via two external signals.
Figure 14: Integrated design- layout view
Figure 15 shows the measurement results of the circuit using the internal calibration procedure. The measurements are performed under nominal conditions, at room temperature and nominal supply voltage. As can be seen from the figure, the compensated filter transfer characteristic matches the simulated one well: at the cutoff frequency, the difference is 0.6 dB. The uncompensated filter characteristic is also shown; in this case, the control bits have random values. The difference between the simulated and the non-compensated case at the cutoff frequency is an unacceptable 2 dB.
Figure 16 and Figure 17 present the photo of the IC die and the measurement set-up using the on-wafer probes.
[Figure 15 plots: a) S21 and b) S11 versus freq (GHz); legend: Simulated, Compensated, Non-compensated]
Figure 15: Filter a) S21 and b) S11 parameters in compensated (green), simulated (red) and noncompensated case (purple)
Figure 16: Die photo
5 Conclusion
This paper proposes a fully integrated on-chip method for calibrating the process-induced variation of MIM-capacitors, using the more stable MOS capacitor as a reference. The test circuit is designed and verified in a standard 130 nm CMOS process. The concept is applied to a low-pass filter design and is verified through simulations and measurements. After the calibration is applied, the MIM capacitance variation is lowered from 15% to 8%. Moreover, an optimization of the RF switches is proposed.
Figure 17: Die with probes
With the adopted optimization, the switches increase the filter insertion loss by no more than 0.6 dB in the "on" state and introduce an additional capacitor error below 2% when they are all "off".
The same method can be used for compensating the process variation in any other circuit type and in any other CMOS technology process.
6 Acknowledgments
The authors would like to thank the Faculty of Technical Sciences, Department of Power, Electronic and Telecommunication Engineering, Novi Sad, Serbia, for providing measurement facilities. The design is part of the SENSEIVER project (www.senseiver.com), funded by the European Union's Seventh Framework Programme.
4.	IEEE Std 802.15.4a-2007 (2007) Amendment to 802.15.4- 2006: Wireless Medium Access Control (MAC) and Physical Layer (PHY) Specifications for Low-Rate Wireless Personal Area Networks (LR-WPANs)
5.	R. Rhea (1994) HF Filter Design and Computer Simulation, Noble Publishing Corporation, Georgia- USA
6.	B. Razavi (2001) Design of Analog CMOS Integ rated Circuits, McGraw-Hill, New York
7.	B. Min (2008) SiGe/CMOS Millimeter-Wave Integrated Circuits and Wafer-Scale Packaging for Phased Array Systems, Ph.D. thesis, The University of Michigan
8.	A. Niknejad (2007) Electromagnetics for HighSpeed Analog and Digital Communication Circuit, Cambridge University Press, New York
9.	M. Schneider, C. Galup-Montoro (2010) CMOS Analog Design Using All-Region MOSFET Modeling, Cambridge University Press, New York
10.	J. Bhasker, R. Chadha (2009) Static Timing Analysis for Nanometer Designs- A Practical Approach, Springer, New York
11.	D. Harris, S. Harris (2013) Digital Design and Computer Architecture, Elsevier, Waltham, Massachusetts
Arrived: 04. 12. 2015 Accepted: 31. 12. 2015
7 References
1.	M. Pastre, M. Kayal (2006) Methodology for the Digital Calibration of Analog Circuits and Systems, Springer, Netherlands
2.	C.-W. Lee (2012) On-chip Benchmarking and Calibration without External References, Ph.D. thesis, EECS Department, University of California, Berkeley
3.	I. Milosavljevic, D. Grujic, D. Simic, J. Popovic-Bozovic (2014) Estimation and compensation of process-induced variations in capacitors for improved reliability in integrated circuits, Analog Integrated Circuits and Signal Processing, vol. 81, no 1, pp. 253-264
276
Original scientific paper
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), 277 - 283
High-Efficiency Negative Charge-Pump Circuit for WLED Backlighting
Yuwen Bao1, XiaoLin Wu2, Xiaohong Xia1, Yun Gao1
1Faculty of Materials Science and Engineering, Hubei University, Wuhan, CO 430062, China
2Faculty of Physics and Electronic Technology, Hubei University, Wuhan, CO 430062, China
Abstract: Positive charge pumps, also known as inductorless DC/DC converters, are very common in white LED drivers. They are less expensive and simpler to use, but they usually achieve a lower efficiency than inductor-based boost circuits. In this paper, we describe a novel negative charge-pump design for a white LED driver that automatically selects the 1X or 1.5X mode. Unlike a conventional positive charge-pump circuit, the negative charge-pump circuit is integrated with current sources having an ultra-low dropout voltage, typically 80 mV. The negative charge pump does not require an additional series voltage-regulating transistor to adjust the output voltage, which extends the operating time in the 1X mode and dramatically improves the efficiency when operating from a lithium-ion battery. In addition, the negative charge pump does not require a substrate selection circuit, which reduces the circuit complexity. The proposed negative charge pump is realized in a 0.5-µm 5-V BiCMOS process.
Keywords: Current regulator; DC-DC power converters; LED driver; negative charge pump
High-efficiency negative charge-pump circuit for WLED backlighting
Abstract: Positive charge pumps, known as inductorless DC/DC converters, are very common in LED drivers. They are inexpensive and simple, but they usually achieve lower efficiency than inductor-based circuits. In this paper we present a new negative charge-pump design for driving white LEDs that automatically switches to the 1X/1.5X operating mode. Unlike positive charge pumps, the negative charge pump has an integrated current source with an extremely low voltage drop (typically 80 mV). The negative charge pump does not need an additional transistor to regulate the output voltage, which extends the operating time in the 1X mode and improves the efficiency of lithium-ion battery use. The proposed charge pump is realized in a 0.5-µm 5-V BiCMOS technology.
Keywords: current regulator; DC-DC power converter; LED driver; negative charge pump
* Corresponding Author's e-mail: yungaoedu@126.com
© MIDEM Society
Y. Bao et al; Informacije Midem, Vol. 45, No. 4 (2015), 277 - 283
1 Introduction
The need for a white-LED (WLED) driver to illuminate the small color displays in cellular phones and other portable devices has increased rapidly over the last few years [1]. Currently, two approaches are commonly used to generate an adequate forward bias for WLEDs: capacitor charge pumps and inductor-based boost circuits. Compared with inductor-based boost circuits, charge pumps are lower in cost, operate at a lower frequency, and are simpler to design, but they have also been less efficient [2]. However, efficiency may be the most important parameter for the designer of a portable device, so improving the efficiency of charge-pump circuits has become the key point of circuit design.
The voltage of a lithium-ion battery ranges from 3 V to 4.2 V, while a WLED's forward voltage is typically 3.1 V to 3.5 V. Consequently, to make efficient use of the battery, an automatic-select multi-mode charge pump is used to provide an adequate forward bias for WLEDs; a common choice at present is a two-mode (1X and 1.5X) charge pump [3-6]. A WLED is a current-driven device whose brightness is proportional to its conduction current. The conduction current is normally regulated to avoid exceeding the rated maximum current and to obtain a constant luminous intensity. As shown in Fig. 1, a traditional positive charge-pump solution for WLED drivers uses a PMOS regulator transistor to generate a regulated output voltage. The PMOS transistor before the charge-pump stage is operated as a controlled resistance RDSON, and regulation is achieved by generating a voltage drop across RDSON. The current regulator ensures that each WLED produces similar light output [7-9]. In Fig. 1, Rp is the parasitic resistance of the ground pad and bonding wire. Clearly, the key to improving the circuit efficiency is to lower the voltage drops across the PMOS regulator transistor, Rp, and the current regulator.
To reduce the voltage drops across Rp and the current regulator, a current-regulated charge pump was designed for a WLED driver [10]. Fig. 2 shows the current-regulated charge-pump scheme. The WLED can be connected directly to the system ground, and the current regulator transistor before the charge-pump stage is operated as a controlled current source IDS. This transistor controls the pumping current, and output current regulation is achieved by adjusting the pumping current for all load variations [11]. The negative effect of Rp and the voltage drop of the current regulator are thereby eliminated.
Parallel-connected WLEDs are commonly used in medium- and high-power WLED driving systems because they avoid the high cumulative voltage drop required by series-connected WLED strings. To cope with the current imbalance among parallel-connected WLED strings, a simple and highly efficient method is to provide a current regulator for each WLED string [12], [13]. However, the current-regulated charge pump shown in Fig. 2 removes the current regulator to obtain high efficiency and is therefore only suitable for driving a single series-connected WLED string; if it is implemented in a 5 V process, it can drive only a single WLED. In practical applications, customers often need to drive multiple WLEDs with high current-matching accuracy.
This brief presents a novel negative charge pump for driving multiple WLEDs, as shown in Fig. 3. The scheme is composed of an automatic-select 1X/1.5X negative charge pump and a series current regulator. Note that there is no series voltage/current regulator transistor; therefore, the source-drain voltage drop of the regulator transistor is saved, and high efficiency can be achieved. The anode of the WLED can be connected directly to the Li-ion battery, and the cathode to the negative charge pump. Hence, the WLED current does not flow through the power pad of the chip, which removes the negative effect of the power pad's parasitic resistance. In addition, the dropout voltage of the current regulator is typically 80 mV. Together, these features dramatically improve the efficiency and allow the proposed negative charge pump to achieve an efficiency comparable to that of inductor-based boost circuits.
Figure 1: Conventional positive charge pump integrated with a constant-current source. The PMOS regulator transistor in the output loop operates as a controlled resistance to produce a regulated output voltage.
Figure 2: Current-mode charge pump that does not incorporate a series constant-current source. The parasitic resistance of the ground pad in the output loop is avoided.
Figure 3: Proposed negative charge pump that does not incorporate a series regulator transistor. The parasitic resistance of the power pad in the output loop is avoided.
Section 2 discusses the efficiency improvements of the negative charge pump, and Section 3 describes the 1X/1.5X negative charge-pump topology. Section 4 discusses the ultra-low dropout voltage of the current sources and the mode selection criteria. The experimental results are presented in Section 5, and the conclusions are drawn in Section 6.
2 Efficiency Improvements
Compared to an inductor-based boost DC/DC converter, a capacitor charge-pump converter is less efficient, which can reduce the battery runtime. A charge-pump scheme with an automatic-select conversion mode increases the efficiency over a wide input-voltage range. The quiescent operating current of the WLED driver is usually very small compared to the load current of the WLED; thus, the efficiency of fractional-ratio charge pumps with a conversion ratio of M can be closely approximated by
$$\eta \approx \frac{V_{LED}}{M\,V_{IN}} \qquad (1)$$
where VIN is the power source voltage and VLED is the voltage drop across the WLED. As (1) shows, for a fixed value of M the efficiency decreases as 1/VIN [14], and the best conversion efficiency is offered by the 1X transfer mode (M = 1). However, this mode can only be used when the battery voltage is greater than the forward voltage of the WLED. It is therefore best for the driver to remain in a high-efficiency mode as long as possible while the battery voltage falls, and the main challenge in charge-pump design is to reduce the output-loop voltage losses. As shown in Fig. 1, the minimum battery voltage required by the 1X mode is:
$$V_{MIN(1X)} = I_{LED} R_{DSON} + V_{LED} + V_{Dropout} + I_{LED} R_p \qquad (2)$$
where RDSON is the source-drain on-resistance of the regulator transistor, typically 2 Ω. Further reduction of this resistance is limited because a lower resistance would require a larger MOS transistor, which increases the cost of the power device. VDropout is the dropout voltage of the current regulator, about 250-300 mV [10]. The proposed circuit supports up to four white LEDs, and the maximum current for each WLED is about 20 mA, making ILED 80 mA in total. VLED of the WLED used for the simulation is 3.18 V, and Rp is ignored. The maximum efficiency of the 1X mode is then 88.6%. The presented negative charge-pump topology does not require a regulator transistor, and it extends the 1X mode all the way down to
$$V_{MIN(1X)} = V_{LED} + V_{Dropout} \qquad (3)$$
The designed current source dropout is typically 80 mV; therefore, the maximum efficiency of 1X mode can reach 97.5%.
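As a worked check of the 1X-mode figures, the Python sketch below evaluates the efficiency approximation η ≈ VLED/(M·VIN) at the minimum 1X-mode input voltage of the conventional and the proposed topology. It assumes M = 1, Rp = 0, and the 250 mV lower bound of the 250-300 mV conventional dropout range (this choice is our assumption); under those assumptions it reproduces the 88.6% and 97.5% values quoted above. Variable and function names are illustrative only.

```python
# Worked check of the 1X-mode efficiency figures quoted in the text.
# Assumptions (ours, not stated explicitly in the paper): M = 1, Rp = 0,
# and the 250 mV lower bound of the 250-300 mV conventional dropout range.

def efficiency(v_led: float, v_in: float, m: float) -> float:
    """Approximate charge-pump efficiency: eta = V_LED / (M * V_IN)."""
    return v_led / (m * v_in)

I_LED = 4 * 0.020   # four WLEDs at about 20 mA each -> 80 mA total (A)
R_DSON = 2.0        # PMOS regulator on-resistance (ohm)
R_P = 0.0           # ground-pad/bond-wire resistance, ignored as in the text (ohm)
V_LED = 3.18        # WLED forward voltage used in the simulation (V)

# Conventional positive charge pump: I*R_DSON + V_LED + V_Dropout + I*Rp.
v_min_conv = I_LED * R_DSON + V_LED + 0.250 + I_LED * R_P
# Proposed negative charge pump: V_LED + V_Dropout with an 80 mV dropout.
v_min_prop = V_LED + 0.080

eta_conv = efficiency(V_LED, v_min_conv, m=1.0)
eta_prop = efficiency(V_LED, v_min_prop, m=1.0)

print(f"conventional: V_MIN(1X) = {v_min_conv:.2f} V, eta = {eta_conv:.1%}")
print(f"proposed:     V_MIN(1X) = {v_min_prop:.2f} V, eta = {eta_prop:.1%}")
```

The 330 mV difference in minimum input voltage (3.59 V versus 3.26 V) is what lets the proposed circuit stay in the 1X mode longer as the battery discharges.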
3 Negative Charge-Pump Topology
Fig. 4 summarizes the topology transformations of the negative charge pump for the 1X and 1.5X modes. The dual-mode negative charge pump includes six NMOS switches (MN1-MN6), two PMOS switches (MP1-MP2) and two flying capacitors (CF1-CF2). By rearranging the switches and capacitors, it realizes two different conversion modes: 1.5X and 1X.
3.1 1.5X mode
During the first half period (0 to 0.5T), MOS switches MP1, MP2 and MN4 are on and the other MOS switches are off, so CF1 and CF2 are connected in series and charged by the power supply VIN. Both flying capacitors are equal to C, so the input voltage is divided evenly across them and each is charged to VIN/2.
Figure 4: Topology transformations of the negative charge pump: 0-0.5T is the charge period for the flying capacitors, and 0.5T-T is the discharge period
In the second half period (0.5T to T), the switches change state: MOS switches MN1, MN2, MN3 and MN5 are on and the other MOS switches are off. CF1 and CF2 are connected in parallel with one terminal connected to ground, so charge is redistributed among CF1, CF2 and CL. Assuming that all MOS switches are ideal, in a generic period j we get
$$V_{OUT}(j) = V_L(j) = \frac{2C \times \left(-\tfrac{1}{2} V_{IN}\right) + C_L \times V_{OUT}(j-1)}{2C + C_L} \qquad (4)$$
Assuming that in the initial state VOUT (0) = 0 V, we get
$$V_{OUT}(j) = -\frac{V_{IN}}{2}\left[1 - \left(\frac{C_L}{2C + C_L}\right)^{j}\right] \qquad (5)$$
whose limit for j → ∞ is −VIN/2.
Indeed, the output voltage first drops steeply and then slowly approaches its final value. The voltage between the input VIN and the output VOUT is therefore 1.5VIN.
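The charge-redistribution recurrence can be checked numerically. The Python sketch below iterates VOUT(j) = [2C·(−VIN/2) + CL·VOUT(j−1)]/(2C + CL) period by period and compares each step with the closed form VOUT(j) = −(VIN/2)[1 − (CL/(2C + CL))^j]. It assumes ideal switches, no load current and VOUT(0) = 0 V; the values VIN = 3.6 V and C = CL = 1 µF are illustrative choices, and the function names are hypothetical.

```python
from __future__ import annotations

# Sketch: 1.5X-mode charge redistribution with ideal switches and no load.
# Illustrative values (assumptions): V_IN = 3.6 V, C = C_L = 1 uF, V_OUT(0) = 0 V.

def vout_recurrence(v_in: float, c: float, c_l: float, periods: int) -> list[float]:
    """Return [V_OUT(1), ..., V_OUT(periods)] by iterating the recurrence."""
    v_out = 0.0
    history = []
    for _ in range(periods):
        # CF1 || CF2 (total 2C), each charged to V_IN/2 and flipped so the free
        # terminal sits at -V_IN/2, shares charge with the load capacitor C_L.
        v_out = (2 * c * (-0.5 * v_in) + c_l * v_out) / (2 * c + c_l)
        history.append(v_out)
    return history

def vout_closed_form(v_in: float, c: float, c_l: float, j: int) -> float:
    """Closed form: V_OUT(j) = -(V_IN/2) * (1 - (C_L / (2C + C_L))**j)."""
    return -0.5 * v_in * (1 - (c_l / (2 * c + c_l)) ** j)

V_IN, C, C_L = 3.6, 1e-6, 1e-6
trace = vout_recurrence(V_IN, C, C_L, periods=8)
for j, v in enumerate(trace, start=1):
    closed = vout_closed_form(V_IN, C, C_L, j)
    print(f"j={j}: recurrence {v:+.4f} V, closed form {closed:+.4f} V")
```

With CL = C the ratio CL/(2C + CL) is 1/3, so the remaining error shrinks by a factor of three per period; a larger CL relative to C slows the approach to −VIN/2.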
3.2 1X mode
Only MOS switch MN6 is on, while the other switches are off, so the output VOUT is connected to ground through the NMOS switch MN6. The voltage between the input VIN and the output VOUT is therefore equal to VIN. In the 1X mode, the charge pump does not switch and acts like an LDO.
4 Current Source of the Ultra-Low Voltage Dropout and Mode Selection Criteria
4.1 Current Source of the Ultra-Low Voltage Dropout
A current mirror is a common way to implement a current source. Fig. 5 presents a conventional current source for a WLED. The error amplifier forces the source-drain voltages of the current-mirror transistors MN1 and MN2 to be approximately equal, so with a mirror ratio of 260 the WLED current is 260·Iref. VOUT is provided by the positive charge-pump output, and Vds(MN2) is set to 250-300 mV to obtain high current-matching accuracy. This increases the voltage consumption of the output loop and dramatically lowers the efficiency [10], [15].
Figure 5: Conventional current source scheme.
Fig. 6 shows the ultra-low-dropout current source and the mode selection control circuit. The mirror transistor MN2 provides a constant current sink for the WLED, and VOUT is the output of the negative charge pump. The source-drain voltage of MN2 is

$$V_{ds2} = V_{IN} - V_{LED} - V_{OUT} \qquad (6)$$

Thus, Vds2 = VIN − VLED in the 1X mode (VOUT = 0 V), and Vds2 = 1.5VIN − VLED in the 1.5X mode (VOUT = −VIN/2). Vds2 is therefore not a fixed value but varies linearly with VIN, so MN2 not only acts as the current source of the WLED but also takes over the role of the regulator transistor shown in Fig. 1. The maximum current for each WLED is about 20 mA, which is much smaller than the total current; thus, further reducing Vds2 to 80 mV is feasible. In addition, the operating voltage range of the chip is 2.8-5 V, and Vds2 falls below 250 mV only in a very small part of this range, so the slight decrease in current-matching accuracy there is acceptable.
Figure 6: Ultra-low-dropout current source and mode selection control circuit.
4.2 Mode Selection Criteria
Fig. 6 also shows the mode selection control circuit used for the 1X/1.5X mode transition. As the input voltage drops, the source-drain voltage of MN2 (Vds2) decreases and its gate-source voltage (Vgs2) increases, so Vds2 and Vgs2 carry the actual information about VIN and the load. When Vds2 drops below 80 mV, the operating mode changes from 1X to 1.5X. MN2 operates in the linear region with 2(Vgs2 − Vth2) >> Vds2, so the WLED current can be written as

$$I_{LED} = \mu_n C_{ox} \frac{W}{L} \left(V_{gs2} - V_{th2}\right) V_{ds2} \qquad (7)$$

Therefore, for a given ILED, Vds2 is inversely proportional to the overdrive voltage (Vgs2 − Vth2). In this design, Vgs2 is used as the control condition for the 1X to 1.5X transition because Vds2 is very small at the transition point and is not suitable as the input voltage of Comp1. In addition, a small change in Vds2 causes a large change in Vgs2, so the transition point from 1X to 1.5X can be controlled accurately.
A hysteresis voltage between the references Vref1 and Vref2 is necessary to avoid uncontrolled toggling between the 1X and 1.5X modes. Furthermore, Vref2 must be larger than 80 mV to guarantee stable DC operation in either mode; we set Vref2 = 300 mV. The mode selection control circuit thus consists of Comp1 and Comp2. The two comparators sense Vds2 and Vgs2 and set or reset a flip-flop, which stores the mode-change information according to Table 1.
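The dependence of Vds2 on VIN can be sketched numerically. The Python sketch below assumes the relation Vds2 = VIN − VLED − VOUT with ideal charge-pump outputs, VOUT = 0 V in 1X mode and VOUT = −VIN/2 in 1.5X mode, which gives the Vds2 = VIN − VLED and Vds2 = 1.5VIN − VLED expressions above. It also estimates the input voltage at which Vds2 reaches the 80 mV threshold in 1X mode. The function name vds2 is hypothetical.

```python
# Sketch of V_ds2 = V_IN - V_LED - V_OUT under ideal charge-pump outputs
# (assumption): V_OUT = 0 V in 1X mode, V_OUT = -V_IN/2 in 1.5X mode.
V_LED = 3.18        # WLED forward voltage at 20 mA (V), from the text
V_DS2_MIN = 0.080   # current-source dropout that triggers 1X -> 1.5X (V)

def vds2(v_in: float, mode: str) -> float:
    """Source-drain voltage of MN2 for an ideal 1X or 1.5X negative charge pump."""
    if mode == "1X":
        v_out = 0.0
    elif mode == "1.5X":
        v_out = -0.5 * v_in
    else:
        raise ValueError(f"unknown mode: {mode}")
    return v_in - V_LED - v_out

# Input voltage at which V_ds2 falls to 80 mV while still in 1X mode.
v_in_switch = V_LED + V_DS2_MIN
print(f"1X -> 1.5X expected near V_IN = {v_in_switch:.2f} V")
for v_in in (4.2, 3.6, 3.3, 3.0):
    print(f"V_IN = {v_in:.1f} V: V_ds2(1X) = {vds2(v_in, '1X'):.3f} V, "
          f"V_ds2(1.5X) = {vds2(v_in, '1.5X'):.3f} V")
```

Under these ideal assumptions the threshold lands at VIN = VLED + 80 mV = 3.26 V, close to the transition near 3.3 V reported in the experimental results; non-ideal effects not modelled here may shift it.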
Table 1: Mode Selection Depending on Comp1 Output T1 and Comp2 Output T2
Actual Mode	T1	T2	Set Mode
1X	H	L	1.5X
1.5X	L	H	1X
Fig. 7 shows the simulated mode change caused by a change in VIN. The input voltage is swept from 5 V to 3.2 V and back to 5 V. When the input voltage drops and Vds2 falls below 80 mV (point A in Fig. 7), the Comp1 output T1 goes high, and the operating mode changes from 1X to 1.5X. When the input returns to a higher voltage and Vds2 returns to 300 mV (point B in Fig. 7), the Comp2 output T2 goes high, and the operating mode returns to 1X. The transition from 1.5X back to 1X therefore has a hysteresis of about 220 mV.
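The Table 1 logic amounts to a set/reset flip-flop with hysteresis. The Python sketch below models it, abstracting away the fact that Comp1 actually senses Vgs2: it assumes T1 is high whenever the sensed Vds2 is below 80 mV and T2 is high whenever it is at or above Vref2 = 300 mV. The helper names and the Vds2 trace are hypothetical.

```python
from __future__ import annotations

# Sketch of the Table 1 mode-selection flip-flop (assumed comparator behavior):
# T1 high when sensed V_ds2 < 80 mV, T2 high when sensed V_ds2 >= V_ref2.
V_LOW = 0.080    # 1X -> 1.5X threshold (V)
V_REF2 = 0.300   # 1.5X -> 1X threshold (V)

def comparators(vds2: float) -> tuple[bool, bool]:
    """Return (T1, T2) for a sensed V_ds2 value."""
    return vds2 < V_LOW, vds2 >= V_REF2

def next_mode(mode: str, t1: bool, t2: bool) -> str:
    """Flip-flop update following Table 1; any other input holds the mode."""
    if mode == "1X" and t1 and not t2:
        return "1.5X"
    if mode == "1.5X" and t2 and not t1:
        return "1X"
    return mode

def run(vds2_trace: list[float], mode: str = "1X") -> list[str]:
    """Apply the flip-flop to each sensed V_ds2 sample and record the mode."""
    modes = []
    for v in vds2_trace:
        mode = next_mode(mode, *comparators(v))
        modes.append(mode)
    return modes

# Hypothetical V_ds2 samples: falling input voltage, then recovering input.
trace = [0.50, 0.20, 0.079, 0.10, 0.25, 0.31, 0.20]
print(run(trace))
```

Between 80 mV and 300 mV neither row of Table 1 matches, so the stored mode is held; this 220 mV band is what prevents toggling near the threshold.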
Figure 7: Effect of switching from the 1X mode to the 1.5X mode and back to the 1X mode on (a) the input voltage VIN, (b) the source-drain voltage of MN2, Vds2, (c) the Comp1 output T1, (d) the Comp2 output T2, and (e) the output voltage VOUT. Point A indicates where Vds2 drops below 80 mV (transition from 1X to 1.5X). At point B, Vds2 returns to 300 mV (1.5X to 1X).
5 Experimental Results
The simulated and measured results in Fig. 8 show the efficiency versus the input voltage as the input voltage is swept from 5 V to 2.6 V. There is a sudden drop in efficiency at approximately 3.3 V. According to (1), this means that M of the charge pump has suddenly increased, i.e., the negative charge pump switches from the 1X mode to the 1.5X mode at approximately 3.3 V. The maximum simulated efficiency is approximately 93.2%. As Fig. 4 shows, the 1X mode uses an NMOS bypass switch (MN6) to connect the output to the system ground, and the parasitic resistance of MN6 reduces the efficiency of the chip. This is the main reason why the maximum simulated efficiency is lower than the theoretical value of 97.5%. In the experimental measurement, the parasitic resistance of the pin pads and PCB routing and the ESR of the external capacitors reduce the efficiency further: in Fig. 8, the maximum measured efficiency is 89.3%, lower than the simulated value. Careful PCB routing is therefore necessary to achieve the best performance.
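The size of the efficiency step at the mode change follows from the approximation η ≈ VLED/(M·VIN). The Python sketch below evaluates this ideal efficiency along a falling input sweep, assuming ideal switches, no quiescent current, VLED = 3.18 V, and a mode change exactly at 3.3 V (the approximate measured transition). It shows only the ideal trend, not the measured curve; the function name is hypothetical.

```python
# Ideal efficiency eta = V_LED / (M * V_IN) along a falling input sweep.
# Assumptions: ideal switches, no quiescent current, mode change at 3.3 V.
V_LED = 3.18      # WLED forward voltage (V)
V_SWITCH = 3.3    # assumed 1X -> 1.5X transition voltage (V)

def ideal_eta(v_in: float) -> float:
    """Ideal efficiency with M = 1 above V_SWITCH and M = 1.5 below it."""
    m = 1.0 if v_in >= V_SWITCH else 1.5
    return V_LED / (m * v_in)

for v_in in (4.2, 3.8, 3.4, 3.3, 3.29, 3.0, 2.8):
    print(f"V_IN = {v_in:.2f} V -> eta = {ideal_eta(v_in):.1%}")
```

Just above 3.3 V the ideal 1X efficiency is about 96%, while just below it M = 1.5 lowers the ideal value to about 64%, which is consistent with the sharp discontinuity seen in Fig. 8.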
The average measured efficiency of the design over the entire input voltage range of a lithium-ion battery is approximately 75.2%. For comparison, an inductor-based boost circuit can achieve an efficiency between 75% and 80% [6]. Thus, the proposed scheme achieves an efficiency comparable to that of an inductor-based boost scheme.
Fig. 9 shows the measured currents of the four parallel WLEDs. In both the 1X and the 1.5X mode, the current of each WLED decreases as the input voltage decreases. Over the operating voltage range (2.8-5 V), the current exhibits good stability, with a rate of change below 3.5%. There is always a gap between the four curves; the maximum gap is about 0.4 mA (2%), caused by the mismatch between the current-source transistors of the WLEDs.
Fig. 10 shows the layout of the chip, with a die size of 1.346 mm × 1.34 mm. The control and protection module comprises an oscillator operating at 250 kHz, a soft-start function, output over-voltage protection, a 16-step brightness control, under-voltage lock-out, and mode-change control.
The proposed circuit was implemented in a 0.5-µm 5-V BiCMOS process by CSMC Technologies. For the simulations and experimental measurements, the external flying capacitors (CF1, CF2) and the load capacitor (CL) are all 1 µF, VLED of the WLED is 3.18 V at a WLED current of 20 mA, and the chip drives four WLEDs. Each LED current is set to approximately 20 mA.
6 Conclusion
Traditionally, WLED backlight designs that employ charge pumps have been less efficient than inductor-based designs. This brief has presented a negative charge pump with an ultra-low-dropout current regulator. The novel negative charge-pump architecture overcomes the inefficiencies typically encountered in a positive charge pump and achieves a peak measured efficiency of 89.3%. This negative charge pump is designed for use in WLED drivers, enabling high efficiency while retaining the simplicity and cost savings of the charge-pump solution.
7 Acknowledgements
This work was supported in part by a Research and Development Grant from the Science and Technology Department of Hubei Province (Project ID: 2013BAA040, 2011BAB032).
Figure 8: Simulated and measured efficiencies plotted as functions of the input voltage, swept from 5 V to 2.6 V. The sharp discontinuity at 3.3 V indicates the transition from the 1X mode to the 1.5X mode.
Figure 9: Measured currents of the four parallel WLEDs plotted as functions of the input voltage, swept from 5 V to 2.6 V. The rate of change of the WLED current is less than 3.5%, and the current mismatch is less than 2%.
Figure 10: Layout of the proposed negative charge pump. The overall device dimensions are 1.346 mm × 1.34 mm.
8 References
1.	R. Guo, Z. Liang, and A. Q. Huang, "A high efficiency transformerless step-up DC-DC converter with high voltage gain for LED backlighting applications," in 2011 Twenty-Sixth Annual IEEE Appl. Power Electron. Conf. Exposition, Fort Worth, TX, 2011, pp. 1350-1356.
2.	C. Richardson. (2007, Jan.). LED applications and driving techniques. National Semiconductor Corp., Santa Clara, California, USA. [Online]. Available: http://www.national.com/onlinesemi-mar/2007/led/national_LEDseminar.pdf
3.	Q. Deng, "High efficiency multi-mode charge pump base LED driver," U.S. Patent 0109205, May 25, 2006.
4.	W. L. Deng, X. Y. Ma, W. Y. Huang, and J. K. Huang, "Design of a white LED backlight driver IC based on a new three-mode charge pump," in 2012 IET International Conf., Shenzhen, China, 2012, pp. 1-4.
5.	Y. J. Cao, H. De, J. M. Cao, et al., "High-Efficiency Charge Pump LED Driver Circuit Design," Applied Mechanics and Materials, vol. 389, pp. 612-617, Aug. 2013.
6.	O. Nachbaur, "Leading light [portable device display illumination]," Power Engineer, vol. 19, no. 2, pp. 42-45, May 2005.
7.	L. Burgyan and F. Prinz, "High efficiency LED driver," U.S. Patent 6690146, Feb. 10, 2004.
8.	C.-H. Tsen, "Multi-mode charge pump drive circuit with improved input noise at a moment of mode change," U.S. Patent 7250810, Jun. 28, 2007.
9.	T.-T. Chen and C.-H. Tsen, "Charge pump drive circuit for a light emitting diode," U.S. Patent 7271642, Sep. 18, 2007.
10.	C.-H. Wu and C. L. Chen, "High-efficiency current-regulated charge pump for a white LED driver," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 10, pp. 763-767, Oct. 2009.
11.	G. Thiele and E. Bayer, "Current mode charge pump: topology, modeling and control," in 2004 IEEE 35th Annual Power Electron. Specialists Conf., 2004, vol. 5, pp. 3812-3817.
12.	S. N. Li, W. X. Zhong, W. Chen, and S. S. Y. Hui, "Novel Self-Configurable Current-Mirror Techniques for Reducing Current Imbalance in Parallel Light-Emitting Diode (LED) Strings," IEEE Trans. Power Electron., vol. 27, no. 4, pp. 2153-2162, Apr. 2012.
13.	Y. Hu and M. M. Jovanovic, "LED driver with self-adaptive drive voltage," IEEE Trans. Power Electron., vol. 23, no. 6, pp. 3116-3125, Nov. 2008.
14.	G. Thiele and E. Bayer, "Voltage doubler/tripler current-mode charge pump topology with simple 'Gear Box'," in 2007 IEEE Power Electronics Specialists Conf., Orlando, FL, 2007, pp. 2348-2352.
15.	H. Van der Broeck, G. Sauerlander, and M. Wendt, "A high precision constant current source applied in LED driver," in 2011 Photonics and Optoelectronics Conf., Wuhan, China, 2011, pp. 1-4.
Arrived: 06. 01. 2015 Accepted: 28. 10. 2015
Puh Awards; Informacije Midem, Vol. 45, No. 4 (2015), 284 - 284
The Highest Awards in Slovenian Science in 2015
On 20 November 2015 in Portorož, the Awards Committee, chaired by Prof. Dr. Tamara Lah Turnšek, presented the highest awards for achievements in scientific research. The keynote speaker at the ceremony was the Minister, Dr. Maja Makovec Brenčič. The Zois Award for lifetime achievement was received by Acad. Prof. Dr. Peter Fajfar, the Ambassador of Science of the Republic of Slovenia award by Prof. Dr. Matija Strlič, and the Zois Awards for outstanding achievements went to Prof. Dr. Mitjan Kalin, Prof. Dr. Tomaž Pisanski and Prof. Dr. Borut Štrukelj. Five Zois Certificates of Recognition and one Puh Award were also presented.
The research group of Ines Bantan, univ. dipl. inž., and Mag. Helena Razpotnik, both employed at ETI Elektroelement, d. d., together with Assist. Prof. Dr. Danjela Kuščer Hrovatin and Silvo Drnovšek, dipl. inž. kem. tehnol., both employed at the Jožef Stefan Institute, developed and successfully introduced into production a non-porous cordierite ceramic of type C 410 with controlled thermal and mechanical properties. The development of the non-porous cordierite ceramic was also carried out within the NAMASTE Centre of Excellence (Open Opportunities Project). Cordierite materials have a low coefficient of linear thermal expansion and are therefore used to manufacture components exposed to rapid temperature changes. The research group received the Puh Award for inventions, development achievements and the application of scientific results in the development of cordierite ceramics with a stable, low coefficient of linear thermal expansion.
Photo from the MIZŠ archive. From left to right: Ines Bantan, univ. dipl. inž., Mag. Helena Razpotnik, Assist. Prof. Dr. Danjela Kuščer Hrovatin, Prof. Dr. Tamara Lah Turnšek and Silvo Drnovšek, dipl. inž. kem. tehnol.
Sincere congratulations to all recipients of awards and recognitions and to their institutions, and especially to our Society member and the journal's associate editor for technologies, Assist. Prof. Dr. Danjela Kuščer Hrovatin!
Prof. Dr. Marko Topič, President of the MIDEM Society
Call for papers
Journal of Microelectronics, Electronic Components and Materials Vol. 45, No. 4 (2015), 285 - 285
MIDEM 2016
52nd INTERNATIONAL CONFERENCE ON MICROELECTRONICS, DEVICES AND MATERIALS WITH THE WORKSHOP ON BIOSENSORS AND MICROFLUIDICS
Announcement and Call for Papers
September 28th - 30th, 2016 Ankaran, Slovenia
ORGANIZER: MIDEM Society - Society for Microelectronics, Electronic Components and Materials, Ljubljana, Slovenia
CONFERENCE SPONSORS: Slovenian Research Agency, Republic of Slovenia; IMAPS, Slovenia Chapter; IEEE, Slovenia Section; Zavod TC SEMTO, Ljubljana.
GENERAL INFORMATION
The 52nd International Conference on Microelectronics, Electronic Components and Devices with the Workshop on Biosensors and Microfluidics continues a successful tradition of annual international conferences organised by the MIDEM Society, the Society for Microelectronics, Electronic Components and Materials. The conference will be held at Hotel Convent, Ankaran, Slovenia, a well-known resort and conference centre, from September 28th to 30th, 2016.
Topics of interest include but are not limited to:
-	Workshop focus: Biosensors and Microfluidics
-	Novel monolithic and hybrid circuit processing techniques,
-	New device and circuit design,
-	Process and device modelling,
-	Semiconductor physics,
-	Sensors and actuators,
-	Electromechanical devices, microsystems and nanosystems,
-	Nanoelectronics,
-	Optoelectronics,
-	Photonics,
-	Photovoltaic devices,
-	New electronic materials and applications,
-	Electronic materials science and technology,
-	Materials characterization techniques,
-	Reliability and failure analysis,
-	Education in microelectronics, devices and materials.
ABSTRACT AND PAPER SUBMISSION:
Prospective authors are cordially invited to submit an abstract of up to one page before May 1st, 2016. Please identify the contact author with complete mailing address, phone and fax numbers and e-mail address. After notification of acceptance (June 15th, 2016), authors are asked to prepare a full paper of at most six pages. Papers should be in black and white. The deadline for full papers in PDF and DOC electronic format is August 31st, 2016.
IMPORTANT DATES:
Abstract deadline: May 1st, 2016 (1 page abstract or full paper)
Notification of acceptance: June 15th, 2016
Deadline for final version of manuscript: August 31st, 2016
Invited and accepted papers will be published in the conference proceedings.
Detailed and updated information about the MIDEM Conferences is available at http://www.midem-drustvo.si/ under Conferences.
Boards of MIDEM Society | Organi društva MIDEM
MIDEM Executive Board | Izvršilni odbor MIDEM
President of the MIDEM Society | Predsednik društva MIDEM
Prof. Dr. Marko Topič, University of Ljubljana, Faculty of Electrical Engineering, Slovenia
Vice-presidents | Podpredsednika
Prof. Dr. Barbara Malič, Jožef Stefan Institute, Ljubljana, Slovenia
Dr. Iztok Šorli, MIKROIKS, d. o. o., Ljubljana, Slovenia
Secretary | Tajnik
Olga Zakrajšek, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
MIDEM Executive Board Members | Člani izvršilnega odbora MIDEM
Prof. Dr. Slavko Amon, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Darko Belavič, In.Medica, d.o.o., Šentjernej, Slovenia
Prof. Dr. Bruno Cvikl, UM, Faculty of Civil Engineering, Maribor, Slovenia
Prof. DDr. Denis Donlagič, UM, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia
Prof. Dr. Leszek J. Golonka, Technical University Wroclaw, Poland
Leopold Knez, Iskra TELA d.d., Ljubljana, Slovenia
Dr. Miloš Komac, UL, Faculty of Chemistry and Chemical Technology, Ljubljana, Slovenia
Prof. Dr. Miran Mozetič, Jožef Stefan Institute, Ljubljana, Slovenia
Jožef Perne, Zavod TC SEMTO, Ljubljana, Slovenia
Prof. Dr. Giorgio Pignatel, University of Perugia, Italy
Prof. Dr. Janez Trontelj, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Supervisory Board | Nadzorni odbor
Prof. Dr. Franc Smole, UL, Faculty of Electrical Engineering, Ljubljana, Slovenia
Mag. Andrej Pirih, Iskra-Zaščite, d. o. o., Ljubljana, Slovenia
Dr. Slavko Bernik, Jožef Stefan Institute, Ljubljana, Slovenia
Court of honour | Častno razsodišče
Emer. Prof. Dr. Jože Furlan, UL, Faculty of Electrical Engineering, Slovenia
Prof. Dr. Radko Osredkar, UL, Faculty of Computer and Information Science, Slovenia
Franc Jan, Kranj, Slovenia
Informacije MIDEM
Journal of Microelectronics, Electronic Components and Materials
ISSN 0352-9045
Publisher / Založnik: MIDEM Society / Društvo MIDEM Society for Microelectronics, Electronic Components and Materials, Ljubljana, Slovenia Strokovno društvo za mikroelektroniko, elektronske sestavne dele in materiale, Ljubljana, Slovenija
www.midem-drustvo.si