UDK 621.3:(53+54+621+66)(05)(497.1)=00

ISSN 0352-9045

J 3∘2010

Strokovno društvo za mikroelektroniko, elektronske sestavne dele in materiale (Society for Microelectronics, Electronic Components and Materials)

Strokovna revija za mikroelektroniko, elektronske sestavne dele in materiale Journal of Microelectronics, Electronic Components and Materials

INFORMACIJE MIDEM, Volume 40, No. 3(135), Ljubljana, September 2010




#### VSEBINA / CONTENT

#### **ZNANSTVENO STROKOVNI PRISPEVKI / PROFESSIONAL SCIENTIFIC PAPERS**

| Slovenian | Page | English |
|---|---|---|
| M.Atanasijević-Kunc, V.Kunc:<br>RF mešalna vezja z aktivnim zaprtozančnim bremenom | 163 | M.Atanasijević-Kunc, V.Kunc:<br>RF Mixers Comprising Active Feedback Load |
| J.Puhan, D.Raič, T.Tuma, S.Tomažič, A.Burmen:<br>Optimizacija gradnikov digitalnih vezij | 167 | J.Puhan, D.Raič, T.Tuma, S.Tomažič, A.Burmen:<br>Optimising Digital Circuit Cells |
| M.Atanasijević-Kunc, V.Kunc, Maksimilijan Štiglic:<br>Samodejno uglaševanje električno majhnih anten | 174 | M.Atanasijević-Kunc, V.Kunc, Maksimilijan Štiglic:<br>Automatic Tuning of Electrical Small Antennas |
| J.Podržaj, J.Trontelj:<br>Izbira optimalnih materialov in komponent za izvedbo močnostnega modula | 178 | J.Podržaj, J.Trontelj:<br>Optimized Selection of Materials and Components for Power Module Realization |
| M. Jenko:<br>Zasnova precizne in dolgotrajno točne temperaturne regulacije z uporabo lastnosti mikrokontrolerja za majhno porabo moči | 183 | M. Jenko:<br>Design of Precise and Long-term Accurate Temperature Regulation Using Features of A Low-power Microcontroller |
| M.W.Numan, M.T.Islam, N.Misran:<br>Izvedba 4G MIMO brezžičnega sistema na osnovi FPGA vezij | 191 | M.W.Numan, M.T.Islam, N.Misran:<br>FPGA-based Hardware Realization for 4G MIMO Wireless Systems |
| M.A.Amiri, S.Mirzakuchaki, M.Mahdavi:<br>Izvedba 4x4 S-Box vezja s QCA tehnologijo | 197 | M.A.Amiri, S.Mirzakuchaki, M.Mahdavi:<br>Logic-Based QCA Implementation of a 4×4 S-Box |
| Z.Jereb, J.Diaci:<br>Digitalna geometrijska korekcija trapeznega popačenja projicirane video slike v realnem času z uporabo FPGA | 204 | Z.Jereb, J.Diaci:<br>Real-time Keystone Correction of Video Image Using FPGA |
| B. Pečar, M. Možek, D. Resnik, D. Vrtačnik, U. Aljančič, S. Penič and S. Amon:<br>Dozirni sistem za mikroprocesor goriva | 208 | B. Pečar, M. Možek, D. Resnik, D. Vrtačnik, U. Aljančič, S. Penič and S. Amon:<br>Microflow Generator for Fuel Cell Methanol Hydrogen Microreactor |
| U.Flisar, D.Vončina, P.Zajec:<br>Nemoteno obratovanje asinhronskega motorja na osnovi pretvornika z impedančnim prilagodilnim vezjem | 218 | U.Flisar, D.Vončina, P.Zajec:<br>Voltage Sag Independent Operation of Induction Motor Based on Z-Source Inverter |
| M. Fras, J. Mohorko, Ž. Čučej:<br>Modeliranje izmerjenega samopodobnega omrežnega prometa v simulacijskem orodju OPNET | 224 | M. Fras, J. Mohorko, Ž. Čučej:<br>Modeling of Measured Self-similar Network Traffic in OPNET Simulation Tool |
| M.Rashed Iqbal Faruque, M.T.Islam, N.Misran:<br>Vpliv oblike človeške glave na absorpcijo elektromagnetnega sevanja pri izpostavljenosti mobilnih telefonov | 232 | M.Rashed Iqbal Faruque, M.T.Islam, N.Misran:<br>Effect of Human Head Shapes for Mobile Phone Exposure on Electromagnetic Absorption |
| M.T.I.Mohammad R.I.Faruque, N.Misran:<br>Analiza parametra SAR pri uporabi kovinske zaščite | 238 | M.T.I.Mohammad R.I.Faruque, N.Misran:<br>Specific Absorption Rate Analysis Using Metal Attachment |
| M.Kseneman, D.Gleich:<br>Primerjava med Dubois in Shi empiričnim modelom ocenjevanja vlažnosti iz TerraSAR-X podatkov | 241 | M.Kseneman, D.Gleich:<br>Comparison Between Dubois and Shi Empirical Models Used for Soil Moisture Estimation for TerraSAR-X Data |
| MIDEM prijavnica | 249 | MIDEM Registration Form |

Slika na naslovnici: Kolaž slik iz prispevkov v tej izdaji revije

Front page: Paste-up of various photos taken from contributions

#### Informacije MIDEM 40(2010)3, Ljubljana

### **RF MIXERS COMPRISING ACTIVE FEEDBACK LOAD**

### <sup>1</sup>Maja Atanasijević-Kunc, <sup>2</sup>Vinko Kunc

<sup>1</sup>University of Ljubljana, Faculty of Electrical Engineering, Slovenia

<sup>2</sup>IDS d.o.o., Ljubljana, Slovenia

Key words: RF conversion mixer, active feedback, integrated circuit, RFID

Abstract: Down-conversion mixing circuits are essential components of every radio-frequency receiver. A state-of-the-art down-conversion mixing system is expected to operate at a low supply voltage and to fulfill at least three requirements. First, it has to eliminate the DC and low-frequency components arising from the self-jammer signal, either in the mixing circuitry or in the connection to the next signal-processing stage. Second, it has to provide a minimum settling time when the DC and/or low-frequency components change during operation or at power-up. Third, the mixing stage has to provide an optimal compromise among the allowed input signal dynamic range, the noise properties of the system and the minimum supply voltage. To fulfill these requirements, and thus to design a down-conversion mixing stage suitable for battery-supplied digital radio systems as well as UHF RFID readers, a mixing-system topology employing a closed-loop current sink load stage was developed. A comparison with the classical topology employing a resistor load is given in Section 2. The special advantages of the proposed topology are then highlighted with respect to improved input signal dynamic range, adaptability to different link frequencies and the possibility of employing methods to speed up the settling of the system. The usefulness of the proposed improvements for RFID readers operating on the EPC Gen2 standard was proven in practice.

### RF Mešalna vezja z aktivnim zaprtozančnim bremenom

Ključne besede: RF mešalno vezje, aktivna povratna vezava, integrirano vezje, radiofrekvenčna identifikacija

Izvleček: Mešalna vezja so bistveni sestavni del vsakega radiofrekvenčnega sprejemnika. Od sodobnega mešalnega vezja je pričakovati, da deluje pri nizki napajalni napetosti in hkrati izpolnjuje še vsaj tri dodatne zahteve. Mešalno vezje mora samo ali pa v povezavi z naslednjo stopnjo sprejemnika izločati DC in nizkofrekvenčne komponente, ki nastajajo kot rezultat množenja signala lokalnega oscilatorja s signalom iste frekvence. Mešalno vezje se mora tudi čim hitreje prilagajati, ko pride do spremembe teh nizkofrekvenčnih komponent, oziroma mora biti sposobno v kar najkrajšem času preiti iz stanja mirovanja v stanje delovanja. Mešalno vezje mora tudi zagotoviti optimalen kompromis med dinamičnim območjem vhodnega signala, šumnimi lastnostmi mešalnega vezja in napajalno napetostjo. Da bi v kar največji meri izpolnili te zahteve in tako skonstruirali mešalno vezje, primerno za uporabo v baterijsko napajanih digitalnih radijskih sistemih, kakor tudi v visokofrekvenčnih izpraševalnikih pametnih kartic, smo razvili mešalno vezje, ki uporablja aktivno zaprtozančno breme. V drugem razdelku podamo primerjavo s klasično topologijo, ki uporablja uporovno bremensko stopnjo. V nadaljevanju pa analiziramo prednosti predlagane topologije glede na izboljšano dinamično območje vhodnega signala, možnost prilagajanja na različne frekvence sprejemnega signala in na možnost uporabe metod za skrajšanje časa vzpostavitve sistema. Opisane prednosti so izkazane v praksi pri sistemih izpraševalnikov pametnih kartic, delujočih na frekvenčnem področju UHF po standardu EPC Gen2.

#### 1. Introduction

A down-conversion mixer is essentially a multiplier that multiplies the input signal by the local oscillator signal. Provided both input signals are sinusoidal, the result contains the sum and the difference of the two frequencies. Since the frequencies of the input and local oscillator signals are typically closely spaced or even identical, the sum frequency is a high-frequency component, while the difference frequency is a low-frequency component. In down-conversion mixers the high-frequency component is eliminated by a low-pass filter, which is typically an integral part of the mixer load network. In Fig. 1, which shows the classical down-conversion mixer topology with a resistor load /1/, this filter is formed by the capacitor Clp in combination with the load resistors Rb1 and Rb2. The input signal voltage is converted to a current difference by the differential transistor pair Mdp and Mdn. Four transistors (Mlpp, Mlpn, Mlnp and Mlnn), controlled by the local oscillator signal, steer the differential currents into two branches. The load stage is composed of the two resistors Rb1 and Rb2, where the current is converted to voltage. As mentioned, the capacitor Clp eliminates the high-frequency mixing product. If linearity is required for larger input signal amplitudes, linearization resistors can be added, as with Rs1 and Rs2 in Fig. 1.
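As a minimal numerical check of the sum-and-difference statement, the Python sketch below multiplies two sampled unit cosines (an ideal multiplier, with no switching transistors and no filtering) and measures the amplitude of individual spectral lines by single-bin DFT correlation. The frequencies are arbitrary illustrative values chosen so that every tone completes an integer number of cycles in the window; they are not the operating point used later in the paper.

```python
import math

def ideal_mixer(f_in, f_lo, fs, n):
    """Sample the product cos(2*pi*f_in*t) * cos(2*pi*f_lo*t) of an ideal multiplier."""
    return [math.cos(2 * math.pi * f_in * k / fs) * math.cos(2 * math.pi * f_lo * k / fs)
            for k in range(n)]

def tone_amplitude(samples, freq, fs):
    """Amplitude of the spectral line at `freq` via single-bin DFT correlation.

    Accurate when all tones present complete an integer number of cycles in the window.
    """
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * k / fs) for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * k / fs) for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

fs, n = 1e6, 1000            # 1 ms window: integer cycles for every tone below
f_in, f_lo = 210e3, 200e3    # illustrative, closely spaced input and LO frequencies
y = ideal_mixer(f_in, f_lo, fs, n)
for label, f in (("difference", f_in - f_lo), ("sum", f_in + f_lo), ("input", f_in)):
    print(f"{label:>10} {f / 1e3:5.0f} kHz: amplitude {tone_amplitude(y, f, fs):.3f}")
```

Both mixing products carry half the product of the input amplitudes, as cos a · cos b = ½cos(a − b) + ½cos(a + b), and nothing remains at the input frequency itself; in a real mixer the low-pass network (Clp in Fig. 1) is what removes the sum term.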

Another component resulting from the multiplication is a DC component. It arises when the input signal has the same frequency as the local oscillator signal. In radio receivers it appears as a result of the local oscillator signal leaking into the input signal, but in RFID interrogators the so-called self-jammer signal is a system issue and can have an extremely large amplitude. The large DC component on the mixer outputs resulting from a large self-jammer signal must be eliminated so that it does not saturate the subsequent receiver stages. A typical solution, shown in Fig. 1, is to use AC coupling to the next receiver stage.
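The DC term follows from the same product identity: for equal frequencies, cos(ωt + φ) · cos(ωt) = ½cos φ + ½cos(2ωt + φ), so the DC level is proportional to the self-jammer amplitude and is largest when the jammer is in phase with the local oscillator. The Python sketch below checks this numerically; the unit LO amplitude, the sampling rate, the frequency and the jammer amplitudes are assumptions chosen only for illustration.

```python
import math

def mixer_dc(a_jammer, phase_rad, a_lo=1.0, f=100e3, fs=1e6, n=1000):
    """Average (DC) value of jammer * LO when both share the frequency f.

    The n samples at fs span an integer number of periods, so the 2f term
    averages out and only 0.5 * a_jammer * a_lo * cos(phase) remains.
    """
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * f * k / fs
        total += a_jammer * math.cos(theta + phase_rad) * a_lo * math.cos(theta)
    return total / n

for a_j in (0.5, 1.0, 2.0):              # jammer amplitudes, arbitrary units
    for phase_deg in (0, 60, 90):
        dc = mixer_dc(a_j, math.radians(phase_deg))
        print(f"A_j={a_j:.1f} phase={phase_deg:2d} deg -> DC={dc:+.3f}")
```

Doubling the jammer amplitude doubles the DC offset, which is why a strong self-jammer, rather than the wanted signal, ends up setting the limits of the load stage.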

The limitations of such a topology in systems with a high self-jammer level are obvious. The value of the load resistors, which is also the key element defining the voltage gain of the mixer, must be kept sufficiently low so that even the highest DC component does not limit the required dynamic range of the input signal. Unfortunately, the highest DC component is to be expected precisely when the input signal amplitude is also high. This means that for high self-jammer levels the load resistor value must be kept low. The available mixer gain is thus limited to smaller values, which degrades the noise performance of the system. Automatic regulation of the input signal amplitude /2/ is not applicable, since the signals are often AM modulated. The problem gets worse if the system requires a change of the AC coupling time constant to accommodate different signaling frequencies. As there is little freedom in selecting the load resistor value, the AC coupling must be changed by changing the value of the coupling capacitors. In low-noise systems, however, the AC coupling capacitors are often external (not integrated), so their value cannot be changed dynamically. The problem is made worse at low supply voltage, where the available voltage range must be divided between the input signal range and the output voltage range. Many low-voltage solutions are available for other blocks, such as voltage references /3/, but the development of low-supply-voltage mixers is still a big challenge.
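The trade-off can be made concrete with a deliberately simplified model: with a resistor load, the output must accommodate both the DC offset R_L · ΔI_dc produced by the self-jammer and the AC swing R_L · i_ac within the voltage range left over by a low supply. The Python sketch below computes the resulting upper bound on R_L and on the gain g_m · R_L. The function names and all numerical values are hypothetical and not taken from the paper.

```python
def max_load_resistance(v_swing, i_dc_unbalance, i_ac_peak):
    """Largest load resistor (ohm) for which DC offset plus AC peak fit in v_swing."""
    return v_swing / (i_dc_unbalance + i_ac_peak)

def conversion_gain(gm, r_load):
    """Voltage gain gm * R_L (the 2/pi factor of a switching mixer is ignored)."""
    return gm * r_load

# Hypothetical operating point, not taken from the paper
V_SWING = 0.5    # V of output range left after reserving input-signal headroom
GM = 2e-3        # S, transconductance of the input pair Mdp/Mdn
I_AC = 20e-6     # A, peak AC signal current

for i_dc in (0.0, 50e-6, 200e-6):    # growing self-jammer DC unbalance
    r = max_load_resistance(V_SWING, i_dc, I_AC)
    print(f"I_dc = {i_dc * 1e6:5.0f} uA -> R_L <= {r:7.0f} ohm, "
          f"gain <= {conversion_gain(GM, r):5.1f}")
```

In this model the gain bound falls roughly in proportion to the DC unbalance once it dominates the AC current. The first row, with zero DC unbalance, corresponds to removing the DC current before the current-to-voltage conversion, which is the approach taken by the closed-loop load stage.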



Fig. 1: Simplified schematic of a typical down-conversion mixer

#### 2. Concept of the closed-loop current sink load down-conversion mixer

As shown in the previous section, the main problem of the classical resistor load stage arises from the fact that the DC component and the desired AC signal share the same signal path. Both current components are converted to voltage by the same load resistor pair. The systematic solution is to separate the two current components before the desired AC signal is converted to an output voltage. The basic concept of the proposed solution is shown in Fig. 2.

The DC current component is compensated by the two voltage-controlled current sources Isi1 and Isi2. Each current source is controlled by a closed-loop system that forces the average value of its output signal (outp or outn) to equal the output reference voltage Vro. The two current sources cannot compensate the desired AC signal, since their response frequency is limited by the pole inserted between the output of the error amplifier and the control input of the current source. This requires the pole to be set well below the minimum desired AC frequency. Being the lowest-frequency pole in the loop, it is also the dominant pole for loop stability. The closed loop can also be implemented as a digital algorithm, using A/D conversion for signal detection and a D/A converter /4/.
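Under strong simplifying assumptions, the loop can be approximated by its single dominant pole: the sink current tracks the branch current through a first-order low-pass with time constant tau, and only the residual current is converted to voltage by the feedback resistor. The Python sketch below is such a behavioral model of one branch. The stimulus, tau and r_ff are hypothetical values, and the error amplifier, the current-source transfer function and the second branch are not modeled.

```python
import math

def simulate_dc_loop(i_in, dt, tau, r_ff):
    """One branch of the closed-loop load (behavioral, not transistor level).

    i_in : input current samples (A) for this branch
    tau  : time constant of the dominant loop pole (s)
    r_ff : feedback resistor converting the residual current to volts
    Returns the output deviation from Vro at each sample.
    """
    i_s = i_in[0]                      # assume the loop has settled before t = 0
    out = []
    for i in i_in:
        out.append(r_ff * (i - i_s))   # only the uncompensated current reaches ampsig
        i_s += (dt / tau) * (i - i_s)  # forward-Euler step of the single-pole loop
    return out

# Hypothetical stimulus: 1 MHz AC current riding on a DC level that steps at 5 us
dt, tau, r_ff = 1e-9, 1e-6, 10e3
n, k_step = 20_000, 5_000
i_in = [(100e-6 if k < k_step else 300e-6)
        + 10e-6 * math.sin(2 * math.pi * 1e6 * k * dt) for k in range(n)]
out = simulate_dc_loop(i_in, dt, tau, r_ff)

last_cycle = out[-1000:]               # final 1 us = one full AC period
print("peak deviation after the step:", max(out))
print("mean of last cycle:", sum(last_cycle) / len(last_cycle))
print("AC amplitude of last cycle:", (max(last_cycle) - min(last_cycle)) / 2)
```

With the pole at 1/(2π · 1 µs) ≈ 160 kHz, well below the 1 MHz AC signal, the DC step produces a large transient that decays within a few tau, while the AC amplitude stays close to r_ff · 10 µA. The output is thus, in effect, a high-pass filtered copy of the branch current with the gain set only by r_ff.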



Fig. 2: Simplified schematic of mixer employing closed loop current source load.

Since the DC current component is eliminated by the controlled current sources, there is no need for AC coupling to the next gain stage. This is shown in Fig. 2 by the differential amplifier (ampsig), where the desired AC signal is converted to voltage by the differential amplifier and the feedback resistors Rff1 and Rff2. The value of the two feedback resistors now defines the gain for the AC signal, which is completely independent of the DC current component. The high-pass characteristic is defined by the pole in the control loop, where it is relatively easy to adjust by changing the value of the resistors Rhp1 and Rhp2. The available dynamic range of the input signal is significantly higher, since the potential of the output nodes does not change with the DC current level. Topologies with current feedback have been published before /5/, but in general they did not have a separate loop for each side of the load, which makes all the difference in coping with a high-level DC mixing component.

#### 3. Dynamic range of the input signal

The improvement in the available input signal dynamic range resulting from the closed-loop current source load stage is illustrated in Figs. 3a and 3b. They show the responses to the same input signal of two mixers: one with the classical architecture (Fig. 3a) and one with the closed-loop current source load stage (Fig. 3b). The mixing gain is identical in both cases. The input signal is AM modulated with a 1 MHz, 200 mVpp signal. The carrier frequency is 200 MHz, and the carrier amplitude changes from 100 mVpp at the beginning of the simulation to 600 mVpp and then to 1200 mVpp. The input signal is in phase with the local oscillator signal to highlight the problem of the DC component on the mixer output.



Fig. 3a: Simulation results of classical mixer architecture.

The differential output signal is displayed in canvas one. Canvas two presents the input signal together with the differential output signal to show how the available supply range (3.5 V) is distributed between the dynamic range of the input and output signals. For the classical mixer architecture, the DC unbalance of the differential output signal increases with the input signal amplitude. When the input signal amplitude is increased to 1200 mVpp (at simulation time 10 µs), the increased DC level saturates the mixer, reducing the mixing gain by 3 dB. This does not happen with the mixer employing the closed-loop current source load (Fig. 3b). The closed-loop system reacts to the increased DC current level by changing the current of the current sources composing the DC load stage. The control voltages of both load current sources are presented in canvas three of Fig. 3b. The correction of the DC current level is not instantaneous, since the bandwidth of the DC correction loop has to be well below the bandwidth of the desired AC signal. When the loop settles to the new DC current value, the DC component of the output voltage is eliminated, leaving enough room for the increased input signal amplitude. The result is an unchanged gain at high input signal amplitude.

The mixer gain used for both simulations was only 2. Increasing the mixing gain causes no problem for the mixer employing the closed-loop current source load, while for the classical topology it would further reduce the allowed dynamic range of the input signal.

#### 4. Adapting the AC coupling time constant to different signal frequencies

As mentioned before, the settling time and the high-pass characteristic of the classical mixer architecture depend on the value of the AC coupling capacitors, the value of the load resistors in the mixing load stage and the input impedance of the next stage. In low-noise systems the AC coupling capacitors are usually external (not integrated) capacitors with relatively large values, and thus cannot be changed dynamically. As shown above, there is little freedom in choosing the value of the load resistors. It is also not possible to significantly change the input impedance of the gain stage to which the mixer output is connected without seriously impacting the noise performance. This means that the classical mixer architecture allows only limited adjustment of the high-pass characteristic of the AC coupling.

Fig. 3b. Simulation results of mixer employing closed loop current source load.

In the EPC Gen2 protocol, which is becoming the dominant protocol for UHF RFID systems, the tag's link frequency can vary from 40 kHz to 640 kHz. To accommodate this range, the mixer topology employing a closed-loop current source load offers a significant advantage. In this system the high-pass characteristic of the mixer can be set with relative ease even if the capacitors are external and thus have a fixed value. The AC time constant can be changed by varying the value of the resistors defining the pole in the feedback loop (Rlp1 and Rlp2 in Fig. 2). This has no effect on the gain and DC component of the mixer and has a relatively small impact on the system noise. An EPC Gen2 reader system designed with the proposed mixer topology can cover all the link frequencies required by the standard and has equal spot-noise performance for all link frequencies.
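As a rough illustration of how Rlp1/Rlp2 could be scheduled across the EPC Gen2 link-frequency range, the sketch below assumes that the dominant pole behaves like a simple RC corner, f_p = 1/(2π · R · C), with a fixed external capacitor, and that the pole is placed a fixed ratio below the link frequency. The capacitor value, the ratio of 10, the function name and the single-RC model are all assumptions, not values or methods from the paper.

```python
import math

def loop_resistor_for_link(blf_hz, c_farad, ratio=10.0):
    """Resistor (ohm) placing the loop pole `ratio` times below the link frequency.

    Assumes f_pole = 1 / (2*pi*R*C) for the dominant pole (illustrative model).
    """
    f_pole = blf_hz / ratio
    return 1.0 / (2 * math.pi * f_pole * c_farad)

C_EXT = 10e-9                       # hypothetical fixed external capacitor (F)
for blf in (40e3, 160e3, 640e3):    # EPC Gen2 link frequencies span 40-640 kHz
    r = loop_resistor_for_link(blf, C_EXT)
    print(f"link {blf / 1e3:5.0f} kHz -> R_lp = {r:8.1f} ohm")
```

Because only the resistor changes, the 16:1 span of link frequencies maps to a 16:1 span of resistor values while the capacitor, and the AC gain set by Rff1/Rff2, stay fixed.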

#### 5. Method for reduction of load loop settling time

The same parameters that define the high-pass characteristic also define the settling time the system needs to reach steady state after power-up or after the transition from transmit to receive operation. In system design there is always pressure to shorten this time to the minimum required. One reason is to save time, and thus current, in an on/off operation mode, where the system powers on, scans for the presence of a signal and powers down again if no signal is present. The second reason is to achieve a minimum settling time between transmit and receive operation, since some protocols, such as EPC Gen2, allow only a very short settling time between the transmit and receive periods.

To shorten the settling time, the value of the resistors defining the pole in the feedback loop (Rlp1 and Rlp2 in Fig. 2) has to be significantly reduced while the speed-up is performed. It is vital to define the start and the end of the speed-up interval precisely, since the high-pass characteristic of the system is drastically changed during that time. Typically the speed-up is initiated when the switch from transmit to receive operation occurs or when the system is powered on. The load loop time constant is reduced by a factor of 4 to 10 as long as there is a large unbalance of the DC level between the differential outputs. This unbalance is detected by a window comparator observing both outputs. Since the AC signal also manifests itself as a temporary unbalance of the differential outputs, the comparator output is evaluated by a timing system that discriminates between the AC signal and a DC unbalance of the outputs, to detect when the speed-up is required.

Fig. 4 presents simulation results for the same input signal sequence as in Fig. 3. The difference is that the speed-up is activated at the input signal increase (at simulation time 10 µs). The time constant of the pole in the feedback loop is reduced by a factor of 4, resulting in a faster change of the control signals of the current sources. The increased speed of correction is clearly visible in canvas three, which presents the control voltages of the current sources. The speed-up mode ends at simulation time 12 µs, when the unbalance of the differential output signals is resolved. The settling behavior with and without speed-up can be compared by contrasting the settling after the first input signal amplitude change at simulation time 2 µs, which is performed without speed-up, with the settling after the amplitude change at simulation time 10 µs, which employs speed-up. As shown in Fig. 4, the settling time is considerably reduced.
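The speed-up control can be sketched behaviorally in Python on top of a single-pole loop model. Here the window comparator is |v| > window on a single output, and the timing system that separates AC swings from DC unbalance is assumed, for illustration only, to be a run-length counter: speed-up is asserted only after the comparator has stayed high for min_run consecutive samples (an AC swing would return inside the window sooner), and it ends as soon as the comparator drops. The loop model, the function names, the window, min_run and all component values are hypothetical.

```python
def settle_with_speedup(i_in, dt, tau, r_ff, window, min_run, factor=4):
    """Single-pole DC-cancellation loop with a window-comparator speed-up.

    Returns (outputs, flags): output deviation from Vro (V) per sample and
    whether the reduced time constant tau / factor was active.
    """
    i_s = i_in[0]            # loop assumed settled at t = 0
    run = 0                  # consecutive samples outside the window
    out, flags = [], []
    for i in i_in:
        v = r_ff * (i - i_s)
        run = run + 1 if abs(v) > window else 0   # window comparator + run-length timer
        fast = run >= min_run                      # persistent unbalance, not an AC swing
        tau_eff = tau / factor if fast else tau    # models a reduced Rlp during speed-up
        i_s += (dt / tau_eff) * (i - i_s)
        out.append(v)
        flags.append(fast)
    return out, flags

def settling_time(out, dt, start, tol):
    """Time from sample `start` to the last sample with |out| > tol."""
    last = start
    for k in range(start, len(out)):
        if abs(out[k]) > tol:
            last = k
    return (last - start) * dt

dt, tau, r_ff = 1e-9, 1e-6, 10e3
k_step = 1_000
i_in = [100e-6 if k < k_step else 300e-6 for k in range(20_000)]   # DC step, no AC
slow, _ = settle_with_speedup(i_in, dt, tau, r_ff, window=0.1, min_run=200, factor=1)
fast, flags = settle_with_speedup(i_in, dt, tau, r_ff, window=0.1, min_run=200, factor=4)
t_slow = settling_time(slow, dt, k_step, tol=0.02)
t_fast = settling_time(fast, dt, k_step, tol=0.02)
print(f"settling to 1 %: {t_slow * 1e6:.2f} us without, {t_fast * 1e6:.2f} us with speed-up")
```

Only the large-unbalance part of the transient is accelerated here; once the output is back inside the window the normal time constant, and with it the normal high-pass characteristic, is restored, which is why the end of the speed-up interval has to be detected reliably.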

#### 6. Conclusions

The down-conversion mixer topology employing a closed-loop current source load and a settling speed-up system has proven to be an optimal solution for RFID systems. These systems must handle high input signal amplitudes due to the self-jammer effect, typically require fast switching from transmit to receive operation, and must handle a wide range of communication link frequencies.

The closed-loop current source load stage eliminates the degradation of the input signal dynamic range due to the DC component on the mixer output, thus maximizing the input dynamic range for a given supply voltage.

The closed-loop current source load stage enables a simple change of the high-pass behavior of the mixer by changing the value of the resistors defining the dominant pole of the feedback loop. This is possible even when the capacitors defining the AC behavior are external (not integrated) elements with relatively large values, chosen to ensure low system noise. This makes the system adaptable to a wide range of communication link frequencies.

Fig. 4. Simulation results of mixer employing closed loop current source load and speed-up system

Changing the value of the resistors defining the dominant pole of the feedback loop can also be used to shorten the settling time of the receiver after power-up or after the transition from transmit to receive mode.

The proposed solution was used in a family of integrated RFID readers operating on the UHF EPC Gen2 standard, where it has clearly demonstrated its advantages over the classical architecture.

#### References

- /1/ J. Ledworatawee, W. Namgoong, R. Weigel, Generalized linear periodic time-varying analysis for noise reduction in an active mixer, *IEEE Journal of Solid-State Circuits*, vol. 42, pp. 1339–1351, June 2007.
- /2/ A. Pleteršek, A. Vodopivec, Postopek za samodejno reguliranje amplitude vhodnih signalov [Method for automatic regulation of the amplitude of input signals], patent no. 22403, granted 5 May 2008, application no. P-200600218, filed 21 Sept. 2006. Ljubljana: Urad Republike Slovenije za intelektualno lastnino, 2008.
- /3/ A. Pleteršek, A compensated bandgap voltage reference with sub-1-V supply voltage, *Analog Integr. Circuits Signal Process.*, vol. 44, pp. 5–15, 2005.
- /4/ R. Benković, K. Kovačič, A. Pleteršek, Integral nonlinearity determined by selection order of current array units in DA converters, *Inf. MIDEM*, vol. 35, no. 3, pp. 140–143, Sept. 2005.
- /5/ K. Dufene, Z. Boos, R. Weigel, Digital adaptive IP2 calibration scheme for CMOS down-conversion mixers, *IEEE Journal of Solid-State Circuits*, vol. 43, pp. 2434–2445, Nov. 2008.

Maja Atanasijević-Kunc, University of Ljubljana, Faculty of Electrical Engineering, Slovenia

Vinko Kunc, IDS d.o.o., Ljubljana, Slovenia

### **OPTIMISING DIGITAL CIRCUIT CELLS**

### Janez Puhan, Dušan Raič, Tadej Tuma, Sašo Tomažič and Arpad Burmen

Faculty of Electrical Engineering, University of Ljubljana

Key words: digital ASIC design, pre-designed cells, digital circuit synthesis, transistor-level cell optimisation

**Abstract:** Pre-designed cells, such as buffers, adders and flip-flops, are provided by foundries and used in digital circuit design. The actual implementation of a cell at transistor level is not considered during the synthesis of a digital circuit. The paper describes four cases of transistor-level cell optimisation that can be employed to achieve arbitrary customisation. Due to the shape of the cost function landscape, a global optimisation method was used. The results show that the properties of pre-designed cells can be improved by up to 80%.

### Optimizacija gradnikov digitalnih vezij

Ključne besede: načrtovanje digitalnih integriranih vezij, splošni osnovni gradniki, sinteza digitalnih vezij, optimizacija na tranzistorskem nivoju

Izvleček: Načrtovalec sestavi digitalno integrirano vezje iz osnovnih gradnikov, kot so medpomnilniki, seštevalniki, flip-flopi ipd. Knjižnico z naborom osnovnih gradnikov zagotovi izdelovalec integriranih vezij. Izvedba posameznega gradnika na tranzistorskem nivoju med postopkom načrtovanja ni več pomembna. Članek se ukvarja z možnostjo prilagoditve posameznega gradnika točno na zahteve, v katerih deluje. Opisani so štirje primeri optimizacije osnovnih gradnikov na tranzistorskem nivoju. Pri tem je bila zaradi narave kriterijske funkcije uporabljena globalna optimizacijska metoda. Tako je možno doseči najboljšo prilagoditev gradnika na specifične zahteve. Rezultati kažejo do 80-odstotno izboljšanje glede na lastnosti splošnega gradnika, ki ga zagotovi izdelovalec integriranih vezij.

#### 1. Introduction

Digital circuits are no longer designed at transistor level. Designers work with pre-designed digital cells or blocks /1/, such as buffers, logic gates, adders, latches and flip-flops. These are then grouped into higher-level building elements such as registers, decoders, comparators and counters. Foundries usually provide a whole library of digital cells for each of their IC manufacturing processes. It is customary to provide several versions of every cell, such as low- and high-voltage, low-power and high-speed versions, and of course various combinations of these, such as a low-power low-voltage version. Every cell has a detailed description. Its timing characteristics, such as setup, hold, delay, recovery and minimum pulse width times, are given for different output loads and input signal slopes. Power consumption, input capacitances, area, etc. are also given. Transistor-level simulations do not take place during circuit design. Analog integrated circuit simulators /2,3,4,5/ are replaced with higher-level simulators /6,7/, where the circuit response is calculated from the cell descriptions. Since the simulation is no longer performed at transistor level, it is significantly faster. On the other hand, designers still like to do a final check with a classical integrated circuit simulator to verify the actual behaviour of the circuit before production.

Various versions of a cell share the same transistor topology in most cases. They differ only in transistor sizing, usually only in transistor channel widths. Different versions of the same cell are actually a result of cell optimisation to different demands. We decided to test a foundry-provided digital transistor-level cell library to verify if it is possible to achieve better performance. We also wanted to see if cell optimisation makes sense, so that by automating it in the future one could generate an arbitrary cell version customised to particular needs of the circuit. Since we are searching for the most appropriate transistor channel widths, cell optimisation is performed on transistor level with an analog integrated circuit simulator /2,3,4,5/.

The left side of Fig. 1 shows the digital Application Specific Integrated Circuit (ASIC) design flow /8,9/. Design starts after the architectural and electrical specifications of the circuit are set. Register Transfer Level (RTL) coding is used to implement the specifications in a Hardware Description Language (HDL), such as VHDL (VHSIC HDL, where VHSIC stands for Very High Speed Integrated Circuit) or Verilog (Verifying Logic). The circuit described in HDL is simulated; it is of critical importance that appropriate input stimuli for testing the circuit are provided for this simulation. The reduction of an HDL-described circuit into a gate-level netlist is called synthesis. Synthesis also performs gate-level optimisation with regard to timing constraints defining signal-clock relationships. The cell library contains foundry-provided descriptions of cells for synthesis; a cell description is extracted from transistor-level simulations. Verification validates the RTL code against the gate-level netlist. Static Timing Analysis (STA) double-checks the timing constraints fed to synthesis. Placement, Clock Tree (CT) insertion and routing take place during the layout phase. Post-layout verification and an additional STA are performed to check the result of the layout phase.

Our main idea (depicted on the right side of Fig. 1) is to introduce transistor-level cell optimisation into synthesis. Synthesis in step A produces a gate-level netlist of the circuit and a list of timing constraints for all gates. Cell implementations are selected from the foundry-provided library according to these constraints. What follows is our proposed optimisation step, which sizes the topologies of the selected foundry-provided cells according to the constraints obtained in step A of the synthesis. Some cells selected in step A barely fulfil the timing requirements, while others have a broad safety margin. The former can be optimised for speed, while the latter can be optimised for power consumption, without affecting the circuit's performance. Cell descriptions are then extracted from transistor-level simulations of the optimised cells, resulting in a customised cell library. This library is then used as input to synthesis step B, which selects appropriate cells from the customised library for the gate-level topology obtained in synthesis step A. A successful synthesis in step B (back annotation) validates the design that uses customised cells instead of foundry-provided ones.

Fig. 1: Digital ASIC design flow
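The split between cells that barely meet timing and cells with a broad margin can be sketched as a simple slack test that decides which objective each selected cell is re-sized for before the customised library is characterised. The data class, the field names, the instance names and the 10% margin threshold are hypothetical; the paper does not state how the two groups are separated.

```python
from dataclasses import dataclass

@dataclass
class CellTiming:
    name: str            # instance name in the step-A netlist (hypothetical)
    budget_ps: float     # delay allowed by the step-A timing constraints
    delay_ps: float      # delay of the foundry cell chosen in step A

def choose_objective(cell: CellTiming, margin: float = 0.1) -> str:
    """Return 'speed' if the cell barely meets timing, otherwise 'power'.

    margin is the relative slack below which a cell counts as critical
    (an assumed threshold, not a value from the paper).
    """
    slack = cell.budget_ps - cell.delay_ps
    return "speed" if slack < margin * cell.budget_ps else "power"

netlist = [
    CellTiming("U1_dff", budget_ps=200.0, delay_ps=195.0),   # barely fulfils timing
    CellTiming("U2_buf", budget_ps=200.0, delay_ps=80.0),    # broad safety margin
]
for cell in netlist:
    print(cell.name, "->", choose_objective(cell))
```

In the proposed flow, each objective would then translate into a different set of goals and weights in the cost function defined in Section 2.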

#### 2. Cost definition

The criterion is a mathematical function that evaluates a circuit candidate, i.e. a particular set of transistor channel widths **w**. It is calculated from the circuit's response at the given transistor widths. The better the candidate, the lower the criterion value. By establishing a mathematical criterion, or cost function, one can always decide which of two circuits is better /10/.

Usually only the transient response matters for a digital circuit. It reveals the circuit's dynamics, its general time-domain behaviour and its power consumption, so only one transient analysis per circuit candidate is needed. Since the goal for digital circuits is always the same (as fast as possible for as little power as possible), there are only a few properties of interest:

- chip area,
- various time measurements such as slopes, delays, minimal impulse widths, setup, hold and recovery times, etc., and
- power consumption, which can be expressed as time integral of power supply current.

The first property is defined by the channel widths and the others can be obtained from the transient response.

A goal value $g_i$ has to be chosen for every property. Each measured property contributes a portion of the cost function value. As long as the goal is not reached, the contribution is positive and proportional to the goal violation. When the goal is reached or exceeded, the contribution becomes zero or negative. For this purpose we define the contribution $c_i(x_i)$ (1) of a particular measurement $x_i(\mathbf{w})$. It is the piecewise linear function depicted in figure 2.

$$c_{i}(x_{i}) = \begin{cases} \dfrac{t_{i}}{g_{i}}(x_{i} - g_{i}), & x_{i} \leq g_{i} \\ \dfrac{p_{i}}{g_{i}}(x_{i} - g_{i}), & x_{i} > g_{i} \end{cases} \tag{1}$$



Fig. 2: Measurement contribution function

Since all the measurements listed above have to be as low as possible, a single type of contribution function $c_i(x_i)$ (1) is sufficient. The final cost function is the sum of $n$ contributions (2).

$$c(\mathbf{x}) = \sum_{i=1}^{n} c_i(x_i) \tag{2}$$

With appropriate settings of the goals $g_i$, trade-off weights $t_i$ and penalty weights $p_i$, optimisation for an arbitrary version of the circuit (high-speed version, low-power version, etc.) can be achieved. Individual measurements can be made more or less important by adjusting $t_i$ and $p_i$. The optimal circuit parameters $\mathbf{w}_{opt}$ are those at which the cost function (2) has its global minimum (3).

$$c(\mathbf{x}(\mathbf{w}_{opt})) \le c(\mathbf{x}(\mathbf{w})) \quad \forall \mathbf{w} \tag{3}$$
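A minimal Python sketch of Eqs. (1) and (2) follows. It assumes that every measurement is a lower-is-better scalar and that every goal $g_i$ is positive. The function names are illustrative and are not taken from the authors' optimisation tool.

```python
from typing import Sequence


def contribution(x: float, g: float, t: float, p: float) -> float:
    """Piecewise-linear contribution c_i(x_i) of Eq. (1).

    x: measured value, g: goal (> 0), t: trade-off weight, p: penalty weight.
    Lower x is better for every measurement (area, timings, charge).
    """
    if x <= g:
        # Goal met or exceeded: non-positive reward scaled by the trade-off weight.
        return t / g * (x - g)
    # Goal violated: positive penalty scaled by the penalty weight.
    return p / g * (x - g)


def cost(xs: Sequence[float], goals: Sequence[float],
         trade_offs: Sequence[float], penalties: Sequence[float]) -> float:
    """Cost function of Eq. (2): the sum of all n contributions."""
    if not (len(xs) == len(goals) == len(trade_offs) == len(penalties)):
        raise ValueError("all sequences must have length n")
    return sum(contribution(x, g, t, p)
               for x, g, t, p in zip(xs, goals, trade_offs, penalties))
```

For example, `contribution(3.0, 4.0, 1.0, 100.0)` rewards a 3 ns delay against a 4 ns goal with -0.25. A 4.4 ns delay against the same goal is penalised with +10.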

Moreover, if $t_i \ll p_j$ holds for every pair $i \neq j$, the cost function guarantees that the first objective of the optimisation is to achieve all the goals. If even one goal is not achieved, its contribution will be very high compared to the other contributions. The optimisation process therefore tends to fulfil all the goals: not achieving one goal cannot be compensated by exceeding others.
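Here is a hypothetical illustration. The weights and measurement values are invented and are not taken from the benchmarks. Take two measurements with $t_1 = t_2 = 1$ and $p_1 = p_2 = 100$. A candidate that beats the first goal by 80% but misses the second by 5% scores

$$c = \frac{1}{g_1}(0.2\,g_1 - g_1) + \frac{100}{g_2}(1.05\,g_2 - g_2) = -0.8 + 5 = 4.2.$$

This is worse than the value $c = 0$ of a candidate that exactly meets both goals, even though the first candidate is far better in one property.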

Circuit candidates whose simulation does not converge need special treatment in the cost function evaluation. Such pathological candidates appear regularly during the optimisation process and in general cannot be avoided. When the simulation fails, the transient response is not available and the measurements **x** cannot be determined. When a particular measurement $x_i$ is not known, its contribution $c_i(x_i)$ is set to some large value $c_{\max,i}$. A pathological candidate therefore produces a huge cost value and represents a bad try. The same goes for semi-pathological candidates, for which the transient response is available but the circuit does not behave as expected. In such cases one or more required measurements still cannot be determined. For instance, a slope cannot be measured if there is no edge in the response.

To speed up the optimisation process, semi-pathological circuits are additionally penalised by auxiliary measurements, which enforce the correct transient response. With digital circuits this is again a fairly simple task: assuming proper behaviour, the state of the circuit at particular time points is known in advance. For instance, if some node voltage should be high at some time point but is not, an auxiliary measurement considerably increases the cost value. On the other hand, an auxiliary measurement does not affect the cost value when the selected node voltage meets the expectations. To penalise semi-pathological circuits, auxiliary measurements have large penalty weights. Their trade-off weights are set to zero to eliminate them from the cost function when the circuit behaves as expected.
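The sketch below shows one possible way to express the two mechanisms with the contribution of Eq. (1). First, a missing measurement is replaced by a large constant. Second, an auxiliary measurement has zero trade-off weight and a large penalty weight. The value of `C_MAX`, the penalty weight and the normalised shortfall used as the auxiliary measurement are all hypothetical choices, not taken from the paper.

```python
import math
from typing import Optional

C_MAX = 1.0e6  # hypothetical large contribution for an unmeasurable property


def contribution(x: float, g: float, t: float, p: float) -> float:
    """Eq. (1): piecewise-linear contribution of one measurement."""
    return (t if x <= g else p) / g * (x - g)


def robust_contribution(x: Optional[float], g: float, t: float, p: float,
                        c_max: float = C_MAX) -> float:
    """A failed or incomplete simulation yields no measurement: use c_max."""
    if x is None or math.isnan(x):
        return c_max
    return contribution(x, g, t, p)


def auxiliary_contribution(v_node: float, v_required: float, v_dd: float,
                           p_aux: float = 1.0e4) -> float:
    """Hypothetical auxiliary measurement: a node that should be high at a
    given time point must be at least v_required volts."""
    shortfall = max(0.0, v_required - v_node) / v_dd
    x = 1.0 + shortfall  # equals the goal g = 1 when the node behaves as expected
    return contribution(x, 1.0, 0.0, p_aux)  # t = 0: never rewards, only penalises
```

A correctly behaving node contributes exactly zero, so it cannot mask the real measurements. A node stuck low adds a penalty proportional to its voltage shortfall.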

#### 3. Benchmark circuits

Four pre-designed foundry-provided digital cells were used as benchmarks: a half adder (fig. 4), a full adder (fig. 5), and D flip-flops without and with scan inputs (figs. 6 and 7). In our opinion they represent a fair sample of the cell library. Although there are many different cells, the transistor configurations remain the same, and the most characteristic transistor arrangements are included in the selected cells. The foundry-provided transistor models for the digital cell library are proprietary and cannot be revealed.

Informacije MIDEM 40(2010)3, str. 167-173

A benchmark cell is placed in a test bench circuit that provides the input signals, power supply voltage and output loads. Figure 3 shows a test bench circuit ready for simulation. The input signal slopes, power supply voltage and output capacitances vary according to the operating corner conditions described later.

For every cell the transistor topology and input test signals are given. Time-domain measurements are described. Together with chip area and power consumption they represent the measurements  $x_i$ , that contribute to the cost function (2). The chip area is the sum of all transistor areas, and the power consumption is the time integral of the power supply current.
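The two non-timing measurements could be computed as shown below. This is a minimal sketch that assumes each transistor's area is approximated by its gate area W·L. It also assumes the supply current is available as sampled (time, current) pairs from the transient analysis, integrated with the trapezoidal rule. The function names are hypothetical.

```python
from typing import Sequence


def chip_area(widths: Sequence[float], length: float) -> float:
    """Sum of transistor areas, approximated here by gate area W*L (m^2)."""
    return sum(w * length for w in widths)


def supply_charge(times: Sequence[float], currents: Sequence[float]) -> float:
    """Time integral of the supply current (trapezoidal rule), in A*s.

    Pass current magnitudes; simulators may report supply current as negative.
    """
    if len(times) != len(currents):
        raise ValueError("times and currents must have equal length")
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        total += 0.5 * dt * (currents[k] + currents[k - 1])
    return total
```

The integrated current is a charge, which is why the power rows of the result tables are given in pAs.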



Fig. 4: Half adder

There were 24 time-domain measurements for the half adder: twelve input-to-output delays and twelve output signal slopes. For instance, the delay from a falling b to a rising s edge is depicted in figure 4. The figure also depicts the rising edge slope of co at a rising a.

There were also 24 time-domain measurements for the full adder: twelve input-to-output delays and twelve output signal slopes. The delay from a rising ci to a rising s edge is depicted in figure 5, together with the falling edge slope of co at a rising b.

Fig. 3: Test bench circuit



Fig. 5: Full adder

There were 20 measurements for the D flip-flop without scan inputs. Besides six input-to-output delays and six output signal slopes, the setup, hold and recovery-after-reset times and the minimal clock and reset impulse durations were taken into account. The time-domain measurements other than the delays and slopes are depicted in figure 6. The delay from a rising c to a falling q edge and the falling edge slope of qn at a rising c are shown for illustration.

There were 26 measurements for the D flip-flop with scan inputs. Besides six input-to-output delays and six output signal slopes, the setup, hold and recovery-after-reset times and the minimal clock and reset impulse durations were taken into account. The time measurements other than the delays and slopes are depicted in figure 7. The delay from a rising c to a falling q edge and the falling edge slope of qn at a rising c are shown for illustration.

Fig. 6: D flip-flop without scan inputs



Fig. 7: D flip-flop with scan inputs

The delay was defined as the time from the point where the input voltage reaches its 50% level to the point where the output voltage reaches its 50% level. The slope was defined as the time between the 10% and 90% signal levels.
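The 50% delay and the 10% to 90% slope definitions can be applied to a sampled transient waveform by linear interpolation between simulation time points, as in the sketch below. It simplifies matters by taking the first crossing in the requested direction; a real measurement would search for the output edge following the input edge. Returning `None` when no edge exists matches the "measurement cannot be determined" case of semi-pathological candidates. All names are hypothetical.

```python
from typing import Optional, Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float,
                  rising: bool) -> Optional[float]:
    """First time v crosses level in the given direction (linear interpolation)."""
    for k in range(1, len(t)):
        v0, v1 = v[k - 1], v[k]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[k - 1] + (level - v0) * (t[k] - t[k - 1]) / (v1 - v0)
    return None  # no edge: the measurement cannot be determined


def delay(t, v_in, v_out, vdd, in_rising: bool, out_rising: bool) -> Optional[float]:
    """Time from the input 50% crossing to the output 50% crossing."""
    t_in = crossing_time(t, v_in, 0.5 * vdd, in_rising)
    t_out = crossing_time(t, v_out, 0.5 * vdd, out_rising)
    if t_in is None or t_out is None:
        return None
    return t_out - t_in


def slope(t, v, vdd, rising: bool) -> Optional[float]:
    """Time between the 10% and 90% signal levels of one edge."""
    t10 = crossing_time(t, v, 0.1 * vdd, rising)
    t90 = crossing_time(t, v, 0.9 * vdd, rising)
    if t10 is None or t90 is None:
        return None
    return abs(t90 - t10)
```

For an input rising between 1 ns and 2 ns and an output falling between 2 ns and 3 ns (3.3 V swing), this gives a delay of 1.0 ns and an output slope of 0.8 ns.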

#### 4. Optimisation

Two optimisation runs were performed for every cell. In the first run the goal was to obtain a fast circuit, but at the same time the power consumption should not exceed that of the foundry-provided cell (optimisation for speed). In the second run the goal was to decrease the power consumption while keeping the timings at least as good as those exhibited by the foundry-provided cell (optimisation of power consumption).

The two optimisation runs differed only in the trade-off weights ($t_i$) in (1). In fact, only the power consumption trade-off weight was varied, making this measurement more or less important in comparison to the area and time measurements. The original cell properties were used as the goals $g_i$. The penalty weights $p_i$ ($\gg t_j$) were identical for all runs.

There were 65 process and operating condition corners taken into account. They consisted of:

- four process corners (worst power (wp), worst speed (ws), worst one (wo), and worst zero (wz)),
- two temperatures (-25°C and 105°C),
- two power supply voltages (2V and 3.3V for adders and 3V and 3.6V for flip-flops),
- 10fF and 220fF output loads (and an additional 230fF output load for adders) and
- 60ps and 4ns input signal slopes (and additional 90ps and 6ns slopes for adders).

The typical corner was added to the 64 extreme combinations, resulting in 65 corners.
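The corner count can be checked by enumeration. The sketch below uses the flip-flop values from the list above. The typical corner's parameters are not given in the text, so it appears only as a named placeholder. The adders' additional loads and slopes are not modelled, because the text does not spell out how they combine with the other values.

```python
from itertools import product

# Flip-flop corner values as listed in the text.
PROCESS = ("wp", "ws", "wo", "wz")
TEMPERATURE_C = (-25, 105)
SUPPLY_V = (3.0, 3.6)
LOAD_F = (10e-15, 220e-15)
INPUT_SLOPE_S = (60e-12, 4e-9)

# Every combination of the extreme values: 4 * 2 * 2 * 2 * 2 = 64 corners.
extreme = list(product(PROCESS, TEMPERATURE_C, SUPPLY_V, LOAD_F, INPUT_SLOPE_S))

# Placeholder for the typical corner, whose exact parameters are not given here.
corners = extreme + ["typical"]
```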

Out of the 65 corners only a few are important. For instance, power consumption is always highest in the wp/-25°C/3.3V (or 3.6V)/220fF/4ns corner, so evaluating power consumption in other corners is needless. Since only the worst-corner value of each measurement is considered in the cost function, performing analyses and evaluating measurements was unnecessary in most corners, and the number of corners could be reduced significantly. Only three<sup>1</sup> corners for the adders and four<sup>2</sup> corners for the flip-flops were taken into account during the optimisation. The final results were verified across all 65 corners.
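Because every measurement is lower-is-better, the worst-corner value that enters the cost function is the maximum over the simulated corners. A minimal sketch follows. The data layout (corner name mapped to a dictionary of measurements) and all names are hypothetical.

```python
from typing import Dict


def worst_corner(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Map {corner: {measurement: value}} to {measurement: worst value}.

    All measurements are lower-is-better, so the worst value is the maximum.
    A measurement only needs to be evaluated in the corners where it can be worst.
    """
    worst: Dict[str, float] = {}
    for per_corner in results.values():
        for name, value in per_corner.items():
            worst[name] = max(value, worst.get(name, float("-inf")))
    return worst
```

For instance, if only the wp corner reports the charge and two ws corners report a delay, the result holds the charge from wp and the larger of the two delays.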

Transistor channel widths were independent optimisation variables with explicit constraints from 400nm to 2µm (3µm in the half adder case). All transistors shared the same channel length, which resulted in one additional optimisation variable with explicit constraints from 350nm to 2µm. The number of optimisation variables was therefore equal to the number of transistors plus one: 15 for the half adder, 29 for the full adder, 33 for the D flip-flop without scan inputs, and 41 for the D flip-flop with scan inputs. The channel length variable was added for generality, although we expected it to tend towards its minimum. The optimisation did not confirm this expectation: the resulting length was not equal to the lower constraint (350nm) in all runs.
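The variable vector described above (one width per transistor plus one shared length) can be built as a list of box constraints. In the short sketch below, the transistor counts in the usage note are derived from the variable counts given in the text (variables minus one), and the function name is hypothetical.

```python
from typing import List, Tuple


def bounds(n_transistors: int, w_max: float = 2e-6) -> List[Tuple[float, float]]:
    """One (min, max) pair per channel width, plus one shared channel length (m)."""
    width_bounds = [(400e-9, w_max)] * n_transistors
    length_bounds = [(350e-9, 2e-6)]
    return width_bounds + length_bounds
```

Then `len(bounds(14, 3e-6))` is 15 for the half adder, and `len(bounds(40))` is 41 for the D flip-flop with scan inputs.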

Because of the many optimisation variables and the harsh cost function landscape, which is explained later, a robust global optimisation method was used. We chose Parallel Simulated Annealing with Differential Evolution (PSADE) /11/, since it is able to run on several processors in parallel. We used eight AMD Athlon 3GHz processors. The method was started from a random initial point and stopped after 150000 evaluations.
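PSADE itself is not reproduced here. As a much simpler stand-in, the sketch below shows a plain differential evolution loop (DE/rand/1/bin) and makes no attempt at simulated annealing or parallel evaluation. It illustrates three things: random initialisation inside the explicit box constraints, a fixed evaluation budget as the stopping rule (the paper used 150000), and clipping to the bounds. The control parameters (`pop_size`, `f`, `cr`) are generic textbook values, not the authors' settings.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    cost: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    max_evals: int,
    pop_size: int = 20,
    f: float = 0.8,
    cr: float = 0.9,
    seed: int = 0,
) -> Tuple[List[float], float, int]:
    """Plain DE/rand/1/bin with box constraints and a fixed evaluation budget."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x: List[float]) -> List[float]:
        # Enforce the explicit (box) constraints on every variable.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Random initial population inside the box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [cost(x) for x in pop]
    evals = pop_size

    while evals < max_evals:
        for i in range(pop_size):
            if evals >= max_evals:
                break  # evaluation budget exhausted
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = clip([
                pop[a][k] + f * (pop[b][k] - pop[c][k])
                if (k == j_rand or rng.random() < cr) else pop[i][k]
                for k in range(dim)
            ])
            trial_cost = cost(trial)
            evals += 1
            if trial_cost <= costs[i]:  # greedy one-to-one selection
                pop[i], costs[i] = trial, trial_cost

    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best], evals
```

In the cell-sizing setting, `cost` would run one transient simulation per call and return the value of Eq. (2). That is why the evaluation budget, not the iteration count, dominates the run time.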

#### 5. Results

The results of all eight optimisation runs are listed in tables 1 to 4. The foundry-provided original cell properties are compared to the results from the speed and power optimisation runs.

#### Table 1: Results for the half adder

|                          | foundry             | high speed          | low power           |
|--------------------------|---------------------|---------------------|---------------------|
| area [µm<sup>2</sup>]    | 47.6                | 47.6                | 33.5                |
| delays <sup>a</sup> [ns] | 7.28/10.4/7.11/10.6 | 3.63/4.86/3.22/5.28 | 7.25/9.85/6.92/10.3 |
| delays <sup>b</sup> [ns] | 8.78/8.31/10.9/9.46 | 3.95/5.65/4.57/6.62 | 8.13/8.21/8.61/9.41 |
| $delays^c$ [ns]          | 7.56/9.44/10.7/9.36 | 3.78/6.18/4.15/6.65 | 6.54/9.42/8.28/9.30 |
| slopes <sup>a</sup> [ns] | 10.1/15.4/10.1/15.5 | 4.27/3.61/4.26/3.62 | 8.23/13.0/8.24/13.0 |
| slopes <sup>b</sup> [ns] | 10.1/10.1/15.4/15.4 | 3.85/3.81/3.14/3.57 | 8.31/8.17/7.28/7.66 |
| slopes <sup>c</sup> [ns] | 10.1/10.1/15.4/15.4 | 3.85/3.83/3.13/3.57 | 8.18/8.31/7.28/7.66 |
| power [pAs]              | 5.66                | 5.63                | 4.58                |

<sup>a</sup> a to co↑ / a to co↑ / b to co↓ / b to co↓ / b to co↓ / b to s↓

<sup>b</sup> a↑ to s↑ / a↓ to s↓ / a↓ to s↓ / a↓ to s↓

<sup>c</sup> b↑ to s↑ / b↓ to s↓ / b↓ to s↓

Table 1 summarises the results for the half adder benchmark circuit. Both optimisation runs found a solution with properties (area, timings and power consumption) at least as good as those of the original foundry-provided cell. The speed optimisation run resulted in improvements of up to 80% without an increase in power consumption. Conversely, the power optimisation run reduced the power consumption by 19% without any degradation of the timings. The symbols ↑, = and ↓ in figure 4 depict the transistor channel width changes (increase, no change, decrease) after both optimisation runs with respect to the foundry-provided values. The left symbol corresponds to speed optimisation and the right symbol to power optimisation. Speed optimisation resulted in a final channel length of 350nm (as expected). Interestingly, the final channel length was 450nm in the power optimisation run. Input capacitances were not taken into account in the cost function, but since the input transistor gates became smaller, the input capacitances also decreased. Finding the minimum took 143347/125519 evaluations in the speed/power optimisation run. The first candidate circuit that was better than the foundry-provided original was found after 1661/1412 evaluations.

<sup>1</sup> wp/-25°C/3.3V/220fF/4ns, ws/105°C/2V/230fF/90ps, ws/105°C/2V/230fF/6ns

<sup>2</sup> wp/-25°C/3.6V/220fF/4ns, ws/105°C/3V/10fF/60ps, ws/105°C/3V/220fF/60ps, ws/105°C/3V/220fF/6ns
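The quoted improvements are relative reductions of lower-is-better properties. The short check below uses values read from Tables 1 and 2. The chosen cells are those that reproduce the quoted maxima (slopes<sup>b</sup> in Table 1, slopes<sup>a</sup> in Table 2) and the power rows. The helper name is hypothetical.

```python
def improvement(original: float, optimised: float) -> float:
    """Relative reduction of a lower-is-better property."""
    return (original - optimised) / original


# Table 1 (half adder): largest slope gain (speed run) and power gain (power run)
print(f"{improvement(15.4, 3.14):.1%}")   # 79.6%
print(f"{improvement(5.66, 4.58):.1%}")   # 19.1%
# Table 2 (full adder): largest slope gain and power gain
print(f"{improvement(15.5, 3.51):.1%}")   # 77.4%
print(f"{improvement(7.46, 5.83):.1%}")   # 21.8%
```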

#### Table 2: Results for the full adder

|                          | foundry             | high speed          | low power           |
|--------------------------|---------------------|---------------------|---------------------|
| area [µm<sup>2</sup>]    | 46.9                | 46.9                | 43.6                |
| delays <sup>a</sup> [ns] | 8.08/10.4/8.79/12.2 | 4.72/4.88/6.33/6.55 | 8.06/10.2/8.63/12.2 |
| $delays^{b}$ [ns]        | 8.48/10.7/8.85/12.6 | 4.64/5.05/6.51/6.36 | 8.44/10.5/8.84/11.4 |
| $delays^c$ [ns]          | 8.72/10.9/8.68/12.8 | 4.95/5.51/6.83/6.52 | 8.70/10.5/8.65/12.2 |
| slopes <sup>a</sup> [ns] | 10.2/15.5/10.2/15.5 | 4.87/3.71/4.81/3.51 | 8.81/10.7/7.00/10.6 |
| slopes <sup>b</sup> [ns] | 10.2/15.5/10.1/15.4 | 4.78/3.57/4.79/3.48 | 8.76/10.6/6.96/10.5 |
| slopes <sup>c</sup> [ns] | 10.2/15.5/10.1/15.5 | 4.80/3.60/4.83/3.62 | 8.77/10.5/6.95/10.6 |
| power [pAs]              | 7.46                | 7.23                | 5.83                |

<sup>a</sup> a to co↑ / a to co↓ / a to s↑ / a to s↓

<sup>b</sup> b to co↑ / b to co↓ / b to s↑ / b to s↓

<sup>c</sup> ci to co↑ / ci to co↓ / ci to s↑ / ci to s↓

The results for the full adder circuit are given in table 2. Again, both optimisation runs found a solution at least as good as the one provided by the foundry. The speed optimisation resulted in timing improvements of up to 77% without increasing the power consumption, and the power optimisation reduced the power consumption by up to 22%. Transistor channel width changes in both optimisations are depicted in figure 5. The resulting channel length was 350nm/450nm, and 139817/91550 candidate circuits were evaluated in the speed/power optimisation. The first circuit candidate better than the foundry-provided circuit appeared at the 3556<sup>th</sup>/2286<sup>th</sup> evaluation. Since the input transistor gates became smaller, the input capacitances decreased.

#### Table 3: Results for the D flip-flop without scan inputs

|                            | foundry             | high speed          | low power           |
|----------------------------|---------------------|---------------------|---------------------|
| area [µm<sup>2</sup>]      | 59.1                | 59.1                | 48.5                |
| delays<sup>a</sup> [ns]    | 5.17/6.71/4.80/7.27 | 2.95/2.78/2.67/2.84 | 4.19/5.97/4.76/6.54 |
| delays<sup>b</sup> [ns]    | 4.19/6.03           | 1.82/1.60           | 3.28/4.17           |
| slopes<sup>a</sup> [ns]    | 6.49/9.96/6.41/9.95 | 4.45/2.61/3.03/3.11 | 6.47/7.88/6.38/9.60 |
| slopes<sup>b</sup> [ns]    | 6.41/9.95           | 3.03/2.32           | 6.37/7.77           |
| setup<sup>c</sup> [ns]     | 1.23/0.984          | 0.625/0.983         | 1.03/0.982          |
| hold<sup>c</sup> [ns]      | 0.566/0.560         | 0.324/0.325         | 0.548/0.559         |
| recovery<sup>d</sup> [ns]  | 1.27                | 0.665               | 1.08                |
| width<sup>e</sup> [ns]     | 1.71/1.30/0.811     | 1.54/0.970/0.375    | 1.70/1.30/0.463     |
| power [pAs]                | 6.22                | 5.86                | 4.96                |

<sup>a</sup> c to q↑ / c to q↓ / c to qn↑ / c to qn↓

<sup>b</sup> rn to qn / rn to q

<sup>c</sup> d↑ to c / d↓ to c

<sup>d</sup> rn to c

<sup>e</sup> minimum impulse width of c<sub>high</sub> / c<sub>low</sub> / rn

Similar results were obtained for the D flip-flop circuits (tables 3 and 4). Speed optimisation improved speed by up to 77% without increasing the power consumption in both cases, and power optimisation reduced the power consumption by 20% in both cases. Transistor channel width changes in both optimisations are depicted in figures 6 and 7. For the D flip-flop without scan inputs, the final channel length was 350nm/390nm after 149689/121406 evaluations (speed/power optimisation). For the D flip-flop with scan inputs, the final channel length was 350nm/420nm after 145247/147554 evaluations. The first circuit that performed better than the foundry-provided cell was found at the 3285<sup>th</sup>/3398<sup>th</sup> and 4104<sup>th</sup>/3505<sup>th</sup> evaluation, respectively. Since the input transistor gates are again smaller than in the original cell, the optimisation also decreased the input capacitances.

#### Table 4: Results for the D flip-flop with scan inputs

|                            | foundry                 | high speed              | low power               |
|----------------------------|-------------------------|-------------------------|-------------------------|
| area [µm<sup>2</sup>]      | 73.3                    | 73.3                    | 63.5                    |
| delays<sup>a</sup> [ns]    | 5.17/6.71/4.80/7.27     | 2.97/2.73/2.69/2.91     | 4.31/6.25/4.79/6.57     |
| delays<sup>b</sup> [ns]    | 4.19/6.03               | 1.82/1.55               | 3.24/4.41               |
| slopes<sup>a</sup> [ns]    | 6.49/9.96/6.41/9.95     | 4.56/2.57/3.03/3.33     | 6.46/8.18/5.79/9.65     |
| slopes<sup>b</sup> [ns]    | 6.41/9.95               | 3.03/2.28               | 5.79/8.05               |
| setup<sup>c</sup> [ns]     | 2.01/1.44/1.80/1.46     | 1.01/1.27/1.27/1.20     | 1.71/1.44/1.80/1.45     |
| hold<sup>c</sup> [ns]      | 0.563/0.561/0.566/0.561 | 0.260/0.263/0.258/0.264 | 0.553/0.560/0.555/0.560 |
| recovery<sup>d</sup> [ns]  | 2.13                    | 1.11                    | 1.84                    |
| width<sup>e</sup> [ns]     | 1.70/1.95/8.15          | 1.49/1.23/3.91          | 1.70/1.93/0.621         |
| power [pAs]                | 6.22                    | 5.86                    | 4.96                    |

<sup>a</sup> c to q↑ / c to q↓ / c to qn↑ / c to qn↓

<sup>b</sup> rn to qn / rn to q

<sup>c</sup> d↑ to c / d↓ to c / sd↑ to c / sd↓ to c

<sup>d</sup> rn to c

<sup>e</sup> minimum impulse width of c<sub>high</sub> / c<sub>low</sub> / rn

For all test circuits, only a few thousand evaluations were needed to find the first circuit that performed better than the foundry-provided cell. Most of the 150000 available evaluations were spent on fine-tuning the circuit. At first glance this should be an easy task for a fast local optimisation method /12/. For that reason we used PSADE to provide a useful initial point for the local method. We further speculated that the speed-optimised cell should be a suitable initial guess for the power optimisation and vice versa. Unfortunately, all our attempts to accelerate the optimisation by using a local method failed. The reason lies in the harsh cost function landscape /13/. Three cost function profiles for the D flip-flop with scan inputs are depicted in figure 8. Each profile represents a cross-section of the cost function along one transistor channel width, with all other widths at their foundry-provided values.



Fig. 8: Cost function profiles for three transistor widths

The profiles show that numerical noise is the main cause of the local method's failure. The noise results from the limited numerical accuracy and the non-infinitesimal time-step of the transient analysis. Reducing the time-step makes the numerical noise smaller, but even with a fairly small time-step the cost function landscape still caused problems for local methods. Because local optimisation methods failed on our circuits, we were forced to rely entirely on a robust global method. With a small time-step, the number of calculated points in the transient analysis becomes very high, which leads to long simulations and long optimisation runs. Despite using several processors in parallel, one optimisation run took from one day for the smallest circuit (the half adder) up to a week for the D flip-flop with scan inputs.
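Local methods, whether gradient-based or direct-search methods such as /12/, compare cost values at nearby points. The toy model below shows why numerical noise defeats them. It is a hypothetical illustration, not the paper's cost function: a smooth quadratic profile is viewed through a finite resolution, which mimics the staircase effect of a finite time-step. A forward difference stands in for a small probing step.

```python
def smooth_cost(x: float) -> float:
    """Idealised, noise-free cost profile along one channel width (illustrative)."""
    return (x - 1.0) ** 2


def noisy_cost(x: float, resolution: float = 1e-3) -> float:
    """The same profile seen through a finite resolution (hypothetical noise model)."""
    return round(smooth_cost(x) / resolution) * resolution


def forward_difference(f, x: float, h: float) -> float:
    """Slope estimate from a small probing step, as a local method would use."""
    return (f(x + h) - f(x)) / h
```

At `x = 0.3` with `h = 1e-6`, the smooth profile gives a slope of about -1.4. The noisy profile gives exactly 0.0, so a local method sees a flat landscape and stalls. With larger steps it would instead see spurious jumps.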

#### 6. Summary

Pre-designed foundry-provided digital cells are designed to be general. They are not meant to be altered at transistor level and represent a pool of cells available to the synthesis. However, they can be significantly improved by changing the transistor channel widths and lengths. Using transistor-level optimisation techniques, we obtained cells up to 80% faster at the same power consumption and saved up to 20% of power at the same cell speed. Therefore, instead of using only foundry-provided cells, each cell in a larger digital or mixed-signal circuit could be optimised independently to satisfy its specific demands. The circuit as a whole would then become faster and consume less power. Automating the transistor-level optimisation procedure during synthesis would make the entire digital circuit design process more efficient. The main obstacle at the moment is the time needed for the optimisation: noisy cost functions are the main reason why fast local methods cannot be used to speed it up. Detailed property extraction of the optimised cells and back annotation of the synthesis were not performed, since the extraction and synthesis tools were not available to the authors.

#### References

- /1/ H. Kaeslin. Digital Integrated Circuit Design: From VLSI Architectures to CMOS Fabrication. Cambridge University Press, 2008
- /2/ HSPICE® Simulation and Analysis User Guide. Synopsys®, 2005
- /3/ K.S. Kundert. The Designer's Guide to SPICE and Spectre. Kluwer Academic Publishers, 1995
- /4/ Virtuoso® Spectre® Circuit Simulator User Guide. Cadence Design Systems, Inc., 2008
- /5/ T. Tuma, Á. Bürmen. Circuit Simulation with SPICE OPUS, Theory and Practice. Birkhäuser, 2009
- /6/ D.E. Thomas, P.R. Moorby. The Verilog Hardware Description Language, fifth edition. Kluwer Academic Publishers, 2003
- /7/ V.A. Pedroni. Circuit Design with VHDL. Massachusetts Institute of Technology, 2004
- /8/ P. Kurup, T. Abbasi. Logic Synthesis Using Synopsys®, second edition. Kluwer Academic Publishers, 1997
- /9/ H. Bhatnagar. Advanced ASIC Chip Synthesis Using Synopsys® Design Compiler™, Physical Compiler™ and PrimeTime®, second edition. Kluwer Academic Publishers, 2002
- /10/ Á. Bürmen, D. Strle, F. Bratkovič, J. Puhan, I. Fajfar, T. Tuma. Automated Robust Design and Optimization of Integrated Circuits by Means of Penalty Functions, AEÜ, International Journal

- /11/ J. Olenšek, Á. Bürmen, J. Puhan, T. Tuma. DESA: A New Hybrid Global Optimization Method and Its Application to Analog Integrated Circuit Sizing, Journal of Global Optimization, Volume 44, No. 1, pages: 1-25, 2008
- /12/ R. Hooke, T. Jeeves. Direct Search Solutions of Numerical and Statistical Problems, Journal of the Association for Computing Machinery, Volume 8, No. 2, pages: 212-229, 1961
- /13/ Á. Bürmen, I. Fajfar, T. Tuma. Combined Simplex-Trust-Region Optimization Algorithm for Automated IC Design, Proceedings of ECCTD'07 European Conference on Circuit Theory and Design, pages: 543-546, 2007

Asst. Prof. Dr. Janez Puhan, univ.dipl.ing.el. Faculty of Electrical Engineering, University of Ljubljana Tržaška cesta 25, 1000 Ljubljana tel.: (01) 4768 322, fax: (01) 4264 630 e-mail: janez.puhan@fe.uni-lj.si

Assoc. Prof. Dr. Dušan Raič, univ.dipl.ing.el. Faculty of Electrical Engineering, University of Ljubljana Tržaška cesta 25, 1000 Ljubljana tel.: (01) 4768 324, fax: (01) 4264 630 e-mail: dusan.raic@fe.uni-lj.si

Prof. Dr. Tadej Tuma, univ.dipl.ing.el. Faculty of Electrical Engineering, University of Ljubljana Tržaška cesta 25, 1000 Ljubljana tel.: (01) 4768 329, fax: (01) 4264 630 e-mail: tadej.tuma@fe.uni-lj.si

Prof. Dr. Sašo Tomažič, univ.dipl.ing.el. Faculty of Electrical Engineering, University of Ljubljana Tržaška cesta 25, 1000 Ljubljana tel.: (01) 4768 432, fax: (01) 4264 630 e-mail: saso.tomazic@fe.uni-lj.si

Asst. Prof. Dr. Arpad Burmen, univ.dipl.ing.el. Faculty of Electrical Engineering, University of Ljubljana Tržaška cesta 25, 1000 Ljubljana tel.: (01) 4768 322, fax: (01) 4264 630 e-mail: arpad.buermen@fe.uni-lj.si

## AUTOMATIC TUNING OF ELECTRICAL SMALL ANTENNAS

Maja Atanasijević-Kunc<sup>1</sup>, Vinko Kunc<sup>2</sup>, Maksimilijan Štiglic<sup>3</sup> <sup>1</sup>University of Ljubljana, Faculty of Electrical Engineering, Slovenia <sup>2</sup>IDS d.o.o, Ljubljana, Slovenia <sup>3</sup>austriamicrosystems AG, Austria

Key words: electrically small antennas, antenna tuning, phase measurement

Abstract: The RF frequency band is widely used for applications such as RFID and digital radio communication. RFID systems operating at 13.56MHz are used for access control, public transport and credit cards. A 300kHz band centered around 27MHz, formerly used for citizens band (CB) radio, is now finding new use in digital radio devices, mainly for human interface applications. All these devices use electrically small antennas, since antennas of full electrical size would have dimensions of several meters. Electrically small antennas are in essence LC tanks that project a magnetic field (magnetic dipole) or an electric field (electric dipole) into their surroundings. LC tanks operating as antennas are relatively sensitive to mistuning because of the large dimensions of their coils (in the case of a magnetic dipole). Nearby conducting material changes the inductance of the coil due to the eddy currents generated in it, and material with a different dielectric constant changes the coil's parasitic capacitance. The proposed automatic antenna tuning system solves this problem for low-power and medium-power systems up to 0.5W.

A comparison of the amplitude and phase criteria of resonance is given in section 2. Section 3 describes an implementation of phase measurement that offers sufficient precision for antenna tuning. The actuation part of the automatic tuning is described in section 4. Some practical data obtained by implementing this principle in an RFID reader for 13.56MHz proximity standards are also given.

## Samodejno uglaševanje električno majhnih anten

Key words: electrically small antennas, antenna tuning, phase measurement

Abstract: The RF frequency range is used for various applications, such as RFID systems and systems for digital wireless communication. This range also includes RFID systems with a 13.56MHz carrier frequency, which are used for access control, public transport payment and also credit cards. The 300kHz band around 27MHz, once intended for amateur CB stations, is now often used by systems for proximity wireless digital communication, such as wireless computer mice or sports watches with sensors. All these devices use electrically small antennas, since otherwise the antennas would be far too large. Electrically small antennas are LC resonant circuits that radiate a magnetic (magnetic dipole) or electric (electric dipole) field into their surroundings. Such antennas are sensitive to detuning, since the dimensions of the coils (in the case of a magnetic dipole antenna) are relatively large. If conductive material is brought close to such a coil, its inductance decreases due to eddy currents induced in the conductive material. Material with a different permittivity changes the parasitic capacitance of the coil. The proposed automatic tuning solves these problems for systems with low and medium transmitted power up to 0.5W.

A comparison of phase and amplitude measurement as resonance criteria is given in the second section. The third section describes the implementation of a phase measurement that provides the accuracy needed for automatic antenna tuning. The antenna tuning circuit is described in the fourth section. Some practical aspects of using such a system in a smart card reader operating in the 13.56MHz frequency band are also given.

#### 1. Introduction

Electrically small antennas can be configured as a parallel resonant circuit, a series resonant circuit or a combination of both. Pure parallel resonance is used for receiving antennas, where the high impedance of parallel resonance yields a higher receive voltage. Pure series resonance is best suited for high-power transmit antennas, where the low impedance of series resonance enables high power at a limited supply voltage. A combination of both offers a very versatile solution. It can be used for low- and mid-power transmit systems (from a few mW to 1W) and can provide separate connection points for the receive and transmit sides. Fig. 1 presents an electrically small magnetic dipole antenna employing such a configuration. The transmit power amplifier is connected to the RF\_tran node. The impedance at node RF\_tran is defined by the ratio between the series capacitance Cser and the parallel capacitance, which is composed of the sum of Cpar and the capacitive divider C2/C1. This in turn defines the power the amplifier delivers to the antenna, so the power can be set efficiently by the ratio of the parallel and series capacitances. The receive signal is taken directly from the parallel resonance, where the signal is highest. If the transmitter is not active during reception, the node RF\_out is connected to ground so as not to change the resonant frequency. If the transmitter is active during reception, as in RFID systems, the receive signal level can be adjusted using a capacitive divider (C1/C2 in fig. 1), since the signal amplitude on the coil is typically much higher than the supply voltage of the chip.

#### 2. Basic principle of antenna tuning

It is a well-known fact that the voltage amplitude on the antenna coil connections is highest when the LC tank is in resonance. In resonance, the phase difference between the signal on node RF\_tran, where the transmit signal is applied to the antenna, and the signal on node ant is 90 degrees. This gives two evaluation parameters on which the antenna tuning system can be based. Figure 2 presents the signal amplitude on node ant as a function of frequency for three different Q factors in the first panel, and the phase difference for the same frequency span and Q factors in the second panel. Solid lines are simulation results for Q=10, dashed lines for Q=20 and dash-dot lines for Q=30.

Phase difference is a much better criterion for evaluating antenna tuning because it is absolute rather than relative: with the amplitude, the maximum has to be identified before the amplitude can be used as a tuning criterion, whereas a fixed phase value marks resonance directly. The phase also indicates whether the antenna resonance is too low or too high, while the amplitude does not discriminate between the two sides. In spite of this, practically all existing systems use the amplitude criterion /1, 2/ to determine the antenna resonance, because the amplitude is much easier to measure than the phase. The phase measurement is extremely sensitive to the difference in delays between the two comparison branches, which makes phase-based calibration harder to implement in spite of its obvious advantages as a resonance criterion. Solving these problems is the key to an efficient antenna tuning system.
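The argument can be checked numerically with a minimal sketch. It assumes, as a simplification not taken from the paper, that the tank behaves like a series RLC circuit driven from RF\_tran, with the ant voltage taken across the capacitor and frequency normalised as x = f/f0. In this model the amplitude peak scales with Q, while the phase passes through -90 degrees at resonance for every Q, and whether it lies above or below -90 degrees tells on which side of resonance the drive frequency is. The function names are ours.

```python
import cmath
import math


def tank_response(x: float, q: float) -> complex:
    """Capacitor voltage / drive voltage of a series RLC tank (assumed model).

    x is the normalised frequency f/f0 and q the quality factor:
    H(x) = 1 / (1 - x^2 + j*x/q), so H(1) = -j*q.
    """
    return 1.0 / complex(1.0 - x * x, x / q)


def amplitude_and_phase(x: float, q: float) -> tuple[float, float]:
    """Return |H| and the phase of H in degrees."""
    h = tank_response(x, q)
    return abs(h), math.degrees(cmath.phase(h))


if __name__ == "__main__":
    for q in (10, 20, 30):  # the Q values of the Fig. 2 simulations
        for x in (0.95, 1.0, 1.05):
            amp, ph = amplitude_and_phase(x, q)
            print(f"Q={q:2d} f/f0={x:.2f}  |H|={amp:6.2f}  phase={ph:7.1f} deg")
```

At f/f0 = 1 the amplitude equals Q, so its value alone does not reveal whether the tank is tuned, whereas the phase is -90 degrees for all three Q values. Below resonance the phase stays above -90 degrees and above resonance it falls below, which is exactly the direction information the amplitude lacks.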



Fig. 1: Electrically small antenna with separate connection nodes for receive and transmit.

#### 3. Phase measurement system

The transmit output usually generates a rail-to-rail square-wave signal, since this is the most power-efficient way to deliver power to the antenna. The antenna LC tank shapes it into a sine wave, so the signal entering the receive path is sine-shaped. The input signal must be scaled to fit the chip's supply rails by a capacitive divider; a resistive divider would introduce additional phase shift due to the input capacitance of the chip's input pin. The phase measurement must therefore be done on two signals, one a square wave (the transmitter output signal) and the other a sine wave, both with amplitudes close to the chip supply rails. To use an analog down-conversion mixing stage for phase measurement, we would need to scale both signals to a proper amplitude (~200 mVp) and shift their DC levels accordingly using an internal voltage reference /3/. In case of significant variation in the receive signal amplitude, an automatic amplitude control system would be required /4/. We opted for a simpler digital mixing system. This means both signals have to be square-wave shaped, so the input signal has to be digitized before digital mixing. The digitizing comparator introduces an additional delay of a few nanoseconds, which corrupts the measurement, since 1 ns corresponds to about 4.9 degrees of phase at 13.56 MHz. To compensate the comparator delay over the complete operating supply-voltage and temperature range, we inserted an identical comparator in the second signal branch as well, although the transmit output signal is already square-shaped. The proposed solution is presented in figure 2.



Fig. 2: Dependence of amplitude and phase on frequency



Fig. 2: Compensation of digitizing delay by adding a dummy comparator

Special care was taken to equalize the comparator delay for sine- and square-shaped signals, so that the delays do not differ by more than 0.3 ns. Using the capacitive divider to scale the signal and adding the dummy comparator to equalize the delays in both signal paths enables the use of a digital mixing stage, as presented in fig. 3.
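The sensitivity to comparator delay follows from Δφ = 360° · f · Δt. The small helper below (its name is ours) reproduces the figures discussed in this section: a 1 ns mismatch at 13.56 MHz gives about 4.9 degrees of phase error, and the 0.3 ns residual mismatch reached with the dummy comparator gives about 1.5 degrees.

```python
def delay_to_phase_deg(delay_s: float, freq_hz: float) -> float:
    """Phase error in degrees caused by a timing mismatch between two branches."""
    return 360.0 * freq_hz * delay_s


F_CARRIER = 13.56e6  # Hz, carrier frequency discussed in the text

if __name__ == "__main__":
    for delay in (1e-9, 0.3e-9):
        print(f"{delay * 1e9:.1f} ns -> {delay_to_phase_deg(delay, F_CARRIER):.2f} deg")
```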

For a 90-degree phase difference between the two input signals, the mixer output is a signal at twice the input frequency with a 50% duty cycle. The high-frequency component is removed by a second-order passive low-pass filter, so that only the DC mixing component appears on the phase output. For a 90-degree phase difference, this DC level equals half of the supply voltage.

Fig. 3: Digital mixing circuit with improved PSRR

This means the absolute value of the phase output also depends on the supply level, resulting in extremely poor PSRR. To improve PSRR, the output was made differential, with half of the supply as the reference voltage. PSRR stability over frequency was further improved by inserting the same second-order low-pass filter used in the signal path into the reference-voltage path as well. The result is a high PSRR that is stable over frequency.
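The behaviour of the digital mixer can be illustrated with a minimal sketch. The paper does not name the gate; the sketch assumes an XOR gate, the usual digital phase detector, which matches the description (a 90-degree difference gives a 50% duty cycle at twice the input frequency). Ideal low-pass filtering is modelled by averaging, and the supply values are hypothetical. The single-ended output vdd·duty moves with the supply, while the differential output referenced to vdd/2 is zero at 90 degrees for any supply.

```python
def xor_duty(phase_deg: float, samples_per_period: int = 3600, periods: int = 4) -> float:
    """Duty cycle of XOR(a, b) for two 50 % square waves, b delayed by phase_deg."""
    n = samples_per_period
    half = n // 2
    shift = round(phase_deg / 360.0 * n)  # delay of b in samples (integer, exact)
    total = n * periods
    high = 0
    for k in range(total):
        a = (k % n) < half
        b = ((k - shift) % n) < half
        high += a != b  # XOR of the two logic levels
    return high / total


def phase_outputs(phase_deg: float, vdd: float) -> tuple[float, float]:
    """Return (single-ended DC, differential DC) after ideal low-pass filtering.

    The differential output uses vdd/2 as its reference, as described in the text.
    """
    single_ended = vdd * xor_duty(phase_deg)
    return single_ended, single_ended - vdd / 2.0


if __name__ == "__main__":
    for vdd in (3.0, 3.6):  # hypothetical supply values
        for ph in (80.0, 90.0, 100.0):
            se, diff = phase_outputs(ph, vdd)
            print(f"vdd={vdd} V phase={ph:5.1f} deg  single-ended={se:.3f} V  differential={diff:+.3f} V")
```

Away from 90 degrees the differential output still scales with vdd, but its zero crossing, which is the point that marks resonance, no longer depends on the supply.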

#### 4. Implementation of antenna tuning

Once a reliable resonance criterion such as the phase measurement is available, the antenna can be tuned by changing the inductance or capacitance of the LC tank, using either parallel or series elements. Tuning with series elements requires low-resistance switches so as not to degrade the quality factor of the LC tank, because these switches shunt part of the inductance or series capacitance and all the LC tank current flows through them. The voltage amplitude of the LC tank is typically much higher than the supply voltage of the integrated circuit (10 Vp to 100 Vp for mid-power systems), so shunting even 15% of the inductor or capacitor would require switches that handle voltages far above the supply rail of the chip while also having a low on-resistance. As a rule, high voltage and low resistance are conflicting demands in switch design. For practical realization, parallel elements (parallel capacitors) proved more feasible with a simple CMOS technology. The simplest high-voltage element to implement in a standard CMOS technology flow is a high-voltage NMOS transistor, in which only the drain can be exposed to high voltage. Such transistors can switch the parallel capacitors onto the LC tank, thus changing the effective resonance capacitance. When the switch is off, the drain of the high-voltage transistor cannot swing below the low supply of the integrated circuit, so the trimming capacitor in series with the drain charges to a DC voltage equal to one half of the peak-to-peak amplitude, and the drain is exposed to a maximum voltage equal to the peak-to-peak voltage on the LC tank. When the switch is on, its resistance is not so critical, since only a minor portion of the LC tank current flows through it. The on-resistance can be an order of magnitude higher than in the series trimming topology, which makes the switch much easier to implement.
The simplified schematic of the actual realization is presented in figure 4.
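The voltage stress on an off switch can be illustrated with a short idealised sketch. It assumes that the drain node is clamped at ground by the NMOS drain-body diode (diode drop and leakage are ignored), so the trimming capacitor settles to half of the peak-to-peak tank swing and the drain swings from 0 V up to the full peak-to-peak value. The function name and the 50 Vp amplitude, chosen from the 10 Vp to 100 Vp range given above, are illustrative.

```python
import math


def drain_waveform(v_peak: float, samples: int = 1000) -> list[float]:
    """Idealised drain voltage of an off trimming switch over one period.

    The tank voltage is v_peak*sin(wt). Because the drain cannot go below
    ground, the series capacitor charges to v_peak (half of peak-to-peak),
    which shifts the drain waveform up by that DC level.
    """
    dc_on_capacitor = v_peak  # (2 * v_peak) / 2
    return [v_peak * math.sin(2 * math.pi * k / samples) + dc_on_capacitor
            for k in range(samples)]


if __name__ == "__main__":
    v = drain_waveform(50.0)
    print(f"min={min(v):.1f} V  max={max(v):.1f} V  mean={sum(v) / len(v):.1f} V")
```

The maximum drain voltage is twice the tank amplitude, which is why the switch rating, rather than its on-resistance, becomes the limiting factor.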



Fig. 4: Implementation of trimming using high voltage NMOS transistors.

The automatic calibration loop was implemented using a 4-bit counter controlled by the output of the phase measurement system. When automatic calibration is initiated, the counter is preset to a mid value and the phase measurement is switched on. After a delay required by the settling behavior of the system, the output is evaluated by a window comparator. The result is either resonant frequency too low, resonant frequency too high, or resonant frequency inside the desired limits. If the resonant frequency is too low or too high, the counter controlling the switches is incremented or decremented depending on the phase measurement result. The measurement period is repeated and the counter value changed until the phase measurement yields a result inside the desired limits.
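The loop can be sketched as follows. Everything about the plant is hypothetical: the tank is again modelled as a series RLC with Q = 10, the 4-bit code is assumed to scale the tank capacitance linearly from 0.85 to 1.15 of nominal (matching the ±15% range given in the conclusion), an inductance mismatch factor stands in for detuning, and a ±10 degree window around the resonant phase stands in for the window comparator. Function names and constants are ours, not the authors'.

```python
import cmath
import math

CODE_MIN, CODE_MAX, CODE_MID = 0, 15, 8  # 4-bit counter, preset to a mid value


def phase_error_deg(code: int, l_mismatch: float, q: float = 10.0) -> float:
    """Hypothetical plant: tank phase minus the -90 degree resonance value.

    The switched bank scales the capacitance by 0.85 + 0.02*code and the
    inductance is off by l_mismatch. A negative result means the resonance is
    below the drive frequency, a positive result means it is above.
    """
    c_rel = 0.85 + 0.02 * code
    x = math.sqrt(l_mismatch * c_rel)        # f_drive / f_resonance
    h = 1.0 / complex(1.0 - x * x, x / q)    # series RLC model
    return math.degrees(cmath.phase(h)) + 90.0


def calibrate(measure, window_deg: float = 10.0, max_steps: int = 32) -> int:
    """Window-comparator calibration loop driving an up/down counter."""
    code = CODE_MID
    for _ in range(max_steps):
        err = measure(code)  # in hardware, evaluated after a settling delay
        if abs(err) <= window_deg:
            return code                      # resonance inside the window
        if err < 0 and code > CODE_MIN:
            code -= 1                        # resonance too low: remove capacitance
        elif err > 0 and code < CODE_MAX:
            code += 1                        # resonance too high: add capacitance
        else:
            return code                      # counter saturated: outside trim range
    return code


if __name__ == "__main__":
    for mismatch in (0.9, 1.0, 1.1):
        code = calibrate(lambda c, m=mismatch: phase_error_deg(c, m))
        print(f"L mismatch {mismatch:.2f}: code {code}, error {phase_error_deg(code, mismatch):+.1f} deg")
```

The window must be wider than half the phase step produced by one code change; otherwise no code may satisfy it. The step limit and the saturation branch keep the loop from running indefinitely when the detuning exceeds the trimming range.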

#### 5. Conclusion

In practical applications, a ±15% trimming range of capacitance, resulting in an approximately ±7.5% resonant frequency trimming range, proves to be sufficient. It suffices to compensate the mistuning caused by temperature changes, aging and the proximity of conductive objects. The main advantage of a system equipped with automatic antenna tuning is that there is no need for service intervention over the lifetime of the product, whereas such intervention is often required in other systems due to mistuning of the antenna LC tank.
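Because f0 = 1/(2π√(LC)), a relative capacitance change ΔC/C scales the resonant frequency by 1/√(1 + ΔC/C); to first order this is a relative change of -ΔC/(2C), which is where the ±7.5% figure comes from. A short check (the helper name is ours) shows that the exact values are slightly asymmetric.

```python
import math


def freq_shift(dc_rel: float) -> float:
    """Exact relative resonant-frequency change for a relative capacitance change."""
    return 1.0 / math.sqrt(1.0 + dc_rel) - 1.0


if __name__ == "__main__":
    for dc in (-0.15, 0.15):
        print(f"dC/C={dc:+.2f}: exact df/f={freq_shift(dc):+.4f}, first-order={-dc / 2:+.4f}")
```

Removing 15% of the capacitance raises the resonance by about 8.5%, while adding 15% lowers it by about 6.7%.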

The main limitation of this principle is the maximum voltage the high-voltage NMOS switches can handle, which directly limits the transmit power. For this reason we used differential drivers and a differential antenna configuration, thus doubling the ratio between the antenna peak-to-peak voltage and the maximum voltage the NMOS switches are designed for. Using the differential topology and an antenna coil geometry that minimizes the inductance value, the antenna trimming system can operate with 400 mW reader output power. Applications with higher power are possible, but they need a capacitive divider for the signal connected to the trimming capacitor bank, limiting the peak-to-peak voltage to 30 V. The penalty is a reduced trimming range.

#### References

- /1/ S. Anatol, Reading device for a detection label, EP 0 625 832, European patent, 1998.
- /2/ J. P. Baradin, Reader for a radio frequency identification system having automatic tuning capability, US 6,650,227, US patent, 2003.
- /3/ A. Pleteršek, A compensated band gap voltage reference with Sub-1-V supply voltage, *Analog integr. circuits signal process.*, vol. 44, pp. 5-15, 2005.
- /4/ A. Pleteršek, A. Vodopivec, Postopek za samodejno reguliranje amplitude vhodnih signalov (Method for automatic regulation of the amplitude of input signals), patent no. 22403, granted by decision of 5. 5. 2008; application no. P-200600218, filed 21. 9. 2006. Ljubljana: Slovenian Intellectual Property Office, 2008.

Maja Atanasijević-Kunc University of Ljubljana, Faculty of Electrical Engineering, Slovenia

Vinko Kunc IDS d.o.o., Ljubljana, Slovenia

Maksimilijan Štiglic austriamicrosystems AG, Austria

Prispelo (Arrived): 17.03.2010

Sprejeto (Accepted): 09.09.2010

## OPTIMIZED SELECTION OF MATERIALS AND COMPONENTS FOR POWER MODULE REALIZATION

<sup>1</sup>Jurij Podržaj, <sup>2</sup>Janez Trontelj

### <sup>1</sup>IskraLAB raziskave in razvoj d.o.o., Šempeter pri Gorici, Slovenia <sup>2</sup>University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia

Key words: materials and components, power module, power module performance

Abstract: Most simple electrical traction systems consist of an electrical motor, a power module and an electrical energy source. The power module converts electrical energy supplied by the battery into suitable power signals for driving the electrical motor. It is usually realized with semiconductor power switches (e.g. MOSFET transistors), which can drive electrical loads of up to 10 kW and more. To meet the electrical and mechanical requirements of the power module, its materials and components must be carefully selected, since they directly affect the overall system performance and the price/performance ratio. In this paper we focus on the properties of materials and components and their optimal selection. With the proposed materials and components, some modifications of the technological processes of power module construction were introduced. With the proposed materials, components and a new power module design, the price/performance ratio achieved was 1.6 times higher than that of state-of-the-art power module realizations currently available on the market.

## Selection of optimal materials and components for power module realization

Key words: materials and components, power module, power module efficiency

Abstract: Power modules convert the electrical energy of the battery into appropriate electrical signals for controlling the electric motors built into electric drive systems. The power switches of the modules are most often realized with power semiconductor elements (e.g. MOSFET transistors), which can control electrical loads of up to 10 kW and more. To achieve the electrical and mechanical properties required of power modules, special attention must be paid to the optimal selection of the materials and components used. With the presented realization of the power module and the proposed materials and components, the goal of increasing the price/performance factor is achieved. This article emphasizes the selection of materials with optimal properties and of components for realizing the power module, and also presents the necessary improvements and adaptations of the technological processes used in the realization of the power module itself. With the presented materials, components and new power module design, the price/performance ratio achieved is 1.6 times higher than that of existing solutions on the market.

#### 1. Introduction

The power modules built into electrical traction systems can drive a diverse range of electrical motors or loads. For driving a three-phase electrical motor, the power module is realized as three branches connected in parallel, each branch driving a single phase of the motor. Each branch consists of two independently controlled switches connected in series between the battery terminals, and the midpoint of the two switches is connected to the motor phase terminal. The switches connect the motor phase either to the positive or to the negative battery terminal. The power module switches are usually realized with semiconductor power devices (e.g. MOSFET, IGBT), whose switching is controlled by external logic circuitry, usually driven by an additional DSP. The DSP has a built-in algorithm for driving each individual switch in all three branches, and this algorithm is determined by the application of the electrical traction system.
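The switching rule for one branch can be written down as a small sketch. It is illustrative only: the names and the 48 V battery voltage are hypothetical, and rejecting the state with both switches on (which would short the battery through the branch) is a general half-bridge constraint rather than something specified in the paper.

```python
from typing import Optional


def phase_voltage(high_on: bool, low_on: bool, v_bat: float) -> Optional[float]:
    """Voltage of the motor phase terminal of one branch (ideal switches).

    Returns v_bat when the upper switch connects the phase to the positive
    terminal, 0.0 when the lower switch connects it to the negative terminal,
    and None when both are off (the voltage is then not set by the switches).
    """
    if high_on and low_on:
        raise ValueError("shoot-through: both switches on would short the battery")
    if high_on:
        return v_bat
    if low_on:
        return 0.0
    return None


if __name__ == "__main__":
    # Three branches, one per motor phase, with example switch states.
    states = {"U": (True, False), "V": (False, True), "W": (False, False)}
    for phase, (hi, lo) in states.items():
        print(phase, phase_voltage(hi, lo, 48.0))
```

In a real drive, the DSP algorithm decides the sequence of these states; the sketch only captures the per-branch mapping from switch states to the phase terminal.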

The performance of the electrical traction system is influenced by each of its construction parts. In this article the focus is on the power module's electrical and thermal performance. The power module is connected between the power supply battery and the electrical motor, as shown in figure 1.



Fig. 1: Electrical traction system

Besides the properties of the semiconductor switches, the performance of the power module is also affected by the selection of materials and components for its construction and by its design. To achieve high performance, the construction materials and components must be carefully selected. Some aspects of future design approaches for power module realizations are presented in /1/. The selection of materials and components has to meet the electrical, mechanical and thermal requirements of the power module, and should also aim at a higher price/performance ratio. To make the best use of the proposed materials and components, the design and construction of the power module require some new technological approaches and modifications of existing microelectronics and PCB manufacturing technologies.

#### 2. Selection of materials and components

The design of the power module construction parts and the selection of their materials should aim at a final realization that achieves high performance at reasonable cost. The requirements can be divided into several categories, each of which must be met and taken into account during the power module's design.

#### 2.1 Electrical requirements

The electrical design of the power module should keep parasitic influences as low as possible. Here the designer has to consider two major types of parasitic influences: parasitic contact resistances and parasitic inductances. Both have a direct effect on the power module's performance, and both are directly related to the materials used in the construction parts of the power module. Another important source of parasitic influences is the geometry of the construction parts and the power module's design, which will not be discussed in detail here. More detailed power module design considerations and measurement results are presented in previous works /2/, /5/, /6/ and /7/.

The designer's first concern is to minimize the parasitic contact resistances, which affect the power module's efficiency and, consequently, its thermal performance. Special attention must be paid to optimizing the contact resistances between the power MOSFET transistor terminals and a busbar, and between busbars of equal voltage potential /2/, /7/. The choice of busbar material plays an essential role in providing low contact resistances between busbars, and the busbar geometry must be carefully designed as well. The contact surfaces of the busbars should be designed so that sufficiently high mechanical pressure can be applied between the two contact surfaces of connected busbars. Figure 2 shows an example of how two construction parts can be mechanically fastened. The way contact is ensured by mechanically fastening two construction parts has an important influence on the electrical and thermal performance of the power module.



Fig. 2: Example of mechanical fastening of two construction parts

The second group of parasitic influences that directly affect the power module's performance are the parasitic inductances, which originate in the power module's construction and in the materials used in its construction parts. Parasitic inductances affect the efficiency of the power module during switching transitions of high currents. Equation (1) gives a rough estimate of the induced voltage when 400 A is switched in 400 ns with a parasitic inductance of 10 nH.

$$u_L = L \frac{di_L}{dt} = 10 \cdot 10^{-9} H \frac{400A}{400 \cdot 10^{-9} s} = 10V$$
(1)

The induced voltage of 10 V can seriously affect the semiconductor power MOSFET transistors and the performance of other electronic components. When designing the power module, the designer should also consider the EMI legislation for the specified power application. More detailed studies of the power module construction are presented in /3/, /4/, /7/.
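Equation (1) is u_L = L·ΔI/Δt for a linear current ramp. The small helper below (its name is ours) reproduces the 10 V figure and shows how the induced voltage scales with the parasitic inductance and the switching time; the 5 nH and 800 ns cases are hypothetical variations.

```python
def induced_voltage(l_h: float, delta_i_a: float, delta_t_s: float) -> float:
    """Voltage across a parasitic inductance for a linear current ramp, u = L*dI/dt."""
    return l_h * delta_i_a / delta_t_s


if __name__ == "__main__":
    cases = [(10e-9, 400.0, 400e-9), (5e-9, 400.0, 400e-9), (10e-9, 400.0, 800e-9)]
    for l, di, dt in cases:
        print(f"L={l * 1e9:.0f} nH, dI={di:.0f} A, dt={dt * 1e9:.0f} ns -> {induced_voltage(l, di, dt):.1f} V")
```

Halving the inductance or doubling the switching time both halve the induced voltage. Slower switching, however, lengthens the transitions and their losses, so reducing the parasitic inductance through construction and materials is the more attractive lever.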

#### 2.2 Thermal requirements

To ensure stable operation and an adequate lifetime of the power module, the heat generated within it must be kept to a minimum and controlled so that the maximum allowable operating temperature is not exceeded. The power losses within the power module generally manifest as higher working temperatures /6/. At higher temperatures the losses in the power MOSFET transistors and busbars increase in turn, which degrades the power module's performance. To prevent this degradation, the power module should be designed so that the heat generated by the losses is efficiently transferred to the cooling body.

The power modules are usually mounted on the vehicle and are therefore exposed to several ambient influences (e.g. temperature changes, vibration, corrosion) during their lifetime. With an improper selection of materials and components, the mechanical stress can lead to lower performance or, eventually, to failure of the power module. The selection of materials and components should aim both at satisfying the required properties of the power module and at a higher price/performance ratio.

#### 2.3 Mechanical requirements

Electrical traction systems installed in vehicles are usually exposed both to the ambient conditions and to mechanical stress. Besides temperature and humidity changes, the power module also has to withstand stress caused by vibrations. In this article we do not examine in detail the mechanical/environmental requirements of the power module, which differ in each specific application.

#### 3. Properties of materials and components used in power modules

The power module switches are realized as semiconductor devices connected in parallel, which are capable of controlled switching of high current densities. The performance of each switch depends on two main factors: the quality of the electrical connections of the semiconductor power device and, equally important, the way in which the heat dissipated by the device is transferred to the attached heat sink. The most common approach is to mount the semiconductor power device on a PCB. For lower-power applications, commercially available standard PCB substrates (e.g. FR4) are usually used. Middle-power applications require more sophisticated substrates with better thermal performance, such as IMS /8/, /9/. High-power applications require high-quality substrates with excellent electrical and thermal properties; here a DBC (Direct Bonded Copper) substrate is often used, mainly because of its high thermal conductivity /10/. Figure 3 presents the DBC and IMS substrate structures together with their corresponding cross-sections.



Fig. 3: DBC and IMS substrate structure

The DBC material is a "sandwich" structure where a ceramic substrate is placed between two copper layers. The detailed structure of power module construction with DBC substrate is presented in /6/. Because of the DBC substrate's main advantages in comparison to other substrates, such as low thermal resistivity, low thermal expansion, high voltage isolation and high working temperatures, the DBC substrate is widely used in power applications.

The main drawbacks of using the DBC substrate in power module realizations are the limited options for mechanically fixing the DBC substrate to the heat sink, the relatively thin circuit layer, substrate processing and price. To ensure good thermal contact between the DBC substrate and the heat sink, sufficient mechanical surface pressure must be established over the entire substrate area, which in some cases cannot be easily achieved. These fixing limitations stem from the mechanical properties of the ceramic layer (e.g. $Al_2O_3$). In applications where high current densities are conducted, the DBC substrate is limited by the thickness of its circuit layer, which is manufactured only up to 300 µm.

The IMS substrate consists of three layers: a conductive layer, an insulation layer and a base plate. The top conductive layer is usually a thin copper layer used for the electrical circuit. IMS substrates with more than one circuit layer are also available on the market, which enables the realization of multilayer and more complex PCBs; on the other hand, their thermal performance is reduced by the one or more additional insulation layers. The insulation layer between the conductive layer and the base plate provides electrical insulation. The base plate is usually made of a highly thermally conductive metal (e.g. copper, aluminium) and serves two main purposes: it provides good thermal contact with the heat sink and solid physical support for the busbars or any other electromechanical parts. Table 1 presents the properties of DBC and IMS substrates as well as of the standard PCB material FR4.

Table 1: Properties of DBC, IMS and FR4 substrate

| Properties           | DBC       | IMS       | FR4       |
|----------------------|-----------|-----------|-----------|
| electrical           | good      | excellent | excellent |
| thermal              | excellent | good      | poor      |
| mechanical – fixture | good      | excellent | good      |
| price                | high      | medium    | low       |

As Table 1 indicates, the DBC substrate is the best substrate regarding thermal properties. Its drawbacks compared to IMS lie in its electrical and mechanical properties and in its price. The FR4 substrate has excellent electrical properties and comes at an acceptable price, but its thermal properties make it unsuitable for high-performance power applications, where good heat transfer to the heat sink is required.
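The thermal row of Table 1 can be made plausible with a one-dimensional conduction estimate, R = t/(k·A), for the insulating layer of each substrate. The thicknesses and thermal conductivities below are assumed typical values, not data from the paper, and the estimate ignores heat spreading, solder and interface layers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    thickness_m: float
    conductivity_w_mk: float  # thermal conductivity, W/(m*K)


def thermal_resistance(layer: Layer, area_m2: float) -> float:
    """1-D conduction resistance R = t / (k * A) in K/W (no heat spreading)."""
    return layer.thickness_m / (layer.conductivity_w_mk * area_m2)


# Assumed typical values for the insulating layer of each substrate (not from the paper).
LAYERS = [
    Layer("DBC (Al2O3 ceramic)", 0.38e-3, 24.0),
    Layer("IMS (dielectric)", 0.10e-3, 2.0),
    Layer("FR4 (laminate)", 1.6e-3, 0.3),
]

if __name__ == "__main__":
    area = 1e-4  # 1 cm^2 footprint
    for layer in LAYERS:
        print(f"{layer.name:22s} R_th = {thermal_resistance(layer, area):7.2f} K/W")
```

Under these assumptions the ordering matches Table 1: the ceramic layer gives the lowest resistance, the thin IMS dielectric a moderately higher one, and FR4 a value about two orders of magnitude higher.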

#### 4. Power module construction

With the selection of proper materials and components and with the proposed power module construction /4/, the performance/cost ratio is improved compared to prior-art solutions. In the prior-art solutions, the semiconductor power transistors are soldered or sintered directly onto the conductive circuit layer of the substrate. The main limitation of this solution is the limited thickness of the circuit layer, which is determined by the manufacturing process of the DBC substrate. Another limitation concerns the mechanical fixing options for assuring good thermal contact with the heat sink.

To conduct higher current densities, the conductive circuit layer has to be modified. This can be achieved with two different approaches. The first approach is to modify the circuit layer itself: either replace its material with one of higher specific electrical conductivity (e.g. silver instead of copper) or increase its thickness. Currently available conductive layer thicknesses on the market range up to approx. 350 µm. The second approach is to add a layer of highly conductive material (such as busbars) on top of the circuit layer. Such a solution is shown in figure 4 below.



Fig. 4: The proposed structure with IMS and busbars

The material used for a busbar should have high electrical and high thermal conductivity; the most suitable material is copper. The main advantage of a custom busbar, compared to the copper layer of a DBC substrate, is its lower resistance. Consequently, a thicker conductive layer, realized by a busbar, contributes to lower parasitic resistance and lower parasitic inductances of the power module. The internal power module connections (connections of the power devices and connections to external terminals) also have an important impact on the overall performance of the power module. The busbar also makes it possible to connect the power semiconductor device terminals directly to the power module output terminal (e.g. to a motor phase cable), which is difficult to achieve with the circuit layer alone. In the proposed realization of the power module, the metal busbars are placed on the highly thermally conductive insulated metal substrate (IMS). The electrical and thermal contact between the busbars and the IMS circuit layer is realized with a layer of solder, which has excellent electrical and thermal properties.
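The benefit of a thicker conductor can be estimated with R = ρl/(wt) and the conduction loss P = I²R. The sketch uses the resistivity of copper (about 1.68·10⁻⁸ Ω·m at room temperature); the 300 µm thickness corresponds to the DBC circuit-layer limit mentioned above, while the 2 mm busbar thickness, the 50 mm × 10 mm conductor and the 250 A current are illustrative assumptions.

```python
RHO_CU = 1.68e-8  # resistivity of copper at about 20 degC, ohm*m


def conductor_resistance(length_m: float, width_m: float, thickness_m: float,
                         rho: float = RHO_CU) -> float:
    """DC resistance of a rectangular conductor, R = rho * l / (w * t)."""
    return rho * length_m / (width_m * thickness_m)


if __name__ == "__main__":
    current = 250.0                 # A, illustrative
    length, width = 50e-3, 10e-3    # hypothetical conductor geometry
    for label, t in (("300 um circuit layer", 300e-6), ("2 mm busbar", 2e-3)):
        r = conductor_resistance(length, width, t)
        print(f"{label:20s} R = {r * 1e3:.3f} mOhm, P = {current**2 * r:.2f} W")
```

Resistance scales inversely with thickness, so in this example the 2 mm busbar has about 6.7 times less resistance and conduction loss than the 300 µm layer. The temperature coefficient of copper and any AC effects are ignored.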

#### 5. Thermal power module performance

Besides the construction of the power module, an improper selection of its materials and components also shows in its thermal performance. To qualify the optimal materials and components proposed for the power module, thermal measurements of the realized power modules were performed in a series of thermal tests on three different power modules. Besides its construction, each power module differed in the number of semiconductor power devices used; in our case these were MOSFET transistors. The test conditions were the same for all three power modules: a duration of 1 hour and an AC motor phase current of 250 A. Figure 5 shows the temperature behavior of each individual DUT.





The temperature curves in figure 5 indicate that after 1 hour of operation the power module with the lowest temperature was the one with the highest number of MOSFETs. The power module proposed in this article uses 5 MOSFET transistors per individual switch. After 1 hour of testing it reached the highest temperature, 94.6 °C, which is 6.8 °C higher than the power module with 9 MOSFETs and 3.5 °C higher than the power module with 8 MOSFETs. The performance ratio between the proposed power module and the power module with 9 MOSFETs can be deduced from these results and is approx. 1.6.

#### 6. New construction approaches – assembly technologies

Besides the use of advanced materials and components, the construction of high-performance power modules also requires new and advanced assembly technologies. Special attention was paid to the electrical and thermal connections of the semiconductor power devices. Electrically, special focus should be paid to the lengths of the current paths and to the optimization of parasitic inductances. Each semiconductor power device, in die form, is connected with bonding wires on its top side and soldered directly to the busbar on its bottom side, as shown in figure 6 below.



Fig. 6: Power module cross-section

To achieve excellent thermal contact between the parts of the power module, the semiconductor power dies are soldered directly to the busbar, and the busbar directly to the IMS substrate. This approach ensures excellent electrical and thermal properties of the assembly /4/, /7/.

#### 7. Conclusion

The article focuses on the optimized selection of materials and components for the realization of a high performance power module. The use of proposed materials and components, together with the proposed power module structure, fulfills the goal of achieving the power module realization with a smaller number of semiconductor power devices. The presented electrical and thermal measurements indicate that, with the use of proposed materials, components and power module's design, a 1.6 times better performance was achieved, compared to similar power modules available on the market.

#### 8. Acknowledgements

The author wishes to thank IskraLAB d.o.o. and Iskra Avtoelektrika d.d. for their support, as well as the personnel of the Laboratory for Microelectronics at the University of Ljubljana.

The author also wishes to express gratitude to the European Social Fund which partially finances this operation.

#### 9. References

- /1/ J. D. van Wyk, F. C. Lee, D. Boroyevich, L. Zhenxian, Y. Kaiwei: A future approach to integration in power electronics systems, IEEE Industrial Electronics Society – IECON '03, vol. 1, pp. 1008-1019, 2004;
- /2/ J. Podržaj, J. Trontelj: Design consideration for power modules of electro-motor drives; Inf. MIDEM, Sep. 2007, Vol. 37, No. 3, pp. 142-145;
- /3/ U. Nicolai, T. Reimann, J. Petzoldt, J. Lutz: Application Manual Power Modules, Verlag ISLE, Ilmenau, Germany, 2000;
- /4/ J. Trontelj, J. Podržaj: Power switching module: PCT/SI2008/000006, application date: 25. 1. 2008. Ljubljana: Slovenian Intellectual Property Office, Patent Office, 2008;
- /5/ J. Podržaj, G. Babič, J. Trontelj: Dynamic switching behaviour of power drive module for 3 phase electrical motor; 44th International Conference on Microelectronics, Devices and Materials and the Workshop on Advanced Plasma Technologies, Sep. 17-19, 2008; Proceedings, Ljubljana: MIDEM - Society for Microelectronics, Electronic Components and Materials, 2008, pp. 271-274;
- /6/ J. Podržaj, J. Trontelj: Power module packaging for thermal and electrical performance optimization; 45th International Conference on Microelectronics, Devices and Materials and the Workshop on Advanced Photovoltaic Devices and Technologies, Sep. 9-11, 2009; Proceedings, Ljubljana: MIDEM - Society for Microelectronics, Electronic Components and Materials, 2009, pp. 147-151;
- /7/ J. Podržaj: Zasnova integriranega krmilnika in močnostne stopnje trifaznega elektromotorja z baterijskim napajanjem = Design of integrated driver and power module for AC motor with battery supply: Thesis. Ljubljana, 2009;
- /8/ M. Correven, J. Nagashima, R. Apter: Power modules with IMS substrates for automotive applications, IEEE Vehicular Technology Conference 2002, vol. 4, pp. 2056-2062, 2002;
- /9/ X. Jorda, X. Perpina, M. Vellvehi, J. Millan, A. Ferriz: Thermal characterization of insulated Metal Substrates with a power test chip, Power Semiconductor Devices & ICs 2009 – ISPSD 2009, pp. 172-175, 2009;
- /10/ J. Schulz-Harder: Advanced DBC (direct bonded copper) substrates for high power and high voltage electronics, IEEE Semiconductor Thermal Measurement and Management Symposium, pp. 230-231, 2006;

Dr. Jurij Podržaj IskraLAB raziskave in razvoj d.o.o. Polje 15, SI-5290 Šempeter pri Gorici, Slovenia Phone: + 386 (0) 1 4768 340 Email: jure.podrzaj@iskralab.si

Prof. Dr. Janez Trontelj University of Ljubljana, Faculty of Electrical Engineering Tržaška c. 25, SI-1000 Ljubljana, Slovenia Phone: + 386 (0) 1 4768 335 Email: janez.trontelj1@guest.arnes.si

## DESIGN OF PRECISE AND LONG-TERM ACCURATE TEMPERATURE REGULATION USING FEATURES OF A LOW-POWER MICROCONTROLLER

Marjan Jenko

### Laboratory for Digital Systems and Electrical Engineering, Faculty of Mechanical Engineering, University of Ljubljana, Ljubljana, Slovenia

Key words: ageing effects, component-built software, fuzzy logic, long-term measurement accuracy, low-power microcontroller, pasteurized soft-boiled eggs, precise temperature measurement, precise temperature regulation, ratiometric measurement

**Abstract:** Low-power microcontrollers were primarily developed for applications with weak energy sources such as batteries and solar cells. The contribution of this paper is the design of precise and long-term accurate temperature regulation with the help of ratiometric temperature measurement, fuzzy logic and features of a low-power microcontroller. Low-power mode is used for measurement to minimize noise. Based on fuzzy logic, a code-optimized software component for precise temperature regulation was developed. The resulting regulated temperature is within ±0.25 °C of the required value at the reference spot of a water-filled volume of ten liters. Temperature regulation is designed to remain accurate for a period of fifteen years, which is the lifetime of the apparatus in the case study.

The two components – for precise and long-term accurate temperature measurement, and for precise temperature regulation – are self-contained. As such, they can be used in any application with stringent requirements for precision and long-term accuracy.

## Design of Precise and Long-term Accurate Temperature Regulation Using Features of a Low-power Microcontroller (Slovenian abstract, in translation)

Key words: circuit ageing, component-based software construction, fuzzy logic, long-term measurement accuracy, low-power microcontroller, pasteurized soft-boiled eggs, precise temperature measurement, precise temperature regulation, ratiometric measurement

**Abstract:** Microcontrollers designed for extremely low energy consumption are usually used in devices with battery or otherwise weak power supplies (solar energy, motion, etc.). In this paper, the main task of the low-power microcontroller is precise control of a thermal process. The low-power state, in which most of the microcontroller is switched off, is used to maximize the signal-to-noise ratio when performing precise temperature measurement. For the temperature regulation itself, a code-optimized algorithm for precise temperature control was developed on the basis of fuzzy logic. Regulation within ±0.25 °C is achieved at the reference spot for a volume of ten liters. The required operating life of the presented apparatus is fifteen years, without intermediate calibrations.

The developed electronic building blocks, precise and long-term accurate temperature measurement and a precise temperature regulator, are self-contained components that can also be used to control other thermal processes with requirements for precision and long-term accuracy.

#### 1 Introduction

The foundation for the contribution of this paper is the development of an autonomous thermal process for industrial production of soft-boiled eggs, which is a novelty on the market. Requirements in industrial food preparation are a superset of domestic requirements. Besides taste and appearance, the food needs to systematically adhere to microbiological constraints: it must be pasteurized, or potential bacteria need to be destroyed by some other means. The industrial food preparation process must be time-invariant, and traceability needs to be built in according to HACCP (Hazard Analysis and Critical Control Points) directives.

Pasteurization and cooking of soft-boiled eggs have contradictory requirements. For the former, salmonellae, if present in the center of the egg's mass, need to be definitively destroyed. For the latter, the yolk is to remain soft, i.e. coagulation must not take place. Experiments, along with thermal simulations and microbiological tests, led to a patented thermal profile /1, 2/, where both pasteurization and cooking of soft-boiled eggs in fact coexist. The profile is shown in Figure 1.



#### Fig. 1: Required temperature profile

The required precision of temperature regulation at a reference spot is ±0.25 °C for the 15-year lifetime of the apparatus. When regulating within ±0.25 °C, the measurement error must stay within ±0.1 °C. It is commonly understood that the professional kitchen appliance market does not tolerate much servicing during the lifetime of a product. Calibration of the temperature measurement circuit every few years of operation is thus out of the question.

The problem of implementing the required temperature function divides into precise measurement and precise regulation. Both are developed as self-sustained components. As such, they are ready for integration into other embedded systems with similar functional requirements.

#### 2 Technical constraints



#### Fig. 2: Scheme of internals of the apparatus

The scheme of the internals of the egg cooker is shown in Figure 2. It is intentionally designed as the simplest solution for the required functionality: a heater heats the water and the eggs immersed in the inner tank; a hot water pump circulates the water to minimize the temperature gradient. A cold water tank encircles the hot water tank. The temperature of the hot water is regulated by two mechanisms, one being heating with an electric heater and the other cooling by mixing hot and cold water. An electromagnetic three-way valve controls water circulation or mixing when needed for cooling the hot water. The cold water is cooled down in a radiator with the help of a cold-water pump and a fan. Volumes of the cold water tank and radiator, and the fan power are designed for the cooling needs of consecutive iterations of the thermal process in Figure 1. The precision of the required temperature regulation depends on temperature measurement at the reference spot and on the temperature regulator, which supplies power for heating and cooling.

## 2.1. Component-built software for the required functionality

The primary functionality of the embedded system is temperature regulation and measurement. Other tasks are interaction with the user, governing proper water levels in both tanks, cooling the water in the outer tank, pumping water out of the system when cleaning the apparatus and communication with the diagnostic program on a personal computer – for assembly tests, initial calibration and field diagnostics. Embedded software is structured into functional components by the concept developed in /3/. The structure of the component-built embedded software is shown in Figure 3.



## Fig. 3: Structuring the application into functional components

The functional components are self-sufficient and reusable. As such, they can be individually tested and verified.

The component approach to embedded software design minimizes, to the greatest extent possible, the probability of introducing functional errors into the application.

#### 3 Component for precise and long-term accurate temperature measurement

Measured analog variables, such as temperature, are usually converted to voltage, which is sampled by an analog-to-digital converter. Long-term measurement accuracy can be achieved by periodic calibration, by systematic elimination of those components that are most sensitive to age-induced drift, or by both, i.e. by design and periodic calibration. Implementations of ratiometric measurement have the potential to be more robust to time-induced drift than implementations of absolute measurement /4/. The required precision for temperature measurement is ±0.1 °C for the lifetime of the apparatus, which is declared as fifteen years.

Implementation of the scheme of ratiometric temperature measurement, by /4/, is shown in Figure 4.



Fig. 4: Scheme for a ratiometric temperature measurement, by /4/

R(T) in Figure 4 is a platinum temperature sensor whose resistance is a function of temperature, so measuring temperature reduces to measuring resistance. The resistance is obtained by comparing the capacitor discharge times through two reference resistors and through the temperature-dependent resistor, by (1) /4/.

$$R(T) = \frac{t_{R(T)} - t_{Rref1}}{t_{Rref2} - t_{Rref1}} \left(R_{ref2} - R_{ref1}\right) + R_{ref1} \tag{1}$$

where $t_{Rx}$ in (1) is the time to discharge the capacitor C1 from $V_{co}$ to $V_c$ via the resistor Rx. Temperature measurement by equation (1) is not sensitive to the values of the voltages $V_{co}$ and $V_c$, the capacitance of C1, the offset voltage of the comparator, or the resistances of the analog switches S1 to S4 in Figure 4.

R1 is used to charge capacitor C1 to $V_{co}$. The capacitor is then discharged to $V_c$ via $R_{ref1}$, $R(T)$ and $R_{ref2}$ in turn.
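A short derivation shows why equation (1) does not depend on $V_{co}$, $V_c$, $C_1$, the comparator offset or the switch resistances. It is a sketch under stated assumptions, not taken from /4/: the discharge is modeled as an ideal RC decay towards 0 V, every discharge path is assumed to see the same switch on-resistance $R_s$, and comparator and timer latency are lumped into a constant delay $t_0$; $R_s$, $t_0$ and $k$ are symbols introduced here for illustration. A comparator offset only shifts the effective threshold $V_c$, so it is absorbed into $k$.

$$
\begin{aligned}
t_{Rx} &= (R_x + R_s)\, C_1 \ln\frac{V_{co}}{V_c} + t_0 = k\,(R_x + R_s) + t_0,
  \qquad k = C_1 \ln\frac{V_{co}}{V_c} \\
t_{R(T)} - t_{Rref1} &= k\,\bigl(R(T) - R_{ref1}\bigr), \qquad
t_{Rref2} - t_{Rref1} = k\,\bigl(R_{ref2} - R_{ref1}\bigr) \\
\frac{t_{R(T)} - t_{Rref1}}{t_{Rref2} - t_{Rref1}} &= \frac{R(T) - R_{ref1}}{R_{ref2} - R_{ref1}}
\end{aligned}
$$

Solving the last line for $R(T)$ gives (1). The factor $k$, and with it $C_1$, $V_{co}$, $V_c$ and the comparator offset, cancels in the ratio, while $R_s$ and $t_0$ cancel in the differences. Under this model, long-term accuracy rests on the stability of $R_{ref1}$ and $R_{ref2}$ and on the other quantities staying constant over the three discharges of one measurement.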

## 4 Component for precise temperature regulation

First, on/off regulation was evaluated, not as a design option but rather to gain insight into temperature delays. The existence of time delays is a significant problem for control of any process. In this application, time delays between heating and cooling differ significantly, since different amounts of power and different mechanisms are used for heating and cooling. As expected for on/off temperature regulation, large temperature swings (up to 6 °C) occurred, and thus simple on/off regulation cannot meet the requirements.

PID regulation gives excellent results for systems with a symmetrical response to heating and cooling; the egg cooker, however, has asymmetrical responses. The physical mechanisms for heating and cooling are different. The course of cooling depends on the temperature difference between the hot and cold water, which changes during the process. The available power and the time delays also differ between heating up and cooling down. Our understanding is that thorough mathematical modeling of this cooking process is a complex activity.

After some days of manually controlling the thermal process, we were able to describe how to control it successfully. Our manual reactions to temperature readouts were, however, generally too slow in the constant-temperature region, where precision is needed. The results of these manual regulation experiments gave us the confidence to proceed with fuzzy logic.



*Fig. 5: Membership functions for inputs and outputs of the temperature regulator* 

Fuzzy-type regulation has previously been used in food processing operations that are less sensitive to precision: peanut roasting /5/, microfiltration /6/, frying /7/ and baking /8/. Fuzzy controllers are conceptually simple. They consist of a) an input stage, b) a processing stage and c) an output stage. In a sense, they mimic the human approach to regulation: if the process variable is somewhat off the required value, a small corrective action is taken. For effective regulation, many rules are considered at once, and terms such as "somewhat" and "small" are defined in an exact manner, i.e. in mathematical language. This is done by replacing crisp variables with membership functions, which assign each value a degree of membership between zero and one. Figure 5 shows the relevant input and output membership functions for the temperature regulation.

The input stage of the fuzzy controller maps measured temperature to the input membership functions of the Temperature Error (TE, difference between measured and required temperature) and of the Temperature Error Gradient (TEG). The advantage of having more membership functions that overlap at many input values (TE and TEG) is that more membership functions can influence the regulation at the same time. This gives more experimental space when developing and fine-tuning the temperature control.
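The input stage can be sketched in C as follows. The sketch assumes triangular membership functions with shoulder sets at both ends; the break points are hypothetical placeholders, since the shapes in Figure 5 are not given numerically, and only the set names follow Table 1. Each input value is mapped to a degree of membership in every set, so overlapping sets yield several non-zero degrees at once.

```c
#include <assert.h>
#include <stddef.h>

/* Triangular membership function with corners a <= b <= c.
   a == b gives a left shoulder (degree 1 for all x <= b),
   b == c gives a right shoulder (degree 1 for all x >= b). */
typedef struct {
    const char *name;
    double a, b, c;
} tri_mf;

/* Degree of membership of x in one fuzzy set, in [0, 1]. */
static double tri_degree(const tri_mf *mf, double x)
{
    assert(mf->a <= mf->b && mf->b <= mf->c && mf->a < mf->c);
    if (x <= mf->a)
        return (mf->a == mf->b) ? 1.0 : 0.0;
    if (x >= mf->c)
        return (mf->b == mf->c) ? 1.0 : 0.0;
    if (x <= mf->b)
        return (x - mf->a) / (mf->b - mf->a);
    return (mf->c - x) / (mf->c - mf->b);
}

/* Hypothetical temperature-error sets in deg C (TE = measured - required). */
#define TE_SETS 6
static const tri_mf te_sets[TE_SETS] = {
    { "very low",    -2.00, -2.00, -1.00 },
    { "low",         -2.00, -1.00, -0.25 },
    { "normal low",  -1.00, -0.25,  0.25 },
    { "normal high", -0.25,  0.25,  1.00 },
    { "high",         0.25,  1.00,  2.00 },
    { "very high",    1.00,  2.00,  2.00 },
};

/* Input stage: degree of membership of x in each of the n sets. */
static void fuzzify(const tri_mf *sets, size_t n, double x, double *degrees)
{
    for (size_t i = 0; i < n; ++i)
        degrees[i] = tri_degree(&sets[i], x);
}
```

With these placeholder sets, neighbouring degrees always sum to one, e.g. TE = 0 °C belongs to "normal low" and "normal high" at 0.5 each. The gradient TEG would be fuzzified the same way with its own table of five sets, from "very low" to "very high".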

The processing stage is defined by the behavioral rules in Table 1.

The output stage converts the combined result into the output values for process control, i.e. heater power and valve opening. Figure 6 shows the required temperature function /1, 2/ and the results of fuzzy temperature control after fine-tuning iterations.

Table 1: Behavioral rules of the processing stage

| Rule number | Temperature error (TE) | Temperature error gradient (TEG) | Heater power | Valve opening  |
|-------------|------------------------|----------------------------------|--------------|----------------|
| 1           | very low               |                                  | full         | closed         |
| 2           | low                    |                                  | modest       | closed         |
| 3           | normal low             | very low                         | full         | closed         |
| 4           | normal low             | low                              | modest       | closed         |
| 5           | normal low             | neutral                          | low          | closed         |
| 6           | normal low             | high                             | minimum      | closed         |
| 7           | normal low             | very high                        | zero         | closed         |
| 8           | normal high            | very low                         | low          | closed         |
| 9           | normal high            | low                              | minimum      | closed         |
| 10          | normal high            | neutral                          | zero         | closed         |
| 11          | normal high            | high                             | zero         | closed         |
| 12          | normal high            | very high                        | zero         | partially open |
| 13          | high                   |                                  | zero         | partially open |
| 14          | very high              |                                  | zero         | open           |
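Table 1 maps naturally onto a constant array. The sketch below encodes the fourteen rules as written and computes each rule's firing strength; it assumes the common min operator for the AND between the two inputs, which the text does not specify, and treats an empty gradient cell as a rule that ignores the gradient. The enum, type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Linguistic terms named as in Table 1 (hypothetical identifiers). */
enum te_term  { TE_VERY_LOW, TE_LOW, TE_NORMAL_LOW, TE_NORMAL_HIGH,
                TE_HIGH, TE_VERY_HIGH, TE_COUNT };
enum teg_term { TEG_ANY = -1, TEG_VERY_LOW, TEG_LOW, TEG_NEUTRAL,
                TEG_HIGH, TEG_VERY_HIGH, TEG_COUNT };
enum heater_term { H_ZERO, H_MINIMUM, H_LOW, H_MODEST, H_FULL };
enum valve_term  { V_CLOSED, V_PARTIAL, V_OPEN };

typedef struct { int te, teg, heater, valve; } fuzzy_rule;

/* Table 1, rules 1 to 14; TEG_ANY marks an empty gradient cell. */
static const fuzzy_rule rules[] = {
    { TE_VERY_LOW,    TEG_ANY,       H_FULL,    V_CLOSED  },  /* 1  */
    { TE_LOW,         TEG_ANY,       H_MODEST,  V_CLOSED  },  /* 2  */
    { TE_NORMAL_LOW,  TEG_VERY_LOW,  H_FULL,    V_CLOSED  },  /* 3  */
    { TE_NORMAL_LOW,  TEG_LOW,       H_MODEST,  V_CLOSED  },  /* 4  */
    { TE_NORMAL_LOW,  TEG_NEUTRAL,   H_LOW,     V_CLOSED  },  /* 5  */
    { TE_NORMAL_LOW,  TEG_HIGH,      H_MINIMUM, V_CLOSED  },  /* 6  */
    { TE_NORMAL_LOW,  TEG_VERY_HIGH, H_ZERO,    V_CLOSED  },  /* 7  */
    { TE_NORMAL_HIGH, TEG_VERY_LOW,  H_LOW,     V_CLOSED  },  /* 8  */
    { TE_NORMAL_HIGH, TEG_LOW,       H_MINIMUM, V_CLOSED  },  /* 9  */
    { TE_NORMAL_HIGH, TEG_NEUTRAL,   H_ZERO,    V_CLOSED  },  /* 10 */
    { TE_NORMAL_HIGH, TEG_HIGH,      H_ZERO,    V_CLOSED  },  /* 11 */
    { TE_NORMAL_HIGH, TEG_VERY_HIGH, H_ZERO,    V_PARTIAL },  /* 12 */
    { TE_HIGH,        TEG_ANY,       H_ZERO,    V_PARTIAL },  /* 13 */
    { TE_VERY_HIGH,   TEG_ANY,       H_ZERO,    V_OPEN    },  /* 14 */
};
#define RULE_COUNT (sizeof rules / sizeof rules[0])

static double min2(double a, double b) { return a < b ? a : b; }

/* Firing strength of each rule (AND taken as min). Returns how many fired. */
static size_t fire_rules(const double *mu_te, const double *mu_teg, double *strength)
{
    size_t fired = 0;
    for (size_t i = 0; i < RULE_COUNT; ++i) {
        double s = mu_te[rules[i].te];
        if (rules[i].teg != TEG_ANY)
            s = min2(s, mu_teg[rules[i].teg]);
        assert(s >= 0.0 && s <= 1.0);
        strength[i] = s;
        if (s > 0.0)
            ++fired;
    }
    return fired;
}
```

For example, with TE fuzzified to "normal low" and "normal high" at 0.5 each and TEG fully "neutral", only rules 5 and 10 fire, each with strength 0.5.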



Fig. 6: Required and achieved curves of temperature versus time

### 5 Implementation of precise and long-term accurate temperature measurement, fuzzy-type precise temperature regulation and other required functionality

In developing a control algorithm, it is crucial that the finished algorithm meets the requirements and that it is developed in a reasonable time. The algorithm should therefore be developed on a powerful workstation supplemented by accurate sensors and controls for actuators. A well-equipped development workstation is advocated for two reasons. First, a rich set of Rapid Application Development (RAD) and mathematical tools with excellent Graphical User Interfaces (GUIs) is available on a workstation. Second, the target platform (microcontroller and peripherals) is not yet available at the outset of the project. In the project, a combination of MATLAB, Simulink, LabVIEW and measurement and actuating hardware was used to develop temperature regulation based on fuzzy logic. The schematic of the setup is shown in Figure 7.



Fig. 7: Schematic of the setup for developing the algorithm for precise temperature regulation

As the algorithm is developed and the requirements for other, simpler functional components become known, one chooses the microcontroller and designs the peripherals. Software development systems for microcontrollers have less functionality than RAD tools for developing programs for workstations. This is the reason to do as much development work as possible on a workstation rather than on the target platform. For improved reliability, lower assembly cost and smaller size, the microcontroller of choice must have the needed peripherals integrated.

## 5.1. Implementation of temperature measurement

The threshold voltage $V_c$ at the comparator (Figure 4) is to be set within the steep part of the capacitor discharge curve so that the comparator's output makes sharp transitions. The voltages $V_{co}$ and $V_c$ and the comparator supply are to be filtered by decoupling capacitors to keep the signal-to-noise ratio of the measurement high.

Most of the required circuitry and components should be integrated to minimize assembly and to improve reliability. Instrumentation microcontrollers are built for such a purpose. Instrumentation implies a low-power design of the microcontroller for two reasons: one is the potential for field measurement where an electrical grid is not always available, and the other is the capability of switching off some or most functional blocks when performing measurements, i.e. when microcontroller noise must be minimized in order to maximize measurement accuracy.

Switching off the different functional blocks of a microcontroller is performed via different sleep modes. Returning to higher performance is achieved by different triggers, in our case by an interrupt from the comparator when the capacitor has discharged to $V_c$.

The block diagram of the selected low-power microcontroller is shown in Figure 8.
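Why the threshold belongs on the steep part of the discharge curve can be seen from a first-order noise estimate. It is a sketch that assumes an ideal exponential discharge towards 0 V and a small noise voltage $\sigma_V$ at the comparator input; $\sigma_V$ and the resulting timing jitter $\sigma_t$ are symbols introduced here for illustration. Voltage noise is converted into timing jitter through the slope of the curve at the crossing point:

$$
\begin{aligned}
V(t) &= V_{co}\, e^{-t/(R_x C_1)}, \qquad
\left|\frac{dV}{dt}\right|_{V=V_c} = \frac{V_c}{R_x C_1} \\
\sigma_t &\approx \frac{\sigma_V}{\left|dV/dt\right|} = \frac{\sigma_V\, R_x C_1}{V_c}, \qquad
\frac{\sigma_t}{t_{Rx}} \approx \frac{\sigma_V}{V_c \ln\left(V_{co}/V_c\right)}
\end{aligned}
$$

A higher $V_c$ makes the crossing steeper and reduces the absolute jitter, but it also shortens the discharge time $t_{Rx} = R_x C_1 \ln(V_{co}/V_c)$ and leaves fewer timer counts per measurement. Under this simple model, the relative jitter is smallest where $V_c \ln(V_{co}/V_c)$ is largest, i.e. at $V_c = V_{co}/e \approx 0.37\,V_{co}$; the value actually used in the apparatus is not stated in the text.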



Legend: \* 16/8 bus converter, \*\* comparator, \*\*\* USART

#### Fig. 8: Block scheme of the selected low-power microcontroller

When measuring, only the comparator and Timer 1, set as a free-running counter, are active. When the comparator output changes, the free-running counter is stopped. The resulting count multiplied by the counter clock period equals the capacitor discharge time. Figure 9 illustrates the measurement inaccuracy obtained with the circuit in Figure 4 and equation (1), with the measurement performed in the low-power state of the microcontroller.



Fig. 9: Inaccuracy of temperature measurement

Each measurement takes about twenty-five milliseconds. Measurements can be smoothed, e.g. by calculating a running average or, when memory locations are scarce, a plain average of consecutive readouts. The only constraint is that the measurement must remain responsive enough to meet the dynamics required by the temperature control algorithm.
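Both smoothing options can be sketched in C as follows. The sketch assumes that readouts are stored as integers in millidegrees Celsius, so that the running sum never drifts through rounding; the window length and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define AVG_LEN 8   /* hypothetical window, in measurements */

/* Running average over the last AVG_LEN readouts. */
typedef struct {
    int32_t buf[AVG_LEN];
    int32_t sum;
    size_t next;    /* slot holding the oldest readout */
    size_t count;   /* valid readouts, saturates at AVG_LEN */
} running_avg;

static void running_avg_init(running_avg *ra)
{
    for (size_t i = 0; i < AVG_LEN; ++i)
        ra->buf[i] = 0;
    ra->sum = 0;
    ra->next = 0;
    ra->count = 0;
}

static int32_t running_avg_add(running_avg *ra, int32_t x_mdeg)
{
    ra->sum += x_mdeg - ra->buf[ra->next];  /* slot is 0 until the window fills */
    ra->buf[ra->next] = x_mdeg;
    ra->next = (ra->next + 1) % AVG_LEN;
    if (ra->count < AVG_LEN)
        ++ra->count;
    return ra->sum / (int32_t)ra->count;
}

/* Memory-saving alternative: average disjoint blocks of AVG_LEN readouts.
   Returns 1 when a new mean is available, 0 otherwise. */
typedef struct { int32_t sum; int n; } block_avg;

static int block_avg_add(block_avg *ba, int32_t x_mdeg, int32_t *mean_mdeg)
{
    assert(mean_mdeg != NULL);
    ba->sum += x_mdeg;
    if (++ba->n < AVG_LEN)
        return 0;
    *mean_mdeg = ba->sum / AVG_LEN;
    ba->sum = 0;
    ba->n = 0;
    return 1;
}
```

The cost of smoothing is delay: a running average over N readouts lags by (N - 1)/2 readouts, i.e. roughly 90 ms for N = 8 at about 25 ms per measurement, while the block average delivers only one value per N readouts. Both have to stay small compared with the dynamics of the regulation loop.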

## 5.2. Implementation of the fuzzy logic temperature regulator

A low-power microcontroller was selected because of the temperature measurement constraints. Development tools for such a microcontroller have fewer capabilities and less convenient user interaction than RAD tools, which implies that the algorithm needs to be simple and well thought out before coding. Such an approach minimizes the need for debugging and fixing; fixing after product release has devastating effects on maintenance costs, sales and reputation.

The bases for implementation of fuzzy control of the thermal process are the membership functions in Figure 5 and behavioral rules in Table 1. The procedure, which results in two real numbers between zero and one that define power to be applied for heating and cooling, is the following:

Each point in the plane defined by temperature error TE and temperature error gradient TEG is checked for applicability of each input membership function, according to Figure 5. For each point in the plane (TE, TEG), i.e. for each applicable combination of membership functions in m(TE) and m(TEG), the appropriate output membership functions in m(heater H) and m(valve V) (Figure 5) are selected by the rules in Table 1. The degrees of activation of the selected output functions in m(H) and m(V) are calculated from the degrees of membership of the corresponding input functions in m(TE) and m(TEG).

Since several combinations of input functions in m(TE) and m(TEG) can apply at a given point in the plane (TE, TEG), several rules in Table 1 can be relevant there, each selecting different output functions in m(H) and m(V). In the present case, as many as five rules contribute to the final solution at some points in the (TE, TEG) plane; this is the maximum. At other points, fewer rules, but always at least one, select the output membership functions in m(H) and m(V).

The weighted average of the selected output membership functions in m(H) and m(V) defines heating and cooling power for each point in the plane (TE, TEG). The results of weighted averaging are two surfaces, both defined over the plane (TE, TEG). The surface that represents heating power is shown in Figure 10, while the surface that represents the three-way valve opening for cold water is shown in Figure 11.
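The weighted-average step can be sketched as below. Each output term is represented by a single value, its centroid; these values are hypothetical placeholders for the output membership functions of Figure 5, expressed as a fraction of full heater power or full valve opening, which keeps the result between zero and one as the text requires.

```c
#include <assert.h>
#include <stddef.h>

/* Output terms of Table 1 and hypothetical representative values
   (fraction of full heater power / full valve opening). */
enum heater_term { H_ZERO, H_MINIMUM, H_LOW, H_MODEST, H_FULL };
static const double heater_value[] = { 0.00, 0.10, 0.25, 0.50, 1.00 };

enum valve_term { V_CLOSED, V_PARTIAL, V_OPEN };
static const double valve_value[] = { 0.0, 0.5, 1.0 };

/* Weighted average of the output terms selected by the fired rules:
   sum(strength[i] * value[term[i]]) / sum(strength[i]). */
static double defuzzify(const double *strength, const int *term,
                        const double *term_value, size_t n_rules)
{
    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n_rules; ++i) {
        num += strength[i] * term_value[term[i]];
        den += strength[i];
    }
    assert(den > 0.0);   /* the text states that at least one rule always fires */
    return num / den;
}
```

For instance, two rules firing at 0.5 each, one selecting "low" and one "zero" heater power, give 0.125 of full heater power with these placeholder values. Evaluating this for every point of the (TE, TEG) plane yields surfaces of the kind shown in Figures 10 and 11.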



Fig. 10: Surface of heating power



## Fig. 11: Surface of cold water opening at the three-way valve

The corresponding program, which maps the plane (TE, TEG) to the surfaces in Figures 10 and 11 using the membership functions in Figure 5 and the rules in Table 1, was written in C++ in a RAD environment and is available for download and examination from http://www2.arnes.si/~mjenko9/EggCooker.

The two surfaces that define heating and cooling power for precise temperature regulation are simple to store in memory or to describe by constants and linear functions in simple *if ... then* or *switch ... case* statements; the latter is the case in the implementation of the egg cooker. This is an important implementation simplification. All of the manipulation of the membership functions is coded and executed only once, on a well-equipped workstation, in a suite of modern RAD tools that eases debugging and visualization. Only the result, which is easily reproduced with simple C decision statements and real-type variables, is actually coded into the target system.
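Because the surfaces are computed offline, the target only has to evaluate them. The text describes the implemented variant as constants and linear functions in decision statements; the sketch below shows the other option mentioned, a table stored in memory and evaluated by bilinear interpolation. The grid, its step sizes and the sample values are hypothetical placeholders, not the surface of Figure 10.

```c
#include <assert.h>

/* Hypothetical heater-power surface sampled offline on a regular grid. */
#define NX 3   /* TE samples  */
#define NY 3   /* TEG samples */
static const double te_min = -1.0, te_step = 1.0;    /* deg C */
static const double teg_min = -0.1, teg_step = 0.1;  /* deg C per sample, hypothetical */
static const double heater_surface[NX][NY] = {
    /* TEG:  -0.1   0.0   0.1 */
    {        1.00, 1.00, 0.50 },   /* TE = -1 */
    {        0.50, 0.25, 0.00 },   /* TE =  0 */
    {        0.00, 0.00, 0.00 },   /* TE = +1 */
};

static double clampd(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Bilinear interpolation; inputs outside the grid are clamped to its edge. */
static double heater_power(double te, double teg)
{
    double x = clampd((te - te_min) / te_step, 0.0, NX - 1);
    double y = clampd((teg - teg_min) / teg_step, 0.0, NY - 1);
    int i = (int)x, j = (int)y;
    if (i > NX - 2) i = NX - 2;   /* keep i + 1 inside the table at the upper edge */
    if (j > NY - 2) j = NY - 2;
    double fx = x - i, fy = y - j;
    double p = (1 - fx) * (1 - fy) * heater_surface[i][j]
             + fx * (1 - fy) * heater_surface[i + 1][j]
             + (1 - fx) * fy * heater_surface[i][j + 1]
             + fx * fy * heater_surface[i + 1][j + 1];
    assert(p >= 0.0 && p <= 1.0);
    return p;
}
```

A table trades memory for code simplicity; the piecewise if/then form used in the apparatus needs almost no memory, which suits a small microcontroller better.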

#### 5.3. Implementation of other required functionality into the embedded system

Implementation of other required functionalities is structured into other functional components. These encapsulate finite state machines that define their functionality. For example, control of blinking LEDs and periodic buzzing (warning signals) is encapsulated in hardware-dependent components.
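A component that encapsulates a finite state machine can be as small as the sketch below, which drives a blinking warning signal from a periodic tick. The tick period, state names and functions are hypothetical. The state machine returns the output level instead of writing a pin, so it can be verified on a workstation; in the apparatus such a component is described as hardware-dependent, so the pin write would sit inside it.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SIG_IDLE, SIG_ON, SIG_OFF } sig_state;

typedef struct {
    sig_state state;
    unsigned on_ticks, off_ticks;   /* phase durations, in ticks */
    unsigned left;                  /* ticks remaining in the current phase */
} warning_signal;

static void signal_start(warning_signal *s, unsigned on_ticks, unsigned off_ticks)
{
    assert(on_ticks > 0 && off_ticks > 0);
    s->on_ticks = on_ticks;
    s->off_ticks = off_ticks;
    s->state = SIG_ON;
    s->left = on_ticks;
}

static void signal_stop(warning_signal *s) { s->state = SIG_IDLE; }

/* Advances the state machine by one tick and returns the output level
   (true = LED lit or buzzer on) to be driven until the next tick. */
static bool signal_tick(warning_signal *s)
{
    if (s->state != SIG_IDLE && --s->left == 0) {
        s->state = (s->state == SIG_ON) ? SIG_OFF : SIG_ON;
        s->left = (s->state == SIG_ON) ? s->on_ticks : s->off_ticks;
    }
    return s->state == SIG_ON;
}
```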

The USART in Figure 8 is governed by a hardware-dependent component for serial communication, which is needed for computer-supported tests and calibration in production and for field servicing. The twelve-bit analog-to-digital converter integrated in the microcontroller is not used. For about ninety percent of the operating time, the microcontroller is in sleep mode, during which temperature measurement takes place. All other procedures, including temperature regulation, are performed between measurements of capacitor discharge time. Figure 12 illustrates the timing of microcontroller operation. The signal in Figure 12 is the voltage on capacitor C1 in Figure 4, i.e. the voltage at the comparator's input.



Vertical scale: 2 V/div; horizontal scale: 5 ms/div. White background: regulation and other functionality, active mode. Gray background: measurement, sleep mode.

#### Fig. 12: Timing of microcontroller operation

Since the implementation of temperature regulation is optimized on the system level, the microcontroller's active time needed for one iteration of all required procedures never exceeds ten percent of the time needed for one temperature measurement. As such, temperature measurement is a continuous and consistent process.

#### 6 Case study

Figure 13 shows the apparatus for industrial production of pasteurized soft-boiled eggs, in which precise temperature regulation is needed. Figure 14 shows the custom electronics board.

Two details are extremely important in the design of the electronics board in Figure 14: in the low-power circuitry, ground and power planes are as complete as possible, and the microcontroller's IOs are equipped with filters to prevent potential functional glitches that could result from voltage transients.



Fig. 13: Apparatus



Fig. 14: Electronics board in the apparatus

On the right in Figure 14 is the temperature measurement circuit, consisting of two precision resistors, a capacitor and an integrated circuit with analog switches. The grid-voltage circuitry, consisting mostly of thyristor switches, their drivers and protection against high-voltage surges, is on the left. The board is enclosed in a metal box for shielding and protection from potential water spills.

#### 7 Discussion

At first thought, one would not choose a low-power microcontroller to control approximately three kilowatts of electric heating power. After some elaboration, however, one finds that the low-power, low-noise modes and the rich set of built-in peripherals make a low-power microcontroller a useful platform for precise temperature measurement and control.

All the electronic circuitry, for the grid and for low voltage, is on the same Printed Circuit Board (PCB) in the case study for reasons of engineering optimization: less wiring results in higher reliability. Such an approach requires careful design of ground and power planes to minimize inductance /9/. A less-than-perfect layout would cause stochastic errors in the operation of the microcontroller, induced by voltage transients. In the hypothetical case of somewhat sloppy coding, it would then be close to impossible to tell whether a functional error had an electrical or a programming cause.

A professional approach to software design, layout and filtering of potential noise sources results in a robust system, even with the low-power microcontroller and the grid components on the same PCB; the system passes the EMC test with four-kilovolt bursts superimposed on the grid voltage with excellent results.

Low-power electronic components are designed and used primarily to lengthen the operating time between consecutive battery charges. Reductions in power consumption come from developments in two directions: the ongoing shrinking of circuit dimensions, and circuit design. The most common approaches are throttling the system clock, lowering the supply voltage when the processing load is low, and turning off functional blocks that would otherwise be idling. In the present application, low-power mode is used to minimize measurement noise. As a result, the precision of the temperature readout is improved.

In the design of the embedded software, a component approach was used. The application was first structured into functional components. As self-sufficient entities, these were individually designed, coded and verified. Then, the application was built from already verified functional components. Such an approach results in verification on both the component level and the application level. On the system level, this approach eliminates the need for subsequent software fixes.

When developing complex components such as the present precise temperature regulator, it is most beneficial to use a rich and productive environment of RAD and visualization tools to the greatest extent. In the case study, an environment of MATLAB, Simulink, LabVIEW and hardware extensions for measurement and actuation was used to define membership functions and rules. Algorithm design, refinement and verification were performed using this rich set of tools. The RAD suite was used to code the processing of membership functions and fuzzy rules. This was done in C++, which offers convenient means for encapsulation and overall structuring. Only the most essential code, the definition of the piecewise-planar surfaces, is implemented in the target system.

The precondition for precise temperature regulation is precise temperature measurement. Different approaches were studied. The presented ratiometric measurement, using the low-power mode of the microcontroller, yields precision and long-term accuracy. In a noisier environment one could filter the measurement results; however, filtering slows down the responsiveness of the measurement, which has a negative influence on precise temperature regulation.

#### 8 Conclusion

A component approach to the development of an embedded system is presented. The design requirement is a regulation precision of ±0.25 °C in the constant regions of the required temperature profile. As a precondition, temperature must be measured with a precision of ±0.1 °C for the lifetime of the product, which implies the need for long-term accuracy.

The contribution of this paper is the development of two self-sufficient functional components: one performs precise and long-term accurate temperature measurement; the other is a precise temperature regulator.

Precise and long-term accurate temperature measurement is performed by a ratiometric method and with utilization of one of the microcontroller's low-power modes. Since continuous measurement is needed and the system needs to be responsive at the same time, coding is optimized for efficiency and, because of hardware constraints, for a small footprint.

The development of a precise fuzzy-type temperature regulator is structured into a) iterative work on the definition of membership functions and rules in the environment of MATLAB, Simulink, LabVIEW and dedicated hardware; b) coding in C++ on a workstation that starts with membership functions and rules, and results in surfaces that govern heating and cooling controls; and c) the use of surfaces for real-time regulation in the target system, which is programmed in ANSI C.

Both components, for precise and long-term accurate temperature measurement, and for precise temperature regulation are self-sufficient. As such, they can be reconfigured and reused in other applications with similarly stringent requirements for precision of measurement and regulation, and long-term measurement accuracy.

The overall embedded system application was first structured into functional components. These were designed, coded and verified as self-sufficient entities. Then, the application was built from the components. This approach has proved to generate reliable embedded software products /3/.

The case study involved the development of a new type of industrial kitchen appliance that simultaneously pasteurizes and cooks soft-boiled eggs. This is a novelty on the industrial food preparation appliance market.

#### 9 References

- /1/ Otto Koch. Vorrichtung zur Zubereitung von Hühnereiern. Europäische Patentschrift EP 0 920 273 B1, July 1997.
- /2/ Otto Koch. Method and device for preparing chicken eggs. US Patent, No.: 6,162,478, December 2000.
- /3/ M. Jenko, N. Medjeral, P. Butala, Component-based software as a framework for concurrent design of programs and platforms - an industrial kitchen appliance embedded system. Microprocessors and Microsystems, 2001, vol. 25, no. 6, pages 287-296.
- /4/ M. Jenko, Ratiometric measurement for long-term precision, reasoning and case study, submitted to Informacije MIDEM, October 2009.
- /5/ V. J. Davidson, R. B. Brown, and J. J. Landman. Fuzzy control system for peanut roasting. Journal of Food Engineering, 41(3–4):141–146, August 1999.
- /6/ N. Perrot, L. Mé, G. Trystram, J.-M. Trichard, and M. Decloux. Optimal control of the microfiltration of sugar product using a controller combining fuzzy and genetic approaches. Fuzzy Sets and Systems, 94(3):309–322, March 1998.
- /7/ Ryszard Rywotycki. Food frying process control system. Journal of Food Engineering, 59(4):339–342, October 2003.
- /8/ N. Perrot, G. Trystram, D. Le Guennec, and F. Guely. Sensor Fusion for Real Time Quality Evaluation of Biscuit during Baking. Comparison between Bayesian and Fuzzy Approaches. Journal of Food Engineering, 29(3–4):301–305, August–September 1996.
- /9/ H. Ott, Noise reduction techniques in electronic systems, 2nd ed., Wiley, 1988, ISBN 0-471-85068-3, pages 274-298.

Marjan Jenko Laboratory for Digital Systems and Electrical Engineering, Faculty of Mechanical Engineering, University of Ljubljana, 1000 Ljubljana, SI

Arrived: 02.11.2009 Accepted: 09.09.2010

## FPGA-BASED HARDWARE REALIZATION FOR 4G MIMO WIRELESS SYSTEMS

<sup>1</sup>Mostafa Wasiuddin Numan, <sup>2</sup>Mohammad Tariqul Islam, <sup>3,4</sup>Norbahiah Misran

<sup>1</sup>Dept. of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, Selangor, Malaysia <sup>2</sup>Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia, Selangor, Malaysia <sup>3</sup>Dept. of Electrical, Electronic and Systems Engineering <sup>4</sup>Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia, Selangor, Malaysia

Key words: MIMO; Alamouti; FPGA; testbed

**Abstract:** Emerging multiple-input multiple-output (MIMO) systems are expected to play a key role in fourth-generation (4G) wireless systems by achieving higher data rates and improved spectral efficiency. Despite extensive research on the design of transmission and reception algorithms, little is known about the complexity of their hardware implementation. The design and implementation of the MIMO encoder are straightforward; the decoder, however, is more complex because it demands considerably more hardware resources. This paper presents an efficient hardware realization of MIMO systems that makes effective use of the device resources by exploiting parallelism. The hardware is designed and implemented on a Xilinx Virtex<sup>TM</sup>-4 XC4VLX60 Field Programmable Gate Array (FPGA) device. The paper gives a comprehensive explanation of the complete design process, including an illustration of the tools used in its development. Results are obtained for a $2 \times 2$ MIMO system, covering encoding at the transmitter and decoding at the receiver. The system is based on a modular design, which simplifies system design, eases hardware updates and allows the various modules to be tested independently.

### Implementation of a 4G MIMO Wireless System Based on FPGA Circuits

Key words: MIMO, Alamouti, FPGA, testing

**Abstract:** Multiple-input multiple-output (MIMO) systems will play an important role in fourth-generation (4G) wireless systems for high-speed data transmission. Although extensive research is already being carried out on the design of transmission and reception algorithms, little is known about the complexity of the hardware implementation. The design and implementation of a MIMO encoder are simple, but the implementation of the decoder is more complicated. The paper presents an efficient hardware implementation of a MIMO system through parallelization of the device resources. The hardware is designed and implemented on a Xilinx Virtex™-4 XC4VLX60 FPGA. The paper also presents the circuit design process, illustrating the tools used in the development. Results are shown for a 2x2 MIMO system for encoding and decoding at the transmitting and receiving sides. The system is based on modular design, which simplifies the design of the whole system, eases hardware updates and facilitates testing of the individual modules.

#### 1. Introduction

The use of multiple antennas, usually referred to as multiple-input multiple-output (MIMO) systems, has gained overwhelming interest during the last decade, both in academia and in industry. MIMO has evolved rapidly into a generic technology that is a strong contender for 4G wireless systems, since it can provide multiplexing gain, diversity gain or antenna gain, thus enhancing the bit rate, the error performance or the signal-to-interference-plus-noise ratio of wireless systems, respectively, without increasing the total transmission power or bandwidth /1, 2/. Recent progress in MIMO standardization and prototyping has forced manufacturers worldwide to pay more attention to implementation aspects. Theoretical performance analysis and simulation confirm predictions under idealized conditions, but validating performance in a practical environment requires a testbed, because simulations neglect many real-world imperfections /3/. Testbeds have traditionally been implemented on general-purpose, sequential Digital Signal Processors (DSPs) or on Application Specific Integrated Circuits (ASICs). Enhanced algorithms, which are generally highly parallelizable, and higher data transmission rates can push a DSP beyond its capacity for real-time processing /4/. Although ASICs are fast and power-efficient, their designs are inflexible /5/, and their production is time-consuming and extremely expensive. To overcome the drawbacks of DSPs and ASICs, Field Programmable Gate Arrays (FPGAs) have become a very popular hardware platform for the design, prototyping and validation of such digital signal processing algorithms. Unlike ASICs, FPGAs are reconfigurable: their internal structure is only partially fixed at fabrication, leaving the wiring of the internal logic to the application designer. FPGAs also give the designer control over parallelism in resource utilization and allow both resource utilization and power consumption to be measured.

The design and implementation of MIMO testbeds have become increasingly attractive to researchers in the past few years /3, 6, 7, 8/. Rao et al. proposed a classification scheme for different types of testbeds /9/. The simplest approach they identified targets burst-mode transmission with offline signal processing. This design minimizes cost; however, it severely limits the scenarios in which the testbed can be used, because the signal processing is not done in real time. Real-time FPGA design and implementation of MIMO testbeds has received significant attention in recent years. Different practical implementation aspects of real-time MIMO testbeds are presented in /10/ and /11/. Dowle et al. developed a real-time MIMO processing platform that can be used to investigate different space-time algorithms /4/. Yu et al. describe the hardware implementation of a low-complexity decision-feedback equalization detection method for MIMO systems /12/. In /13/, an FPGA-based hardware module for MIMO decoding is designed and embedded in a prototype 4G mobile receiver. These systems are based on sequential processing and hence do not make full use of the available resources. The main aim of this paper is to present the design and implementation of an FPGA-based 2×2 MIMO testbed that not only provides a faster, real-time solution but also uses hardware similar to the final deployment environment. The rest of the paper is organized as follows: Section 2 introduces the 2×2 MIMO system model, Section 3 describes the hardware design of the MIMO testbed, Section 4 discusses its hardware implementation, and Section 5 concludes the paper.

#### 2. MIMO System Model

The goal of a good wireless communication system is to provide a reliable link between the transmitter and the receiver. Because a wireless link is affected by multipath fading, scattering and shadowing /15/, the transmitted signal may arrive at the receiver severely attenuated and distorted. Recent research on wireless systems shows that MIMO is effective in reducing fading in the wireless channel by providing diversity /16/, which improves the bit error rate (BER) performance at the receiver.

Alamouti /17/ presented a remarkable space-time transmit diversity scheme that improves the quality of the received signal using simple processing at the transmitter and linear decoding at the receiver. In a classical single-transmitter system, symbols $s_0, s_1, s_2, \ldots$ are transmitted at symbol periods $t, t+T, t+2T, \ldots$, respectively. In the two-transmitter Alamouti scheme, however, symbols $s_0$ and $s_1$ are transmitted simultaneously from transmit antennas $Tx_1$ and $Tx_2$, respectively, at symbol period $t$. At the next symbol period, $t+T$, $Tx_1$ transmits $-s_1^*$ and $Tx_2$ transmits $s_0^*$, where $^*$ denotes complex conjugation. Table 1 shows the resulting encoding and transmission sequence for four symbols and two transmit antennas.

Table 1: Alamouti encoding and transmission sequence for four symbols and two transmit antennas

| Antenna / time interval | $t$   | $t+T$    | $t+2T$ | $t+3T$   |
|-------------------------|-------|----------|--------|----------|
| $Tx_1$                  | $s_0$ | $-s_1^*$ | $s_2$  | $-s_3^*$ |
| $Tx_2$                  | $s_1$ | $s_0^*$  | $s_3$  | $s_2^*$  |
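The mapping in Table 1 can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the authors' implementation: symbols are plain Python complex numbers taken in pairs, and the function name `alamouti_encode` is hypothetical.

```python
from typing import List, Tuple


def alamouti_encode(symbols: List[complex]) -> Tuple[List[complex], List[complex]]:
    """Split a symbol stream into the two antenna streams of Table 1.

    For each pair (s_a, s_b): in the first period Tx1 sends s_a and Tx2 sends
    s_b; in the second period Tx1 sends -conj(s_b) and Tx2 sends conj(s_a).
    """
    if len(symbols) % 2:
        raise ValueError("the Alamouti code needs an even number of symbols")
    tx1: List[complex] = []
    tx2: List[complex] = []
    for k in range(0, len(symbols), 2):
        s_a, s_b = symbols[k], symbols[k + 1]
        tx1 += [s_a, -s_b.conjugate()]
        tx2 += [s_b, s_a.conjugate()]
    return tx1, tx2


if __name__ == "__main__":
    # Four illustrative symbols standing in for s0..s3.
    s = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
    tx1, tx2 = alamouti_encode(s)
    print("Tx1:", tx1)
    print("Tx2:", tx2)
```

Index $k$ of the two returned lists corresponds to the column $t+kT$ of Table 1.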



Fig. 1: Block diagram of MIMO system

The channels between the transmit and receive antennas are denoted $h_{11}$, $h_{12}$, $h_{21}$ and $h_{22}$, as shown in Fig. 1; as equations (3) to (6) show, $h_{ij}$ is the channel from transmit antenna $i$ to receive antenna $j$. A low-complexity channel estimator, such as the one described in /14/, can be used to estimate the channel at the receiver. The received signal can be expressed as

$$R = HS + W \tag{2}$$

where $R$ is the received signal vector, $H$ the channel response matrix, $S$ the transmitted symbol vector and $W$ additive white Gaussian noise (AWGN). The signals received at time $t$ are

at receive antenna one:

$$r_1 = h_{11}s_0 + h_{21}s_1 + w_1 \tag{3}$$

at receive antenna two:

$$r_3 = h_{12}s_0 + h_{22}s_1 + w_3 \tag{4}$$

The signals received at time $t+T$ are

at receive antenna one:

$$r_2 = -h_{11}s_1^* + h_{21}s_0^* + w_2 \tag{5}$$

at receive antenna two:

$$r_4 = -h_{12}s_1^* + h_{22}s_0^* + w_4 \tag{6}$$

$w_1$, $w_2$, $w_3$ and $w_4$ are complex Gaussian random variables representing noise and interference. Alamouti states that the transmitted symbols $s_0$ and $s_1$ can be estimated in a maximum-likelihood fashion by first combining the received signals as follows:

$$\tilde{s}_0 = h_{11}^* r_1 + h_{21} r_2^* + h_{12}^* r_3 + h_{22} r_4^* \tag{7}$$

$$\tilde{s}_1 = h_{21}^* r_1 - h_{11} r_2^* + h_{22}^* r_3 - h_{12} r_4^* \tag{8}$$

and then applying a standard maximum-likelihood detector to recover $s_0$ and $s_1$ from $\tilde{s}_0$ and $\tilde{s}_1$.
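Equations (3) to (8) can be checked numerically with a short sketch. It assumes BPSK symbols in $\{+1, -1\}$, for which the maximum-likelihood decision on the combined estimate reduces to the sign of its real part (in the noiseless case, $\tilde{s}_0 = (|h_{11}|^2 + |h_{12}|^2 + |h_{21}|^2 + |h_{22}|^2)\,s_0$, and likewise for $\tilde{s}_1$). The function names and the channel values are hypothetical.

```python
from typing import Dict, Tuple

# h[(i, j)] is the channel from transmit antenna i to receive antenna j,
# which is the indexing used in equations (3)-(8).
Channel = Dict[Tuple[int, int], complex]


def received_signals(h: Channel, s0: complex, s1: complex,
                     w: Tuple[complex, ...] = (0j, 0j, 0j, 0j)) -> Tuple[complex, ...]:
    """r1..r4 as in equations (3)-(6); w holds the noise samples w1..w4."""
    r1 = h[1, 1] * s0 + h[2, 1] * s1 + w[0]
    r2 = -h[1, 1] * s1.conjugate() + h[2, 1] * s0.conjugate() + w[1]
    r3 = h[1, 2] * s0 + h[2, 2] * s1 + w[2]
    r4 = -h[1, 2] * s1.conjugate() + h[2, 2] * s0.conjugate() + w[3]
    return r1, r2, r3, r4


def alamouti_combine(h: Channel, r1: complex, r2: complex,
                     r3: complex, r4: complex) -> Tuple[complex, complex]:
    """Combined estimates of equations (7) and (8)."""
    s0_t = (h[1, 1].conjugate() * r1 + h[2, 1] * r2.conjugate()
            + h[1, 2].conjugate() * r3 + h[2, 2] * r4.conjugate())
    s1_t = (h[2, 1].conjugate() * r1 - h[1, 1] * r2.conjugate()
            + h[2, 2].conjugate() * r3 - h[1, 2] * r4.conjugate())
    return s0_t, s1_t


def bpsk_detect(z: complex) -> int:
    """ML decision for BPSK symbols in {+1, -1}: the sign of the real part."""
    return 1 if z.real >= 0 else -1


if __name__ == "__main__":
    # Arbitrary channel realisation and noise, chosen only for illustration.
    h = {(1, 1): 0.8 + 0.3j, (1, 2): -0.2 + 0.9j,
         (2, 1): 0.5 - 0.6j, (2, 2): 0.1 + 0.4j}
    noise = (0.05 - 0.02j, -0.03 + 0.04j, 0.02 + 0.01j, -0.04 - 0.05j)
    s0_t, s1_t = alamouti_combine(h, *received_signals(h, 1 + 0j, -1 + 0j, noise))
    print(s0_t, s1_t, bpsk_detect(s0_t), bpsk_detect(s1_t))
```

The cross terms in $s_1$ (for $\tilde{s}_0$) and $s_0$ (for $\tilde{s}_1$) cancel exactly, which is why only linear processing is needed at the receiver.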

#### 3. Design of a MIMO System

This section describes the hardware design of a $2 \times 2$ MIMO system. The hardware is designed in a modular fashion to simplify system design, with the main emphasis on making the hardware easy to extend if the system requires updates. The MIMO system can be broadly divided into two parts: the transmitter and the receiver.

#### 3.1. MIMO Transmitter

The system uses the Alamouti scheme for two transmit antennas, which outputs two symbol streams that are fed to identical transmit chains. The transmit module consists of four small sub-modules: the MIMO encoder, a selection block, an in-phase and quadrature (*I-Q*) modulator and a numerically controlled oscillator.



Fig. 2: Transmitter design of a MIMO system

Fig. 2 shows the architecture of a real-time, continuously operating Alamouti scheme with two transmit antennas. The BPSK-modulated symbols are encoded into a space-time code by the MIMO encoder, which outputs two streams of modulated symbols. Each stream is fed to an identical transmit chain driving a separate antenna. A selection block indicates the current symbol period, and the symbols are selected from the encoder output accordingly. The carrier signals are generated by a numerically controlled oscillator consisting of a look-up table and a counter. The resulting *I* and *Q* carriers are multiplied by the encoded symbols to perform *I-Q* modulation, and the *I* and *Q* components are then added together to form the signal transmitted through each antenna.
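The look-up-table-plus-counter oscillator and the *I-Q* modulator can be sketched as follows. This is a floating-point illustration under stated assumptions, not the authors' fixed-point design: the table size of 16 samples per carrier period, the counter step, the sine being read from the cosine table with a quarter-period offset, and the sign convention $I\cos - Q\sin$ are all choices made here, and the names `nco` and `iq_modulate` are hypothetical.

```python
import math
from typing import List, Tuple

# Illustrative table size: one carrier period stored as 16 cosine samples.
LUT_SIZE = 16
COS_LUT = [math.cos(2 * math.pi * k / LUT_SIZE) for k in range(LUT_SIZE)]


def nco(n_samples: int, step: int = 1) -> List[Tuple[float, float]]:
    """Counter-addressed look-up table giving (I carrier, Q carrier) pairs.

    The counter wraps modulo LUT_SIZE, and 'step' sets the carrier frequency
    to step / LUT_SIZE cycles per sample. The Q carrier (sine) is read from
    the same cosine table with a quarter-period offset: sin(x) = cos(x - pi/2).
    """
    quarter = LUT_SIZE // 4
    carriers = []
    counter = 0
    for _ in range(n_samples):
        carriers.append((COS_LUT[counter], COS_LUT[(counter - quarter) % LUT_SIZE]))
        counter = (counter + step) % LUT_SIZE
    return carriers


def iq_modulate(i_sym: float, q_sym: float,
                carriers: List[Tuple[float, float]]) -> List[float]:
    """Passband samples I*cos - Q*sin for one symbol held over the carriers."""
    return [i_sym * c - q_sym * s for c, s in carriers]


if __name__ == "__main__":
    # One encoded BPSK symbol (+1 on I, 0 on Q) over one carrier period.
    print([round(v, 3) for v in iq_modulate(1.0, 0.0, nco(LUT_SIZE))])
```

Storing only a cosine table and deriving the sine by an address offset halves the table memory, at the cost of one extra address adder.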

#### 3.2. MIMO Receiver

The hardware design and implementation of the MIMO receiver are based on equations (7) and (8). However, the complex values and complex operations in these equations cannot simply be implemented in a hardware description language (HDL). The complex values are therefore expanded into their real and imaginary parts to simplify the implementation. The resulting expressions are shown in equations (9) to (12).

$$\begin{aligned}
\operatorname{Re}(\tilde{s}_0) = {} & \operatorname{Re}(h_{11}) \times \operatorname{Re}(r_1) + \operatorname{Im}(h_{11}) \times \operatorname{Im}(r_1) \\
& + \operatorname{Re}(h_{21}) \times \operatorname{Re}(r_2) + \operatorname{Im}(h_{21}) \times \operatorname{Im}(r_2) \\
& + \operatorname{Re}(h_{12}) \times \operatorname{Re}(r_3) + \operatorname{Im}(h_{12}) \times \operatorname{Im}(r_3) \\
& + \operatorname{Re}(h_{22}) \times \operatorname{Re}(r_4) + \operatorname{Im}(h_{22}) \times \operatorname{Im}(r_4)
\end{aligned} \tag{9}$$

$$\begin{aligned}
\operatorname{Im}(\tilde{s}_0) = {} & \operatorname{Re}(h_{11}) \times \operatorname{Im}(r_1) - \operatorname{Im}(h_{11}) \times \operatorname{Re}(r_1) \\
& - \operatorname{Re}(h_{21}) \times \operatorname{Im}(r_2) + \operatorname{Im}(h_{21}) \times \operatorname{Re}(r_2) \\
& + \operatorname{Re}(h_{12}) \times \operatorname{Im}(r_3) - \operatorname{Im}(h_{12}) \times \operatorname{Re}(r_3) \\
& - \operatorname{Re}(h_{22}) \times \operatorname{Im}(r_4) + \operatorname{Im}(h_{22}) \times \operatorname{Re}(r_4)
\end{aligned} \tag{10}$$

$$\begin{aligned}
\operatorname{Re}(\tilde{s}_1) = {} & \operatorname{Re}(h_{21}) \times \operatorname{Re}(r_1) + \operatorname{Im}(h_{21}) \times \operatorname{Im}(r_1) \\
& - \operatorname{Re}(h_{11}) \times \operatorname{Re}(r_2) - \operatorname{Im}(h_{11}) \times \operatorname{Im}(r_2) \\
& + \operatorname{Re}(h_{22}) \times \operatorname{Re}(r_3) + \operatorname{Im}(h_{22}) \times \operatorname{Im}(r_3) \\
& - \operatorname{Re}(h_{12}) \times \operatorname{Re}(r_4) - \operatorname{Im}(h_{12}) \times \operatorname{Im}(r_4)
\end{aligned} \tag{11}$$

$$\begin{aligned}
\operatorname{Im}(\tilde{s}_1) = {} & \operatorname{Re}(h_{21}) \times \operatorname{Im}(r_1) - \operatorname{Im}(h_{21}) \times \operatorname{Re}(r_1) \\
& + \operatorname{Re}(h_{11}) \times \operatorname{Im}(r_2) - \operatorname{Im}(h_{11}) \times \operatorname{Re}(r_2) \\
& + \operatorname{Re}(h_{22}) \times \operatorname{Im}(r_3) - \operatorname{Im}(h_{22}) \times \operatorname{Re}(r_3) \\
& + \operatorname{Re}(h_{12}) \times \operatorname{Im}(r_4) - \operatorname{Im}(h_{12}) \times \operatorname{Re}(r_4)
\end{aligned} \tag{12}$$
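The real-valued expansion can be checked against the complex combiner of equations (7) and (8). The sketch below is a hypothetical floating-point model, not the authors' VHDL: it writes out the expansion using only real multiply-adds, with the channel indexing of equations (3) to (8) ($h_{ij}$: transmit antenna $i$ to receive antenna $j$), and compares it with direct complex arithmetic on random inputs. It relies on $\operatorname{Re}(h^* r) = \operatorname{Re}h\operatorname{Re}r + \operatorname{Im}h\operatorname{Im}r$ and $\operatorname{Im}(h^* r) = \operatorname{Re}h\operatorname{Im}r - \operatorname{Im}h\operatorname{Re}r$.

```python
import random
from typing import Dict, Tuple

Channel = Dict[Tuple[int, int], complex]  # h[(i, j)]: Tx i -> Rx j
Rx = Tuple[complex, complex, complex, complex]


def estimates_real(h: Channel, r: Rx) -> Tuple[float, float, float, float]:
    """Re/Im of the combined estimates using only real multiply-adds."""
    h11, h12, h21, h22 = h[1, 1], h[1, 2], h[2, 1], h[2, 2]
    r1, r2, r3, r4 = r
    s0re = (h11.real * r1.real + h11.imag * r1.imag
            + h21.real * r2.real + h21.imag * r2.imag
            + h12.real * r3.real + h12.imag * r3.imag
            + h22.real * r4.real + h22.imag * r4.imag)
    s0im = (h11.real * r1.imag - h11.imag * r1.real
            - h21.real * r2.imag + h21.imag * r2.real
            + h12.real * r3.imag - h12.imag * r3.real
            - h22.real * r4.imag + h22.imag * r4.real)
    s1re = (h21.real * r1.real + h21.imag * r1.imag
            - h11.real * r2.real - h11.imag * r2.imag
            + h22.real * r3.real + h22.imag * r3.imag
            - h12.real * r4.real - h12.imag * r4.imag)
    s1im = (h21.real * r1.imag - h21.imag * r1.real
            + h11.real * r2.imag - h11.imag * r2.real
            + h22.real * r3.imag - h22.imag * r3.real
            + h12.real * r4.imag - h12.imag * r4.real)
    return s0re, s0im, s1re, s1im


def estimates_complex(h: Channel, r: Rx) -> Tuple[float, float, float, float]:
    """The same estimates computed directly from equations (7) and (8)."""
    h11, h12, h21, h22 = h[1, 1], h[1, 2], h[2, 1], h[2, 2]
    r1, r2, r3, r4 = r
    s0 = (h11.conjugate() * r1 + h21 * r2.conjugate()
          + h12.conjugate() * r3 + h22 * r4.conjugate())
    s1 = (h21.conjugate() * r1 - h11 * r2.conjugate()
          + h22.conjugate() * r3 - h12 * r4.conjugate())
    return s0.real, s0.imag, s1.real, s1.imag


def random_complex(rng: random.Random) -> complex:
    return complex(rng.uniform(-1, 1), rng.uniform(-1, 1))


if __name__ == "__main__":
    rng = random.Random(0)
    h = {(i, j): random_complex(rng) for i in (1, 2) for j in (1, 2)}
    r = tuple(random_complex(rng) for _ in range(4))
    diff = max(abs(a - b) for a, b in zip(estimates_real(h, r), estimates_complex(h, r)))
    print("largest difference:", diff)
```

Each output needs eight real products, which is the workload the multi-cycle decoder described next distributes over eight clock cycles.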

If these equations are converted directly into HDL and synthesized, they consume most of the resources available on the testbed's FPGA. To overcome this problem, the alternative design shown in Fig. 3 is used. It consists of four functional 'multiplier units' and four associated 'add/subtract units' with registers that accumulate the totals. A 'control unit', implemented as a state machine, multiplexes the inputs to the functional units and controls whether each add/subtract unit adds or subtracts. The meaning of the signals A, B, C and D becomes clear on careful examination of equations (9) to (12): despite their apparent complexity, the equations follow a pattern. In particular, there are only four distinct sets of operands for the multiplications. These four sets, labelled A, B, C and D, are listed in Table 2.



Fig. 3: Block diagram of hardware design of MIMO decoder

Table 2: Operand sets shared by the multiplications in equations (9) to (12)

| Set | First use                  | Second use                 |
|-----|----------------------------|----------------------------|
| A   | Operand 1 in equation (9)  | Operand 1 in equation (10) |
| B   | Operand 2 in equation (9)  | Operand 2 in equation (11) |
| C   | Operand 2 in equation (10) | Operand 2 in equation (12) |
| D   | Operand 1 in equation (11) | Operand 1 in equation (12) |

To further explain the meaning of Table 2, consider set A as an example. Its first use is listed as 'Operand 1 in equation (9)' and its second as 'Operand 1 in equation (10)'. That is, the first (left-hand) operand of every multiplication in equation (9) is the same as the first operand of the corresponding multiplication in equation (10). Because these operands are always identical, they are grouped together as set A. Table 2 specifies the members of the other sets in the same way, and these groupings can be verified against equations (9) to (12).

The design thus calculates all four symbol-estimate equations in parallel, with one multiplier and one add/subtract unit per equation. Since each of the four multipliers has two operand inputs, eight multiplexers would otherwise be required; by exploiting the operand pairings of Table 2, the control unit supplies all four multipliers using only four multiplexers, one per operand set.
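The operand sharing can be illustrated with a cycle-level sketch. It is a hypothetical floating-point model, not the authors' VHDL, and uses the channel indexing of equations (3) to (8) ($h_{ij}$: transmit antenna $i$ to receive antenna $j$). It builds the per-cycle sequences of sets A, B, C and D, lets four multiplier and add/subtract units run in lock-step for eight cycles, and reads each unit's add or subtract command from a sign table; the names `operand_sets`, `decode_multicycle` and `SIGNS` are illustrative.

```python
from typing import Dict, List, Tuple

Channel = Dict[Tuple[int, int], complex]  # h[(i, j)]: Tx i -> Rx j

# Add (+1) or subtract (-1) command for each of the 8 cycles, per output,
# read off the expanded combiner equations. Encoding as +/-1 is illustrative.
SIGNS = {
    "s0re": (+1, +1, +1, +1, +1, +1, +1, +1),
    "s0im": (+1, -1, -1, +1, +1, -1, -1, +1),
    "s1re": (+1, +1, -1, -1, +1, +1, -1, -1),
    "s1im": (+1, -1, +1, -1, +1, -1, +1, -1),
}


def operand_sets(h: Channel, r) -> Tuple[List[float], ...]:
    """Per-cycle operand sequences for the shared sets A, B, C and D."""
    A: List[float] = []
    B: List[float] = []
    C: List[float] = []
    D: List[float] = []
    # (channel operand of the s0 units, channel operand of the s1 units, r_k)
    for h_a, h_d, rk in ((h[1, 1], h[2, 1], r[0]), (h[2, 1], h[1, 1], r[1]),
                         (h[1, 2], h[2, 2], r[2]), (h[2, 2], h[1, 2], r[3])):
        A += [h_a.real, h_a.imag]
        D += [h_d.real, h_d.imag]
        B += [rk.real, rk.imag]
        C += [rk.imag, rk.real]
    return A, B, C, D


def decode_multicycle(h: Channel, r) -> Dict[str, float]:
    """Four multiplier + add/subtract units running in lock-step for 8 cycles."""
    A, B, C, D = operand_sets(h, r)
    # Fixed pairing of operand sets per unit; indexing a set by the cycle
    # number plays the role of one of the four multiplexers.
    units = {"s0re": (A, B), "s0im": (A, C), "s1re": (D, B), "s1im": (D, C)}
    acc = {name: 0.0 for name in units}
    for cycle in range(8):
        for name, (left, right) in units.items():
            product = left[cycle] * right[cycle]      # multiplier unit
            acc[name] += SIGNS[name][cycle] * product  # add/subtract unit
    return acc


if __name__ == "__main__":
    h = {(1, 1): 0.3 - 0.7j, (1, 2): 0.9 + 0.1j,
         (2, 1): -0.4 + 0.2j, (2, 2): 0.6 - 0.5j}
    r = (0.2 + 0.1j, -0.5 + 0.3j, 0.7 - 0.2j, -0.1 - 0.9j)
    print(decode_multicycle(h, r))
```

Because each unit always draws its left operand from one set and its right operand from another, only the four sequences A to D have to be selected per cycle, which mirrors the four-multiplexer arrangement.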

#### 4. Hardware Implementation

The FPGA development process of the MIMO testbed, starting from the system specification, is outlined in Fig. 4. The testbed is first examined in a high-level simulation in MATLAB 7.0. The sub-blocks of the system are then translated for hardware implementation. VHDL is used as the HDL in this work because of its flexible coding styles and its suitability for very large and complex designs. Xilinx ISE 10.1i and the XST engine are used for VHDL synthesis and place-and-route, while Mentor ModelSim XE III 6.3c is used to run functional and post-place-and-route simulations. After compilation, simulation and synthesis, configuration files are generated and used to configure the FPGA device. At every step, the outputs are verified by comparison with the MATLAB simulation results.

The MIMO testbed is implemented on a Xilinx Virtex<sup>™</sup>-4 LX MB Development Kit. The board included with this kit carries a Xilinx XC4VLX60 FPGA device, a programmable clock source (25–700 MHz), an on-board 100 MHz oscillator, a configuration device and connectors giving access to all of the device signals. With these features, the board can be configured to implement very complex systems. The hardware specification of the board used for the MIMO realization is listed in Table 3.



Fig. 4: Design steps of FPGA implementation

Table 3: Hardware Specification of the FPGA Board

| Parameter           | Value                       |
|---------------------|-----------------------------|
| FPGA family         | Virtex-4                    |
| Device              | Xilinx XC4VLX60             |
| Programmable clock  | 25–700 MHz                  |
| On-board oscillator | 100 MHz                     |
| Memory              | 64 MB DDR SDRAM, 4 MB Flash |
| Logic elements      | 59,904                      |

The Alamouti-based MIMO transmitter contains sequential logic and therefore requires control logic and a clock signal. Fig. 5 shows the schematic of the two-transmitter Alamouti encoder implemented in Xilinx<sup>®</sup> ISE 10.1i. In each symbol period, two input bits are modulated by two BPSK modulators, whose real and imaginary outputs are fed to the Alamouti encoder, which encodes the symbols as described above.



Fig. 5: Schematic of MIMO transmitter

The only operation the Alamouti encoder performs on the modulated symbols is negating either the real or the imaginary part of a symbol: the conjugate $s^*$ negates only the imaginary part of $s$, while $-s^*$ negates only its real part. The system clock ensures that all signals are latched at the correct instant. The encoder is designed to run at a clock rate equal to the symbol rate of the system, so one clock cycle corresponds to one symbol period.

Since it takes two clock cycles to encode two symbols, the encoder maintains a 'state' signal indicating whether it is in the first or the second symbol period. This 'state' signal is a single bit that is toggled every clock cycle; it is 1 in the first symbol period and 0 in the second. The two BPSK-modulated input symbols 'in\_i1' and 'in\_i2' are encoded into 'out\_i1' and 'out\_i2' according to the Alamouti scheme over two consecutive symbol periods. Fig. 6 presents the simulation result of the Alamouti encoder.
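The clocked behaviour of the encoder can be sketched at register level. This is a hypothetical model, not the authors' VHDL: each input pair is assumed to be held for two clock cycles, symbols are (I, Q) integer pairs standing in for the 'in\_i1'/'in\_q1' style signals, and the generator name `encoder_cycles` is illustrative. It shows that the second symbol period needs only sign changes, driven by the toggling 'state' bit.

```python
from typing import Iterable, Iterator, Tuple

IQ = Tuple[int, int]  # (in-phase, quadrature) integer sample


def encoder_cycles(pairs: Iterable[Tuple[IQ, IQ]]) -> Iterator[Tuple[IQ, IQ, int]]:
    """Yield (out1, out2, state) once per clock cycle.

    Each input pair (sym1, sym2) is held for two cycles. With state = 1 (first
    symbol period) both symbols pass through unchanged; with state = 0 (second
    period) output 1 is -conj(sym2) and output 2 is conj(sym1), which only
    negates the real part of sym2 and the imaginary part of sym1.
    """
    state = 1
    for (i1, q1), (i2, q2) in pairs:
        for _ in range(2):
            if state == 1:
                out1, out2 = (i1, q1), (i2, q2)
            else:
                out1, out2 = (-i2, q2), (i1, -q1)
            yield out1, out2, state
            state ^= 1  # the 'state' bit toggles every clock cycle


if __name__ == "__main__":
    # Two BPSK symbol pairs (Q = 0); amplitudes are illustrative.
    for cycle in encoder_cycles([((1, 0), (-1, 0)), ((-1, 0), (-1, 0))]):
        print(cycle)
```

For BPSK inputs the quadrature parts are zero, so in practice only the real part of the second symbol is ever negated.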

[Waveform window: testbench al\_encoder\_tb, signals clk, in\_i1, in\_q1, in\_i2, in\_q2, out\_i1, out\_q1, out\_i2, out\_q2 and uut/state]

Fig. 6: Simulation result of MIMO encoder

The task of the decoder is to combine the signals received simultaneously at all antennas into an improved signal from which the transmitted symbols can be recovered. The MIMO decoder at the receiver takes as inputs the 16-bit real and imaginary parts of the four channel estimates and of the four received signals. The design is a multi-cycle implementation, i.e. it takes several clock cycles to compute its results. The multipliers take one clock cycle to compute a product, and the add/subtract units also take one clock cycle. Since each of equations (9) to (12) contains eight products and each equation has its own multiplier, two symbol estimates (real and imaginary parts) are produced every 8 clock cycles. The 'done' signal marks the end of these 8 clock cycles, and a 'reset' signal resets the decoder. Because of the operand pairing listed in Table 2, the control logic can route the required inputs to all multiplier units using only four multiplexers instead of eight. Fig. 7 presents the top-level schematic of the MIMO decoder and Fig. 8 shows its simulation result. The simulation results are validated against 'bit accurate' MATLAB outputs; 'bit accurate' means that, for a given set of input bits, the MATLAB simulation produces the correct output bits.

The MIMO encoder and decoder designs were successfully synthesized with the XST engine and then placed and routed on the target FPGA. Table 4 shows the device utilization summary of the implementation. These figures should be regarded as an upper bound, since a variety of possible optimizations have not yet been applied to the design.



Fig. 7: Top level schematic of MIMO decoder



Fig. 8: Simulation result of MIMO decoder

Table 4: Device Utilization Summary

|                  | Available resources | Used by MIMO encoder | Used by MIMO decoder |
|------------------|------------------------|-------------------------|-------------------------|
| Slices           | 26,624                 | 6                       | 393                     |
| Slice Flip-flops | 53,248                 | 10                      | 355                     |
| LUTs             | 53,248                 | 6                       | 472                     |
| Pins             | 448                    | 67                      | 323                     |
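Table 4 is easier to interpret as percentages of the available resources. The sketch below derives them from the table's own figures; the percentages are computed here and are not reported by the authors, and the names `AVAILABLE`, `USED` and `utilization` are illustrative.

```python
from typing import Dict

# Counts copied from Table 4.
AVAILABLE = {"Slices": 26_624, "Slice Flip-flops": 53_248, "LUTs": 53_248, "Pins": 448}
USED = {
    "MIMO encoder": {"Slices": 6, "Slice Flip-flops": 10, "LUTs": 6, "Pins": 67},
    "MIMO decoder": {"Slices": 393, "Slice Flip-flops": 355, "LUTs": 472, "Pins": 323},
}


def utilization(used: Dict[str, int], available: Dict[str, int]) -> Dict[str, float]:
    """Percentage of each available resource taken by one design."""
    return {k: 100.0 * used[k] / available[k] for k in available}


if __name__ == "__main__":
    for block, counts in USED.items():
        pct = utilization(counts, AVAILABLE)
        print(block, {k: f"{v:.2f}%" for k, v in pct.items()})
```

By this arithmetic, the decoder occupies under 2% of the slices but about 72% of the pins. That is consistent with its wide parallel inputs: four channel estimates and four received signals, each with 16-bit real and imaginary parts, already account for 256 input pins.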

#### 5. Conclusion

4G wireless systems employ multiple-antenna techniques to provide high performance while maximizing spectral efficiency. This prevalence of MIMO systems highlights the need for dedicated platforms to evaluate such algorithms under realistic conditions. This paper has presented the design methodology and implementation of a MIMO testbed that uses an FPGA for fast parallel processing. The main emphasis is on making the hardware easy to extend if the system requires updates. The encoder and decoder can be used as standalone units in a single FPGA or as elements of a complete communication system. The flexibility and wide range of resources of FPGAs thus make them very efficient platforms for embedded hardware implementations of future generations of wireless communication systems.

#### Acknowledgements

The authors would like to thank the Institute of Space Science (ANGKASA), UKM, and the Ministry of Science, Technology and Innovation (MOSTI) of Malaysia for sponsoring this work under e-Science fund 01-01-02-SF0376.

#### References

- /1./ A. J. Paulraj, D. A. Gore, R. U. Nabar, and H. Bölcskei, "An overview of MIMO communications—A key to gigabit wireless," Proceedings of the IEEE, vol. 92, no. 2, pp. 198–218, 2004.
- /2./ J. Mietzner, R. Schober, L. Lampe, W. H. Gerstacker, and P. A. Hoeher, "Multiple-Antenna Techniques for Wireless Communications – A Comprehensive Literature Survey," IEEE Communications Surveys & Tutorials, vol. 11, no. 2, pp. 87-105, Second Quarter 2009.
- /3./ S. Caban, C. Mehlführer, R. Langwieser, A. L. Scholtz, and M. Rupp, "Vienna MIMO testbed," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 54868, 13 pages, 2006.
- /4./ J. Dowle, S.H. Kuo, K. Mehrotra, and I.V. McLoughlin, "An FPGA-Based MIMO and Space-Time Processing Platform," EURASIP Journal on Applied Signal Processing, vol. 2006, pp. 1-14, Article ID 34653, 2006.
- /5./ C. Ebeling, C. Fisher, G. Xing, M. Shen, and H. Liu, "Implementing an OFDM receiver on the RaPiD reconfigurable architecture," IEEE Transactions on Computers, vol. 53, no. 11, pp. 1436–1448, 2004.
- /6./ A. Teramoto, K. Nishijo, T. Maemura, Y. Nagao, M. Kurosaki, H. Ochi, "Design of 600Mbps 4×2 MIMO-OFDM Wireless LAN System and Its FPGA Implementation," 10th International Conference on Advanced Communication Technology 2008 (ICACT 2008), vol. 1, pp. 579–582, 2008.
- /7./ M. Baghaie, S.H. Kuo, and I.V. McLoughlin, "FPGA implementation of space-time block coding systems," IEEE 6th Circuits and Systems Symposium on Emerging Technologies: Frontiers of Mobile and Wireless Communication (MWC '04), vol. 2, pp. 591–594, 2004.
- /8./ H. Eslami, S. V. Tran, and A. M. Eltawil, "Design and Implementation of a Scalable Channel Emulator for Wideband MIMO Systems," IEEE Transactions on Vehicular Technology, vol. 58, no. 9, pp. 4698-4709, Nov 2009.
- /9./ R. Rao, W. Zhu, S. Lang, C. Oberli, D. Browne, J. Bhatia, J.-F. Frigon, J. Wang, P. Gupta, H. Lee, D. N. Liu, S. G. Wong, M. Fitz, B. Daneshrad, O. Takeshita, "Multi-Antenna Testbeds for Research and Education in Wireless Communications," IEEE Communications Magazine, vol. 42, no. 12, pp. 62-70, Dec. 2004.

- /10./ T. Kaiser, A. Bourdoux, M. Rupp, and U. Heute, "Implementation Aspects and Testbeds for MIMO Systems," EURASIP Journal on Applied Signal Processing, vol. 2006, pp. 1-3, Article ID 69217, 2006.
- /11./ K. Zheng, L. Huang, G. Li, H. Cao, W. Wang, and M. Dohler, "Beyond 3G Evolution," IEEE Vehicular Technology Magazine, vol. 3, no. 2, pp. 30–36, 2008.
- /12./ S. Yu, T. H. Im, C. H. Park, J. Kim, and Y. S. Cho, "An FPGA Implementation of MML-DFE for Spatially Multiplexed MIMO Systems," IEEE Transactions on Circuits and Systems–II: Express Briefs, vol. 55, no. 7, pp. 705-709, July 2008.
- /13./ A. J.-Pacheco, Á. F.-Herrero, and J. C.-Quirós, "Design and Implementation of a Hardware Module for MIMO Decoding in a 4G Wireless Receiver," *VLSI Design*, vol. 2008, Article ID 312614, pp. 1-8, Hindawi Publishing Corporation, 2008.
- /14./ M. W. Numan, M. T. Islam, and N. Misran, "Performance and Complexity Improvement of Training based Channel Estimation in MIMO Systems," Progress In Electromagnetics Research C, vol. 10, pp. 1-13, 2009.
- /15./ T. Rappaport, Wireless Communications: Principles and Practice, 2nd Edition, Upper Saddle River, NJ: Prentice Hall, 2002.
- /16./ E. G. Larsson, P. Stoica, Space Time Block Coding for Wireless Communications, Cambridge, UK: Cambridge University Press, 2003.
- /17./ S. Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal On Selected Areas in Communications, vol. 16, pp. 1451-1458, 1998.

Mostafa Wasiuddin Numan Dept. of Electrical, Electronic and Systems Engineering Universiti Kebangsaan Malaysia Bangi 43600, Selangor, Malaysia Email: mwnuman@gmail.com

Mohammad Tariqul Islam Institute of Space Science (ANGKASA) Universiti Kebangsaan Malaysia Bangi 43600, Selangor, Malaysia

Norbahiah Misran <sup>1,2</sup> <sup>1</sup>Dept. of Electrical, Electronic and Systems Engineering <sup>2</sup>Institute of Space Science (ANGKASA) Universiti Kebangsaan Malaysia Bangi 43600, Selangor, Malaysia

Arrived: 11.01.2010 Accepted: 09.09.2010

## LOGIC-BASED QCA IMPLEMENTATION OF A 4×4 S-BOX

<sup>1</sup>Mohammad Amin Amiri, <sup>1</sup>Sattar Mirzakuchaki, <sup>2</sup>Mojdeh Mahdavi

<sup>1</sup>E. E. Department, Iran University of Science and Technology, Tehran, Iran <sup>2</sup>Department of Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran

Key words: Quantum Cellular Automata; Substitution Box

**Abstract:** Quantum Cellular Automata (QCA) is an emerging technology at the nanotechnology level. Many applications of QCA technology have been proposed, and cryptography is a promising one. Substitution boxes are important components of many modern block and stream ciphers. Here, we implement a specific 4×4 S-Box using QCA technology. Simulation results are obtained with the QCADesigner software.

### Implementation of a 4x4 S-Box Circuit with QCA Technology

Key words: QCA, S-Box

**Abstract:** QCA technology represents an emerging technology at the nanotechnology level. Today many applications of QCA technology are emerging, for example in cryptography. In this paper we describe the implementation of a specific 4x4 S-Box circuit using QCA technology. Simulation results were obtained with the QCADesigner software.

### 1. Introduction

Over the past several decades, the microelectronics industry has improved the integration density, power consumption and speed of integrated circuits by reducing the feature size of transistors. However, even as transistor sizes decrease, problems such as power consumption cannot be ignored. Implementing logic circuits in QCA technology is one approach that, in addition to reducing circuit size and increasing clock frequency, also reduces power consumption. QCA, first introduced by Lent et al. /1/, is an emerging technology at the nanotechnology level. QCA cells contain quantum dots, and the positions of the electrons in these dots determine the binary levels 0 and 1.

Substitution plays a significant role in modern cryptography. In some applications, substitutions are formed by simple Boolean functions, which take several Boolean inputs and produce a single output. The design of suitable functions has received significant attention from cryptographers for decades. Substitution is typically implemented by substitution boxes (S-Boxes), which are functions with multiple inputs and multiple outputs. Perhaps the most famous S-Boxes are those of the Data Encryption Standard (DES) /2/. Within each round of DES, the most significant contribution to security is made by eight 6-input, 4-output functions, specified as lookup tables. The DES algorithm has been the subject of a great deal of controversy, much of which has revolved around the particular substitutions implemented by its eight S-Boxes. The S-Box idea nevertheless has a firm hold in modern cryptography: the international symmetric-key cryptography standard, the Advanced Encryption Standard (AES), also uses S-Boxes to perform substitutions /3/.

As an application of QCA technology, we have implemented a specific 4×4 S-Box using the logic-based method, in which the S-Box is realized with logic gates. In the next section, we briefly explain Quantum-dot Cellular Automata, covering the QCA cell, cell-cell coupling, QCA logic and QCA clocking. In Section 3, our work is explained and the simulation results are illustrated.
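The idea of the logic-based method, turning an S-Box lookup table into gates, can be sketched as follows. The S-Box implemented in this paper is not reproduced here, so the table below is the well-known 4×4 S-Box of the PRESENT block cipher, used purely as a stand-in. The sketch also assumes the usual QCA gate set: a three-input majority gate, which acts as AND or OR when one input is fixed to 0 or 1, plus an inverter. Each output bit is built as an unminimized sum of minterms; a real layout would minimize these functions first. All names are hypothetical.

```python
from typing import List

# Stand-in 4x4 S-Box (PRESENT cipher); NOT necessarily the one in this paper.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate."""
    return (a & b) | (b & c) | (a & c)


def and_gate(a: int, b: int) -> int:
    return maj(a, b, 0)  # one majority input fixed to logic 0


def or_gate(a: int, b: int) -> int:
    return maj(a, b, 1)  # one majority input fixed to logic 1


def not_gate(a: int) -> int:
    return 1 - a


def minterms(bit: int) -> List[int]:
    """Inputs x for which output bit `bit` of SBOX[x] is 1."""
    return [x for x in range(16) if (SBOX[x] >> bit) & 1]


def output_bit(bit: int, x: int) -> int:
    """Evaluate one output bit as an OR of AND-minterms using only gates."""
    xs = [(x >> i) & 1 for i in range(4)]  # xs[0] is the least significant bit
    result = 0
    for m in minterms(bit):
        term = 1
        for i in range(4):
            literal = xs[i] if (m >> i) & 1 else not_gate(xs[i])
            term = and_gate(term, literal)
        result = or_gate(result, term)
    return result


def sbox_logic(x: int) -> int:
    """Gate-level S-Box: assemble the four output bits."""
    return sum(output_bit(bit, x) << bit for bit in range(4))


if __name__ == "__main__":
    print([hex(sbox_logic(x)) for x in range(16)])
```

The gate-level version reproduces the lookup table for all 16 inputs, which is the property a logic-based S-Box layout must preserve.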

The simulation results of this implementation were obtained with QCADesigner v2.0.3, developed by the ATIPS Laboratory at the University of Calgary, Canada. QCADesigner v2.0.3 offers several simulation engines; throughout this paper the coherence vector engine is used because it provides an accurate and detailed evaluation of QCA circuits.

## 2. QCA Review

In Quantum Cellular Automata (QCA), a cell contains four quantum dots, as schematically shown in Fig. 1. The quantum dots are shown as the open circles which represent the confining electronic potential. Each cell is occupied by two electrons which are schematically shown as the solid dots.

Within a cell, the electrons can move between the individual quantum dots by quantum-mechanical tunneling, but they cannot tunnel between cells: the barriers between cells are assumed to be high enough to suppress intercellular tunneling completely.



Fig. 1. QCA cell and its ground states

Left alone, the electrons settle into the configuration corresponding to the physical ground state of the cell. The two electrons tend to occupy different dots because of the Coulomb repulsion that arises when they are brought close together on the same dot.

It follows that the ground state of an isolated cell is an equal superposition of the two basic configurations with the electrons at opposite corners. Both configurations, with the corresponding electron positions, are shown in Fig. 1.



Fig. 2. Coupling of QCA cells

Coupling between two cells is provided by the Coulomb interaction between the electrons in the different cells. Fig. 2 shows how one cell is affected by the state of its neighbor /4/: the polarization of cell 1 (P1) is determined by the polarization of its neighbor, cell 2 (P2). P2 is assumed to be fixed at a given value, corresponding to a specific arrangement of charge in cell 2, and this charge distribution acts on cell 1 and thereby determines its polarization. The key observation is that the cell-cell coupling is strongly non-linear: cell 1 is almost completely polarized even when cell 2 is only partially polarized /3,4/.
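The saturation described above can be illustrated with the two-state approximation commonly used in the QCA literature, in which the polarization induced in cell 1 is $P_1 = x/\sqrt{1+x^2}$ with $x = E_k P_2/(2\gamma)$, where $E_k$ is the kink energy and $\gamma$ the intracell tunneling energy. This is a sketch under stated assumptions: the model and the ratio $E_k/\gamma = 20$ are chosen for illustration and are not parameters of the circuits in this paper.

```python
import math


def cell_response(p2: float, ek_over_gamma: float) -> float:
    """Polarization of cell 1 induced by a neighbour with polarization p2.

    Two-state approximation (assumed model): P1 = x / sqrt(1 + x^2),
    with x = (E_k / (2 * gamma)) * P2. ek_over_gamma is an illustrative ratio.
    """
    x = ek_over_gamma * p2 / 2.0
    return x / math.sqrt(1.0 + x * x)


if __name__ == "__main__":
    # Sweep the driver polarization and print the induced polarization.
    for p2 in (-1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0):
        print(f"P2 = {p2:+.1f}  ->  P1 = {cell_response(p2, 20.0):+.3f}")
```

With this assumed ratio, a neighbour polarized to only 0.5 already drives cell 1 to a polarization of about 0.98, which is the strongly non-linear behaviour described in the text.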



Fig. 3. (a) Redundant inverter gate, (b) Inverter gate



Fig. 4. (a) Majority logic gate, (b) Binary wire, (c) Inverter chain

The physical interactions between cells can be used to realize elementary Boolean logic functions. The basic QCA logic gates are the majority gate and the inverter, illustrated in Fig. 4(a) and Fig. 3, respectively. A majority gate can be realized with only five QCA cells /5/.

A logic AND function can be implemented by a majority gate with one input permanently set to 0, and a logic OR function by a majority gate with one input permanently set to 1.
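The reduction of AND and OR to the three-input majority function can be checked on the full truth table. A minimal Boolean sketch (a logic-level model only, not a QCA cell layout; the function names are hypothetical):

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority: 1 if at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def qca_and(a: int, b: int) -> int:
    # AND: one majority input permanently fixed to 0.
    return majority(a, b, 0)


def qca_or(a: int, b: int) -> int:
    # OR: one majority input permanently fixed to 1.
    return majority(a, b, 1)


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(a, b, "AND:", qca_and(a, b), "OR:", qca_or(a, b))
```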

QCA clocking provides a mechanism for synchronizing the information flow through the circuit. The clock also controls the direction of information flow in a QCA circuit and supplies the power required for circuit operation. More precisely, the QCA clock controls the height of the tunneling barriers within the cells. When the clock is low, the intracellular barriers are held at their maximum height, so the electrons are trapped in their positions and cannot tunnel to other dots; the cell is latched (Hold phase). When the clock is high, the intracellular barriers are held at their minimum height and the cell goes to the null polarization state (Relax phase). Between these two cases, the cells are either releasing or switching.



Fig. 5. Barrier height in four phases of clock

Fig. 5 shows the barrier height in the four clock phases. All cells in a particular clocking zone are connected to the same one of the four available phases of the QCA clock, as shown in Fig. 6. The cells in a zone are latched and unlatched in synchronization with the changing clock signal, and in this way information propagates through the cells /6-9/.
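The four-phase scheme can be modelled in a few lines. The sketch assumes the common labelling of the phases as Switch, Hold, Release and Relax, with each clock zone lagging the previous one by exactly one phase (a quarter of a clock period); the function names are hypothetical. It shows that whenever zone k is in the Hold phase, zone k + 1 is switching and can therefore adopt the value latched in zone k.

```python
PHASES = ("Switch", "Hold", "Release", "Relax")


def zone_phase(zone: int, quarter: int) -> str:
    """Phase of clock zone `zone` at time `quarter` (in quarter clock periods).

    Assumption: zone k lags zone k - 1 by one phase (90 degrees).
    """
    return PHASES[(quarter - zone) % 4]


def transfer_possible(zone: int, quarter: int) -> bool:
    # Data can move from zone k to zone k + 1 while k holds and k + 1 switches.
    return zone_phase(zone, quarter) == "Hold" and zone_phase(zone + 1, quarter) == "Switch"


if __name__ == "__main__":
    for t in range(4):
        print(t, [zone_phase(z, t) for z in range(4)])
```

In this model a value advances by one zone per quarter clock period, so crossing four zones takes one full clock period.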



Fig. 6. QCA clock zones

## 3. Logic-Based Implementation of S-Box

We first chose a specific substitution function to implement. This 4×4 substitution function is given in Table 1, which lists the input bits S3 to S0 and the corresponding output bits O3 to O0.

Table 1 The input and output of the S-Box

| S3 | S2 | S1 | S0 | O3 | O2 | O1 | O0 |
|----|----|----|----|----|----|----|----|
| 0  | 0  | 0  | 0  | 0  | 0  | 1  | 1  |
| 0  | 0  | 0  | 1  | 1  | 0  | 0  | 0  |
| 0  | 0  | 1  | 0  | 1  | 1  | 1  | 1  |
| 0  | 0  | 1  | 1  | 0  | 0  | 0  | 1  |
| 0  | 1  | 0  | 0  | 1  | 0  | 1  | 0  |
| 0  | 1  | 0  | 1  | 0  | 1  | 1  | 0  |
| 0  | 1  | 1  | 0  | 0  | 1  | 0  | 1  |
| 0  | 1  | 1  | 1  | 1  | 0  | 1  | 1  |
| 1  | 0  | 0  | 0  | 1  | 1  | 1  | 0  |
| 1  | 0  | 0  | 1  | 1  | 1  | 0  | 1  |
| 1  | 0  | 1  | 0  | 0  | 1  | 0  | 0  |
| 1  | 0  | 1  | 1  | 0  | 0  | 1  | 0  |
| 1  | 1  | 0  | 0  | 0  | 1  | 1  | 1  |
| 1  | 1  | 0  | 1  | 0  | 0  | 0  | 0  |
| 1  | 1  | 1  | 0  | 1  | 0  | 0  | 1  |
| 1  | 1  | 1  | 1  | 1  | 1  | 0  | 0  |
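Table 1 can be written compactly as a 16-entry lookup table by reading each row's output bits O3 O2 O1 O0 as a hexadecimal digit. The sketch below (a software model; `SBOX` and `output_bit_pattern` are hypothetical names) extracts the column of each output bit for inputs 0 to F.

```python
# Table 1 as a lookup table: index = input S3S2S1S0, value = output O3O2O1O0.
SBOX = [0x3, 0x8, 0xF, 0x1, 0xA, 0x6, 0x5, 0xB,
        0xE, 0xD, 0x4, 0x2, 0x7, 0x0, 0x9, 0xC]


def output_bit_pattern(bit: int) -> str:
    """Output bit `bit` (3 = O3, ..., 0 = O0) for inputs 0..F, as a string."""
    return "".join(str((SBOX[x] >> bit) & 1) for x in range(16))


if __name__ == "__main__":
    for bit in (3, 2, 1, 0):
        print(f"O{bit}: {output_bit_pattern(bit)}")
```

The four strings produced are the O3 to O0 patterns quoted later for the exhaustive simulations, and every value 0 to F appears exactly once in `SBOX`, so the S-Box is a permutation.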

The name of an S-Box refers to the lengths of its input and output: here, 4 × 4 means that both the input and the output of the S-Box are 4 bits long. Naming the input bits A, B, C and D (A = S3 is the most significant bit and D = S0 the least significant) and the output bits O3, O2, O1 and O0, the following logic functions are extracted:

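$$\begin{aligned}
O3 &= \overline{B}\,\overline{C}D + \overline{A}\,\overline{B}C\overline{D} + \overline{A}B\overline{C}\,\overline{D} + BCD + ABC + A\overline{B}\,\overline{C} \\
O2 &= \overline{A}C\overline{D} + \overline{A}B\overline{C}D + ABCD + A\overline{C}\,\overline{D} + A\overline{B}\,\overline{C} + A\overline{B}\,\overline{D} \\
O1 &= \overline{C}\,\overline{D} + \overline{A}\,\overline{B}\,\overline{D} + \overline{A}BD + A\overline{B}CD \\
O0 &= \overline{A}\,\overline{B}\,\overline{D} + AB\overline{D} + \overline{A}C + A\overline{B}\,\overline{C}D
\end{aligned}$$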
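Sum-of-products forms of the four outputs can be checked exhaustively against Table 1 in software, in the same spirit as the exhaustive QCADesigner simulations below. This is a minimal sketch: it evaluates sum-of-products forms read off Table 1 (A = S3 as the most significant bit, D = S0 as the least significant) and lists every input for which they disagree with the table; the function names are hypothetical.

```python
# Table 1 as a lookup table: index = input S3S2S1S0, value = output O3O2O1O0.
SBOX = [0x3, 0x8, 0xF, 0x1, 0xA, 0x6, 0x5, 0xB,
        0xE, 0xD, 0x4, 0x2, 0x7, 0x0, 0x9, 0xC]


def sop_sbox(x):
    """Evaluate sum-of-products forms for O3..O0 (A = S3 is the MSB, D = S0 the LSB)."""
    a, b, c, d = (x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d  # complemented inputs
    o3 = ((nb & nc & d) | (na & nb & c & nd) | (na & b & nc & nd)
          | (b & c & d) | (a & b & c) | (a & nb & nc))
    o2 = ((na & c & nd) | (na & b & nc & d) | (a & b & c & d)
          | (a & nc & nd) | (a & nb & nc) | (a & nb & nd))
    o1 = (nc & nd) | (na & nb & nd) | (na & b & d) | (a & nb & c & d)
    o0 = (na & nb & nd) | (a & b & nd) | (na & c) | (a & nb & nc & d)
    return (o3 << 3) | (o2 << 2) | (o1 << 1) | o0


def mismatches():
    """Inputs (0..15) for which the expressions disagree with Table 1."""
    return [x for x in range(16) if sop_sbox(x) != SBOX[x]]


if __name__ == "__main__":
    print("mismatches:", mismatches())
```

An empty mismatch list means the four expressions reproduce the S-Box exactly.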

Fig. 7. O3 output implementation

The O3 output is implemented with QCA cells as illustrated in Fig. 7. The inputs are applied to the circuit through binary wires. Each product term of the logic function is built from two or three majority gates used as logic AND functions: two majority gates for a term with three inputs and three majority gates for a term with four inputs. The outputs of the AND functions are then ORed to produce the O3 output. An exhaustive simulation was performed for the O3 output, and its result is shown in Fig. 8. As illustrated, the O3 output is valid after eight clock cycles. The pattern "0110100111000011" in O3, corresponding to the inputs 0 to F, can be seen in Fig. 8.
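The gate count per product term follows a simple rule: an n-input term needs n - 1 two-input AND gates, each a majority gate with one input fixed to 0. The sketch applies this rule to the six O3 terms (four with three inputs, two with four inputs). It additionally assumes, hypothetically, that the terms are combined by a chain of two-input OR gates and that inverters and wire crossings are not counted, so its numbers are not the cell counts of Table 2.

```python
def majority_gate_count(literals_per_term):
    """Two-input majority gates needed for a sum of products (illustrative model).

    Assumptions: an n-literal product term uses n - 1 AND gates (majority with
    one input fixed to 0); the terms are combined by a chain of two-input OR
    gates (majority with one input fixed to 1); inverters and wires are not counted.
    """
    and_gates = sum(n - 1 for n in literals_per_term)
    or_gates = max(len(literals_per_term) - 1, 0)
    return and_gates, or_gates


# Literal counts of the six O3 product terms.
O3_TERMS = [3, 4, 4, 3, 3, 3]

if __name__ == "__main__":
    ands, ors = majority_gate_count(O3_TERMS)
    print(f"O3: {ands} AND gates, {ors} OR gates, {ands + ors} majority gates in total")
```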



Fig. 8. O3 output simulation result

The O2 output is also implemented with QCA cells, as illustrated in Fig. 9. The inputs are applied to the circuit through binary wires. Each product term is built from two or three majority gates used as logic AND functions: two majority gates for a term with three inputs and three majority gates for a term with four inputs. The outputs of the AND functions are then ORed to produce the O2 output.

An exhaustive simulation was also performed for the O2 output, and its result is shown in Fig. 10. As illustrated, the O2 output is valid after eight clock cycles. The pattern "0010011011101001" in O2, corresponding to the inputs 0 to F, can be seen in Fig. 10.



Fig. 9. O2 output implementation



Fig. 10. O2 output simulation result



Fig. 11. O1 output implementation



Fig. 12. O1 output simulation result

The O1 output is also implemented with QCA cells, as illustrated in Fig. 11. The inputs are applied to the circuit through binary wires. Each product term is built from one, two or three majority gates used as logic AND functions: one majority gate for a term with two inputs, two for a term with three inputs and three for a term with four inputs. The outputs of the AND functions are then ORed to produce the O1 output.

An exhaustive simulation was also performed for the O1 output, and its result is shown in Fig. 12. As illustrated, the O1 output is valid after six clock cycles. The pattern "1010110110011000" in O1, corresponding to the inputs 0 to F, can be seen in Fig. 12.

The O0 output is also implemented with QCA cells, as illustrated in Fig. 13. The inputs are applied to the circuit through binary wires. Each product term is built from one, two or three majority gates used as logic AND functions: one majority gate for a term with two inputs, two for a term with three inputs and three for a term with four inputs. The outputs of the AND functions are then ORed to produce the O0 output.



Fig. 13. O0 output implementation

An exhaustive simulation was also performed for the O0 output, and its result is shown in Fig. 14. As illustrated, the O0 output is valid after six clock cycles. The pattern "1011001101001010" in O0, corresponding to the inputs 0 to F, can be seen in Fig. 14.

Fig. 14. O0 output simulation result

The implementation results are summarized in Table 2, which gives the complexity (number of cells), area and delay of each output and of the whole S-Box. The maximum delay among the four output bits is taken as the delay of the S-Box. As in previous work /10, 11/, each QCA cell is assumed to be 18 nm wide and 18 nm long, and neighboring cells have a center-to-center distance of 20 nm.

Table 2 Implementation results of the S-Box

|                      | O3     | O2     | O1     | O0     | S0 S-Box |
|----------------------|--------|--------|--------|--------|----------|
| Complexity (cells)   | 1204   | 1205   | 758    | 798    | 3965     |
| Area (µm²)           | 1.7236 | 1.7236 | 1.1532 | 1.1532 | 5.7536   |
| Delay (clock cycles) | 8      | 8      | 6      | 6      | 8        |
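The S-Box column of Table 2 can be reproduced from the per-output columns: the cell counts and areas add up, and the delay is the maximum over the four outputs. A short arithmetic check (the variable names are hypothetical):

```python
# Per-output figures from Table 2.
cells = {"O3": 1204, "O2": 1205, "O1": 758, "O0": 798}
area_um2 = {"O3": 1.7236, "O2": 1.7236, "O1": 1.1532, "O0": 1.1532}
delay_clocks = {"O3": 8, "O2": 8, "O1": 6, "O0": 6}

total_cells = sum(cells.values())                   # column total of cell counts
total_area_um2 = round(sum(area_um2.values()), 4)   # column total of areas
sbox_delay = max(delay_clocks.values())             # slowest output sets the S-Box delay

if __name__ == "__main__":
    print(total_cells, total_area_um2, sbox_delay)
```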

## 4. Conclusion

We have implemented a specific 4×4 S-Box, with four input bits and four output bits, using QCA technology. Every output was implemented and exhaustively simulated; the simulation results of all outputs are shown in Fig. 8, Fig. 10, Fig. 12 and Fig. 14.

The same procedure can be used to implement and simulate an S-Box of any input and output length, although other implementation methods also exist.

## References

- /1/ C. S. Lent, P. D. Tougaw, W. Porod, G. H. Bernstein, "Quantum Cellular Automata," Nanotechnology, vol. 4, no. 1, 1993, pp. 49–57.
- /2/ National Bureau of Standards, "Data Encryption Standard," NBS FIPS PUB 46, 1976.
- /3/ John A. Clark, Jeremy L. Jacob, Susan Stepney, "The Design of S-Boxes by Simulated Annealing," International Conf. on Evolutionary Computation, Portland, OR, USA, IEEE, 2004, pp. 1533–1537.
- /4/ P. D. Tougaw, C. S. Lent, "Dynamic Behavior of Quantum Cellular Automata," J. Appl. Phys., vol. 80, no. 8, Oct. 1996, pp. 4722–4735.
- /5/ P. D. Tougaw, C. S. Lent, W. Porod, "Bistable Saturation in Coupled Quantum-dot Cells," J. Appl. Phys., vol. 74, no. 5, Sep. 1993, pp. 3558–3565.
- /6/ P. D. Tougaw, C. S. Lent, "Logical Devices Implemented Using Quantum Cellular Automata," J. Appl. Phys., vol. 75, no. 3, 1994, pp. 1818–1825.
- /7/ K. Hennessy, C. S. Lent, "Clocking of Molecular Quantum-dot Cellular Automata," J. Vac. Sci. Technol., vol. 19, no. 5, Sep. 2001, pp. 1752–1755.
- /8/ C. S. Lent, Beth Isaksen, "Clocked Molecular Quantum-dot Cellular Automata," IEEE Trans. on Electron Devices, vol. 50, no. 9, Sep. 2003.
- /9/ M. A. Amiri, M. Mahdavi, S. Mirzakuchaki, "QCA Implementation of a Mux-Based FPGA CLB," Proc. of International Conf. on Nanoscience and Nanotechnology, Australia, Feb. 2008, pp. 141–144.
- /10/ Heumpil Cho, Earl E. Swartzlander, "Adder Designs and Analyses for Quantum-Dot Cellular Automata," IEEE Trans. on Nanotechnology, vol. 6, no. 3, May 2007, pp. 374–383.
- /11/ Heumpil Cho, Earl E. Swartzlander, "Adder and Multiplier Design in Quantum-Dot Cellular Automata," IEEE Trans. on Computers, vol. 58, no. 6, June 2009, pp. 721–727.

Mohammad Amin Amiri E. E. Department, Iran University of Science and Technology, Tehran, Iran amiri@ee.iust.ac.ir

Sattar Mirzakuchaki E. E. Department, Iran University of Science and Technology, Tehran, Iran m\_kuchaki@iust.ac.ir

Mojdeh Mahdavi Department of Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran m.mahdavi@ieee.org

Prispelo (Arrived): 24.11.2009 Sprejeto (Accepted): 09.09.2010

# REAL-TIME KEYSTONE CORRECTION OF VIDEO IMAGE USING FPGA

Zmago Jereb<sup>1</sup>, Janez Diaci<sup>2</sup>

<sup>1</sup> Kolektor Group d.o.o., Vojkova 10, 5280 Idrija, Slovenia <sup>2</sup> University of Ljubljana, Faculty of Mechanical Engineering, Ljubljana, Slovenia

Key words: real-time image processing; keystone deformation; FPGA

**Abstract:** In this paper we present a method for geometric keystone correction of a video image using digital image processing. The method is divided into image deformation analysis and real-time image correction. The analysis is based on comparing a projected reference image with its original. The transformation is formulated as a sequence of simple operations that can be implemented for real-time execution using existing hardware technology. An implementation using a field-programmable gate array (FPGA) demonstrates the feasibility of real-time image transformation. The implementation aims to minimize the hardware footprint and to allow the use of a low-cost FPGA device.

# Digital geometric correction of keystone distortion of a projected video image in real time using FPGA

Key words: real time; image processing; keystone deformation; FPGA

**Abstract:** This work presents a method that enables digital geometric correction of keystone distortion of a projected video image. The method is divided into analysis of the distortion of the displayed image and its correction in real time. The deformation analysis is based on a reference image whose original is compared with the projected image. The digital image correction comprises a sequence of mathematical operations designed for implementation in existing real-time image processing systems. The work presents an implementation using a field-programmable gate array (FPGA) integrated circuit, which demonstrates the capabilities of the method. The implementation emphasizes minimizing the required logic functions, which makes it possible to use smaller and cheaper FPGA integrated circuits.

## 1. Introduction

Keystone deformation is a common geometric distortion of a projected video image. A typical installation of a video projector under the ceiling, with its optical axis aimed at an arbitrary vertical angle to the projection screen, causes a vertical keystone deformation of the projected image. Common video projection systems incorporate built-in correction based on lens displacement /1/, /2/. The alternative to optical correction is digital image correction, which remaps the pixels of the original image to a pre-warped image in such a way that the projected image appears undistorted.

The available literature addresses these geometric transformations using graphics card libraries (e.g. OpenGL) /3/, /4/ or dedicated hardware designs.

Designs using field-programmable gate array (FPGA) devices have become common practice in real-time image processing. Reprogrammable logic and the capability for parallel processing make FPGAs well suited to real-time image processing /5/, /6/, /7/, /8/, /9/. Such methods often require a relatively large silicon footprint and external high-bandwidth data buffers because of the large amount of image data that must be processed in a short time /9/.

In this paper we present an FPGA-based digital image processing method that allows vertical keystone correction of the projected video image in real time. The implementation is designed to minimize the FPGA resources and other supporting hardware in order to achieve a low-cost solution. The algorithm is based on a comparison between reference points taken from the original image and the corresponding points projected on the projection screen. The experiments presented demonstrate the potential of the method to remove keystone deformation as well as more complex horizontal warping of the image.

## 2. Method

The presented method is shown schematically in Fig. 1. The basic video projection set-up (a multimedia player, a video projector and a projection screen) is augmented by three additional components: a camera, an image comparator and an image processor. The image processor, placed between the multimedia player and the projector, receives the original video image $I_{in}$, corrects it geometrically in real time and sends the corrected image $I_{out}$ to the projector.

The function of the camera and the image comparator is to determine the parameters $c_{ik}$ of the correction algorithm. The transformation addresses vertical keystone distortions and is therefore applied to the image in the horizontal direction only.

Fig. 1: Schematic representation of the method.

The reference image (RI) and its projected image (PRI) show an orthogonal lattice of nine equally spaced horizontal and vertical lines, as illustrated in Fig. 2. The intersections of the outermost left and right vertical lines with the nine horizontal lines give the coordinates ($x_{ij}$, $y_{ij}$) and ($u_{ij}$, $v_{ij}$) for the RI and PRI, respectively, where i = 0, ..., 8 indexes the horizontal lines and j = 0, 1 the left and right vertical lines.



Fig. 2: Reference points on the original image.

In general, the coordinate systems (x, y) and (u, v) do not match. Therefore, the $u_{ij}$ values are translated and scaled to match their $x_{ij}$ counterparts before the $c_{ik}$ parameters are calculated. The following equations assume an RI of 1024 × 768 pixels (XGA).

$$u_{ij}^{\prime} = \frac{u_{ij} - \max_{0 \le l \le 8} (u_{l0})}{\min_{0 \le l \le 8} (u_{l1}) - \max_{0 \le l \le 8} (u_{l0})} \cdot 1023 \tag{1}$$

To simplify the notation, the calibrated values $u'_{ij}$ from Eq. (1) are denoted $u_{ij}$ from here on.
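Eq. (1) maps the innermost left-edge point to column 0 and the innermost right-edge point to column 1023. A minimal sketch, assuming the projected reference x-coordinates are held in a hypothetical 9 × 2 list `u[l][j]` (j = 0 for the left line, j = 1 for the right line):

```python
def calibrate(u, width=1024):
    """Eq. (1): translate and scale projected x-coordinates to the RI frame.

    u is a hypothetical 9 x 2 list: u[l][0] is the left-edge point and
    u[l][1] the right-edge point on horizontal line l (l = 0..8).
    """
    left = max(row[0] for row in u)    # innermost left-edge point -> column 0
    right = min(row[1] for row in u)   # innermost right-edge point -> column width - 1
    scale = (width - 1) / (right - left)
    return [[(value - left) * scale for value in row] for row in u]
```

Points lying outside the innermost edges map to values below 0 or above 1023.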

The transformation functions u(x,y) and v(x,y) are defined as a collection of bilinear functions $u_i(x,y)$ and $v_i(x,y)$, each defined on a domain bounded by four neighboring reference points: $x_{i0} < x < x_{i1}$ and $y_i < y < y_{i+1}$, where $y_i = y_{i0} = y_{i1}$ is the y-coordinate of horizontal line i. The horizontal functions are defined as:

$$u_{i}(x, y) = \begin{bmatrix} c_{i0} & c_{i1} & c_{i2} & c_{i3} \end{bmatrix} \begin{bmatrix} 1 \\ x \\ y - y_{i} \\ x(y - y_{i}) \end{bmatrix} \tag{2}$$

where the parameters  $c_{ik}$  are defined from the intersection coordinates  $x_{ij}$ ,  $y_{ij}$  and their projected counterparts  $u_{ij}$ :

$$\begin{aligned} c_{i0} &= u_{i0} \\ c_{i1} &= \left(u_{i1} - u_{i0}\right) / \left(x_{i1} - x_{i0}\right) \\ c_{i2} &= \left(u_{i+1,0} - u_{i0}\right) / \left(y_{i+1,0} - y_{i0}\right) \\ c_{i3} &= \left(u_{i0} - u_{i+1,0} - u_{i1} + u_{i+1,1}\right) / \left(\left(x_{i1} - x_{i0}\right) \left(y_{i+1,0} - y_{i0}\right)\right) \end{aligned} \tag{3}$$
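Eqs. (2) and (3) can be sketched for a single patch i as follows. The function names are hypothetical, and the sketch follows Eq. (3) literally, so it reproduces the four corner points exactly only when $x_{i0} = 0$, i.e. when the left reference line lies on the first image column; this is an assumption of the sketch, and the test values are illustrative.

```python
def patch_coefficients(x0, x1, y0, y1, u00, u01, u10, u11):
    """Eq. (3) for one patch i.

    x0, x1:   reference x of the left/right points (x_i0, x_i1)
    y0, y1:   y of horizontal lines i and i + 1 (y_i0, y_{i+1,0})
    u00, u01: projected u_i0, u_i1; u10, u11: projected u_{i+1,0}, u_{i+1,1}
    """
    dx, dy = x1 - x0, y1 - y0
    c0 = u00
    c1 = (u01 - u00) / dx
    c2 = (u10 - u00) / dy
    c3 = (u00 - u10 - u01 + u11) / (dx * dy)
    return c0, c1, c2, c3


def u_patch(c, x, y, y_i):
    """Eq. (2): bilinear u_i(x, y) with dy = y - y_i."""
    c0, c1, c2, c3 = c
    dy = y - y_i
    return c0 + c1 * x + c2 * dy + c3 * x * dy
```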

The vertical transformation functions are defined as identity transformations, in accordance with the above assumption of a vertical keystone deformation:

$$v_i(x, y) = y + y_i \tag{4}$$

In the final corrective transformation of the method, the pixel intensity values of the input image are remapped using the inverse mapping algorithm /10/, which maps the pixel intensity values of $I_{in}$ to the corresponding pixel intensity values of $I_{out}$:

$$I_{out}[x,y] = I_{in}[\lfloor u \rfloor, y] + \{u\}\left(I_{in}[\lceil u \rceil, y] - I_{in}[\lfloor u \rfloor, y]\right) \tag{5}$$

where $\{z\} = z - \lfloor z \rfloor$, $\lfloor z \rfloor$ and $\lceil z \rceil$ denote the fractional part, floor and ceiling functions, respectively. The symbols *u* and *v* are shorthand notations for the transformation functions $u(x,y)$ and $v(x,y)$, respectively.

The above transformation also performs image filtering based on linear interpolation between neighboring pixel intensity values. The filter improves the quality of the transformed image by reducing the aliasing effect of the transformation. It is applied separately to each color channel (red, green, blue) of the image ($I_{out,r}$, $I_{out,g}$, $I_{out,b}$).

## 3. Implementation

A schematic organization of the implementation is shown in Fig. 3. The implementation is designed to apply horizontal geometric correction to a 24-bit XGA video image (1024 pixels wide, 768 pixels tall) at 60 fps. The transformation parameters *c<sub>ik</sub>* are calculated on a personal computer using Eq. (3) and transmitted to the image processor over an RS-232 serial link. The Digital Visual Interface (DVI) serial protocol is handled by two interface circuits (Texas Instruments TFP101 and TFP410), which convert the fast serial DVI data to parallel data and vice versa. The image processor algorithm is implemented on a *Digilent Inc. Spartan-3 starter board* based on the Xilinx XC3S200 FPGA. The FPGA algorithms were developed in the Xilinx ISE development environment using the VHDL hardware description language.

The video image data are transformed line by line, starting from the top-left corner of the image. Because the correction is applied horizontally, it coincides with the image data flow, so only two image lines need to be stored at a time. While buffer1 receives a new image line, buffer2 provides the previous image line to the image correction block. When buffer1 is full and the transformation of the previous line is complete, the two buffers swap their roles.

Fig. 3: Image processor block diagram.
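The two-buffer scheme can be illustrated with a small behavioural model. This is a software sketch, not the VHDL design; the class and function names are hypothetical, and the swap is modelled as happening once per line. It reproduces the one-line latency of the scheme: line n is corrected while line n + 1 is being received.

```python
class PingPongLineBuffer:
    """Software model of two alternating line buffers (names are hypothetical).

    One buffer receives the incoming line while the other is read by the
    correction stage; the roles swap at the end of every line.
    """

    def __init__(self, width):
        self.buffers = [[0] * width, [0] * width]
        self.write_idx = 0  # index of the buffer currently being filled

    def write_line(self, pixels):
        if len(pixels) != len(self.buffers[0]):
            raise ValueError("line length does not match buffer width")
        self.buffers[self.write_idx][:] = pixels

    def read_line(self):
        # Return a copy so later writes cannot alter lines already handed out.
        return list(self.buffers[1 - self.write_idx])

    def swap(self):
        self.write_idx = 1 - self.write_idx


def process_frame(lines, correct):
    """Stream a frame line by line; each output line lags its input by one line."""
    buf = PingPongLineBuffer(len(lines[0]))
    out = []
    for n, line in enumerate(lines):
        buf.write_line(line)                      # fill the write buffer with line n
        if n > 0:
            out.append(correct(buf.read_line()))  # meanwhile correct line n - 1
        buf.swap()
    out.append(correct(buf.read_line()))          # flush the last line
    return out
```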

The transform function (Eq. (2)) is evaluated as a series of additions, as shown in Eq. (6), where $\Delta y$ denotes $y - y_i$. For the terms that contain a multiplication, linearity is exploited so that only an addition is needed when x or $\Delta y$ is incremented. The variables and the result are truncated to the 24 most significant bits.

$$u_{i}(x, y) = c_{i0} + \left(c_{i1} + c_{i3}\,\Delta y\right)x + c_{i2}\,\Delta y \tag{6}$$
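The addition-only evaluation of Eq. (6) can be sketched as a forward-difference loop: the per-line slope $c_{i1} + c_{i3}\Delta y$ and offset $c_{i0} + c_{i2}\Delta y$ are updated by adding $c_{i3}$ and $c_{i2}$ when $\Delta y$ increments, and u advances by the slope when x increments. The sketch assumes x starts at 0 in each patch and uses floating point, whereas the hardware truncates to 24 bits; the function name is hypothetical.

```python
def u_patch_incremental(c, rows, width):
    """Evaluate Eq. (6) for dy = 0..rows-1 and x = 0..width-1 using only
    additions inside the loops (floating point here, not 24-bit fixed point).

    c = (c0, c1, c2, c3) are the patch coefficients of Eq. (3).
    """
    c0, c1, c2, c3 = c
    slope = c1   # c1 + c3*dy, starts at dy = 0
    start = c0   # c0 + c2*dy, value at x = 0
    grid = []
    for _ in range(rows):
        u = start
        row = []
        for _ in range(width):
            row.append(u)
            u += slope        # x -> x + 1 adds c1 + c3*dy
        grid.append(row)
        slope += c3           # dy -> dy + 1 adds c3 to the slope
        start += c2           # ... and c2 to the line offset
    return grid
```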

The remapping algorithm with the linear-interpolation anti-aliasing filter (Eq. (5)) requires one hardware multiplier per color channel. The filter uses the two neighboring pixel intensity values at $\lfloor u \rfloor$ and $\lceil u \rceil$ from the image line buffer. The image pixels are stored in the buffer in pairs (...; $\{I_{in}(x_i), I_{in}(x_{i+1})\}$; $\{I_{in}(x_{i+1}), I_{in}(x_{i+2})\}$; ...). Doubling the data in this way simplifies the calculation and speeds up the transformation: a single buffer access provides both values needed to calculate the interpolated value of $I_{out}$.

The result of the transformation function $u_i$ (Eq. (6)) may fall outside the image range $0 < u(x,y) < 1024$. For these pixels, Eq. (5) is bypassed and the intensity value is set to black ($I_{out,r,g,b}(x,y) = (0, 0, 0)$), which represents the background color.
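Eq. (5), the paired storage and the black background can be combined in a short software model of one output line. This is a sketch under stated assumptions, not the FPGA datapath: it works in floating point, treats u in [0, width - 1] as inside the image, and repeats the last pixel in the final stored pair; the function names are hypothetical.

```python
import math

BLACK = (0, 0, 0)


def paired_buffer(line):
    """Store pixel pairs {I(x_k), I(x_k+1)} so one read yields both neighbours.

    Assumption: the last entry pairs the final pixel with itself.
    """
    last = len(line) - 1
    return [(line[k], line[min(k + 1, last)]) for k in range(len(line))]


def remap_line(line, u_values):
    """Eq. (5): linear interpolation between floor(u) and ceil(u), per channel.

    line: list of (r, g, b) tuples; u_values: u(x, y) for every output x.
    Pixels whose u falls outside the image are set to black.
    """
    pairs = paired_buffer(line)
    width = len(line)
    out = []
    for u in u_values:
        if not 0 <= u <= width - 1:
            out.append(BLACK)               # outside the image: background colour
            continue
        k = math.floor(u)
        frac = u - k                        # fractional part {u}
        left, right = pairs[k]              # single buffer access
        out.append(tuple(l + frac * (r - l) for l, r in zip(left, right)))
    return out
```

When u is an integer, the weight {u} is 0 and the output equals $I_{in}[\lfloor u \rfloor]$, as Eq. (5) requires.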

The transformation process is synchronized with the input image data flow and has a latency of one image line. The presented FPGA implementation uses 455 slices, three 18 × 18-bit hardware multipliers, 7 Block RAMs (126 kb) and 58 I/O pins.
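A back-of-the-envelope check of these figures follows. It rests on stated assumptions: both line buffers store 1024 pixel pairs of 2 × 24 bits, and a Spartan-3 block RAM holds 18 Kbit including parity. It ignores how the buffers are actually mapped onto block RAM widths and any other memory the design may use.

```python
# Illustrative resource arithmetic for the XGA design (assumptions noted inline).
WIDTH, HEIGHT, FPS = 1024, 768, 60
BITS_PER_PIXEL = 24
BRAM_BITS = 18 * 1024             # one Spartan-3 block RAM: 18 Kbit including parity

# Active pixels per second; blanking intervals are ignored, so the real pixel clock is higher.
active_pixel_rate = WIDTH * HEIGHT * FPS

# Assumption: each line buffer holds WIDTH entries of one pixel pair (2 x 24 bits).
line_buffer_bits = WIDTH * 2 * BITS_PER_PIXEL
two_line_buffers_bits = 2 * line_buffer_bits

reported_bram_bits = 7 * BRAM_BITS  # the 7 block RAMs quoted in the text

if __name__ == "__main__":
    print(f"active pixel rate: {active_pixel_rate / 1e6:.1f} Mpixel/s")
    print(f"two paired line buffers: {two_line_buffers_bits} bits")
    print(f"7 block RAMs: {reported_bram_bits} bits ({reported_bram_bits // 1024} Kbit)")
```

Under these assumptions the two paired buffers need 98,304 bits, which fits within the 129,024 bits (126 Kbit) of the seven block RAMs, and the design must sustain an active pixel rate of about 47.2 Mpixel/s.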

## 4. Results and discussion

The experimental set-up used to validate the method and its implementation comprises a Hitachi ED-X12 video projector, a Canon EOS 350D digital camera with an EF-S 18-55 mm lens, a personal computer and the image processor.

The results are illustrated in Fig. 4. The first experiment tested a keystone deformation: the deformed and corrected PRI images are shown in Figs. 4a and 4b, respectively. The second experiment, illustrated in Figs. 4c and 4d, tested the ability of the method to correct warped image deformations. The warped deformation was produced by slightly warping the bottom of the projection screen.



Fig. 4: Keystone deformed (a) and corrected image (b) and warped screen deformation (c) and its corrected image (d).

The measured error values are presented in Table 1 which shows the displacements of the reference points obtained from the deformed image and the residual displacements after applying the correction.

In both cases, the correction greatly reduces the error. The residual error does not exceed 2 pixels for the keystone correction and 4 pixels for the warped image correction.

The results demonstrate that the method enables a significant reduction of horizontal image distortion in real time. The method is therefore applicable to the correction of the vertical keystone deformations frequently present in projection systems, as well as to mildly horizontally warped images, which can be handled thanks to the multiple reference points along the vertical image borders.

The drawback of the presented method is the change of aspect ratio caused by horizontal shrinkage of the corrected image. The method is therefore limited to mild deformations that do not noticeably change the aspect ratio.

Table 1: Reference point displacements [pix].

| i | Keystone, original $x_{i0}$ | Keystone, original $x_{i1}$ | Keystone, corrected $u_{i0}$ | Keystone, corrected $u_{i1}$ | Warped, original $x_{i0}$ | Warped, original $x_{i1}$ | Warped, corrected $u_{i0}$ | Warped, corrected $u_{i1}$ |
|---|----|----|---|---|---|----|---|---|
| 0 | 72 | 64 | 0 | 0 | 0 | 0  | 2 | 0 |
| 1 | 63 | 57 | 0 | 0 | 1 | 3  | 2 | 0 |
| 2 | 53 | 49 | 0 | 0 | 3 | 5  | 3 | 0 |
| 3 | 44 | 40 | 0 | 2 | 4 | 7  | 2 | 1 |
| 4 | 36 | 33 | 0 | 0 | 6 | 9  | 2 | 2 |
| 5 | 27 | 25 | 0 | 0 | 6 | 11 | 2 | 2 |
| 6 | 18 | 16 | 0 | 0 | 7 | 12 | 2 | 3 |
| 7 | 9  | 8  | 1 | 0 | 7 | 12 | 0 | 4 |
| 8 | 0  | 0  | 1 | 0 | 0 | 8  | 0 | 2 |
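The residual errors quoted in the discussion can be read off Table 1 as the largest displacement in each group. A short check with the table values transcribed (the variable names are hypothetical):

```python
# Displacements [pix] from Table 1, rows i = 0..8, columns (j = 0, j = 1).
keystone_original = [(72, 64), (63, 57), (53, 49), (44, 40), (36, 33),
                     (27, 25), (18, 16), (9, 8), (0, 0)]
keystone_corrected = [(0, 0), (0, 0), (0, 0), (0, 2), (0, 0),
                      (0, 0), (0, 0), (1, 0), (1, 0)]
warped_original = [(0, 0), (1, 3), (3, 5), (4, 7), (6, 9),
                   (6, 11), (7, 12), (7, 12), (0, 8)]
warped_corrected = [(2, 0), (2, 0), (3, 0), (2, 1), (2, 2),
                    (2, 2), (2, 3), (0, 4), (0, 2)]


def max_abs(rows):
    """Largest absolute displacement over all reference points."""
    return max(abs(v) for row in rows for v in row)


if __name__ == "__main__":
    print("keystone:", max_abs(keystone_original), "->", max_abs(keystone_corrected))
    print("warped:  ", max_abs(warped_original), "->", max_abs(warped_corrected))
```

The maxima, 72 and 12 pixels before correction versus 2 and 4 pixels after, agree with the residual errors stated in the text.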

By extending the reference point lattice over the whole image, the method could also be used to address specific horizontal image deformations introduced by an omnidirectional display /11/, /12/.

The presented work focuses on real-time image transformation, whereas the deformation analysis is performed off-line. Our future work will focus on real-time detection of image deformation, which would allow online adaptation of the image correction parameters.

## 5. Conclusion

We have presented a method for real-time correction of horizontal geometric distortion of a video image based on digital image processing. The algorithms are divided into image deformation analysis, implemented on a personal computer, and real-time image processing, implemented on an FPGA. The presented implementation allows the use of a small, low-cost FPGA device.

The experimental validation of the method has been performed on a keystone deformed image and on a warped image. The results show that the method can significantly improve the geometry of an image.

The limitation of the proposed method is the horizontal shrinkage of the corrected image, which changes the aspect ratio. For severely distorted images, a two-dimensional correction would therefore be more appropriate.

## 6. References

- /1/ A. Stolov, "Projector system including keystone correction", U.S. Patent 5706062, 1998.
- /2/ J.R. Biles, G.B. Kingsley, A.R. Conner, "Method and apparatus for distortion correction in optical projectors", U.S. Patent 5355188, 1994.
- /3/ M. Brown, W. Seales, "A Practical and Flexible Tiled Display System", Proceedings of the 10th Pacific Conference on Computer Graphics and Applications, 194 - 203, Oct. 2002.
- /4/ J.P. Tardif, S.Roy, M. Trudeau, "Multi-projectors for arbitrary surfaces without explicit calibration nor reconstruction", Proceedings of the Fourth International Conference on 3-D Digital Imaging and Modeling, 217 – 224, 2003.
- /5/ N. Sedcole, P. Cheung, G. Constantinides, W. Luk, "A Reconfigurable Platform for Real-Time Embedded Video Image Processing", Field-Programmable Logic and Applications, 606-615, 2003.
- /6/ P. Sedcole, B. Blodget, T. Becker, J. Anderson, P. Lysaght, "Modular dynamic reconfiguration in Virtex FPGAs", IEE Proceedings on Computers and Digital Techniques, vol. 153, 157-164, 2006.
- /7/ F. Kopač, A. Trost, "A systematic approach to real-time image segmentation in FPGA devices", Inf. MIDEM, vol. 35, No. 1, 13-19, 2005.
- /8/ A. Žemva, A. Trost, B. Zajc, "Educational programmable system for prototyping digital circuits", Int. J. of Electrical Engineering Education, 1998, Vol. 35, No. 3, pp. 236-244.
- /9/ D. Eadie, F.P. Shevlin, A. Nisbet, "Correction of geometric image distortion using FPGAs", Proceedings of Opto-Ireland 2002: optical metrology, imaging and machine vision, 28-37, June 2003.
- /10/ K.R. Castleman, "Digital image processing", Prentice Hall, 1996.
- /11/ D. Jurjavčič, "Device providing picture visibility from all sides," U.S. Patent 6460278, 2002.
- /12/ J. Babic, "Device providing simultaneous visibility of images within the area of 360° around itself," U.S. Patent 20060179693, 2003.

Zmago Jereb Kolektor Group d.o.o. Vojkova 10, 5280 Idrija, Slovenia zmago.jereb@kolektor.si

Janez Diaci University of Ljubljana, Faculty of Mechanical Engineering, Aškerčeva 6, Ljubljana, Slovenia

Prispelo (Arrived): 17.01.2010 Sprejeto (Accepted): 09.09.2010

# MICROFLOW GENERATOR FOR FUEL CELL METHANOL HYDROGEN MICROREACTOR

B. Pečar, M. Možek, D. Resnik, D. Vrtačnik, U. Aljančič, S. Penič and S. Amon Laboratory of Microsensor Structures and Electronics (LMSE), Faculty of Electrical Engineering, University of Ljubljana, Ljubljana, Slovenia

Key words: flow generator, methanol hydrogen microreactor, micropump, PID regulation, microchip, MPC control

Abstract: The design, fabrication and characterization of a low-cost, energy-efficient flow generator for hydrogen production in microreactors are presented. Design guidelines for three approaches based on a micropump and a pressure sensor are discussed in detail, followed by a description of the experimental setup and the system characterization. A comparative analysis of all three approaches was performed, and the most appropriate solution for a methanol hydrogen microreactor is proposed. Owing to its adequate energy efficiency and stability, the approach based on differential pressure sensing supported by PID control was found to be the most appropriate, which was further confirmed by successfully running microreactor applications.

# Dosing System for a Micro Fuel Processor

Key words: flow generator, methanol/hydrogen microreformer, micropump, PID regulation, microchip, MPC controller

Abstract: This paper presents three approaches to an efficient microdosing system intended for fuel dosing in fuel cells. The emphasis is on the design and fabrication of macro-scale prototypes, followed by testing, which is essential for an objective selection of the most suitable method for our application. All methods combine a micropump, a pressure sensor and a controller. We have shown that a reliable dosing system can be realized with these elements, avoiding expensive and bulky commercial flowmeters. The method based on measuring the differential pressure across an orifice plate proves to be the most stable and efficient.

## 1. Introduction

Precise dosing of substances is becoming an important part of many microfluidic devices, for example in biotechnology. This trend is driven by the advantageous fluidic behavior at small volumes and by the ever-increasing need for high-throughput assays in the pharmaceutical and chemical industries, as well as in other combinatorial studies.

Integrating conventional techniques onto microchip platforms would provide a substantially faster, less labor-intensive and less expensive approach, both in terms of operating cost and of reagent consumption, since only small volumes are used. This allows such platforms to be introduced into clinical settings as powerful analytical tools, for example for disease detection and monitoring. Many molecular-biology techniques require precise substance supply systems to support successful and accurate biochemical reactions /1/.

Approaches similar to those in the pharmaceutical and chemical fields /2, 3/ can also be applied to microreactor hydrogen production for micro fuel cells, where reliable and efficient fuel dosing is crucial for proper operation. Methanol steam reforming is the chemical reaction between methanol and water vapor that produces hydrogen gas. It is typically carried out over metal oxide catalysts at temperatures from 220 °C to 300 °C /4/.
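For orientation, the overall methanol steam reforming reaction is usually written as shown below (textbook stoichiometry rather than a result of this work; side reactions such as methanol decomposition to CO are neglected). Ideally, each mole of methanol reformed with one mole of water yields three moles of hydrogen:

$$\mathrm{CH_3OH} + \mathrm{H_2O} \rightarrow \mathrm{CO_2} + 3\,\mathrm{H_2}$$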

The main objective of the present work was to assemble a low-cost, efficient fuel supply system for a hydrogen production microreactor. Suppliers of such microsystems are still scarce, because the requirements are very specific and depend on the customer application.

Some quantities are rather easy to measure, whereas others, such as flow rate, are difficult to measure at the micro level. Measuring and controlling a highly dynamic fluid flow at extremely low magnitudes is still a challenging task. To avoid expensive and usually bulky professional low-flow meters, we based the flow-generator realization on a pressure sensor.

This paper presents three different approaches to meeting the flow-generator design requirements dictated by a fuel-cell methanol hydrogen microreactor. The design, optimization, fabrication, analysis and performance of the proposed approaches are described in detail, together with their potential applicability.

## 2. Flow-generator design requirements

A flow generator is an apparatus that supplies a defined amount of fluid into a system, independently of the system's dynamic hydraulic resistance and other conditions.

A classic flow generator comprises a pump driven by an appropriate controller and a fluid flowmeter. The measured flow rate through the supplied system is fed back and compared with the desired set point, and a closed-loop controller corrects the difference between the set point and the measured flow rate. Ideally, the supplied flow rate is independent of the system's hydraulic resistance. Many different approaches can be applied to fluid flow measurement /5/.

The methanol hydrogen microreactor is composed of several basic modules. The first module is the evaporator, which consists of microchannels etched into the surface of a silicon substrate, with a platinum heater /7/ located on the opposite side of the substrate. The evaporator is thus thermally coupled to the heater, which enables evaporation of the input fuel, a methanol and water mixture. Because evaporation is a stochastic process, micro-droplet explosions vary randomly in frequency and magnitude. The evaporator can therefore be modeled as a highly dynamic hydraulic resistor whose resistance depends primarily on the heater temperature, the microchannel geometry, the fuel mixture concentration ratio and, above all, the input fuel supply rate.

Our primary flow-generator design requirement is therefore to achieve a constant output flow, independent of the stochastic pressure impulses caused by micro-droplet explosions in the evaporator.

Another important design requirement is the output flow range. To provide efficient fuel evaporation, our application requires a constant fuel supply anywhere in the interval from 0.1 ml/h up to 1 ml/min. This low flow range is rather unusual, but it is nowadays often encountered in microchannel applications.
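Expressed in a single unit (a plain conversion of the limits above), the required range spans a factor of 600, i.e. almost three decades, which one flow generator has to cover:

$$1\ \mathrm{ml/min} = 60\ \mathrm{ml/h}, \qquad \frac{60\ \mathrm{ml/h}}{0.1\ \mathrm{ml/h}} = 600$$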

Professional low-flow meters are available on the market, but they were found inappropriate for a microreactor flow generator because of their price and size. Most low-flow meters for microreactor applications are based on the thermal mass method, which requires expensive, power-consuming, custom-designed electronic circuits. We therefore adopted the idea of a microdosing system based on a pressure sensor. Furthermore, the pressure sensor data can provide additional information about the quality of the evaporation process. Further investigation could also address defining and controlling the fuel supply rate by processing the pressure sensor data.

## 3. Approaches

### 3.1. Method I – Model Predictive Control (MPC)

To provide a constant fuel flow through the microreactor, an unconventional approach was employed. The basic idea was to equip the controlled system (a micropump in line with the evaporator) with an internal mathematical model, so that the fluid flow rate could be controlled using only the measured stagnation pressure *p* before the evaporator and the applied micropump actuating voltage *U*. To achieve this, the modeled internal variable (the flow rate) was fed back and compared with the reference flow rate, and the controller output set the micropump supply voltage *U* (Fig. 1).



Fig. 1. Micro-flow generator with the MPC approach.

Model Predictive Control (MPC) is an advanced process control method that has been used in the process industries since the 1980s /8/. MPC controllers rely on dynamic models of the processes, most often linear empirical models obtained by system identification. The models are used to predict the behavior of the dependent variables (i.e. outputs) of the modeled dynamical system in response to changes in the independent process variables (i.e. inputs). The MPC controller uses the models and current measurements to calculate future moves of the independent variables, so that operation honors the constraints on both the independent and the dependent variables.

For the MPC method, a simple mathematical model of the micropump (Fig. 2, block C) was introduced.

The simulated flow $\Phi$ is obtained from the actual micropump driving voltage $U$ and the stagnation pressure $p$ measured before the evaporator. To a first approximation, the flow rate can be expressed as a linear function of both variables, $U$ and $p$ (Fig. 2, block C). The model coefficients $k_1$ and $k$ were determined from previously measured micropump characteristics.

In the next step, the modeled flow value is fed back negatively: it is subtracted from the reference value (set point) to create the error signal, which is then amplified by the PID controller. Block B (Fig. 2) can be used to transform the $\Phi_{err}$ signal into the $U_{err}$ signal, but because this relation is linear, the same effect can also be achieved with modified PID parameters.

The controller then uses this error signal (the difference between the reference flow and the modeled flow) to adjust the micropump driving voltage *U* applied to both the actual system (Fig. 2, block A) and the mathematical model. If the mathematical model (Fig. 2, block C) agrees well with the actual system, the system flow rate $\Phi$ at the measured stagnation pressure *p* and pump driving voltage *U* is close to the regulated modeled flow rate. In this way, a closed-loop model predictive controller (MPC) was successfully constructed.



- $\Phi_{ref}$ – desired flow (set point)
- $\Phi_{err}$ – difference between the desired set point and the calculated flow
- $\Phi$ – flow calculated from the stagnation pressure $p$ and the micropump driving voltage $U$
- $U_{err}$ – micropump driving voltage error
- $U$ – micropump driving voltage

Fig. 2. MPC controller schematic.
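The loop of Fig. 2 can be simulated in a few lines to show both how it works and why micropump drift matters. In this hedged Python sketch, the real pump (block A) is assumed to be linear and to feed a purely resistive load, block B is folded into the controller gains as the text allows (a PI law, derivative term omitted), and all coefficients, gains and the function name `run_loop` are hypothetical, with flows in ml/min and pressures in mbar. Because the controller regulates the *modeled* flow, a drift of the real pump coefficient that is not reflected in the model shifts the real flow, while the modeled flow still sits on the set point.

```python
def run_loop(phi_ref, r_load, true_k1, true_k, model_k1, model_k,
             kp=20.0, ki=200.0, dt=0.01, steps=2000, u_max=250.0):
    """Simulate the internal-model loop of Fig. 2; return (real flow, modeled flow).

    Plant (block A, assumed): linear pump Phi = true_k1*U - true_k*p feeding a
    purely resistive load, so p = r_load * Phi.
    Model (block C): Phi_model = model_k1*U - model_k*p, evaluated on the measured p.
    Controller: discrete PI acting on Phi_ref - Phi_model (block B folded into gains).
    """
    u, integral = 0.0, 0.0
    phi_true = phi_model = 0.0
    for _ in range(steps):
        # Block A: solve true_k1*U - true_k*p = p / r_load for the stagnation pressure
        p = true_k1 * u / (true_k + 1.0 / r_load)
        phi_true = p / r_load
        # Block C: flow predicted from the measured pressure and the applied voltage
        phi_model = model_k1 * u - model_k * p
        # PI law on the modeled-flow error, output clamped to the drive range
        err = phi_ref - phi_model
        integral += err * dt
        u = min(max(kp * err + ki * integral, 0.0), u_max)
    return phi_true, phi_model


# Model matches the pump: the real flow settles on the set point.
print(run_loop(0.5, 100.0, 0.008, 0.004, 0.008, 0.004))
# Pump k1 has drifted 25 % low but the model was not updated: the modeled
# flow still reaches 0.5, while the real flow settles well below it.
print(run_loop(0.5, 100.0, 0.006, 0.004, 0.008, 0.004))
```

Periodically re-identifying `model_k1` and `model_k` would be one way to limit this error; the sketch simply makes the mechanism visible.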

In the internal mathematical model (Fig. 2, block C), the simulated flow rate $\Phi$ is obtained from the actual micropump driving voltage *U* and the stagnation pressure *p* measured before the evaporator.

As claimed by the manufacturer and confirmed by our measurements, the micropumps used have almost linear characteristics (Fig. 3).





Therefore, as seen in Fig. 3, the flow rate can be expressed as a linear function of both variables, *U* and *p*:

$$\Phi(U, p) = k_1 U - k p \tag{1}$$

The mathematical model (1) for the characteristics in Fig. 3 (water pumped at 100 Hz) was visualized with the *Mathematica* /9/ software; the result is shown in Fig. 4.

Here, *k* is the slope of the regression line drawn through the measured points of the micropump $\Phi(p)$ characteristic; *k* is assumed to be the same for any micropump operating voltage *U*.

$k_1$ is the slope of the regression line drawn through the measured points of the micropump $\Phi(U)$ characteristic; $k_1$ is assumed to be the same for any micropump output stagnation pressure *p*.




Fig. 4. Micropump mathematical model.

The same linear model is expected to be valid for any fluid pumped through the system; in that case, the coefficients *k* and $k_1$ must be determined by measuring the flow rate $\Phi$ and the stagnation pressure *p* at a chosen micropump supply voltage *U*. A simple procedure was implemented for parameter determination. The micropump operating voltage *U* was set to its maximum, and the micropump output flow rate $\Phi_{max}$ was measured with a horizontally aligned pipette and a stopwatch:

$$k_{1} = \frac{\Phi_{max}(U_{max})}{U_{max}} \tag{2}$$

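A pressure sensor was then attached to the micropump output to measure the maximum stagnation pressure $p_{max}$, i.e. the pressure at which the flow drops to zero at $U_{max}$. Setting $\Phi = 0$ in the linear model gives $k_1 U_{max} = k\,p_{max}$, and hence: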

$$k = \frac{\Phi_{max}(U_{max})}{p_{max}} \tag{3}$$
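To make Eqs. (1) to (3) concrete, here is a minimal Python sketch of the two-point identification and the resulting linear model. The class and method names and all numeric values (free flow, stagnation pressure, drive amplitude) are hypothetical; only the formulas come from the text. Units just need to be consistent, e.g. ml/min, V and mbar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearPumpModel:
    """Linear micropump model of Eq. (1): Phi(U, p) = k1*U - k*p."""

    k1: float  # flow per unit drive voltage
    k: float   # flow lost per unit stagnation pressure

    @classmethod
    def identify(cls, phi_max: float, u_max: float, p_max: float) -> "LinearPumpModel":
        """Two-point identification following Eqs. (2) and (3).

        phi_max: flow at u_max with an open outlet (p = 0)
        p_max:   stagnation pressure at u_max with a blocked outlet (flow = 0)
        """
        if u_max <= 0 or p_max <= 0:
            raise ValueError("u_max and p_max must be positive")
        return cls(k1=phi_max / u_max, k=phi_max / p_max)

    def flow(self, u: float, p: float) -> float:
        """Modeled flow for drive voltage u and measured stagnation pressure p."""
        return self.k1 * u - self.k * p


# Hypothetical calibration: 2.0 ml/min free flow and 500 mbar stagnation
# pressure, both measured at a 250 V drive amplitude.
model = LinearPumpModel.identify(phi_max=2.0, u_max=250.0, p_max=500.0)
print(model.k1, model.k)          # slopes in (ml/min)/V and (ml/min)/mbar
print(model.flow(125.0, 100.0))   # approximately 0.6 ml/min
```

Only two measurements are needed because the model is assumed linear in both variables; any curvature is left to a compensation step.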

For this method, sufficient agreement between the model and the real system, as well as micropump stability, were expected to be crucial for a stable and accurate flow supply. Without adequate experiments, it is difficult to estimate the micropump stability, the model linearity, or the influence of anomalies on overall system stability.

### 3.2. Method II – Electric current source analogy

The principle of the second method originates from an electric current source analogy. In the electric circuit in Fig. 5, an independent current source can be assembled from a voltage source followed by a resistor *R* whose resistance is always considerably greater than the load resistance $R_{load}$. In this case, the current flowing through the load resistor $R_{load}$ is practically independent of its resistance. The quality of such a current source therefore depends on the ratio of $R$ to $R_{load}$.



Fig. 5. Electric current source analogy.

In microfluidics, the voltage source is analogous to a constant-pressure fluid source, while the electric resistor *R* is analogous to an orifice nozzle. The fluid flow rate through the hydraulic load resistor $R_{load}$ equals:

$$\Phi = \frac{p}{R_{orifice} + R_{load}} \tag{4}$$

When the hydraulic resistance of the orifice nozzle $R_{orifice}$ is several orders of magnitude greater than the hydraulic load resistance $R_{load}$, the system flow $\Phi$ can be approximated as:

$$R_{orifice} \gg R_{load} \Rightarrow \Phi \approx \frac{p}{R_{orifice}} \tag{5}$$

A flow source can therefore be realized simply by holding the pressure upstream of the orifice constant, using a micropump as the actuator followed by a pressure sensor that measures the micropump output pressure. The pressure data are fed back through a negative-feedback loop to the set-point comparator, whose output drives a PID controller that sets the micropump supply voltage. The reference pressure $p_{ref}$ is determined as:

$$p_{ref} = R_{orifice} \cdot \Phi_{ref} \tag{6}$$

Closed loop control is established (Fig. 6).
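The effect of the resistance ratio in Eqs. (4) to (6) can be checked numerically. This Python sketch assumes, as the analogy does, that both hydraulic resistances are constant (linear, laminar flow); the resistance values and function names are hypothetical. With the pressure held at the Eq. (6) set point, the flow falls short of $\Phi_{ref}$ by the fraction $R_{load}/(R_{orifice}+R_{load})$.

```python
def reference_pressure(phi_ref: float, r_orifice: float) -> float:
    """Eq. (6): pressure set point that yields phi_ref when R_load is negligible."""
    return r_orifice * phi_ref


def delivered_flow(p: float, r_orifice: float, r_load: float) -> float:
    """Eq. (4): flow through the orifice and the load connected in series."""
    return p / (r_orifice + r_load)


def relative_flow_deficit(r_orifice: float, r_load: float) -> float:
    """Fractional shortfall versus phi_ref when p is held at the Eq. (6) value."""
    return r_load / (r_orifice + r_load)


# Hypothetical orifice resistance of 1000 (arbitrary but consistent units)
p_ref = reference_pressure(phi_ref=0.5, r_orifice=1000.0)
for r_load in (1.0, 10.0, 100.0):
    print(r_load, delivered_flow(p_ref, 1000.0, r_load),
          relative_flow_deficit(1000.0, r_load))
```

For example, with $R_{orifice} = 100\,R_{load}$ the flow is about 1 % below the set point, which is why the ratio must be large.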

### 3.3. Method III – Venturi meter / orifice plate

A Venturi meter determines the flow by measuring the differential pressure upstream of and within a local channel constriction. This method is widely used to measure gas flow in pipeline transmission. The constriction can also be implemented as an orifice plate, i.e. a plate with a through hole placed in the flow. Flow measurements were performed with both a Venturi pipe and an orifice plate. The measured flow was fed back and compared with the desired set point, and a classic closed-loop PID controller was again employed to correct the discrepancy between the set point and the measured flow rate, as shown in Fig. 7.
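The text does not state the relation between differential pressure and flow. For orientation only, the textbook Bernoulli-based orifice equation for an incompressible fluid is sketched below in Python. The discharge coefficient 0.61 (a typical sharp-edged, turbulent value), the bore and channel diameters and the function names are assumptions. At the very low Reynolds numbers of this application the flow is laminar, the coefficient is not constant and the dependence tends toward linear, so the numbers are indicative. The idealized square law still shows the difficulty: a 600:1 flow range maps to a 360 000:1 range of differential pressure, which may help explain why the Venturi effect becomes hard to measure at extremely low flow rates.

```python
import math


def orifice_flow(dp: float, d: float, pipe_d: float,
                 rho: float = 1000.0, cd: float = 0.61) -> float:
    """Volumetric flow [m^3/s] from differential pressure dp [Pa] across an orifice
    of bore d [m] in a channel of diameter pipe_d [m] (incompressible fluid)."""
    beta = d / pipe_d
    area = math.pi * d ** 2 / 4.0
    return cd * area * math.sqrt(2.0 * dp / (rho * (1.0 - beta ** 4)))


def orifice_dp(q: float, d: float, pipe_d: float,
               rho: float = 1000.0, cd: float = 0.61) -> float:
    """Inverse of orifice_flow: differential pressure [Pa] for a flow q [m^3/s]."""
    beta = d / pipe_d
    area = math.pi * d ** 2 / 4.0
    return (q / (cd * area)) ** 2 * rho * (1.0 - beta ** 4) / 2.0


ML_PER_MIN = 1e-6 / 60.0  # 1 ml/min expressed in m^3/s
# Hypothetical geometry: 0.2 mm bore in a 2 mm channel, water
high = orifice_dp(1.0 * ML_PER_MIN, 0.2e-3, 2e-3)         # 1 ml/min
low = orifice_dp(0.1 / 60.0 * ML_PER_MIN, 0.2e-3, 2e-3)   # 0.1 ml/h
print(high, low, high / low)  # ratio is about 600**2 under the square law
```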

## 4. Experimental setup

Commercially available micropumps, silicone tubes, a Hoke type 1345G2B metering valve and a PC running the developed PID software were used to assemble the micro flow-generator prototypes.

A piezoelectric micropump was chosen as the flow-generator actuator because of its small size and weight, good particle tolerance and temperature resistance. It combines two piezoactuators in a single housing and has an increased priming capability. It also provides high bubble tolerance, so that even gas-liquid mixtures can be pumped without problems. A micropump controller with variable voltage amplitude and frequency was used to drive the micropump.

- $p_{ref}$ – desired pressure (set point)
- $p_{err}$ – difference between the desired pressure (set point) and the measured pressure
- $p$ – measured micropump output pressure
- $U$ – micropump driving voltage

Fig. 6. Micro flow generator based on the electric current source analogy.

Fig. 7. Micro flow generator schematic, based on differential pressure measurement on a Venturi pipe or an orifice plate.

During the measurements, a Hoke type 1345G2B metering valve /10/ was used as an evaporator simulator. A standard EST2233 differential pressure sensor /11/ was used initially and later replaced by the temperature-compensated EHT23000 type /12/, additionally equipped with inlet and outlet acrylic tubes and an I²C/USB converter assembled in our lab (based on the CY8C24794-24LFXI integrated circuit /14/), as shown in Fig. 8.



Fig. 8. Temperature-compensated differential pressure sensor EHT23000 with EST2233 smart logic and I²C/USB converter.

Acrylic glass prototypes of the Venturi pipe and orifices were fabricated from acrylic glass tubes using thermal treatment, mechanical drilling and Rohm Acrifix 192 acrylic adhesive. Further work will focus on device miniaturization by implementing the components on a silicon substrate using wet etching and a Pyrex glass cover. The passive hydraulic prototypes are shown in Fig. 9.



Fig. 9. Passive hydraulic prototypes.

Software implementations have the advantage of being relatively cheap and flexible with respect to the PID algorithm. For the prototype version, the discrete PID controller was initially implemented in *Visual Basic* /15/ on a PC to determine all required parameters and system test procedures. Once fully developed, the algorithm can easily be modified and compiled for any microcontroller board. The software was developed to meet the following requirements: real-time reading of pressure sensor data, micropump control, object-oriented programming, easy modification of the control algorithm, a user-friendly interface for editing essential parameters and references, graphical output, FFT processing of the sensor signal, and data saving.

A screenshot of the developed user interface is shown in Fig. 10.



Fig. 10. User interface screenshot.

A proportional-integral-derivative (PID) controller (Fig. 11) seemed a suitable choice for fast, reliable control action with minimal system oscillation in our flow control application. However, it must be emphasized that using a PID algorithm guarantees neither optimal control nor system stability; the latter can be ensured only by a proper system architecture.

In short, a PID controller is a generic feedback control-loop mechanism widely used in industrial control systems.
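The authors' Visual Basic controller is not reproduced in the paper. As a hedged illustration of what a discrete PID update typically looks like, here is a positional-form sketch in Python with output clamping (the micropump drive amplitude is bounded) and simple conditional-integration anti-windup. The class name, the limits, the anti-windup choice and the toy static plant used in the example are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DiscretePID:
    """Positional discrete PID with output clamping and conditional-integration anti-windup."""

    kp: float
    ki: float
    kd: float
    dt: float                 # sampling period [s]
    out_min: float = 0.0      # lowest allowed micropump drive amplitude (assumed)
    out_max: float = 250.0    # highest allowed micropump drive amplitude (assumed)
    _integral: float = field(default=0.0, init=False, repr=False)
    _prev_error: Optional[float] = field(default=None, init=False, repr=False)

    def update(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        # Backward-difference derivative; zero on the first sample
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        candidate = self._integral + error * self.dt
        out = self.kp * error + self.ki * candidate + self.kd * derivative
        # Keep the new integral only while the output stays inside its limits
        if self.out_min < out < self.out_max:
            self._integral = candidate
        return min(max(out, self.out_min), self.out_max)


# Toy plant: measured quantity = 0.1 * drive amplitude (hypothetical static gain)
pid = DiscretePID(kp=1.0, ki=20.0, kd=0.0, dt=0.01, out_max=100.0)
y = 0.0
for _ in range(2000):
    y = 0.1 * pid.update(setpoint=5.0, measurement=y)
print(y)  # settles at the set point of 5.0 (to within rounding)
```

In a real flow loop, `measurement` would be the sensor pressure or the computed flow and the return value the micropump drive amplitude.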



Fig. 11. PID controller schematic.

The PID parameters must be tuned according to the nature of the system. Several methods for tuning a PID loop exist /16/. In our case, however, careful manual tuning was found to be superior to the conventional Ziegler-Nichols method and was therefore used in the described application.
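For comparison with the manual tuning used here, the classic Ziegler-Nichols closed-loop rules ($K_p = 0.6\,K_u$, $T_i = T_u/2$, $T_d = T_u/8$) are sketched below in Python. They are only a common starting point: the ultimate gain $K_u$ and oscillation period $T_u$ would have to be measured on the actual flow loop, the function name is hypothetical, and the text reports that manual tuning performed better in this application.

```python
def ziegler_nichols_pid(ku: float, tu: float) -> tuple:
    """Classic Ziegler-Nichols closed-loop rules for a PID controller.

    ku: ultimate gain at which the proportional-only loop oscillates steadily
    tu: period of that oscillation [s]
    Returns (kp, ki, kd) for the parallel form u = kp*e + ki*int(e) + kd*de/dt.
    """
    kp = 0.6 * ku
    ti = tu / 2.0   # integral time
    td = tu / 8.0   # derivative time
    return kp, kp / ti, kp * td


print(ziegler_nichols_pid(ku=10.0, tu=2.0))  # hypothetical Ku and Tu
```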

A typical measurement setup for the MPC approach is shown in Fig. 12. In the characterization procedure, the 1345G2B metering valve was used in combination with a measuring pipette.

In the electric current analogy approach, the hydraulic orifice nozzle resistance $R_{orifice}$ is realized with an acrylic glass orifice, as shown in Fig. 13. Two realizations of the orifice plate and the Venturi pipe are shown in Fig. 14, Fig. 15 and Fig. 16, respectively. After the essential measurements, the operation of each flow generator was finally confirmed with the micro-evaporator, as shown in Fig. 16.



Fig. 12. Typical measurement setup with measuring pipette for flow determination.



Fig. 13. Acrylic glass orifice intended for the measurement setup of the electric current analogy approach.



Fig. 14. Typical measurement setup for differential pressure measurement on orifice plate.



Fig. 15. Another prototype of the acrylic glass orifice.



Fig. 16. Venturi pipe.

To obtain the dependence of the flow rate $\Phi$ on the hydraulic load resistance $R_{load}$, the 1345G2B metering valve was connected to the micro flow-generator output. It was later replaced by the micro-evaporator to assess operation in the real dynamic system. A flow generator can provide constant flow only within a limited regulating range, which is defined primarily by the micropump head pressure.

Low flow rates were measured volumetrically, using a horizontally placed 1 ml measuring pipette with a scale resolution of 0.01 ml and a stopwatch. This technique was found appropriate for flow rates as low as 0.1 ml/h.
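The arithmetic behind the volumetric method is simple, but it shows how the reading time scales across the required range. In this Python sketch (function names hypothetical; the 0.01 ml division comes from the text), the meniscus moves one division every 6 minutes at 0.1 ml/h, while at 1 ml/min (60 ml/h) it crosses a division in 0.6 s and fills the whole 1 ml pipette in one minute, so timing resolution matters most at the top of the range.

```python
def flow_ml_per_h(volume_ml: float, elapsed_s: float) -> float:
    """Volumetric method: liquid volume advanced in the pipette over the elapsed time."""
    return volume_ml / elapsed_s * 3600.0


def seconds_per_division(flow_ml_h: float, division_ml: float = 0.01) -> float:
    """Time the meniscus needs to advance one scale division at a given flow."""
    return division_ml / flow_ml_h * 3600.0


print(seconds_per_division(0.1))    # lowest required flow, 0.1 ml/h
print(seconds_per_division(60.0))   # highest required flow, 1 ml/min = 60 ml/h
print(flow_ml_per_h(0.05, 1800.0))  # 0.05 ml advanced in 30 min
```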

## 5. Results and discussion

### 5.1. Method I – Model Predictive Control (MPC)

For the first proposed method (MPC), agreement between the mathematical model and the real system is crucial for a stable and accurate flow supply. Numerous micropump endurance tests were therefore performed, which enabled estimation of pump stability and repeatability (Fig. 17). All measured results apply to DI water as the medium.



Fig. 17. Two-day micropump stability test.

The first-order (linear) flow rate approximation used in the model was also evaluated by comparing measured and modeled data (Fig. 18).



Fig. 18. Micropump nonlinearity. Full line: measured data. Dashed line: linear regression.

Nonlinearity can be compensated with an appropriate compensation table. Unfortunately, micropump instability over time cannot be compensated in this way, and sufficient micropump stability is a prerequisite for a stable MPC flow-generator system. In our case, the endurance tests (Fig. 17) revealed a micropump stability issue: over longer periods (e.g. one week), clear degradation of micropump performance was detected.
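A compensation table of the kind mentioned above could, for instance, map the linear-model flow to the measured flow by piecewise-linear interpolation between calibration points. This Python sketch is one possible realization, not the authors' implementation: the function name, the clamping outside the calibrated range and the calibration pairs in the test are hypothetical. As the text notes, such a table corrects static nonlinearity but not drift over time.

```python
from bisect import bisect_right


def make_compensator(model_flow, measured_flow):
    """Return correct(x): modeled flow -> expected real flow, by piecewise-linear
    interpolation between calibration pairs (model_flow[i], measured_flow[i])."""
    pairs = sorted(zip(model_flow, measured_flow))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    if len(xs) < 2:
        raise ValueError("need at least two calibration points")

    def correct(x: float) -> float:
        # Clamp outside the calibrated range instead of extrapolating
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return correct
```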

Because of the long-term micropump instability, the whole system was found to be unstable and therefore functional only for a short period, which was nevertheless sufficient for evaluation. All measurements were taken within one hour; after that, drift was detected, and after 5 hours the system collapsed, resulting in zero output flow. The system drift is not a linear function of time: at the beginning it is almost undetectable, but it then grows until the control loop collapses. The system flow rate $\Phi$ vs. load hydraulic resistance $R_{load}$ characteristic is presented in Fig. 19.

The measured stagnation pressure *p* vs. load hydraulic resistance $R_{load}$ characteristic is presented in Fig. 20, and the micropump signal amplitude response *U* vs. $R_{load}$ in Fig. 21. In all characteristics (Fig. 19 – Fig. 25), the change of the hydraulic load resistance $R_{load}$ is given by the number of valve handle turns. A larger number of turns corresponds to a lower metering-valve hydraulic resistance $R_{load}$, which can be calculated from the manufacturer's valve characterization curve (number of valve handle turns vs. Cv factor /18/) and a correction factor for different fluid types /18/.

Short-term operation was found to be functional. The main advantage of this system is its estimated energy efficiency, because the approach does not introduce any passive hydraulic elements. Flow rates down to 0.1 ml/min were achieved with this approach.



Fig. 19. MPC approach: System flow rate  $\Phi$  vs. load hydraulic resistance  $R_{load}$  characteristic.



Fig. 20. MPC approach: Measured stagnation pressure $p$ vs. load hydraulic resistance $R_{load}$ characteristic.



Fig. 21. MPC approach: Micropump signal amplitude response $U$ vs. load hydraulic resistance $R_{load}$.

### 5.2. Method II – Electric current source analogy

The major drawback of the second method, based on the electric current source analogy, is its estimated energy inefficiency caused by the additional orifice in the flow-generator design (Fig. 5). For proper operation, the orifice hydraulic resistance $R_{orifice}$ must be a few orders of magnitude greater than the load hydraulic resistance $R_{load}$. Unfortunately, the additional orifice causes a major pressure drop that must be compensated by greater micropump effort. To achieve the best system flow rate $\Phi$ vs. load hydraulic resistance $R_{load}$ characteristic, the orifice resistance $R_{orifice}$ is tuned according to the load resistance $R_{load}$. Tuning is performed at the desired reference flow rate $\Phi$ with the micropump supply voltage *U* near its maximum. During operation, this results in a fully driven micropump regardless of the system load hydraulic resistance $R_{load}$.
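The energy cost of the series orifice follows directly from Eq. (4). Assuming constant (linear) resistances, the pump must supply $p = \Phi\,(R_{orifice} + R_{load})$, and the orifice takes the share $R_{orifice}/(R_{orifice}+R_{load})$ of both that pressure and the hydraulic power $p\,\Phi$. The Python sketch below (hypothetical function names and values) also shows how the micropump head pressure caps the achievable flow, in line with the limited regulating range mentioned earlier.

```python
def required_pump_pressure(phi: float, r_orifice: float, r_load: float) -> float:
    """Pressure the micropump must deliver for flow phi through orifice + load (Eq. (4))."""
    return phi * (r_orifice + r_load)


def orifice_share(r_orifice: float, r_load: float) -> float:
    """Fraction of the pump pressure (and of the hydraulic power p*phi) lost in the orifice."""
    return r_orifice / (r_orifice + r_load)


def max_flow(p_head: float, r_orifice: float, r_load: float) -> float:
    """Largest flow a pump with head pressure p_head can sustain through the series path."""
    return p_head / (r_orifice + r_load)


# Hypothetical ratio R_orifice = 100 * R_load: about 99 % of the pressure is spent in the orifice
print(orifice_share(100.0, 1.0), required_pump_pressure(0.5, 100.0, 1.0))
```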



Fig. 22. Electric current source analogy approach: System flow rate $\Phi$ and micropump signal amplitude response $U$ vs. load hydraulic resistance $R_{load}$ characteristic.

On the other hand, the system operated fast and reliably and was practically insensitive to back-pressure strokes from the microreactor. The lowest flow rates were achieved with this method. The system flow rate $\Phi$ and micropump signal amplitude response *U* vs. load hydraulic resistance $R_{load}$ characteristic is presented in Fig. 22.

### 5.3. Method III – Venturi meter / orifice plate

The third method was implemented with an orifice plate. For flow rate measurement, the orifice plate was found superior to the Venturi pipe, because the measured Venturi effect was insufficient at extremely low flow rates. The energy efficiency of this method depends on the hydraulic resistance of the orifice plate. The method was sensitive to dynamic back-pressure strokes from the microreactor, which could be corrected to a certain extent by software filtering of the pressure sensor signal. The system flow rate $\Phi$ and micropump signal amplitude response *U* vs. load hydraulic resistance $R_{load}$ characteristic is presented in Fig. 23.
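The text does not specify which filter was used. One plausible choice for short back-pressure spikes is a running median, which rejects isolated outliers better than a moving average at the cost of a delay of about half the window. The Python sketch below is a hypothetical illustration; the class name and the window length of 5 samples are assumptions.

```python
from collections import deque
from statistics import median


class MedianFilter:
    """Running median over the last `window` pressure samples."""

    def __init__(self, window: int = 5):
        if window < 1:
            raise ValueError("window must be >= 1")
        self._buf = deque(maxlen=window)  # oldest sample drops out automatically

    def update(self, sample: float) -> float:
        self._buf.append(sample)
        return median(self._buf)


# A single spike (80) in an otherwise steady reading (10) is suppressed
f = MedianFilter(window=5)
print([f.update(s) for s in [10.0, 10.0, 10.0, 80.0, 10.0, 10.0]])
```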



Fig. 23. Orifice plate approach: System flow rate  $\Phi$  and micropump signal amplitude response U vs. load hydraulic resistance  $R_{load}$  characteristic.

## 6. Conclusions

In this paper, design guidelines for three micro-flow generator approaches for a hydrogen production microreactor were discussed in detail, followed by a description of the experimental setup and characterization. To find the best solution, the basic properties of each method were evaluated.

The main advantage of the proposed MPC flow-generator method is its estimated energy efficiency. Its main problem, however, is its dependence on long-term micropump stability. In our experiments, the stability of the micropumps used was found inadequate, so the whole system became unstable, which in the long term resulted in flow rate drift. Consequently, the flow rate could even reach zero or its maximum value, and this cannot be predicted. In summary, the assembled MPC system can currently be used efficiently only for shorter periods; once it drifts sufficiently far from its initial stable operating point, it can even collapse. Although the micropumps under test performed rather unstably in long-term operation, the proposed MPC concept and the realized flow generator were successfully operated and tested, even for longer times. For very long operation times, a control calibration of the flow should be introduced. Because of the promising simplicity of this approach, which employs no passive hydraulic elements and is therefore expected to be energy efficient, further investigations with better micropump actuators are under way.

The second approach, based on the current source analogy, was found to be highly stable, fast and reliable and capable of generating extremely low flow rates, but it was estimated to be energy inefficient. This approach should therefore certainly be considered when energy efficiency is not a concern.

Finally, taking everything into account, differential pressure measurement across an orifice plate, combined with PID control and a micropump actuator, was found to be the most appropriate solution, which was further confirmed by successfully running microreactor applications.

The proposed flow-generator approaches are expected to enable a compact, integrated hybrid implementation on a silicon substrate anodically bonded to Pyrex glass, including the micropump, pressure sensor and microcontroller. Such an integrated flow generator is expected to be small enough for use in advanced microreactors.

## 7. References

- /1/ J. S. Bertino, H. E. Greenberg, and M. D. Reed. American College of Clinical Pharmacology Position Statement on the Use of Microdosing in the Drug Development Process J Clin Pharmacol April 1, 2007 47:418-422
- /2/ Lappin G., Garner R. C.(2008) The utility of microdosing over the past 5 years. Expert Opin. Drug Metab. Toxicol 4(12):1499– 1506
- /3/ Galluppi GR, Rogge MC, Roskos LK, Lesko LJ, Green MD, Feigal DW Jr, Peck CC (2001) Integration of pharmacokinetic and pharmacodynamic studies in the discovery, development, and review of protein therapeutic agents: a conference report. Clin Pharmacol Ther 69:387–399
- /4/ B. Pečar et al., An Integrated Thin Film Pt/Ti Heater, Faculty of Electrical Engineering, University of Ljubljana, MIPRO 2009.
- /5/ http://en.wikipedia.org/wiki/Flow\_measurement
- /6/ FAJFAR, Iztok, TUMA, Tadej, BÜRMEN, Arpad, PUHAN, Janez. A top down approach to teaching embedded systems programming. Inf. MIDEM, Mar. 2009, vol. 39, no. 1, pp. 53-60, illus. /COBISS.SI-ID 7248212/
- /7/ D. Resnik et al., Characterization of integrated thin film Pt heater and temperature sensors on Si platform. In: 35th Annual Conference of the IEEE Industrial Electronics Society, 3-5 November 2009, Porto, Portugal.
- /8/ J. B. Froisy. Model predictive control: Past, present and future. ISA Transactions, 33:235–243, 1994.
- /9/ http://www.wolfram.com/
- /10/ http://www.tamo.co.uk/PDF%20Files/Hoke/79013\_0706.pdf
- /11/ http://www.hyb.si/datasheet/EST2233.pdf
- /12/ http://www.hyb.si/datasheet/EHT23000.pdf
- /13/ BABIČ, Rudolf, JARC, Bojan. The Modified Distributed Arithmetic Structure for the Basic and the Cascade Digital Filters Realization. Inf. MIDEM, 1999, vol. 29, no. 3, pp. 136-142, graphs, schematics. /COBISS.SI-ID 1726548/
- /14/ http://datasheet.octopart.com/CY8C24794-24LFXI-Cypress-Semiconductor-datasheet-133403.pdf
- /15/ http://msdn.microsoft.com/en-us/vbasic/default.aspx
- /16/ Finn Peacock, PID Tuning Blueprint Course, ESBN: F06-295J-9P78-54B9, Australian magelant, 1990
- /17/ UMBERGER, Mark, HUMAR, Iztok, KOS, Andrej, GUNA, Jože, ŽEMVA, Andrej, BEŠTER, Janez. The integration of home-automation and IPTV system and services. Comput. stand. interfaces. /Print ed./, Jun. 2009, vol. 31, no. 4, pp. 675-684, illus. /COBISS.SI-ID 7093332/
- /18/ http://www.engineeringtoolbox.com/flow-coefficient-factord\_238.html

B. Pečar, M. Možek, D. Resnik, D. Vrtačnik, U. Aljančič, S. Penič and S. Amon

Laboratory of Microsensor Structures and Electronics (LMSE), Faculty of Electrical Engineering, University of Ljubljana, Tržaška 25, SI-1000 Ljubljana, Slovenia, e-mail: borut.pecar@fe.uni-lj.si

Prispelo (Arrived): 01.04.2010 Sprejeto (Accepted): 09.09.2010

# VOLTAGE SAG INDEPENDENT OPERATION OF INDUCTION MOTOR BASED ON Z-SOURCE INVERTER

Uroš Flisar, Danjel Vončina, Peter Zajec

University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia

Key words: Z-Source inverter, space vector modulation, field oriented control, field weakening

Abstract: This paper describes an adjustable speed drive system for driving an induction motor beyond its nominal speed, even in the presence of input voltage sags. The system is based on the Z-Source inverter, which offers several advantages over traditional current or voltage source inverters, as it can operate in both buck and boost mode. Boost operation is achieved by controlled short-circuiting of the inverter phase legs, which is otherwise forbidden in traditional inverters. These shoot-through states are obtained by modifying the space vector modulation, which is explained in detail. To ensure the required output voltage and ride-through capability during voltage sags, a method for selecting the proper inverter voltage is introduced. The induction motor is controlled by field-oriented control combined with a field-weakening regime. An experimental prototype with a DSP control system is used to verify the operation of the proposed system.

# Uninterrupted Operation of an Induction Motor Based on an Inverter with an Impedance Matching Network

Key words: converter with an impedance matching network, space vector modulation, field-oriented vector control, field weakening

Abstract: The paper describes the implementation of uninterrupted operation of an induction motor based on a three-phase inverter with an impedance matching network. The inverter with the matching network offers a number of advantages over classical voltage-fed or current-fed inverters, since in addition to stepping the voltage down it also allows stepping it up. The boost of the input voltage is achieved by controlled triggering of short circuits in the inverter legs, which is otherwise forbidden in the classical inverters mentioned. This way of triggering short circuits was achieved by adapting the space-vector pulse-width modulation. The calculation of the inverter bridge voltage that ensures the required output voltage for uninterrupted operation during input voltage sags is also presented. The speed of the induction motor is controlled with field-oriented vector control, which allows a simple extension to include weakening of the rotor magnetic field, by which the motor is driven above its nominal speed. An experimental system consisting of a voltage source, an impedance matching network, a three-phase inverter with a connected induction motor and a DSP control circuit is used to verify the proposed solutions.

## 1 Introduction

The traditional power converters used for the control of motor drives are the voltage source inverter (VSI) and the current source inverter (CSI). However, both have limitations. Because the ac output voltage of the VSI is limited to values below the dc-bus voltage, the VSI usually requires an additional boost converter. Similarly, a buck converter is often added to the CSI. This additional converter stage increases cost and complexity and lowers the overall efficiency of the power conversion system.

The Z-Source inverter (ZSI) overcomes the restrictions of these topologies /1/. It consists of two capacitors and two inductors connected in a unique impedance network, which is usually coupled to a voltage source and an inverter bridge. The operation of the ZSI includes controlled short-circuiting (also called shoot-through) of the inverter phase legs. This enables boost operation; without it, the Z-Source inverter acts as a traditional voltage source inverter. The Z-Source inverter was originally intended for power systems with fuel cells /2-4/ because of their distinctive operating curve: as the fuel cell output voltage decreases dramatically with increasing current demand, boost operation becomes essential. A similar conclusion can be drawn for solar cells, whose output voltage changes with temperature and solar irradiance. Another demanding power conversion process takes place in wind turbines, where wind energy is converted into electrical energy. Because the output of a wind turbine varies with wind speed, ensuring uninterrupted power delivery to the electrical grid is essential.

Recently, the ZSI has been advancing into motor drive applications /5/ because it offers several advantages over traditional solutions. Since shoot-through is a permitted operating state of the ZSI, the converter is less sensitive to EMI, which in a traditional inverter could short-circuit the phase legs and destroy the switching devices. Because this kind of shoot-through is allowed, dead time no longer needs to be inserted, which reduces current and voltage harmonics. Controlled short-circuiting of the inverter phase legs can theoretically step up the input voltage to an arbitrarily high value; in practice, the attainable values are limited by the device voltage ratings. Nevertheless, the controlled boost of the input voltage gives the ZSI an important benefit: the ability to ride through voltage sags.

The ac output of the VSI is usually controlled with sinusoidal pulse-width modulation (SPWM) or with the computationally more intensive space vector modulation (SVM). Regarding utilization of the inverter voltage, SVM is preferable, as it utilizes about 15 percent more of the inverter voltage than SPWM. Both strategies have to be modified to include the shoot-through states needed for the boost operation of the ZSI /6-8/. Another factor that favors SVM is its compatibility with field oriented control (FOC) of the induction motor (IM). This type of control is frequently used for operation in the extended speed range, where field weakening enables constant-power operation above base speed.
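The 15 percent figure can be checked with a few lines of arithmetic. The sketch below assumes the usual linear-region limits: a phase peak of $u_i/2$ for SPWM, and a line-to-line peak equal to $u_i$ for SVM, consistent with (2). The function names are illustrative only.

```python
import math


def max_line_rms_spwm(u_i: float) -> float:
    """Linear-region SPWM: phase peak u_i/2, so the line-to-line peak is sqrt(3)/2 * u_i."""
    return (math.sqrt(3) / 2) * u_i / math.sqrt(2)


def max_line_rms_svm(u_i: float) -> float:
    """Linear-region SVM: the line-to-line peak equals u_i, cf. equation (2)."""
    return u_i / math.sqrt(2)


ratio = max_line_rms_svm(1.0) / max_line_rms_spwm(1.0)
print(f"SVM/SPWM voltage utilization: {ratio:.4f}")  # 2/sqrt(3), about 1.1547
```

The ratio is independent of $u_i$ and equals $2/\sqrt{3}$, i.e. roughly 15 percent in favor of SVM.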

## 2 Configuration and control of the ZSI for motor drives

The voltage source PWM inverter is the common choice for powering induction motor drives. The preferred PWM method for the voltage source inverter is SVM, where the concept of space vectors (representing the states of the inverter switches) is used to control the ac output of the inverter bridge. Figure 1 depicts how the output voltage vector *V* can be expressed as a linear combination of the adjacent space vectors $V_1$ and $V_2$:

$$V = V_1 + V_2 = V_{100} \frac{T_1}{T_s} + V_{110} \frac{T_2}{T_s} + (V_{000} + V_{111}) \frac{T_0}{T_s}$$
(1)

$T_0$ is the duration of the zero states $V_{000}$ and $V_{111}$, $T_1$ and $T_2$ are the durations of the two neighboring active states, and $T_s$ is the switching period. In the linear (undermodulation) region, the output voltage vector *V* always remains within the circle inscribed in the hexagon formed by the six active space vectors. The pulse pattern of SVM for a three-phase inverter is illustrated in Fig. 2. The state sequence begins with the zero state $V_{000}$ (all upper transistors are open), continues with the two active states, and ends with the other zero state $V_{111}$, where all upper transistors are closed. After half of the switching period $T_s$, the sequence repeats in reverse order. In SVM with the symmetrical pulse pattern, the two zero states are distributed equally at both ends of the active states, as illustrated in Figure 2.

Fig. 1: Space vectors of three phase bridge inverter showing voltage trajectory and voltage vector limit



Fig. 2: Symmetrical pulse pattern of SVM for three phase Inverter Bridge
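The dwell times in (1) follow from resolving *V* onto $V_{100}$ and $V_{110}$. The sketch below covers sector 1 only and assumes the normalisation implied by (10) and (12), in which the active vectors have unit length so that the inscribed circle has radius $\sqrt{3}/2$; other sectors follow by rotation. It is an illustration, not the authors' DSP code.

```python
import math
from typing import Tuple


def svm_dwell_times(v_mag: float, theta: float, t_s: float) -> Tuple[float, float, float]:
    """Dwell times (T1, T2, T0) for a reference vector in sector 1 (0 <= theta < pi/3).

    v_mag uses the Fig. 1 normalisation: the active vectors V100 and V110 have
    length 1, so the linear region is v_mag <= sqrt(3)/2 (the inscribed circle).
    """
    if not 0.0 <= theta < math.pi / 3:
        raise ValueError("theta must lie in sector 1")
    if v_mag > math.sqrt(3) / 2 + 1e-12:
        raise ValueError("reference vector lies outside the linear region")
    k = t_s * 2.0 / math.sqrt(3) * v_mag
    t1 = k * math.sin(math.pi / 3 - theta)  # time spent on V100
    t2 = k * math.sin(theta)                # time spent on V110
    t0 = t_s - t1 - t2                      # shared by V000 and V111
    return t1, t2, t0


# Edge of the linear region at 10 kHz switching (T_s = 100 us)
print(svm_dwell_times(math.sqrt(3) / 2, math.pi / 6, 100e-6))
```

At the edge of the linear region and $\theta = \pi/6$, the two active states fill the whole period and no zero-state time is left.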

The maximum line-to-line rms voltage ($U_{ab}$) that SVM can obtain in the linear region from the inverter voltage $u_i$ of the VSI is

$$U_{ab} = \frac{u_i}{\sqrt{2}}.$$
 (2)

Taking an induction motor with a nominal line-to-line rms voltage of 177 V as an example, a minimum inverter voltage of 250 V is required. This value needs to be further increased if voltage sags are to be compensated. Although the VSI usually includes an input capacitor for such occurrences, the capacitor stores little energy and is ineffective during severe voltage sags. Furthermore, increasing the input voltage is not always a viable solution. An additional converter stage is often unavoidable: either an additional boost converter or the Z-Source impedance network.
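The 250 V figure is simply (2) solved for $u_i$. A one-line sketch, with an illustrative function name:

```python
import math


def min_vsi_inverter_voltage(u_ab_rms: float) -> float:
    """Inverse of equation (2): minimum inverter voltage u_i = sqrt(2) * U_ab for SVM."""
    return math.sqrt(2.0) * u_ab_rms


print(f"{min_vsi_inverter_voltage(177.0):.1f} V")  # 250.3 V
```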

### 2.1 Z-Source inverter operation

When the impedance network is added to an existing VSI, the latter is transformed into the Z-Source inverter shown in Figure 3. It can be divided into four major parts: the voltage source $U_{DC}$ with diode $D_1$, which blocks the input during shoot-through; the symmetrical Z-Source impedance network with capacitors $C_1 = C_2$ and inductors $L_1 = L_2$; the three-phase inverter bridge; and the induction motor as the load.

The basic operating principle and control of the ZSI are detailed in /1/. The main operating modes of the ZSI are summarized below:



Fig. 3: Z-Source Inverter with induction motor

**Mode 1):** The inverter bridge operates in one of the six active states and diode $D_1$ conducts. Viewed from the Z-Source network, the inverter bridge behaves as a current source, as depicted in Figure 4. The voltage and current relationships are

$$u_L = U_{DC} - u_C; u_i = 2u_C - U_{DC}$$
(3)

$$i_{DC} = i_L + i_C; i_i = i_L - i_C \rightarrow i_{DC} = 2i_L - i_i.$$
 (4)



Fig. 4: ZSI operating in active mode

**Mode 2):** The input diode is still conducting and the inverter bridge operates in a zero state. Either all upper or all lower transistors are closed, and the inverter bridge acts as an open circuit when viewed from the Z-Source network. The voltage relationships are the same as in (3), while the currents are

$$i_{L1} = i_{L2} = i_{C1} = i_{C2} = \frac{i_{DC}}{2} \rightarrow i_{DC} = 2i_L$$
 (5)

**Mode 3):** Figure 5 illustrates the shoot-through mode, in which the input diode is reverse-biased because the sum of the capacitor voltages is higher than the input voltage. Shoot-through can be applied to one, two or all three phase legs. The voltage and current relationships in the shoot-through mode are

$$u_L = u_C, u_i = 0 \tag{6}$$

$$i_{DC} = 0; i_{L1} = -i_{C1}; i_{L2} = -i_{C2}.$$
 (7)

The duration of the shoot through  $(T_{sh})$  depends on the required boost ratio (*B*) and can be calculated with

$$T_{sh} = \frac{B-1}{2 \cdot B} \cdot T_S, \tag{8}$$

where $B$ is

$$B = \frac{u_i}{U_{DC}} = \frac{1}{1 - 2 \cdot \frac{T_{sh}}{T_s}}. \tag{9}$$

Fig. 5: ZSI operating in shoot through state
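Equation (9) can be traced back to (3) and (6) through the usual steady-state volt-second balance of the Z-Source inductors, as in /1/. Writing $D = T_{sh}/T_s$ for the shoot-through duty ratio (a shorthand introduced only here), the average inductor voltage over one switching period must vanish:

$$
\begin{aligned}
D\,u_C + (1-D)\,(U_{DC} - u_C) &= 0
\;\Rightarrow\; u_C = \frac{1-D}{1-2D}\,U_{DC},\\
u_i = 2u_C - U_{DC} &= \frac{1}{1-2D}\,U_{DC}
\;\Rightarrow\; B = \frac{u_i}{U_{DC}} = \frac{1}{1 - 2\,T_{sh}/T_s}.
\end{aligned}
$$

The intermediate ratio $u_i/u_C = 1/(1-D)$ is the same relation that allows $u_i$ to be reconstructed from the measured capacitor voltage.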

The shoot-through duration is limited to half of the switching period, at which the inverter voltage would theoretically be boosted to infinity, as (9) shows. In practice, however, the maximum boost, and hence the maximum inverter voltage, is limited by the device voltage ratings.
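Equations (8) and (9) are inverses of each other, which is easy to check numerically. The sketch below assumes the 10 kHz switching frequency of Section 3 ($T_s$ = 100 µs); the function names are illustrative.

```python
def shoot_through_time(b: float, t_s: float) -> float:
    """Equation (8): T_sh = (B - 1) / (2B) * T_s, valid for B >= 1."""
    if b < 1.0:
        raise ValueError("the ZSI boost ratio B must be >= 1")
    return (b - 1.0) / (2.0 * b) * t_s


def boost_ratio(t_sh: float, t_s: float) -> float:
    """Equation (9): B = 1 / (1 - 2 T_sh / T_s), valid for 0 <= T_sh < T_s / 2."""
    d = t_sh / t_s
    if not 0.0 <= d < 0.5:
        raise ValueError("T_sh must satisfy 0 <= T_sh < T_s / 2")
    return 1.0 / (1.0 - 2.0 * d)


T_S = 1.0 / 10e3  # 100 us switching period (10 kHz)
t_sh = shoot_through_time(2.0, T_S)
print(f"T_sh = {t_sh * 1e6:.1f} us, B = {boost_ratio(t_sh, T_S):.3f}")
```

A boost ratio of 2 requires 25 µs of shoot-through per switching period; the guard in `boost_ratio` reflects the $T_s/2$ limit.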

To summarize, the ZSI behaves similarly to a traditional VSI, but it also introduces a new operating state, the shoot-through state, which boosts the inverter voltage. This enables the ZSI to produce any required output voltage, provided the inverter voltage stays within the device voltage ratings.

### 2.2 Modified SVM

To include the shoot-through states, the SVM must be modified (MSVM). This is done by placing the shoot-through states at the transitions between switching states, so that the on-times of the upper and lower transistors overlap, as illustrated in Figure 6. The shoot-through states are distributed equally among all three phase legs and use the zero states symmetrically. This placement leaves the existing active states ($T_1$ and $T_2$) uncompromised and, more importantly, keeps the number of switchings unchanged, at the cost of a reduced zero-state duration ($T'_0$).

Fig. 6: Modified SVM switching pattern showing the uniform distribution of $T_{sh}$

It is evident from (9) and Fig. 6 that the achievable boost ratio depends on the available zero-state duration. If the required $T_{sh}$ is greater than the available $T_0$, the required voltage boost cannot be achieved. Because the voltage boost has higher priority, the duration of the active states has to be reduced. Figure 7 illustrates the clamped maximum voltage vector ($V'_{max}$), which is reduced according to the required $T_{sh}$ to ensure the required boost of the inverter voltage:

$$|V'_{\max}| = \frac{\sqrt{3}}{2} \cdot (1 - \frac{T_{sh}}{T_s}).$$
 (10)
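The clamping in (10) can be combined with the sector-1 dwell-time formulas to show that the shoot-through time always fits into the zero states. This is a minimal sketch, assuming the Fig. 1 normalisation (active vectors of unit length) and the clamp-to-circle reading of Fig. 7; it does not reproduce the exact leg-by-leg placement of Fig. 6, and the function name is hypothetical.

```python
import math
from typing import Tuple


def msvm_times(v_mag: float, theta: float, t_sh: float, t_s: float) -> Tuple[float, float, float]:
    """Sector-1 times (T1, T2, T0') for the modified SVM with shoot-through T_sh.

    The reference magnitude (Fig. 1 normalisation) is clamped to |V'_max| from
    equation (10); this guarantees T1 + T2 <= T_s - T_sh at every angle, so the
    shoot-through always fits into the zero states.
    """
    if not 0.0 <= theta < math.pi / 3:
        raise ValueError("theta must lie in sector 1")
    if not 0.0 <= t_sh < t_s / 2:
        raise ValueError("T_sh must satisfy 0 <= T_sh < T_s / 2")
    v_max = math.sqrt(3) / 2 * (1.0 - t_sh / t_s)  # equation (10)
    v = min(v_mag, v_max)  # boost has priority: shrink the vector if necessary
    k = t_s * 2.0 / math.sqrt(3) * v
    t1 = k * math.sin(math.pi / 3 - theta)
    t2 = k * math.sin(theta)
    t0_mod = t_s - t1 - t2 - t_sh  # remaining zero-state time T0'
    return t1, t2, t0_mod


# Full linear-region reference with B = 2 (T_sh = 25 us) at T_s = 100 us
print(msvm_times(math.sqrt(3) / 2, math.pi / 6, 25e-6, 100e-6))
```

At $\theta = \pi/6$ the remaining zero time drops to zero, which is why (10) is the largest circular clamp that still leaves room for $T_{sh}$ at every angle.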



Fig. 7: Modified trajectory of the clamped voltage vector $V_{max}$

The total voltage gain $G_U$ is the product of the boost ratio and the magnitude of the output voltage vector. The relationship between $G_U$ and $T_{sh}$ is illustrated in Fig. 8, and $G_U$ can be calculated with

$$G_U = B \cdot |V'_{\max}| \tag{11}$$

The choice of the optimal $G_U$ depends on the requirements of the intended application. In general, however, *V* should be maximized and *B* minimized to reduce the voltage stress on the switching devices. The line-to-line rms voltage of the ZSI with MSVM is

$$U_{ab} = \sqrt{\frac{2}{3}} \cdot G_U \cdot U_{DC}.$$
 (12)
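Combining (9), (10) and (11) gives the maximum voltage gain as a function of the shoot-through ratio $T_{sh}/T_s$, i.e. the curve of Fig. 8. The sketch below tabulates it; the helper names are illustrative.

```python
import math


def max_voltage_gain(d: float) -> float:
    """Maximum voltage gain G_U for shoot-through ratio d = T_sh / T_s, 0 <= d < 0.5."""
    if not 0.0 <= d < 0.5:
        raise ValueError("d must satisfy 0 <= d < 0.5")
    b = 1.0 / (1.0 - 2.0 * d)             # boost ratio, equation (9)
    v_max = math.sqrt(3) / 2 * (1.0 - d)  # clamped vector, equation (10)
    return b * v_max                      # equation (11)


def line_voltage_rms(g_u: float, u_dc: float) -> float:
    """Line-to-line rms output voltage, equation (12)."""
    return math.sqrt(2.0 / 3.0) * g_u * u_dc


for d in (0.0, 0.1, 0.2, 0.3, 0.4):
    print(f"T_sh/T_s = {d:.1f}: G_U = {max_voltage_gain(d):.3f}")
```

For $T_{sh} = 0$ the gain is $\sqrt{3}/2$ and (12) reduces to (2), which is a useful consistency check.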

Operating an induction motor with a ZSI requires careful control of the voltage gain to guarantee proper operation during voltage sags. Usually the inverter voltage is kept constant and the boost ratio adapts to changes in the input voltage. The inverter voltage that enables uninterrupted motor operation during a voltage sag can be calculated by solving (11) and (12) for $u_i$, with $|V'_{\max}|$ from (10), $T_{sh}$ expressed through $B = u_i/U_{DC}$ from (9), and $U_{DC}$ set to the minimum expected input voltage $U_{DC\min}$:

$$u_i = 2\sqrt{2} \cdot U_{ab} - U_{DC \min}$$
 (13)

The resulting $u_i$ is valid for an SVM-based Z-Source inverter. If a different PWM technique, such as SPWM, is used, (12) should be modified accordingly. Note also that (13) assumes the maximum voltage vector and the minimum boost ratio. If a different voltage gain strategy is preferred, (10) should be taken into account when choosing the inverter voltage.
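Equation (13) can be evaluated for the drive used in the experiments. The inputs below are assumptions chosen to match Section 3: the 177 V motor, a 180 V input and a 25 % sag, so that $U_{DC\min}$ = 135 V. The function name is illustrative.

```python
import math


def zsi_inverter_voltage(u_ab_rms: float, u_dc_min: float) -> float:
    """Equation (13): inverter voltage for an SVM-based ZSI (maximal vector, minimal B)."""
    return 2.0 * math.sqrt(2.0) * u_ab_rms - u_dc_min


u_dc_min = 180.0 * (1.0 - 0.25)  # assumed: 180 V input with a 25 % sag, i.e. 135 V
u_i = zsi_inverter_voltage(177.0, u_dc_min)
print(f"u_i = {u_i:.1f} V")  # 365.6 V
```

The result agrees with the 365 V used in the experiments; adding a 10 % margin gives roughly 400 V.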



Fig. 8: Maximum voltage gain of ZSI with MSVM

## 3 Experimental results

The experimental verification has been carried out on the proposed system with the following parameters:

- Input line voltage: 100 to 300 V DC
- Z-Source network: $L_1 = L_2 = 165 \ \mu\text{H}$; $C_1 = C_2 = 1000 \ \mu\text{F}$
- Switching frequency: $f_S = 10 \text{ kHz}$
- Load: Induction motor ($U_n = 177 \text{ V}$, $I_n = 14.8 \text{ A}$, $n_n = 1456 \text{ rpm}$, $M_n = 20 \text{ Nm}$, $f_n = 50 \text{ Hz}$, $\cos\varphi = 0.785$)
- Position sensor: 1024-line incremental encoder
- Control system: DSP TMS320F2808

The IM was controlled with field oriented control, which is often used in adjustable speed drive applications where field weakening is applied to extend the speed range. Field weakening was achieved by decreasing the flux-producing current $i_d$ according to the "1/$\omega_r$" method, while increasing the torque-producing current $i_q$.
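The "1/$\omega_r$" rule can be sketched as a simple reference generator for $i_d$. This is a hypothetical illustration, assuming that the rated flux current `i_d_rated` is held up to base speed and then scaled by $\omega_{base}/\omega_r$; the actual limits and the handling of $i_q$ used in the experiments are not specified in the text.

```python
def flux_current_reference(omega_r: float, omega_base: float, i_d_rated: float) -> float:
    """Hypothetical "1/omega_r" field-weakening rule for the flux-producing current i_d.

    Below base speed the rated flux current is kept; above it, i_d is reduced
    in inverse proportion to the rotor speed (all parameters are assumptions).
    """
    speed = abs(omega_r)
    if speed <= omega_base:
        return i_d_rated
    return i_d_rated * omega_base / speed


print(round(flux_current_reference(2400.0, 1456.0, 1.0), 3))  # 0.607
```

Taking the nominal 1456 rpm as base speed (an assumption), the 2400 rpm test point corresponds to about 61 % of the rated flux current.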

A typical functional diagram of the indirect FOC for the induction motor is shown in Figure 9. Adapting an existing FOC implementation for a VSI-fed induction motor requires only minor modifications, mostly related to the chosen PWM method and to the control of the voltage boost. Measuring the inverter voltage $u_i$ directly is cumbersome because $u_i$ is zero during shoot-through. A more elegant solution is to measure the capacitor voltage $u_C$ and calculate the inverter voltage $u_i$ from

$$u_{i} = \frac{u_{C}}{1 - \frac{T_{sh}}{T_{s}}}.$$
 (14)

The inverter voltage $u_i$ is then controlled with a PI regulator, which outputs the required boost ratio *B*; $T_{sh}$ is calculated from *B* with (8).
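The inverter-voltage loop can be sketched as follows. This is a hypothetical illustration, not the authors' DSP implementation: the gains `kp` and `ki`, the limit `b_max` and the anti-windup clamp are assumed tuning choices, and the PI output is offset by 1 so that zero control effort means no boost.

```python
from dataclasses import dataclass


@dataclass
class BoostController:
    """Hypothetical sketch of the inverter-voltage loop of Fig. 9.

    u_i is estimated from the capacitor voltage with (14), a PI regulator
    produces the boost ratio B, and T_sh follows from (8).
    kp, ki and b_max are assumed tuning values, not taken from the paper.
    """

    kp: float
    ki: float
    t_s: float
    b_max: float = 4.0
    integral: float = 0.0

    def estimate_u_i(self, u_c: float, t_sh: float) -> float:
        # Equation (14): u_C is measurable at any time, unlike u_i (zero during shoot-through)
        return u_c / (1.0 - t_sh / self.t_s)

    def step(self, u_i_ref: float, u_c: float, t_sh: float) -> float:
        """Run one control period and return the next shoot-through time T_sh."""
        error = u_i_ref - self.estimate_u_i(u_c, t_sh)
        # Integral term, clamped so the regulator output cannot wind up beyond b_max
        self.integral = min(max(self.integral + self.ki * error * self.t_s, 0.0),
                            self.b_max - 1.0)
        # PI output offset by 1 so that zero control effort means B = 1 (no boost)
        b = min(max(1.0 + self.kp * error + self.integral, 1.0), self.b_max)
        return (b - 1.0) / (2.0 * b) * self.t_s  # equation (8)
```

In each control period, `u_c` would come from the capacitor-voltage measurement and `t_sh` from the previous step; with $T_s$ = 100 µs and $B$ = 2, a capacitor voltage of 270 V corresponds to $u_i$ = 360 V.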



Fig. 9: Functional diagram for FOC with the MSVM and control of the inverter voltage

The operation of the induction motor was first verified with the VSI without the Z-Source network. For the VSI, the minimum inverter voltage according to (2) is 250 V. Including a safety margin of 10 %, $u_i$ was set to 280 V. Figure 10 shows the ramp-up of the rotor speed from the nominal value to 2400 rpm, together with the waveform of the current $i_d$. The motor was loaded with 50 % of the nominal torque, because the torque required at high speed is normally low.



Fig. 10: Ramp up of the speed of IM with VSI  $(U_{DC} = 280 \text{ V})$ 

The same measurements were repeated with the Z-Source inverter, and the results are shown in Fig. 11. The input voltage was decreased to 180 V to demonstrate the capabilities of the ZSI. The waveforms verify the proper operation of the ZSI, with the rotor speed again increased to 2400 rpm at 50 % of the nominal torque.



Fig. 11: Ramp up of the speed of IM with ZSI $(U_{DC} = 180 \text{ V})$

When the motor operates in the field weakening regime, voltage sags at the DC input present an even bigger problem, as they can lead to a rapid reduction of the rotor speed. Figure 12 illustrates the effect of a simulated voltage sag of about 25 % of the input voltage on the VSI-fed drive. Because the sag is too severe, the rotor speed decreases below its nominal value. At the same time, the flux-producing current $i_d$ increases sharply, which indicates that the field-weakening operation of the IM is disrupted.



Fig. 12: Motor operation with VSI when exposed to voltage sag

For the ZSI to withstand a 25 % drop of the input voltage, the minimum inverter voltage has to be set according to (13). The resulting $u_i$ is 365 V; the inverter voltage was set about 10 % higher, at 400 V. Figure 13 shows the IM operating at 2400 rpm and at 50 % of the nominal torque. The ZSI successfully adapts its boost ratio during the voltage sag, and the operation of the IM remains uninterrupted. The waveforms clearly demonstrate that the inverter voltage is boosted without any negative influence on the flux current $i_d$, which is maintained at the desired level without any interruptions.

Fig. 13: Motor operation with ZSI when exposed to voltage sag

## 4 Conclusions

This paper has presented a motor drive system based on the Z-Source inverter and the induction motor. Field oriented control, which is extensively used in high-performance drive applications, was successfully adapted to the Z-Source inverter, and the required modification of the SVM and the control of the inverter voltage were explained. The operation of the system, including operation in the extended speed range, was verified experimentally with a DSP control system. The use of the Z-Source inverter is beneficial if the input voltage is limited or if the voltage source is subjected to frequent voltage sags. The ZSI can easily adapt to such disturbances and keep the operation of the motor system reliable.

The experimental findings could also be useful when designing power converters for renewable energy sources. Although input voltage changes in such systems are seldom as sudden, the experiments give insight into the behavior of the ZSI under comparable conditions.

### References

- /1/ Peng, F.Z., "Z-source inverter", IEEE Trans. Ind. Applications, Vol. 39, No. 2, pp. 504-510, March/April 2003.
- /2/ Shen, M.; Joseph, A.; Wang, J.; Peng, F.Z.; Adams, D.J., "Comparison of traditional inverters and Z-source inverter for fuel cell vehicles", Power Electronics in Transportation, IEEE, pp. 125-132, February 2005.
- /3/ Peng, F.Z.; Shen, M.; Holland, K., "Application of Z-Source Inverter for Traction Drive of Fuel Cell-Battery Hybrid Electric Vehicles", IEEE Trans. Power Electronics, Vol. 22, No. 3, pp. 1054-1061, May 2007.
- /4/ Holland, K.; Shen, M.; Peng, F.Z., "Z-source inverter control for traction drive of fuel cell - battery hybrid vehicles", Industry Applications Conference, Vol. 3, pp. 1651-1656, October 2005.
- /5/ Peng, F.Z., "Z-source inverter for motor drives", Power Electronics Specialists Conference, 2004. PESC 04. 2004 IEEE 35th Annual, Vol. 1, pp. 249-254, November 2004.
- /6/ Poh Chiang Loh; Vilathgamuwa, D.M.; Lai, Y.S.; Geok Tin Chua; Li, Y., "Pulse-width modulation of Z-source inverters", IEEE Trans. Power Electronics, Vol. 20, No. 6, pp. 1346-1355, November 2005.
- /7/ Peng, F.Z.; Shen, M.; Qian, Z., "Maximum boost control of the Z-source inverter", IEEE Trans. Power Electronics, Vol. 20, No. 4, pp. 833-838, July 2005.
- /8/ Miaosen Shen; Jin Wang; Joseph, A.; Fang Zheng Peng; Tolbert, L.M.; Adams, D.J., "Constant boost control of the Z-source inverter to minimize current ripple and voltage stress", IEEE Trans. Ind. Applications, Vol. 42, No. 3, pp. 770-778, June 2006.

U. Flisar

University of Ljubljana, Faculty of Electrical Engineering, SI-1000 Ljubljana, Slovenia uros.flisar@fe.uni-lj.si

D. Vončina

University of Ljubljana, Faculty of Electrical Engineering, SI-1000 Ljubljana, Slovenia voncina@fe.uni-lj.si

P. Zajec

University of Ljubljana, Faculty of Electrical Engineering, SI-1000 Ljubljana, Slovenia peter.zajec@fe.uni-lj.si

Prispelo (Arrived): 10.05.2010 Sprejeto (Accepted): 09.09.2010

# MODELING OF MEASURED SELF-SIMILAR NETWORK TRAFFIC IN OPNET SIMULATION TOOL

M. Fras<sup>1</sup>, J. Mohorko<sup>2</sup>, Ž. Čučej<sup>2</sup>

<sup>1</sup>Margento R&D, Maribor, Slovenia <sup>2</sup>University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

Key words: network traffic, self-similarity, Hurst parameter, long-range dependence

**Abstract:** Over the last 15 years, the modeling, analysis and simulation of self-similar traffic have become the main goal of much research work around the world. In our research we measured many different types of real traffic in different networks and classified it on the basis of self-similarity and long-range dependence analysis. We used the estimated statistical parameters of the measured network traffic to model this traffic in the OPNET simulation tool. We used the following statistical criteria for successful modeling: average bit rate, average packet rate, Hurst parameter, and histograms of the statistical network traffic processes. During measurements and simulations we discovered that the shape parameter of the Pareto distribution has a great impact on the simulated traffic, and that classical estimation usually leads to significant discrepancies between measured and simulated traffic in terms of average bit rate and of the bursts that are characteristic of self-similar traffic. We therefore developed a novel method for estimating the shape parameter of the Pareto distribution, which gave good results with respect to the chosen criteria during testing.

# Modeling of measured self-similar network traffic in the OPNET simulation tool

Key words: network traffic, self-similarity, Hurst parameter, long-range dependence

Abstract: Modeling, analysis and simulation of self-similar network traffic have been the central goal of many research works over the last fifteen years. In our paper we focused on measuring different types of self-similar network traffic in different networks, with the aim of classifying network traffic on the basis of self-similarity and long-range dependence analysis. We modeled the measured self-similar network traffic in the OPNET simulation environment on the basis of the estimated statistical parameters of the distributions of the random network traffic processes. We also defined criteria for the agreement between measured and simulated network traffic, such as the average bit and packet rates, the Hurst parameter and the histograms of the random network traffic processes. We found that the shape parameter of the Pareto distribution has a great impact on the entire modeled traffic, and that even small deviations in the value of this parameter lead to large discrepancies between measured and simulated traffic in terms of bit and packet rates and of the bursts that are characteristic of self-similar traffic. We therefore developed a new method for estimating the shape parameter of the Pareto distribution, which gives satisfactory results with respect to the chosen criteria of agreement between measured and modeled traffic.

## 1. Introduction

Over the last 15 years, new models of network traffic have been developed that have replaced traditional models such as Poisson and Markov. These self-similar models are based on fractal theory and can be described using the Hurst parameter and long-range dependence (LRD). The pioneers in this field were Leland, Willinger and others /1/, who introduced the new description of network traffic in 1994 as an alternative to traditional models such as the Poisson and Markov models /2/. It was shown that heavy-tailed distributions are more suitable than the exponential distribution for describing the inter-arrival-time and packet-size processes. Consequently, the heavy-tailed Pareto and Weibull distributions became the most frequently used distributions for describing self-similar network traffic /3, 4, 5, 6/.

Another aspect of self-similarity and long-range dependence is captured by the Hurst parameter, which measures self-similarity and the variability of the packet arrival rate /3, 7, 8/. There are several methods for estimating the Hurst parameter, and they can lead to different results /9, 10/. Over the last decade, several studies have analyzed measured traffic and found that network traffic is best described by self-similarity and long-range dependence /11, 12, 13, 14, 15/. Other studies have examined the properties of measured traffic for different protocols and applications, such as HTTP, video and P2P traffic /6, 17, 29/.

Measurement, analysis and modeling of self-similar traffic have remained among the main research challenges in recent years. Another important goal is to model self-similar traffic in communication simulation environments such as OPNET /18, 19, 20, 21/. Many models for generating self-similar traffic are based on fractal models /22, 23/. Much research has also focused on estimating the parameters of self-similar network traffic /4, 9, 10, 25/, such as the Hurst parameter and distribution parameters, but without verification in simulation tools. In such cases, it is not possible to evaluate how suitable the estimated parameters are for simulation purposes. In our research we paid particular attention to this area of network traffic analysis and modeling. To measure the similarity between measured and modeled traffic, we chose several criteria: average bit rate, packet rate, Hurst parameter (traffic bursts), and the histograms of measured and modeled traffic. We also tried to show how the choice of distribution affects the success of traffic modeling, and which properties of network traffic (i.e., H parameter, LRD or SRD) help us choose the right distribution (heavy- or light-tailed) for describing self-similar network traffic.

During analysis and simulations we discovered that classical estimation of packet-size distribution parameters yields parameters that cause significant discrepancies between measured and simulated traffic. A large part of the discrepancy between the measured histogram and the chosen distribution is caused by maximum-size packets (MTU) in the packet-size process, which are a consequence of the fragmentation mechanism of the TCP/IP stack. We therefore developed a novel method for reducing these discrepancies.

This paper is organized as follows. The second section describes the mathematical background of self-similarity and long-range dependence. The third section describes analysis methods for stochastic self-similar processes, such as estimation of the Hurst parameter and of probability distributions and their parameters. The fourth section describes the modeling of self-similar traffic in the OPNET simulation environment, using different communication devices and traffic generation mechanisms. In the fifth section, we present novel methods for estimating distribution parameters. The sixth section contains simulation results comparing measured traffic with traffic modeled in OPNET. Finally, the paper ends with conclusions.

## 2. Self-similarity and long-range dependence

The self-similarity model has replaced traditional traffic models, such as the Poisson and Markov models, in many areas. The Poisson process provides a good approximation for telephone networks (PSTN) when describing call durations and the times between calls, and it was widely used in the past. However, these models cannot describe the bursts that are distinctive of today's network traffic. Such bursts can be described by a self-similar model, because it shows bursts over a wide range of time scales. In contrast, the traditional Poisson model becomes very smooth under aggregation.

The most frequently used definition of self-similarity /3, 7, 24/ applies to a standard time series:

Let $X = (X_t, t = 0, 1, 2, ...)$ be a covariance-stationary stochastic process, that is, a process with constant mean $\mu$, finite variance $\sigma^2 = E[(X_t - \mu)^2]$, and autocovariance function $\gamma(k) = E[(X_t - \mu)(X_{t+k} - \mu)]$ that depends only on *k*, with autocorrelation function *r*(*k*):

$$r(k) = \frac{\gamma(k)}{\sigma^2} = \frac{E[(X_t - \mu)(X_{t+k} - \mu)]}{E[(X_t - \mu)^2]}, \quad k = 0, 1, 2, \dots$$
(1)
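Equation (1) has a direct sample counterpart. The sketch below uses the common biased estimator (dividing by *n* at every lag); this choice is an assumption, not something the text prescribes.

```python
def sample_autocorrelation(x, k):
    """Sample estimate of r(k) = gamma(k) / sigma^2 from equation (1)."""
    n = len(x)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(x)")
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    # Biased autocovariance: divide by n at every lag
    cov = sum((x[t] - mu) * (x[t + k] - mu) for t in range(n - k)) / n
    return cov / var


print(sample_autocorrelation([1, -1, 1, -1], 1))  # -0.75
```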

Assume that X has an autocorrelation function of the form (where $\approx$ means "asymptotic to")

$$r(k) \approx k^{-\beta} L_1(k), \quad k \to \infty, \quad 0 < \beta < 1,$$
 (2)

where $L_1(k)$ is slowly varying at infinity, that is, $\lim_{t\to\infty} L_1(tx)/L_1(t) = 1$ for all x > 0 (e.g., $L_1(t) = const$ or $L_1(t) = \log(t)$).

The measure of self-similarity is the Hurst parameter (*H*), which is related to the parameter $\beta$ by equation (3):

$$H = 1 - \frac{\beta}{2} \tag{3}$$

Let's define the aggregation process for the time series:

For each m = 1, 2, 3, ..., let $X^{(m)} = (X_k^{(m)}, k = 1, 2, 3, ...)$ denote a new time series obtained by averaging the original series *X* over non-overlapping blocks of size *m*. That is, for $m = 1, 2, 3, ...$, $X^{(m)}$ is given by /13/:

$$X_{k}^{(m)} = \frac{1}{m} (X_{km-m+1} + ... + X_{km}), \quad k = 1, 2, 3, ...$$
(4)

For each *m*, the aggregated process $X^{(m)}$ has the same mean as *X* and an autocorrelation function $r^{(m)}(k)$ /3/.
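The aggregation in (4) is a block mean over non-overlapping windows. In this 0-based sketch, block *k* covers `x[k*m:(k+1)*m]`, which corresponds to $X_{km-m+1}, \dots, X_{km}$ in the 1-based notation of (4); dropping a trailing incomplete block is an assumption.

```python
def aggregate(x, m):
    """Aggregated series X^(m) of equation (4): means of non-overlapping blocks of size m."""
    if m < 1:
        raise ValueError("block size m must be >= 1")
    n_blocks = len(x) // m  # an incomplete trailing block is dropped
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]


print(aggregate([1, 2, 3, 4, 5, 6, 7], 2))  # [1.5, 3.5, 5.5]
```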

The process *X* is called exactly second-order self-similar with parameter $H = 1 - \beta/2$, representing the measure of self-similarity, if the corresponding aggregated process $X^{(m)}$ has the same correlation structure as *X* and var($X^{(m)}$) = $\sigma^2 m^{-\beta}$ for all m = 1, 2, ...:

$$r^{(m)}(k) = r(k), \quad m = 1, 2, \dots, \quad k = 1, 2, \dots \tag{5}$$

The process *X* is called asymptotically second-order self-similar with parameter $H = 1 - \beta/2$ if, for all sufficiently large *k*,

$$r^{(m)}(k) \to r(k), \quad m \to \infty$$
 (6)

From these definitions it follows that a process is second-order self-similar in the exact or asymptotic sense if its aggregated processes $X^{(m)}$ are the same as *X* or become indistinguishable from *X*, at least with respect to their autocorrelation functions. The most striking property of both exactly and asymptotically self-similar processes is that their aggregated processes $X^{(m)}$ possess a non-degenerate correlation structure as $m \rightarrow \infty$. This is in contrast to Poisson stochastic models, whose aggregated processes tend to second-order pure noise as $m \rightarrow \infty$:

$$r^{(m)}(k) \to 0, \quad m \to \infty, \quad k = 0, 1, 2, \dots$$
 (7)

Network traffic is self-similar if it shows bursts over many time scales, that is, over a wide range of time scales. This is in contrast to traditional models such as Poisson and Markov, whose aggregated processes become very smooth.
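The variance condition var($X^{(m)}$) = $\sigma^2 m^{-\beta}$ is also the basis of the variance method listed in Section 3.1: fitting log var($X^{(m)}$) against log *m* gives the slope $-\beta$, and (3) converts it to *H*. A minimal pure-Python sketch; the test input is synthetic i.i.d. Gaussian noise (for which *H* should come out close to 0.5), not measured traffic, and the aggregation levels are arbitrary choices.

```python
import math
import random


def variance_time_hurst(x, levels):
    """Variance-time estimate of H: var(X^(m)) ~ sigma^2 m^(-beta), H = 1 - beta/2.

    Fits a least-squares line to log10 var(X^(m)) versus log10 m; its slope is -beta.
    """
    log_m, log_var = [], []
    for m in levels:
        n_blocks = len(x) // m
        agg = [sum(x[k * m:(k + 1) * m]) / m for k in range(n_blocks)]  # equation (4)
        mu = sum(agg) / n_blocks
        var = sum((a - mu) ** 2 for a in agg) / n_blocks
        log_m.append(math.log10(m))
        log_var.append(math.log10(var))
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    slope = sum((a - mx) * (b - my) for a, b in zip(log_m, log_var)) / sum(
        (a - mx) ** 2 for a in log_m
    )
    return 1.0 + slope / 2.0  # H = 1 - beta/2 with beta = -slope, equation (3)


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]  # synthetic i.i.d. data, not traffic
print(round(variance_time_hurst(noise, [2 ** j for j in range(9)]), 2))  # expected near 0.5
```

For long-range dependent traffic, the aggregated variance decays more slowly than $1/m$, so the same fit yields $H > 0.5$.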

Another property of a process that satisfies relationship (2) is long-range dependence (LRD). Consider a second-order self-similar process with autocovariance $\gamma(k)$ /1/, and let $r(k) = \gamma(k)/\sigma^2$ denote its autocorrelation function. For 0 < H < 1, $H \neq 0.5$, it holds that

$$r(k) \approx H(2H-1)k^{2H-2}, \quad k \to \infty \tag{8}$$

For 0.5 < H < 1, the autocorrelation function *r*(*k*) behaves asymptotically as $ck^{-\beta}$ with $0 < \beta < 1$, where *c* > 0 is a constant and $\beta = 2 - 2H$, and we have

$$\sum_{k=-\infty}^{\infty} r(k) = \infty$$
(9)

Long-range dependence is thus characterized by a slowly decaying, non-summable autocorrelation function, which decays hyperbolically as *k* increases. This is opposite to short-range dependence (SRD), where the autocorrelation function decays exponentially and the sum in (9) is finite. Short- and long-range dependence are related to the value of the Hurst parameter of the self-similar process /3/, /24/:

- $0 < H < 0.5 \rightarrow$ SRD (short-range dependence)
- $0.5 < H < 1 \rightarrow$ LRD (long-range dependence)
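Whether the sum in (9) diverges can be checked numerically. The sketch assumes fractional Gaussian noise as an illustrative process (our choice), with $r(k) = \tfrac{1}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right)$. This autocorrelation telescopes, so the one-sided partial sum has the closed form $S_K = \sum_{k=1}^{K} r(k) = \tfrac{1}{2}\left[(K+1)^{2H} - K^{2H} - 1\right]$. It grows without bound for 0.5 < H < 1 and tends to −1/2 for 0 < H < 0.5, so the two-sided sum in (9) is infinite in the first case and finite in the second. The `dependence_class` helper (a name of ours) simply encodes the two ranges listed above.

```python
def fgn_autocorrelation(k: int, hurst: float) -> float:
    """Exact autocorrelation of fractional Gaussian noise (assumed example process)."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * ((k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def partial_sum(K: int, hurst: float) -> float:
    """One-sided partial sum r(1) + ... + r(K), computed term by term."""
    return sum(fgn_autocorrelation(k, hurst) for k in range(1, K + 1))


def partial_sum_closed_form(K: int, hurst: float) -> float:
    """Telescoped form 0.5 * ((K + 1)^(2H) - K^(2H) - 1)."""
    two_h = 2.0 * hurst
    return 0.5 * ((K + 1) ** two_h - K ** two_h - 1.0)


def dependence_class(hurst: float) -> str:
    """SRD for 0 < H < 0.5, LRD for 0.5 < H < 1, as in the list above."""
    if 0.0 < hurst < 0.5:
        return "SRD"
    if 0.5 < hurst < 1.0:
        return "LRD"
    raise ValueError("H must lie in (0, 0.5) or (0.5, 1)")


for H in (0.3, 0.8):
    sums = [partial_sum_closed_form(K, H) for K in (10, 10 ** 3, 10 ** 6)]
    print(dependence_class(H), [round(s, 4) for s in sums])
```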

Self-similarity and long-range dependence (LRD) are commonly modeled using heavy-tailed distributions. The tails of heavy-tailed distributions (Pareto, Weibull) decay more slowly than exponentially (hyperbolically in the case of Pareto), in contrast to light-tailed distributions, whose tails decay exponentially.

The Pareto distribution is the simplest heavy-tailed distribution and has hyperbolic decay over its entire range. Its probability density function is:

$$p(x) = \alpha k^{\alpha} \cdot x^{-\alpha - 1}, \quad k \le x, \quad \alpha, k > 0 \tag{10}$$
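The hyperbolic tail of (10) is easiest to see in the survival function $P(X > x) = (k/x)^{\alpha}$ for $x \ge k$. A minimal sketch, using inverse-transform sampling $x = k\,u^{-1/\alpha}$ with u uniform on (0, 1] (equivalent to Python's `random.paretovariate(alpha)` scaled by k). The mean $\alpha k/(\alpha - 1)$ is finite only for α > 1, and the variance only for α > 2. The parameter values in the usage lines are illustrative.

```python
import math
import random


def pareto_pdf(x: float, alpha: float, k: float) -> float:
    """Equation (10): alpha * k^alpha * x^(-alpha - 1) for x >= k, else 0."""
    return 0.0 if x < k else alpha * k ** alpha * x ** (-alpha - 1.0)


def pareto_survival(x: float, alpha: float, k: float) -> float:
    """P(X > x) = (k / x)^alpha: a power-law (hyperbolic) tail."""
    return 1.0 if x < k else (k / x) ** alpha


def pareto_sample(alpha: float, k: float, rng: random.Random) -> float:
    """Inverse-transform sample: x = k * u^(-1/alpha) with u in (0, 1]."""
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is never 0
    return k * u ** (-1.0 / alpha)


def pareto_mean(alpha: float, k: float) -> float:
    """alpha * k / (alpha - 1); infinite when alpha <= 1."""
    return math.inf if alpha <= 1.0 else alpha * k / (alpha - 1.0)


rng = random.Random(7)
samples = [pareto_sample(1.5, 1.0, rng) for _ in range(100_000)]
print("empirical P(X > 4):", sum(x > 4.0 for x in samples) / len(samples))
print("theoretical P(X > 4):", pareto_survival(4.0, 1.5, 1.0))  # 4^-1.5 = 0.125
```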

Here $\alpha$ is the shape parameter and *k* is the location parameter, which represents the minimum possible value of the random variable *x*. Another important heavy-tailed distribution (for shape parameter $\alpha < 1$) is the Weibull distribution, described by the following probability density function:

$$p(x) = \frac{\alpha}{k} \cdot \left(\frac{x}{k}\right)^{\alpha-1} \cdot e^{-\left(\frac{x}{k}\right)^{\alpha}}, \quad x \ge 0, \quad \alpha, k > 0 \tag{11}$$
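For (11), the survival function is $P(X > x) = e^{-(x/k)^{\alpha}}$. Whether this tail is heavier than exponential depends on the shape parameter: for α < 1 it decays more slowly than any exponential, for α = 1 it is the exponential distribution, and for α > 1 it is lighter. The sketch compares Weibull and exponential tails at equal means (the Weibull mean is $k\,\Gamma(1 + 1/\alpha)$). Sampling uses Python's `random.weibullvariate(alpha, beta)`, whose first argument is the scale (k here) and second the shape (α here), the opposite naming to (11). Parameter values are illustrative.

```python
import math
import random


def weibull_pdf(x: float, alpha: float, k: float) -> float:
    """Equation (11): shape alpha, scale k; taken as 0 for x <= 0."""
    if x <= 0.0:
        return 0.0  # avoids 0 ** negative when alpha < 1 (the density is unbounded at 0)
    z = x / k
    return (alpha / k) * z ** (alpha - 1.0) * math.exp(-(z ** alpha))


def weibull_survival(x: float, alpha: float, k: float) -> float:
    """P(X > x) = exp(-(x / k)^alpha)."""
    return 1.0 if x <= 0.0 else math.exp(-((x / k) ** alpha))


def weibull_mean(alpha: float, k: float) -> float:
    """k * Gamma(1 + 1/alpha)."""
    return k * math.gamma(1.0 + 1.0 / alpha)


def exponential_survival(x: float, mean: float) -> float:
    return math.exp(-x / mean)


for shape in (0.5, 2.0):
    m = weibull_mean(shape, 1.0)
    x = 20.0 * m  # a point far in the tail
    print(f"alpha={shape}: Weibull tail={weibull_survival(x, shape, 1.0):.3e}, "
          f"exponential tail (same mean)={exponential_survival(x, m):.3e}")

rng = random.Random(3)
draws = [rng.weibullvariate(1.0, 0.5) for _ in range(200_000)]  # scale 1, shape 0.5
print("sample mean:", sum(draws) / len(draws), "theory:", weibull_mean(0.5, 1.0))
```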

## 3. Analysis methods of the stochastic self-similar process

### 3.1 Hurst parameter

The Hurst parameter is a measure of self-similarity; here it is estimated for the packet-rate arrival process. The Hurst parameter cannot be calculated exactly; it can only be estimated. There are several methods for estimating the Hurst parameter (*H*) of stochastic self-similar processes, but there is no established criterion for which method gives the best results. The most commonly used methods for estimating the Hurst parameter are /3, 9, 24, 25/:

- **Variance method** is a graphical method based on the slowly decaying variance of the aggregated process. On a log-log plot, we draw the sample variance of the aggregated series $X^{(m)}$ against the non-overlapping block size *m* for each aggregation level. The fitted line has slope $\hat{a}$, which estimates $-\beta = 2H - 2$; hence, from equation (3), $\hat{H} = 1 + \hat{a}/2$.
- **R/S method** is also a graphical method. It is based on the range of the partial sums of the series' deviations from its mean, rescaled by its standard deviation. The slope of the log-log plot of the R/S statistic against the block size is an estimate of the Hurst parameter.
- **Periodogram method** plots the estimated spectral density (periodogram) against frequency, both on a logarithmic scale. Near the origin the slope is approximately $1 - 2H$, which allows the estimation of parameter *H*.
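Both time-domain estimators above can be sketched in plain Python. This is a minimal illustration under our own choices (block sizes, population variance, least-squares fit), not the implementation used for Table 1. The variance method uses the fact that the slope of log Var(X^(m)) against log m is −β = 2H − 2; the R/S method takes H directly as the slope of log E[R/S] against log n. On independent Gaussian noise both should give values near 0.5, with R/S typically biased slightly upward for small blocks.

```python
import math
import random
from statistics import fmean, pstdev, pvariance


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = fmean(xs), fmean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def aggregate(x, m):
    """Non-overlapping block means X^(m); an incomplete last block is dropped."""
    return [fmean(x[i * m:(i + 1) * m]) for i in range(len(x) // m)]


def hurst_variance(x, block_sizes):
    """Variance method: slope a of log Var(X^(m)) vs log m, then H = 1 + a / 2."""
    log_m = [math.log(m) for m in block_sizes]
    log_var = [math.log(pvariance(aggregate(x, m))) for m in block_sizes]
    return 1.0 + slope(log_m, log_var) / 2.0


def rescaled_range(block):
    """R/S of one block: range of the mean-adjusted partial sums over the std dev."""
    mean = fmean(block)
    cum = low = high = 0.0
    for v in block:
        cum += v - mean
        low, high = min(low, cum), max(high, cum)
    return (high - low) / pstdev(block)


def hurst_rs(x, block_sizes):
    """R/S method: slope of log(mean R/S over blocks of size n) vs log n."""
    log_n, log_rs = [], []
    for n in block_sizes:
        blocks = [x[i:i + n] for i in range(0, len(x) - n + 1, n)]
        log_n.append(math.log(n))
        log_rs.append(math.log(fmean([rescaled_range(b) for b in blocks])))
    return slope(log_n, log_rs)


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]  # independent data, H = 0.5
print("variance method:", round(hurst_variance(noise, [2 ** j for j in range(1, 9)]), 3))
print("R/S method:     ", round(hurst_rs(noise, [2 ** j for j in range(4, 11)]), 3))
```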

The variance and R/S methods are time-domain estimators, based on the relationship between specific statistical properties of the data series and the aggregated process with a non-overlapping block of size *m* (equation (4)). The periodogram method is a frequency-domain estimator. Each method gives a different estimate of the parameter *H*. In our experiments we used the average of these three estimates, which we also used as the classification criterion for self-similar traffic: if the average *H* lies between 0.5 and 1, the network traffic is classified as self-similar. Figure 1 presents an example of test traffic and the estimates of the Hurst parameter obtained by the different methods.
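A minimal sketch of the frequency-domain estimator, under our own choices. The periodogram $I(\lambda_j) = \frac{1}{2\pi n}\left|\sum_{t} (x_t - \bar{x}) e^{-i t \lambda_j}\right|^2$ is evaluated at the lowest 10% of the Fourier frequencies $\lambda_j = 2\pi j/n$ in (0, π]. H is obtained from the least-squares slope s of log I against log λ through $\hat{H} = (1 - s)/2$, since the spectral density of a long-range dependent process behaves like $\lambda^{1-2H}$ near the origin. Direct summation (no FFT) keeps the code dependency-free; an FFT would be the usual choice in practice.

```python
import cmath
import math
import random
from statistics import fmean


def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = fmean(xs), fmean(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)


def periodogram(x, n_freqs):
    """Periodogram at the Fourier frequencies 2*pi*j/n, j = 1..n_freqs."""
    n = len(x)
    mean = fmean(x)
    centered = [v - mean for v in x]
    freqs, powers = [], []
    for j in range(1, n_freqs + 1):
        lam = 2.0 * math.pi * j / n
        step = cmath.exp(-1j * lam)  # e^{-i lambda}; the phase advances by multiplication
        acc, phase = 0j, 1 + 0j
        for v in centered:
            acc += v * phase
            phase *= step
        freqs.append(lam)
        powers.append(abs(acc) ** 2 / (2.0 * math.pi * n))
    return freqs, powers


def hurst_periodogram(x, fraction=0.1):
    """Fit log I vs log lambda over the lowest `fraction` of frequencies in (0, pi]."""
    n_freqs = max(2, int(fraction * len(x) / 2))
    freqs, powers = periodogram(x, n_freqs)
    s = slope([math.log(f) for f in freqs], [math.log(p) for p in powers])
    return (1.0 - s) / 2.0


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 13)]  # independent data, H = 0.5
h_hat = hurst_periodogram(noise)
print("periodogram method:", round(h_hat, 3))
```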

### 3.2 Probability distributions

Network traffic can be described by two stochastic processes, one for packet sizes and one for inter-arrival times. Both processes are described by probability distributions, and self-similar processes can be described by heavy-tailed distributions. The main task in modeling a stochastic process with a probability distribution is to choose the right distribution, one that represents our network traffic process well. We used a distribution-fitting tool (EasyFit), which automatically fits candidate distributions to the stochastic process and estimates their parameters from the captured traffic /9/.

Fig. 1: Estimating parameter H for self-similar traffic (upper left) with the variance method (lower left), R/S method (upper right) and periodogram method (lower right)
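EasyFit is a commercial tool and its internal fitting procedure is not described here, so the sketch below is only a stand-in. It uses closed-form maximum-likelihood fits for two of the candidate families (exponential: rate = 1/mean; Pareto (10): $\hat{k} = \min_i x_i$, $\hat{\alpha} = n / \sum_i \ln(x_i/\hat{k})$) and the Kolmogorov-Smirnov distance $D = \sup_x |F_n(x) - F(x)|$ as a goodness-of-fit measure for comparing them. The synthetic data and parameter values are hypothetical.

```python
import math
import random


def fit_exponential(samples):
    """MLE of the exponential rate: 1 / sample mean."""
    return len(samples) / sum(samples)


def fit_pareto(samples):
    """MLE for (10): k = smallest sample, alpha = n / sum(ln(x_i / k))."""
    k = min(samples)
    return len(samples) / sum(math.log(x / k) for x in samples), k


def ks_distance(samples, cdf):
    """Kolmogorov-Smirnov statistic between the empirical CDF and a fitted CDF."""
    xs = sorted(samples)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


rng = random.Random(11)
data = [40.0 * rng.paretovariate(1.2) for _ in range(50_000)]  # hypothetical sizes

alpha_hat, k_hat = fit_pareto(data)
rate_hat = fit_exponential(data)
d_pareto = ks_distance(data, lambda x: 1.0 - (k_hat / x) ** alpha_hat)
d_expon = ks_distance(data, lambda x: 1.0 - math.exp(-rate_hat * x))
print(f"Pareto: alpha={alpha_hat:.3f}, k={k_hat:.2f}, KS={d_pareto:.4f}")
print(f"exponential: rate={rate_hat:.5f}, KS={d_expon:.4f}")
```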



Fig. 2: For the stochastic process of inter-arrival times, we chose a distribution and estimated its parameters based on the histogram (upper left) and the cumulative distribution function (upper right). Differences between the empirical and theoretical distributions are shown in the P-P plot (lower left) and the probability difference plot (lower right).

### 3.3 Long-range dependence

Long-range dependence describes the memory of a stochastic process and is characterized by its autocorrelation function (equations (5) and (6)), as defined in Section 2. Figure 3 shows an example of the autocorrelation function of a process with the long-range dependence property.



Fig. 3: An example of the autocorrelation function of a stochastic process with the LRD property

There is no systematic and definitive way to establish long-range dependence. One way is to estimate the Hurst parameter, but different estimation methods give different values of H, which can also vary considerably. The estimation is especially difficult around the value 0.5, which is the boundary between long- and short-range dependence. Thomas Karagiannis et al. /9, 10/ suggest an additional test called "bucket shuffling" to confirm long-range dependence. The method shuffles the captured data: the data series is partitioned into buckets of length *b*, and the data are randomly reordered within each bucket (internal shuffling). This intuitive test confirms long-range dependence when the autocorrelation functions of the original process and of the internally shuffled process are almost the same.
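A minimal sketch of internal bucket shuffling as described above, with helper names of our own. The series is cut into consecutive buckets of length b, the values inside each bucket are permuted at random, and the sample autocorrelations of the original and shuffled series are compared at lags well beyond b. Shuffling inside buckets destroys correlations at lags shorter than b but leaves coarser structure in place, so an ACF that survives the shuffle at large lags points to long-range dependence. The test series (a slowly changing level plus noise) is synthetic and only illustrates the mechanics.

```python
import random
from statistics import fmean


def internal_shuffle(x, b, rng):
    """Randomly permute the values inside each consecutive bucket of length b."""
    out = []
    for start in range(0, len(x), b):
        bucket = list(x[start:start + b])
        rng.shuffle(bucket)
        out.extend(bucket)
    return out


def acf(x, lag):
    """Sample autocorrelation at the given lag (biased estimator, divides by n)."""
    n = len(x)
    mean = fmean(x)
    c0 = sum((v - mean) ** 2 for v in x) / n
    ck = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n
    return ck / c0


rng = random.Random(5)
levels = [rng.gauss(0.0, 1.0) for _ in range(40)]
series = [levels[t // 500] + rng.gauss(0.0, 1.0) for t in range(20_000)]
shuffled = internal_shuffle(series, 10, rng)
for lag in (1, 50, 200):
    print(f"lag {lag:3d}: original {acf(series, lag):.3f}, shuffled {acf(shuffled, lag):.3f}")
```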

## 4. Modeling and simulation of self-similar traffic in OPNET

OPNET Modeler is one of the leading industrial environments for simulating various communication technologies. Self-similar traffic can be generated in OPNET in different ways. In our case we used two standard node models (stations) from the OPNET library:

- Raw Packet Generator (RPG)
- IP station


The Raw Packet Generator (RPG) is a traffic source model /11, 19/ implemented specifically for generating self-similar traffic, based on several fractal point processes (FPP) /22, 23/. Self-similar traffic is modeled with an arrival process, described by the Hurst parameter, and a probability distribution for packet sizes. The arrival process can be specified by various parameters, such as the Hurst parameter, average arrival rate, fractal onset time scale, source activity ratio and peak-to-mean ratio /11/. Of the available fractal point processes, we used the superposition of fractal renewal processes (Sup-FRP) model, which is defined as the superposition of *M* independent and identical fractal renewal processes (FRP). Each FRP stream is a renewal point process, and *M* such independent sources compose the Sup-FRP model. The common inter-arrival probability density function $p(t)$ of this process is:

$$p(t) = \begin{cases} \gamma A^{-1} e^{-\gamma t/A}, & 0 \le t \le A \\ \gamma e^{-\gamma} A^{\gamma} t^{-(\gamma+1)}, & t \ge A \end{cases} \tag{12}$$

where $1 < \gamma < 2$. A single FRP is the special case of the Sup-FRP process in which the number of independent identical renewal processes (*M*) equals 1. The Sup-FRP model is described by three parameters, $\gamma$, *A* and *M*: $\gamma$ is the fractal exponent, *A* is the location parameter, and *M* is the number of sources. These three parameters are related to three OPNET parameters: the Hurst parameter, the average arrival rate $\lambda$, and the fractal onset time scale (FOTS). The relationships between the Sup-FRP parameters and the parameters of the OPNET model are:

$$H = \frac{3 - \gamma}{2},$$

$$\lambda = M\gamma \left[1 + (\gamma - 1)^{-1} e^{-\gamma}\right]^{-1} A^{-1},$$

$$T_0^{\alpha} = 2^{-1}\gamma^{-2} e^{-\gamma} (\gamma - 1)^{-1} (2 - \gamma)(3 - \gamma)\left[1 + (\gamma - 1)e^{\gamma}\right]^{2} A^{\alpha}, \tag{13}$$

where $T_0$ is the fractal onset time scale and $\alpha = 2 - \gamma$ (equivalently $\alpha = 2H - 1$). The Hurst parameter *H* is defined by equation (3). In the Sup-FRP model in OPNET, we can set the Hurst parameter (*H*), the average arrival rate ($\lambda$) and the fractal onset time scale (FOTS) in seconds. The recommended value for FOTS in OPNET is 1 second.
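The inter-arrival density (12) and the first two relations in (13) can be sketched as follows. This is our own illustration, not OPNET's RPG code. `sample_interarrival` inverts the survival function of (12), which is $e^{-\gamma t/A}$ on [0, A] and $e^{-\gamma}(A/t)^{\gamma}$ beyond A; `mean_interarrival` integrates that survival function, and M divided by it reproduces the arrival rate λ in (13). The superposition helper starts every source at t = 0, a simplification (a stationary FRP would draw its first interval differently), and the FOTS relation is not used. The values of A and M below are hypothetical.

```python
import heapq
import math
import random


def gamma_from_hurst(h: float) -> float:
    """Invert H = (3 - gamma) / 2 from (13); valid for 0.5 < H < 1."""
    if not 0.5 < h < 1.0:
        raise ValueError("Sup-FRP needs 0.5 < H < 1")
    return 3.0 - 2.0 * h


def mean_interarrival(gamma: float, a: float) -> float:
    """Mean of (12): integral of its survival function over t >= 0."""
    body = (a / gamma) * (1.0 - math.exp(-gamma))  # part for 0 <= t <= A
    tail = math.exp(-gamma) * a / (gamma - 1.0)     # part for t >= A
    return body + tail


def arrival_rate(gamma: float, a: float, m: int) -> float:
    """lambda = M gamma [1 + (gamma - 1)^-1 e^-gamma]^-1 A^-1, as in (13)."""
    return m * gamma / (1.0 + math.exp(-gamma) / (gamma - 1.0)) / a


def sample_interarrival(gamma: float, a: float, rng: random.Random) -> float:
    """Inverse-CDF sample from (12)."""
    u = 1.0 - rng.random()  # target survival probability in (0, 1]
    if u >= math.exp(-gamma):
        return -a * math.log(u) / gamma  # exponential body, t <= A
    return a * (math.exp(-gamma) / u) ** (1.0 / gamma)  # power-law tail, t > A


def sup_frp_arrivals(gamma, a, m, horizon, rng):
    """Merged arrival times of M independent FRP sources on [0, horizon]."""
    streams = []
    for _ in range(m):
        t, times = 0.0, []
        while True:
            t += sample_interarrival(gamma, a, rng)
            if t > horizon:
                break
            times.append(t)
        streams.append(times)
    return list(heapq.merge(*streams))


g = gamma_from_hurst(0.73)  # H from Table 2; A = 0.1 s and M = 4 are hypothetical
print("gamma =", round(g, 2), " lambda =", arrival_rate(g, a=0.1, m=4), "packets/s")
arrivals = sup_frp_arrivals(g, 0.1, 4, horizon=60.0, rng=random.Random(9))
print(len(arrivals), "arrivals in 60 s")
```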

The IP station /11/ can contain an arbitrary number of independent, simultaneously working traffic generators. Each generator produces traffic from two distributions, one for the packet-size process and one for the packet inter-arrival-time process, and allows heavy-tailed distributions such as Pareto or Weibull to be used for generating self-similar network traffic.

In our case, we used a traffic generator contained in the IP station. The traffic generator is placed above the IP encapsulation layer, which takes care of packet formation and fragmentation. This means that we model packets before fragmentation. Fragmentation radically changes the histogram of the packet-size process, because many packets of MTU length appear. It also affects the arrival process, because fragmentation creates new packets. All these facts must be considered when modeling network traffic processes.
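The effect of fragmentation on the packet-size histogram can be illustrated with a small sketch of IPv4 fragmentation. The assumptions are ours: a 1500-byte MTU (Ethernet), a 20-byte IP header without options, and the IPv4 rule that every fragment except the last carries a payload that is a multiple of 8 bytes. Any payload larger than 1480 bytes then yields one or more full 1500-byte fragments plus a final one, and each fragment is a separate packet in the arrival process.

```python
import random


def ipv4_fragment_sizes(payload: int, mtu: int = 1500, header: int = 20) -> list:
    """Total IP packet sizes produced when one datagram payload is fragmented."""
    if payload + header <= mtu:
        return [payload + header]       # fits, no fragmentation
    chunk = (mtu - header) // 8 * 8     # largest 8-byte-aligned payload per fragment
    sizes = []
    remaining = payload
    while remaining > chunk:
        sizes.append(chunk + header)    # full, MTU-sized fragment
        remaining -= chunk
    sizes.append(remaining + header)    # last fragment
    return sizes


print(ipv4_fragment_sizes(4000))  # [1500, 1500, 1060]

# Hypothetical heavy-tailed payloads: count how many packets end up MTU-sized.
rng = random.Random(4)
packets = [s for _ in range(10_000)
           for s in ipv4_fragment_sizes(int(40 * rng.paretovariate(1.1)))]
print(len(packets), "packets,", sum(s == 1500 for s in packets), "of them 1500 bytes")
```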

## 5. Proposed method

Traffic was captured with the Wireshark sniffer, which provides information about the captured traffic in the network. Network traffic modeling is often based on modeling the sizes of files transmitted through the network /4/. However, when measuring network traffic with sniffers we usually have no information about file sizes, since sniffers only provide information about the captured packets.



Fig. 4: Histograms of the captured packet sizes (left) and of the captured packets transformed with the novel method (right), with the estimated Pareto distribution parameters for both cases.

The first histogram of captured packets in Figure 4 shows many packets of minimal or near-minimal size, but also many packets of maximal size (MTU). Whether a light- or a heavy-tailed distribution is chosen, we must bear in mind that the tail of the distribution has a great impact on the generated traffic, as a consequence of the fragmentation mechanism described at the end of Section 4.

For this reason, we developed a new method for estimating the parameters of the Pareto distribution for the packet-size process. The most important parameter is the shape parameter $\alpha$ of the Pareto distribution. The location parameter *k* of the Pareto distribution equals the minimal packet size in the captured traffic. The method is based on packet defragmentation: all maximal-size packets in a sequence from the same source, together with the first following packet from that source that is shorter than the maximal size, are combined into a new packet. This operation corresponds to estimating the sizes of the original files transmitted over the network. The traffic transformed in this way is then used for estimating the Pareto distribution parameters with the EasyFit fitting tool, as shown in the right histogram of Figure 4. Before the transformation, we also subtract 20 bytes from each captured packet, which represents the size of the IP header. These headers are later added automatically by the fragmentation process in the OPNET simulations. The newly estimated parameters describe the packet-size process (in the simulation tool) very well, both in terms of traffic bursts and in terms of bit and packet rates. This method was verified by simulation in the OPNET tool.
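The transformation described above can be sketched as follows. The details are our reading of the text and are assumptions: packets arrive as hypothetical `(source, ip_size)` pairs in capture order, "source" stands for whatever identifies a sender (for example an address pair), the maximal size is a 1500-byte MTU, 20 bytes of IP header are removed from every packet, and a run that is still open when the capture ends is kept as it is. The Pareto parameters would then be fitted to the returned sizes, with k equal to the smallest of them.

```python
from collections import defaultdict


def merge_into_files(packets, mtu=1500, header=20):
    """Merge each run of MTU-sized packets from one source with the first
    shorter packet that follows it from the same source (hypothetical helper)."""
    open_runs = defaultdict(int)       # source -> bytes accumulated so far
    merged = []
    for source, size in packets:
        open_runs[source] += size - header  # strip the IP header
        if size < mtu:                      # a shorter packet closes the run
            merged.append(open_runs.pop(source))
    merged.extend(open_runs.values())      # runs still open at the end of the capture
    return merged


capture = [("A", 1500), ("B", 60), ("A", 1500), ("A", 400), ("B", 1500), ("B", 100)]
print(merge_into_files(capture))  # [40, 3340, 1560]
```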

## 6. Simulation results

Using the Wireshark sniffer, we captured network traffic in several different networks. Here we present two typical captured traffics, which differ visibly even at first glance. We used these two traffics for analysis, modeling and simulation with the presented methods. Table 1 shows the main properties of these test traffics, which are plotted in Figure 5.

Table 1: The main properties of the captured traffic and the Hurst parameter estimated with different methods for both test traffics.

| measured traffic | packet rate (p/s) | bit rate (b/s) | H, variance method | H, R/S method | H, periodogram method |
|------------------|-------------------|----------------|--------------------|---------------|-----------------------|
| traffic 1        | 24.02             | 108909.2       | 0.630              | 0.723         | 0.843                 |
| traffic 2        | 35.612            | 114517.6       | 0.592              | 0.580         | 0.477                 |
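The classification rule from Section 3.1 applied to Table 1 amounts to a simple average. In the sketch below (helper names are ours), traffic 1 averages to H ≈ 0.732, which matches the measured H = 0.73 listed in Table 2, and traffic 2 averages to H ≈ 0.550. Both lie in (0.5, 1), even though the periodogram estimate for traffic 2 alone is below 0.5.

```python
from statistics import fmean

# Hurst estimates from Table 1: (variance, R/S, periodogram) method.
TABLE_1 = {
    "traffic 1": (0.630, 0.723, 0.843),
    "traffic 2": (0.592, 0.580, 0.477),
}


def classify(estimates):
    """Average the estimates; self-similar if 0.5 < mean H < 1 (Section 3.1 rule)."""
    h = fmean(estimates)
    return h, 0.5 < h < 1.0


for name, estimates in TABLE_1.items():
    h, self_similar = classify(estimates)
    print(f"{name}: mean H = {h:.3f}, self-similar: {self_similar}")
# traffic 1: mean H = 0.732, self-similar: True
# traffic 2: mean H = 0.550, self-similar: True
```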



Fig. 5: Measured test traffic 1 and 2 captured by Wireshark sniffer

We estimated the Hurst parameter of both traffics with the three methods described in Section 3. Table 1 contains the estimates of H for both test traffics obtained by the variance, R/S and periodogram methods. The average Hurst parameter is greater than 0.5 in both cases, so we classify both test traffics as self-similar network traffic. We also tested for short- and long-range dependence. For the first test traffic, the autocorrelation function decayed hyperbolically, which by Equation (9) indicates that this traffic may have the property of long-range dependence; we then confirmed long-range dependence with the bucket-shuffling method described in Section 3.3. For the second test traffic, the autocorrelation function decayed exponentially towards 0. In this case the sum in Equation (9) is finite, and therefore test traffic 2 has the property of short-range dependence. We also had to choose distributions for the inter-arrival-time and packet-size processes; the distributions and their parameters were estimated with the EasyFit tool. For both test traffics, we chose suitable heavy-tailed (Pareto or Weibull) as well as light-tailed (exponential) distributions.
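One simple way to make "decays hyperbolically" versus "decays exponentially" operational (our assumption; the text does not say how the decay was judged) is to fit two straight lines to the positive part of the sample ACF. The first is log r(k) against log k, which is linear for a power law $ck^{-\beta}$; the second is log r(k) against k, which is linear for an exponential. The sketch reports whichever fit leaves the smaller residual. With real, noisy ACFs the result should be read together with the bucket-shuffling test rather than on its own.

```python
import math
from statistics import fmean


def line_residual(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    mx, my = fmean(xs), fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


def decay_type(acf):
    """acf[i] holds r(i + 1). Compares a power-law fit with an exponential fit."""
    points = [(k, r) for k, r in enumerate(acf, start=1) if r > 0.0]  # logs need r > 0
    lags = [k for k, _ in points]
    log_r = [math.log(r) for _, r in points]
    power_law = line_residual([math.log(k) for k in lags], log_r)  # linear if r ~ c k^-beta
    exponential = line_residual(lags, log_r)                        # linear if r ~ c e^(-k/tau)
    return "hyperbolic" if power_law < exponential else "exponential"


print(decay_type([k ** -0.4 for k in range(1, 101)]))  # hyperbolic
print(decay_type([0.8 ** k for k in range(1, 101)]))   # exponential
```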

In OPNET, we generated self-similar traffic with two different station types, RPG and IP stations. We created six different scenarios for each test traffic, using different combinations of the estimated distributions, in order to show the differences between heavy- and light-tailed distributions. In the first two scenarios, the network traffic was generated by an RPG station, where self-similarity is described by the Hurst parameter. In the first scenario we used a heavy-tailed distribution for the packet-size process, while in the second we used a light-tailed (exponential) distribution. In the remaining four scenarios, we generated network traffic using the IP station, with different combinations of distributions for the packet-size and inter-arrival-time processes.

Table 2: Estimated distributions and parameters (arrival and packet-size processes) for modeling test traffic 1 in OPNET, with the parameters of the measured and modeled traffic.

| traffic   | arrival process                                    | packet size process                            | packet rate (p/s) | bit rate (b/s) | H    |
|-----------|----------------------------------------------------|------------------------------------------------|-------------------|----------------|------|
| measured  | n/a                                                | n/a                                            | 24                | 108909.2       | 0.73 |
| modeled 1 | H = 0.73                                           | Pareto, $\alpha = 0.9835$, $\beta = 432$       | 33.826            | 128759.8       | 0.59 |
| modeled 2 | H = 0.73                                           | exponential, $\lambda = 7547.2$                | 29.183            | 181449.3       | 0.59 |
| modeled 3 | exponential, $\lambda = 0.0458$                    | exponential, $\lambda = 933.4$                 | 27.568            | 168943.3       | 0.51 |
| modeled 4 | Weibull, $\alpha = 0.304$, $\beta = 0.00578$       | exponential, $\lambda = 933.4$                 | 25.146            | 153719.5       | 0.62 |
| modeled 5 | Weibull, $\alpha = 0.304$, $\beta = 0.00578$       | Pareto, $\alpha = 0.9835$, $\beta = 34$        | 25.32             | 88705.33       | 0.66 |
| modeled 6 | exponential, $\lambda = 0.0458$                    | Pareto, $\alpha = 0.9835$, $\beta = 34$        | 26.636            | 81304.02       | 0.55 |

Table 2 shows the modeling results for test traffic 1 over six different scenarios: the estimated statistical parameters (Hurst parameters and distributions) used in the models, and the simulation results obtained with these models. Figure 6 shows all six traffics modeled in OPNET with the estimated parameters from Table 2. The traffics differ in burst intensity and in packet and bit rates. One criterion of successful modeling was the difference between the bit and packet rates of the test traffic and those of the traffics modeled in OPNET. Besides the average bit and packet rates, an even more important criterion is the intensity of bursts within the network traffic. For each test traffic, we chose from the six modeled traffics the one that best represented the measured test traffic.

Fig. 6: Modeling measured test traffic 1 in OPNET with six modeled traffics (scenarios 1 and 2 with the RPG station, scenarios 3, 4, 5 and 6 with the IP station)
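The rate criteria above can be made explicit with a simple score. The equal weighting below is a hypothetical choice of ours, not the authors' criterion, and it ignores burst intensity, which the text treats as the more important criterion. Using the values of Table 2, it ranks modeled traffic 5 first, consistent with the choice made in the text.

```python
# Measured and modeled values for test traffic 1, taken from Table 2.
MEASURED = {"packet_rate": 24.0, "bit_rate": 108909.2, "H": 0.73}
MODELED = {
    1: {"packet_rate": 33.826, "bit_rate": 128759.8, "H": 0.59},
    2: {"packet_rate": 29.183, "bit_rate": 181449.3, "H": 0.59},
    3: {"packet_rate": 27.568, "bit_rate": 168943.3, "H": 0.51},
    4: {"packet_rate": 25.146, "bit_rate": 153719.5, "H": 0.62},
    5: {"packet_rate": 25.32, "bit_rate": 88705.33, "H": 0.66},
    6: {"packet_rate": 26.636, "bit_rate": 81304.02, "H": 0.55},
}


def score(model: dict, measured: dict = MEASURED) -> float:
    """Hypothetical equal-weight score: sum of absolute relative errors (lower is better)."""
    return sum(abs(model[key] - measured[key]) / measured[key] for key in measured)


ranking = sorted(MODELED, key=lambda n: score(MODELED[n]))
for n in ranking:
    print(f"modeled {n}: score {score(MODELED[n]):.3f}")
```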

**Test traffic 1** possesses the property of long-range dependence, so there are many bursts in the traffic. We modeled this measured test traffic over six different scenarios; the results are shown in Figure 6 and Table 2. The best approximation of test traffic 1 is modeled traffic 5 from Table 2, which uses a Pareto distribution for the packet-size process and a Weibull distribution for the inter-arrival time. Figure 8 compares the bit rates of the first test traffic and the modeled traffic. We also compared the histograms of the packet-size and inter-arrival-time processes of the measured and simulated network traffic and found that the processes of the modeled traffic are very close to those of the measured traffic. Comparing the Hurst parameters in Table 2, the Hurst parameter of modeled traffic 5 is the closest to the estimated value of the measured traffic among all simulated traffics.



Fig. 7: Comparison between the modeled and measured test traffic 2 in packets per second (p/s).

Test traffic 2 was also modeled over six different scenarios, as in the first case. As the best of the six modeled traffics for test traffic 2, we chose the case in which the simulated traffic was described by the exponential distribution for packet sizes and the heavy-tailed Weibull distribution for inter-arrival times. The packet rate of this traffic was 32.95 (p/s) and the bit rate was 118998 (b/s), which are very close to the measured values. The Hurst parameter of the simulated traffic was 0.521, which is also close to the estimated values for the measured traffic. In this case we also compared the variances of the packet and bit rates. The variances of the measured traffic are 9.81 (p/s) for the packet rate and 22177 (b/s) for the bit rate; the corresponding variances of the modeled traffic are 11.32 (p/s) and 29280 (b/s). Figure 7 shows the comparison between the measured and best-modeled traffic in packets per second. Based on all criteria, we can say that the simulated traffic is a good approximation of measured traffic 2.



Fig. 8: Comparison between the modeled and measured test traffic 1 for bit rates.

## 7. Conclusion

In this paper we presented a novel method for estimating the distribution parameters of measured network traffic. We validated this method in simulation and compared it with the method in which the parameters are estimated directly from the captured packets. During the analysis phase we paid attention to the self-similar property, which has become the basic model for describing today's network traffic.

In network traffic theory, the properties of short- and long-range dependence are directly determined by the value of the estimated parameter H. Our analysis of network traffic showed that traffic can have a Hurst parameter greater than 0.5 and yet not have the property of long-range dependence.

Our simulations also showed that, when modeling self-similar traffic with short-range dependence, the exponential distribution is more appropriate for describing the packet-size process, because it does not produce extreme peaks in the modeled traffic. For this reason the Pareto distribution is unsuitable in this case.

Heavy-tailed distributions, especially Pareto, are suitable for modeling the packet-size process of measured network traffic that is self-similar and also has the property of long-range dependence (test traffic 1).

There are discrepancies between the measured and modeled traffics in terms of packet rate, bit rate, burst intensity and variance. Nevertheless, with the developed method we obtain a good approximation of the measured network traffic. We cannot claim that this is the optimal method, but it shows good results in OPNET. We noticed that estimating the shape parameter of the Pareto distribution is very delicate, because a small deviation in this parameter causes large discrepancies in the average values of the network traffic, which is one of the chosen criteria for traffic modeling.
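The sensitivity to α noted above can be illustrated under an assumption of ours: packet sizes drawn from (10) are capped at an upper bound U (the value below is hypothetical; how OPNET bounds sizes is not described here). Without a cap, the mean of (10) is infinite for α ≤ 1, which includes the estimate α = 0.9835 in Table 2. With a cap, the mean is finite but changes quickly as α moves through 1: for k = 34, as in Table 2, and U = 10^6 bytes, moving α from 1.1 to 0.9 more than doubles the mean.

```python
import math


def bounded_pareto_mean(alpha: float, k: float, upper: float) -> float:
    """Mean of (10) truncated to [k, upper] and renormalized (upper is an assumed cap)."""
    norm = 1.0 - (k / upper) ** alpha  # probability mass of (10) on [k, upper]
    if math.isclose(alpha, 1.0):
        integral = k * math.log(upper / k)  # alpha = 1 limit of the expression below
    else:
        integral = alpha * k ** alpha * (upper ** (1.0 - alpha) - k ** (1.0 - alpha)) / (1.0 - alpha)
    return integral / norm


for alpha in (0.9, 0.9835, 1.0, 1.1):
    print(f"alpha={alpha}: mean={bounded_pareto_mean(alpha, 34.0, 1e6):.1f} bytes")
```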

### Acknowledgment

This work is part of the target research programme "Science for Peace and Security": M2-0140 - Modeling of Command and Control information systems, financed by the Slovenian Ministry of Defence.

#### References

- /1/ W. E. Leland, M. S. Taqqu, W. Willinger and D. V. Wilson, On the self-similar nature of Ethernet traffic (Extended version), *IEEE/ACM Transactions on Networking*, Vol. 2, pp. 1-15, 1994.
- /2/ V. Paxson and S. Floyd, Wide area traffic: the failure of Poisson modeling, *IEEE/ACM Transactions on Networking*, 3(3): 226–244, 1995.
- /3/ K. Park, and W. Willinger, Self-Similar Network Traffic and Performance Evaluation, John Wiley & Sons, 2000.
- /4/ K. Park, G. Kim and M. E. Crovella, On the Relationship Between File Sizes Transport Protocols, and Self-Similar Network Traffic, International Conference on Network Protocols, pp.171– 180, 1996.
- /5/ W. Willinger, M. S. Taqqu, R. Sherman and D. V. Wilson, Selfsimilarity through high-variability: statistical analysis of Ethernet LAN traffic at the source level, *IEEE/ACM Transactions on Networking*, 5(1): 71–86, 1997.

- /6/ M. E. Crovella, and A. Bestavros, Self-Similarity in World Wide Web Traffic Evidence and Possible Causes, *IEEE/ACM Transactions on Networking*, 1997.
- /7/ O.Rose, Estimation of the Hurst Parameter of Long Range Dependent Time Series, Research Report, 1996.
- /8/ F. Xue, and S. J. Ben You, The effect of aggregation on selfsimilar traffic, Department of Electrical and computing engineering, University of California, 2004.
- /9/ T. Karagiannis, M. Molle and M. Faloutsos, Long-range dependence: Ten years of Internet traffic modeling, *IEEE Internet Computing*, 8 (5), pp. 57-64, 2004.
- /10/ T. Karagiannis and M. Faloutsos, A Tool For Self-Similarity and Long-Range Dependence Analysis, 1st Workshop on Fractals and Self-Similarity in Data Mining: Issues and Approaches (in KDD), Edmonton, Canada, July 23, 2002.
- /11/ P. Sagger, Does Circuit Emulation in Metropolitan Gigabit Ethernets require Service Priority?, Post Diploma Thesis NA-2005-02, Swiss Federal Institute of Technology, Zurich, 2005.
- /12/ H. Yılmaz, IP over DVB: Management of self-similarity, Master of Science, Boğaziçi University, 2002.
- /13/ M. Z. Jiang, Analysis of wireless data network traffic, Master of Applied Science, Simon Fraser University, Vancouver, Canada, 2000.
- /14/ C. Groth, Modellierung und Simulation von Ethernet-Netzwerkverkehr, Master of Science, Universität Rostock, 2004.
- /15/ B. Vujičić, Modeling and Characterization of Traffic in Public Safety Wireless Networks, Master of Applied science, Simon Fraser University, Vancouver, 2004.
- /16/ E. Casilari, F. J. Gonzalez and F. Sandoval, Modeling of HTTP traffic, *IEEE Communications Letters*, 5(6): 272–274, 2001.
- /17/ J. Beran, R. Sherman, M. S. Taqqu and W. Willinger, Long-range dependence in variable bit rate video traffic, *IEEE Transactions on Communications*, vol. 43, 1566–1579, 1995.
- /18/ M. Fras, J. Mohorko and Ž. Čučej, A new approach to the modeling of network traffic in simulations, Inf. MIDEM, 39(1), pp. 41-45, 2009.
- /19/ J. Potemans, B. Van den Broeck, Y. Guan, J. Theunis, E. Van Lil and A. Van de Capelle, Implementation of an Advanced Traffic Model in OPNET Modeler, *OPNETWORK 2003*, Washington D.C., USA, 2003.
- /20/ P. Leys, J. Potemans, B. Van den Broeck, J. Theunis, E. Van Lil and A. Van de Capelle, Use of the Raw Packet Generator in OPNET, OPNETWORK 2002, Washington D.C., USA, 2002.

- /21/ M. Jiang, S. Hardy and Lj. Trajkovic, Simulating CDPD networks using OPNET, OPNETWORK 2000, Washington D.C, 2000.
- /22/ B. K. Ryu and S. Lowen, Fractal Traffic Model for Internet Simulation, Proceedings of the Fifth IEEE Symposium on Computers and Communications (ISCC 2000), 2000.
- /23/ B. K. Ryu, and M. Nandikesan, Real Time Generation of Fractal ATM Traffic: Model, Algorithm, and Implementation, Department of Electrical Engineering and Center for Telecommunications Research, New York, 1996.
- /24/ H. Yılmaz, IP over DVB: Management of self-similarity, Master of Science, Boğaziçi University, 2002.
- /25/ M. Gospodinov and E. Gospodinova, The graphical methods for estimating Hurst parameter of self similar network traffic, *International Conference on Computer Systems and Technologies* – CompSysTech'2005.
- /26/ M. Fras, J. Mohorko and Ž. Čučej, Estimating the parameters of measured self similar traffic for modeling in OPNET, *IWSSIP Conference*, Maribor, Slovenia, 2007.
- /27/ O. Sheluhin, S. Smolskiy and A. Osin, Self-Similar Processes in Telecommunications, John Wiley & Sons, 2007.
- /28/ J. Mohorko, M. Fras and Ž. Čučej, Modeling of IRIS replication mechanism in tactical communication network with OPNET, *IWSSIP Conference*, Maribor, Slovenia, 2007.
- /29/ M. Fras, J. Mohorko and Ž. Čučej, Analysis, modeling and simulation of P2P file sharing traffic impact on networks' performances, Inf. MIDEM, 38(2), pp. 117-123, 2009.

M. Fras, Margento R&D, Maribor, Slovenia; J. Mohorko and Ž. Čučej, University of Maribor, Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

Prispelo (Arrived): 24.11.2010 Sprejeto (Accepted): 09.09.2010

# EFFECT OF HUMAN HEAD SHAPES FOR MOBILE PHONE EXPOSURE ON ELECTROMAGNETIC ABSORPTION

<sup>1</sup>Mohammad Rashed Iqbal Faruque, <sup>2</sup>Mohammad Tariqul Islam, <sup>3</sup>Norbahiah Misran <sup>1,3</sup>Dept. of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia. <sup>2</sup>Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia.

Key words: cellular phone, the finite-difference time-domain (FDTD) method, head size, specific absorption rate (SAR).

**Abstract:** In this paper, the local Specific Absorption Rate (SAR) induced in spherical, cubical, and realistic human head models exposed to a mobile phone is investigated. Three human head models of very different shapes but almost the same volume are considered. The local maximum SAR values obtained in these three head models for the homogeneous cases differ on average by about 12% and 9% at 900 and 1800 MHz, respectively. A comparative analysis of the SAR induced in the realistic human head model for homogeneous and inhomogeneous cases is also presented. The results show that the local maximum SAR induced in the homogeneous human head model is larger than that induced in the inhomogeneous human head model.

# Effect of the Human Head Shape on the Absorption of Electromagnetic Radiation During Exposure to Mobile Phones

Key words: mobile phone, FDTD method, head size, SAR parameter

Abstract: In this paper we investigate the SAR (Specific Absorption Rate) induced in spherical, cubical and realistic models of the human head exposed to a mobile phone. Three different human head models with almost equal volumes are presented. The local SAR maxima in the three homogeneous human head models differ on average by about 12% and 9% at 900 and 1800 MHz. Analyses of the SAR in realistic human head models for homogeneous and inhomogeneous cases are also presented. The results show that the local SAR maximum in the homogeneous human head models is larger than in the inhomogeneous models.

# 1. Introduction

With the rapid and ever more widespread use of mobile phones, public concern about possible health hazards has been growing, which has increased the demand for electromagnetic (EM) dosimetry of mobile phones. The basic quantity in EM dosimetry is the specific absorption rate (SAR), the power absorbed per unit mass of tissue /1/. SAR is generally evaluated either by phantom measurements or by computer simulation. The finite-difference time-domain (FDTD) method is currently the most widely accepted means of computing SAR /2/. In Europe, the basic SAR limit for the general public is 2 W/kg averaged over 10 g of tissue and a period of 6 min /3/. The ANSI/IEEE standard /4/ defines a stricter limit for uncontrolled environments of 1.6 W/kg averaged over 1 g of tissue and a period of 30 min.
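The definition of SAR as absorbed power per unit mass corresponds to the standard point relation $\mathrm{SAR} = \sigma |E_{\mathrm{rms}}|^2 / \rho$, with conductivity σ in S/m, RMS electric field strength in V/m, and mass density ρ in kg/m³ (for peak field amplitudes, divide by an additional factor of 2). The sketch below computes point SAR and a mass-weighted average over a set of voxels and compares it with the two limits quoted above. The tissue values are hypothetical, and the sketch ignores how the standards prescribe the shape and selection of the 1 g or 10 g averaging volume.

```python
LIMITS_W_PER_KG = {"EU (10 g, 6 min)": 2.0, "ANSI/IEEE (1 g, 30 min)": 1.6}


def point_sar(sigma: float, e_rms: float, rho: float) -> float:
    """SAR in W/kg at one point: sigma * |E_rms|^2 / rho."""
    return sigma * e_rms ** 2 / rho


def mass_averaged_sar(voxels) -> float:
    """Absorbed power divided by mass for voxels (sigma, e_rms, rho, volume_m3)."""
    power = sum(sigma * e ** 2 * vol for sigma, e, _, vol in voxels)
    mass = sum(rho * vol for _, _, rho, vol in voxels)
    return power / mass


# Hypothetical voxels: (sigma [S/m], E_rms [V/m], rho [kg/m^3], volume [m^3]).
voxels = [(0.9, 40.0, 1040.0, 1e-7), (0.8, 55.0, 1040.0, 1e-7), (1.2, 30.0, 1100.0, 1e-7)]
avg = mass_averaged_sar(voxels)
for name, limit in LIMITS_W_PER_KG.items():
    print(f"{name}: averaged SAR {avg:.3f} W/kg, within limit: {avg <= limit}")
```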

Analyzing the possible range of variation of the induced field strengths in the various tissues requires extensive effort, since the local field strengths depend strongly on a large number of parameters, such as the operating frequency and antenna input power, the position of the device with respect to the head, the design of the device, the outer shape of the head, the distribution of the different tissues within the head, and the electrical properties of these tissues.

The electrical parameters of the human body vary with the level of physical and metabolic activity, health, and age. The variations in all these properties lead to a spread in the analyzed absorption distribution. Currently the most commonly used system for testing handheld cellular telecommunications equipment is a measurement setup using a homogeneous, anatomically shaped phantom filled with a liquid simulating brain tissue /5-14/. The rationale behind this approach comes from the energy absorption mechanism in the close near field of antennas /15-19/, which shows that volume-averaged values are determined mainly by the time-averaged antenna input power, the operating frequency, the design of the device, and its position with respect to the head, and only to a much lesser extent by the physical properties of the head.

In this paper the authors focus on the effects of head properties, such as size, shape, and the electrical properties of tissues, on the SAR induced in human head models exposed to a cellular phone. To find the effects of head size or shape on the electromagnetic absorption characteristics, the realistic head model, a spherical head model, and a cubical head model are scaled, and the SAR distributions in these models exposed to a cellular telephone model are investigated using the FDTD method.

# 2. Numerical Method

The finite-difference time-domain (FDTD) method was used to obtain the electromagnetic field distribution in three types of head models. In electrodynamics, computer simulation is by far the most flexible way to investigate effects that depend on multiple parameters, since the geometry and the material properties of the domain can easily be varied. Many numerical techniques exist for the analysis of complex near-field scattering problems, and the FDTD technique /2/ has proven to be the most efficient for studying absorption in strongly inhomogeneous bodies. FDTD is currently used by various groups to study the absorption in the human head from mobile phones /6-9/ and to test novel antenna designs /10-11/.

CST Microwave Studio (CST MWS), which is based on the finite integration technique (FIT) in the time domain proposed by Weiland in 1976, was used as the main simulation tool in combination with the perfect boundary approximation (PBA). The thin sheet technique, which improves the geometry approximation while preserving computation speed, was used to achieve highly accurate results. A non-uniform meshing scheme was adopted so that most of the computational effort was dedicated to the regions along the inhomogeneous boundaries, giving a fast and accurate analysis. This technique is conceptually slightly different from FDTD but leads to the same numerical scheme. The open domain is bounded by second-order Mur absorbing boundary conditions. Excitation is done with a smoothly increasing harmonic function, and the computation is terminated once the steady state is reached (usually after 10 to 20 periods). First, the results for the local maximum SAR induced in the homogeneous human head models are presented. The FDTD method is then used again to calculate the local maximum SAR induced in the realistic human head model, simulated both as a homogeneous model with six different electrical property sets (those of bone, skin, eye, blood, muscle, and brain tissue) and as an inhomogeneous model with six tissues, at 900 MHz and 1800 MHz. Comparisons of the SAR induced in the homogeneous and inhomogeneous versions of the realistic head model are also presented.
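To make the time-stepping described above more concrete, the following sketch implements a one-dimensional Yee (FDTD) update for E<sub>z</sub> and H<sub>y</sub> in a lossy dielectric slab. It is not the CST MWS/FIT solver used in this study: it is 1-D rather than 3-D, it uses first-order instead of second-order Mur absorbing boundaries, and the raised-cosine ramp of the harmonic source, the slab position, and the function names `ramped_sine` and `fdtd_1d` are assumptions made for illustration. The slab uses the 900 MHz muscle parameters quoted later in this paper (ε<sub>r</sub> = 57.4, σ = 0.82 S/m), and the field amplitude is recorded over the last simulated period, once the steady state has approximately been reached.

```python
import numpy as np

C0 = 299_792_458.0        # speed of light in vacuum (m/s)
EPS0 = 8.854187817e-12    # vacuum permittivity (F/m)
MU0 = 4e-7 * np.pi        # vacuum permeability (H/m)


def ramped_sine(t, f, ramp_periods=3.0):
    """Harmonic source whose amplitude rises smoothly (raised cosine) to 1."""
    t_ramp = ramp_periods / f
    env = 0.5 * (1.0 - np.cos(np.pi * min(t, t_ramp) / t_ramp))
    return env * np.sin(2 * np.pi * f * t)


def fdtd_1d(f=900e6, dz=2e-3, n_cells=600, n_periods=20,
            slab=(300, 400), eps_r_slab=57.4, sigma_slab=0.82, src=100):
    """1-D Yee scheme (Ez, Hy) with a lossy slab; returns peak |Ez| per cell
    over the last simulated period."""
    dt = 0.5 * dz / C0                        # Courant number 0.5 (stable in 1-D)
    eps_r = np.ones(n_cells)
    sigma = np.zeros(n_cells)
    eps_r[slab[0]:slab[1]] = eps_r_slab
    sigma[slab[0]:slab[1]] = sigma_slab

    # Semi-implicit update coefficients for a conducting medium.
    loss = sigma * dt / (2 * EPS0 * eps_r)
    ca = (1 - loss) / (1 + loss)
    cb = dt / (EPS0 * eps_r * dz) / (1 + loss)

    ez = np.zeros(n_cells)
    hy = np.zeros(n_cells - 1)
    mur = (C0 * dt - dz) / (C0 * dt + dz)     # first-order Mur coefficient
    steps_per_period = int(round(1 / (f * dt)))
    n_steps = n_periods * steps_per_period
    peak = np.zeros(n_cells)

    for n in range(n_steps):
        hy += dt / (MU0 * dz) * (ez[1:] - ez[:-1])
        ez_left, ez_right = ez[1], ez[-2]     # neighbours before the E update
        ez[1:-1] = ca[1:-1] * ez[1:-1] + cb[1:-1] * (hy[1:] - hy[:-1])
        ez[src] += ramped_sine((n + 1) * dt, f)   # soft (additive) source
        # First-order Mur absorbing boundaries at both free-space ends.
        ez[0] = ez_left + mur * (ez[1] - ez[0])
        ez[-1] = ez_right + mur * (ez[-2] - ez[-1])
        if n >= n_steps - steps_per_period:   # record the last period only
            peak = np.maximum(peak, np.abs(ez))
    return peak


if __name__ == "__main__":
    peak = fdtd_1d()
    # Relative local SAR inside the slab, sigma |E|^2 / (2 rho), rho = 1040 kg/m^3.
    sar_rel = 0.82 * peak[300:400] ** 2 / (2 * 1040.0)
    print(sar_rel[:5])
```

The absolute SAR level in this sketch is arbitrary because the soft-source amplitude is not calibrated to an antenna input power; only the shape of the profile (strong decay into the lossy slab) is meaningful.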

# 3. Head Phantoms

Anatomically accurate human head phantoms can be generated from magnetic resonance imaging (MRI) data. Translating the three-dimensional (3-D) data sets of relaxation times into a tissue distribution is a difficult task that generally requires a person trained in medicine or biology who is able to distinguish both transitional and marginal regions. MRI data sets produced in different laboratories, by different scientists, and from different test subjects therefore inevitably differ in their discretization.

In this study, homogeneous phantoms with different shapes, each assigned the dielectric constant, conductivity, and mass density of different tissues and based on the MRI data sets of three different adults, are used; the heads are also simulated as inhomogeneous phantoms, and the results are compared. Fig. 1 shows the outer shapes and two cross-sectional views of these head phantoms, and Table 1 gives the discretization data. The ear closest to the antenna needs special attention: during normal use of a handheld telephone, the ear is pressed against the head and therefore changes its shape. To avoid effects caused by different ear models, which might mask the effects of the head itself, the outer right ear was removed from the head phantoms.

The first phantom, the realistic head model, has the highest resolution: its voxel size is 1 mm in all three Cartesian dimensions. The realistic head model also has the largest volume. The brain region was segmented very carefully, and 13 tissue types were simulated for the entire head; however, the lower part of the head was assigned only a single tissue type.

The second head phantom, the cubical model, has nearly the same voxel size. It was developed for the training of medical students and distinguishes among 120 tissue types. For the EM analysis, this large number had to be reduced to 13 different tissues for which electrical parameters are available /16/.

The third head phantom, the spherical model, has a voxel size of about 12 mm<sup>3</sup>, so its discretization is relatively crude. In the original MRI model the skin was not identified; in the computer model, the skin was added as an outer layer one voxel thick. The brain region of the spherical head phantom is homogeneous and assigned a single tissue type. Comparing the data of different publications reveals a spread in the values given for the electrical parameters of different tissue types. Tables 2 and 3 show the electrical parameters chosen for this study. The permittivity and conductivity of the phantom tissues were taken from the dielectric database /16/.

Table 1 Three different head models for simulations

|                                | Realistic head model /20/ | Spherical head model /21/ | Cubical head model /7/     |
|--------------------------------|---------------------------|---------------------------|----------------------------|
| Head volume                    | 4.44 dm<sup>3</sup>       | 3.35 dm<sup>3</sup>       | 4.26 dm<sup>3</sup>        |
| Tissues                        | 13                        | 120                       | 12                         |
| Computational space (voxels)   | 175 × 230 × 226           | 159 × 208 × 201           | 159 × 206 × 249            |
| Voxel size                     | (1 mm)<sup>3</sup>        | (1.075 mm)<sup>3</sup>    | (1.875 mm)<sup>2</sup> × 3 mm |

With numerical simulations using FIT or other finite-difference codes, it is easy to assign different tissue parameters to different mesh cells. However, the tissue discretization and the assignment of electrical parameters to the various tissues are fraught with considerable uncertainty. Therefore, whether these parameters significantly alter the absorption was studied by assigning different parameters to the various tissues: 1) anatomically correct head phantoms with tissue distributions derived from MRI (referred to as Realistic, Cubical, Spherical); 2) simplified head phantoms that have the outer shape of the MRI phantoms but contain only one tissue with high water content ( $\varepsilon_r$  = 43.5,  $\sigma$  = 0.9 S/m) and one with low water content using the parameters of bone tissue, i.e.,  $\varepsilon_r = 21$ ,  $\sigma = 0.33$  S/m (referred to as realistic, cubical, spherical); and 3) homogeneous head phantoms with the outer shape of the MRI phantoms that contain only one tissue type with  $\varepsilon_r$  = 43.5,  $\sigma$  = 0.9 S/m (referred to as realistic, cubical, spherical).
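The three phantom variants differ only in how material parameters are attached to the voxels of the same MRI-derived geometry. The sketch below illustrates this on a hypothetical integer label grid: variant 1 uses per-tissue values (here the 900 MHz values of Table 2), variant 2 maps bone to the low-water-content tissue and everything else to the high-water-content tissue, and variant 3 assigns the high-water-content tissue everywhere. The label codes, the function name `assign`, and the choice that only bone is treated as low-water tissue in variant 2 are assumptions for illustration, not details taken from the study.

```python
import numpy as np

# Hypothetical integer tissue labels for a voxel grid (illustration only).
AIR, BONE, SKIN, BRAIN, MUSCLE = 0, 1, 2, 3, 4

# Per-tissue (eps_r, sigma [S/m]) at 900 MHz, taken from Table 2.
TISSUE_900 = {
    AIR: (1.0, 0.0),
    BONE: (20.8, 0.34),
    SKIN: (43.7, 0.68),
    BRAIN: (45.8, 0.77),
    MUSCLE: (57.4, 0.82),
}

HIGH_WATER = (43.5, 0.9)   # single high-water-content tissue (variants 2 and 3)
LOW_WATER = (21.0, 0.33)   # bone-like low-water-content tissue (variant 2)


def assign(labels, variant):
    """Return (eps_r, sigma) voxel arrays for variant 1 (anatomical),
    2 (two-tissue) or 3 (homogeneous); air voxels stay eps_r=1, sigma=0."""
    eps = np.ones(labels.shape)
    sig = np.zeros(labels.shape)
    for lab in np.unique(labels):
        if lab == AIR:
            continue
        if variant == 1:
            e, s = TISSUE_900[lab]
        elif variant == 2:
            e, s = LOW_WATER if lab == BONE else HIGH_WATER
        elif variant == 3:
            e, s = HIGH_WATER
        else:
            raise ValueError("variant must be 1, 2 or 3")
        mask = labels == lab
        eps[mask] = e
        sig[mask] = s
    return eps, sig


if __name__ == "__main__":
    labels = np.array([[AIR, BONE], [SKIN, MUSCLE]])
    for v in (1, 2, 3):
        print(v, assign(labels, v)[0].tolist())
```

Keeping the geometry fixed and changing only this mapping is what allows the comparison to isolate the effect of the tissue parameters from the effect of head shape.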



Fig. 1 Test view of the three MRI phantoms: (a) realistic head model, (b) spherical head model, (c) cubical head model

### 4. Phone Model

The numerical phone model operating at 900 MHz and 1800 MHz consists of a conducting box, a plastic casing, and a helix antenna, as shown in Fig. 2. The antenna was fed using the coaxial feeding method. Electric field components of equal amplitude, tangential to the top surface of the phone, were applied at the source point, as shown in Fig. 2(c). For both the anatomical and the simple head models, the phone was located on the reference plane, which is defined by the auditory canal openings of both ears and the center of the mouth. Regarding the orientation of the phone, the touch position, which is a normal operating position, was used for the anatomical models, while for the simple models the phone was oriented parallel to the sagittal plane of the head, as shown in Fig. 2.

The cellular phone model is constructed with a quarter-wavelength helix antenna mounted on top of a rectangular box with height *h* = 108 mm, width *w* = 46 mm, and thickness *d* = 23 mm, filled with an equivalent material, as shown in Fig. 2. The outside surface of the handset box is modeled as a dielectric material with ( $\varepsilon_r$  = 3.84,  $\sigma$  = 1.0 × 10<sup>-5</sup> S/m) and ( $\varepsilon_r$  = 3.78,  $\sigma$  = 1.0 × 10<sup>-5</sup> S/m) at 900 and 1800 MHz, respectively. According to the studies in refs. /20-23/, the dielectric material used for the outside surface of a handset box has a significant impact on the reduction of the SAR induced in a human head. The transmitted power of the cellular phone is assumed to be 0.6 W at 900 MHz and 0.125 W at 1800 MHz.

Fig. 2 Phone model (unit: mm): (a) front view, (b) top view, (c) feeding part in the FDTD mesh

Following the equation in refs. /24-26/, the excitation source voltage at the feeding point of the monopole antenna can be calculated as  $V = \sqrt{8PR}$ , where R is the input resistance of the antenna and P is the transmitted power of the cellular phone, both in free space. Using the method of moments (MoM) /16-18/, the impedance of the monopole antenna in air is 45.0 + j19.34 Ω at 900 MHz and 50.81 + j14.85 Ω at 1800 MHz. The SAR distributions induced in the human head are calculated with an initial sinusoidal time-varying electric field  $E_z = (V/\delta)\sin(\omega t)$  located in the gap between the helix antenna and the upper center of the box, as shown in Fig. 2, where  $\delta$  = 2.0 mm is the cell size used in the FDTD simulation. A metallic material with  $\varepsilon_r = 1.0$  and  $\sigma$  = 3.72 × 10<sup>7</sup> S/m is employed to simulate the helix antenna. The three homogeneous human head models (Fig. 1), with spherical, cubical, and realistic shapes, are discretized into 656419, 656253, and 656132 cubic cells of 2.0 mm on each side in the FDTD simulations, respectively. The relative dielectric constant  $\varepsilon_r$ , conductivity  $\sigma$ , and mass density  $\rho$  of the three homogeneous phantom-muscle head models ( $\varepsilon_r$  = 57.4,  $\sigma$  = 0.82 S/m,  $\rho$  = 1.04 g/cm<sup>3</sup> at 900 MHz;  $\varepsilon_r$  = 53.5,  $\sigma$  = 1.34 S/m,  $\rho$  = 1.04 g/cm<sup>3</sup> at 1800 MHz) are taken from the literature /5-10/. The cellular phone and human head models are assumed to be nonmagnetic ( $\mu_r = 1.0$ ).
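The source voltage and gap field follow directly from the quantities given above. The sketch below evaluates $V = \sqrt{8PR}$ with the transmitted powers and the real parts of the MoM impedances, and then the gap-field amplitude $V/\delta$ for $\delta$ = 2.0 mm. One common reading of the factor 8, assumed here rather than stated in the text, is that V is the peak EMF of a source whose internal resistance equals R, so that the power delivered to R is $V^2/(8R)$. The helper names are hypothetical.

```python
import math


def source_voltage(p_watt: float, r_ohm: float) -> float:
    """Peak source voltage V = sqrt(8 P R) for transmitted power P into resistance R."""
    return math.sqrt(8.0 * p_watt * r_ohm)


def gap_field(v_volt: float, delta_m: float = 2.0e-3) -> float:
    """Amplitude V / delta of the gap field Ez = (V / delta) sin(wt)."""
    return v_volt / delta_m


# (P in W, R in ohm) from the text; R is the real part of the MoM impedance.
CASES = {900e6: (0.6, 45.0), 1800e6: (0.125, 50.81)}

if __name__ == "__main__":
    for f, (p, r) in CASES.items():
        v = source_voltage(p, r)
        print(f"{f / 1e6:.0f} MHz: V = {v:.2f} V, Ez amplitude = {gap_field(v):.0f} V/m")
```

With these inputs, V is roughly 14.7 V at 900 MHz and 7.1 V at 1800 MHz, so the gap field is of the order of a few kV/m.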

### 5. Results

The maximum local SARs induced in the three homogeneous human head models versus the distance between the cellular phone and the head model are shown in Figs. 3 and 4. It is clear from Figs. 3 and 4 that the spherical head model yields the highest local maximum SAR, while the realistic head model yields the lowest. The local maximum SARs induced in the cubical and spherical head models are close to each other, and both exceed the values obtained with the realistic head model.

Fig. 3 SAR induced in four homogeneous phantom-muscle head models with relative dielectric constant  $\varepsilon_r$  = 57.4, conductivity  $\sigma$  = 0.82 S/m, and mass density  $\rho$  = 1.04 g/cm<sup>3</sup> at 900 MHz

The average difference in the local maximum SAR among these three head models is within approximately 12% at 900 MHz and 9% at 1800 MHz. This indicates that the shape of the human head plays a minor role in the SAR induced in the head models, in agreement with the conclusion in refs. /25-26/ that the maximum local SAR is scarcely affected by the shape of a human head exposed to a cellular phone. In the following study, the realistic head model is simulated six times as a homogeneous model, each time with one of six electrical property sets (those of bone, skin, blood, eye, muscle, and brain tissue), and once as an inhomogeneous model with all six tissues, to calculate the SAR at 900 and 1800 MHz.



Fig. 4 SAR induced in four homogeneous phantom-muscle head models with relative dielectric constant  $\varepsilon_r$  = 53.5, conductivity  $\sigma$  = 1.34 S/m, and mass density  $\rho$  = 1.04 g/cm<sup>3</sup> at 1800 MHz

The electrical properties of muscle /24/ are ( $\epsilon_r = 57.4$ ,  $\sigma = 0.82$  S/m,  $\rho = 1.04$  g/cm<sup>3</sup>) and ( $\epsilon_r = 53.5$ ,  $\sigma = 1.34$  S/m,  $\rho = 1.04$  g/cm<sup>3</sup>) at 900 and 1800 MHz, respectively. A comparison of the data found in different publications reveals a spread in the values given for the electrical properties of different types of tissue /24-27/. The electrical properties of the six tissues of the inhomogeneous human head model at 900 and 1800 MHz are listed in Tables 2 and 3.

Table 2 Dielectric tissue properties at 900 MHz

| Tissue type | Density, ρ (1000 kg·m<sup>-3</sup>) | Conductivity, σ (S·m<sup>-1</sup>) | Dielectric constant, ε<sub>r</sub> |
|-------------|-------------------------------------|------------------------------------|------------------------------------|
| Air         | 0.0012                              | 0.0                                | 1.0                                |
| Bone        | 1.85                                | 0.34                               | 20.8                               |
| Skin        | 1.10                                | 0.68                               | 43.7                               |
| Blood       | 1.06                                | 1.54                               | 61.4                               |
| Eye         | 1.01                                | 1.90                               | 70.0                               |
| Brain       | 1.03                                | 0.77                               | 45.8                               |
| Muscle      | 1.04                                | 0.82                               | 57.4                               |

Table 3 Dielectric tissue properties at 1800 MHz

| Tissue type | Density, ρ (1000 kg·m<sup>-3</sup>) | Conductivity, σ (S·m<sup>-1</sup>) | Dielectric constant, ε<sub>r</sub> |
|-------------|-------------------------------------|------------------------------------|------------------------------------|
| Air         | 0.0012                              | 0.0                                | 1.0                                |
| Bone        | 1.85                                | 0.59                               | 19.3                               |
| Skin        | 1.10                                | 1.21                               | 41.4                               |
| Blood       | 1.06                                | 2.04                               | 59.37                              |
| Eye         | 1.01                                | 2.03                               | 68.6                               |
| Brain       | 1.03                                | 1.15                               | 43.5                               |
| Muscle      | 1.04                                | 1.34                               | 53.5                               |
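One way to read Tables 2 and 3 is through the factor σ/(2ρ), which converts a squared peak field amplitude into local SAR via the standard definition SAR = σE²/(2ρ). The sketch below tabulates this factor for each tissue (air excluded). It only compares tissues exposed to the same local field; in the actual head the field itself differs from tissue to tissue, so this is not a ranking of the SAR that each tissue receives. The table structure and the name `sar_per_e2` are illustrative.

```python
# (rho [1000 kg/m^3], sigma [S/m]) per tissue, from Tables 2 and 3 (air omitted).
TABLE = {
    "Bone":   {900: (1.85, 0.34), 1800: (1.85, 0.59)},
    "Skin":   {900: (1.10, 0.68), 1800: (1.10, 1.21)},
    "Blood":  {900: (1.06, 1.54), 1800: (1.06, 2.04)},
    "Eye":    {900: (1.01, 1.90), 1800: (1.01, 2.03)},
    "Brain":  {900: (1.03, 0.77), 1800: (1.03, 1.15)},
    "Muscle": {900: (1.04, 0.82), 1800: (1.04, 1.34)},
}


def sar_per_e2(rho_1000: float, sigma: float) -> float:
    """SAR per unit squared peak field, sigma / (2 rho), in (W/kg) / (V/m)^2."""
    return sigma / (2.0 * rho_1000 * 1000.0)


if __name__ == "__main__":
    for tissue, props in TABLE.items():
        k900 = sar_per_e2(*props[900])
        k1800 = sar_per_e2(*props[1800])
        print(f"{tissue:7s} {k900:.2e} {k1800:.2e}")
```

For an equal local field, eye tissue has the largest factor and bone the smallest at both frequencies, and every tissue's factor rises from 900 to 1800 MHz because its conductivity increases.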

Comparisons of the maximum local SAR induced in the realistic human head model for the homogeneous and inhomogeneous cases at 900 and 1800 MHz are shown in Figs. 5 and 6, respectively. The local maximum SAR induced in the homogeneous models is found to be larger than that induced in the inhomogeneous model. Note that the electrical properties of the human head significantly affect the SAR induced in homogeneous or inhomogeneous head models. Correct calculation or measurement of the SAR distribution in a human head during cellular phone use has therefore become an important issue.



Fig. 5 Comparisons of maximum local SAR induced in the realistic human head model for homogeneous and inhomogeneous cases at 900 MHz

### 6. Conclusions

In this paper, the results show that the spatial peak SAR is barely affected by the size, dielectric properties, and shape of the human head for electromagnetic sources at a defined distance from the head. Compared with other factors, such as the distance of the source from the head and the design of the device, the effects caused by the complex anatomy are minor, especially for volume-averaged values. The comparison of the results obtained with inhomogeneous and homogeneous phantoms suggests that homogeneous phantoms are well suited for compliance tests of handheld cellular telecommunications equipment operating at 900 and 1800 MHz. The results of this study can also be used to investigate differences in biological effects between different people and age groups.

### Acknowledgement

The authors would like to thank the Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia (UKM), and the MOSTI Secretariat, Ministry of Science, Technology and Innovation of Malaysia (e-Science Fund 01-01-02-SF0566), for sponsoring this work.

### References

- /1/ Report of Telecommunications Technology Council for the Ministry of Posts and Telecommunications, Deliberation No. 89, "Radio-Radiation Protection Guidelines for Human Exposure to Electromagnetic Fields," Tokyo, 1997.
- /2/ K. S. Kunz and R. J. Luebbers, *The Finite Difference Time Domain Method for Electromagnetics*, Boca Raton, FL: CRC Press, 1993.
- /3/ IEEE C95.1-2005, "IEEE standard for safety levels with respect to human exposure to radio frequency electromagnetic fields, 3 kHz to 300 GHz," *Institute of Electrical and Electronics Engineers*, New York, NY, 2005.
- /4/ A. Hirata, K. Shirai, and O. Fujiwara, "On averaging mass of SAR correlating with temperature elevation due to a dipole antenna," *Progress In Electromagnetics Research*, PIER 84, pp. 221-237, 2008.
- /5/ Q. Balzano, O. Garay, and T. I. Manning, "Electromagnetic energy exposure of simulated users of portable cellular telephones," *IEEE Trans. Veh. Technol.*, vol. 44, no. 3, pp. 390-403, Aug. 1995.
- /6/ P. J. Dimbylow and S. M. Mann, "SAR calculations in an anatomically realistic model of the head for mobile communication transceivers at 900 MHz and 1.8 GHz," *Phys. Med. Biol.*, vol. 39, no. 12, pp. 1537-1553, 1994.
- /7/ O. P. Gandhi, J. Y. Chen, and D. Wu, "Electromagnetic absorption in the human head for mobile telephones at 835 MHz and 1900 MHz," *Int. Symp. Electromagn. Compat.*, Roma, 1994, pp. 1-5.
- /8/ L. Martens, J. De Moerloose, and D. De Zutter, "Calculation of the electromagnetic fields induced in the head of an operator of a cordless telephone," *Radio Sci.*, vol. 30, no. 1, pp. 283-290, Jan. 1995.
- /9/ M. A. Jensen and Y. Rahmat-Samii, "EM interaction of handset antennas and a human in personal communications," *Proc. IEEE*, vol. 83, no. 1, pp. 7-17, Jan. 1995.
- /10/ G. F. Pedersen and J. B. Andersen, "Integrated antennas for handheld telephones with low absorption," in *44th IEEE Veh. Technol. Conf.*, Stockholm, Sweden, June 1994, pp. 1537-1541.
- /11/ J. Fuhl, P. Nowak, and E. Bonek, "Improved internal antenna for hand-held terminals," *Electron. Lett.*, vol. 30, no. 22, pp. 1816-1818, 1994.
- /12/ N. Kuster and Q. Balzano, "Energy absorption mechanism by biological bodies in the near-field of dipole antennas above 300 MHz," *IEEE Trans. Veh. Technol.*, vol. 41, no. 1, pp. 17-23, Feb. 1992.
- /13/ K. H. Chan, K. M. Chow, L. C. Fung, and S. W. Leung, "Effects of using conductive materials for SAR reduction in mobile phones," *Microwave and Optical Technology Letters*, vol. 44, no. 2, pp. 140-144, Jan. 2005.
- /14/ M. Ali and S. Sanyal, "A numerical investigation of finite ground planes and reflector effects on monopole antenna factor using FDTD technique," *Journal of Electromagnetic Waves and Applications*, vol. 21, no. 10, pp. 1379-1392, 2007.
- /15/ K. Kiminami, A. Hirata, Y. Horii, and T. Shiozawa, "A study on human body modeling for the mobile terminal antenna design at 400 MHz band," *J. of Electromagnetic Waves and Appl.*, vol. 19, pp. 671-687, 2005.
- /16/ Microwave Consultants, *Dielectric Database*, Microwave Consultants Ltd., London, pp. 1-5, 1994.
- /17/ T. Schmid, O. Egger, and N. Kuster, "Automated E-field scanning system for dosimetric assessments," *IEEE Trans. Microwave Theory Tech.*, vol. 44, no. 1, pp. 105-113, Jan. 1996.
- /18/ Ae-kyoung and Jeong-ki Pack, "Effect of head size for cellular telephone exposure on EM absorption," *IEICE Trans. Commun.*, vol. E85-B, no. 3, 2002.
- /19/ M. T. Islam, M. R. I. Faruque, and N. Misran, "Design analysis of ferrite sheet attachment for SAR reduction in human head," *Progress In Electromagnetics Research*, PIER 98, pp. 191-205, 2009.
- /20/ G. Bielke and S. Meindl, "Dreidimensionale segmentierte MR-Bilddatensätze," Tech. Rep. Nr. 6564/33038, *Deutsche Klinik für Diagnostik* e. V., Forschungsvertrag DBP Telekom, 1993.
- /21/ K. H. Höhne, M. Bomans, M. Reimer, R. Schubert, U. Tiede, and W. Lierse, "A volume-based anatomical atlas," *IEEE Computer Graphics and Applications*, pp. 72-77, 1992.
- /22/ M. Okoniewski and M. Stuchly, "A study of handset antenna and human body interaction," *IEEE Trans. Microwave Theory Tech.*, vol. 44, pp. 1855-1864, Oct. 1996.
- /23/ K. Caputa, M. Okoniewski, and M. A. Stuchly, "An algorithm for computations of the power deposition in human tissue," *IEEE Antennas Propag. Mag.*, vol. 41, no. 4, pp. 102-107, Aug. 1999.
- /24/ M. Burkhardt and N. Kuster, "Appropriate modeling of the ear for compliance testing of handheld MTE with SAR safety limits at 900/1800 MHz," *IEEE Trans. Microwave Theory Tech.*, vol. 48, pp. 1927-1934, 2000.
- /25/ V. Hombach, K. Meier, M. Burkhardt, E. Kuhn, and N. Kuster, "The dependence of EM energy absorption upon human head modeling at 900 MHz," *IEEE Trans. Microwave Theory Tech.*, vol. 44, pp. 1865-1873, 1996.
- /26/ P. Bernardi, M. Cavagnaro, and S. Pisa, "Evaluation of the SAR distribution in the human head for cellular phones used in a partially closed environment," *IEEE Trans. Electromagn. Compat.*, vol. 38, pp. 357-366, 1996.
- /27/ J. T. Rowley and R. B. Waterhouse, "Performance of shorted microstrip patch antennas for mobile communications handsets at 1800 MHz," *IEEE Trans. Antennas Propagat.*, vol. 47, pp. 815-822, 1999.

Mohammad Rashed Iqbal Faruque, Norbahiah Misran Dept. of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia. rashedgen@yahoo.com, bahiah@vlsi.eng.ukm.my

Mohammad Tariqul Islam, Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia. titareq@yahoo.com

Prispelo (Arrived): 20.01.2010 Sprejeto (Accepted): 09.09.2010

# SPECIFIC ABSORPTION RATE ANALYSIS USING METAL ATTACHMENT

<sup>1</sup>Mohammad Tariqul Islam, <sup>2</sup>Mohammad Rashed Iqbal Faruque, <sup>3</sup>Norbahiah Misran

<sup>1</sup>Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia

<sup>2,3</sup>Dept. of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia.

Key words: SAR, portable telephone, metal attachment, SAR reduction

**Abstract:** Specific absorption rate (SAR) reduction with a metal attachment is analysed in this paper. The SAR reduction technique is discussed, and the effects of the attaching location, distance, and size of the metal on SAR reduction are investigated. A remarkable improvement in SAR reduction relative to the initial SAR value was achieved for the 1 g SAR. These results suggest a guideline for choosing among various types of metals to obtain the maximum SAR reducing effect for a portable telephone.

# SAR Parameter Analysis Using a Metal Shield

Key words: SAR, portable telephone, metal attachment, SAR reduction

Abstract: In this paper we analyse the effect of a metal attachment on reducing the SAR value. We investigate the influence of its position, size, and distance on SAR reduction. Particular progress was achieved for the 1 g SAR. The results serve as a guideline for choosing the types of metallic materials that most effectively reduce the SAR when a portable telephone is used.

# 1. Introduction

Mobile phone safety and human exposure to electromagnetic (EM) radiation, together with the pertinent health effects, are a matter of growing public concern, and the issue is under continuous scientific investigation. Various studies on this subject exist /1-3/, most of which investigate the consequences of mobile phone usage. However, devices and communication terminals operating in other frequency bands have also attracted substantial interest in the last 15 years. In /4/, a ferrite sheet was used as protection between the antenna and the human head, and a reduction of over 13% in the spatial peak SAR averaged over 1 g was achieved. A study on the effects of attaching a ferrite sheet for SAR reduction was presented in /5/, which concluded that the position of the shielding plays an important role in the reduction effectiveness.

In /3/, an effective approach for reducing the SAR in the human head is to integrate a planar antenna onto the back side (away from the head) of the phone model, but this introduces additional design difficulties, especially in achieving the required bandwidth and radiation efficiency. Another approach is the use of a directional or reflecting antenna /6-8/; such an antenna structure sacrifices the ability to receive signals from all directions. The SAR reduction achieved by ferrite sheet attachment was attributed to the suppression of surface currents on the front side of the phone model /9/. However, the relationship between the maximum SAR reducing effect and parameters such as the attaching location, size, and material properties of the ferrite sheet remains unknown.

In /10/, a perfect electric conductor (PEC) reflector was placed between a human head and the driver of a folded loop antenna; the results showed that the radiation efficiency can be enhanced and the peak SAR value reduced. In /11-12/, a study on the effects of attaching conductive materials to a cellular phone for SAR reduction was presented, showing that the position of the shielding material is an important factor for SAR reduction effectiveness. Because the possibility of the spatial peak SAR exceeding the recommended exposure limit cannot be completely ruled out, an effort to reduce the spatial peak SAR at the design stage of the ferrite sheet is necessary.

# 2. Simulation model

The simulation model includes a handset with a PIFA-type antenna and the SAM phantom head provided by CST Microwave Studio<sup>®</sup> (CST MWS). A complete handset model composed of the circuit board, LCD display, keypad, battery, and housing was used for the simulation. The relative permittivity and conductivity of the individual components were set to comply with industrial standards. In addition, the definitions in /3/, /6-7/ were adopted for the material parameters of the SAM phantom head. To accurately characterize the performance over a broad frequency range, dispersive models were adopted for all dielectrics during the simulation /6/. The electrical properties of the materials used for simulation are listed in Table 1. A PIFA-type antenna constructed in a helical sense and operating at 900 MHz for GSM applications was used in the simulation model. Obtaining a high-quality geometry approximation for such a helical structure with the conventional meshing scheme of the FDTD method usually requires a large number of hexahedrons, which in turn makes it extremely challenging to obtain convergent results within a reasonable simulation time.

Table 1: Electrical properties of materials used for simulation

| Phone Materials  | $\mathbf{\epsilon}_r$ | σ(S/m) |
|------------------|-----------------------|--------|
| Circuit Board    | 4.4                   | 0.05   |
| Housing Plastic  | 2.5                   | 0.005  |
| LCD Display      | 3.0                   | 0.02   |
| Rubber           | 2.5                   | 0.005  |
| SAM Phantom Head |                       |        |
| Shell            | 3.7                   | 0.0016 |
| Liquid @ 900MHz  | 40                    | 1.42   |

### 3. Numerical techniques

CST Microwave Studio, based on the finite integration technique (FIT) in the time domain, was used as the main simulation tool. A non-uniform meshing scheme was adopted so that most of the computational effort was dedicated to the regions along the inhomogeneous boundaries, giving a fast and accurate analysis. The minimum and maximum mesh sizes were 0.3 mm and 1.0 mm, respectively. A total of 2,097,152 mesh cells were generated for the complete model, and each run took 1163 seconds (including mesh generation) on an Intel Core<sup>™</sup> 2 Duo E8400 3.0 GHz CPU with 4 GB of system RAM.
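A quick way to judge whether mesh sizes of 0.3 mm to 1.0 mm are fine enough is to count cells per wavelength in each material. The sketch below does this at 900 MHz for free space and for the SAM liquid of Table 1 (ε<sub>r</sub> = 40). It uses the lossless wavelength λ = c/(f√ε<sub>r</sub>) and ignores the liquid's conductivity, which shortens the wavelength somewhat, so the counts are optimistic upper estimates. The function name is hypothetical, and the often-quoted target of at least 10 to 20 cells per wavelength is a rule of thumb rather than a requirement stated in this paper.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def cells_per_wavelength(f_hz: float, eps_r: float, cell_m: float) -> float:
    """Mesh cells per wavelength in a lossless medium with permittivity eps_r."""
    wavelength_m = C0 / (f_hz * math.sqrt(eps_r))
    return wavelength_m / cell_m


if __name__ == "__main__":
    for name, eps_r in (("free space", 1.0), ("SAM liquid", 40.0)):
        for cell_mm in (0.3, 1.0):   # minimum and maximum mesh sizes from the text
            n = cells_per_wavelength(900e6, eps_r, cell_mm * 1e-3)
            print(f"{name:10s} {cell_mm} mm: {n:.1f} cells per wavelength")
```

Even the coarsest 1.0 mm cells give roughly 50 cells per wavelength inside the liquid, so the fine 0.3 mm cells are driven by geometry (the helical antenna), not by wavelength resolution.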

The analysis workflow started with the design of the antenna with the complete handset model in free space. The antenna was designed so that S11 was below -10 dB over the frequency band of interest. The SAM phantom head was then included for SAR calculation using the standard definition /4/:

$$SAR = \frac{\sigma}{2\rho} E^2$$

where *E* is the peak amplitude of the induced electric field (V/m),  $\rho$  is the density of the tissue (kg/m<sup>3</sup>), and  $\sigma$  is the conductivity of the tissue (S/m). The resulting SAR values averaged over 1 g and 10 g of head tissue are denoted SAR 1 g and SAR 10 g, respectively. These values were used as benchmarks to appraise the effectiveness of peak SAR reduction.
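The definition above gives a point (voxel) SAR; the 1 g and 10 g values are averages of that quantity over a tissue mass. The sketch below computes point SAR on a voxel grid and then the maximum average over axis-aligned cubes of a given mass. It is deliberately simplified: uniform density, cube side rounded to a whole number of voxels, and no special handling of air or phantom surfaces, all of which are treated more carefully in standardized averaging procedures. The SAM liquid density is not given here, so 1000 kg/m³ is an assumed value, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_sar(e_peak, sigma, rho):
    """Point SAR = sigma * |E|^2 / (2 rho) for peak-amplitude fields (W/kg)."""
    return sigma * np.abs(e_peak) ** 2 / (2.0 * rho)


def peak_cube_sar(sar, voxel_m, mass_kg, rho):
    """Maximum SAR averaged over axis-aligned cubes of the given tissue mass.

    Simplified: uniform density, cube side rounded to whole voxels,
    and no special treatment of air or phantom boundaries.
    """
    side_m = (mass_kg / rho) ** (1.0 / 3.0)
    k = max(1, int(round(side_m / voxel_m)))
    windows = sliding_window_view(sar, (k, k, k))
    return windows.mean(axis=(-3, -2, -1)).max()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = 100.0 * rng.random((30, 30, 30))                  # synthetic |E| in V/m
    sar = local_sar(e, sigma=1.42, rho=1000.0)            # SAM liquid sigma at 900 MHz
    print(peak_cube_sar(sar, 1e-3, 1e-3, 1000.0))         # "SAR 1 g", 1 mm voxels
    print(peak_cube_sar(sar, 1e-3, 10e-3, 1000.0))        # "SAR 10 g"
```

Because averaging over the larger 10 g cube smooths hot spots more strongly, SAR 10 g is normally lower than SAR 1 g for the same field distribution.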

### 4. Results and Discussion

In this section, a metal sheet is used to reduce the EM absorption in the human body. A compact metal sheet is considered in this research, and numerical calculations were performed to investigate the possibility of using a small metal sheet to reduce the EM absorption. The metal sheet was modeled as an infinitely thin perfect conductor with the same dimensions as the physical sheet. The efficiency  $\xi$  was again used to evaluate the reduction of EM absorption. The calculated efficiencies  $\xi$  for Category-1, Category-2, and Category-3 are shown in Table-1, Table-2, and Table-3, respectively. Category-3 has the largest sheet of the three, so the largest reflection effect is expected. Note, however, that the values of  $\xi$  are negative at both 900 MHz and 1800 MHz; at 1800 MHz in particular, the power absorbed by the head increases by 15.8%. Tables 1 and 2 likewise show that, for Category-1 and Category-2, a small metal sheet cannot reduce the SAR in the human head.
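The efficiency ξ is not defined in this section. A definition consistent with the statement that ξ = -15.8% corresponds to a 15.8% increase in absorbed power is the percentage reduction of a quantity relative to the case without the metal sheet; the sketch below assumes that definition. The function name `efficiency` and the sample inputs are hypothetical.

```python
def efficiency(value_without: float, value_with: float) -> float:
    """Assumed percentage reduction xi = (without - with) / without * 100.

    A negative xi means the quantity (SAR or absorbed power) increased
    when the metal sheet was attached.
    """
    return (value_without - value_with) / value_without * 100.0


if __name__ == "__main__":
    # Hypothetical absorbed powers (arbitrary units): 1.0 without the sheet,
    # 1.158 with it, i.e. a 15.8 % increase.
    print(round(efficiency(1.0, 1.158), 1))
```

Under this definition, a successful shield would show positive ξ values, so every negative entry in the tables below marks a configuration that makes the exposure worse.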

Table 1: Effect of metal sheet on SAR reduction (Category-1: 3 × 4 cm metal sheet positioned at the top of the mobile phone)

| ξ [%]                    | 900 MHz | 1800 MHz |
|--------------------------|---------|----------|
| Peak SAR gm for head     | -4.8    | -13.2    |
| Peak SAR 1gm for brain   | -5.3    | -14.3    |
| Average SAR for eyeball  | -2.13   | -0.62    |
| Average SAR for head     | -2.21   | -6.9     |
| P <sub>abs</sub> by head | -2.21   | -6.9     |

Table 2: Effect of metal sheet on SAR reduction (Category-2: 3 × 4 cm metal sheet positioned at the edge of the mobile phone)

| ξ [%]                    | 900 MHz | 1800 MHz |
|--------------------------|---------|----------|
| Peak SAR gm for head     | -4.9    | -13.4    |
| Peak SAR 1gm for brain   | -5.1    | -14.9    |
| Average SAR for eyeball  | -3.34   | -0.6     |
| Average SAR for head     | -2.71   | -6.8     |
| P <sub>abs</sub> by head | -2.71   | -6.8     |

This is due to the strong EM field induced near the edges of the metal sheet, which is small compared with the head. The head is exposed to this strong field and therefore absorbs more EM energy. However, for a metal sheet, no strong EM field is induced in its vicinity because of its ability to absorb EM energy and transform it into heat.

Table 3: Effect of metal sheet on SAR reduction (Category-3: 6 × 4 cm metal sheet at the top of the mobile phone)

| ξ [%]                    | 900 MHz | 1800 MHz |
|--------------------------|---------|----------|
| Peak SAR gm for head     | -8.7    | -24.3    |
| Peak SAR 1gm for brain   | -10.2   | -23.8    |
| Average SAR for eyeball  | -7.4    | -0.98    |
| Average SAR for head     | -3.6    | -15.8    |
| P <sub>abs</sub> by head | -3.6    | -15.8    |

## 5. Conclusions

This paper has discussed the EM interaction between an antenna and the human head in the presence of a metal sheet. Using metal in the phone model reduces the SAR 10 g and SAR 1 g values. Based on the 3-D FDTD method with a lossy Drude model, it is found that in both cases the peak SAR 1 g and SAR 10 g of the head can be reduced by placing metal between the antenna and the human head.

# Acknowledgement

The authors would like to thank the Institute of Space Science (ANGKASA), Universiti Kebangsaan Malaysia (UKM), and the MOSTI Secretariat, Ministry of Science, Technology and Innovation of Malaysia (e-Science Fund: 01-01-02-SF0566), for sponsoring this work.

### References

- /1/ IEEE C95.1-2005, "IEEE Standard for Safety Levels with Respect to Human Exposure to Radio Frequency Electromagnetic Fields, 3 kHz to 300 GHz," *Institute of Electrical and Electronics Engineers, Inc.*, New York, NY, 2005.
- /2/ G. F. Pedersen and J. B. Andersen, "Integrated antennas for handheld telephones with low absorption," *Proc. 44th IEEE Veh. Technol. Conf.*, Stockholm, Sweden, Jun. 1994, pp. 1537-1541.
- /3/ C. M. Kuo and C. W. Kuo, "SAR distribution and temperature increase in the human head for mobile communication," in *IEEE APS Int. Symp. Dig.*, Columbus, OH, 2003, pp. 1025-1028.
- /4/ J. Wang and O. Fujiwara, "FDTD computation of temperature rise in the human head for portable telephones," *IEEE Trans. Microwave Theory Tech.*, vol. 47, no. 8, pp. 1528-1534, Aug. 1999.
- /5/ J. B. Pendry, A. J. Holden, D. J. Robbins, and W. J. Stewart, "Magnetism from conductors and enhanced nonlinear phenomena," *IEEE Trans. Microwave Theory Tech.*, vol. 47, no. 11, pp. 2075-2084, Nov. 1999.
- /6/ A. Hirata, T. Adachi, and T. Shiozawa, "Folded loop antenna with a reflector for mobile handsets at 2.0 GHz," *Microwave Opt. Technol. Lett.*, vol. 40, no. 4, pp. 272-275, Feb. 2004.
- /7/ K. H. Chan, K. M. Chow, L. C. Fung, and S. W. Leung, "Effects of using conductive materials for SAR reduction in mobile phones," *Microwave Opt. Technol. Lett.*, vol. 44, no. 2, pp. 140-144, Jan. 2005.
- /8/ K. Kiminami, T. Iyama, T. Onishi, and S. Uebayashi, "Novel specific absorption rate (SAR) estimation method based on 2-D scanned electric fields," *IEEE Trans. Electromagn. Compat.*, vol. 50, no. 4, Nov. 2008.
- /5/ L. C. Fung, S. W. Leung, and K. H. Chan, "An investigation of the SAR reduction methods in mobile phone application," *IEEE Int. Symp. EMC*, vol. 2, pp. 656-660, Aug. 2002.
- /7/ S. Curto, P. McEvoy, X. L. Bao, and M. J. Ammann, "Compact patch antenna for electromagnetic interaction with human tissue at 434 MHz," *IEEE Trans. Antennas Propag.*, vol. 57, no. 9, Sep. 2009.
- /8/ J. Wang and O. Fujiwara, "Reduction of electromagnetic absorption in the human head for portable telephones by a ferrite sheet attachment," *IEICE Trans. Commun.*, vol. E80-B, no. 12, pp. 1810-1815, Dec. 1997.
- /9/ R. Y. S. Tay, Q. Balzano, and N. Kuster, "Dipole configuration with strongly improved radiation efficiency for hand-held transceivers," *IEEE Trans. Antennas Propag.*, vol. 46, no. 6, pp. 798-806, Jun. 1998.
- /10/ C. H. Li, N. Chavannes, and N. Kuster, "Effects of hand phantom on mobile phone antenna performance," *IEEE Trans. Antennas Propag.*, vol. 57, no. 9, Sep. 2009.
- /11/ L. C. Fung, S. W. Leung, and K. H. Chan, "Experimental study of SAR reduction on commercial products and shielding materials in mobile phone applications," *Microwave Opt. Technol. Lett.*, vol. 36, no. 6, pp. 419-422, Mar. 2003.
- /12/ R. G. Vaughan and N. L. Scott, "Evaluation of antenna configurations for reduced power absorption in the head," *IEEE Trans. Veh. Technol.*, vol. 48, no. 5, Sep. 1999.

Mohammad Tariqul Islam Institute of Space Science (ANGKASA),Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia titareq@yahoo.com

Mohammad Rashed Iqbal Faruque, Norbahiah Misran Dept. of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, 43600 UKM, Bangi, Selangor, Malaysia. rashedgen@yahoo.com, bahiah@vlsi.eng.ukm.my

Arrived: 02.03.2009 Accepted: 09.09.2010

# Comparison Between Dubois and Shi Empirical Models Used for Soil Moisture Estimation from TerraSAR-X Data

Matej Kseneman, Dušan Gleich

## Faculty of Electrical Engineering and Computer Science, Maribor, Slovenia

Key words: TerraSAR-X, soil moisture estimation, empirical models, self-organizing maps, MBD, GMRF, CUDA

Abstract: Predicting natural disasters such as fires, floods and earthquakes is very important for the Earth's environment. Our goal is to assess the quality of the empirical models used to retrieve soil moisture parameters. In this paper we compare the performance of the Dubois and Shi empirical models over the river Drava near the city of Maribor, Slovenia, more precisely in the area of the artificial Zlatoličje canal. The models are tested on images from TerraSAR-X, a German satellite operating at X-band radar frequencies. To compare and validate the results, we carried out field measurements with a Pico64 sensor at the time of image acquisition. We conclude that the Shi empirical model performs better, both in the accuracy of the estimated soil moisture values and in the stability of the algorithm.


# 1 Introduction

Humanity strives to predict natural disasters such as fires, earthquakes and floods. Soil moisture is therefore a key parameter in many fields of environmental study, including hydrology, meteorology and agriculture. Despite its importance, soil moisture has not been widely used in modelling hydrological and biogeochemical processes and the associated ecosystem dynamics. The reason is that it is difficult to measure over large areas, at low cost and at short time intervals. Recent advances in microwave remote sensing, however, show potential for the quantitative measurement of soil moisture over bare or sparsely vegetated areas /1/. This opens the possibility of developing algorithms that retrieve this important parameter from microwave remote-sensing measurements.

Our goal is to estimate soil moisture from SAR radar images. We have radar images of the area around the artificial canal of the Zlatoličje power plant, acquired by the TerraSAR-X satellite /2/. The satellite was launched on 15 June 2007 and uses synthetic aperture radar (SAR). It operates independently of weather and daylight and can acquire images with a resolution of one metre. It has three imaging modes: SpotLight, StripMap and ScanSAR. In SpotLight mode, the radar beam is steered to the same area throughout the platform's pass, which gives the highest possible resolution. StripMap mode is a compromise between resolution and the size of the imaged area. ScanSAR mode is intended for imaging very large areas (100 000 km<sup>2</sup>), but its resolution is only 16 m.

# 2 The Soil Moisture Estimation Problem

Many studies have found that active microwave data depend on many parameters of the natural surface, such as the dielectric constant /3/ and surface roughness. The dielectric constant depends strongly on soil moisture. The relative dielectric constant of dry soil typically lies between 2 and 3, whereas that of water is about 80 /4/. Because of this large difference, soil moisture can in principle be estimated by remote sensing. The estimate, however, depends on many factors, such as molecular orientation and the type and physical state of the soil. Estimating soil moisture from active microwave data has recently become very popular. Spaceborne SAR platforms combine high resolution with global coverage, so they can offer a unique view of spatial and temporal changes in soil moisture at relatively high resolution. A large number of well-calibrated SAR systems (AIRSAR, E-SAR, ERS-1, JERS-1, SIR-C and TerraSAR-X) have made SAR data available for the quantitative retrieval of soil moisture.

For soil moisture estimation we use only the empirical Dubois /5/ and Shi /6/ models. Both build on a theoretical model, namely the integral equation model (IEM) /7/. The two empirical models were extended and modified according to physical findings. They were formulated by regression analysis of data acquired with the LCX POLARSCAT and RASAM /13/ systems, as well as of simulated data derived from ground measurements with truck-mounted scatterometers. Compared with theoretical models, empirical models have a wider range of applicability and, above all, less complex algorithms.

The dielectric constant also depends strongly on the frequency at which the medium is observed. The frequency dependence of the dielectric constant of water is given by the Debye equation

$$\varepsilon_{w} = \varepsilon_{w\infty} + \frac{\varepsilon_{w0} - \varepsilon_{w\infty}}{1 + j2\pi f \tau_{w}}$$
(1)

where $\varepsilon_{w0}$ is the static dielectric constant of water and $\varepsilon_{w\infty}$ is the high-frequency (or optical) limit of $\varepsilon_{w}$. Both parameters are dimensionless. $\tau_{w}$ is the relaxation time of water in seconds (s), and *f* is the frequency of the electromagnetic (EM) wave in Hz /8/.
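The function below is a sketch of equation (1) that evaluates the Debye permittivity of pure water. The parameter values ($\varepsilon_{w0} \approx 80.1$, $\varepsilon_{w\infty} \approx 4.9$, $\tau_w \approx 9.3$ ps) are assumed typical values for water near 20 °C and are not given in this paper. With these values, the real part of $\varepsilon_w$ at the TerraSAR-X frequency is already well below its static value.

```python
import math


def debye_water(f_hz, eps_w0=80.1, eps_winf=4.9, tau_w=9.3e-12):
    """Complex relative permittivity of pure water from the Debye equation (1).

    The default parameters are assumed textbook values near 20 degrees C,
    not values taken from this paper.
    """
    return eps_winf + (eps_w0 - eps_winf) / (1 + 1j * 2 * math.pi * f_hz * tau_w)


eps = debye_water(9.65e9)   # TerraSAR-X centre frequency
print(eps)                  # roughly 62 - 32j with the assumed parameters
```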

### 2.1 Backscattering Models of Rough Surfaces

The Earth's surface is mostly rough, and its roughness is crucial for the reflection and scattering of electromagnetic waves. Surface roughness mainly affects the scattering of EM waves. For reflection, the incidence angle plays the key role, much like the long shadows cast at sunset.

Theoretical models predict the trend of radar backscatter in response to changes in surface roughness or soil moisture. Examples are the small perturbation model (SPM), the physical optics model and the geometrical optics model. These models are, however, rarely used for inverse problems, i.e. for retrieving parameters of natural surfaces. The main reason is the restrictive assumptions made when deriving them, which are needed to keep them simple. The integral equation model (IEM) /7/, introduced more recently, is considerably more complex than the models above but applies to a wider range of surfaces. For this reason, most of the application literature relies on empirical models. These are simplified analytical models that are valid only for a particular wavelength and range of conditions.

### 2.1.1 The Dubois Model and Its Inversion

The Dubois model /5/ is an empirical model introduced in 1995. It is a simplification of the Oh model /5, 9/, since it uses only the co-polarized components (horizontal and vertical polarizations).

This empirical model was derived from the POLARSCAT and RASAM /13/ data sets. It describes only the co-polarized backscattering coefficients of bare surfaces, as a function of surface roughness, dielectric constant, incidence angle and frequency. The dielectric constant is the parameter that is sensitive to volumetric soil moisture. The *hh*- and *vv*-polarized backscattering coefficients $\sigma^0_{hh}$ and $\sigma^0_{vv}$ were determined empirically and follow these relations:

$$\sigma_{hh}^{0} = 10^{-2.75} \, \frac{\cos^{1.5} \theta}{\sin^5 \theta} \, 10^{0.028 \varepsilon \, \tan \theta} \, \left( ks \cdot \sin \theta \, \right)^{1.4} \, \lambda^{0.7} \tag{2}$$

$$\sigma_{vv}^{0} = 10^{-2.35}\,\frac{\cos^{3}\theta}{\sin^{3}\theta}\,10^{0.046\,\varepsilon\tan\theta}\,(ks\sin\theta)^{1.1}\,\lambda^{0.7} \tag{3}$$

where $\theta$ is the incidence angle, $\varepsilon$ is the real part of the dielectric constant, *s* is the RMS surface height, *k* is the wavenumber and $\lambda$ is the wavelength in cm. The two relations are valid for frequencies between 1.5 and 11 GHz (TerraSAR-X operates at 9.65 GHz). They also require a surface roughness of 0.3 to 3 cm RMS height, which is the range covered by the training data, and incidence angles between 30° and 65°.
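Relations (2) and (3) can be evaluated directly. The sketch below does so for linear (not dB) backscatter. The TerraSAR-X wavelength is derived from the 9.65 GHz frequency quoted above (about 3.1 cm). The example values $\varepsilon = 15$, $ks = 1$ and $\theta = 49°$ are illustrative assumptions chosen inside the stated validity range.

```python
import math

C_LIGHT = 299_792_458.0                   # speed of light in m/s
LAMBDA_CM = C_LIGHT / 9.65e9 * 100.0      # TerraSAR-X wavelength in cm (~3.1)


def dubois_forward(eps, ks, theta_deg, wavelength_cm=LAMBDA_CM):
    """Linear sigma0_hh and sigma0_vv from the Dubois relations (2) and (3).

    eps: real part of the soil dielectric constant, ks: normalised RMS height,
    theta_deg: local incidence angle in degrees, wavelength_cm: wavelength in cm.
    """
    th = math.radians(theta_deg)
    c, s, t = math.cos(th), math.sin(th), math.tan(th)
    x = ks * s                            # roughness factor ks*sin(theta)
    s_hh = (10 ** -2.75 * c ** 1.5 / s ** 5 * 10 ** (0.028 * eps * t)
            * x ** 1.4 * wavelength_cm ** 0.7)
    s_vv = (10 ** -2.35 * c ** 3 / s ** 3 * 10 ** (0.046 * eps * t)
            * x ** 1.1 * wavelength_cm ** 0.7)
    return s_hh, s_vv


s_hh, s_vv = dubois_forward(eps=15.0, ks=1.0, theta_deg=49.0)
print(10 * math.log10(s_hh), 10 * math.log10(s_vv))   # both in dB
```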

The general backscattering behaviour with roughness described by equations (2) and (3) is very similar to that predicted by the small perturbation model and the physical optics model. The backscatter decreases with increasing local incidence angle and with decreasing surface roughness. In both the empirical model and the SPM, the RMS surface height enters through the factor $ks \cdot \sin\theta$. This factor is the dimensionless projection of roughness onto the plane of the incident wave. Its exponent in the empirical model (1.4 for $\sigma^0_{hh}$ and 1.1 for $\sigma^0_{vv}$) is still reasonably close to 1.

The model uses only two polarized channels (vertical and horizontal) because these two co-polarized channels are less sensitive to system noise and cross-talk. Their calibration is also simpler and tends to be more accurate. Newer algorithms in this field offer greater robustness and efficiency even in the presence of vegetation. A particular advantage of this model arises in applications where only two polarized channels can be acquired, because such data cannot be used with the Oh model /9/.

The two empirical relations are often expressed as the ratio $\sigma^0_{hh}/\sigma^0_{vv}$. This ratio depends on surface roughness and increases mainly with increasing roughness, following log(*ks* sin $\theta$).
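Dividing (2) by (3) shows where the roughness dependence of the ratio comes from. The wavelength term cancels:

$$\frac{\sigma^0_{hh}}{\sigma^0_{vv}} = \frac{10^{-0.4}}{\cos^{1.5}\theta\,\sin^{2}\theta}\,10^{-0.018\,\varepsilon\tan\theta}\,(ks\sin\theta)^{0.3}$$

In decibels this becomes

$$10\log_{10}\frac{\sigma^0_{hh}}{\sigma^0_{vv}} = -4 - 15\log_{10}\cos\theta - 20\log_{10}\sin\theta - 0.18\,\varepsilon\tan\theta + 3\log_{10}(ks\sin\theta)$$

The ratio therefore grows by 3 dB per decade of $ks\sin\theta$. The same expression also shows that the ratio decreases with $\varepsilon$ through the term $-0.18\,\varepsilon\tan\theta$.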

Inverting the empirical Dubois model /5/ is a simplified version of the Oh model inversion /9/. Both unknowns can be computed directly from the two co-polarized backscattering coefficients and the local incidence angle, using the following equations.

The first step computes the dielectric constant (log denotes the base-10 logarithm):

$$A_{hh} = 1.5\log(\cos\theta) - 5\log(\sin\theta) + 0.7\log(\lambda) - 2.75 \tag{4}$$

$$A_{vv} = 3\log(\cos\theta) - 3\log(\sin\theta) + 0.7\log(\lambda) - 2.35 \tag{5}$$

$$B_{hh} = 0.028,\quad B_{vv} = 0.046,\quad C_{hh} = 1.4,\quad C_{vv} = 1.1 \tag{6}$$

$$\Sigma_{hh} = \log\left(\sigma_{hh}^{0}\right),\quad \Sigma_{vv} = \log\left(\sigma_{vv}^{0}\right) \tag{7}$$

$$\varepsilon = \frac{C_{vv}\left(\Sigma_{hh} - A_{hh}\right) - C_{hh}\left(\Sigma_{vv} - A_{vv}\right)}{\tan\theta\left(B_{hh}C_{vv} - B_{vv}C_{hh}\right)} \tag{8}$$

In the next step, the dielectric constant just estimated is used to estimate the surface roughness:

$$ks = \left(\sigma_{hh}^{0}\right)^{1/1.4}\,10^{2.75/1.4}\,\frac{\sin^{2.57}\theta}{\cos^{1.07}\theta}\,10^{-0.02\,\varepsilon\tan\theta}\,\lambda^{-0.5} \tag{9}$$

Later experiments showed that the algorithm is fairly robust over vegetated areas, at least at low frequencies. The ratio $\sigma^0_{hh}/\sigma^0_{vv}$ also helps to identify vegetated areas. A ratio $\sigma^0_{hh}/\sigma^0_{vv} > -11$ dB indicates a strong presence of vegetation, which this algorithm tries to avoid. The inverse algorithm is therefore applied only where $\sigma^0_{hh}/\sigma^0_{vv} < 1$ and $\sigma^0_{hh}/\sigma^0_{vv} < -11$ dB.

### 2.1.2 The Shi Model and Its Inversion Algorithm

The Shi model is based on the single-scattering IEM model, which includes the effect of the surface roughness spectrum. The number of independent SAR measurements is limited. The model was therefore developed from numerical IEM simulations covering a wide range of surface roughness and $m_v$ conditions. The algorithm was also tested for the accuracy of the retrieved $m_v$ and surface roughness against L-band AIRSAR and SIR-C measurements /6/ over bare or sparsely vegetated surfaces. This approach differs from classical empirical approaches in that no previously measured data are used to develop the algorithm.

For bare surfaces, the IEM estimates values of $\sigma^0_{hh}$ and $\sigma^0_{vv}$ that agree well with AIRSAR and SIR-C measurements. The model and its dependence on $m_v$ and the other parameters nevertheless remain very complex. This makes the model difficult to apply to real SAR data for estimating surface parameters, so the single-scattering IEM must be simplified further. The simplification yields a practical inversion algorithm that can process large amounts of SAR data.

It has been shown /7/ that when *s* is small, the integral equation model can be separated into two functions. The first is a function $\alpha_{pp}$ (the polarization amplitude) that depends only on $\varepsilon_s$ and $\theta$, and the second depends only on *ks*. The polarization amplitudes are given by:

$$\alpha_{hh} = \frac{(\varepsilon_s - 1)}{\left(\cos\theta + \sqrt{\varepsilon_s - \sin^2\theta}\right)^2}$$
(10)

$$\alpha_{vv} = \frac{(\varepsilon_s - 1)\left(\sin^2\theta - \varepsilon_s(1 + \sin^2\theta)\right)}{\left(\varepsilon_s\cos\theta + \sqrt{\varepsilon_s - \sin^2\theta}\right)^2} \tag{11}$$
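Equations (10) and (11) are straightforward to evaluate. The sketch below assumes a real $\varepsilon_s$, as in the models above. It uses $\varepsilon_s\cos\theta$ in the *vv* denominator, which is the standard small-perturbation form. With that form, $|\alpha_{vv}|$ and $|\alpha_{hh}|$ coincide at normal incidence, as symmetry requires, which gives a convenient sanity check.

```python
import math


def alpha_hh(eps, theta_rad):
    """hh polarization amplitude, eq. (10); eps is a real dielectric constant."""
    c, s2 = math.cos(theta_rad), math.sin(theta_rad) ** 2
    return (eps - 1) / (c + math.sqrt(eps - s2)) ** 2


def alpha_vv(eps, theta_rad):
    """vv polarization amplitude in the small-perturbation form,
    with eps*cos(theta) in the denominator."""
    c, s2 = math.cos(theta_rad), math.sin(theta_rad) ** 2
    return ((eps - 1) * (s2 - eps * (1 + s2))
            / (eps * c + math.sqrt(eps - s2)) ** 2)


th = math.radians(49.0)
print(alpha_hh(15.0, th), alpha_vv(15.0, th))   # |alpha_vv| > |alpha_hh| off nadir
```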

$\sigma^0_{pp}$ can therefore be expressed as the product of a function of the dielectric properties and a function of the surface roughness. The model uses two co-polarized channels, so the equation depends only on the incidence angle and the soil dielectric constant. The general inversion equation for the dielectric constant $\varepsilon_s$ can be written as:

$$10\log_{10}\left[\frac{\left|\alpha_{pp}\right|^{2}}{\sigma_{pp}^{0}}\right] = a_{pq}(\theta) + b_{pq}(\theta)\,10\log_{10}\left[\frac{\left|\alpha_{qq}\right|^{2}}{\sigma_{qq}^{0}}\right] \tag{12}$$

For maximum sensitivity to moisture changes, the best choice turned out to be a pair of co-polarized channels. This removes the sensitivity to calibration accuracy and to vegetation effects, which matters because we aim to apply the algorithm to sparsely vegetated areas. Under these requirements, the literature showed that the best pair of co-polarized measurements is $\sqrt{\sigma^0_{vv}\sigma^0_{hh}}$ and $\sigma^0_{vv} + \sigma^0_{hh}$, which fit the data best when estimating $\varepsilon_s$. We replace $\sigma^0_{pp}$ and $\sigma^0_{qq}$ with $\sigma^0_{vv} + \sigma^0_{hh}$ and $\sqrt{\sigma^0_{vv}\sigma^0_{hh}}$. We also replace $|\alpha_{pp}|^2$ and $|\alpha_{qq}|^2$ with $|\alpha_{vv}|^2 + |\alpha_{hh}|^2$ and $|\alpha_{vv}||\alpha_{hh}|$. With these substitutions, equation (12) becomes

$$10\log_{10}\left[\frac{\left|\alpha_{vv}\right|^{2}+\left|\alpha_{hh}\right|^{2}}{\sigma_{vv}^{0}+\sigma_{hh}^{0}}\right]=a_{vh}\left(\theta\right)+b_{vh}\left(\theta\right)10\log_{10}\left[\frac{\left|\alpha_{vv}\right|\left|\alpha_{hh}\right|}{\sqrt{\sigma_{vv}^{0}\sigma_{hh}^{0}}}\right]$$
(13)

Equation (13) does not depend on surface roughness. It depends only on $\alpha_{vv}$ and $\alpha_{hh}$, which are functions of the dielectric constant and the incidence angle. Soil moisture can therefore be estimated from two co-polarized measurements. Using $\varepsilon_s$ from equation (13), the surface roughness parameter can then be derived.

All coefficients used in equations (12) and (13) depend only on the incidence angle and were obtained by regression analysis. The coefficients of the model that relates $\sigma^0_{vv} + \sigma^0_{hh}$ and $\sqrt{\sigma^0_{vv}\sigma^0_{hh}}$ to $\alpha_{hh}$ and $\alpha_{vv}$ are given by

$$a_{vh}(\theta) = \exp\left(-12.37 + 37.206\sin\theta - 41.187\sin^2\theta + 18.898\sin^3\theta\right) \tag{14}$$

$$b_{vh}(\theta) = 0.649 + 0.659\cos\theta - 0.306\cos^2\theta \tag{15}$$
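Equation (13) has no closed-form solution for $\varepsilon_s$, so it must be solved numerically. The text states that a numerical solution based on the discriminant was used. The sketch below instead uses plain bisection on the residual of (13). It assumes linear $\sigma^0$ values, a real $\varepsilon_s$ and a hypothetical search interval [2, 80]. It reads the linear coefficient of $a_{vh}$ in (14) as 37.206 and uses the $\varepsilon_s\cos\theta$ denominator for $\alpha_{vv}$. The synthetic check builds $\sigma^0_{pp} = |\alpha_{pp}|^2 R$ with a common roughness factor *R*, chosen so that (13) holds exactly for $\varepsilon_s = 15$. Near $\varepsilon_s = 15$ the residual changes slowly with $\varepsilon_s$, so a bracketing method such as bisection is a safe choice.

```python
import math


def alpha_hh(eps, th):
    """hh polarization amplitude, eq. (10)."""
    c, s2 = math.cos(th), math.sin(th) ** 2
    return (eps - 1) / (c + math.sqrt(eps - s2)) ** 2


def alpha_vv(eps, th):
    """vv polarization amplitude (small-perturbation form)."""
    c, s2 = math.cos(th), math.sin(th) ** 2
    return (eps - 1) * (s2 - eps * (1 + s2)) / (eps * c + math.sqrt(eps - s2)) ** 2


def shi_coeffs(th):
    """Regression coefficients a_vh and b_vh, eqs. (14) and (15)."""
    s, c = math.sin(th), math.cos(th)
    a = math.exp(-12.37 + 37.206 * s - 41.187 * s ** 2 + 18.898 * s ** 3)
    b = 0.649 + 0.659 * c - 0.306 * c ** 2
    return a, b


def shi_residual(eps, s_vv, s_hh, th):
    """Left-hand side minus right-hand side of eq. (13), in dB."""
    a, b = shi_coeffs(th)
    av, ah = abs(alpha_vv(eps, th)), abs(alpha_hh(eps, th))
    lhs = 10 * math.log10((av ** 2 + ah ** 2) / (s_vv + s_hh))
    rhs = a + b * 10 * math.log10(av * ah / math.sqrt(s_vv * s_hh))
    return lhs - rhs


def shi_invert(s_vv, s_hh, theta_deg, lo=2.0, hi=80.0, tol=1e-6):
    """Bisection for eps_s in [lo, hi]; returns nan if (13) has no sign change there."""
    th = math.radians(theta_deg)
    f_lo = shi_residual(lo, s_vv, s_hh, th)
    if f_lo * shi_residual(hi, s_vv, s_hh, th) > 0:
        return float("nan")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = shi_residual(mid, s_vv, s_hh, th)
        if f_lo * f_mid <= 0:
            hi = mid                      # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid         # root lies in [mid, hi]
    return 0.5 * (lo + hi)


# Synthetic check (assumed inputs): sigma0 values that satisfy (13) for eps_s = 15.
th = math.radians(49.0)
a, b = shi_coeffs(th)
R = 10 ** (-a / (10 * (1 - b)))          # common roughness factor
s_vv = alpha_vv(15.0, th) ** 2 * R
s_hh = alpha_hh(15.0, th) ** 2 * R
print(shi_invert(s_vv, s_hh, 49.0))      # close to 15
```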

# 3 Model Comparison Algorithm

The algorithm consists of several sub-algorithms. It first converts $\beta^0$ to $\sigma^0$, then downsamples the image by a factor of 2 along rows and columns, and then removes speckle with the MBD method /10/. Only after these steps is the empirical algorithm applied, together with the corresponding conversion to volumetric soil moisture. The algorithm is shown graphically in Figure 1.



Figure 1. Flowchart of the proposed algorithm.

For the TerraSAR-X SSC products available in this study, the digital number is computed from the complex data provided in the DLR COSAR format (.cos file) as /2/:

$$DN = \sqrt{I^2 + Q^2} \tag{16}$$

In equation (16), *I* and *Q* are the real and imaginary components of the complex backscattered signal. Backscattering from objects depends on the relative orientation of the illuminated resolution cell and the sensor, as well as on the distance between them. Deriving $\sigma^0$ therefore requires detailed knowledge of the local incidence angle, as expressed in equation (17):

$$\sigma^{0} = \left(k_{s} \left| DN \right|^{2} - NEBN\right) \sin \theta_{loc}$$
(17)

where $k_s$ is the calibration and processor scaling factor and *DN* is the digital number (pixel amplitude) from equation (16). *NEBN* is the Noise Equivalent Beta Naught, which represents the influence of the various noise contributions on the signal /2/. $\theta_{loc}$ is the local incidence angle, obtained from a geocoded incidence-angle mask of the terrain derived from data of the Shuttle Radar Topography Mission (SRTM) /14/.

*NEBN* is given as a polynomial scaled by the coefficient $k_s$. These polynomials describe the noise level as a function of range. They take into account the main noise contributors, such as the antenna elevation pattern, the transmitted power and the received noise, and they are expressed as functions of range time.

$$NEBN = k_s \sum_{i=0}^{\deg} k_i \left( \tau - \tau_{ref} \right)^i, \tau \in [\tau_{\min}, \tau_{\max}]$$
(18)

where *deg* is the degree of the polynomial, $k_i$ are the polynomial coefficients, $\tau$ is the range time and $\tau_{ref}$ is the reference point. $\tau_{min}$ and $\tau_{max}$ are two parameters contained in every TerraSAR-X product.
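The radiometric steps (16) to (18) can be sketched with NumPy as follows. The parameter names are generic and hypothetical. The sketch does not show how $k_s$, the NEBN coefficients, $\tau_{ref}$, $\tau_{min}$ and $\tau_{max}$ are read from the TerraSAR-X product annotation, and the numbers in the example are illustrative only.

```python
import numpy as np


def nebn(tau, coeffs, tau_ref, k_s, tau_min, tau_max):
    """Noise Equivalent Beta Naught, eq. (18).

    coeffs[i] multiplies (tau - tau_ref)**i. tau is the range time and must
    lie inside [tau_min, tau_max], the validity interval given in the product.
    """
    tau = np.asarray(tau, dtype=float)
    if np.any((tau < tau_min) | (tau > tau_max)):
        raise ValueError("range time outside [tau_min, tau_max]")
    # polyval(x, c) evaluates c[0] + c[1]*x + c[2]*x**2 + ...
    return k_s * np.polynomial.polynomial.polyval(tau - tau_ref, coeffs)


def sigma_nought(i, q, k_s, nebn_values, theta_loc_rad):
    """Noise-corrected sigma0 from complex SSC samples, eqs. (16) and (17)."""
    dn = np.hypot(i, q)                  # amplitude DN, eq. (16)
    beta0 = k_s * dn ** 2                # radar brightness
    return (beta0 - nebn_values) * np.sin(theta_loc_rad)   # eq. (17)


# Illustrative numbers only (not taken from a real product annotation).
tau = np.array([0.0050, 0.0052])
noise = nebn(tau, coeffs=[1e-3, 0.5], tau_ref=0.0051, k_s=2.0,
             tau_min=0.0049, tau_max=0.0053)
print(sigma_nought(np.array([3.0, 1.0]), np.array([4.0, 1.0]), 2.0, noise,
                   np.radians(49.0)))
```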

### 3.1.1 Speckle Removal

Radar waves can interfere constructively or destructively, producing bright and dark pixels known as speckle. Speckle is most common in radar imaging systems (microwave or millimetre-wave). It can, however, appear in any type of remote-sensing image that uses coherent radiation. As with laser light, the waves transmitted by an active sensor travel in phase and interact minimally on their way to the target area. After interacting with the target, the waves are no longer in phase. This happens because individual waves travel different distances to scatterers on rough surfaces, or because of single- or double-bounce effects. Once the radar waves are out of phase, they interfere with each other and produce the brighter and darker pixels of speckle.

The Bayesian approach to removing noise from synthetic aperture radar (SAR) images is used to improve image quality and in techniques for extracting information from radar images. The first level of Bayesian inference is used for maximum a posteriori (MAP) estimation. The prior, i.e. the a priori information in Bayes' formula, is modelled with Gauss-Markov random fields. The second level of Bayesian inference is used to find the model parameters that best represent the texture of SAR images. The method performs well at removing speckle and describing texture, but it is computationally very demanding. This led to the idea of reformulating the whole algorithm as a parallel model on a graphics processing unit (GPU), both for the MAP estimator and for evidence maximization. The basic principle is to split the image into smaller blocks. Within each block, the GPU computes the value of each pixel in a separate thread.

Bayes' rule is given by

$$p(x|y,\theta) = \frac{p(y|x,\theta)p(x|\theta)}{p(y|\theta)}$$
(19)

where *y* is the speckled image, *x* is its speckle-free counterpart and $\theta$ denotes the model parameters. p(y|x, $\theta$) is the conditional probability density of *y* given *x*, also called the likelihood. p(x|$\theta$) is the prior, and p(y|$\theta$) is the probability of the data (the evidence). In equation (19), the evidence p(y|$\theta$) does not depend on *x* and plays no role in the maximization. The maximum a posteriori (MAP) estimator is therefore given by

$$\hat{x}(y) = \arg\max_{x}\, p(y|x,\theta)\, p(x|\theta) \tag{20}$$

for which both the prior and the likelihood must be specified. In the original SAR image, speckle is modelled as multiplicative noise, y = xn, where *n* is the speckle. The likelihood models the speckle distribution with a gamma distribution:

$$p(y_s|x_s) = 2\left(\frac{y_s}{x_s}\right)^{2L-1}\frac{L^L}{x_s\,\Gamma(L)}\exp\left(-L\left(\frac{y_s}{x_s}\right)^2\right) \tag{21}$$

where *L* is the equivalent number of looks, *s* is the observed pixel and $\Gamma$ is the gamma function.

Gauss-Markov random fields (GMRF) belong to the family of Gibbs models and describe the properties of SAR images well. The GMRF is given by /10/

$$p(x_s|\theta) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{\left(x_s - \sum_{r\in N_s}\theta_r\left(x_{s+r} + x_{s-r}\right)\right)^2}{2\sigma^2}\right) \tag{22}$$

where $N_s$ is the neighbourhood, *r* indexes the neighbours of the central pixel *s*, and $\theta_r$ are the GMRF texture parameters.

The MAP estimator is given by equation (20). Setting the first derivative of the negative log posterior, built from (21) and (22), to zero gives the following equation

$$x_{s}^{4} - x_{s}^{3} \sum_{r \in N_{s}} \Theta_{r} \left( x_{s+r} + x_{s-r} \right) + 2L\sigma^{2} x_{s}^{2} - 2L\sigma^{2} y_{s}^{2} = 0$$
(23)
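Equation (23) is a quartic in $x_s$ for every pixel, which is why one GPU thread per pixel is a natural fit. The minimal CPU sketch below handles a single pixel. It assumes the neighbour sum $\mu = \sum_r \theta_r(x_{s+r} + x_{s-r})$ is taken from the current estimates, as in an iterative sweep. It solves the quartic with NumPy's `np.roots` and keeps the positive real root with the highest posterior. The function names are hypothetical, and neither the MBD window handling nor the CUDA implementation is reproduced.

```python
import math

import numpy as np


def neg_log_posterior(x, y, mu, L, sigma2):
    """-log[p(y|x) p(x|theta)] for one pixel from (21) and (22), dropping
    terms independent of x; mu = sum_r theta_r (x_{s+r} + x_{s-r})."""
    return 2 * L * math.log(x) + L * y ** 2 / x ** 2 + (x - mu) ** 2 / (2 * sigma2)


def map_pixel(y, mu, L, sigma2):
    """Solve the quartic (23) for x_s > 0 and return the root with the lowest
    negative log posterior. Requires y > 0, so that a positive root exists."""
    coeffs = [1.0, -mu, 2 * L * sigma2, 0.0, -2 * L * sigma2 * y ** 2]
    roots = np.roots(coeffs)
    candidates = [r.real for r in roots
                  if abs(r.imag) <= 1e-9 * max(1.0, abs(r)) and r.real > 0]
    return min(candidates, key=lambda x: neg_log_posterior(x, y, mu, L, sigma2))


# Weak prior (large sigma2): the estimate stays close to the observation y.
print(map_pixel(y=2.0, mu=0.0, L=4, sigma2=1e4))
# Strong prior (small sigma2): the estimate is pulled towards mu.
print(map_pixel(y=2.0, mu=3.0, L=4, sigma2=1e-6))
```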

The texture parameters $\theta_r$ define the GMRF model. They are estimated with a procedure called evidence maximization. The evidence, given by

$$p(y|\theta) = \int p(y|x)\,p(x|\theta)\,dx \tag{24}$$

cannot be computed analytically, so it is approximated using the Hessian matrix. For compactness, we use its logarithmic form

$$\log p(y|\theta) \approx \sum_{i=1}^{N \times N} \frac{1}{2} (\log 2\pi - \log h_{ii}) + \log p(y_i|\hat{x}_i) + \log p(\hat{x}_i|\theta)$$
(25)

where the coefficient $h_{ii}$ of the GMRF model is given by

$$h_{ii} = \frac{6Ly_s^2}{x_{s,\mathrm{MAP}}^4} - \frac{2L}{x_{s,\mathrm{MAP}}^2} + \frac{1}{\sigma^2}\left(1 + \sum_{j\in N_s}\theta_j^2\right) \tag{26}$$
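Differentiating the negative logarithm of (21) twice with respect to $x_s$ gives the likelihood term $6Ly_s^2/x_s^4 - 2L/x_s^2$. Differentiating (22) the same way gives the prior term $(1 + \sum_j \theta_j^2)/\sigma^2$, because $x_s$ also enters the clique of each neighbour *j* (both *s+r* and *s−r*). The sketch below is a per-pixel illustration under these assumptions, with hypothetical names. It computes $h_{ii}$ and one summand of (25). It assumes $h_{ii} > 0$ at the MAP estimate, which the logarithm in (25) requires.

```python
import math


def neg_log_posterior(x, y, mu, L, sigma2):
    """Local -log p(y|x) - log p(x|theta) for one pixel, up to x-independent constants."""
    return 2 * L * math.log(x) + L * y ** 2 / x ** 2 + (x - mu) ** 2 / (2 * sigma2)


def hessian_diag(x_map, y, L, sigma2, thetas):
    """Diagonal Hessian element h_ii of the negative log posterior.

    thetas lists the GMRF parameter of every neighbour j in N_s (both s+r and s-r).
    """
    likelihood_part = 6 * L * y ** 2 / x_map ** 4 - 2 * L / x_map ** 2
    prior_part = (1 + sum(t * t for t in thetas)) / sigma2
    return likelihood_part + prior_part


def log_evidence_term(x_map, y, mu, L, sigma2, thetas):
    """One summand of (25); assumes h_ii > 0 at the MAP estimate."""
    h = hessian_diag(x_map, y, L, sigma2, thetas)
    log_lik = (math.log(2) + L * math.log(L) - math.lgamma(L)
               + (2 * L - 1) * math.log(y) - 2 * L * math.log(x_map)
               - L * y ** 2 / x_map ** 2)                    # log of (21)
    log_prior = (-0.5 * math.log(2 * math.pi * sigma2)
                 - (x_map - mu) ** 2 / (2 * sigma2))         # log of (22)
    return 0.5 * (math.log(2 * math.pi) - math.log(h)) + log_lik + log_prior


print(hessian_diag(1.5, 2.0, 4, 0.5, []))
print(log_evidence_term(1.5, 2.0, 1.8, 4, 0.5, [0.1, 0.1]))
```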

# 4 Experimental Results

For the experiments we used a TerraSAR-X image of the Zlatoličje area, acquired on 6 June 2009 at an incidence angle of about 49°. The image was acquired in high-resolution dual-polarization mode, i.e. in horizontal and vertical polarization. The ground-range resolution is 1.56 m and the azimuth resolution is 2.2 m. Dual polarization results from the way the signal is transmitted and received. The satellite transmits an electromagnetic wave at a frequency of 9.65 GHz, and the backscattered signal is received selectively through horizontal and vertical filters.

Before the images are processed with the two algorithms, they must first be downsampled and despeckled. We downsampled by a factor of 2, so the output image has half as many rows and columns. The images contain strong speckle (Figure 2), which corrupts the estimation of volumetric soil moisture. We therefore first despeckle them with the Model-Based Despeckling algorithm MBD /10/, which removes most of the speckle in homogeneous areas. Because of the algorithm's complexity, we converted it into a multithreaded model and implemented it on a GPU using CUDA /11/.
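The text does not say how the factor-2 downsampling is performed. One common choice, assumed in the sketch below, is to average non-overlapping 2 × 2 blocks of the intensity image, which also slightly reduces speckle. Plain decimation (keeping every second row and column) would be an equally valid reading of the text.

```python
import numpy as np


def downsample2(img):
    """Average non-overlapping 2x2 blocks; an odd last row or column is dropped."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


intensity = np.arange(16, dtype=float).reshape(4, 4)
print(downsample2(intensity))   # [[2.5, 4.5], [10.5, 12.5]]
```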



Figure 2. Quick-look image of the area acquired on 6 June 2009 (TerraSAR-X® / DLR).

The parameters of the MBD despeckling algorithm were as follows. The larger window was 13 × 13 pixels, and a smaller window of a single pixel was moved within it. The model order was 2. The algorithm was run separately on the horizontal and the vertical polarization. The despeckling results for the two cut-outs of Figure 2 are shown in Figures 3 and 4.



Figure 3. Comparison between cut-out 1 of the speckled image and the output of the MBD algorithm.



Figure 4. Comparison between cut-out 2 of the speckled image and the output of the MBD algorithm.

The next step is moisture estimation, which is run separately for the Dubois and the Shi model. It uses the Dubois equations (4) to (8) and the Shi equations (13) to (15). The fundamental difference between the two models must be stressed. The Dubois equations are solved analytically, whereas the Shi model requires a numerical solution based on the discriminant. The Shi model is therefore also more demanding in computation time. The outputs of the two algorithms are shown in Figures 5 and 6.

To make the volumetric soil moisture images easier to interpret, they are pseudo-coloured with a rainbow palette according to the output values and compared in Figure 7. Note that the two algorithms behave inversely, which is visible precisely in the river area: it appears white for the Dubois algorithm and black for the Shi model. In both images, the lowest values are shown in red and the highest in violet.

We converted the dielectric constant to volumetric soil moisture $m_v$ separately for the Dubois and the Shi model. Equation (27) converts the dielectric constant of the Dubois model to $m_v$, while equation (28) converts the dielectric constant of the Shi model to $m_v$.

Figure 5. Volumetric moisture of the cut-out as output by the Dubois algorithm.

$$m_v = 0.000237\,\varepsilon^{3} - 0.03421\,\varepsilon^{2} + 2.435\,\varepsilon - 2.86 \tag{27}$$

$$m_v = 0.00043\,\varepsilon^{3} - 0.055\,\varepsilon^{2} + 2.52\,\varepsilon - 5.3 \tag{28}$$
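Equations (27) and (28) are cubic polynomials in ε and can be evaluated with Horner's scheme. The function names in this sketch are hypothetical. The unit of the result is an assumption worth checking: the coefficients give values of the order of per cent (about 18 for ε = 10 with equation (27)), whereas Table 1 lists volumetric fractions.

```python
def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed from highest to lowest degree."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# Coefficients of equations (27) and (28), highest degree first.
DUBOIS_MV = (0.000237, -0.03421, 2.435, -2.86)  # eq. (27)
SHI_MV = (0.00043, -0.055, 2.52, -5.3)          # eq. (28)


def mv_dubois(eps):
    """Volumetric moisture from the Dubois-model dielectric constant, eq. (27)."""
    return horner(DUBOIS_MV, eps)


def mv_shi(eps):
    """Volumetric moisture from the Shi-model dielectric constant, eq. (28)."""
    return horner(SHI_MV, eps)


print(round(mv_dubois(10.0), 3), round(mv_shi(10.0), 3))  # 18.306 14.83
```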



Figure 6. Volumetric moisture of the cut-out as output by the Shi algorithm.

Figure 7. K-means classification of the estimated volumetric moisture values; left: Dubois model, right: Shi model.

We compared the values estimated by the two algorithms with several field measurements taken at the time of image acquisition. For this purpose we used a TRIME-PICO64 sensor /12/ and recorded its output voltage directly on a laptop computer through an oscilloscope module. The sensor and its use are shown in Figure 8. The measurements and the outputs of the two algorithms are collected in Table 1, where $\Delta x_D$ and $\Delta x_S$ are the deviations, in %, of the Dubois and Shi estimates from the field value.

Table 1. Comparison of estimated and measured volumetric soil moisture

| Measurement | Field | Dubois | Shi    | $\Delta x_{D}$ (%) | $\Delta x_{S}$ (%) |
|-------------|-------|--------|--------|--------------------|--------------------|
| 1.          | 0.26  | 0.1904 | 0.2511 | 26.7               | 3.4                |
| 2.          | 0.34  | 0.2422 | 0.2901 | 28.7               | 14.6               |
| 3.          | 0.33  | 0.1864 | 0.3220 | 43.5               | 2.4                |
| 4.          | 0.33  | 0.2396 | 0.3285 | 27.3               | 0.4                |
| 5.          | 0.29  | 0.2311 | 0.2875 | 20.3               | 0.8                |
| 6.          | 0.30  | 0.2823 | 0.2804 | 5.9                | 6.5                |
| 7.          | 0.26  | 0.3687 | 0.2749 | 41.8               | 5.7                |
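The deviations in Table 1 match the relative error $\Delta x = |x_\text{field} - x_\text{model}| / x_\text{field} \cdot 100\,\%$, truncated rather than rounded to one decimal. For example, measurement 2 gives 28.76 % for the Dubois model, which is listed as 28.7. This formula is inferred from the tabulated numbers; the text does not state it. A short sketch that recomputes the deviations and their means:

```python
# Table 1: field measurements and model estimates (volumetric fractions).
field = [0.26, 0.34, 0.33, 0.33, 0.29, 0.30, 0.26]
dubois = [0.1904, 0.2422, 0.1864, 0.2396, 0.2311, 0.2823, 0.3687]
shi = [0.2511, 0.2901, 0.3220, 0.3285, 0.2875, 0.2804, 0.2749]


def relative_deviation(measured, estimated):
    """Absolute deviation of each estimate from the field value, in % of the field value."""
    return [abs(m - e) / m * 100.0 for m, e in zip(measured, estimated)]


dx_d = relative_deviation(field, dubois)
dx_s = relative_deviation(field, shi)
for i, (d, s) in enumerate(zip(dx_d, dx_s), start=1):
    print(f"{i}: dx_D = {d:6.2f} %   dx_S = {s:6.2f} %")
print(f"mean: dx_D = {sum(dx_d) / len(dx_d):.1f} %, dx_S = {sum(dx_s) / len(dx_s):.1f} %")
```

With these data, the mean deviation is roughly 27.8 % for the Dubois model and about 4.9 % for the Shi model. This quantifies the difference in spread shown in Figure 9.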



Figure 8. The sensor used to acquire field measurements, and its use.

Table 1 is also presented graphically in Figure 9. The table shows that the Shi model estimated volumetric moisture considerably better than the Dubois model (deviations of at most 14.6 % versus up to 43.5 %), although its computation takes much longer. Figure 9 confirms this: the spread of the Shi model's values is much smaller than that of the Dubois model.



Figure 9. Spread of the values of both models.

Erroneous results do occur, and they are more pronounced for the Dubois model. The measurements were taken in an area with sparse vegetation (the first four measurements) and in an area without vegetation (the last three measurements); Figures 3 and 4 show both cut-outs. The results agree with the measured data and with data reported in the literature /5/. However, the algorithm labels some objects as very wet, which cannot be correct. The cause lies in the reflections of the radar wave, of which three types are known, compounded by the presence of speckle noise. This is clearly visible in Figure 7, where a total radar reflection occurs and both algorithms misinterpret it. The same happens in areas with dense vegetation, where this wavelength unfortunately does not perform well. Using the alternative L- or C-band could remedy this shortcoming.

Finally, the time complexity of the proposed algorithm must also be considered. We do not give a detailed comparison between running the MBD algorithm on the CPU and on the GPU; the GPU version was roughly 25 times faster. The large variation in processing time (Table 2) is due to the different convergence times of the MBD algorithm. For the subsequent image processing, a 512 × 512 image takes about 0.3 s with the Dubois model and about 14 s with the Shi model. The computer used had an Intel Core2 Quad Q9450 processor, 4 GB of system memory, an nVidia GeForce 9600 GT graphics card with 512 MB of memory, and the Microsoft Windows Vista SP2 x64 operating system.
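A back-of-the-envelope reading of the reported times for a 512 × 512 image follows. It assumes the cost is roughly proportional to the number of pixels, which the text does not state:

$$
N = 512 \times 512 = 262\,144, \qquad
\frac{t_\text{Shi}}{t_\text{Dubois}} \approx \frac{14\ \text{s}}{0.3\ \text{s}} \approx 47
$$

$$
\frac{t_\text{Dubois}}{N} \approx \frac{0.3\ \text{s}}{262\,144} \approx 1.1\ \mu\text{s}, \qquad
\frac{t_\text{Shi}}{N} \approx \frac{14\ \text{s}}{262\,144} \approx 53\ \mu\text{s}
$$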

Table 2. Computation times of the MBD algorithm for different polarizations and cut-outs

# 5 Conclusion

The aim of the study was to estimate the volumetric moisture of the soil surface in the area of the artificial canal of the Zlatoličje power plant. The output images show that the estimates agree well with actual conditions, with a volumetric water content between about 10 and 40 %. The results show that, for TerraSAR-X data at the incidence angle used, the Shi model gives considerably better results. It fits the data very well both in areas without vegetation and in areas with sparse vegetation. The spread of the values in Figure 9 also shows this: the Shi model's values lie fairly close to the measured ones. The comparison of the models in Figure 7 likewise shows that the Shi model's values are much more uniform than those of the Dubois model. This suggests that the Shi model is much less sensitive to surface roughness. Table 1 also favours the Shi model: its deviation stays small (at most 14.6 %), whereas the Dubois model deviates by up to 43.5 %.

It should also be noted that the dielectric constant varies over time, i.e. with the season, because the moisture of the terrain changes with the season. Because the SAR images are acquired at a very high frequency, in the X band (9.65 GHz), penetration into the soil is a problem. The wave penetrates only to a depth of about one wavelength, which in our case is only about 3 cm. Here we rely on the fact that changes in soil moisture are well reflected at the surface of the terrain observed by the radar.
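The quoted wavelength follows from the X-band carrier frequency, with $c$ the speed of light:

$$
\lambda = \frac{c}{f} = \frac{3 \times 10^{8}\ \text{m/s}}{9.65 \times 10^{9}\ \text{Hz}} \approx 0.031\ \text{m} \approx 3.1\ \text{cm}
$$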

We plan to extend the algorithm with self-organizing neural networks. The aim is to remove the artefacts caused by the varied shapes of roads, settlements and forests, and by irregular reflections, in the area where the algorithm is applied. The input vector used to train the self-organizing network will consist of the following features:

- the grey value of the pixel;
- the mean of its 5 × 5 neighbourhood;
- the standard deviation of the same neighbourhood;
- the edge value from the Sobel operator;
- the responses of masks in all four principal directions, which indicate whether or not the pixel lies on a riverbank.

Each mask is a 5 × 5 matrix with zeros along one of the four principal axes, values of 1 above this line and values of -1 below it.
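The sketch below shows how such a per-pixel input vector could be assembled in plain Python. It rests on these assumptions:

- the four principal directions are horizontal, vertical and the two diagonals;
- each mask is +1 on one side of its axis and -1 on the other;
- the standard deviation is the population value;
- the pixel lies far enough from the image border.

`direction_masks` and `feature_vector` are hypothetical names, and the self-organizing map itself is not shown.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sign(x):
    return (x > 0) - (x < 0)


def direction_masks(size=5):
    """Four masks: 0 on an axis through the centre, +1 on one side, -1 on the other (assumed axes)."""
    h = size // 2
    sides = {
        "horizontal": lambda dr, dc: -dr,            # +1 above the centre row
        "vertical": lambda dr, dc: -dc,              # +1 left of the centre column
        "diagonal": lambda dr, dc: dc - dr,          # +1 above the main diagonal
        "anti_diagonal": lambda dr, dc: -(dr + dc),  # +1 above the anti-diagonal
    }
    return {name: [[sign(f(r - h, c - h)) for c in range(size)] for r in range(size)]
            for name, f in sides.items()}


def correlate(image, r, c, kernel):
    """Sum of kernel * image over the window centred at (r, c); the window must fit in the image.
    The kernel is not flipped, which does not change the Sobel magnitude."""
    k = len(kernel) // 2
    return sum(kernel[i][j] * image[r - k + i][c - k + j]
               for i in range(len(kernel)) for j in range(len(kernel)))


def feature_vector(image, r, c, masks=None):
    """Grey value, 5x5 mean, 5x5 std, Sobel magnitude and four directional mask responses."""
    masks = masks or direction_masks()
    window = [image[i][j] for i in range(r - 2, r + 3) for j in range(c - 2, c + 3)]
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    edge = math.hypot(correlate(image, r, c, SOBEL_X), correlate(image, r, c, SOBEL_Y))
    responses = [correlate(image, r, c, m) for m in masks.values()]
    return [image[r][c], mean, std, edge] + responses


# A horizontal step edge: three dark rows on top, four bright rows below.
img = [[0] * 7 for _ in range(3)] + [[10] * 7 for _ in range(4)]
print(feature_vector(img, 3, 3))
```

At this step edge, the horizontal mask responds strongly (-100) while the vertical mask gives 0. This is the kind of directional contrast the riverbank masks are meant to capture.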

# References

- /1/ E. T. Engman and N. Chauhan, "Status of microwave soil moisture measurements with remote sensing," Remote Sensing of Environment, vol. 51, no. 1, pp. 189–198, 1995.
- /2/ TerraSAR-X brochure, http://wwwserv2.go.t-systems-sfr.com/tsx/documentation/TerraSAR-X\_Brochure\_final.pdf
- /3/ E. T. Engman, "Applications of microwave remote sensing of soil moisture for water resources and agriculture," Remote Sensing of Environment, vol. 35, pp. 213–226, 1991.
- /4/ F. Ulaby, R. Moore, and A. Fung, Microwave Remote Sensing: Active and Passive, vols. I–III, Addison-Wesley, 1981–1986, 2162 pp.
- /5/ P. C. Dubois, J. J. van Zyl, and T. Engman, "Measuring soil moisture with imaging radars," IEEE Transactions on Geoscience and Remote Sensing, vol. 33, no. 4, pp. 915–926, 1995.
- /6/ J. Shi, J. Wang, A. Y. Hsu, P. E. O'Neill, and E. T. Engman, "Estimation of bare surface soil moisture and surface roughness parameter using L-band SAR image data," IEEE Transactions on Geoscience and Remote Sensing, vol. 35, no. 5, pp. 1254–1266, 1997.
- /7/ A. K. Fung, Z. Li, and K. S. Chen, "Backscattering from a randomly rough dielectric surface," IEEE Transactions on Geoscience and Remote Sensing, vol. 30, pp. 356–369, 1992.
- /8/ A. von Hippel, Dielectrics and Waves, vols. I and II, 284 pp., 1995.
- /9/ Y. Oh and Y. C. Kay, "Condition for precise measurement of soil surface roughness," IEEE Transactions on Geoscience and Remote Sensing, vol. 36, no. 2, pp. 691–695, 1998.
- /10/ M. Walessa and M. Datcu, "Model-based despeckling and information extraction from SAR images," IEEE Transactions on Geoscience and Remote Sensing, vol. 38, pp. 2258–2269, 2000.
- /11/ CUDA programming tools, http://www.nvidia.com/object/cuda\_programming\_tools.html
- /12/ TRIME-PICO64, http://www.imko.de/ENG/index.php?option=com\_content&task=view&id=107&Itemid=110
- /13/ M. A. Tassoudji, K. Sarabandi, and F. T. Ulaby, "Design consideration and implementation of the LCX polarimetric scatterometer (POLARSCAT)," Rep. 022486-T-2, Radiation Lab., Univ. Michigan, Ann Arbor, June 1989.
- /14/ NASA, SRTM, http://www2.jpl.nasa.gov/srtm/

Matej Kseneman, Dušan Gleich, Fakulteta za elektrotehniko, računalništvo in informatiko (Faculty of Electrical Engineering and Computer Science), Smetanova ulica 17, 2000 Maribor, E-mail: matej.kseneman@gmail.com

Arrived: 02.12.2009; Accepted: 09.09.2010





Professional Society for Microelectronics, Electronic Components and Materials MIDEM at MIKROIKS, Stegne 11, 1521 Ljubljana, Slovenia; Tel.: +386 (0)1 5133 768; Fax: +386 (0)1 5133 771; Email: iztok.sorli@guest.arnes.si; WWW: http://paris.fe.uni-lj.si/midem/

# MIDEM SOCIETY REGISTRATION FORM

| Item                                                           | Options / entry                                                                       |
|----------------------------------------------------------------|---------------------------------------------------------------------------------------|
| 1. First Name, Last Name                                       |                                                                                       |
| Address                                                        |                                                                                       |
| City                                                           |                                                                                       |
| Country, Postal Code                                           |                                                                                       |
| 2. Date of Birth                                               |                                                                                       |
| 3. Education (please circle whichever is appropriate)          | PhD, MSc, BSc, High School, Student                                                   |
| 4. Profession (please circle whichever is appropriate)         | Electronics, Physics, Chemistry, Metallurgy, Material Sc.                             |
| 5. Company                                                     |                                                                                       |
| Address                                                        |                                                                                       |
| City                                                           |                                                                                       |
| Country, Postal Code                                           |                                                                                       |
| Tel.                                                           |                                                                                       |
| Email                                                          |                                                                                       |
| 6. Your Primary Job Function                                   | Fabrication, Engineering, Facilities, QA/QC, Management, Purchasing, Consulting, Other |
| 7. Please send mail to                                         | a) Company address, b) Home address                                                   |
| 8. I will regularly pay the MIDEM membership fee               | 25.00 EUR/year                                                                        |
| MIDEM members receive the journal "Informacije MIDEM" for free |                                                                                       |
| Signature, Date                                                |                                                                                       |

MIDEM at MIKROIKS, Stegne 11, 1521 Ljubljana, Slovenia


### Informacije MIDEM

Journal of Microelectronics, Electronic Components and Materials

### INSTRUCTIONS FOR AUTHORS

Informacije MIDEM is a scientific-professional-social publication of the Professional Society for Microelectronics, Electronic Components and Materials – MIDEM. The journal publishes scientific and professional contributions in the fields of microelectronics, electronic components and materials. Authors should suggest to the Editorial Board a classification for their contribution, such as original scientific paper, review scientific paper or professional paper.

Scientific and professional papers are subject to review.

#### Each scientific contribution should include the following:

- 1. Title of the paper, authors' names, name of the institution/company.
- 2. Key Words (5-10 words) and Abstract (200-250 words), stating how the work advances the state of the art in the field.
- 3. Introduction, main text, conclusion, acknowledgements, appendix and references following the IMRAD scheme (Introduction, Methods, Results And Discussion).
- 4. Full authors' names, titles and complete company/institution address, including Tel./Fax/Email.
- 5. Manuscripts should be typed double-spaced on one side of A4 pages in 12 pt font. The recommended length of a manuscript (figures not included) is 12-15 pages.
- 6. Slovene authors writing in English must also submit the title, key words and abstract in Slovene.
- 7. Authors writing in Slovene must also submit the title, key words and an extended abstract (500-700 words) in English.

**Other types of contributions** are also welcome, provided they are of appropriate length. These include popular papers, application papers, scientific news, news from companies, institutes and universities, reports on activities of the MIDEM Society and its members, and other relevant contributions.

### **General information**

- 1. Authors should use SI units and provide alternative units in parentheses wherever necessary.
- 2. Illustrations should be in black on white paper, with a width of up to 7.5 or 15 cm. Each illustration, table or photograph should be numbered and have a legend. Illustrations, tables and photographs must not be included in the text but supplied separately; however, their position in the text should be clearly marked.
- 3. Contributions may be written, and will be published, in Slovene or English.
- 4. Authors must send two hard copies of the complete contribution, together with all files on diskette or CD, in ASCII or Word for Windows format. Graphic files must be supplied separately and may be in TIFF, EPS, JPEG, VMF or GIF format.
- 5. Authors are fully responsible for the content of the paper.

Contributions are to be sent to the address below.

Editorial Office, Informacije MIDEM, MIDEM at MIKROIKS, Stegne 11, 1521 Ljubljana, Slovenia, Email: Iztok.Sorli@guest.arnes.si, tel. +386 1 5133 768, fax +386 1 5133 771