Informatica 39 (2015) 115-123

Parallel Implementation of Desirability Function-Based Scalarization Approach for Multiobjective Optimization Problems

O. Tolga Altinoz
Ankara University, Electrical and Electronics Engineering, Turkey
E-mail: taltinoz@ankara.edu.tr

Eren Akca
HAVELSAN A.S., Ankara, Turkey
E-mail: eren.akca@havelsan.com.tr

A. Egemen Yilmaz
Ankara University, Electrical and Electronics Engineering, Turkey
E-mail: aeyilmaz@eng.ankara.edu.tr

Anton Duca and Gabriela Ciuprina
Politehnica University of Bucharest, Romania
E-mail: anton.duca@upb.ro, gabriela@lmn.pub.ro

Keywords: parallel implementation, CUDA, particle swarm optimization

Received: December 1, 2014

Scalarization approaches are the simplest methods for solving multiobjective problems. The idea of scalarization is to decompose a multiobjective problem into single-objective sub-problems. Each of these sub-problems can be solved in parallel, since they are independent of each other. Hence, as a scalarization approach, a systematic modification of the desirability levels of the objective values of a multiobjective problem can be employed for solving such problems. In this study, the desirability function-based scalarization approach is converted into a parallel algorithm and applied to seven benchmark problems. The performance of the parallel algorithm with respect to the sequential one is evaluated based on execution time on different graphical processing units and central processing units. The results show that, even though the accuracy of the parallel and sequential codes is the same, the parallel algorithm is up to 24.5 times faster than the sequential algorithm (8.25 times faster on average), depending on the complexity of the problem.

Povzetek: Scalarization approaches are among the simplest ways of solving multicriteria problems.
The idea of scalarization is based on decomposing multicriteria problems into single-criterion sub-problems, which can be solved concurrently, since they are not mutually dependent. We can therefore solve multicriteria problems by systematically varying the desirability levels of their objective values. In this study, we implemented a parallel version of the desirability function-based scalarization and applied it to seven test problems. The performance of the parallel algorithm relative to the sequential one was evaluated with respect to the execution time on various graphics processing units and central processing units. The parallel variant gives equally accurate results and is up to 24.5 times faster than the sequential one (8.25 times on average), depending on the complexity of the problem.

1 Introduction

The problem of determining the best possible solution set with respect to multiple objectives is referred to as a multi-objective (MO) optimization problem. There are many approaches for the solution of these kinds of problems. The most straightforward approach, so-called "scalarization" or "aggregation", is nothing but combining the objectives in order to obtain a single objective [1]. Scalarization approaches are the simplest methods for solving multiobjective problems. The idea of scalarization is based on the decomposition of a multiobjective problem into single-objective sub-problems. The solutions of these single-objective sub-problems form the Pareto approximation set. However, since the number of sub-problems is much higher than the number of objectives in the multiobjective problem, and each sub-problem must be solved by a single-objective optimization algorithm, the computation time of scalarization approaches grows so much that solving the problem this way can become infeasible. For each sub-problem, a specific number of function evaluations must be performed by a single-objective optimization algorithm. Hence, a large number of function evaluations are performed to solve a multiobjective optimization problem. Before the development of powerful multi-objective optimization algorithms such as the Non-Dominated Sorting Genetic Algorithm (NSGA) [2], NSGA-II [3] or the Vector Evaluated Genetic Algorithm (VEGA) [4], scalarization techniques were preferred for solving engineering optimization problems. After the development of successful multi-objective optimization algorithms, scalarization techniques were considered old-fashioned, and they were abandoned because of the much higher number of function evaluations needed to obtain approximately the same performance as multiobjective optimization algorithms. However, with the aid of parallel architectures and devices, it is possible to reconsider and revisit the scalarization techniques, since these techniques are usually well suited for parallelization. In this study, one of the scalarization approaches for an a-priori process is defined with the aid of a desirability function. The desirability function is integrated into the particle swarm optimization algorithm in order to normalize the joint objective function values [5]. Then, the geometric mean of the desirability levels of the objectives is computed in order to obtain a single value. For each sub-problem, the shape of the desirability function is shrunk; therefore the desirability level changes and the optimization results vary accordingly. At the end of this method, a set of possible solutions is composed. This set contains both the dominated and the non-dominated solutions. If necessary, the programmer might also run a posterior method, such as non-dominated sorting, to select the non-dominated solutions. However, in this study, the main focus is to obtain the possible solution set.
In this study, with a similar motivation, we demonstrate how one of these techniques can be parallelized, and we present the performance of the approach by implementing it on Graphics Processing Units (GPUs) via the Compute Unified Device Architecture (CUDA) framework. This paper is organized as follows: Section 2 explains the desirability function-based scalarization approach in detail, and Section 3 presents a parallel implementation of the proposed method. Section 4 gives the implementation environment, the benchmark problems and the performance evaluation of the proposed method. The last section presents the conclusion and future work on the proposed method.

2 Desirability Function-Based Scalarization Approach

In a general manner, desirability functions can be applied in order to incorporate the decision maker's preferences without any modification of the single-objective optimization algorithm. The decision maker chooses a desirability function and a corresponding level. At each step/iteration of the algorithm, a desirability index is calculated instead of the objective values. At the end of the algorithm, a single solution is ready to be collected by the decision maker. Even though this method uses the advantages of desirability functions (desirability functions are explained in Section 2.1), the decision maker has little control over the final result, since a solution is obtained in a region defined by the desirability function (Figures 3 and 4) instead of on a line as in the weighted-sum approach. However, in this study, by defining a systematic reduction approach, our aim is not to incorporate the preferences of the decision maker but to present a generalized multi-objective optimization method for obtaining many possible solution candidates; thus the proposed method is applied as a scalarization approach like the weighted-sum method. Such a systematic approach, based on changing the shapes of the desirability functions, was previously proposed by three of the authors of this paper [6].
For an N-objective problem, N desirability functions are selected with respect to the boundaries of the problem. Next, the desirability functions are divided into levels, and each level corresponds to one single-objective run. For the bi-objective case investigated in this paper, two desirability functions are defined and each is divided into the same number of levels (say, 10) per function. Since there are two desirability functions, there are 100 single-objective runs in total. The previous study [6] showed that the performance of the desirability-function approach depends greatly on the number of levels, in other words on the number of single-objective evaluations. The results obtained in the previous study also showed that the approach is acceptable for bi-objective problems. However, the performance of the proposed approach still depends greatly on the number of levels, which increases the total computation time. Hence, in this study, the parallel cores of the CPU and the GPU are used as computation units for the single-objective optimization algorithms, and the total evaluation times are recorded for comparison. The aim of this paper is to show the applicability of the proposed method with the aid of the parallel architectures of the CPU and the GPU.

2.1 Desirability Function

The desirability function idea was first introduced by Harrington in 1965 for multi-objective industrial quality control. After the proposition of the desirability function concept, Derringer and Suich [7] introduced two different desirability function formulations, which became the fundamental equations of desirability functions. These desirability function definitions are given by (1), (2) and (3); (1) and (2) are called one-sided, and (3) two-sided.
The parameters in the equations are as follows: y is the input, in our case the objective function value; hmin, hmax and hmed are the minimum, maximum and median acceptable values for the domain of the two-sided desirability function.

$$d_1(y) = \begin{cases} 1, & y \le h_{min} \\ \left(\dfrac{h_{max}-y}{h_{max}-h_{min}}\right)^r, & h_{min} < y < h_{max} \\ 0, & y \ge h_{max} \end{cases} \quad (1)$$

$$d_2(y) = \begin{cases} 0, & y \le h_{min} \\ \left(\dfrac{y-h_{min}}{h_{max}-h_{min}}\right)^t, & h_{min} < y < h_{max} \\ 1, & y \ge h_{max} \end{cases} \quad (2)$$

$$d_3(y) = \begin{cases} 0, & y \le h_{min} \\ \left(\dfrac{y-h_{min}}{h_{med}-h_{min}}\right)^s, & h_{min} < y \le h_{med} \\ \left(\dfrac{y-h_{max}}{h_{med}-h_{max}}\right)^s, & h_{med} < y < h_{max} \\ 0, & y \ge h_{max} \end{cases} \quad (3)$$

The desirability level d(y) = 1 is the state of being fully desirable, and d(y) = 0 stands for a not-desired case. In this respect, the one-sided desirability function d1 is useful for minimization problems. The curve parameters are r, t and s; they are used in order to plot an arc instead of a straight line, when desired. The curves plotted in Figure 1 demonstrate the effects of the curve parameters and the graphs of the desirability functions.

2.2 Method of Desirability Function-Based Scalarization

The main idea beneath the desirability functions is as follows:

- The desirability function is a mapping from the domain of real numbers to the range [0, 1].
- The domain of each desirability function is one of the objective functions; it maps the values of the relevant objective function to the interval [0, 1].
- Depending on the desired direction of optimization of each objective function (i.e., the minimum/maximum tolerable values), the relevant desirability function is constructed.
- The overall desirability value is defined as the geometric mean of all desirability functions; this value is to be maximized.

Particularly, for a bi-objective optimization problem in which the functions f1 and f2 are to be minimized, the relevant desirability functions d1(f1) and d2(f2) can be defined as in Figure 2.

Figure 2: The linear desirability functions constructed for the bi-objective optimization problem.
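As a concrete illustration, Equations (1)-(3) can be sketched in code (a minimal sketch, not part of the original C/C++/CUDA implementation; the function and parameter names simply follow the notation in the text):

```python
def d1(y, hmin, hmax, r=1.0):
    """Eq. (1): one-sided desirability, suitable for minimization
    (fully desirable at or below hmin, not desired at or above hmax)."""
    if y <= hmin:
        return 1.0
    if y >= hmax:
        return 0.0
    return ((hmax - y) / (hmax - hmin)) ** r

def d2(y, hmin, hmax, t=1.0):
    """Eq. (2): one-sided desirability, increasing with y."""
    if y <= hmin:
        return 0.0
    if y >= hmax:
        return 1.0
    return ((y - hmin) / (hmax - hmin)) ** t

def d3(y, hmin, hmed, hmax, s=1.0):
    """Eq. (3): two-sided desirability, peaking at the median value hmed."""
    if y <= hmin or y >= hmax:
        return 0.0
    if y <= hmed:
        return ((y - hmin) / (hmed - hmin)) ** s
    return ((y - hmax) / (hmed - hmax)) ** s
```

With the curve parameters set to 1, the functions are the linear (straight-line) variants preferred throughout this study; values different from 1 bend the lines into arcs.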
The desirability functions are not necessarily defined to be linear; naturally, non-linear definitions can also be made, as described in [7]. Throughout this study, we prefer the linear desirability functions. In [6], a method for extraction of the Pareto front was proposed by altering the shapes of the desirability functions in a systematic manner. Particularly, by:

- fixing the parameters f1max_tol and f2max_tol seen in Figure 2 at infinity, and
- varying the parameters f1min_tol and f2min_tol systematically,

it is possible to find the Pareto front regardless of its convexity or concavity. This claim can be illustrated for the bi-objective case as follows: as seen in Figure 3, the parameters f1min_tol and f2min_tol determine the sector which is traced throughout the solution. The obtained solution corresponds to the point at which the geometric mean of the two desirability values is maximized. As seen in Figure 4, even in the case of a concave Pareto front, the solution can be found without loss of generality. In other words, unlike the weighted-sum approach, the method proposed in [6] does not suffer from concave Pareto fronts.

Figure 1: The graphical demonstration of the desirability functions (curves for the parameters r, t and s smaller than, equal to, and greater than 1).

Figure 3: The solution via the desirability-function based approach for a convex Pareto front.

Figure 4: The solution via the desirability-function based approach for a concave Pareto front.

In [6], the applicability and the efficiency of the proposed scalarization approach was demonstrated via some multi-objective benchmark functions. Each single-objective problem (i.e., the scalarization scheme) was
solved with Particle Swarm Optimization. Despite no explicit demonstration or proof, it was claimed that:

- there were no limitations regarding the usage of Particle Swarm Optimization; i.e., any other heuristic algorithm could be incorporated and implemented;
- the proposed method is easily parallelizable.

In this study, we demonstrate the validity of these claims by performing a parallel implementation on GPUs via the CUDA framework. The next section is devoted to the implementation details.

3 Parallel Multiobjective Optimization with GPU

This section is dedicated to explaining the steps and the idea of parallelizing the desirability function-based scalarization approach with the aid of the CUDA library.

3.1 Fundamentals of CUDA Parallel Implementation

Researchers familiar with conventional programming languages have long desired a language or framework that would let them write parallel code easily. For this purpose, in 2007 NVIDIA [8] introduced a software framework called CUDA. By means of it, a sequential function can be converted into a parallel kernel by using the libraries and some prefix expressions. In this way, programmers do not need to learn a new programming language: they can use their previous know-how related to C/C++ and enhance this knowledge with some basic expressions introduced by CUDA. However, without knowledge of the CUDA software and the parallel architecture hardware, it is not possible to write efficient code. CUDA programming begins with the division of the architectures: it defines the CPU as the host and the GPU as the device. Parallel programming is essentially the assignment of duties to the parallel structure and the collection of the results by the CPU. In summary, the code is written for the CPU in a C/C++ environment, and it includes some parallel structures. This code is executed by the host, and the host commands the device to execute the parallel parts.
When code is executed by the device, the host waits until the job is finished; then a new parallel duty can be assigned, or the results of the finished job can be collected by the host. Thus, the device becomes a parallel computation unit. Hence, parallel computing relies on the data movement between host and device. Even though both host and device are very fast computation units, the data bus is slower. Therefore, in order to write an efficient program, the programmer must keep the data transfer between the host and the device to a minimum. The GPU consists of streaming multiprocessors (SMs). Each SM has 8 streaming processors (SPs), also known as cores, and each core supports a number of threads. In the Tesla architecture there are 240 SPs, and each SP supports 128 threads; the thread is the kernel execution unit. Threads are grouped into blocks, and blocks are executed concurrently with respect to the core count: if the GPU architecture has two cores, then two blocks of threads are executed simultaneously; if it has four cores, then four blocks are executed concurrently. Host and device communicate via data movement. The host moves data to the memory of the GPU board. This memory is called global memory, and it can be accessed by all threads as well as by the host. The host also has access to the constant and texture memories. However, it cannot access the shared memory, which is a partitioned structure assigned to every block. The threads within a block can access their own shared memory. Communication through the shared memory is faster than through the global memory. Hence, an efficient parallel code should work with data in the shared memory as often as possible, instead of in the global memory. In this study, random numbers are needed to execute the algorithm. Hence, instead of the rand() function of the C/C++ environment, the CURAND library of the CUDA toolkit has been employed. In addition, CUDA Events are used for accurate measurement of the execution time.
The next section explains the parallel implementation of the desirability function-based scalarization in detail.

3.2 Parallel Implementation of Desirability Function-Based Scalarization

The main idea of our parallel implementation throughout this study is illustrated in Figure 5. Each scalarization scheme is handled in a separate thread; after the relevant solutions are obtained, they are gathered in a centralized manner to constitute the Pareto front, from which the human decision maker picks a solution according to his/her needs. With this approach, the number of solutions that can be found in parallel is limited only by the capability of the GPU card used. As stated before, we implemented the Particle Swarm Optimization algorithm for verification of the aforementioned claims. The parallel CUDA implementation was compared to the sequential implementation on various GPUs and CPUs.

Figure 5: The parallel CUDA implementation of the desirability-function based approach.

It was seen that both implementations (sequential and parallel CUDA) were able to find the same solutions, but in different elapsed times. As seen in Figure 6, as the number of Pareto front solutions increases, the advantage of the parallel CUDA implementation increases dramatically. Figure 6 presents the parallel implementation of the scalarization approach for the weighted-sum method. A simple convex problem, defined in (4) and (5), is selected as a test bed to present the performance of the parallelization:

f1(x) = x^2    (4)
f2(x) = (x - 2)^2    (5)

According to Figure 6, high- and mid-level GPU cards are approximately 10 times faster than the sequential implementation.
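The per-thread scalarization scheme of Figure 5 can be sketched as follows (a minimal sketch under simplifying assumptions, not the authors' CUDA code: a Python thread pool stands in for the GPU threads, a plain random search stands in for Particle Swarm Optimization, and the level grid and tolerance values are illustrative choices). Each sub-problem fixes one pair of lower tolerances and maximizes the geometric mean of the two linear desirability levels:

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def f1(x):
    return x * x            # objective (4)

def f2(x):
    return (x - 2.0) ** 2   # objective (5)

def lin_desirability(y, hmin, hmax):
    # Linear one-sided desirability for minimization (Eq. (1) with r = 1).
    if y <= hmin:
        return 1.0
    if y >= hmax:
        return 0.0
    return (hmax - y) / (hmax - hmin)

def solve_subproblem(tols):
    # One scalarized sub-problem: maximize the geometric mean of the two
    # desirability levels for the given lower tolerances (random search).
    h1min, h2min = tols
    hmax = 16.0  # fixed upper tolerance, large enough for x in [-1, 3]
    rng = random.Random(hash(tols) & 0xFFFF)  # deterministic per sub-problem
    best_x, best_d = 0.0, -1.0
    for _ in range(2000):
        x = rng.uniform(-1.0, 3.0)
        d = math.sqrt(lin_desirability(f1(x), h1min, hmax) *
                      lin_desirability(f2(x), h2min, hmax))
        if d > best_d:
            best_d, best_x = d, x
    return best_x

levels = [0.4 * k for k in range(10)]                   # 10 levels per function
subproblems = [(a, b) for a in levels for b in levels]  # 100 sub-problems
with ThreadPoolExecutor() as pool:                      # sub-problems run independently
    solutions = list(pool.map(solve_subproblem, subproblems))
```

Since the sub-problems share no state, they map naturally onto independent execution units; in the actual implementation each sub-problem is assigned to a CUDA thread and the solutions are gathered centrally afterwards.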
The results obtained in Figure 6 yield the following conclusions:

- For a small number of Pareto solutions, the CPU performs better than the GPU.
- Beyond 64 solutions, the parallel implementation gives better results than the sequential code.
- An old-fashioned mobile GPU performs almost the same as a relatively high-level CPU.
- As the number of solutions increases, professional high-level GPU devices perform more stably than general-purpose GPUs.

4 Implementation, Results, and Discussion

The parallel desirability function-based scalarization approach was applied to solve seven benchmark problems. These problems were selected according to their complexity, ranging from problems with simple calculations to problems with more branching and more complex functions, since the average execution time is the quantity of interest in this study. In this section, the benchmark problems and the results with respect to execution time are presented.

4.1 Benchmark Problems

In this study, seven benchmark problems [9] with different complexities and Pareto front shapes are selected to present the performance of the method. Table 1 gives the mathematical formulations of the problems. The performance comparison is performed not only on the accuracy of the results but, more importantly, on the execution time. As given in Table 1, the benchmark problems range from simple to more complex. The reasoning is that as the complexity of the function increases, each processor has to perform many more calculations; and since a single processor on a GPU has a lower capacity than a CPU core, this makes for a good comparison not only over the number of solutions in the solution space but also over the problem complexity. Table 1 has three columns. The first column gives the common name of each benchmark problem; the reader can find ample information about a function by searching for its name.
The second column is the mathematical formulation of the function; the complexity of the functions increases from the top row downwards. The last column defines the range of the decision variables.

4.2 Implementation Results

Table 2 presents the execution time comparison of a CPU (Xeon E2620) and a GPU (Tesla K20) for various numbers of levels, from 8 x 8 to 100 x 100, i.e., from 64 to 10,000 single-objective evaluations. For problems of low complexity, up to 225 levels (400 levels for the hard problems), the CPU outperforms the GPU implementation with respect to execution time. This is reasonable, since only a small portion of the cores on the GPU can be used, and the smaller number of relatively very fast CPU cores finishes the executions earlier than the GPU. From 400 to 6,400 levels, the parallel GPU code increasingly outperforms the CPU. At 6,400 levels, the difference between CPU and GPU is at its peak. Beyond that level, the advantage of the GPU decreases; in other words, the GPU implementation acts more sequentially, since there are no longer any free resources for executing the parallel implementation. Among all the problems, UF1 is the hardest for the GPU implementation, since its computation time is the longest. The main reasons are: a) the checking mechanism for the even and odd parts, which adds a branch to the code; b) the square of the trigonometric function. For a GPU implementation, branches are time-consuming constructs: in an if-else, both parts are evaluated by the architecture, which wastes resources. The average execution time on the CPU is 8.25 times longer than the average GPU execution time. The following conclusions are drawn from the execution time comparison:

- For a small number of solutions, the CPU outperforms the GPU.
- CPU execution time increases proportionally with the number of solutions.
- Overall, the GPU implementation is the more beneficial one.
- For a very high number of solutions, the improvement obtained on the GPU slowly decreases, since the GPU contains a limited number of streaming multiprocessors. Even so, the improvement does not drop below ≈ 10 times on average.

5 Conclusion

In this study, the desirability function-based scalarization approach is evaluated in a parallel fashion. Since the accuracies of the sequential and parallel implementations are similar, the execution times of the two codes are compared for different numbers of solutions. The results show that, for a small number of solutions, the parallel implementation is slower than the sequential implementation. But as the number of solutions increases, the

Table 1: Multiobjective benchmark problems

ZDT1 (0 <= x_i <= 1):
  f_1(x) = x_1
  f_2(x) = g (1 - \sqrt{f_1/g}), where g = 1 + \frac{9}{n-1} \sum_{i=2}^{n} x_i

ZDT2 (0 <= x_i <= 1):
  f_1(x) = x_1
  f_2(x) = g (1 - (f_1/g)^2), where g = 1 + \frac{9}{n-1} \sum_{i=2}^{n} x_i

ZDT3 (0 <= x_i <= 1):
  f_1(x) = x_1
  f_2(x) = g (1 - \sqrt{f_1/g} - (f_1/g) \sin(10 \pi x_1)), where g = 1 + \frac{9}{n-1} \sum_{i=2}^{n} x_i

UF1 (0 <= x_1 <= 1, -1 <= x_j <= 1 for j >= 2):
  f_1(x) = x_1 + \frac{2}{|J_1|} \sum_{j \in J_1} (x_j - \sin(6 \pi x_1 + \frac{j \pi}{n}))^2
  f_2(x) = 1 - \sqrt{x_1} + \frac{2}{|J_2|} \sum_{j \in J_2} (x_j - \sin(6 \pi x_1 + \frac{j \pi}{n}))^2
  J_1 = {j | j is odd and 2 <= j <= n}, J_2 = {j | j is even and 2 <= j <= n}

UF2 (0 <= x_1 <= 1, -1 <= x_j <= 1 for j >= 2):
  f_1(x) = x_1 + \frac{2}{|J_1|} \sum_{j \in J_1} y_j^2
  f_2(x) = 1 - \sqrt{x_1} + \frac{2}{|J_2|} \sum_{j \in J_2} y_j^2
  y_j = x_j - (0.3 x_1^2 \cos(24 \pi x_1 + \frac{4 j \pi}{n}) + 0.6 x_1) \cos(6 \pi x_1 + \frac{j \pi}{n}) for j \in J_1,
  and the same expression with \sin(6 \pi x_1 + \frac{j \pi}{n}) for j \in J_2

UF3 (0 <= x_j <= 1):
  f_1(x) = x_1 + \frac{2}{|J_1|} (4 \sum_{j \in J_1} y_j^2 - 2 \prod_{j \in J_1} \cos(\frac{20 y_j \pi}{\sqrt{j}}) + 2)
  f_2(x) = 1 - \sqrt{x_1} + \frac{2}{|J_2|} (4 \sum_{j \in J_2} y_j^2 - 2 \prod_{j \in J_2} \cos(\frac{20 y_j \pi}{\sqrt{j}}) + 2)
  y_j = x_j - x_1^{0.5(1 + \frac{3(j-2)}{n-2})}

UF4 (0 <= x_1 <= 1, -2 <= x_j <= 2 for j >= 2):
  f_1(x) = x_1 + \frac{2}{|J_1|} \sum_{j \in J_1} h(y_j)
  f_2(x) = 1 - x_1^2 + \frac{2}{|J_2|} \sum_{j \in J_2} h(y_j)
  y_j = x_j - \sin(6 \pi x_1 + \frac{j \pi}{n}), h(t) = \frac{|t|}{1 + e^{2|t|}}
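As an illustration, the first benchmark in Table 1 can be sketched in code (a minimal sketch assuming the standard ZDT1 definition from [9]; not part of the paper's implementation):

```python
import math

def zdt1(x):
    """ZDT1: two objectives to minimize over n decision variables in [0, 1]."""
    n = len(x)
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (n - 1)
    f2 = g * (1.0 - math.sqrt(f1 / g))
    return f1, f2
```

On the Pareto-optimal front, x_2 = ... = x_n = 0 gives g = 1, so the front is f_2 = 1 - sqrt(f_1), which is convex.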
Table 2: Execution time comparison [seconds] of the benchmark functions. The improvement factor impr shows how many times the GPU is faster than the CPU; impr < 1 means that the CPU is faster than the GPU.

Levels    Device  ZDT1     ZDT2     ZDT3     UF1      UF2      UF3      UF4      Average
8 x 8     CPU     0.133    0.109    0.19     0.11     0.109    0.109    0.094    0.1220
          GPU     0.433    0.4504   0.483    0.4917   0.4861   0.4906   0.408    0.4633
          impr    0.3072   0.2420   0.3934   0.2237   0.2242   0.2222   0.2304   0.2633
10 x 10   CPU     0.221    0.153    0.291    0.222    0.199    0.197    0.168    0.2073
          GPU     0.439    0.451    0.4848   0.4934   0.49     0.4914   0.405    0.4649
          impr    0.5034   0.3392   0.6002   0.4499   0.4061   0.4009   0.4148   0.4450
15 x 15   CPU     0.446    0.333    0.576    0.42     0.418    0.413    0.372    0.4254
          GPU     0.4424   0.4576   0.4904   0.499    0.4944   0.4967   0.409    0.4699
          impr    1.0081   0.7277   1.1746   0.8417   0.8455   0.8315   0.9095   0.9055
20 x 20   CPU     0.8      0.564    0.997    0.717    0.706    0.728    0.811    0.7604
          GPU     0.4281   0.442    0.4781   0.5      0.4977   0.5      0.4146   0.4658
          impr    1.8687   1.2760   2.0853   1.4340   1.4185   1.4560   1.9561   1.6421
25 x 25   CPU     1.21     0.893    1.521    1.12     1.444    1.114    0.987    1.1841
          GPU     0.4393   0.4573   0.491    0.5      0.4954   0.499    0.408    0.4700
          impr    2.7544   1.9528   3.0978   2.2400   2.9148   2.2325   2.4191   2.5159
30 x 30   CPU     1.753    1.266    2.279    1.582    1.589    1.59     1.428    1.6410
          GPU     0.4424   0.4566   0.4871   0.501    0.4973   0.4979   0.4132   0.4708
          impr    3.9625   2.7727   4.6787   3.1577   3.1953   3.1934   3.4560   3.4880
40 x 40   CPU     3.162    2.186    4.094    2.794    2.854    2.757    2.508    2.9079
          GPU     0.4451   0.453    0.4893   0.4999   0.4983   0.4991   0.4151   0.4714
          impr    7.1040   4.8256   8.3671   5.5891   5.7275   5.5239   6.0419   6.1684
50 x 50   CPU     4.879    3.431    6.138    4.412    4.382    4.298    3.889    4.4899
          GPU     0.4488   0.4639   0.4967   0.5119   0.5      0.501    0.4321   0.4792
          impr    10.8712  7.3960   12.3576  8.6189   8.7640   8.5788   9.0002   9.3695
60 x 60   CPU     6.946    4.798    9.492    6.236    6.411    6.391    6.233    6.6439
          GPU     0.4709   0.4864   0.518    0.5287   0.518    0.519    0.4587   0.5000
          impr    14.7505  9.8643   18.3243  11.7950  12.3764  12.3141  13.5884  13.2876
70 x 70   CPU     9.52     6.764    11.959   8.566    8.548    8.562    7.592    8.7873
          GPU     0.4995   0.5144   0.5417   0.5489   0.539    0.5435   0.4923   0.5256
          impr    19.0591  13.1493  22.0768  15.6058  15.8590  15.7534  15.4215  16.7036
80 x 80   CPU     12.488   8.87     15.892   11.11    11.366   11.538   13.307   12.0816
          GPU     0.6179   0.6321   0.6488   0.6388   0.635    0.6362   0.607    0.6308
          impr    20.2104  14.0326  24.4945  17.3920  17.8992  18.1358  21.9226  19.1553
90 x 90   CPU     15.776   11.246   20.027   14.039   14.138   14.053   14.583   14.8374
          GPU     0.8299   0.854    0.8749   0.8424   0.84     0.8432   0.8335   0.8454
          impr    19.0095  13.1686  22.8906  16.6655  16.8310  16.6663  17.4961  17.5325
100 x 100 CPU     19.2579  13.863   24.504   17.252   19.219   17.74    15.49    18.1894
          GPU     1.1157   1.149    1.1812   1.12     1.222    1.125    1.132    1.1493
          impr    17.2608  12.0653  20.7450  15.4036  15.7275  15.7689  13.6837  15.8078

Figure 6: Comparison of the sequential Java and the parallel CUDA implementations.

GPU is almost 20 times faster than the sequential implementation.

Acknowledgement

This study was made possible by grants from the Turkish Ministry of Science, Industry and Technology (Industrial Thesis - San-Tez Programme; Grant Nr. 01568.STZ.2012-2) and the Scientific and Technological Research Council of Turkey - TÜBİTAK (Grant Nr. 112E168). The authors would like to express their gratitude to these institutions for their support.

References

[1] R. Marler, S. Arora (2009) Transformation methods for multiobjective optimization, Engineering Optimization, vol. 37, no. 1, pp. 551-569.

[2] N. Srinivas, K. Deb (1995) Multi-objective function optimization using non-dominated sorting genetic algorithms, Evolutionary Computation, vol. 2, no. 3, pp. 221-248.

[3] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 182-197.

[4] J. D. Schaffer (1985) Multiple objective optimization with vector evaluated genetic algorithms, Proceedings of the International Conference on Genetic Algorithms and their Applications, pp.
93-100.

[5] J. Branke, K. Deb (2008) Integrating user preferences into evolutionary multiobjective optimization, Knowledge Incorporation in Evolutionary Computing, Springer, pp. 461-478.

[6] O. T. Altinoz, A. E. Yilmaz, G. Ciuprina (2013) A Multiobjective Optimization Approach via Systematical Modification of the Desirability Function Shapes, Proceedings of the 8th International Symposium on Advanced Topics in Electrical Engineering.

[7] G. Derringer, R. Suich (1980) Simultaneous optimization of several response variables, Journal of Quality Technology, vol. 12, no. 1, pp. 214-219.

[8] NVIDIA Corporation (2012) CUDA Dynamic Parallelism Programming, NVIDIA.

[9] E. Zitzler, K. Deb, L. Thiele (2000) Comparison of multiobjective evolutionary algorithms: Empirical results, Evolutionary Computation Journal, vol. 8, no. 2, pp. 125-148.