Data Mining-Assisted Parameter Tuning of a Search Algorithm

Jurij Šilc
Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
E-mail: jurij.silc@ijs.si

Katerina Taškova
Institute of Computer Science, Johannes Gutenberg University Mainz, Staudingerweg 9, 55128 Mainz, Germany
E-mail: ktaskova@uni-mainz.de

Peter Korošec
Computer Systems Department, Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia, and
Faculty of Mathematics, Science and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
E-mail: peter.korosec@ijs.si

Keywords: data mining, differential ant-stigmergy algorithm, low-discrepancy sequences, meta-heuristic optimization, parameter tuning

Received: December 1, 2014

The main purpose of this paper is to show a data mining-based approach to the problem of tuning the performance of a meta-heuristic search algorithm with respect to its parameters. The operational behavior of a typical meta-heuristic search algorithm is determined by a set of control parameters, which have to be fine-tuned in order to obtain the best performance for a given problem. The principal challenge here is how to provide meaningful settings for an algorithm, obtained as a result of better insight into its behavior. In this context, we discuss the idea of learning a model of an algorithm's behavior by data mining analysis of parameter tuning results. The study was conducted using the Differential Ant-Stigmergy Algorithm as an example meta-heuristic search algorithm.

Povzetek (translated from Slovenian): The main purpose of this paper is to show how data mining techniques can be used to approach the problem of tuning the performance of a meta-heuristic search algorithm with respect to its parameters. The operation of a typical meta-heuristic search algorithm is determined by a set of control parameters, which must be well tuned in order to achieve the best performance on a given problem. The fundamental challenge is how to provide the best setting of the algorithm as a result of insight into its behavior. In this regard, we discuss the idea of learning a model of the algorithm's behavior based on data mining analysis of the results of tuning its parameters. The study is carried out using the Differential Ant-Stigmergy Algorithm as an example of a meta-heuristic search algorithm.

1 Introduction

Research interest in meta-heuristic search algorithms has grown significantly over the last 25 years as a result of their efficiency and effectiveness in solving large and complex problems across different domains [2]. The state-of-the-art nature-inspired meta-heuristic algorithms for high-dimensional continuous optimization also include algorithms inspired by the collective behavior of social organisms [14]. One such algorithm, which we address in this paper, is the Differential Ant-Stigmergy Algorithm (DASA), initially proposed by Korošec [6] and further improved in [8]. DASA is inspired by the efficient self-organizing behavior of ant colonies emerging from pheromone-mediated communication, known as stigmergy [3]. One of the first stigmergy-based algorithms designed for continuous function optimization was the Multilevel Ant Stigmergy Algorithm [7]. Naturally, DASA can be classified within the Ant Colony Optimization (ACO) framework.
However, the use of a continuous pheromone model in the form of a Cauchy probability distribution, together with a representation of the search space in the form of a so-called differential graph, distinguishes DASA from the original ACO paradigm. The rationale behind DASA is to memorize (via pheromone model updates) the "move" in the search space that improves the current best solution and to use it in further search.

As is the case with most meta-heuristic algorithms, the operational behavior of DASA is determined by a set of control parameters. In practice, these parameters have to be fine-tuned in order to obtain the best performance of the algorithm. This can be quite inconvenient for users, as:

- they usually do not have insight into the behavior of the algorithm, and
- even if a default setting exists, it may not be adequate for a specific instance or type of problem.

Moreover, parameter tuning is a computationally expensive task. The principal challenge here is how to provide meaningful (default) settings for DASA, obtained as a result of better insight into the algorithm's behavior. Furthermore, can we find optimal regions in the DASA parameter space by analyzing patterns in the algorithm's behavior with respect to the problem characteristics? Related to this, we discuss preliminary findings based on data mining analysis of parameter tuning results.

More precisely, the parameter tuning task is approached by a two-step procedure that combines a kind of experimental design with data mining analysis. We use Sobol' sequences [10] for even sampling of the algorithm's parameter space to generate a large and diverse set of parameter settings (a minimal sketch of this sampling step is given at the end of this section). These are used as input to DASA, which is then tuned on a given function optimization problem. The performance of DASA on the given problem, in terms of function error, is captured at different time points for all sampled parameter settings. The data collected in this first step, DASA performance with the corresponding parameter settings, is then subject to intelligent data analysis, i.e., multi-target regression with Predictive Clustering Trees [1].

Parameter sampling combined with regression has already been used by Stoean et al. [11] for tuning meta-heuristics: Latin hypercube parameter sampling is combined with single-target regression with Support Vector Machines. Our approach modifies theirs by replacing Latin hypercube sampling with Sobol' sequences, since Latin hypercube sampling is best suited to the case where a single parameter dominates the algorithm's performance and should be used with care when the sampled parameters interact [9]. Moreover, we define the regression task as multi-target regression, taking into account more than one target (in this case the function error at a few time points), with the goal of finding parameter settings for the given algorithm that not only solve the optimization problem but also solve it fastest.

The remainder of this paper is structured as follows. Section 2 introduces the Differential Ant-Stigmergy Algorithm. Then, Section 3 addresses the parameter tuning task and Section 4 presents the experimental evaluation with the results. After that, Section 5 discusses the idea of post-hoc analysis of parameter tuning by data mining. Finally, Section 6 summarizes this study and outlines possible directions for further work.
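As a rough illustration of the sampling step, the following Python sketch draws a set of Sobol' points and rescales them to a five-dimensional parameter box. It assumes SciPy's scipy.stats.qmc module; the parameter names and bounds are illustrative assumptions, not the ranges used in our experiments.

    # Hedged sketch: Sobol' sampling of a 5-dimensional parameter space.
    # The parameter bounds below are assumed for illustration only.
    from scipy.stats import qmc

    names = ["m", "rho", "s_plus", "s_minus", "b"]   # DASA control parameters
    lower = [1, 0.0, 0.001, 0.001, 2]                # assumed lower bounds
    upper = [200, 1.0, 1.0, 1.0, 16]                 # assumed upper bounds

    sampler = qmc.Sobol(d=len(names), scramble=False)
    # SciPy warns that 5000 is not a power of two; harmless for this purpose.
    unit = sampler.random(n=5000)             # 5000 points in the unit cube
    settings = qmc.scale(unit, lower, upper)  # rescale to the parameter ranges

    # Each row of `settings` is one candidate parameter setting for a DASA run.
    print(dict(zip(names, settings[0].round(3))))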
2 The Differential Ant-Stigmergy Algorithm

The version of DASA used in our experimental evaluation is described in detail by Korošec et al. [8] (see Figure 1). DASA introduces the concept of variable offsets (referred to as parameter differences) for solving continuous optimization problems. By utilizing discretized offsets of the real-valued problem parameters, the continuous optimization problem is transformed into a graph-search problem. More precisely, assuming a multidimensional parameter space with x_i being the current solution for the i-th parameter, we define a new solution x_i' as follows:

    x_i' = x_i + w \delta_i,                                          (1)

where \delta_i is called the parameter difference and is selected from the set

    \Delta_i = \Delta_i^- \cup \{0\} \cup \Delta_i^+,                 (2)

where

    \Delta_i^- = \{ \delta_{i,k}^- \mid \delta_{i,k}^- = -b^{k+L_i-1},\ k = 1, 2, \dots, d_i \}   (3)

and

    \Delta_i^+ = \{ \delta_{i,k}^+ \mid \delta_{i,k}^+ = b^{k+L_i-1},\ k = 1, 2, \dots, d_i \}.    (4)

Here,

    d_i = U_i - L_i + 1,                                              (5)
    L_i = \lfloor \log_b \varepsilon \rfloor,                         (6)
    U_i = \lfloor \log_b (\max(x_i) - \min(x_i)) \rfloor,             (7)

for i = 1, 2, ..., D, where D is the dimension of the problem, b is the discretization base, ε is the maximal precision of the computer arithmetic, and the weight w = Random_Integer(1, b-1) is added to enable more flexible movement over the search space. (A short numeric illustration of Eqs. (2)-(7) is given at the end of this section.)

In principle, DASA relies on two distinctive components: the differential graph and the continuous pheromone model. Here, we briefly discuss these two components and outline the main loop of the DASA search process.

First, DASA transforms the D-dimensional optimization problem into a graph-search problem. The corresponding differential graph is a directed acyclic graph obtained by fine discretization of the continuous parameters' offsets. The graph has D layers of vertices, where each layer corresponds to a single parameter. Each vertex corresponds to a parameter-offset value that defines the change from the current parameter value to the parameter value in the next search iteration. Furthermore, each vertex in a given layer is connected with all vertices in the next layer. The set of possible vertices for each parameter depends on the parameter's range, the discretization base, and the maximal precision of the computer arithmetic, which defines the minimal possible offset value. Ants use these parameter offsets to navigate through the search space. At each search iteration, a single ant positioned at a certain layer moves to a specific vertex in the next layer, according to the amount of pheromone deposited in the vertices of that layer.

 1: x^tbest = Rnd_Solution()
 2: y^best = f(x^tbest)
 3: y^tbest = inf
 4: G = Graph_Initialization(b, L, U, ε)
 5: Pheromone_Initialization(G)
 6: while terminating condition is not met do
 7:   k = 0
 8:   for all m ants do
 9:     repeat
10:       p_i = Find_Path(G)  {path of the i-th ant}
11:       k = k + 1
12:       if k > n then
13:         x^tbest = Rnd_Solution()
14:         y^best = f(x^tbest)
15:         Pheromone_Initialization(G)
16:         goto line 7  {a local optimum was found, so the search process is restarted}
17:       end if
18:     until p_i != 0  {not all parameters' offsets are 0}
19:     w = Random_Integer(1, b-1)
20:     x_i = x^tbest + w δ(p_i)
21:   end for  {ants created solutions}
22:   y^cbest = inf
23:   for all m ants do
24:     y = f(x_i)  {function evaluation}
25:     if y < y^cbest then ...

Figure 1: The main search loop of DASA (pseudocode fragment).
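Equations (2)-(7) are simple enough to transcribe directly. The following Python sketch builds the offset set Δ_i for a single parameter; the example range, base, and precision are illustrative assumptions rather than settings taken from the paper.

    # Hedged transcription of Eqs. (2)-(7): the discretized offset set for one
    # parameter with range [x_min, x_max], base b, and arithmetic precision eps.
    import math

    def offset_set(x_min, x_max, b=10, eps=1e-15):
        L = math.floor(math.log(eps, b))             # Eq. (6)
        U = math.floor(math.log(x_max - x_min, b))   # Eq. (7)
        d = U - L + 1                                # Eq. (5)
        plus = [b ** (k + L - 1) for k in range(1, d + 1)]   # Eq. (4)
        minus = [-v for v in plus]                           # Eq. (3)
        return minus + [0] + plus                            # Eq. (2)

    # For a parameter ranging over [-100, 100] with base 10 and eps = 1e-15,
    # the offsets run from +/-1e-15 up to +/-1e+2 (37 vertices per layer).
    deltas = offset_set(-100.0, 100.0)
    print(len(deltas), min(deltas), max(deltas))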
5 Data Mining Analysis

Parameter tuning of an algorithm leads to better performance; however, it is a computationally expensive and problem-dependent task. Considering this, the ideal is to go beyond simple tuning, which delivers a single parameter setting, and to analyze the gathered data in an intelligent way. Such an analysis can extract patterns (regularities) in the explored parameter space that define a specific behavior of DASA. To this end, data mining methods for automated discovery of patterns in data can be used. As data mining methods can only discover patterns that are present in the data, the dataset subject to analysis must be large and informative enough to contain these patterns, i.e., to describe different types of the algorithm's behavior. Related to this, we considered a data mining approach on a representative example, i.e., an error model of the Sphere function.

To begin with, consider the graph in Figure 2, which visualizes the error distribution for the Sphere function. The graph depicts the distribution of the median error values obtained by 5000 parameter settings at four different time points for D = 40. As we are more concerned with the practical significance of the difference between a large and a small error value than with the statistically significant difference between two actual error values, the error values are discretized into nine intervals, each represented by a color chosen according to the error magnitude, ranging from black (error below 10^-8) to white (error above 10^2).

[Figure 2: Median error distributions for the Sphere function in the case of D = 40. The legend bins the median error by magnitude, from above 10^2 down to below 10^-8; the horizontal axis spans 10^3 to 10^6 function evaluations (FEs).]

The graph clearly shows that the sampled settings result in different DASA performance. Evidently, there is a large cluster of parameter settings that solve this function to the target accuracy (error below 10^-8) within the given time budget (10^6 FEs). Moreover, a subset of this cluster solves the function with an order of magnitude fewer FEs. Our aim, therefore, is to find a (common) description of this cluster, in terms of relations among DASA parameters, that represents good DASA behavior (as well as the parameter relations that lead to poor DASA performance).

For this purpose, we formulated the problem as a predictive modeling task, using decision trees to model the function error values in terms of the values of the DASA parameters. Since the function error variables are continuous, the task at hand is a regression task. Furthermore, as our goal is to model the behavior of DASA at all time points simultaneously, the problem at hand is a multi-target regression. To this end, we used Predictive Clustering Trees (PCTs) as implemented in the Clus system [1]. In this case, the median errors at the four time points define the four target attributes considered for modeling (with the PCT) in dependence on the descriptive attributes, i.e., the function dimension and the five DASA parameters. The resulting dataset is composed of 10002 rows, described with the 5001 parameter settings (including the default setting) of DASA applied to the two different dimensions, and 10 columns corresponding to the 10 attributes.

Figure 3 presents a PCT model for the Sphere function error. Each internal node of the tree contains a test on a descriptive attribute, while the leaves store the model predictions for the function error, i.e., a tuple of four values. The predictions are calculated as the mean values of the corresponding error values for the data instances belonging to the particular leaf (represented by a box). In fact, each leaf identifies one cluster of data instances (the size of the cluster is the value in the small box).
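The multi-target setup can also be emulated outside the Clus system. As a rough, hedged stand-in for a PCT, scikit-learn's DecisionTreeRegressor accepts a matrix of targets and fits a single tree over all of them. The data below is synthetic and purely illustrative (it loosely mimics the finding, reported below, that the number of ants dominates); it is not the experimental data from this study.

    # Hedged stand-in for a PCT: one regression tree over four targets.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    n = 1000
    feature_names = ["D", "m", "rho", "s_plus", "s_minus", "b"]
    X = np.column_stack([
        rng.choice([20, 40], n),    # problem dimension D
        rng.uniform(1, 200, n),     # m (number of ants)
        rng.uniform(0, 1, n),       # rho (evaporation factor)
        rng.uniform(0, 1, n),       # s_plus
        rng.uniform(0, 1, n),       # s_minus
        rng.uniform(2, 16, n),      # b (discretization base)
    ])
    # Synthetic targets: error at four time budgets, shrinking with m and
    # shrinking faster at later time points.
    m = X[:, 1]
    y = np.column_stack([np.exp(-m * t / 200.0) + 0.01 * rng.random(n)
                         for t in (1, 2, 3, 4)])

    # Cap the number of leaves for comprehensibility; 13 leaves corresponds
    # to a 25-node binary tree, matching the pruned model discussed below.
    tree = DecisionTreeRegressor(max_leaf_nodes=13, random_state=0).fit(X, y)
    print(dict(zip(feature_names, tree.feature_importances_.round(3))))
    print(tree.predict(X[:1]))  # predicted error at the four time points

On such data the learned tree splits almost exclusively on m, which is the kind of parameter-importance pattern the analysis below extracts from the real tuning results.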
The predictive performance of the model was assessed with 10-fold cross-validation. Note that this particular model was learned on the complete dataset, subject to a constraint on the maximal tree size of 25 nodes. We imposed this constraint because the original model contained 1643 nodes (of which 822 were leaves) and, despite its better predictive performance in both training and testing, it was not comprehensible; aiming for an explanatory model, small and comprehensible, we considered the smaller tree obtained by limiting the size. The predictive performance of both models in terms of Root Relative Mean Squared Error (RRMSE) and Pearson Correlation Coefficient (PCC) is given in Table 3. Note that RRMSE represents the relative error with respect to the performance of the mean predictor, while PCC represents the linear correlation between the data and the model predictions. Good models have RRMSE values close to 0 and PCC values close to 1.

Table 3: Model performance with respect to RRMSE and PCC.

                        RRMSE                      PCC
  Training     1643 nodes   25 nodes     1643 nodes   25 nodes
  25 × D         0.166        0.408        0.986        0.913
  250 × D        0.373        0.679        0.928        0.734
  2500 × D       0.487        0.562        0.873        0.827
  25000 × D      0.472        0.546        0.882        0.837
  Mean           0.396        0.557        0.843        0.689
  Testing
  25 × D         0.219        0.450        0.976        0.893
  250 × D        0.638        0.817        0.777        0.584
  2500 × D       1.019        1.039        0.304        0.246
  25000 × D      1.053        1.055        0.234        0.218
  Mean           0.806        0.875        0.426        0.312

[Figure 3: Predictive clustering tree representation of the error model for the Sphere function.]

The model in Figure 3 outlines 13 clusters of data instances, two of which (depicted with light-gray boxes) represent good DASA performance. According to this model, the number of ants, m, is the most important DASA parameter for performance on the Sphere function. More precisely, if m > 83, DASA solves the 20-dimensional function problem within the given time budget, independently of the values of the other parameters. Moreover, if m < 83, other DASA parameters become important as well. For example, if 43 < m < 83 and s_+ > 0.009 and D = 20, then DASA solves the function with error 3 × 10^-6, while the pattern m < 43 and s_+ < 0.040 and s_- > 0.656 describes poor DASA performance regardless of the function dimension. An interesting fact is that the evaporation factor is not essential for DASA performance on the Sphere function. Moreover, the model also shows that it is more difficult to describe the behavior of DASA for the 40-dimensional function problem than for the 20-dimensional one. Finally, note that the training performance (learned on the complete dataset) of the model, in terms of the error and the correlation coefficient, is best for the first target and gets worse for the other three targets (see Table 3). This is especially evident when we take into account the testing performance of the model, estimated with 10-fold cross-validation. However, the training performance is acceptable in our case, as we are interested in understanding the behavior of DASA and are not aiming to obtain a model for prediction.

6 Conclusion

The principal challenge of meta-heuristic design is providing a default algorithm configuration, in terms of a parameter setting, that performs reasonably well in the general case. However, while it is a good initial choice, the default configuration may yield low-quality solutions on a specific optimization problem. In practice, the algorithm's parameters have to be fine-tuned in order to obtain the best performance for the problem at hand, leading to the computationally expensive task of parameter tuning. So, if the tuning task is unavoidable, the question is: can we use the results of parameter tuning to extract some knowledge about the algorithm's behavior? Related to this, this study focused on the problem of tuning the performance of the Differential Ant-Stigmergy Algorithm (DASA) with respect to its parameters.
As is the case with most meta-heuristic algorithms, the operational behavior of DASA is determined by a set of five control parameters. The existing default setting of the DASA parameters [8] was obtained by experimentation with both real-world and benchmark optimization problems, but not as a result of a systematic evaluation. Furthermore, there is no deeper understanding of the impact of a particular parameter, or of relations among parameters, on the performance of DASA.

In this context, we performed a systematic evaluation of DASA performance obtained by solving the Sphere function optimization problem with 5000 Sobol'-sampled DASA parameter settings for two problem dimensions, 20 and 40. Furthermore, we discussed the idea of learning a model of DASA behavior by data mining analysis of the parameter tuning results. To this end, we formulated the problem as multi-target regression and applied predictive clustering trees to learn a model of DASA behavior with respect to the function error. The obtained model revealed that the parameter denoting the number of ants is the most important parameter for DASA performance on the 20-dimensional function problem. On the other hand, the evaporation factor is not essential for DASA performance on the Sphere function.

Further work will focus on additional experimental evaluation and data mining analysis of data with respect to more complex function problems. This idea can be further extended to building models of DASA behavior that include characteristics of the optimization problem (such as multimodality, separability, and ill-conditioning) as descriptive attributes as well. The latter can provide insights into how to configure DASA with respect to the type of optimization problem. Moreover, these insights can serve as valuable information for improving the design of DASA.

References

[1] H. Blockeel, J. Struyf (2002) Efficient Algorithms for Decision Tree Cross-validation, Journal of Machine Learning Research, vol. 3, pp. 621-650.

[2] C. Blum, A. Roli (2003) Metaheuristics in Combinatorial Optimization: Overview and Conceptual Comparison, ACM Computing Surveys, vol. 35, no. 3, pp. 268-308.

[3] E. Bonabeau, M. Dorigo, G. Theraulaz (1999) Swarm Intelligence: From Natural to Artificial Systems, Oxford University Press.

[4] A. E. Eiben, S. K. Smit (2011) Parameter Tuning for Configuring and Analyzing Evolutionary Algorithms, Swarm and Evolutionary Computation, vol. 1, no. 1, pp. 19-31.

[5] S. Joe, F. Y. Kuo (2008) Constructing Sobol Sequences with Better Two-dimensional Projections, SIAM Journal on Scientific Computing, vol. 30, no. 5, pp. 2635-2654.

[6] P. Korošec (2006) Stigmergy as an Approach to Metaheuristic Optimization, Ph.D. dissertation, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.

[7] P. Korošec, J. Šilc (2008) Using Stigmergy to Solve Numerical Optimization Problems, Computing and Informatics, vol. 27, no. 3, pp. 341-402.

[8] P. Korošec, J. Šilc, B. Filipič (2012) The Differential Ant-stigmergy Algorithm, Information Sciences, vol. 192, no. 1, pp. 82-97.
[9] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery (1992) Numerical Recipes, Cambridge University Press.

[10] I. M. Sobol' (1967) Distribution of Points in a Cube and Approximate Evaluation of Integrals, USSR Computational Mathematics and Mathematical Physics, vol. 7, no. 4, pp. 86-112.

[11] R. Stoean, T. Bartz-Beielstein, M. Preuss, C. Stoean (2009) A Support Vector Machine-Inspired Evolutionary Approach for Parameter Setting in Meta-heuristics, CIOP Technical Report 01/09, Faculty of Computer Science and Engineering Science, Cologne University of Applied Sciences, Germany.

[12] G. Taguchi, T. Yokoyama (1993) Taguchi Methods: Design of Experiments, ASI Press.

[13] E.-G. Talbi (2009) Metaheuristics: From Design to Implementation, John Wiley & Sons.

[14] X.-S. Yang (2008) Nature-Inspired Metaheuristic Algorithms, Luniver Press.