Acta Chim. Slov. 2003, 50, 259-273.

A QSPR STUDY OF BOILING POINT OF SATURATED ALCOHOLS USING GENETIC ALGORITHM

Mohsen Kompany-Zareh
Institute for Advanced Studies in Basic Sciences, Zanjan 45195-159, Iran.

Received 28-10-2002

Abstract
A QSPR method is applied to study the boiling point of alcohols using the following properties: Schultz's index, Randić's connectivity index, Wiener's number, surface area, volume, log P, molar refractivity, and polarizability. These topological and physicochemical descriptors were chosen in order to use realistic molecular quantities which can, in principle, express all of the topological, electronic and geometric properties of molecules and their interactions. The boiling point values for a set of 44 alkanols were used. By using a genetic algorithm (GA) coupled with the partial least squares (PLS) method, all different possible relations between boiling point (bp) and the molecular properties up to the fourth order were examined, and a group of multiple regression models with high fitness scores was generated. Using a backward elimination method on the top descriptors obtained from the genetic algorithm, Randić's index, surface area (grid), the log of the octanol-water partition coefficient, molar refractivity, and polarizability were selected as significant descriptors. The analysis of the computed data, and the test of the model on a validation set of 10 alcohols, show that the selected descriptors and the selected order for each one are well-suited tools for assessing the boiling point of alcohols. In particular, we have verified that using higher-order terms (i.e. square, cubic, and/or quartic) in several-variable equations gives excellent accuracy.

Introduction
In chemistry, anything that can be said about the magnitude of a property and its dependence upon changes in molecular structure depends on the chemist's capability to establish valid relationships between structure and property.
In many physical organic, biochemical and biological areas, it is increasingly necessary to translate those general relations into quantitative associations expressed in useful algebraic equations known as Quantitative Structure-Activity(-Property) Relationships (QSA(P)R). To obtain a significant correlation, it is crucial that appropriate descriptors be employed, whether they are theoretical, empirical or derived from readily available experimental features of the structures. Many descriptors reflect simple molecular properties and thus can provide meaningful insight into the physicochemical nature of the activity or property under consideration. In a relatively recent paper Castro et al. applied three well-known topological indices in the QSPR study of the boiling point of saturated alcohols: the Schultz index, the Wiener number, and a connectivity index of Randić. They analyzed several polynomial correlations between the boiling points and the three topological indices, and found satisfactory agreement between the theoretical and experimental results.

Table 1. Features used in the QSPR analysis of the data set.

Symbol   Feature
MTI      Schultz index [references 4-6] (a)
χv       Randić's valence connectivity index [references 9 and 10] (a)
W        Wiener number [references 7 and 8] (a)
SAG      Surface area (grid) [references 11 and 12] (b)
V        Volume (b)
Log P    Log of octanol-water partition coefficient [references 13 and 14] (b)
MR       Molar refractivity [references 14 and 15] (b)
POL      Polarizability [reference 16] (b)

(a) Topological descriptor. (b) Molecular descriptor.

Table 2. Experimental and calculated boiling points for the training and validation sets by equations 4 and 10.

No.  Alkanol                       bp(°C)   bp(°C)     Error    bp(°C)     Error
                                   obsd     calcd (a)  % (a)    calcd (b)  % (b)
Training set
1    methanol                      64.7     63.61      -1.68    65.88       1.82
2    ethanol                       78.3     81.88       4.58    81.67       4.30
3    1-propanol                    97.2     98.93       1.78    97.98       0.80
4    2-propanol                    82.3     83.13       1.01    86.81       5.49
5    1-butanol                     117.7    116.28     -1.20    113.42     -3.64
6    2-methyl-1-propanol           107.9    104.90     -2.78    104.53     -3.12
7    2-methyl-2-propanol           82.4     79.49      -3.53    80.59      -2.20
8    1-pentanol                    137.8    134.25     -2.58    134.09     -2.69
9    3-pentanol                    115.3    122.89      6.58    122.98      6.66
10   2-methyl-1-butanol            128.7    126.15     -1.98    125.35     -2.60
11   3-methyl-1-butanol            131.2    125.20     -4.58    124.21     -5.33
12   3-methyl-2-butanol            111.5    112.67      1.05    111.76      0.23
13   2,2-dimethyl-1-propanol       113.1    111.19     -1.69    109.35     -3.31
14   1-hexanol                     157.0    155.76     -0.79    156.92     -0.05
15   2-hexanol                     139.9    140.51      0.43    140.42      0.37
16   2-methyl-1-pentanol           148.0    146.52     -1.00    146.23     -1.19
17   4-methyl-1-pentanol           151.8    146.64     -3.40    145.50     -4.15
18   2-methyl-2-pentanol           121.4    125.67      3.52    124.23      2.33
19   4-methyl-2-pentanol           131.7    133.22      1.16    131.98      0.21
20   2-methyl-3-pentanol           126.5    131.24      3.75    131.45      3.91
21   3-methyl-3-pentanol           122.4    127.08      3.82    127.28      3.99
22   2-ethyl-1-butanol             146.5    148.55      1.40    149.16      1.81
23   2,2-dimethyl-1-butanol        136.8    136.68     -0.09    134.80     -1.46
24   2,3-dimethyl-1-butanol        149.0    141.34     -5.14    140.86     -5.46
25   3,3-dimethyl-1-butanol        143.0    144.98      1.38    143.19      0.13
26   3,3-dimethyl-2-butanol        120.0    123.13      2.61    121.03      0.86
27   1-heptanol                    176.3    174.83     -0.84    175.47     -0.47
28   4-heptanol                    155.0    156.98      1.28    158.06      1.98
29   2-methyl-2-hexanol            142.5    145.42      2.05    142.92      0.29
30   3-methyl-3-hexanol            142.4    144.34      1.36    146.16      2.64
31   3-ethyl-3-pentanol            142.5    144.12      1.14    149.63      5.00
32   2,2-dimethyl-3-pentanol       136.0    138.40      1.76    138.96      2.18
33   2,4-dimethyl-3-pentanol       138.8    137.81     -0.71    139.76      0.69
34   2-octanol                     179.8    179.32     -0.26    179.76     -0.02
35   2-ethyl-1-hexanol             184.6    186.18      0.86    192.21      4.12
36   2,2,3-trimethyl-3-pentanol    152.2    143.08     -5.99    143.37     -5.80
37   1-nonanol                     213.1    214.39      0.60    216.34      1.52
38   2-nonanol                     198.5    196.10     -1.21    194.88     -1.82
39   4-nonanol                     193.0    190.33     -1.38    191.50     -0.78
40   5-nonanol                     195.1    190.84     -2.18    191.03     -2.09
41   7-methyl-1-octanol            206.0    206.48      0.23    205.82     -0.09
42   2,6-dimethyl-4-heptanol       178.0    179.56      0.88    178.48      0.27
43   3,5,5-trimethyl-1-hexanol     193.0    196.31      1.71    192.87     -0.07
44   1-decanol                     230.2    232.73      1.10    230.22      0.01
Validation set
45   2-pentanol                    119.0    121.35      1.97    121.05      1.72
46   2-methyl-2-butanol            102.0    105.51      3.44    104.36      2.32
47   3-hexanol                     135.4    140.32      3.63    141.25      4.32
48   3-methyl-1-pentanol           152.4    148.85     -2.33    148.75     -2.39
49   3-methyl-2-pentanol           134.2    133.24     -0.72    133.36     -0.62
50   2,3-dimethyl-2-butanol        118.6    118.78      0.15    117.21     -1.17
51   3-heptanol                    156.8    156.70     -0.07    158.07      0.81
52   2,3-dimethyl-3-pentanol       139.0    137.34     -1.19    139.52      0.38
53   1-octanol                     195.2    196.17      0.50    199.23      2.06
54   3-nonanol                     194.7    189.46     -2.69    191.55     -1.62

(a) The values of bp were calculated using equation 4. (b) The values of bp were calculated using equation 10.

However, it also seems necessary to use descriptors based on molecular structure in a correlation study of a property like boiling point, because they make it possible to take into account the electronic and geometric properties of molecules and their corresponding interactions, in addition to topological properties.
Accordingly, in addition to the three topological properties reported in the previous study, we have chosen the following five molecular properties as molecular descriptors: surface area (grid) (SAG), volume (V), log P (the log of the octanol-water partition coefficient, which is a measure of hydrophobicity), molar refractivity (MR), and polarizability (POL). In the next step, the second- to fourth-order terms of each of the eight main descriptors were generated. The 32 (= 4 × 8) subfeatures obtained in this way, i.e. the first to fourth powers of the eight descriptors, were used as the initial data set from which, by a GA procedure, the crucial descriptors for a proper quantitative structure-property relationship (QSPR) analysis were selected.

Calculation of physicochemical properties: Calculation of the Schultz index [4-6] is explained by Nikolić et al. [6], the Wiener number by Hosoya [8], and the valence connectivity index by Randić and also by Kier and Hall [9,10]. The grid calculation of surface area is rather accurate for a given set of atomic radii, and was described by Bodor et al. using the atomic radii of Gavezzotti. The volume calculation is very similar to the SAG calculation and employs a grid method described by Bodor. Log P is calculated using atomic parameters first derived by Ghose et al. [13] and extended later by Ghose and coworkers [14]. The molar refractivity is estimated by the same method used for log P: Ghose and Crippen presented atomic contributions to the refractivity in exactly the same way as for the hydrophobicity [14,15]. The polarizability is estimated from an additivity scheme given by Miller, in which different increments are associated with different atom types.

Methods
QSPRs based on genetic algorithms: Recently, several published papers have suggested that the genetic algorithm (GA) is useful in data analysis, especially for reducing the number of features in regression models. Rogers and Hopfinger [17] first applied this method in QSA(P)R analysis and showed the GA to be a very effective tool with merits that other methods lack. Compared with traditional methods, QSPRs based on GAs find a group of reliable QSPR models from a large number of sample polynomials. Moreover, from the analysis of the variables used in the evolution procedure, one may obtain the crucial physicochemical properties related to the property under study.

Figure 1. Number of models including each of the features in the elite population, obtained from running the genetic algorithm for the 4th time (a), 2nd time (b), and 8th time (c), which result in equations 4, 2, and 8 in Table 3. [Three panels (a)-(c); vertical scale 0 to 200, horizontal axis: feature number.]

The QSPR in this study was based on a GA coupled with a partial least squares (PLS) procedure, which was obtained from the PLS-Toolbox for MATLAB. One advantage of using PLS instead of multiple linear regression (MLR) alongside the GA is that the number of selected variables may exceed the number of samples, which is important when only a small number of samples is available for modeling. The second advantage is that models can be prepared free of the errors that arise from high degrees of collinearity between variables, as will be discussed further in the next section. The brief basic steps of the module are as follows:

Creation of the Initial Population. According to the genetic algorithm, an individual is represented as a linear string of randomly chosen subfeatures, which plays the role of the DNA of the individual. The initial population is generated by randomly selecting some number of subfeatures from the data set.
Then these individuals are scored according to their fitness. An elite population is used to retain the best distinct individuals.

a. Crossover Operation. Once all the models in the population have been rated using the fitness scores, the crossover operation is performed repeatedly. In this operation, two good models are probabilistically selected as "parents", with the likelihood of being chosen proportional to a model's fitness score; a pair of children is produced by dividing both parents at a randomly chosen point and then joining the pieces together.

b. Mutation Operation. After the crossover operation, the mutation operation may randomly alter all individuals in the new population, and the fitness of each new model is determined.

c. Comparison Operation. After the crossover and mutation operations, the newly created population and the elite population are compared. If some individuals in the newly created population are better than some individuals in the elite population, these better individuals are copied to the elite population.

When the total fitness of the elite population can no longer be improved and about 80% of the individuals contain the same subfeatures, "convergence" is achieved. Upon completion, the models with the highest fitness scores are taken from the elite population. For a population of 200 models, 20-50 operations are enough when the data set contains 32 subfeatures. This process takes about 2 min on a PC (Pentium 200).

Reliability of the Models obtained from GA. Most of the models in the elite population had similar fitness scores after convergence. In this study, the fitness function was defined as the multiple linear regression coefficient (r).
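The GA steps described above (fitness-proportional choice of parents, one-point crossover, mutation, and an elite population of distinct models) can be sketched in Python as follows. This is only a minimal illustration and not the PLS-Toolbox routine used in the study: the function name `run_ga`, its parameters, the duplicate-repair step, and the toy fitness function are assumptions made for the example, and a fixed number of operations stands in for the convergence test. In the study itself the fitness would be the r value of a fitted regression model.

```python
import random

def run_ga(n_subfeatures, fitness, n_terms=7, pop_size=200, n_ops=50,
           p_mut=0.05, seed=0):
    """Toy GA for subfeature selection (hypothetical helper, not the paper's code).

    Returns the elite population as (model, fitness) pairs, best first.
    `fitness` must return a positive number for any tuple of subfeature indices.
    """
    rng = random.Random(seed)

    def repair(genes):
        # Assumption: a model never repeats a subfeature, so duplicates are redrawn.
        seen, out = set(), []
        for g in genes:
            while g in seen:
                g = rng.randrange(n_subfeatures)
            seen.add(g)
            out.append(g)
        return tuple(out)

    # Initial population: linear strings of randomly chosen subfeatures.
    population = [tuple(rng.sample(range(n_subfeatures), n_terms))
                  for _ in range(pop_size)]
    elite = {}  # frozenset of subfeatures -> fitness (distinct models only)

    def update_elite(models):
        # Comparison operation: keep the pop_size best distinct models seen so far.
        for m in models:
            elite[frozenset(m)] = fitness(m)
        best = sorted(elite.items(), key=lambda kv: kv[1], reverse=True)[:pop_size]
        elite.clear()
        elite.update(best)

    update_elite(population)
    for _ in range(n_ops):
        weights = [fitness(m) for m in population]  # selection proportional to fitness
        children = []
        while len(children) < pop_size:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, n_terms)          # one-point crossover
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                mutated = tuple(rng.randrange(n_subfeatures)
                                if rng.random() < p_mut else g for g in child)
                children.append(repair(mutated))
        population = children[:pop_size]
        update_elite(population)
    return sorted(elite.items(), key=lambda kv: kv[1], reverse=True)

# Toy fitness (assumption): reward models that hit a made-up set of "informative" subfeatures.
INFORMATIVE = {1, 4, 9, 12, 17, 20, 28}

def toy_fitness(model):
    return (1 + len(INFORMATIVE & set(model))) / (1 + len(INFORMATIVE))

elite = run_ga(32, toy_fitness, pop_size=40, n_ops=30)
best_model, best_score = elite[0]
```

Counting how often each subfeature index occurs across the returned elite models gives the kind of frequency profile that Figure 1 reports.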
The reliability of the models was mainly tested with the F-values of their coefficients, as will be discussed in the next part.

Results and Discussion
Construction of the polynomial QSPR models. The training and validation data sets contained 44 and 10 compounds (Table 2), respectively, and 8 main topological and molecular descriptors. The abbreviations for these descriptors are given in Table 1. For this data set, populations of 200 individuals were used. The genetic operators were applied until the total fitness score of the elite population no longer improved significantly and 70% of the individuals included similar subfeatures. The convergence criterion was met after fewer than 50 operations. After repeating the GA calculations nine times, nine top multiple linear regression models with seven to ten terms were obtained; they are listed in Table 3 as equations 1 to 9. Because a model cannot be properly evaluated by its multiple linear regression coefficient alone, the quality of the models was also tested statistically by the standard error of mean (SD) and the overall F statistic for multiple linear regression modeling. The values for the 15 top subfeatures obtained from the nine top models in Table 3 are listed in Table 4. Generally, for analysis by MLR, the data must be reduced to fewer and less correlated variables. Cross-correlated descriptors would mislead the QSPR model in uncovering the actual relationship between the property and these descriptors. The correlations among these subfeatures in the top 9 models are listed in Table 5, and many equations in Table 3 proved to contain descriptors that were highly cross-correlated. Judged by the significance of the F-value (at the 95% confidence level) for each of the coefficients in the polynomial model, all nine equations from the GA were unsatisfactory.
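A coefficient-level F test of this kind, and a backward elimination built on it, can be illustrated with a short pure-Python sketch. It is an illustration under stated assumptions rather than the procedure actually run in the study: it fits ordinary least squares instead of PLS, uses the identity F = t² for a single coefficient, takes the critical F value as a parameter, and runs on synthetic data. The names `invert`, `coefficient_f_values` and `backward_elimination` are hypothetical.

```python
def invert(a):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]

def coefficient_f_values(X, y):
    """OLS with intercept. Returns (coefficients, F-values); index 0 is the intercept."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)        # residual variance
    # F for one coefficient = t^2 = beta^2 / (s2 * [(X'X)^-1]_ii)
    f_values = [beta[i] ** 2 / (s2 * inv[i][i]) for i in range(k)]
    return beta, f_values

def backward_elimination(X, y, names, f_crit):
    """Drop the least significant term until every remaining F is >= f_crit."""
    cols = list(range(len(names)))
    while cols:
        _, f = coefficient_f_values([[r[c] for c in cols] for r in X], y)
        worst = min(range(len(cols)), key=lambda i: f[i + 1])  # skip the intercept
        if f[worst + 1] >= f_crit:
            break
        cols.pop(worst)
    return [names[c] for c in cols]

# Synthetic two-level design (assumption): y depends on x1 and x2 but not on x3.
X = [[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
y = [5 + 3 * a - 2 * b + 0.1 * a * b * c for a, b, c in X]
kept = backward_elimination(X, y, ["x1", "x2", "x3"], f_crit=4.35)
```

In this synthetic design the irrelevant column `x3` is removed in the first pass and the two real predictors are retained. With real descriptors, which are strongly collinear, the order of removal can matter.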
The results for the coefficients in equations 4 and 6, i.e., the F values at the 0.95 confidence level, are listed in Table 6.

To reduce the models to a unique and satisfactory polynomial by deleting the less important of the correlated variables, a backward elimination procedure was applied to a polynomial containing all of the top 15 subfeatures from Tables 3 and 4. At each step the procedure was based on the significance of the F-value (at the 95% confidence level) for each of the coefficients in the polynomial model. In this way equation 10 was obtained as the most suitable polynomial QSPR model, with all coefficients statistically significant. The predicted boiling point values for all 54 alkanols (44 from the training set and 10 from the validation set) using equation 10 are listed in Table 2. The results for the validation set are also shown in Figure 2.

Figure 2. Comparison of actual boiling points with those calculated from equation 10 for the validation set. Slope = 0.98 ± 0.08, intercept = 2.93 ± 11.65 (at 95% confidence limits), r = 0.9952. [Scatter plot; horizontal axis: actual boiling point, 90 to 210.]

Principal Features Determined. Figure 1 shows the number of models including each of the subfeatures in the elite population after convergence for different runs of the GA. As illustrated, the frequencies with which subfeatures appear in the models of the final elite populations were quite different from those at the beginning, when all subfeatures appeared with almost equal frequency. After running the GA nine times and preparing models based on the most frequent subfeatures of each run, as shown in Table 3 and Figure 1, Table 4 was obtained. It lists the top 15 subfeatures and their values, which account for nearly all the features in the top 9 QSPR models in Table 3. These top 15 subfeatures represent all eight features (MTI, χv, W, SAG, V, log P, MR, and POL) as important factors affecting the boiling point. The frequencies of the other 17 subfeatures were very low in the elite population, which shows that these 17 forms of the 8 main features are not the effective forms.

In the last step, following a backward elimination procedure, the significance of each of the fifteen selected variables was tested using the F-values of their coefficients in the multiple regression model, and a QSPR equation was obtained in which the F-values calculated for all of the coefficients were significant. The final model is:

\[ \mathrm{bp} = 172.87 + 49.23\,\chi^{v} - 0.40\,\mathrm{SAG} + 57.84\,\log P - 18.08\,\mathrm{POL} + 1.30\,\mathrm{MR}^{2} - 7.66\,\mathrm{POL}^{2} - 0.119\,(\chi^{v})^{4} \qquad (10) \]

(n = 44, r = 0.9945, F = 464.92, SD = 4.26)

Statistical results for this final model, compared with some of the polynomials obtained from the GA, are given in Table 6 and illustrate the significance of the final model. Since log P is a measure of the lipophilicity of a compound, its positive coefficient indicates that more lipophilic alcohols have higher boiling points. The large positive coefficient of Randić's connectivity index χv in equation 10 is similar to that in the previous study by Nikolić et al. [6]. Equation 10 also suggests that MR, the molar refractivity of the molecule, is a necessary contributor to the boiling point. The positive sign of the coefficient of this term indicates that the molecular volume and polarizability of the molecules are vital to the boiling point, in addition to their topology. Polarizability (POL) was identified as an effective variable for the boiling point, but with a negative coefficient.
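As a worked illustration, equation 10 can be transcribed directly into a small Python function, together with the percent error reported in Table 2. The function and argument names are hypothetical. The descriptor values would have to be computed in the same way and in the same units as in the study, and they are not tabulated here, so no prediction for a real alcohol is attempted. The error formula 100 × (calcd - obsd) / obsd is an assumption inferred from the tabulated values for methanol.

```python
def bp_equation_10(chi_v, sag, log_p, pol, mr):
    """Boiling point from equation 10; arguments are the five selected descriptors."""
    return (172.87
            + 49.23 * chi_v
            - 0.40 * sag
            + 57.84 * log_p
            - 18.08 * pol
            + 1.30 * mr ** 2
            - 7.66 * pol ** 2
            - 0.119 * chi_v ** 4)

def percent_error(calcd, obsd):
    """Relative error in percent, assumed to be 100 * (calcd - obsd) / obsd."""
    return 100.0 * (calcd - obsd) / obsd

# Methanol row of Table 2: obsd 64.7, calcd 63.61 (equation 4) and 65.88 (equation 10).
err_eq4 = round(percent_error(63.61, 64.7), 2)
err_eq10 = round(percent_error(65.88, 64.7), 2)
```

The two rounded errors reproduce the -1.68 and 1.82 listed for methanol in Table 2, which supports the assumed error definition for that row.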
Table 6. The 95% confidence limits and F statistics for the coefficients of the variables in equations 4, 6, and 10.

Variable   Coeff         95% Conf       t-statistic   F        Significance
Equation 4 (a)
MTI        -0.07         ±0.22          -0.65         0.43     NS
χv         +116.93       ±44.32         +5.37         28.8     S
SAG        -1.99         ±1.38          -2.94         8.65     S
log P      +61.13        ±14.05         +8.85         78.30    S
POL        -23.84        ±12.31         -3.94         15.52    S
(χv)^2     -17.142       ±8.99          -3.88         15.04    S
SAG^2      +0.0031       ±0.0027        +2.3467       15.46    S
MR^2       +1.400        ±0.317         +8.995        80.91    S
POL^2      -7.878        ±1.641         -9.768        95.41    S
SAG^4      -1.25×10^-9   ±6.41×10^-9    -0.398        0.16     NS
Equation 6 (b)
χv         +53.81        ±10.30         +10.60        112.40   S
log P      +58.27        ±12.72         +9.30         86.52    S
POL        -16.17        ±7.21          -4.55         20.71    S
SAG^2      -0.0030       ±0.0020        -2.9785       3.89     NS
MR^2       +1.329        ±0.215         +12.550       157.49   S
POL^2      -7.845        ±1.302         -12.233       149.65   S
SAG^3      +5.2×10^-6    ±4.4×10^-6     2.4           149.96   S
(χv)^4     -0.2304       ±0.1240        -3.7704       150.05   S
Equation 10 (c)
χv         +49.23        ±9.78          +10.20        104.13   S
SAG        -0.40         ±0.19          -4.29         18.44    S
log P      +57.84        ±13.28         +8.83         77.99    S
POL        -18.07        ±6.38          -5.75         33.04    S
MR^2       +1.301        ±0.224         +11.795       139.13   S
POL^2      -7.665        ±1.363         -11.403       130.02   S
(χv)^4     -0.1188       ±0.0623        -3.8686       14.97    S

S: significant; NS: not significant.
(a) Compared to one-tailed F(0.05; 1, 33) = 4.35. (b) Compared to one-tailed F(0.05; 1, 35) = 4.35. (c) Compared to one-tailed F(0.05; 1, 36) = 4.35.

Overall, the resulting equation shows that the boiling point can be satisfactorily explained by one topological and four molecular descriptors. From the GA results in Table 4, the parameters MTI, W, and V^3 also seem important to the boiling point. However, the correlation studies listed in Table 5 showed that they are not independent features. MTI and W were highly cross-correlated with (χv)^4, with correlation coefficients of more than 0.98, and V was highly correlated with MR, with a correlation coefficient of 0.98.
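A cross-correlation screen of this kind can be sketched in a few lines of Python. The example is illustrative only: the helper names `pearson_r` and `correlated_pairs`, the threshold, and the descriptor values are made up, and the data are not those of Table 5. It shows how a descriptor and its own higher power can be flagged as strongly correlated.

```python
from math import sqrt

def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlated_pairs(columns, threshold=0.98):
    """Return (name1, name2, r) for every pair of columns with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            r = pearson_r(columns[p], columns[q])
            if abs(r) >= threshold:
                flagged.append((p, q, r))
    return flagged

# Hypothetical descriptor x, its square, and an unrelated alternating column z.
x = [float(v) for v in range(1, 11)]
columns = {
    "x": x,
    "x^2": [v ** 2 for v in x],
    "z": [1.0 if i % 2 == 0 else -1.0 for i in range(10)],
}
flagged = correlated_pairs(columns, threshold=0.95)
```

Only the pair (`x`, `x^2`) is flagged here. In a regression, one member of such a pair carries almost no information beyond the other, which is why highly cross-correlated subfeatures are candidates for removal.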
That is to say, the changes in the values of MTI, W, and V^3 were mainly caused by the changes in (χv)^4 and MR^2. Compared with the seven subfeatures in equation 10, the other subfeatures with high frequencies in the elite populations obtained from the GA contributed little to the value of the boiling point. Adding these descriptors to equation 10 not only gives no improvement in the r value, but would also decrease the F value of the regression from that of the final model (F = 464.92). From the correlation study it was found that a higher-order SAG subfeature was highly cross-correlated with SAG, and it was not selected in the final polynomial in spite of its high frequency in the elite populations from the GA.

Conclusion
In this study we attempted to correlate the boiling points of 54 alkanols with topological and molecular properties. Using a GA, polynomial regression models were constructed. These models were tested for statistical significance, and from the final statistically significant QSPR polynomial (equation 10), five principal features relevant to the boiling point of alcohols were obtained: χv, SAG, log P, MR, and POL. Considering SAG, log P, MR, and POL as the molecular descriptors in the final model, it can be concluded that molecular effects such as surface area, lipophilicity, molecular volume and polarizability influence the boiling point of alkanols in addition to their topology.

References
1. L. H. Hall, L. B. Kier, Rev. Comp. Chem. 1991, 2, 367-422.
2. M. Karelson, V. S. Lobanov, A. R. Katritzky, Chem. Rev. 1996, 96, 1027-1043.
3. E. A. Castro, M. Tueros, http://preprint.chemweb.com/cps/physchem/0110012.
4. H. P. Schultz, J. Chem. Inf. Comput. Sci. 1989, 29, 227-228.
5. R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, Weinheim, 2000, p. 381.
6. S. Nikolić, N. Trinajstić, Z. Mihalić, J. Math. Chem. 1993, 12, 251-264.
7. H. Wiener, J. Am. Chem. Soc. 1947, 69, 17-20.
8. H. Hosoya, Bull. Chem. Soc. Japan 1971, 44, 2332-2339.
9. M. Randić, J. Am. Chem. Soc. 1975, 97, 6609-6615.
10. L. B. Kier, L. H. Hall, J. Pharm. Sci. 1976, 65, 1806-1809.
11. N. Bodor, Z. Gabanyi, C. Wong, J. Am. Chem. Soc. 1989, 111, 3783-3786.
12. A. Gavezzotti, J. Am. Chem. Soc. 1990, 112, 6127-6129.
13. A. K. Ghose, P. Pritchett, G. Crippen, J. Comput. Chem. 1988, 9, 80-90.
14. V. N. Viswanadhan, A. K. Ghose, G. Revankar, R. K. Robins, J. Chem. Inf. Comput. Sci. 1989, 29, 163-172.
15. A. K. Ghose, G. M. Crippen, J. Chem. Inf. Comput. Sci. 1987, 27, 21-35.
16. K. J. Miller, J. Am. Chem. Soc. 1990, 112, 8543-8551.
17. D. Rogers, A. J. Hopfinger, J. Chem. Inf. Comput. Sci. 1994, 34, 854-866.
18. R. Leardi, R. Boggia, M. Terrile, J. Chemom. 1992, 6, 267-281.
19. R. Leardi, J. Chemom. 1994, 8, 65-79.
20. T. Hou, X. Xu, Chemom. Intell. Lab. Syst. 2001, 56, 123-132.
21. B. M. Smith, P. J. Gemperline, Anal. Chim. Acta 2000, 423, 167-177.
22. R. Leardi, M. B. Seasholtz, R. J. Pell, Anal. Chim. Acta 2002, 461, 189-200.
23. S. Agatonović-Kuštrin, I. G. Tucker, M. Zečević, L. J. Živanović, Anal. Chim. Acta 2000, 418, 181-195.
24. B. M. Wise, N. B. Gallagher, PLS-Toolbox, ver. 2.0, Eigenvector Research, Inc., Natick, MA, 1995.
25. D. L. Massart, B. G. M. Vandeginste, L. M. C. Buydens, S. De Jong, P. J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997, p. 280.

Povzetek (Summary)
In a QSPR study of the boiling points of alcohols, the Schultz index, the Randić index, the Wiener number, the molecular surface area, the molecular volume, log P, the molar refractivity and the polarizability were used. For the boiling points of 44 alcohols, all possible relations between the boiling points and the molecular properties up to the fourth power were examined by means of a genetic algorithm coupled with the PLS method. A group of models based on multiple regression, with a high degree of agreement with the boiling points, was generated. Using backward elimination, the Randić index, the molecular surface area, log P, the molar refractivity and the polarizability were selected as significant descriptors. The selected descriptors and their appropriate order estimate the boiling points of alcohols very well, especially when higher (second, third and/or fourth) powers are used.