Scientific paper

Ranking of QSAR Models to Predict Minimal Inhibitory Concentrations Toward Mycobacterium tuberculosis for a Set of Fluoroquinolones

Marjan Vračko,1* Nikola Minovski1 and Károly Héberger2

1 Kemijski institut/National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
2 Chemical Research Center, Hungarian Academy of Sciences, H-1025 Budapest, Pusztaszeri ut 59/67, Hungary

* Corresponding author: E-mail: marjan.vracko@ki.si
Fax: +386 1 4762 300, Phone: +386 1 4762 315

Received: 31-05-2010

This paper is dedicated to Professor Milan Randić on the occasion of his 80th birthday

Abstract

The CP-ANN technique was used to build 54 different QSAR models. The models were built for three sets (assays) of fluoroquinolones, considering their antituberculosis activity and using different technical parameters (dimension of the network and number of learning epochs). The models served as a reliable basis for ranking by a new, powerful method based on the sum of ranking differences (SRD). With the applied SRD procedure the optimal models can be identified. The best model can be selected easily for the first assay, two models can be recommended for the second assay, and no model could be recommended for assay 3.

Keywords: QSAR, modeling, ranking, method comparison, neural networks, prediction of antituberculosis activity

1. Introduction

In the age of antibiotics, tuberculosis (TBC) still represents a global health problem. It threatens the developing countries, where 95% of cases and 98% of deaths occur, and also the developed world, mostly due to its fatal combination with HIV infections.1 The disease is mainly caused by the microorganism Mycobacterium tuberculosis or, in some cases, by Mycobacterium fortuitum, Mycobacterium smegmatis, or the Mycobacterium avium-intracellulare complex.
Chemotherapy plays the major role in the combat of TBC.2 Because of the drug resistance developed by the microorganisms, the search for new drugs represents an important task in this combat. Some fluoroquinolones seem to be promising candidates.3 In this report we present an analysis of quantitative structure-activity relationship (QSAR) models, which were developed to estimate the activity of fluoroquinolones against Mycobacterium tuberculosis. The details of the data sets used to build the models are given in Section 3. As the modeling technique we selected the counter-propagation artificial neural network (CP-ANN), which is briefly described in Section 2.4-6 During the development of a QSAR model many models are constructed, and at the end the crucial question arises: how should the best final model be selected? In this report we applied a novel procedure, the sum of ranking differences (SRD) method, to rank the models. The ranking was validated by comparison of ranks by random numbers (a kind of simulation test).7,8 The aim of this study was to find the models that can be used for further application, i.e., the prediction of activity for new (untested) compounds.

2. Methods

2.1. Modeling Algorithm

CP-ANN is often used in QSAR applications. Its architecture and learning strategy are described in many textbooks and articles; therefore, we provide only a brief description here.4-6 The CP-ANN is a generalization of the self-organizing map (SOM), which is a mapping from the multidimensional descriptor space onto a two-dimensional map of neurons. It is an iterative procedure taking many learning epochs. The dimension of the network and the number of learning epochs must be optimized to obtain the optimal performance of the model.
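The learning scheme sketched above can be illustrated with a minimal CP-ANN in Python. This is written for this text as an illustration only; the class name, weight initialization, and the learning-rate and neighborhood schedules are our assumptions, not the authors' implementation:

```python
import numpy as np

class CPANN:
    """Minimal counter-propagation ANN: a Kohonen (input) layer trained on
    descriptors plus an output layer trained on the property (e.g. MIC)."""

    def __init__(self, dim, n_descriptors, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim                                        # dim x dim map
        self.kohonen = rng.random((dim, dim, n_descriptors))  # input weights
        self.output = np.zeros((dim, dim))                    # property weights

    def winner(self, x):
        # neuron whose input weights are closest to the descriptor vector
        d = np.linalg.norm(self.kohonen - x, axis=2)
        return np.unravel_index(np.argmin(d), d.shape)

    def fit(self, X, y, epochs=400):
        ii, jj = np.indices((self.dim, self.dim))
        for epoch in range(epochs):
            frac = 1.0 - epoch / epochs
            eta = 0.5 * frac                            # shrinking learning rate
            radius = max(1.0, 0.5 * self.dim * frac)    # shrinking neighborhood
            for x, t in zip(X, y):
                wi, wj = self.winner(x)
                dist = np.sqrt((ii - wi) ** 2 + (jj - wj) ** 2)
                h = np.exp(-(dist / radius) ** 2)       # neighborhood function
                # the Kohonen layer follows the descriptors ...
                self.kohonen += eta * h[..., None] * (x - self.kohonen)
                # ... and the output layer follows the property value
                self.output += eta * h * (t - self.output)

    def predict(self, x):
        # locate the winning neuron from descriptors only, read the property
        return self.output[self.winner(x)]
```

Model selection as described in Section 3 then amounts to scanning the two technical parameters, `dim` and `epochs`.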
In our CP-ANN procedure the SOM technique was applied for the division of the data into a training set and a test set.9,10 In this procedure the entire map is divided into several sub-parcels; from each sub-parcel some objects are randomly selected into the training set, while the remaining objects go to the test set. This ensures that both sets cover the entire information space equally.11-13 It is important to emphasize that in SOM only the independent variables (molecular descriptors) participate in the training. The CP-ANN is a generalization of SOM in the sense that it additionally includes the property variables (activity values) in the training procedure.

2.2. Sum of Ranking Differences

The method has recently been introduced in the field of chemistry.7 The ranking of models is carried out in the following way. First, the data are arranged in a matrix: the chemical compounds are enumerated in the rows, whereas the models to be compared are arranged in the columns. Each matrix element contains a biological activity value (minimal inhibitory concentration, MIC). The original measured data were also included among the 'models to be compared' for checking purposes. The row averages (consensus, ensemble averaging)13-15 were selected as reference values, and each (individual) model is ranked and compared to these known reference values. Second, the absolute values of the differences between the reference and individual rankings are calculated and summed for each model and, additionally, for the experimental values. In this way the sum of ranking differences (SRD) is calculated for each model and for the experimental values. The closer the SRD value is to zero (i.e., the closer the ranking is to the reference ranking), the better the model. The proximity of SRD values shows that the models predict the ordering of MIC values similarly. Groupings of models can also be observed, whereas their distance shows dissimilarity from the ordering point of view, i.e.
SRD can be considered as a dissimilarity measure (the smaller its value, the more similar the ranking is to the reference).7,8

2.3. Comparison of Ranks by Random Numbers

Validation of the SRD procedure can easily be carried out by computing the theoretical distribution of the (random) SRD values for the given number of compounds, using simulated random numbers: simulated "models" are formed with the same number of compounds, and the SRD procedure is carried out for them many times. When the number of simulated random "models" is large (above a hundred thousand), the results characterize the theoretical SRD distribution function with very small error. For a small number of compounds (n < 14) the theoretical discrete distribution has been calculated exactly with a recursive algorithm; for a larger number of compounds (n > 13) the normal distribution was applied to approximate the theoretical (random) SRD distribution function.8

3. Data and Models

The data, which consist of chemical structures and inhibitory activities against Mycobacterium tuberculosis, were collected from the Internet. The activity is expressed as the minimal inhibitory concentration (MIC, µg/mL). Three data sets (assays) were collected applying different search criteria. Assay 1 consists of 66 compounds and was collected using the search criterion "fluoroquinolones" in the NIAID database.16 Assay 2 consists of 145 fluoroquinolones selected from NIAID using the criterion "antituberculous agent". As several structures have the same MIC values, we eliminated some structures, obtaining a reduced set of 66 compounds (assay 3). In the next step the molecular descriptors were calculated with the DRAGON and CODESSA software.17,18 All descriptors were imported into CODESSA, where the heuristic option was applied to search for the best ones.
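The comparison of ranks by random numbers of Section 2.3 can be sketched as a short simulation. This is an illustration written for this text, not the authors' code; the compound count and number of trials are arbitrary choices:

```python
import numpy as np

def srd(values, reference):
    """Sum of absolute rank differences between a model and the reference
    (ties are ignored for simplicity)."""
    r_val = np.argsort(np.argsort(values))      # 0-based ranks
    r_ref = np.argsort(np.argsort(reference))
    return int(np.abs(r_val - r_ref).sum())

def random_srd_distribution(n_compounds, n_trials=100_000, seed=0):
    """SRD values of random rankings, approximating the null distribution."""
    rng = np.random.default_rng(seed)
    ref = np.arange(n_compounds)
    return np.array([srd(rng.permutation(n_compounds), ref)
                     for _ in range(n_trials)])

# percentiles playing the role of XX1, Q1, Med, Q3, XX19 in the tables
dist = random_srd_distribution(18, n_trials=20_000)
print(np.percentile(dist, [5, 25, 50, 75, 95]))
```

A model whose SRD falls below the 5% percentile (XX1) of this distribution orders the compounds significantly better than a random ranking.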
In this way, for each assay, ten descriptors were selected and used in the CP-ANN models.19 Applying the SOM technique, each of the three assays was divided into a training set and a test set. In assay 1, 48 compounds were selected into the training set and 18 into the test set. In assays 2 and 3 the divisions were: training sets of 115 and 46 compounds and test sets of 30 and 20 compounds, respectively. In our ranking procedure 16 models for each assay were taken into account. For assays 1 and 3 the network dimensions 7 x 7, 8 x 8, 9 x 9, and 10 x 10 were considered, with 400, 600, 800, and 1000 learning epochs. For assay 2 the learning epochs were the same, but the selected dimensions were 11 x 11, 12 x 12, 13 x 13, and 14 x 14. The details of the models are given in reference 19. In this report we present the ranking of these models using the SRD method, whose strategy was outlined above.7,8

4. Results and Discussion

Assay 1: The rankings by SRD for the training and test sets are given in Table 1. It is obvious that model A5 is the best one considering both the training and test sets. It is striking how far the simulated random SRDs (the distribution is characterized by percentiles, median, etc.) are from the SRDs of the models; any statistical test would suggest a significant difference between them. The SRDs of the models, however, are not significantly different from each other. In the test set, models A5, A6, A7, and A8 are equivalent according to the SRD ranking.
They can be distinguished from each other using more information, i.e., the larger number of compounds in the training set: A7 > A4 > A6 > A8 (where ">" means that the former model is better than the latter). Model A4 is better for the training set than for the test set. The experimental data were included in the ranking procedure for checking purposes: models A1, A2 and A3 are not better than the measured data, so their usage should be avoided. The raw SRD values for the training and test sets cannot be compared directly because the numbers of compounds differ. Therefore we rescaled the SRD values to a common scale between 0 and 100. Figure 1 visualizes the ranking of models by the scaled SRD values. If the prediction capacity is modeled by the test set, the equivalent models are to be used on equal bases. However, there is no doubt that model A5 is the best one, and we recommend its usage.

Table 1: Sum of ranking differences for assay 1, training (48) and test (18) sets
Name      A5   A7   A4   A6   A8   exp  A3   A1   A2   XX1  Q1   Med  Q3   XX19
SRDtrain  76   86   88   92   96   98   114  134  134  650  720  766  812  882
Name      A5   A6   A7   A8   A4   exp  A3   A1   A2   XX1  Q1   Med  Q3   XX19
SRDtest   10   10   10   10   16   18   18   22   22   80   96   108  120  136
XX1 - first icosaile (5%), Q1 - first quartile, Med - median, Q3 - last quartile, XX19 - last icosaile (95%)

Figure 1a, b. Scaled SRD values (x axis: scaled SRD values = SRD%) for predictive models for assay 1 for the training set (a) and the test set (b). The mean and standard deviation of the validation distribution are given in the header.

Assay 2: The rankings by SRD for the training and test sets are given in Table 2. Model B5, which is the best one for the training set, is not consistently the best one in this case. As the number of compounds is larger than in the case of assay 1, the difference between the simulated random SRDs (the distribution is characterized by percentiles, median, etc.) and the SRDs of the models is larger and could be identified by statistical tests; as in the previous case, the SRDs of the models are not significantly different from each other. In the test set, model B3 is the best one; it is clearly distinguished from the other models. In the training set the same model is worse than the experimental one, showing that the split into training and test sets is far from optimal. The experimental values are ranked at the last position, showing a clear over-fitting effect for all of the models on the test set. The SRD ranking for the training set is B5 > B6 > B8 > B4 > B7 > B2, whereas the ranking for the test set is slightly different: B3 > B2 > B5 > B1 > B7 > B8 > B6 > B4. The scaled SRD values are shown in Figure 2.

Figure 2a, b. Scaled SRD values (x axis: scaled SRD values = SRD%) for predictive models for assay 2 for the training set (a) and the test set (b). The mean and standard deviation of the validation distribution are given in the header.

The models of assay 2 should be handled with caution. Models B5 and B8 could be recommended for further applications. On the other hand, models B1, B3, B4, and B6 are not recommended.
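Because raw SRD values depend on the number of compounds, the figures report SRD values scaled to 0-100. A minimal sketch of that scaling, assuming the conventional maximum SRD of n²/2 for even n and (n² - 1)/2 for odd n (a fully reversed ranking):

```python
def srd_percent(srd_value, n):
    """Scale a raw SRD onto the 0-100 scale (SRD%) used in the figures."""
    # maximum possible SRD, reached by a fully reversed ranking
    srd_max = n * n // 2 if n % 2 == 0 else (n * n - 1) // 2
    return 100.0 * srd_value / srd_max

# e.g. model A5 on the assay 1 test set: n = 18 compounds, SRD = 10 (Table 1)
print(round(srd_percent(10, 18), 1))  # 6.2
```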
Table 2: Sum of ranking differences for assay 2, training (115) and test (30) sets
Name      B5   B6   B8   B4   B7   B2   exp  B3   B1   XX1   Q1    Med   Q3    XX19
SRDtrain  416  492  538  540  564  566  634  642  680  3802  4132  4396  4596  4926
Name      B3   B2   B5   B1   B7   B8   B6   B4   exp  XX1   Q1    Med   Q3    XX19
SRDtest   26   40   40   42   46   50   54   64   70   240   272   300   322   358
For notation see Table 1.

Assay 3: The rankings by SRD for the training and test sets are given in Table 3. It is difficult to select the best models consistently for the training and test sets. Most models are better than the experimental data (except C2 for the training set and C11 for the test set), which indicates an over-fitting effect. The models C6, C8 and C9 might be selected as acceptable ones. The difference between the simulated random SRDs (the distribution is characterized by percentiles, median, etc.) and the SRDs of the models is significant. On the other hand, the differences among the SRD values of the models are less significant; however, some groupings can be observed. The best models for the training and test sets are different, indicating that the split into training and test sets is far from optimal. The scaled SRD values for assay 3 are shown in Figure 3. The comparison of models for assay 3 shows that no unambiguous selection of the optimal model can be made.

Table 3: Sum of ranking differences for assay 3, training (46) and test (20) sets
Name      C6  C8  C4  C9  C10  C1  C3  C7  C12  C5  C13  C11  exp  C2   XX1  Q1   Med  Q3   XX19
SRDtrain  46  48  54  54  62   66  66  68  70   72  72   76   100  100  588  662  704  746  810
Name      C5  C1  C3  C9  C10  C8  C2  C7  C4   C6  C13  C12  exp  C11  XX1  Q1   Med  Q3   XX19
SRDtest   6   12  12  16  16   18  20  20  22   22  22   30   32   32   90   108  120  132  150
For notation see Table 1.

Figure 3a, b. Scaled SRD values (x axis: scaled SRD values = SRD%) for predictive models for assay 3 for the training set (a) and the test set (b). The mean and standard deviation of the validation distribution are given in the header.

5. Conclusions

The SRD method was applied to select the optimal QSAR models. The analysis was performed on the training and test sets. For the first assay we found one model that could be selected as the best one and recommended for further applications. For the second assay we could select two models that could be recommended for further usage. The SRD analysis revealed some weak points of the models, such as over-fitting and a suboptimal training/test set division. In the third assay different models were found to be the best ones for the training and test sets; it follows that no unambiguous selection of an optimal model can be made.

6. Acknowledgement

The authors thank the Agency of Research of the Republic of Slovenia (ARRS) for financial support through grants P1-0017 and 1000-07-310016. One of the authors (KH) acknowledges support through TET No. 8/2010 (Croatian-Hungarian).

7. References

1. C. Dye, S. Scheele, P. Dolin, V. Pathania, M. C. Raviglione, JAMA-J. Am. Med. Assoc. 1999, 282, 677-686.
2. Y. L. Janin, Bioorg. Med. Chem. 2007, 15, 2479-2513.
3. K. Drlica, X. Zhao, Microbiol. Mol. Biol. Rev. 1997, 61, 377-392.
4. J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design, Wiley-VCH, Weinheim, 1999, p. 380.
5. J. Zupan, M. Novic, J. Gasteiger, Chemometr. Intell. Lab. Syst. 1995, 27, 175-187.
6. M. Vracko, Curr. Comput.-Aid. Drug Des. 2005, 1, 73-78.
7. K. Héberger, TrAC Trends Anal. Chem. 2010, 29, 101-109.
8. K. Héberger, K. Kollár-Hunek, J. Chemometr. 2010 (in press).
9. V. Simon, J. Gasteiger, J. Zupan, J. Am. Chem. Soc. 1993, 115, 9148-9159.
10. I. Valkova, M.
Vracko, S. C. Basak, Anal. Chim. Acta 2004, 509, 179-186.
11. J. T. Leonard, K. Roy, QSAR Comb. Sci. 2006, 25, 235-251.
12. M. Kotnik, M. Oblak, J. Humljan, S. Gobec, U. Urleb, T. Solmajer, QSAR Comb. Sci. 2004, 23, 399-405.
13. P. Gramatica, P. Pilutti, E. Papa, J. Chem. Inf. Comput. Sci. 2004, 44, 1794-1802.
14. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer, New York, 2009, pp. 261-294.
15. P. Gramatica, E. Giani, E. Papa, J. Mol. Graphics Modell. 2007, 25, 755-766.
16. NIAID Division of AIDS, HIV/OI/TB Therapeutics Database, http://chemdb2.niaid.nih.gov (accessed: May 27, 2010).
17. Talete srl, DRAGON for Windows (Software for Molecular Descriptor Calculations), version 5.4, 2006.
18. A. R. Katritzky, V. S. Lobanov, M. Karelson, CODESSA Training Manual, Gainesville, 1995.
19. N. Minovski, M. Vracko, T. Solmajer, Quantitative structure-activity relationship study of antitubercular fluoroquinolones, Mol. Divers. 2010 (in press), DOI: 10.1007/s11030-010-9238.

Povzetek

In this contribution we analyzed 54 QSAR models built with counter-propagation neural networks (CP-ANN). The models were built on three sets of fluoroquinolones as potential antituberculosis agents and were developed considering different technical parameters, such as the dimension of the neural network and the number of learning epochs. To the set of models we applied a recently proposed method for ranking models (SRD), which allows the most reliable models to be identified. With the SRD method we selected one model for the first set and proposed two models for the second set, while for the third set we found no model that would be better than the others.