Metodološki zvezki, Vol. 2, No. 1, 2005, 147-159
Using Machine Learning to Predict the Impact of Agricultural Factors on Communities of Soil
Microarthropods
Damjan Demšar1, Sašo Džeroski1, Paul Henning Krogh2, and
Thomas Larsen2
Abstract
With growing ecological awareness in agriculture, the sustainable use and development of land is becoming more important. With the sustainable use of soil in mind, and as part of the ECOGEN project, we are developing a decision support system that helps in making decisions on managing agricultural systems and is able to handle both conventional and genetically modified crops. The decision support system considers economic and agricultural factors and actions, including crop selection, crop sequence, and pest and weed control actions. For such a decision support system to work, it needs modules that predict the results of different agricultural actions. One of the most important factors for the sustainable use and fertility of soil is soil flora and fauna. Any change in that community can influence short-term or long-term soil fertility and soil usability.
Since soil fauna is one of the most important factors, we first need to model it. However, since the function of the individual species is not known, the only option we have is to model the community of soil fauna as a whole. We start by modelling the community of soil microarthropods. For that purpose we used machine learning methods: regression trees, model trees and linear equations. We identified previous crops and the time since different kinds of tillage as the most important factors for the community of soil microarthropods.
1    Introduction
The possible use of genetically modified (GM) plants in agriculture requires in-depth investigation of the ecological and economic consequences (Birch et al., 2003; ECOGEN). The investigations are important for both the European Commission
1 Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia, {damjan.demsar, saso.dzeroski}@ijs.si
2 Department of Terrestrial Ecology, National Environmental Research Institute, Roskilde, Denmark, {phk, thl}@dmu.dk
(EC), which needs specifications for GM-plant risk assessment, and farmers and the public, who are concerned about the possible ecological and economic implications. Crop production involves complex decision-making processes, which require and justify the application of decision support systems.
The ECOGEN project (Soil ecological and economic evaluation of genetically modified crops) is an EC-funded project aimed at combining simple lab tests, studies of multi-species model mesocosm ecosystems, and field studies to acquire realistic knowledge about the economic and ecological impacts of GM crops on the soil. Economic trade-offs are assessed and related to ecological effects. The economic and ecological knowledge gained in ECOGEN will be combined into a rule-based model for a decision support tool.
The goals of the ECOGEN project are to:
1.  Provide ecological and economic assessment and comparison of integrated cropping systems using GM or conventional crops, respectively.
2.  Provide an ecological risk assessment of a GM cropping system and a conventional cropping system for the soil ecosystem based on single species tests, multispecies tests and long-term field investigations.
3.  Adapt existing ecotoxicity testing tools to GM plant material and validate their use.
4.  Provide economic assessment of GM crops and conventional crops with respect to a quantification of the expected trade-offs between the two and the implications for the EU Agriculture Policy.
Finally, we wish to incorporate ecological knowledge from single species tests, multispecies tests, and field investigations, as well as economic information from farming practices into a rule-based model to be used for predictions of economic decision-making processes and ecosystem behaviour.
In this paper we present the current generation of the microarthropod models that are to be used as part of the decision support model. These models will be used to judge the results of agricultural actions, and thereby act as an input to the upper levels of the decision support system.
2    Data
We combined the two available datasets. The first dataset describes four experimental farming systems (Foulum experimental station, Denmark) in the years 1989 to 1993, allocated to 15 fields, with pesticide use in a conventional system and in two integrated farming systems and no pesticide use on the other (organic) fields; 530 microarthropod samples were collected (Krogh, 1994). The second dataset describes several organic farms (Foulum and Flakkebjerg experimental stations plus various farms in Jutland) in the year 2002.
Table 1: The available attributes.
Attribute	Explanation
soil_JB	soil classification number
samp_time	1 = March - April, 2 = May - June, 3 = July - August, 4 = September - November
ba	winter barley
be	beets/carrots
ca	cattle
cc	catch crop
ch	chicory
chgr	chicory+grass
clgr	clover+grass
fa	fallow
gr	grass
le	leeks
lu	lupin
oa	oats
pe	peas
po	potatoes
ra	rape
rd	radish
ry	rye
sba	spring barley
sf	stubble field
sh	sheep
si	silage/hay
swh	spring wheat
tc	triticale
wh	winter wheat
wc	whole crop
o	seed bed (<1 mo)
seha	seed bed harrowed
sepl	seed bed plowed
soha	soil treatment harrowed
sopl	soil treatment plowed
Pesticide	Pesticide. 1=fields in a rotation where pesticides are used. 0=no pesticide
tr_packing	packing (months since), transformed using $(\text{months} - 10)/10$
tr_shal_till	shallow tillage (weed harrowing etc.), 0-5 cm layer (months since), transformed using $\left(\frac{\text{months} - 10}{10}\right)^{1}$
tr_subshal_till	subshallow tillage, 5-10 cm layer (months since), transformed using $\left(\frac{\text{months} - 10}{10}\right)^{p}$ (the exponent $p$ is not legible in the source)
tr_deep_till	deep tillage (plowing, rotovation), >10 cm layer (months since), transformed using $\left(\frac{\text{months} - 10}{10}\right)^{2}$
fert_lev	low=0, normal=1, high=2.
fert_type	no=0, solid=1, liquid=2
fert_time	fertilization time (mo)
crop_1	crop prev year
ca_1	no cattle=0, cattle=1
sotr_1	no treat=0, s or a=1, s and a=2
crop_2	crop prev 2nd year
ca_2	no cattle=0, cattle=1
sotr_2	no treat=0, s or a=1, s and a=2
crop_3	crop prev 3rd year
ca_3	no cattle=0, cattle=1
sotr_3	no treat=0, s or a=1, s and a=2
For this second dataset, 430 samples were collected. To those datasets we added newly available data from 2003, giving us a total of 1330 samples, of which 1192 were suitable for predicting Acari species, 1214 for predicting Collembolan species and 1138 for predicting biodiversity.
The sampling was replicated for each field. The distance between each sample was 5 m and all samples were collected within a 20x20 m area. The distance to hedges and ditches was at least 10 m. Sampling was performed in the upper 5.5 cm soil layer. The sampling containers measured 6 cm in diameter. Sampling was done using a split soil corer and extraction was performed using a MacFadyen high gradient heat extractor.
The datasets available for the study include the agricultural measures (attributes), for example packing, tillage, fertilizer and pesticide use, crops planted and cattle grazing. The history of crops and grazing for the last 3 years is also available. The datasets also contain environmental variables describing the circumstances under which the community data on soil microarthropods were collected. The variables used to model microarthropods are listed in Table 1 and were selected by domain experts. Some attributes (the different forms of tillage) were transformed to simulate the occasionally non-linear, diminishing impact of tillage (different powers simulate differently steep impact curves). The dataset also includes the measured species (listed in Table 2). Some species were grouped into the Acari group (mites), the rest of the measured species belong to the Collembolan group (springtails), and all were used to calculate biodiversity using formula (2.1):
\[ H = -\sum_{i=1}^{S} p_i \log_2 p_i \tag{2.1} \]
where $p_i$ is the proportion of the total sample abundance accounted for by species $i$, and $S$ is the total number of species in the sample.
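As an illustration of formula (2.1), the following minimal Python sketch computes the biodiversity index from the abundances of the species in one sample. The function name and the example counts are hypothetical and serve only to show the arithmetic; species with zero abundance are skipped, following the usual convention that they contribute nothing to the sum.

```python
import math


def shannon_index(abundances):
    """Biodiversity H = -sum(p_i * log2(p_i)) over the species in one sample."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample must contain at least one individual")
    h = 0.0
    for count in abundances:
        if count > 0:  # a species with zero abundance contributes nothing
            p = count / total
            h -= p * math.log2(p)
    return h


# Hypothetical abundance counts, not taken from the datasets.
print(shannon_index([10, 10, 10, 10]))  # four equally abundant species: 2.0
print(shannon_index([40, 0, 0, 0]))     # a single species: 0.0
```

With four equally abundant species each proportion is 1/4, so H = 4 · (1/4) · 2 = 2 bits; a sample dominated by one species gives H = 0.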
3    Methodology
In this section we describe the machine learning methods we used to produce the models predicting the number of springtails, the number of mites and their biodiversity. We describe regression trees as used in M5’ (Wang and Witten, 1997), which is implemented in Weka 3.2 (Witten and Frank, 1999). In parallel we describe model trees (also produced by M5’), which are based on regression trees, by highlighting the differences between regression trees and model trees.
Regression trees represent piecewise constant functions, whereas model trees represent piecewise linear functions. (Model trees are sometimes also called regression trees; we use the term model trees to avoid confusion between the two kinds of model.) Both predict the value of a dependent variable (class) from the values of independent variables (attributes).
Table 2: The observed species (¹ Acari groups, mites; ² Collembolan groups, springtails).
Abbreviation	Species
Iang²	Isotoma anglicana
Ipalu²	Isotomurus palustris
Hdent²	Ceratophysella denticulata
Hsuc²	Ceratophysella succinea
Xarma²	Hypogastrura sp.
Llanu²	Lepidocyrtus lanuginosus
Lcyan²	Lepidocyrtus cyaneus
Seleg²	Sminthurinus elegans
Onych²	Protaphorura sp.
Sviri²	Sminthurus viridis
Sminsp²	Smint. sp.
crypt¹	Cryptostigmata (Oribatida mite)
prost¹	Prostigmata (Actinedida mite)
Tull²	Mesaphorura sp.
Inot²	Isotoma notabilis
Entosp²	Entomobrya sp.
Fmirab²	Friesea mirabilis
ast¹	Astigmata (Acaridida mite)
meso¹	Mesostigmata (Gamasida mite)
Ffim²	Folsomia fimetaria
Palba²	Pseudosinella alba
Bparv²	Brachystomella parvula
Apygm²	Anurida pygmaea
Iminor²	Isotomiella minor
Hniti²	Heteromurus nitidus
Tquad²	Stenaphorura quadrispina
Nmini²	Neelus minimus
Saure²	Sminthurinus aureus
Fspino²	Folsomia spinosa
Cterm²	Cryptopygus thermophilus
Will²	Willemia sp.
Ocinct²	Orchesella cincta
Owillo²	Orchesella villosa
Nmusco²	Neanura
Psexoc²	Pseudosinella sexoculata
Iprod²	Isotomodes productus
Iarma²	Isotomodes armata
IBiset²	Isotomodes bisetosus
Fquad²	Folsomia quadrioculata
Icilia²	Isotomurus sp.
Tomosp²	Tomocerus sp.
Tflav²	Tomocerus flavescens
Tminor²	Tomocerus minor
While the usual regression methods (linear and non-linear regression) fit a single function to the whole data set, regression trees partition the data space into (multidimensional) hyper-rectangles and fit a model to each partition (in our case regression trees fit a constant and model trees fit a linear function). To achieve the partition, the tree is built from inner nodes, each of which contains a test on the value of a particular attribute. The terminal nodes, also called leaves, contain the models.
To predict the value of the class variable for a new (or test) example, the evaluation of the tree starts at the root node. In each inner node the test is performed and, according to its result, a particular branch is followed from that node. This process is repeated until a terminal node (leaf) is reached. In regression trees the predicted value of the class variable is the constant stored in the leaf, while in model trees it is the value of the leaf's linear equation evaluated on the example.
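The evaluation procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch, not the M5’ implementation: the dictionary-based tree representation, the thresholds and the coefficients are all hypothetical. A regression-tree leaf holds only a constant, while a model-tree leaf additionally holds weights for a linear equation.

```python
def predict(node, example):
    """Follow the tests from the root to a leaf, then apply the leaf's model."""
    while "leaf" not in node:
        # Inner node: test one attribute against a threshold and pick a branch.
        branch = "left" if example[node["attr"]] <= node["threshold"] else "right"
        node = node[branch]
    leaf = node["leaf"]
    # Regression-tree leaf: intercept only. Model-tree leaf: intercept + weights.
    value = leaf["intercept"]
    for attr, weight in leaf.get("weights", {}).items():
        value += weight * example[attr]
    return value


# Hypothetical toy trees; all numbers are made up for illustration.
regression_tree = {
    "attr": "tr_deep_till", "threshold": 0.0,
    "left": {"leaf": {"intercept": 10.0}},
    "right": {"leaf": {"intercept": 30.0}},
}
model_tree = {
    "attr": "tr_deep_till", "threshold": 0.0,
    "left": {"leaf": {"intercept": 10.0, "weights": {"samp_time": 2.0}}},
    "right": {"leaf": {"intercept": 30.0, "weights": {"samp_time": -1.0}}},
}

example = {"tr_deep_till": 0.5, "samp_time": 3}
print(predict(regression_tree, example))  # 30.0 (constant in the right leaf)
print(predict(model_tree, example))       # 27.0 (30.0 - 1.0 * 3)
```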
Regression trees are constructed from the top (root) down, starting with all training examples and continuing recursively for each subtree. At each step the most discriminating attribute is selected, and subsets of the training examples are created according to the values of the selected attribute. If the selected attribute is continuous, a threshold value is selected and two branches (and two subtrees) are created. If the attribute is nominal (discrete), then either a branch is constructed for each possible value of the attribute, or two subsets of the values are created (again the most discriminating ones) and two branches are created (all in-between variants are possible, but uncommon).
The most discriminating test, on a discrete or a continuous attribute, is the one that most reduces the variance of the values of the class variable. For continuous attributes, the values of the attribute that appear in the training set are considered as thresholds. For the subsets of training examples in each branch, the tree construction algorithm is called recursively. Tree construction stops when the variance of the class values of all examples in a node is small enough (or if some other stopping criterion is satisfied). Such nodes are called leaves and are labelled with a model (constant or linear equation) for predicting the class value.
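The selection of a threshold on a continuous attribute can be sketched as follows. This is a simplified illustration of the variance-reduction criterion as described above, with hypothetical function names and data; the exact criterion and the additional heuristics used inside M5’ may differ.

```python
def variance(values):
    """Population variance of a list of class values (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_threshold(xs, ys):
    """Return (threshold, reduction) for the test x <= threshold that most
    reduces the size-weighted variance of the class values ys."""
    parent = variance(ys)
    best = (None, 0.0)
    # Candidate thresholds are the attribute values seen in the training set;
    # the largest one is skipped because it would leave the right branch empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        weighted = (len(left) * variance(left)
                    + len(right) * variance(right)) / len(ys)
        reduction = parent - weighted
        if reduction > best[1]:
            best = (t, reduction)
    return best


# Hypothetical attribute values and class values.
xs = [1, 2, 3, 4]
ys = [10, 12, 30, 32]
print(best_threshold(xs, ys))  # (2, 100.0)
```

In this toy example the parent variance is 101 and the split at 2 leaves a variance of 1 in each branch, so the reduction is 100.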
An important mechanism used to prevent trees from over-fitting data is tree pruning. Pruning can be used during tree building (pre-pruning) or after the tree has been built (post-pruning). Usually, a minimum number of examples in branches can be set for pre-pruning and confidence level in error estimates in leaves for post-pruning.
A number of systems exist for inducing regression trees from examples, such as CART (Breiman et al., 1984) and M5 (Quinlan, 1993). M5 is one of the best-known programs for regression tree induction. We used the system M5’ (Wang and Witten, 1997), a re-implementation of M5 within the software package WEKA (Witten and Frank, 1999): simple model trees have simplified equations and are induced with the –U option, while complex model trees are induced by M5’ with default parameter settings. We also used regression trees and linear regression. The sizes of both model and regression trees were regulated using post-pruning methods. The nearest neighbour method IBk (Aha and Kibler, 1991) with 1, 5 or 10 neighbours was used as a benchmark for comparing accuracy. Each method was applied to each of the three regression problems. To measure the predictive performance of the models, we evaluated the correlation coefficient and several error measures using ten-fold cross-validation: mean absolute error, root mean square error, relative absolute error and root relative square error.
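For reference, the performance measures named above can be computed as in the following Python sketch. It assumes the usual definitions, in which the two relative errors compare the model with a baseline that always predicts the mean and are expressed in percent; here the mean of the actual values is used as that baseline, which is an assumption and may differ in detail from what WEKA computes. The function name and example values are hypothetical.

```python
import math


def evaluate(actual, predicted):
    """Correlation coefficient, MAE, RMSE, RAE (%) and RRSE (%)."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - mean_a) for a in actual)
    base_sq = sum((a - mean_a) ** 2 for a in actual)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return {
        "corr": cov / math.sqrt(base_sq * var_p),
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": 100.0 * abs_err / base_abs,
        "RRSE": 100.0 * math.sqrt(sq_err / base_sq),
    }


# Hypothetical actual and predicted values.
for name, value in evaluate([1, 2, 3, 4], [1, 2, 3, 5]).items():
    print(f"{name}: {value:.3f}")
```

A relative error of 100% means the model is no better than always predicting the mean, which helps when reading the RAE and RRSE columns of the result tables.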
4    Results
4.1     Acari models
First we produce the models describing the dependence of the abundance of mites (Acari) on agricultural factors. The correlation coefficients and error measures of the different models are given in Table 3. The best correlation and error measures are produced by the nearest neighbour method with 1 or 5 neighbours taken into consideration (depending on the chosen error measure). However, the nearest neighbour method does not create any model and therefore cannot be used to gain new knowledge or even to describe already known knowledge. Of the descriptive models, the best (when only accuracy is taken into account) is the model tree produced by M5’ with default parameters. But the default model tree has rather complex equations attached to its leaves and is therefore rather difficult to understand. It is even harder to judge which of the attributes used in each equation is the most influential. To gain more understanding we prefer the simple model trees (or even regression trees), since the more accurate models can be hard, or even too hard, to interpret. Here we can take the simple model tree (produced with default pruning) shown in Figure 1, accepting only a slight performance hit but gaining a lot in understandability. The complex model tree is the same size as the simple model tree, but has more complex linear equations in its leaves. Equations in the complex model tree typically have a length of 20 or more, while the equations in the simple trees have a length of about 5.
The Acari simple model tree (Figure 1) and the Acari regression tree (Figure 2) are very similar; however, the model tree has much better correlation and much lower (10% or more) error measures. If we look at both models we can see that the most important factors for the community of soil mites are the crops in previous years and tillage (especially deep tillage and subshallow tillage). From the questions that we can ask when interpreting the models we can also gain new understanding of the problem, which will help us with the construction of the final decision support model. For example, we found out (from the model and the experts) that fields that in previous years were covered with grass or grazed by cattle are usually less disturbed and produce more food, which is better for mites. Such fields also leave a lot of decomposing matter (food) in the soil, which helps mites in the following years and even speeds up the recovery of mites after tillage (which is the most important negative factor for mites). That is one of the reasons why it is generally very beneficial for the soil fauna and microbial life if a field rests for a few years or is covered with clover, with a minimum of tillage.
Table 3: Correlation coefficient and errors of Acari models (best models are in bold, shown models in italics). MAE: mean absolute error, RMSE: root mean square error, RAE: relative absolute error, RRSE: root relative square error.
name	size	corr	MAE	RMSE	RAE	RRSE
ibk 1	0	0.668	20703.395	43540.345	57.189	75.112
ibk 5	0	0.666	20953.490	43510.279	57.880	75.060
ibk 10	0	0.617	22337.137	45650.914	61.702	78.753
M5 linear equation	1	0.626	24773.783	45255.734	68.433	78.071
M5 model tree	5	0.650	22315.132	44137.612	61.641	76.142
M5 model tree simple	5	0.643	22465.097	44560.573	62.056	76.872
M5 model tree pruning 5	5	0.610	23973.521	46070.750	66.222	79.477
M5 model tree pruning 5 simple	5	0.606	24032.788	46284.588	66.386	79.846
M5 model tree pruning 15.5	1	0.579	25254.081	47268.335	69.760	81.543
M5 model tree pruning 15.5 simple	1	0.576	25300.044	47409.002	69.887	81.786
M5 regression tree	10	0.604	23434.691	46257.361	64.734	79.799
M5 regression tree pruning 15	4	0.538	26215.648	48881.437	72.416	84.326
While the domain experts agree with the models and have mostly learned only the ranking of the factors, we have gained much knowledge that will help us in modelling mites and, ultimately, in helping decision makers make the right decisions.
LM1: acari = 11300 + 4250 samp_time - 5670 wh=1 - 29300 tr_subshal_till
    - 11200 crop_1=wc,sba,ra,pe,wh,gr-sh,ba-clgr,ba-gr-sh,clgr,ca-clgr,clgr-si,ca-gr,gr-oa-sh,le,fa,rye,lu,clgr-wc,po,ca-si,clgr-sh
    + 8580 crop_1=pe,wh,gr-sh,ba-clgr,ba-gr-sh,clgr,ca-clgr,clgr-si,ca-gr,gr-oa-sh,le,fa,rye,lu,clgr-wc,po,ca-si,clgr-sh
    + 9630 crop_2=ba,pe,gr,ba-ra,be,ba-gr,oa,swh,gr-sh,gr-si,ba-clgr,clgr,ca-clgr,ca-gr-wc,clgr-sh,peas,ca-gr,gr-tc,tc,fa,clgr-sba,sba,ca
LM2: acari = 35300 + 14000 clgr=1 [?] 11100 sotr_3=2,0 + 36600 wh=1 - 20400 fert_lev
LM3: acari = 17800 - 8070 o=0 - 11200 tr_subshal_till
    [?] 6340 crop_3=ba-gr,pe,ba-clgr,ba,gr,be,gr-sh,ba-ra,oa,gr-si,ca-clgr,clgr,wh,tc,ba-clgr-sh,sba,ca-gr,ba-pe,fa,clgr-wc,lu,gr-oa-sh,ca
    - 5140 sotr_3=2,0
LM4: acari = 158000 - 55700 soil_JB + 66400 sotr_1=0 + 156000 tr_subshal_till
LM5: acari = 15300 + 48000 o=0 - 26100 sotr_1=0 + 23200 crop_2=tc,fa,clgr-sba,sba,ca
([?] marks a sign that is not legible in the source.)
Figure 1: Acari model tree.
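To make the leaf equations easier to read, the following Python sketch evaluates LM4 for a hypothetical field. It assumes that a term such as sotr_1=0 denotes an indicator that equals 1 when the condition holds and 0 otherwise, which is how nominal attributes are commonly encoded in model-tree output; this interpretation, the function name and the input values are assumptions made for illustration only. The tillage attribute is taken in its transformed form, as in Table 1.

```python
def lm4_acari(soil_jb, sotr_1, tr_subshal_till):
    """Evaluate LM4: acari = 158000 - 55700 soil_JB + 66400 [sotr_1=0]
    + 156000 tr_subshal_till, with [sotr_1=0] read as a 0/1 indicator
    (an assumption about the notation)."""
    indicator = 1 if sotr_1 == 0 else 0
    return 158000 - 55700 * soil_jb + 66400 * indicator + 156000 * tr_subshal_till


# Hypothetical attribute values, not taken from the datasets.
print(lm4_acari(soil_jb=2, sotr_1=0, tr_subshal_till=0.0))  # 113000.0
print(lm4_acari(soil_jb=2, sotr_1=1, tr_subshal_till=0.0))  # 46600.0
```

Under this reading, the only difference between the two hypothetical fields is the sotr_1 indicator, which accounts for the gap of 66400 between the two predictions.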
Figure 2: Acari regression tree.
4.2     Collembolan models
From the correlations and error measures of the models describing collembolans (springtails), shown in Table 4, we can see that the nearest neighbour method is again the best, but it only slightly outperforms the best of the descriptive models (in this case our preferred model, the simple model tree). However, since the simple model tree is too big to show and the size of a regression tree can be better regulated with pruning, we show only the regression tree (Figure 3), which is significantly worse than the simple model tree but has a similar main structure. From the regression tree shown and from the simple model tree we can see that the most important factors for Collembolan species are again previous crops (with importance decreasing the further back in time) and tillage (especially subshallow tillage and deep tillage). The experts, however, expected tillage to be more important than previous crops. The questions that follow from the models lead us to knowledge such as:
·    Deep tillage has less impact with some crops. Crops that include grass/clover provide protection even if the field is deep tilled, because the sods remain intact and the clover residues add a lot of nitrogen to the soil (which enhances microbial life and thus the food base).
·    If the crops are still there in the current year, it means that there has been no tillage; in addition, clover fertilizes the soil. Lupin also fertilizes the soil.
·    Tillage injures or kills Collembola by physical disruption and destroys their habitat (pathways in the soil are broken and the soil structure is destroyed).
·    Collembolans are considered to be more sensitive to tillage than mites because their cuticle is softer. The Oribatid mites (Cryptostigmata) differ from other microarthropods in having a calcareous exoskeleton that protects them.
·    If tillage took place longer ago, the biomass/growth of the standing crop is likely to be larger than if the field was recently tilled. Higher biomass means more Collembolan food. Sometimes this is counteracted by high ammonia concentrations in the fertilizer, which can kill the Collembola if the animals are exposed to it directly.
The experts liked the produced models; the only surprise was the fact that some effects can be seen even 6 months after subshallow tillage.
Table 4: Correlation coefficient and errors of Collembolan models.
name	size	corr	MAE	RMSE	RAE	RRSE
ibk 1	0	0.647	17124.997	33295.410	62.692	78.359
ibk 5	0	0.621	18107.641	33616.717	66.289	79.115
ibk 10	0	0.603	18693.767	33939.280	68.435	79.874
m5 linear equation	1	0.562	21471.686	35331.697	78.604	83.151
m5 model tree	1	0.583	20445.514	34737.713	74.848	81.754
m5 model tree simple	1	0.592	20516.408	34471.645	75.107	81.127
m5 model tree pruning 0.57	18	0.631	18853.541	33059.075	69.020	77.803
m5 model tree pruning 0.57 simple	18	0.636	18836.467	33094.360	68.957	77.886
m5 regression tree	22	0.533	20000.864	36151.180	73.220	85.080
m5 regression tree pruning 10	5	0.442	22836.821	38130.009	83.602	89.737
4.3     Biodiversity models
When we model biodiversity, the nearest neighbour method again has the best correlation (Table 5). In this case the difference in correlation to the best descriptive model is quite large; however, the model tree has a better root mean square error, and the same is true for the simple model tree. Again the simple model tree is too big to show in this paper, so we show only the regression tree (Figure 4), which has considerably lower correlation and higher error measures but a similar main structure. According to the models, the most important factor for biodiversity is (the lack of) subshallow and shallow tillage. The next most important factor is the crop from three years ago, which surprised the experts; the only explanation they could find was that past crops define the cropping sequence and thereby even the current crop.
Figure 3: Collembolan regression tree.
Table 5: Correlation coefficient and errors of biodiversity models.
name	size	corr	MAE	RMSE	RAE	RRSE
ibk 1	0	0.623	0.361	0.505	77.485	79.936
ibk 5	0	0.588	0.369	0.479	79.204	81.442
ibk 10	0	0.518	0.392	0.470	84.032	85.770
m5 linear equation	1	0.533	0.394	0.500	84.656	84.960
m5 model tree	9	0.575	0.373	0.483	80.154	82.059
m5 model tree simple	9	0.570	0.376	0.486	80.722	82.675
m5 model tree pruning 4	3	0.511	0.398	0.509	85.342	86.501
m5 model tree pruning 4 simple	3	0.506	0.400	0.512	85.782	86.945
m5 regression tree	21	0.495	0.401	0.515	86.160	87.570
m5 regression tree pruning 5	12	0.423	0.420	0.534	90.096	90.767
m5 regression tree pruning 7	6	0.383	0.430	0.544	92.279	92.528
The models helped us to get some additional knowledge from the domain experts, for example:
·    The effects of tillage are long lasting: at least 5 months with subshallow tillage and 7 months with shallow tillage. Many species are sensitive to tillage, and this lowers the biodiversity. More opportunistic and smaller species will dominate in intensively tilled soils. More sensitive species take longer to recover.
·    Fertilization increases the biomass of the plants (which gives higher soil microbial activity, i.e. food); in addition, the fertilizers stimulate soil microbial activity and create new habitats (the animals can live inside or around the organic matter).
Figure 4: Biodiversity regression tree.
5    Conclusions
We modelled the community of soil microarthropods with machine learning methods, using data describing chemical, biological and mechanical actions on the fields. We then used the resulting models to identify the most important parameters for soil mites, springtails and the biodiversity of soil microarthropods, preferring small and simple models to bigger and more complex ones. We found that the most important factors for the community of soil microarthropods are the previous crops grown in the observed field and the different forms of tillage. Furthermore, we used the models as a source of questions for the domain experts. We gained knowledge that will help us in further modelling and in building a decision support system for the management of farms. While the domain experts will mainly rely on their own knowledge when participating in building the decision support model, they are to some extent guided by the models. With the newly gained knowledge we also identified parts of the decision support model that need special care when being built. We have shown that machine learning models can be used in multiple ways: to predict new values, to gain new knowledge about the relation between the attributes and the dependent variable, and to extract knowledge from the domain experts.
Acknowledgements
This work was supported by ECOGEN funded by the Fifth European Community Framework Programme: Quality of Life and management of living resources contract no QLK5-CT-2002-01666 and DARCOF, Nat. quality in organic farming.
References
[1] Aha, D. and Kibler, D. (1991): Instance-based learning algorithms. Machine Learning, 6, 37-66.
[2] Birch, A.N.E., Krogh, P.H., Cortet, J., Tabone, E., Griffiths, B.S., Džeroski, S., Wesseler, J., Gomot de Vaufleury, A., Badot, P-M., Andersen, M.N., and Messéan, A. (2003): ECOGEN: Soil ecological and economic evaluation of genetically modified crops. Poster at Biodiversity Implications of Genetically Modified Plants, September 7-12, 2003, Monte Verità, Ascona, Switzerland. Centro Stefano Franscini, Swiss Federal Inst. of Technology Zürich.
[3] Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984): Classification and Regression Trees. Belmont: Wadsworth.
[4] Demšar, D., Džeroski, S., Krogh, P.H., and Larsen, T. (2003): Identifying the most important agricultural factors for the soil community of microarthropods. Proceedings of the International Electrotechnical and Computer Science Conference. Ljubljana, Slovenia.
[5] ECOGEN: Soil ecological and economic evaluation of genetically modified crops. http://www.ecogen.dk
[6] Krogh, P.H. (1994): Microarthropods as bioindicators. A study of disturbed populations. PhD thesis. Ministry of the Environment and Energy, National Environmental Research Institute, Silkeborg.
[7] Quinlan, J.R. (1993): Combining instance-based and model-based learning. In Proceedings of the X. International Conference on Machine Learning. 236-243. Morgan Kaufmann.
[8] Recio, B., Rubio, F., and Criado, J.A. (2002): A decision support system for farm planning using AgriSupport II. Decision Support Systems, 36, 189-203.
[9] Steen, E. (1983): Soil animals in relation to agricultural practices and soil productivity. Swedish J. agric. Res., 13, 157-165.
[10] Wang, Y. and Witten, I.H. (1997): Induction of model trees for predicting continuous classes. Proceedings of the Poster Papers of the ECML 97. University of Economics. Prague: Faculty of Informatics and Statistics.
[11] Witten, I.H. and Frank, E. (1999): Data Mining: Practical Machine Learning Tools with Java Implementations. San Francisco: Morgan Kaufmann.