APPROACHING THE LIMIT OF CLASSIFICATION ACCURACV INFORMATICA 2/88
UDK 519.226.3
Matjaž Gams, Matija Drobnič
Jozef Stefan Institute
ABSTRACT. One of the recently developed systems for machine learning
(GINESYS) signi.f icantly outperformed all compared systems including
theoretically optimal Bayesian classifier, which was the second in both
tests. We tested several options in Bayesian classifier to investigate the
real cause for nonoptimal results and to estimate the upper limit in
classification accuracy. The conclusion is that while it is possible to
achieve even highec classification accuracy with suitable parameter
adjustment in Bayesian classifier, it seems that GINESYS practically
achieved the optimal classification accuracy.
Sistem za empirično u6enje GINESYS je v praktičnih meritvah presegel
primerjane sisteme z Bayesovim klasifikatorjem vred. Podrobna analiza kaže,
da je dosežene rezultate roogoče preseči, vendar so že zelo blizu optimalne
me je.
1. INTRODUCTION
Nachine learning is a quickly developing area
of Artificial Intelligence [Hinston].
According to the major inference type used it
can be divided into rote learning, learning
from instruction, learning by deduction, by
analogy, from examples and from observation
and discovery [Carbonell et al; Hichalski).
The scope of this article is learning from
examples or Empirical Learning (EL). The aim
of EL is to induce general descriptions of
concepts from examples (instances) of these
concepts. Examples are usually objects of a
known class described in tecms of attributes
and values. The final product of learning ace
symbolic descriptions in human understandable
forms. Induced descriptions of concepts,
representing different classes of objects,
can be used for classifying new objects. EL
systems basically perfocm the same task
(classification) as statistical methods and
can be directly compared to thera from the
point of classification accuracy. On the
other hand, EL systems offer fucther
advantages, namely a) explanation during
classification of new examples and b) the
insight into the laws of the domain by
observing classification rules. Explanation
during classification (a) is important since
it enables the user to check the line of
reasoning and veri£y the system's decision.
The knowledge base (b) can be viewed as a new
representation of the domain knovrledge, which
can be of great value to domain experts,
especially in domains that are not yet well
formalized and understood.
2. A SIMPLE EXAMPLE
For a simple example let us consider a case
where we have a device with 8 binary switch.es
representing 256 legal combinations. Device
reports errors in some combinations and we
want to find out what subsequence causes
them.
1
2
3
4
5
1
0
1
0
1
1
SWITCHES
2
1
0
0
0
0
3
1
1
1
1
1
4
0
1
0
1
1
5
1
1
1
0
1
6
0
1
1
0
0
7
0
1
0
1
1
8
1
0
1
1
1
STATUS
ERROR
OK
ERROR
OK
ERROR
Table 1. Device reports error in some
combinations of switches. Which subsequence
of switches causes them?
Probably the most common answer in EL systeras
would be, that error is reported when
switches 5 and 8 are on (-1).
In practical tasks EL systems deal with
domains with 10 to 10.000 examples (typically
some hundred) with 2 to 500 attributes
(typically ten or some ten) [Bceiman et al;
Quinlan; Lavrac et al]. Attributes can be
real, integer or categorical with many
possible values.
3. EMPIRICAL LEARNING
The whole process of empitical learning
consists of four steps:
- preprocessing of learning examples,
- construction of a classification rule,
- classification of new instances and
- analysing the laws of the domain.
Detailed description can be found elsewhere,
e.g. [Kononenko] or [Gams, Lavrac] with
detailed overview of some well known
algorithms - C4 [Quinlan], CART [Breiman et
al], ASSISTANT 86 [Cestnik et al], CN2
[Clark, Niblett], AQ15 [Carbonell et alj. We
shall formally represent here only a domain
area and a classification rule.
A set of learning examples L - {(x,c)}
consists of pairs ..'., c), where x is a vector
(denoting propercies of the object) in a
13
measurement space X and c represents
index of the class of example x.
the Domain 2
Components of vectors x are called attributes
or variables. The values of attributes can be
numerical or categorical.
A classification or a decision rule d(x) is a
mapping which maps every x from X into some c
from C or into the probability distribution
(pl,p2...pj) where pi is a real number
between 0 and 1.
A classification rule d(x) splits the whole
space X into spaces X1,X2,...XJ, such that
for every Xi only a certain subset of d(x) is
relevant.
The syntax of a classification rule d(x) is:
<d> ::- <Rule> | <Rule> and <d>
classification rule
<Rule> ::- <Class> if <Cpx>
rule
<Cpx> :J- <Sel> | <Sel> and <Cpx>
complex
<Sel>
selector
Atr <op> <Values>
<Values> ::- Val | val or <Values>
values
<Class> ::- 112 | 3 . . . | J
class
<op> ::- < | - | >
operators
Atr corresponds to the name of the attribute
and Val is a categorical or numerical value.
This syntax is transformable into DNF and is
similar to the syntax of most rule-based
systems or expert systems [Waterman, Hayes-
Roth]. Note that the actual syntax is
slightly raore complicated [Gams].
4. DOMAIN DESCRIPTION
We performed practical measurements on two
real-world domains. Data were obtained by I.
Kononenko and represent descriptions and
diagnoses of patients from the Oncological
Institute Ljubljana. The only correction was
replacement of missing values by the most
probable ones for a given class. More
detailed description is in [Gams], here we
present only cumulative data about these
domains:
Domain 1
number of attributes 18
no. of possible values per attribute 2-8
( average 3.3 ).
number of classes 9
total number of examples 150
distribution of examples amongst classes
number of exampl.es in Cl to C9
12 34 56789
2 1 12 8 69 53 1 4 0
importance of attributes - Al to A18
of them is redundant
number of attributes 17
no. of possible values per attribute 2-3
( average 2.2 )
number of classes 22
total number of examples 339
distribution of examples amongst classes
number of examples in Cl to C22
1 2 3 4 5 6 7 8 9 10 11
84 20 9 14 39 114 6 0 2 28
12 13 14 15 16 17 18 19 20 21 22
16 7 24 2 1 10 29 6 2 1 24
iroportance of attributes - Al to A17
(counting how many examples overlap when
omitting the i-th attribute)
123456 7 89
80 80 58 85 60 53 65 55 68
10 11 12 13 14 15 16 17
63 55 60 53 54 57 65 65
5. GINESYS
5.1. ALGORITHM DESCRIPTION
The top level description of GINESVS (Generic
INductive Expert SYstem Shell) is as follovs:
repeat
initialize Rule;
generate Rule;
add Rule to d(x);
L - L - {examples covered by Rule)
until satisfiable(d(x))
In this general view GINESVS represents a
prototype of a unifying algorithm for
empirical learning covering many other
systems. In a slightly more specified
desctiption we obtain the following
algorithm:
repeat
generalize Rule;
repeat
specialize Rule
until stop(Rule) ;
postprocess(Rule);
add Rule to d(x);
L » L - {examples covered by Rule}
until satisfiable(d(x))
The main difference between other EL systems
and GINESYS is in "confirmation rules". Basic
idea of confirmation cules is using several
sources of information for classification.
That seems to be common practice in every day
life. For example when we try to predict the
vreather, we look at the official weather
report, but also look at the sky and ask our
neighbour. The impleraentation of this idea in
GINESYS is that instead of using only one
rule for classification several rules confirm
or confute the first one. In case of a
confrontation between these rules the
Bayesian classifier is consulted as a method
of a conflict resolution [Waterman, Hayes-
Roth]. One confirmation rule in our siraple
example in Table 1 could be : Error is
reported, when switches 3, 5 and 8 are on.
This rule could be redundant or even wrong,
but on the other hand it could be the only
correct onel From examples in Table 1 it if
none not ciear which of these possibilities is the
right one, so both (and other) rules are
14
stored and consulted. In more detailed tests
[Gams] it was shown that this method of
conEulting several rules (- using different
kinds of information) significantly improved
classification accuracy.
5.2. COMPARATIVE RESULTS
A detailed comparison was made with other
well known EL systems in two noisy medical
domains (oncology). Table 2 shows results in
classification accuracy.
GINESYS
BAYES
othec systems
domain
69.9
68.4
67.3
1 domain
51.9
50.1
48.7
2
Table 2. Classification accuracy measured as
the peccentage of correct guesses.
While GINESYS achieved bost results and
Bayesian classifier the second ones, none of
the compared systems [Gams] outperformed
results in the last row in Table 2. These
results are actually an average ovec ten runs
on randomly chosen 70% of data for learning
and remaining 30% of data for testing. In
further tests (t-tests, [Gams]) it was shown
that the number of tests, distribution and
the difference between classification
accuracies was sufficient to ensure that
differences are a result of some deeper cause
(e.g. better algocithm) and not a chance
choice.
Other measurements proved superiority not
only from the point of classification
accuracy, but also in generality, complexity
of classification rule and explanation [Gams].
GINESYS and othec algocithms discussed in
this paper were implemented in Pascal on VAX
11/750.
5.3. IRREPROACHABILITY OF MEASUREMENTS
We argue that our measurements are
irreproachable (unbiased) since:
all systems were measured on exactly the
same data
- no "cleaning" of data was performed
- no special form of data was allowed
- no unusual method of measuring
classification accuracy was used
- no domain dependent parameters were allowed
- the number of data and tests was sufficient
(t-tests) to avoid chance choice
- results were strictly checked and verified
by many supervisors from the program
source level to the level of classification
trace.
accuracy as other statistical methods. In our
measurements some of the EL systems,
especially those vrithout special mechanisms
for noisy domains, gave unexpectedly poor
performance compared to the results published
by the originators of algorithms. Since our
iraplementations of those systems were the
same as published, several possible
explanations remain. It might be that actual
implementations use some unpublished extra
features, maybe the domains used for testing
were especially suitable for specific
algocithms etc.
The authors of this article 'also find
questionable comparing between the system and
the expert, since we regard EL systems mainly
as a helping tool and not as a stand-alone
progcam. The other reason is that fair
comparison between machine and human is
extremely difficult. The correct comparison
should be (system + user) : user.
In most complex realistic domains mechanisms
for dealing with noise are of greatest
importance as independently discovered in
[Breiman et al; Kononenko) and it is not
realistic to achieve even tolerable results
without them [Kononenko; GamsJ.
6. BAYESIAN CLASSIFIER
6.1. THEORETICAL FOUNDATIONS
The concept of the Bayes rule is one of the
most important concepts in the field of
classification and also learning. For the
data dravm from a probability distribution
P(A,j), the most accucate rule can be given
in the terms of P(A,J) and this tule is
called the Bayes rule. It is norraally denoted
by dB(x).
Precisely, suppose that (x,y), x « X, y« Y
is a random sample fcora the probability
distribution P(A,j) on X x C, i. e., P(xe A,
y=j)-P(A,j). Then we define dB(x) as the
Bayes rule if for any other classifier d(x),
P{dB{z)tc{x))<P(d{x)tc{z))
Let us assume that X is N-dimensional
euclidean space and for every j,
j-1, ... ,J, P(A/j) has the probability
density fj(x) and for sets AC X
Then we can prove the following theorem:
5.4. DISCUSSION ABOUT RESULTS
Some of the systems for empirical learning
achieved good results in practical testing in
several real life domains, practically
approaching or even outperforming domain
experts and statistical methods [Kononenko;
Hichalski, Chilausky; Breiman et al; Gams]).
Moce acceptable is the opinion [Breiman et
al], that although all methods are moce or
less domain dependent, EL systems in general
achieve about the same classification
where J is the number of classes and P(j) is
the prior probability of the class j. The
proof can be found in [Breiman et al].
In practice, neither the P(j) nor the fj(x)
are known. The three most common
classification procedures, used to
approximate the Bayes rule by using the
leacning saraple data, are discriminant
analysis, kernel density estimation and k-th
nearest neighbour. Accuracy of two of them
have been compared with the results of
Ginesys on both domains.
15
6.2. PRACTICAL IMPLEMENTATIONS 6.3. A PRACTICAL EXAMPLE
The k-th nearest neighbour method [Fix,
Hodges] was implemented as simple as
possible. The algorithm searches through the
set of learning examples and determinates
distance between learning and test example as
the number of mismatches in their attribute
values. Test example is then classified by
its first nearest neighbour, and if there are
more equally distant neighbour, the last one
found is picked for classification. it is so
called Nearest-neighbour classifier
[Batcheles 1974]. Although it is so
primitive, this method classifies test
examples with 72.9% average accuracy in the
domain 1, what is even better than GINESYS.
Hovever, in the domain 2, vrhich is far more
complex, the classification accuracy is only
40.4% what is considerably lower than that
df the other methods.
The following approximation of the Bayes rule
[Clark, Niblett; Kononenko] is one of the
most commonly used. In general, the rule is
formed as
At this point the assumption is made, that
all attributes are independent :
(1)
When classifying a new example we need to
evaluate formula (1). One practical solution
when dealing with categorical values is to
store all factors into a 3-dimensional table
TB[i,j,k] with the folloving indexes :
i - attribute index
j - attribute's value's index
k - class index
TB[i,j,k] is the number of examples in the
learning set with the properties, denoted by
index values.
When evaluating formula (1) during
classification of a new example, one of the
factocs can be 0. The result can be either 0
or undefined. The solution in the second case
is obvious - delete this attribute from the
formula. The same solution is sometimes used
when the cesult is 0.
In Table 2 GINESYS (vdthout domain dependent
parameters or other adjustments [Gams])
achieved higher classification accuracy than
the practical implementation of theoretically
optimal Bayesian classifier. The reason for
this must be in practical implementation,
especially in
a) appcoximation of probabilities from the
learning set,
b) assumption, that attcibutes are
independent,
c) practical solutions to numerical problems.
Problem (a) can be discarded, since all
systems [Gams] processed exactly the same
data. But it could be the case, that
ditferent classifiers (also different
implementations of Bayesian classifier) are
more and othec less sensible to the number
and distribution of input data.
Problems appearing during the evaluation of
formula (1) can be shovm in a simple example.
Let us try to classify examples el and e2
from data obtained from Table 1.
el -0000 0000
e2 -10111011
P{OK/A) = \
P(ERRORjel) = \
None of the two examples gives the sum of all
classes equal to 1. Even if we delete all
columns having 0 we obtain results like
» 2.6. Furthermore, in case of
e2 the probability for class OK is
than for the class ERROR, although
e2 is the same as example e5 from
, belonging to class ERROR. However
note that the nearest neighbour method would
classify correctly in this case.
P(OK/el)
example
greater
example
Table ]
A small number of examples is insufficient
for most statistical methods and also for
Bayesian classifier. in real measurements in
domain 1 and 2 the nutnber of examples was
always greater than one hundred and was
considered sufficient. Nevertheless these
counterexamples show that better
classification accuracy is possible.
7. ADJUSTING PARAMETERS
CLASSIFIER
IN BAYESIAN
Probabilities used in evaluating formula (1)
are appcoximated by pcior probabilities in
the learning set, what yields some error in
classification. The forraula is then evaluated
as follows
lli 7
(2)
where
ai is the number of examples of the class
c with the same value of the i-th
attribute ae the test example,
b is the number of exampl.es of the class
c,
ci is the number of all examples with the
same value of the i-th attribute as
the test example, and
d is the number o£ all learning
examples.
When dealing with noisy data, errors may
occur during evaluation of formula (2). Two
methods have been used to avoid this errors.
7.1. OMITTING THE UNRELIABLE FACTORS
The main idea of this. method is that the
accuracy of estimations in formula (2) grows
with the number o£ examples. Therefore, if b
16
oc d (only b in practice) is smaller than
parameter MINN, which is set before
evaluation, then the factor with this b is
omitted during evaluation of formula (2) and
the class probability is estimated by its
prior probability. Table 3 shows the results
in classification accuracy with different
values of MINN.
HINN
0
1
2
3
4
6
10
15
domain 1 domain 2
68.4
68.4
68.4
69.3
68.7
68.0
66.7
67.1
0.0)
11.1)
47.8)
52.2)
56.7)
60.0)
75.6)
77.8)
50.
50.
49.
49.
0.0)
4.5)
7 (31.8)
7 (32.3)
49.8 (33.6)
50.
48.
9 (47.7)
7 (60.5)
46.2 (72.7)
Table 3. Classification accuracy measured as
the percentage of correct guesses. Values in
brackets are percentages of classifications
with prior probability.
It is interesting that accuracy is almost
independent of the number of classifications
with prior probabilities and it decreases
only if we classify approximately 75% of
examples this way. Yet we can see that
accuracy on both domains reaches its maximum
when approximately 50% of classifications are
done by the prior pcobability of classes and
this maximal accucacy is near the accuracy
of GINESYS.
\
MINN
0
2
4
6
EPS 0.00
0.0
0.0
0.3
-0.4
0.01
0.3
1.4
1.2
0.0
0.05
2.7
1.1
0.9
-0.4
0.10
3.8
1.1
0.3
-0.4
Table 5. Increase of classification accuracy
by combination of both methods in domain 1
(basic accuracy 68.4%).
EPS 0.00 0.01 0.05 0.10
0.0
0.4
0.3
0.8
2.7
1.9
1.8
2.7
2.8
2.2
2.1
3.2
1.8
2.0
2.0
3.2
\
MINN
0
2
4
6
Table 6. Increase of classification accuracy
by combination of both methods in domain 2
(basic accuracy 50.1%).
The accuracy of GINESYS is in both cases
exceeded by more than 1% what is also the
difference between basic Bayesian classifiec
and GINESYS. Yet there is a problem which
combination of EPS and MINN values to choose
and how far this decision is domain
independent. Therefore, GINESYS can still be
considered to reach the practical upper bound
of classification accuracy.
7.2. ADJUSTING ZERO FACTORS
8. EXTERNAL RULE
CLASSIFIER
DIRECTED BAYESIAN
A problem occurs, what to do when ai in the
formula (2) is 0. One solution is to omit
this factor from the formula. Another idea
could be to set al to a very small number EPS
in such case. The idea is that this zero is
the result of domain noise and, with more
learning examples, we would sooner or later
find such example and therefore we made
almost no mistake and we also don't lose the
information contained in the distribution of
other attributes. The results of this method
are shovm in Table 4.
EPS :
0.00
0.01
0.05
0.10
0.50
1.00
Table 4. Classification accuracy achieved by
adjusting zero factors in formula (2) by some
small value EPS.
In both cases this method achieves accuracy
higher than GINESYS. But, rapid dcop o£
accuracy also shows that this method is very
sensible to the value of EPS. Vfhenever we set
zero factor to some value different from 0 we
introduce an error into the evaluation of
formula (2) and if the value of EPS is too
big the cesults are no more reliable at all.
domain 1
68.4
68.7
71.1
72.2
24.2
2.2
domain
50.1
52.8
52.9
51.9
27.7
7.5
Bayesian classifier itself does not derive
any explicit rules and therefore rules
generated by some other system (in our case
GINESYS) can be used to control the
evaluation of formula (1). Two such methods
have been tested. The idea of the first one
is that any (moce or less successful) rule
denotes a complex of attributes which are
logically connected and therefore a deviation
from the optimal Bayesian classifier is
somewhat corrected. The second one is an
attempt to cross GINESYS and Bayes together
to yield better results.
8.1. CLASSIFICATION
ATTRIBUTES
WITH IMPORTANT
This raethod uses rules, generated by GINESYS.
During evaluation of formula (1) the rule
which matches the current example is searched
for. If it is found, we calculate adequate
probabilities by searching through the table
for entire complex and not by decomposing
the attribute complex to basic attributes. On
the other hand, if the matching rule is
not found, undisturbed evaluation of formula
(1) follows. The results are shown in
Table 7.
Measurements show that introducing of
externally generated rules into Bayesian
classifier only slightly disturbs its
classification accuracy.
7.3. COMBINATION OF BOTH METHODS
In this case, both methods described in 7.1.
and 7.2. are used togethec. First we look
vhethec b and d are bigger than MINN and than
we set zero factors in formula (2) to EPS
vrhere needed. The results of measurements on
both domains are shown in Table 5 and Table
6.
8.2. CLASSIFICATION
ATTRIBUTES ONLY
WITH IMPORTANT
The main idea of this method is to use
impoctant attcibute complexes in
classification if possible. For each example
classified we first search for the matching
GINESYS rule. If such one is found, it is
used for classification. If not (when only
Null rule of GINESYS is found), the
17
classification is carried out by formula
(1). The results are shown in Table 7.
Bayes
Rule Directed
Important Attr
Bayes
. & Bayes
domain 1
68.
68.
69.
.4
,4
.3
domain 2
50.
49.
47.
.1
,2
.9
Table 7. Classification accuracy measured as
the percentage of correct guesses.
9. COMPARISON BETWEEN EL
STATISTICAL CLASSIFIERS
SYSTEMS AND
Let us summarize conclusions from previous
paragcaphs:
It is possible to further improve
classification accuracy of
implementations of Bayesian classifier,
even to overpass best results of
GINESYS.
- Among compared methods without domain-
independent parameters GINESYS
performs best and is very close to the
practical limit . in classification
accuracies in measured domains.
Statistical classifiers are basically unable
to pecform explanation during classification
and to build a human understandable knowledge
base. Besides these, other disadvantages can
be pointed out [Breiman et al; GamsJ:
- they can not deal with domains with small
number of learning examples;
- it is difficult to deal with unusual
situations (deleting by 0, unknown values,
)
- their results vary according
suitability of problem domain.
to the
It is only fair to notice that more advanced
statistical methods eliminate some of these
disadvantages. However these properties
remain basically unchanged.
Another reason for so good results of EL
6ystems like GINESYS compared to statistical
methods is shown in Figure 1. Real-life
complex domains probably contain logical laws
which cover greater areas regardless of noise
in given examples. On the contrary
statistical methods depend on variations of
probability distribution. In Figure 1 the
correct probability distribution for classes
1 and 2 is presented by bold lines. Dotted
lines represent probability
obtained from given examples.
the fact that probability
is more sensible to chance
possible that dotted lines 1
causing incorrect classification.
distribution,
Because of
distribution
choice it is
and 2 overlap
Figure 1: A graphical representation of one of
the possible reasons why GINESYS performs so
well compared to statistical methods.
10. CONCLUSION AND DISCUSSION
oidec systems for empirical learning (EL)
outperformed the statistical methods
from the point of explanation during
classification and possibility of building a
human understandable knowledge base. While it
was in some cases reported that older EL
systems outperformed statistical methods as
well as domain experts this opinion is not
undoubtedly shared with the authors of this
article. More acceptable is the conclusion
[Breiman et al], that the best EL systems
achieve about the same classification
accuracy as statistical methods. Nevertheless
it seems that the new breed of EL systems
with GINESYS as one of the most promising
representatives outperforms statistical
methods even in classification accuracy (at
least in so far measured domains).
ACKNOWLEDGES
We are grateful for suggestions to prof. Ivan
Bratko. Marjan Petkovsek provided
mathematical background for our analysis.
Students Izidor Jerebic, Borut Znidar, Aram
Karalic and Darko Zupanic were of great help
in programming tasks. Igor Kononenko and the
Oncological Institute Ljubljana enabled us to
do the measurements on real-world domains.
This work was partly supported by the Slovene
Research Council and COST-13. Research
facilities were provided by the "Jozef
Stefan" Institute.
REFERENCES
Breiman L., Friedman J.H., Olshen R.A., Stone
C.J. (1984): "Classification and Regression
Trees", Wardsworth Int. Group.
Carbonell J.G., Michalski R.S., Mitchell T.M.
(1983): "An Overview of Machine Learning", in
Michalski R.S., Carbonell J.G., Mitchell T.M.
(ed.), Machine Learning: an Artificial
Intelligence Approach, Tioga Publishing.
Cestnik B., Kononenko I., Bratko I. (1987):
"ASSISTANT 86: A Knowledge Elecitation Tool
For Sophisticated Users", Progress in Machine
Learning, Sigma Press.
Clark P., Niblett P. (1987): "Induction in
Noisy Domains", Progress in Machine Learning,
SIGMA Press.
Kononenko I. (1985): "Razvoj sistema za
induktivno učenje ASISTENT", magistrsko delo,
Fakulteta za eletrotehniko, Ljubljana.
Kononenko I. (1985): "Strukturno avtomatsko
ucenje", Informatica 3, str. 44 - 56.
Gams M. (1987): "Principi poenostavljanja v
sistemih za avtomatsko ucenje", doktorska
disertacija, Ljubljana.
Gams M., Lavrač N. (1987): "Review of Five
Empirical Leaning Systems Within a Proposed
Schemata", Progress in Machine Learning, (ed.
Bratko•I.> Lavrac N.), Sigma Press.
LavraČ N., Varšek A., Gams M., Kononenko I.,
Bratko I. (1986): "Automatic construction of
the knowledge base for a steel classification
expert system", The 6th International
Workshop on Expert Systems, Avignon.
Michalski R.S. (1987): "Machine Learning",
IJCAI 1987, Milano.
R.S., Chilausky L.R. (1980):
by Being Told and Learning from
an Experimental Comparison for
Soybean Disease Diagnosis", Policy Analysis
and Information Systems, Vol. 4, No 2.
Quinlan J.R. (1986): "induction of Decision
Trees", AI Summer Seminar, Dubrovnik.
Waterraan D.A., Hayes-Roth F. (ed.) (1977):
"Pattern-Directed Inference Systems",
Academic Press.
Winston H.P. (1984): "Artificial
Intelligence", Addison-Wesley.
Tutorial 6,
Hichalski
"Learning
Examples: