|
School of ECM University of Surrey Guildford, Surrey GU2 5XH, UK |
Tel: +44 (0)1483 259823 Fax: +44 (0)1483 876051 |
Order and Disorder in a decision tree
Procedure sprouter for generating identification trees
Procedure pruner for converting an identification tree into rule sets
The specification, design and implementation of computer programs for specific learning tasks primarily by using the data structures (knowledge representation schemata) and algorithms (inference strategies and control) developed for a variety of problem solving tasks (diagnosis, interpretation, classification etc.).
Learning processes include the acquisition of new declarative knowledge, the development of motor and cognitive skills through instruction; the organization of new knowledge into general, effective representations; and
the discovery of new facts and theories through observation and experimentation
The literature on learning frequently refers to decision trees.
A decision tree is a semantic network in which each node is connected to a set of possible answers. Each nonleaf node is connected to a test that splits its set of possible answers into subsets corresponds to different test results. Each branchcarroes a particular test result's subset to another node.
An identification tree is a representation. That is a decision tree in which each set of possible conclusions is established implicitly by a list of samples of known class. Occam's razor specialised to identification trees: the world is inherently simple. Therefore the smallest identification tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly.
Identification-tree building is the most widely used learning method. Thousands of practical identification trees, for applications ranging from medical diagnosis to process control, have been built using the idea.
Order and Disorder in a decision tree
For a real database of any size, it is unlikely that any test would produce even one completely homogeneous subset. Accordingly, for real databases, one needs a powerful way to measure the total disorder, or inhomogeneity, in the subsets produced by each test. Information theory can be used to compute a measure disorder:
Average disorder = Sb (nb/nt) x Sc (nbc/nt) log2 (nbc/nb)
where
nb is the number of samples in branch b,
nt is the total number of samples in all branches,
nbc is the total of samples in branch b of class c.
To see why this borrowed formula works, one needs to focus on the set of samples lying at the end of one branch b. What is required here is a formula involving nb and nbc that gives a high number when a test produces highly inhomogeneous sets and a low number when a test produces completely homogeneous sets. The following formula involving nb and nbc generally works:
Disorder = Sc (nbc/nt) log2 (nbc/nb)
The following are the observations of an astute gastronome about the reaction of a group of diners of different ages, weight and height, enjoying chilli peppers in a Mexican restaurant. Mexicans obviously enjoy their national dish, but non-Mexicans appear to have difficulty with the dish:
|
Name |
Age Group |
Height |
Weight |
Mexican? |
Likes Chilli peppers? |
|
Diana |
Middle Aged |
Average |
Light |
No |
No |
|
Josè |
Middle Aged |
Tall |
Average |
Yes |
Yes |
|
Zapata |
Youth |
Short |
Average |
Yes |
Yes |
|
Charles |
Middle Aged |
Short |
Average |
No |
No |
|
Philip |
Senior Citizen |
Average |
Heavy |
No |
No |
|
Zara |
Youth |
Tall |
Heavy |
No |
Yes |
|
April |
Youth |
Average |
Heavy |
No |
Yes |
|
Pablo |
Middle Aged |
Short |
Light |
Yes |
Yes |
An dentification trees based on age group, height, weight and nationality (whether or not the indvidual is Mexican) on the above chilli pepper database, is shown below:

 
The question now is this: What happens when you only select middle aged people from the database? Once the middle-aged people are isolated, the available tests perform as follows:

The Mexican nationality test clearly does the best job of dividing the middle-aged set into homogeneous subsets.
Now if we compute the average disorder produced by the age-group test and on the basis of the person's Mexican origin (Note that log
2 (1/3) = -1.5849625, log2(2/3)=-0.5849625, log2 (2/5) = -1.3219281 and log2 (3/5)=-0.7369656), we will find that age and nationality test ensure proper identification for all the samples shown in the table above?
Average Disorder (Height) = 3/8 (-1/3 log
3/8 (-2/3 log
2 (2/3) -1/3 log2 (1/3))2/8 (-0 log
2 (0) -2/2log2 (2/2))= 2* 3/8 (0.52832) = 0.69
Average Disorder (Weight) = 2/8 (-1/2 log
2 (1/2) -1/2log2 (1/2))3/8 (-1/3 log
2 (1/3) -2/3 log2 (2/3))3/8 (-2/3 log
2 (2/3) -1/3 log2 (1/3))= 2/8*1 + 2* 3/8 (0.52832) = 0.94
Average Disorder (Nationality) = 5/8 (-3/5 log
2 (3/5) -2/5log2 (2/5))3/8 (-0 log
2 (0) -3/3 log2 (3/3))= 0.625 * (0.52877124 + 0.44218936) =
0.60684412 = 0.61
Average Disorder (Age Group) = 4/8 (-2/4 log
2 (2/4) -2/4 log2 (2/4))1/8 (-1/1 log
2 (1/1) -0 log2 (0))3/8 (-3/3 log
2 (3/3) -0 log2 (0))= 4/8 (0.5+0.5) = 0.5
|
Test |
Disorder |
|
Age Group |
0.5 |
|
Nationality |
0.61 |
|
Height |
0.69 |
|
Weight |
0.94 |
The age group and nationality test are by far the best discrminators in that their use together will ensure the proper identification of all the samples. Now if we focus exclusively on the middle aged people we get the following results:
|
Test |
Disorder |
|
Nationality |
0 |
|
Height |
0.5 |
|
Weight |
1 |
 
Procedure sprouter for generating identification trees
This procedure helps in the generation of of an identification based on the computation of disorder introduced by each of the concepts:
|
Procedure pruner for converting an identification tree into rule sets
This procedure helps in the conversion of an identification tree into a rule set:
 
|
Now if we are asked to consruct an identification tree for determining peoples appetite for chilli peppers, we will come up with:

The following rule set can be derived using PRUNER:
If age_group is middle_aged
and nationality is not Mexican
then the person does_not_like chilli pepper
If age_group is senior_citizen
then the person does_not_like chilli pepper
If no other rule applies
then the person likes chilli pepper
Concept Learning: Inductive Learning
Inductive Learning is sometimes described as a learning that is akin to a heuristic search through a space of symbolic descriptions, generated by an application of various inference rules to the initial observational statements, The inference rules include generalisation rules , which perform generalising transformatins on descriptions, and conventional truth-preserving deductive rules.
Human beings have this capcity ot discover regularities, some would say patterns, in apparently diverse, interacting and noisy collections of observations. There are instances when human beings are guided by a teacher or instructor, who provides facts, and the pupil inductively infers new facts from the given. This is inductive learning.
The key output of inductive learning is a set of production rules.
 
Arches, Non-arches and non-conventional arches
An arch is conventionally defined as one brick lying on top of, and supported by, of two other bricks with the proviso that the supporting bricks do not touch each other.