University of Surrey
School of ECM
University of Surrey
Guildford, Surrey
GU2 5XH, UK

Tel: +44 (0)1483 259823
Fax: +44 (0)1483 876051

 

Introduction

Decision Trees

Identification Tree

Order and Disorder in a decision tree

An example from gastronomy

Computing Order and Disorder

Procedure sprouter for generating identification trees

Procedure pruner for converting an identification tree into rule sets

Concept Learning: Inductive Learning

 

 

 

 

Introduction

The specification, design and implementation of computer programs for specific learning tasks primarily by using the data structures (knowledge representation schemata) and algorithms (inference strategies and control) developed for a variety of problem solving tasks (diagnosis, interpretation, classification etc.).

Learning processes include the acquisition of new declarative knowledge, the development of motor and cognitive skills through instruction; the organization of new knowledge into general, effective representations; and
the discovery of new facts and theories through observation and experimentation

 


 

Decision Trees

The literature on learning frequently refers to decision trees.

A decision tree is a semantic network in which each node is connected to a set of possible answers. Each nonleaf node is connected to a test that splits its set of possible answers into subsets corresponds to different test results. Each branchcarroes a particular test result's subset to another node.

 


 

Identification Tree

An identification tree is a representation. That is a decision tree in which each set of possible conclusions is established implicitly by a list of samples of known class. Occam's razor specialised to identification trees: the world is inherently simple. Therefore the smallest identification tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly.

 

Identification-tree building is the most widely used learning method. Thousands of practical identification trees, for applications ranging from medical diagnosis to process control, have been built using the idea.

 


 

Order and Disorder in a decision tree

For a real database of any size, it is unlikely that any test would produce even one completely homogeneous subset. Accordingly, for real databases, one needs a powerful way to measure the total disorder, or inhomogeneity, in the subsets produced by each test. Information theory can be used to compute a measure disorder:

Average disorder = Sb (nb/nt) x Sc (nbc/nt) log2 (nbc/nb)

 

where

nb is the number of samples in branch b,

 

nt is the total number of samples in all branches,

 

nbc is the total of samples in branch b of class c.

 

To see why this borrowed formula works, one needs to focus on the set of samples lying at the end of one branch b. What is required here is a formula involving nb and nbc that gives a high number when a test produces highly inhomogeneous sets and a low number when a test produces completely homogeneous sets. The following formula involving nb and nbc generally works:

 

Disorder = Sc (nbc/nt) log2 (nbc/nb)

 


 

An example from gastronomy

The following are the observations of an astute gastronome about the reaction of a group of diners of different ages, weight and height, enjoying chilli peppers in a Mexican restaurant. Mexicans obviously enjoy their national dish, but non-Mexicans appear to have difficulty with the dish:

 

Name

Age Group

Height

Weight

Mexican?

Likes

Chilli

peppers?

Diana

Middle Aged

Average

Light

No

No

Josè

Middle Aged

Tall

Average

Yes

Yes

Zapata

Youth

Short

Average

Yes

Yes

Charles

Middle Aged

Short

Average

No

No

Philip

Senior Citizen

Average

Heavy

No

No

Zara

Youth

Tall

Heavy

No

Yes

April

Youth

Average

Heavy

No

Yes

Pablo

Middle Aged

Short

Light

Yes

Yes

 

 


 

An dentification trees based on age group, height, weight and nationality (whether or not the indvidual is Mexican) on the above chilli pepper database, is shown below:

 

 

 


 

The question now is this: What happens when you only select middle aged people from the database? Once the middle-aged people are isolated, the available tests perform as follows:

 

The Mexican nationality test clearly does the best job of dividing the middle-aged set into homogeneous subsets.

 

 


Computing Order and Disorder

Now if we compute the average disorder produced by the age-group test and on the basis of the person's Mexican origin (Note that log2 (1/3) = -1.5849625, log2(2/3)=-0.5849625, log2 (2/5) = -1.3219281 and log2 (3/5)=-0.7369656), we will find that age and nationality test ensure proper identification for all the samples shown in the table above?

 

Average Disorder (Height) = 3/8 (-1/3 log2 (1/3) -2/3 log2 (2/3))

3/8 (-2/3 log2 (2/3) -1/3 log2 (1/3))

2/8 (-0 log2 (0) -2/2log2 (2/2))

= 2* 3/8 (0.52832) = 0.69

 

Average Disorder (Weight) = 2/8 (-1/2 log2 (1/2) -1/2log2 (1/2))

3/8 (-1/3 log2 (1/3) -2/3 log2 (2/3))

3/8 (-2/3 log2 (2/3) -1/3 log2 (1/3))

= 2/8*1 + 2* 3/8 (0.52832) = 0.94

 

Average Disorder (Nationality) = 5/8 (-3/5 log2 (3/5) -2/5log2 (2/5))

3/8 (-0 log2 (0) -3/3 log2 (3/3))

= 0.625 * (0.52877124 + 0.44218936) =0.60684412 = 0.61

 

Average Disorder (Age Group) = 4/8 (-2/4 log2 (2/4) -2/4 log2 (2/4))

1/8 (-1/1 log2 (1/1) -0 log2 (0))

3/8 (-3/3 log2 (3/3) -0 log2 (0))

= 4/8 (0.5+0.5) = 0.5

 

Test

Disorder

Age Group

0.5

Nationality

0.61

Height

0.69

Weight

0.94

 

The age group and nationality test are by far the best discrminators in that their use together will ensure the proper identification of all the samples. Now if we focus exclusively on the middle aged people we get the following results:

 

Test

Disorder

Nationality

0

Height

0.5

Weight

1

 

 


 

Procedure sprouter for generating identification trees

This procedure helps in the generation of of an identification based on the computation of disorder introduced by each of the concepts:

 

  • Until each leaf node is populated by as homogeneous a sample set as possible:

  • Select a leaf node with an inhomogenous sample set

  • Replace that leaf node by a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to some measure of disorder.

 


Procedure pruner for converting an identification tree into rule sets

This procedure helps in the conversion of an identification tree into a rule set:

 

  • Create one rule for each root-to-leaf path in the identification tree

  • Simplify each rule by discarding antecedants that have no effect on the conclusion reached by the rule

  • Replace those rules that share the most common consequent by a default rule that is triggered when no other rule is triggered. In the event of a tie, use some heuristic tie-breaker.

 

 


 

Now if we are asked to consruct an identification tree for determining peoples appetite for chilli peppers, we will come up with:

 

 

The following rule set can be derived using PRUNER:

If age_group is middle_aged

and nationality is not Mexican

then the person does_not_like chilli pepper

 

If age_group is senior_citizen

then the person does_not_like chilli pepper

 

If no other rule applies

then the person likes chilli pepper

 


Concept Learning: Inductive Learning

Inductive Learning is sometimes described as a learning that is akin to a heuristic search through a space of symbolic descriptions, generated by an application of various inference rules to the initial observational statements, The inference rules include generalisation rules , which perform generalising transformatins on descriptions, and conventional truth-preserving deductive rules.

Human beings have this capcity ot discover regularities, some would say patterns, in apparently diverse, interacting and noisy collections of observations. There are instances when human beings are guided by a teacher or instructor, who provides facts, and the pupil inductively infers new facts from the given. This is inductive learning.

The key output of inductive learning is a set of production rules.

 

Arches, Non-arches and non-conventional arches

An arch is conventionally defined as one brick lying on top of, and supported by, of two other bricks with the proviso that the supporting bricks do not touch each other.