
Mitchell Ch. 2 Concept Learning

  1. In machine learning terms, what is meant by a concept?

    For machine learning purposes, a concept is a boolean function defined on a specified domain.

  2. What is concept learning?

    Concept learning is the process of inferring a boolean function from a set of training pairs, D, each pair consisting of an input and the corresponding boolean output.

  3. What is meant by a hypothesis universe (or space) of a concept?

    The hypothesis universe for the concept is the power set of the domain, I, of its boolean function.

  4. What is meant by a hypothesis for a concept?

    A hypothesis for the concept is any element of the hypothesis universe, that is, any set of inputs to the function.

  5. Explain the following hypotheses for the “Aldo favorite sport” problem, using the hypothesis space of conjunctions.

    • (0,0,0,0,0,0) // hypothesis that there are no days on which Aldo enjoys his favorite sport
    • (0,Cold,0,0,0,0) // same hypothesis as above since, if any position has a 0, no choice of inputs can satisfy the hypothesis
    • (?,?,?,?,?,?) // hypothesis that Aldo enjoys his favorite sport on all days
    • (?,Cold,High,?,?,?) // hypothesis that Aldo enjoys his favorite sport precisely on Cold days with High humidity, no matter what the status of the sky, wind, water, or forecast
    • (?,?,Normal,?,?,Change) // hypothesis that Aldo enjoys his favorite sport precisely on days with Normal humidity and a Change forecast, no matter what the status of the sky, temperature, wind, or water
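
    These readings can be made mechanical. As a minimal sketch (an illustration, not Mitchell's code), a conjunction hypothesis is just a matcher over attribute tuples, with 0 meaning "no value accepted" and ? meaning "any value accepted":

      def matches(h, x):
          """True iff instance x satisfies conjunctive hypothesis h."""
          return all(a != '0' and (a == '?' or a == v)
                     for a, v in zip(h, x))

      matches(('?', 'Cold', 'High', '?', '?', '?'),
              ('Sunny', 'Cold', 'High', 'Strong', 'Warm', 'Same'))  # True
      matches(('0',) * 6,
              ('Sunny', 'Cold', 'High', 'Strong', 'Warm', 'Same'))  # False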
  6. In the terminology of concept learning, explain the meaning of:

     h2 ≥g h1
     h2 >g h1

    h2 ≥g h1 means that h2 is more general than or equal to h1: every instance that satisfies h1 also satisfies h2 (as sets of instances, h1 is a subset of h2). h2 >g h1 means that h2 is strictly more general than h1: h2 ≥g h1 and h1 is not ≥g h2.
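
    As a sketch, the relation can be tested directly for the conjunctive representation (the 0-handling follows the reading in question 5; an illustration, not Mitchell's code):

      def matches_nothing(h):
          return '0' in h             # any 0 makes the hypothesis unsatisfiable

      def more_general_or_equal(h2, h1):
          """h2 >=_g h1: every instance satisfying h1 also satisfies h2."""
          if matches_nothing(h1):
              return True             # h1 covers nothing, trivially covered
          if matches_nothing(h2):
              return False            # h2 covers nothing but h1 covers something
          return all(a2 == '?' or a2 == a1 for a2, a1 in zip(h2, h1))

      def strictly_more_general(h2, h1):
          """h2 >_g h1."""
          return (more_general_or_equal(h2, h1)
                  and not more_general_or_equal(h1, h2))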
    
  7. Is (?,Cold,High,?,?,?) ≥g (0,Cold,0,0,0,0)? Explain.

    Yes. (0,Cold,0,0,0,0) contains a 0, so it matches no instances at all; since the empty set is a subset of the instances matched by any hypothesis, every hypothesis, including (?,Cold,High,?,?,?), is ≥g it.

  8. Is (0,Cold,0,0,0,0) ≥g (?,Cold,High,?,?,?)? Explain.

    No. (?,Cold,High,?,?,?) matches some instances (any Cold day with High humidity), while (0,Cold,0,0,0,0) matches none, so it cannot cover everything the other hypothesis covers.

  9. Is (0,Cold,Normal,0,0,0) ≥g (?,Cold,High,?,?,?)? Explain.

    No, for the same reason as in question 8: (0,Cold,Normal,0,0,0) contains 0s, so it matches no instances, while (?,Cold,High,?,?,?) matches some.

  10. Is (?,Cold,High,?,?,?) ≥g (0,Cold,Normal,0,0,0)? Explain.

    Yes. (0,Cold,Normal,0,0,0) matches no instances, so, as in question 7, any hypothesis is ≥g it.

  11. Give the inductive learning hypothesis.

    Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate it well over other unobserved examples.

  12. Give a pseudo-code description of the Find-S algorithm for conjunctions.

    Start with the most specific hypothesis in the space (every constraint 0). For each positive example in D, minimally generalize each attribute constraint that the example violates so that the hypothesis covers the example; ignore negative examples.
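
    As a sketch of this description in Python (assuming instances are tuples of attribute-value strings and D is a list of (instance, label) pairs; the function name is illustrative):

      def find_s(D, n_attrs):
          """Find-S: the maximally specific conjunctive hypothesis
          consistent with the positive examples in D."""
          h = ('0',) * n_attrs              # most specific hypothesis
          for x, positive in D:
              if not positive:
                  continue                  # Find-S ignores negative examples
              if '0' in h:
                  h = tuple(x)              # first positive example: adopt it
              else:
                  # keep constraints the example satisfies; relax the rest to ?
                  h = tuple(a if a == v else '?' for a, v in zip(h, x))
          return h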

  13. Apply the Find-S algorithm to a simple data set.

    Previous hypothesis   Next positive example   New hypothesis
    (0,0,0,0,0,0)         (S,W,N,S,W,S)           (S,W,N,S,W,S)
    (S,W,N,S,W,S)         (S,C,H,S,W,S)           (S,?,?,S,W,S)
    (S,?,?,S,W,S)         (S,W,H,S,C,S)           (S,?,?,S,?,S)
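
    Running the find_s sketch from question 12 on these three positive examples reproduces the table row by row:

      D = [(('S','W','N','S','W','S'), True),
           (('S','C','H','S','W','S'), True),
           (('S','W','H','S','C','S'), True)]
      find_s(D, 6)                          # ('S', '?', '?', 'S', '?', 'S')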
  14. What does it mean to say that hypothesis h is consistent with training set, D?

    h(x) = c(x) for every training example ⟨x, c(x)⟩ in D; that is, h agrees with the recorded label on every example in D.

  15. What is the version space, VSH,D?

    The set of all hypotheses in hypothesis space, H, that are consistent with data set D.
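
    As a sketch (reusing matches from question 5, and practical only when H is small enough to enumerate):

      def consistent(h, D):
          """h agrees with the recorded label on every example in D."""
          return all(matches(h, x) == label for x, label in D)

      def version_space(H, D):
          return [h for h in H if consistent(h, D)]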

  16. What does it mean to say that hypothesis, g, is maximally general in VSH,D?

    If g is a proper subset of hypothesis h, then h is not in VSH,D (because h covers a negative example). This is the same as saying that making g even slightly more general would include a negative example, which is not acceptable.

  17. What does it mean to say that hypothesis, s, is maximally specific in VSH,D?

    VSH,D contains no hypotheses that are proper subsets of s (because all proper subsets are missing a positive example).

  18. In the Candidate-Elimination algorithm, what does it mean to minimally moderate hypothesis, s, that is too specific for positive instance d?

    Replace s in S by all s', the minimal generalizations of s that include d, such that each s' is a subset of some element of G.

  19. In the Candidate-Elimination algorithm, what does it mean to minimally moderate hypothesis, g, that is too general for negative instance d?

    Replace g in G by all g', the minimal specializations of g that exclude d, such that each g' includes some element of S as a subset.
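
    For the conjunctive representation, the two moderation steps can be sketched as below (values maps each attribute position to its possible values; the boundary checks against G and S described above are omitted):

      def min_generalize(s, x):
          """Minimal generalization of s that includes instance x."""
          if '0' in s:
              return tuple(x)               # from the empty hypothesis, adopt x
          return tuple(a if a == v else '?' for a, v in zip(s, x))

      def min_specialize(g, x, values):
          """All minimal specializations of g that exclude instance x."""
          specs = []
          for i, a in enumerate(g):
              if a == '?':                  # only ? positions can be tightened
                  for v in values[i]:
                      if v != x[i]:
                          specs.append(g[:i] + (v,) + g[i + 1:])
          return specs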

  20. What does the Candidate-Elimination algorithm do if positive instance, d, is inconsistent with a hypothesis, h, in S or G?

    When hypothesis h is inconsistent with positive example d, then h is too specific, so if h∈G, eliminate h, and if h∈S, moderate h to include d.

  21. What does the Candidate-Elimination algorithm do if negative instance, d, is inconsistent with a hypothesis, h, in S or G?

    When hypothesis h is inconsistent with negative example d, then h is too general, so if h∈S, eliminate h, and if h∈G, moderate h to exclude d.

  22. Given S, G, and a training example, provide the next S and G for the Candidate-Elimination algorithm using a conjunction hypothesis space.

    Origin   Manufacturer   Color   Decade   Type      Example Type
    Japan    Honda          Blue    1980     Economy   Positive
    Japan    Toyota         Green   1970     Sports    Negative
    Japan    Toyota         Blue    1990     Economy   Positive
    USA      Chrysler       Red     1980     Economy   Negative
    Japan    Honda          White   1980     Economy   Positive

    Solution:

    1. Positive Example: (Japan, Honda, Blue, 1980, Economy)

      Initialize G to a singleton set that includes everything.
      Initialize S to a singleton set that includes the first positive example.
      G = { (?, ?, ?, ?, ?) }
      S = { (Japan, Honda, Blue, 1980, Economy) }

      These represent the most general and the most specific hypotheses one might learn.
      The actual concept to be learned, Japanese Economy Car, probably lies between them somewhere within the version space.

    2. Negative Example: (Japan, Toyota, Green, 1970, Sports)

      Specialize G to exclude the negative example.

       G = {
           (?, Honda, ?, ?, ?),
           (?, ?, Blue, ?, ?),
           (?, ?, ?, 1980, ?),
           (?, ?, ?, ?, Economy)
       }
      
       S = { (Japan, Honda, Blue, 1980, Economy) }
      

      Refinement occurs by generalizing S or specializing G, until the version space, with luck, converges to a hypothesis that works well.

    3. Positive Example: (Japan, Toyota, Blue, 1990, Economy)

      Prune G to exclude descriptions inconsistent with the positive example.

      Generalize S to include the positive example.

       G =  {
           (?, ?, Blue, ?, ?),  
           (?, ?, ?, ?, Economy)
       }
      
       S = { (Japan, ?, Blue, ?, Economy) }
      

    4. Negative Example: (USA, Chrysler, Red, 1980, Economy)

      Specialize G to exclude the negative example (but stay consistent with S)

       G = {
           (?, ?, Blue, ?, ?),  
           (Japan, ?, ?, ?, Economy)
       }
      
       S = { (Japan, ?, Blue, ?, Economy) }
      

    5. Positive Example: (Japan, Honda, White, 1980, Economy)

      Prune G to exclude descriptions inconsistent with positive example.
      Generalize S to include positive example.

       G = { (Japan, ?, ?, ?, Economy) }  
       S = { (Japan, ?, ?, ?, Economy) }
      

      G and S are singleton sets and S = G.
      Converged.
      No more data, so algorithm stops.
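
      As a quick check (a sketch using the matcher from question 5), the converged hypothesis classifies all five training examples correctly:

        h = ('Japan', '?', '?', '?', 'Economy')

        examples = [
            (('Japan', 'Honda',    'Blue',  '1980', 'Economy'), True),
            (('Japan', 'Toyota',   'Green', '1970', 'Sports'),  False),
            (('Japan', 'Toyota',   'Blue',  '1990', 'Economy'), True),
            (('USA',   'Chrysler', 'Red',   '1980', 'Economy'), False),
            (('Japan', 'Honda',    'White', '1980', 'Economy'), True),
        ]

        for x, label in examples:
            assert matches(h, x) == label   # holds for every row of the table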

  23. Under what circumstances does the Candidate-Elimination algorithm converge and to what?

    It converges to a version space (not necessarily a unique hypothesis) that is consistent with all examples presented, but only when there are no training-data errors and when the hypothesis space contains the target function.

  24. Describe Mitchell’s classification rules for a partially learned concept using candidate elimination.

    Classify as positive any example satisfying every hypothesis in S. Classify as negative any example rejected by every hypothesis in G. Do not classify anything else.
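
    In sketch form, with S and G as collections of conjunctive hypotheses and matches as in question 5:

      def classify(x, S, G):
          if all(matches(s, x) for s in S):
              return True                   # satisfies every hypothesis in S
          if not any(matches(g, x) for g in G):
              return False                  # rejected by every hypothesis in G
          return None                       # version space is split; don't classify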

  25. What is the weakness of using an unbiased hypothesis space?

    It has no ability to generalize: with no bias, the version space constrains only the training examples themselves, so the hypotheses' votes are evenly split on every unseen instance.

  26. What is the inductive bias for the rote-learner?

    None (it makes predictions only for the training data, so it is always correct).

  27. What is the inductive bias for the candidate-elimination?

    The hypothesis space contains the target concept (it makes predictions only when all hypotheses in the version space agree).

  28. What is the inductive bias for the Find-S?

    Find-S learns only from positive examples and ignores the negative examples. Another way of saying this: there are no bad positive examples, and all positive instances are provided.