Mitchell, Ch. 3
- Give a non-empty set that has entropy 0.
- Give a non-empty set that has entropy 1.
- Compute: Entropy[17+,5-] (complete expression only, no need for calculator).
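A sketch of the kind of expression being asked for, assuming 17 positives and 5 negatives out of 22 examples:
  Entropy([17+,5-]) = -(17/22)log2(17/22) - (5/22)log2(5/22)
A quick Python check (the entropy2 helper is mine, not from the text):

    # Two-class entropy from positive/negative counts.
    from math import log2

    def entropy2(p, n):
        total = p + n
        # Skip empty classes, since 0*log2(0) is taken to be 0.
        return -sum(c / total * log2(c / total) for c in (p, n) if c > 0)

    print(entropy2(17, 5))   # about 0.773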
- Give the formula for the information gain, Gain(S,A), that results from decomposing a set S on a given attribute A.
- Given the entropy of a set S, calculate Gain(S,A), the gain that results from decomposing S on a given attribute A.
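For reference, the formula from the chapter, written in plain text:
  Gain(S,A) = Entropy(S) - sum over v in Values(A) of (|Sv|/|S|) * Entropy(Sv)
where Values(A) is the set of possible values of attribute A and Sv is the subset of S for which A has value v.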
- For the data on text page 59, use variables, the function log2, and the fact that Sunny has entropy .970 to compute a precise expression that evaluates to Gain(Sunny,Wind).
Wind when Sunny:
- Weak: 2N, 1Y, proportion = 3/5
  entropyWeak = -(2/3)log2(2/3) - (1/3)log2(1/3) = .9183
- Strong: 1N, 1Y, proportion = 2/5
  entropyStrong = -(1/2)log2(1/2) - (1/2)log2(1/2) = 1
- Gain(Sunny,Wind) = .97 - (3/5)(entropyWeak) - (2/5)(entropyStrong)
  = .97 - (3/5)(.9183) - (2/5)(1) = .97 - .55098 - .4 = .01902
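A quick Python check of this arithmetic (a sketch; the entropy helper below is mine, not from the text):

    # Verify Gain(Sunny, Wind) for the Sunny subset of the page-59 data.
    from math import log2

    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * log2(c / total) for c in counts if c > 0)

    sunny = entropy([2, 3])                           # 2 Yes, 3 No -> about .971
    weak, strong = entropy([1, 2]), entropy([1, 1])   # Wind=Weak: 1Y, 2N; Wind=Strong: 1Y, 1N
    gain = sunny - (3/5) * weak - (2/5) * strong
    print(round(gain, 3))   # prints 0.02; the .019 above comes from rounding the Sunny entropy to .970 first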
- Describe the inductive bias of ID3.
shorter trees are preferred over longer ones, and trees that place attributes with high information gain close to the root are preferred
- Calculate the entropy for a given variable that has more than two values (like V = {1,2,3,2,1,1}).
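A sketch of the multi-valued case in Python (the entropy_of helper is mine), using the example V above:

    # Entropy of a multi-valued variable: sum over every distinct value, not just +/-.
    from collections import Counter
    from math import log2

    def entropy_of(values):
        counts = Counter(values)
        total = len(values)
        return -sum(c / total * log2(c / total) for c in counts.values())

    # -(3/6)log2(3/6) - (2/6)log2(2/6) - (1/6)log2(1/6), about 1.459
    print(entropy_of([1, 2, 3, 2, 1, 1]))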
- Describe approaches to avoiding overfitting in decision trees (p68).
- What is pruning of a decision tree node?
- How does a trainer determine that it is desirable to prune a node?
- Give an example of a decision-tree rule.
- Give rules for the four leaves in the decision tree on p53.
- Describe the components of a decision-tree rule, defining the terms if portion, antecedent, then portion, and postcondition.
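For the form, an example rule of the kind used in the chapter (corresponding to the Sunny, high-humidity leaf of the p53 tree):
  IF (Outlook = Sunny) AND (Humidity = High) THEN PlayTennis = No
The if portion, (Outlook = Sunny) AND (Humidity = High), is the antecedent (precondition); the then portion, PlayTennis = No, is the postcondition.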
- List the steps of Rule Post-Pruning (p71).
- Explain why rule post-pruning is more powerful than node post-pruning (p72).
(Pruning a rule can remove a single path through a node rather than the entire node; a node can be removed, even near the root, without restructuring the tree; and the resulting rules are easier to read.)