Mitchell, Ch. 6

  1. State Bayes Theorem. P(A|B) =...

  2. State Bayes Rule using the notation of Bayes Theorem (see problem 7 below). P(h|D) = ...

  3. Using the notation of Bayes Theorem, what is the prior probability (see problem 7 below)?

  4. Using the notation of Bayes Theorem, what is the posterior probability (see problem 7 below)?

  5. Using the notation of Bayes Theorem, what is the likelihood of h (see problem 7 below)?

  6. Using the notation of Bayes Theorem, what is the evidence for h (see problem 7 below)?

  7. Given part of this table, complete it.

    P(h|D)    = P(D|h) P(h) / P(D)
    posterior = likelihood prior / evidence
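
A quick numeric check of the completed table above, with made-up values for the likelihood, prior, and evidence (the numbers are purely illustrative):

    # Illustrative numbers only: P(D|h), P(h), and P(D) are assumed values.
    likelihood = 0.9    # P(D|h)
    prior = 0.3         # P(h)
    evidence = 0.5      # P(D)

    posterior = likelihood * prior / evidence    # P(h|D) = P(D|h) P(h) / P(D)
    print(round(posterior, 2))                   # 0.54
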
  8. Using the notation of Bayes Theorem, what is the MAP hypothesis, p. 157?

  9. Using the notation of Bayes Theorem, what is the difference between the MAP hypothesis and the ML hypothesis, p. 157?

  10. Given the needed probabilities, apply Bayes theorem to some data similar to the cancer test data. Showing all work, determine the actual probabilities of the hypotheses, p 158.
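
A hedged Python sketch for question 10, using illustrative numbers in the spirit of the textbook's cancer-test example (the prior and test accuracies below are assumptions, not necessarily the book's exact values):

    # Assumed illustrative values for a cancer-test-style problem.
    p_cancer = 0.008          # prior P(cancer)
    p_pos_cancer = 0.98       # P(+ | cancer)
    p_pos_healthy = 0.03      # P(+ | not cancer)

    # Unnormalized posteriors: P(h | +) is proportional to P(+ | h) P(h).
    u_cancer = p_pos_cancer * p_cancer
    u_healthy = p_pos_healthy * (1 - p_cancer)

    p_pos = u_cancer + u_healthy       # P(+), by the theorem of total probability
    print(u_cancer / p_pos)            # P(cancer | +), roughly 0.21
    print(u_healthy / p_pos)           # P(not cancer | +), roughly 0.79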

  11. Product rule, p. 159.

  12. Sum rule (find “basic probability formulae” in the Chapter 6 notes), p. 159.

  13. Theorem of total probability, p. 159.
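
For reference, the basic probability formulas that questions 11-13 point to, written in the same plain notation used above:

    Product rule:                  P(A ∧ B) = P(A|B) P(B) = P(B|A) P(A)
    Sum rule:                      P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
    Theorem of total probability:  P(B) = Σi P(B|Ai) P(Ai), when A1,..,An are
                                   mutually exclusive and Σi P(Ai) = 1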

  14. Summarize the steps of the brute force MAP learning algorithm, p 159.
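
A minimal Python sketch of the brute-force MAP learner in question 14: score every hypothesis by P(D|h) P(h) and output the argmax (P(D) is the same for every h, so it can be dropped). The hypothesis list and the prior/likelihood functions are placeholders, not definitions from the text.

    def brute_force_map(hypotheses, prior, likelihood, data):
        """Return h_MAP = argmax over h of P(D|h) * P(h).

        hypotheses          -- iterable of candidate hypotheses (placeholder)
        prior(h)            -- returns P(h)                     (placeholder)
        likelihood(data, h) -- returns P(D|h)                   (placeholder)
        """
        # Dropping prior(h) (i.e., assuming a uniform prior) would give the
        # ML hypothesis of question 9 instead.
        return max(hypotheses, key=lambda h: likelihood(data, h) * prior(h))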

  15. Give the formula that determines the classification vj selected by the Bayes Optimal Classifier.
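
A hedged Python sketch of the decision rule asked for in question 15: vOB = argmax over vj in V of the sum over hi in H of P(vj|hi) P(hi|D). The function and parameter names are placeholders.

    def bayes_optimal_classify(values, hypotheses, posterior, p_value_given_h, x):
        """Return the vj maximizing sum over h of P(vj | h, x) * P(h | D).

        values                   -- the possible classifications V
        hypotheses               -- all h in H
        posterior(h)             -- P(h | D)                        (placeholder)
        p_value_given_h(v, h, x) -- P(v | h, x); often 1 if h(x) == v, else 0
        """
        return max(values,
                   key=lambda v: sum(p_value_given_h(v, h, x) * posterior(h)
                                     for h in hypotheses))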

  16. What can you say about the power of the Bayes Optimal Classifier? No classifier system based on D and H that assumes P(hi|D,x) = P(hi|D) can exceed the performance of the Bayes Optimal Classifier when provided with randomly selected input.

  17. What can you say about how the hypothesis selected by the Bayes Optimal Classifier relates to H? It computes a new hypothesis that may not be (and usually is not) in H, because it uses all h in H to make a classification.

  18. Describe the Gibbs classifier, p. 176. To predict for input x, pick a hypothesis according to the prior probabilities P(h) [or posterior P(h|D)] of the hypotheses and use that hypothesis's prediction for x.

  19. How does the error performance of the Gibbs classifier compare to that of the Bayes Optimal Classifier? When the priors are uniform, E(errorGibbs) <= 2 E(errorBayesOptimal).

  20. Since the Bayes Optimal Classifier is optimal, why would one use the Gibbs classifier? It is much cheaper to apply: it draws a single hypothesis (using priors rather than posteriors) and uses only that hypothesis's prediction, instead of combining the predictions of every hypothesis in H, and under the assumption of uniform priors its expected error is no more than twice that of the BOC.

  21. Describe how one might apply the Gibbs classifier to classify input x if the prior probabilities are P(h1) = .2, P(h2) = .35, and P(h3) = .45, if h1 predicts s, h2 predicts t, and h3 predicts s. Generate a random number, r, between 0 and 1. If r <= .65, predict s (because either h1 or h3 was selected, and they both predict "s"). Otherwise, predict t (h2 was selected, and h2 predicts "t").
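
A small Python sketch of the sampling step in question 21; the hypotheses, probabilities, and predictions are exactly those given in the question.

    import random

    # (hypothesis, probability, prediction for x), as in question 21.
    hypotheses = [("h1", 0.20, "s"), ("h2", 0.35, "t"), ("h3", 0.45, "s")]

    def gibbs_classify(hypotheses):
        """Draw one hypothesis according to its probability; return its prediction."""
        r, cumulative = random.random(), 0.0
        for _, p, prediction in hypotheses:
            cumulative += p
            if r <= cumulative:
                return prediction
        return hypotheses[-1][2]   # guard against floating-point round-off

    print(gibbs_classify(hypotheses))   # "s" with probability 0.65, "t" with 0.35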

  22. To what category of problems does the Naive Bayes classifier apply? The Naive Bayes Classifier applies to classification tasks in which each instance is described by a conjunction of discrete attribute values and the target value is drawn from a finite set; it outputs the MAP classification.

  23. Give a formula indicating the simplifying assumption of the Naive Bayes classifier. P(a1, a2, ... , an\|vj) = Πi P(ai\|vj).

  24. Use the data on page 59 to confirm either of the two sample calculations of P(a1, a2, … , an|vj) = Πi P(ai|vj) on page 179.

  25. Given arbitrary data, use the Naive Bayes classifier to predict a classification, pp. 178-179.
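
A hedged Python sketch of the naive Bayes decision rule behind questions 23-25: vNB = argmax over vj of P(vj) Πi P(ai|vj), with the probabilities estimated as simple relative frequencies from the training examples (no m-estimate smoothing). The data format here is an assumption, not the book's.

    from collections import Counter

    def naive_bayes_classify(examples, x):
        """examples: list of (attribute_tuple, class_value); x: attribute tuple.

        Returns argmax over v of P(v) * prod over i of P(a_i | v), with all
        probabilities estimated as relative frequencies.
        """
        class_counts = Counter(v for _, v in examples)
        n = len(examples)

        def score(v):
            p = class_counts[v] / n                      # estimate of P(v)
            for i, a in enumerate(x):
                n_match = sum(1 for attrs, c in examples
                              if c == v and attrs[i] == a)
                p *= n_match / class_counts[v]           # estimate of P(a_i | v)
            return p

        return max(class_counts, key=score)

    # e.g., naive_bayes_classify(play_tennis_examples, ("sunny", "cool", "high", "strong")),
    # where play_tennis_examples is a hypothetical encoding of the page-59 table.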

  26. Assume that each of the 4 possible values of the i-th attribute is equally likely and that the equivalent sample size is m = 40. If the number of examples with classification vc is nc = 7, and the number of examples with attribute ai and class vc is ni = 3, give the m-estimate of probability for P(ai\|vc), using the formula P(ai\|vc) = (ni + mp)/(nc + m). P(ai \| vc) = (ni + mp)/(nc + m) = (3 + 40·(1/4))/(7 + 40) ≈ .277.
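
The arithmetic of question 26, checked in Python with the same numbers as in the question:

    n_i, n_c = 3, 7     # examples with attribute ai and class vc; examples with class vc
    m, p = 40, 1 / 4    # equivalent sample size; uniform prior over the 4 attribute values

    m_estimate = (n_i + m * p) / (n_c + m)
    print(round(m_estimate, 3))   # 0.277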

  27. In the diagram on page 186, use the conditional relationships implied by the graph to describe P(Storm, BusTourGroup, Lightning, Campfire, Thunder, ForestFire) as a simplified product of probabilities and conditional probabilities. P(S,B,L,C,T,F) = P(S) P(B) P(L\|S) P(C\|S,B) P(T\|L) P(F\|L,S,C)
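
A hedged Python sketch of the factorization in question 27: the joint probability is the product, over the nodes, of each node's probability given its parents. The CPT lookup functions are placeholders, not values from the figure.

    def joint_probability(s, b, l, c, t, f, cpt):
        """P(S,B,L,C,T,F) for the network of question 27.

        cpt is a placeholder dict of conditional-probability functions, e.g.
        cpt["L|S"](l, s) should return P(Lightning = l | Storm = s).
        """
        return (cpt["S"](s)
                * cpt["B"](b)
                * cpt["L|S"](l, s)
                * cpt["C|S,B"](c, s, b)
                * cpt["T|L"](t, l)
                * cpt["F|L,S,C"](f, l, s, c))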

  28. For any node v, with immediate ancestors (parents) a1, .., an, and any node na that is not one of v's ancestors or descendants, v is conditionally independent of na given a1, .., an. Give a formula for P(v\|na, a1, .., an). P(v\|na, a1, .., an) = P(v\|a1, .., an).

  29. In the diagram on page 186, use the conditional relationships implied by the graph to simplify P(ForestFire|Thunder,Storm,BusTourGroup,Lightning,Campfire). P(ForestFire|Thunder,Storm,BusTourGroup,Lightning,Campfire) = P(ForestFire|Storm,Lightning,Campfire), since given its parents (Storm, Lightning, Campfire), ForestFire is conditionally independent of the remaining conditioning variables.

  30. In a Bayesian net, what is inference? Inference is the process of predicting the value at one or more nodes (the prediction nodes) of the net, given values of variables at other nodes (the evidence nodes).

  31. What does it mean to say that X is conditionally independent of Z, given Y? P(X\|Y,Z) = P(X\|Y).