
Mitchell, Ch. 8

  1. Instance-based learning does not store all training examples for testing purposes. False

  2. The most computationally intensive part of instance-based learning is testing. True

  3. K-means clustering delivers k clusters or k prototypes for a given dataset. True

  4. Give an advantage of k-nearest neighbors learning. It has zero training time; all the work is done during classification.

  5. Give a disadvantage of k-nearest neighbors learning. It is storage-expensive: all training data must be kept in order to use the method, so it does not apply easily to big data. It also works better for data with fewer than 20 attributes, because it uses distance to compute similarity between data points. A third problem is that attributes must be normalized or scaled to the same range; otherwise attributes with a large range will overpower attributes with a smaller range (see the scaling sketch after this list).

  6. What is meant by the curse of dimensionality? In high-dimensional space, data becomes very sparse, so distance-based techniques become less meaningful, especially for classification (see the distance-concentration sketch after this list).

  7. Know how to explain and apply the k-NN classifier and k-means clustering.

  8. Draw the Voronoi diagram for 1-Nearest Neighbor with Euclidean distance for the data D below. (Hint: similar to the right-hand picture in Fig. 8.1, p. 232.) D = [1 1 Class1; 5 1 Class2; 5 5 Class3] (A 1-NN sketch over D appears after this list.)

  9. Perform k-means (k=3) clustering on the dataset below. Assume the initial 3 random prototypes/centroids are the top-left points. (A k-means sketch appears after this list.)
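
For question 5, a minimal sketch (my own illustration, not from Mitchell) of why unscaled attributes distort Euclidean distance. The two points below are hypothetical, chosen so one attribute's range dwarfs the other's:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical points: attribute 1 ranges over [0, 1], attribute 2 over [0, 10000].
p, q = (0.1, 3000.0), (0.9, 3100.0)
print(euclidean(p, q))      # ~100.0: dominated entirely by attribute 2

# After min-max scaling both attributes to [0, 1], each contributes fairly.
p_s, q_s = (0.1, 0.30), (0.9, 0.31)
print(euclidean(p_s, q_s))  # ~0.80: attribute 1 now matters
```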
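For question 6, a small sketch (again my own, with arbitrary sample sizes) of the distance-concentration effect behind the curse of dimensionality: as the number of dimensions grows, the farthest and nearest neighbors of a random query become nearly equidistant, so "nearest" carries little information:

```python
import math
import random

random.seed(0)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for dim in (2, 20, 200, 2000):
    # One random query and 200 random points in the unit hypercube.
    query = [random.random() for _ in range(dim)]
    points = [[random.random() for _ in range(dim)] for _ in range(200)]
    dists = sorted(euclidean(query, p) for p in points)
    # This ratio approaches 1 as dim grows: distances concentrate.
    print(f"dim={dim:4d}  farthest/nearest = {dists[-1] / dists[0]:.2f}")
```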
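For question 8, a minimal 1-NN classifier over the dataset D given in the question; the two query points are made up for illustration. Each cell of the Voronoi diagram is exactly the set of queries this function maps to one training point:

```python
import math

# The dataset D from question 8: (point, class label).
D = [((1, 1), "Class1"), ((5, 1), "Class2"), ((5, 5), "Class3")]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_nn(query):
    # Label of the single nearest training point under Euclidean distance.
    return min(D, key=lambda example: euclidean(query, example[0]))[1]

print(one_nn((2, 2)))    # Class1: (1,1) is nearest
print(one_nn((5, 3.5)))  # Class3: (5,5) is nearest
```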
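For question 9, a sketch of the two alternating k-means steps. The question's actual dataset and top-left initial centroids are not reproduced in this page, so the points and initial prototypes below are hypothetical stand-ins:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids):
    while True:
        # Step 1: assign each point to the index of its nearest centroid.
        assign = [min(range(len(centroids)),
                      key=lambda i: euclidean(p, centroids[i]))
                  for p in points]
        # Step 2: move each centroid to the mean of its assigned points.
        new_centroids = []
        for i in range(len(centroids)):
            cluster = [p for p, a in zip(points, assign) if a == i]
            if cluster:
                new_centroids.append(tuple(sum(coord) / len(cluster)
                                           for coord in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: no centroid moved
            return assign, centroids
        centroids = new_centroids

# Hypothetical 2-D data with three visible groups; initial prototypes are
# simply three of the data points (the exercise's real choices are unknown).
pts = [(1, 8), (2, 9), (1, 9), (8, 8), (9, 9), (8, 9), (5, 1), (4, 2), (5, 2)]
labels, centers = kmeans(pts, [pts[0], pts[3], pts[6]])
print(labels)   # cluster index for each point
print(centers)  # final k = 3 prototypes/centroids
```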